Ideas:

- Feature selection: select a certain proportion of words with highest information gain
- Normalize feature vectors to the average vector length observed in the data
- Investigate locally weighted learning

Resources:

- [A Comparative Study on Feature Selection in Text Categorization](http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf)
- [Multinomial Naive Bayes for Text Categorization Revisited](http://www.cs.waikato.ac.nz/ml/publications/2004/kibriya_et_al_cr.pdf)
- [scikit-learn TfidfVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html)
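
A rough sketch of the first two ideas, in plain Python so it runs without extra dependencies. It makes several assumptions that are not in these notes: information gain is computed on word presence/absence per document, "vector length" means the Euclidean (L2) norm, and the function names and toy documents are made up for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Information gain of each word's presence/absence w.r.t. the class label.

    docs: list of token lists; labels: list of class labels (same length).
    """
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy(class_counts.values())

    # For each word, count the classes of the documents that contain it.
    present = {}
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            present.setdefault(word, Counter())[label] += 1

    gains = {}
    for word, with_word in present.items():
        n_with = sum(with_word.values())
        n_without = n - n_with
        # Counter subtraction keeps only positive counts.
        without_word = class_counts - with_word
        conditional = (n_with / n) * entropy(with_word.values())
        if n_without:
            conditional += (n_without / n) * entropy(without_word.values())
        gains[word] = base - conditional
    return gains


def select_top_proportion(gains, proportion):
    """Keep the given proportion of words, ranked by information gain."""
    k = max(1, math.ceil(proportion * len(gains)))
    ranked = sorted(gains, key=lambda w: (-gains[w], w))  # ties: alphabetical
    return ranked[:k]


def l2_norm(vec):
    """Euclidean length of a sparse vector stored as {word: value}."""
    return math.sqrt(sum(v * v for v in vec.values()))


def normalize_to_average_length(vectors):
    """Rescale every vector so its L2 norm equals the average norm in the data."""
    norms = [l2_norm(vec) for vec in vectors]
    avg = sum(norms) / len(norms)
    return [
        {w: v * avg / norm for w, v in vec.items()} if norm else dict(vec)
        for vec, norm in zip(vectors, norms)
    ]


# Toy data, invented purely for illustration.
docs = [
    ["cheap", "pills", "now"],
    ["cheap", "offer"],
    ["meeting", "now"],
    ["meeting", "agenda"],
]
labels = ["spam", "spam", "ham", "ham"]

gains = information_gain(docs, labels)
selected = select_top_proportion(gains, proportion=0.25)

# Term-count vectors restricted to the selected words, then length-normalized.
vectors = [Counter(t for t in tokens if t in selected) for tokens in docs]
normalized = normalize_to_average_length(vectors)
print(selected)
```

With scikit-learn, a similar pipeline could probably be built from `TfidfVectorizer` plus a percentile-based selector, but note that `TfidfVectorizer`'s built-in `norm` option scales to unit length rather than to the average length, so the rescaling step would still need to be done separately.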