Feature Selection is a library of feature selection algorithms.
en.wikipedia.org/wiki/Feature_selection
gem sources -a http://gemcutter.org sudo gem install feature_selection
There are currently 3 implemented feature collections: Chi-Squared, Mutual Information and Frequency Based.
Each return a hash that looks similar to:
{klass => {term => score, term => score}, klass => {term => score}}
Example:
data = { :spam => [['this', 'is', 'some', 'information'], ['this', 'is', 'something', 'that', 'is', 'information']], :ham => [['this', 'test', 'some', 'more', 'information'], ['there', 'are', 'some', 'things']], } a = FeatureSelection::ChiSquared.new(data) # You can also use... # FeatureSelection::MutualInformation # FeatureSelection::FrequencyBased a.rank_features #=> {:spam => {term => score, term => score}, :ham => {term => score}}
There are two ways to log the activity:
# Provide a path of somewhere to log to log = File.expand_path(File.dirname(__FILE__) + '/log.txt') FeatureSelection::MutualInformation.new(data, :log_to => log) # Provide an existing Logger object log = Logger.new('log.txt') FeatureSelection::MutualInformation.new(data, :log_to => log)
Copyright © 2009 reddavis. See LICENSE for details.