
Can't manage to have a string get analysed by brain.js #188


Closed
mohammedmulazada opened this issue Apr 10, 2018 · 17 comments

@mohammedmulazada

mohammedmulazada commented Apr 10, 2018


What is wrong?

The network doesn't train; it returns NaN for the data that it should have analysed.

Where does it happen?

In app.js (run through Node), after receiving data from the Twitter API.

How do we replicate the issue?

  1. Use the dummy data below
const trainingData =[  
   {  
      input:'RT @ObamaFoundation: This week—50 years since Dr. Martin Luther King, Jr. was killed—@BarackObama and @RepJohnLewis sat down with a group o…',
      output:{  
         Barack:1
      }
   },
   {  
      input:'Incredible to have a Chicago team in the Final Four. I’ll take that over an intact bracket any day! Congratulations to everybody @LoyolaChicago - let’s keep it going!',
      output:{  
         Barack:1
      }
   },
   {  
      input:'In Singapore with young people who are advocating for education, empowering young women, and getting involved all over Southeast Asia with a profoundly optimistic commitment to building the world they want to see. ',
      output:{  
         Barack:1
      }
   },
   {  
      input:'Very thankful for President Xi of China’s kind words on tarrifs and automobile barriers...also, his enlightenment on intellectual property and technology transfers. We will make great progress together!',
      output:{  
         Donald:1
      }
   },
   {  
      input:'Last night, it was my great honor to host America’s senior defense and military leaders for dinner at the White House. America’s military is the GREATEST fighting force in the history of the world. They all have my pledge of unwavering commitment to our men and women in uniform! ',
      output:{  
         Donald:1
      }
   },
   {  
      input:'A TOTAL WITCH HUNT!!!',
      output:{  
         Donald:1
      }
   }
]
  2. Convert the input using encode
function encode(arg) {
return arg.split('').map(x => (x.charCodeAt(0) / 400));
}
  3. Run it through net.trainAsync()

How important is this (1-5)?

A 4, since I think this should be able to work, but it could be my bad too!

Expected behavior (i.e. solution)

The data should have been analysed, but somewhere in between something went wrong!

Other Comments

I think the encoding method might not be correct here.
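One concrete way to test the suspicion about the encoding: `encode` divides each UTF-16 character code by 400, so any character whose code is above 400, such as the typographic apostrophe `’` (U+2019) that appears in several of the tweets above, ends up as a value well above 1. Sigmoid-based networks usually train best on inputs in the 0 to 1 range, so such values are worth spotting. A minimal sketch in plain JavaScript; `findOutOfRange` is a hypothetical helper name, not part of brain.js.

```javascript
// Same encoding as in the issue: one number per character.
function encode(arg) {
  return arg.split('').map(x => x.charCodeAt(0) / 400);
}

// Hypothetical helper: list the characters whose encoded value exceeds 1.
function findOutOfRange(text) {
  return text
    .split('')
    .filter(ch => ch.charCodeAt(0) / 400 > 1);
}

const sample = 'I’ll take that over an intact bracket any day!';
console.log(encode('Hi'));           // [ 0.18, 0.2625 ]
console.log(findOutOfRange(sample)); // [ '’' ]
```

Stripping or replacing such characters (or dividing by a larger constant) before encoding would keep every value in range; which option is better for this data is untested here.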

@robertleeplummerjr
Contributor

Do you have a code sample, perhaps even a jsfiddle?

@mohammedmulazada
Author

mohammedmulazada commented Apr 10, 2018

I do. It's a Node project, so I'm not sure what the best way to share it would be; if you know a better way, please let me know and I'll gladly do it!

The only difference between this code and mine is that mine uses the twitter API to get the data I provided above.

Ideally, it would separate the tweets by author, and I could then feed it some text and have it predict who wrote it.

const brain = require('brain.js')
const http = require('http')
const express = require('express')
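const app = express()
// server.listen() is called at the bottom, so the HTTP server has to be created
const server = http.createServer(app)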

const path = require('path')
app.set('view engine', 'ejs')
const publicPath = path.join(__dirname, './public')
app.use(express.static(publicPath))
let trainedNet
let net = new brain.NeuralNetwork()

const trainingdata = require('./js/training-data.js')

function train(data) {
    console.log('training')
    // The options belong to trainAsync (they were being passed to processTrainingData).
    // trainAsync returns a Promise, so only compile the net once training has finished.
    return net.trainAsync(processTrainingData(data), {
        iterations: 1,
        log: true,
        learningRate: 0.1,
        timeout: 500
    }).then(() => {
        trainedNet = net.toFunction();
    });
}

function encode(arg) {
    return arg.split('').map(x => (x.charCodeAt(0) / 400));
}

function processTrainingData(data) {
    return data.map(d => {
        return {
            input: encode(d.input),
            output: d.output
        }
    })
}

function execute(input) {
    console.log(input, 'input')
    let results = trainedNet(encode(input));
    let output;
    let certainty;
    console.log(results)
    console.log(results.Donald)
    if (results.Donald > results.Barack) {
        output = 'Donald Trump'
        certainty = Math.floor(results.Donald * 100)
    } else {
        output = 'Barack Obama'
        certainty = Math.floor(results.Barack * 100)
    }

    return "I'm " + certainty + "% sure that tweet was written by " + output;
}

app.get('/', (req, res) => {
    // wait for training to finish before trainedNet is used
    train(trainingdata).then(() => {
        console.log(execute("After years of rebuilding OTHER nations, we are finally rebuilding OUR nation - and we are restoring our confidence and our pride!"));
        res.render('index')
    })
})

server.listen(3000, () => {
    console.log('Example app listening on port 3000!')
})
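`execute` above hard-codes the two labels `Donald` and `Barack`. A minimal sketch of a label-agnostic alternative that picks whichever key of the result object has the highest score; `pickLabel` is a hypothetical helper, and it assumes the trained function returns a plain object of label to score, as the `results.Donald` access suggests.

```javascript
// Hypothetical helper: return { label, score } for the highest-scoring label.
// `results` is assumed to look like { Barack: 0.12, Donald: 0.93 }.
function pickLabel(results) {
  let best = null;
  for (const [label, score] of Object.entries(results)) {
    if (best === null || score > best.score) {
      best = { label, score };
    }
  }
  return best; // null when results is empty
}

const best = pickLabel({ Barack: 0.12, Donald: 0.93 });
console.log(`I'm ${Math.floor(best.score * 100)}% sure that tweet was written by ${best.label}`);
```

With this, adding a third author only means adding training data with a new output key; the decision code stays the same.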

@robertleeplummerjr
Contributor

The main problem here is the inputs are going to have varying sizes, which won't train correctly.
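To make the varying-size problem concrete: `encode` produces one number per character, so tweets of different lengths become input arrays of different lengths, while a feed-forward network has a fixed number of input neurons. A minimal sketch in plain JavaScript (no brain.js needed) showing the mismatch and one common workaround, padding every string with spaces up to the longest one, which is also what the `adjustSize` helper later in this thread does.

```javascript
function encode(arg) {
  return arg.split('').map(x => x.charCodeAt(0) / 400);
}

const tweets = [
  'A TOTAL WITCH HUNT!!!',
  'Incredible to have a Chicago team in the Final Four.'
];

// Different lengths: a fixed-size input layer cannot accept both as-is.
const rawLengths = tweets.map(t => encode(t).length);
console.log(rawLengths); // [ 21, 52 ]

// Pad every string with spaces up to the longest one.
const longest = Math.max(...tweets.map(t => t.length));
const paddedLengths = tweets.map(t => encode(t.padEnd(longest, ' ')).length);
console.log(paddedLengths); // [ 52, 52 ]
```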

@robertleeplummerjr
Contributor

Here is a working prototype: https://jsfiddle.net/8Lvynxz5/36/

@robertleeplummerjr
Contributor

Here is a better working prototype (I forgot to click save on the other one): https://jsfiddle.net/8Lvynxz5/38/

@robertleeplummerjr
Contributor

Let me know if that isn't enough of an example to get this right.
Things to be aware of:

  • This is likely so little data that the net will probably over-train on it
  • The recurrent neural network will likely give better results
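With only a handful of tweets, one way to notice over-training is to hold some examples back and use them only to check the trained net. A minimal sketch of a deterministic split; `splitData` and the one-in-five holdout ratio are illustrative assumptions, not part of brain.js. Taking every k-th example, rather than slicing off the end, matters here because the data in this thread is collected one author at a time, so a tail slice would hold out only one author.

```javascript
// Hypothetical helper: put every `holdoutEvery`-th example into the validation set.
function splitData(data, holdoutEvery = 5) {
  const train = [];
  const validation = [];
  data.forEach((example, i) => {
    if ((i + 1) % holdoutEvery === 0) {
      validation.push(example);
    } else {
      train.push(example);
    }
  });
  return { train, validation };
}

// Made-up data: five examples per author, grouped by author.
const data = Array.from({ length: 10 }, (_, i) => ({
  input: `tweet ${i}`,
  output: { [i < 5 ? 'Donald' : 'Barack']: 1 }
}));

const { train, validation } = splitData(data);
console.log(train.length, validation.length); // 8 2
```

Training on `train` and comparing the error on `validation` gives a rough signal: if the training error keeps dropping while the validation predictions stay wrong, the net is memorizing rather than generalizing.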

@mohammedmulazada
Author

mohammedmulazada commented Apr 10, 2018

Thank you very much for your help and pointers. I will have a good look at this tomorrow morning, really looking forward to it! :)

@mubaidr
Contributor

mubaidr commented Apr 12, 2018

If I am not mistaken, this step (encoding and fixing the length of the input data) is not required with an LSTM network, which is also better suited to this type of problem.
Example here: https://github.com/BrainJS/brain.js/blob/develop/examples/childrens-book.js

@mohammedmulazada
Author

@mubaidr the unfortunate thing is that it seems like I can't use this, since I am running this on Node and as far as I understand that isn't available yet.

Currently the app is working while getting data from the twitter API, though for some reason it leans heavily to one side at the moment!

It seems like I am very close, but still a few tweaks needed here and there.

const dotenv = require('dotenv').config()
const brain = require('brain.js')
const Twitter = require('twitter')
const http = require('http')
const express = require('express')
const app = express()
const socketIO = require('socket.io')
const server = http.createServer(app)
const io = socketIO(server);
const path = require('path')
let dataTweet = []
app.set('view engine', 'ejs')
const publicPath = path.join(__dirname, './public')
app.use(express.static(publicPath))

let net = new brain.NeuralNetwork();
let trainedNet;
let longest;

const params = { screen_name: 'realdonaldtrump', count: 10, result_type: 'recent', tweet_mode: 'extended' };
const paramsObama = { screen_name: 'barackobama', count: 10, result_type: 'recent', tweet_mode: 'extended' };

const tokens = {
	consumer_key: process.env.CONSUMERKEY,
	consumer_secret: process.env.consumer_secret,
	access_token: process.env.access_token,
	access_token_key: process.env.access_token_key,
	access_token_secret: process.env.access_token_secret
}

const client = new Twitter(tokens)

function getTweets(user) {
	const params = { screen_name: `${user}`, count: 500, result_type: 'recent', tweet_mode: 'extended' };
	console.log(params)
	// client.get already returns a Promise when no callback is passed; returning it
	// directly (instead of wrapping it in a Promise that never rejects) lets API
	// errors reach the .catch() at the end of the chain
	return client.get('statuses/user_timeline', params)
}

getTweets('realdonaldtrump')
	.then((data) => {
		data.forEach((tweet) => {
			dataTweet.push({
				input: tweet.full_text.split('https:')[0],
				output: {[tweet.user.name.split(' ')[0]]: 1}
			})
		})
		// returning the second request keeps it in the same chain,
		// so its errors also reach the .catch() below
		return getTweets('barackobama')
	})
	.then((data) => {
		data.forEach((tweet) => {
			dataTweet.push({
				input: tweet.full_text.split('https:')[0],
				output: {[tweet.user.name.split(' ')[0]]: 1}
			})
		})
		train(getTrainingData(dataTweet))
		console.log(trainedNet(encode(adjustSize('A TOTAL WITCH HUNT!!!'))));
		console.log(trainedNet(encode(adjustSize('Incredible to have a Chicago team in the Final Four. I’ll take that over an intact bracket any day! Congratulations to everybody @loyolachicago - let’s keep it going!'))));
	})
	.catch((e) => {
		console.log(e)
	})



// console.log(trainedNet(encode(adjustSize('Last night, it was my great honor to host America’s senior defense and military leaders for dinner at the White House. America’s military is the GREATEST fighting force in the history of the world. They all have my pledge of unwavering commitment to our men and women in uniform! '))));

// console.log(trainedNet(encode(adjustSize('Incredible to have a Chicago team in the Final Four. I’ll take that over an intact bracket any day! Congratulations to everybody @loyolachicago - let’s keep it going!'))));

function train(data) {
    net.train(processTrainingData(data), {
        iterations: 2000,
        log:true,
        learningRate: 0.1,
        timeout: 5000
    });
    trainedNet = net.toFunction();
}

function encode(arg) {
    return arg.split('').map(x => (x.charCodeAt(0) / 400));
}

function processTrainingData(data) {
    const processedValues = data.map(d => {
        return {
            input: encode(d.input),
            output: d.output
        }
    });
    // console.log(processedValues);
    return processedValues;
}

function getTrainingData(data) {
	const trainingData = data
  longest = trainingData.reduce((a, b) =>
    a.input.length > b.input.length ? a : b).input.length;
  for (let i = 0; i < trainingData.length; i++) {
    trainingData[i].input = adjustSize(trainingData[i].input);
  }
  return trainingData;
}

function adjustSize(string) {
  while (string.length < longest) {
    string += ' ';
  }
  return string;  
}
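`adjustSize` only pads: a tweet to classify that is longer than `longest` keeps its full length and so no longer matches the input size the net was trained on. A minimal sketch of a variant that both pads and truncates; `toFixedLength` is a hypothetical name, and cutting off the tail of long tweets is an assumption about what is acceptable for this use case.

```javascript
// Hypothetical variant of adjustSize: always returns exactly `length` characters.
function toFixedLength(text, length) {
  return text.length >= length
    ? text.slice(0, length)      // drop anything beyond the trained input size
    : text.padEnd(length, ' ');  // pad short strings with spaces, like adjustSize
}

console.log(JSON.stringify(toFixedLength('A TOTAL', 10)));               // "A TOTAL   "
console.log(JSON.stringify(toFixedLength('A TOTAL WITCH HUNT!!!', 10))); // "A TOTAL WI"
```

In the code above it would be called as `toFixedLength(text, longest)` wherever `adjustSize(text)` is used, both when preparing training data and when classifying new tweets.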

@mubaidr
Contributor

mubaidr commented Apr 12, 2018

The API is the same whether you use it in the browser or in Node.js. I am currently on mobile; I will look into the code later.

@robertleeplummerjr
Contributor

The API is the same in whatever JavaScript environment you are on. I started experimenting with an LSTM version, which I believe is the answer here; I'll see what I can come up with as well.

@mohammedmulazada
Author

In case you would like to see where I am at right now, here is a link to the repo: https://github.com/moniac/real-time-web

Thanks for the help already, it's really exciting to see this develop!

@robertleeplummerjr
Contributor

robertleeplummerjr commented Apr 14, 2018

It has been a while since I looked at the API (even though I built it, lol). We really should document stuff like this better, but here you go: https://jsfiddle.net/j638LfLd/

Outputs:

iterations: 0 training error: 102.38539514646675
iterations: 10 training error: 22.821843183158208
iterations: 20 training error: 11.492725393552938
iterations: 30 training error: 8.732626567498885
iterations: 40 training error: 6.30779809294381
iterations: 50 training error: 5.643141357916936
iterations: 60 training error: 4.7329927373696865
iterations: 70 training error: 3.923181896715045
iterations: 80 training error: 3.5271946017753346
iterations: 90 training error: 3.713248549303406
iterations: 100 training error: 3.9557870752674535
iterations: 110 training error: 3.034900554836554
iterations: 120 training error: 2.9839399371764377
iterations: 130 training error: 2.6454871466040264
iterations: 140 training error: 3.4238002357793262
iterations: 150 training error: 2.4393232345710634
iterations: 160 training error: 2.554969571658671
iterations: 170 training error: 2.3398548776565775
iterations: 180 training error: 2.1756967055287038
iterations: 190 training error: 2.267156470151439
Donald
Barack

Definition:

const brain = require('brain.js');

const net = new brain.recurrent.LSTM();
net.train([
  {
    input: 'I say yes!',
    output: 'positive'
  },
  {
    input: 'I say no!', // a comma, not a period, separates the properties
    output: 'negative'
  }
]);

const standaloneFunction = net.toFunction();

// for the curious at heart who want the non-readable version:
console.log(standaloneFunction.toString());

@mubaidr
Contributor

mubaidr commented Apr 15, 2018

What does 'net.toFunction' do?

@robertleeplummerjr
Contributor

// for the curious at heart...

It compiles the whole network into a single static function that is only used for the purpose it was trained for.
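To illustrate the general idea of compiling a trained network into a standalone function (this is not the code brain.js actually generates): once the weights are fixed, they can be written directly into the source of a new function, which then needs no reference to the network object. A minimal sketch for a single sigmoid neuron; the weights, `bias`, and `compileNeuron` are made up for illustration.

```javascript
// Made-up "trained" parameters for one sigmoid neuron with two inputs.
const weights = [0.5, -1.25];
const bias = 0.1;

// Hypothetical compiler: bake the numbers into the function body as literals.
function compileNeuron(weights, bias) {
  const sum = weights.map((w, i) => `${w} * input[${i}]`).join(' + ');
  const body = `return 1 / (1 + Math.exp(-(${sum} + ${bias})));`;
  return new Function('input', body);
}

const standalone = compileNeuron(weights, bias);
console.log(standalone.toString()); // the weights appear as literals in the source
console.log(standalone([1, 0]));
```

The resulting function can only do the one job it was compiled for, which matches the description above: it no longer carries any training machinery.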

robertleeplummerjr added a commit that referenced this issue Apr 21, 2018
Partially fix #188.  Add missing api documentation to readme.
@mohammedmulazada
Author

mohammedmulazada commented Apr 23, 2018

I apologize for reopening this issue, but I have run into another problem!

Currently, I am trying to check tweets that use a given hashtag and compare them to past tweets, to see whether or not they fit with the previous tweets. Doing this might let the system recognize spam/trolls.

I have tried both the LSTM and the NN: the LSTM only returns 'gibberish', and the NN basically says any input is good input!

Currently the system is looking at tweets that use the hashtag #fortnite. Would I need to add another output classified as 'other'?
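On the 'other' question: if every training example has the output `{ fortnite: 1 }`, the network can reach near-zero error by always answering 1, which would be consistent with "any input is good input". A common remedy is to include counter-examples with a second label. A minimal sketch of building such a dataset and checking that more than one label is present; `labelTweets` and `countLabels` are hypothetical helpers, and the counter-example sentence is only a placeholder.

```javascript
// Hypothetical helper: turn raw strings into labeled examples.
function labelTweets(texts, label) {
  return texts.map(text => ({ input: text, output: { [label]: 1 } }));
}

// Count how many examples carry each label.
function countLabels(data) {
  const counts = {};
  for (const { output } of data) {
    for (const label of Object.keys(output)) {
      counts[label] = (counts[label] || 0) + 1;
    }
  }
  return counts;
}

const trainingData = [
  ...labelTweets(['U can get a scholarship for playing fortnite now? Wooooowwwwww'], 'fortnite'),
  ...labelTweets(['the legend of zelda'], 'other') // placeholder counter-example
];

const counts = countLabels(trainingData);
console.log(counts); // { fortnite: 1, other: 1 }
if (Object.keys(counts).length < 2) {
  console.warn('Only one label present: the net can learn to always answer yes.');
}
```

Roughly balanced counts per label also help; with many more `fortnite` than `other` examples, the net can still lean heavily to one side.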

This would be the NN version:

let net = new brain.NeuralNetwork()
let trainedNet
let longest
let tweets = []

function train(data) {
	net.train(processTrainingData(data), {
		iterations: 2000,
		log: true,
		learningRate: 0.1,
		timeout: 4000
	})
	trainedNet = net.toFunction()
}

function encode(arg) {
	return arg.split('').map(x => x.charCodeAt(0) / 400)
}

function processTrainingData(data) {
	const processedValues = data.map(d => {
		return {
			input: encode(d.input),
			output: d.output
		}
	})
	console.log(processedValues)
	return processedValues
}

function getTrainingData(data) {
	const trainingData = data
	longest = trainingData.reduce(
		(a, b) => (a.input.length > b.input.length ? a : b)
	).input.length
	for (let i = 0; i < trainingData.length; i++) {
		trainingData[i].input = adjustSize(trainingData[i].input)
	}
	return trainingData
}

function adjustSize(string) {
	while (string.length < longest) {
		string += ' '
	}
	return string
}

var es = new EventSource('/stream')

es.addEventListener('connect', function(event) {
	const text = JSON.parse(event.data)

	if (text.tweet) {
		tweets.push({
			input: text.tweet,
			output: { fortnite: 1 }
		})
	}

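	if (tweets.length === 2) {
		console.log(getTrainingData(tweets))
		train(tweets)
	}

	// trainedNet stays undefined until the first training run has happened
	if (trainedNet && text.tweet) {
		console.log(trainedNet(encode(adjustSize(text.tweet))))
		console.log(trainedNet(encode(adjustSize('the legend of zelda'))))
	}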
})

And the LSTM version

let net = new brain.recurrent.LSTM()
let trainedNet
let longest
let trainingData = []


function train(data) {
	net.train(data, {
		iterations: 1,
		log: true
	})
	trainedNet = net.toFunction()
}

var es = new EventSource('/stream')

es.addEventListener('connect', function(event) {
	const text = JSON.parse(event.data)

	if (text.tweet && trainingData.length < 5) {
		trainingData.push({
			input: text.tweet,
			output: { fortnite: 1 }
		})
	}

	if (trainingData.length === 4) {
		train(getTrainingData())
		console.log(trainedNet('Fortnite is nice!', 1000))
	}
})

function getTrainingData() {
	return trainingData
}
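The LSTM example earlier in the thread trains on pairs whose `output` is a plain string (`'positive'`, `'negative'`), whereas the LSTM version here passes `{ fortnite: 1 }` objects, the format used with `brain.NeuralNetwork`. A minimal sketch of converting the object format to string outputs before calling the LSTM's `train`; `toLstmData` is a hypothetical helper, and whether this alone removes the 'gibberish' is untested (`iterations: 1` is also very low).

```javascript
// Hypothetical helper: convert { input, output: { label: 1 } } into { input, output: 'label' }.
// Assumes every output object has at least one label.
function toLstmData(data) {
  return data.map(({ input, output }) => {
    // pick the label with the highest value (usually there is just one)
    const [label] = Object.entries(output).sort((a, b) => b[1] - a[1])[0];
    return { input, output: label };
  });
}

const converted = toLstmData([
  { input: 'I liked a @YouTube video ', output: { fortnite: 1 } },
  { input: 'the legend of zelda', output: { other: 1 } }
]);
console.log(converted);
```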

@mohammedmulazada
Author

Some dummy data

const trainingData = [
	{
		input:
			"Fortnite, camo and old shoes: The 'normal' life of Steven Adams - via @ESPN",
		output: { fortnite: 1 }
	},
	{
		input: 'U can get a scholarship for playing fortnite now? Wooooowwwwww',
		output: { fortnite: 1 }
	},
	{
		input:
			'they’re playing fortnite mobile, clear sign of special needs, they can’t help it :/',
		output: { fortnite: 1 }
	},
	{
		input: 'I liked a @YouTube video ',
		output: { fortnite: 1 }
	}
]
