GitHub - TheoSimier/Moderation-online-content: Moderation of online content through Natural Language Processing, Machine Learning and Deep Learning

Repository files:
.gitignore
README.txt
Report - Moderation of online content.pdf
moderation_online_content.html
moderation_online_content.ipynb


Project: Moderation of online content through Natural Language Processing, Machine Learning and Deep Learning

The objective of this master's project is to illustrate how techniques of Natural Language Processing and Machine Learning can be used to extract meaningful information from text.

In this thesis, I explain the most commonly used techniques and apply them to a concrete example:
the creation of an algorithm capable of moderating questions by checking whether
each question complies with the terms and conditions of Quora, a question-and-answer website.
I started by creating statistical features, normalizing the questions and transforming them into a format that classification algorithms can handle. 
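As a minimal sketch of these three preprocessing steps (not the code from the notebook), the example below uses only the Python standard library. The feature names, the tokenization rule and the bag-of-words representation are assumptions chosen for illustration; the report may rely on different features and on another vectorization.

```python
import re
from collections import Counter

def statistical_features(question):
    """Simple surface statistics computed on the raw question text."""
    words = question.split()
    return {
        "n_chars": len(question),
        "n_words": len(words),
        "n_upper_words": sum(w.isupper() for w in words),
        "n_question_marks": question.count("?"),
    }

def normalize(question):
    """Lowercase the text, drop punctuation and split it into tokens."""
    return re.findall(r"[a-z0-9']+", question.lower())

def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(tokens, vocab):
    """Turn a token list into a fixed-length vector of token counts."""
    counts = Counter(tokens)
    # dicts preserve insertion order, so this follows the column indices
    return [counts.get(tok, 0) for tok in vocab]

# Two made-up questions, used only to exercise the functions
questions = ["Why is the sky blue?", "Is the sky REALLY blue?"]
features = [statistical_features(q) for q in questions]
tokens = [normalize(q) for q in questions]
vocab = build_vocabulary(tokens)
vectors = [bag_of_words(t, vocab) for t in tokens]
```

Each question ends up as a list of numbers of the same length, which is the kind of input a classification algorithm expects. The statistical features can be appended to that vector as extra columns.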
Finally, I used a Logistic Regression, a Random Forest and a Deep Learning model to predict whether the questions were compliant or not.
The best algorithm reached an accuracy of 87%.
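To make the classification and evaluation step concrete, here is a hedged sketch of a logistic regression trained by batch gradient descent, followed by an accuracy computation. It is written in plain Python to stay dependency-free and is not the implementation used in the project; the toy data, the learning rate and the number of epochs are assumptions, and the accuracy it yields on four hand-made examples says nothing about the 87% reported above.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # prediction error for this example: p(y=1 | x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(X, w, b, threshold=0.5):
    """Label 1 (non-compliant) when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]

def accuracy(y_true, y_pred):
    """Share of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: column 0 counts a neutral word, column 1 a flagged word
X = [[1, 0], [2, 0], [0, 1], [0, 2]]
y = [0, 0, 1, 1]  # 1 = non-compliant question

w, b = train_logistic_regression(X, y)
train_accuracy = accuracy(y, predict(X, w, b))
new_predictions = predict([[3, 0], [0, 3]], w, b)
```

In practice the accuracy should be measured on questions that were held out from training, not on the training examples as in this toy run.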

The dataset can be found at the following URL:
https://www.kaggle.com/c/quora-insincere-questions-classification/data

GloVe embeddings can be retrieved at the following URL:
https://nlp.stanford.edu/projects/glove/
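
The downloadable GloVe text files store one token per line, followed by the components of its vector separated by spaces. The sketch below parses that format and, as one simple assumed way of using the vectors, averages them over the tokens of a question. The function names and the two sample lines are hypothetical, and the project itself may feed the embeddings to its Deep Learning model differently.

```python
def load_glove(lines):
    """Parse GloVe text format: a token followed by its vector components."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

def embed_question(tokens, embeddings, dim):
    """Average the vectors of known tokens; unknown tokens are ignored."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Two made-up 2-dimensional vectors standing in for a real GloVe file
sample_lines = ["sky 0.5 1.0\n", "blue 1.5 -1.0\n"]
embeddings = load_glove(sample_lines)
question_vector = embed_question(["the", "sky", "blue"], embeddings, dim=2)
```

With a real file, `load_glove` can be given the open file object directly, since it only iterates over lines.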

Details on the sources of the project can be found in the references of "Report - Moderation of online content.pdf".

This master's project was carried out during my MSc in Data Analytics & Artificial Intelligence at EDHEC.
Author: Theo Simier, under the supervision of Prof. Dr. Christophe Croux