TheoSimier/Moderation-online-content

Project: Moderation of online content through Natural Language Processing, Machine Learning and Deep Learning

The objective of this master's project is to illustrate how Natural Language Processing and Machine Learning techniques can be used to extract meaningful information from text.

In this thesis, I explain the most commonly used techniques and apply them to a concrete example:
an algorithm that moderates questions by checking whether each one complies with the terms and conditions of Quora, a question-and-answer website.
I started by creating statistical features, normalizing the questions and transforming them into a format that classification algorithms can handle.
I then trained a Logistic Regression, a Random Forest and a Deep Learning model to predict whether each question was compliant.
The best model reached an accuracy of 87%.
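
The preprocessing steps described above (statistical features, normalization, and conversion to a numeric format) might look roughly like the sketch below. This is an illustration only, not the code used in the thesis: the function names, the chosen statistics, and the bag-of-words representation are all assumptions, and the two example questions are invented.

```python
import re
import string
from collections import Counter


def statistical_features(question):
    """Compute simple surface statistics on the raw question (illustrative choice)."""
    words = question.split()
    return {
        "n_chars": len(question),
        "n_words": len(words),
        "n_uppercase_words": sum(1 for w in words if w.isupper()),
        "n_punctuation": sum(1 for ch in question if ch in string.punctuation),
    }


def normalize(question):
    """Lowercase, replace non-alphanumeric characters with spaces, split into tokens."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", question.lower())
    return cleaned.split()


def build_vocabulary(tokenized_questions):
    """Map every distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for toks in tokenized_questions for tok in toks})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(tokens, vocabulary):
    """Turn a token list into a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocabulary]


# Invented example questions, not taken from the dataset.
questions = ["How do I learn Python?", "Why is Python SO popular?"]
features = [statistical_features(q) for q in questions]
tokenized = [normalize(q) for q in questions]
vocabulary = build_vocabulary(tokenized)
vectors = [bag_of_words(toks, vocabulary) for toks in tokenized]
```

The resulting count vectors, optionally concatenated with the statistical features, are the kind of input a Logistic Regression or Random Forest classifier can consume.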

The dataset can be found at the following URL:
https://www.kaggle.com/c/quora-insincere-questions-classification/data

GloVe embeddings can be retrieved at the following URL:
https://nlp.stanford.edu/projects/glove/
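
As a rough sketch of how the downloaded embeddings could be used, the snippet below parses GloVe's plain-text format (one token per line, followed by its vector components separated by spaces) and averages the vectors of a question's tokens. The function names are hypothetical, and averaging is only a common baseline assumed here for illustration; the thesis may feed the embeddings to its Deep Learning model differently. The two sample lines are made up and use 2 dimensions instead of a real file's 50 to 300.

```python
def parse_glove_lines(lines, dim):
    """Parse GloVe text lines into a {token: vector} dictionary."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip malformed or empty lines
        # Take the vector from the right, since a few tokens contain spaces.
        token = " ".join(parts[:-dim])
        embeddings[token] = [float(x) for x in parts[-dim:]]
    return embeddings


def average_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Made-up 2-dimensional sample standing in for a real GloVe file.
sample_lines = ["python 0.5 1.0", "code 1.5 3.0"]
embeddings = parse_glove_lines(sample_lines, dim=2)
question_vector = average_embedding(["python", "code", "unknown"], embeddings, dim=2)
```

With a real file, the same parser could be fed an open file handle (for example `open(path, encoding="utf-8")`, where `path` points to whichever GloVe file was downloaded) and `dim` set to that file's dimensionality.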

Details on the project's sources can be found in the references of "Report - Moderation of online content.pdf".

This master's project was completed during my MSc in Data Analytics & Artificial Intelligence at EDHEC.
Author: Theo Simier, under the direction of Prof. Dr. Christophe Croux
