Detecting Depression in Social Media using Machine Learning

Depression is one of the most widespread mental illnesses in America. With the sheer amount of data available on social media, however, machine learning, a field of artificial intelligence, can be used to detect early signs of depression. Machine learning is used in many industries today, most notably in voice-to-text applications like Siri and in Spotify's song recommendations; its use in public health, however, is only beginning to emerge.

For my research project, I will create a model to help detect depression in social media. Using data scraped from Reddit, a community forum built on user voting, I will create a model that classifies individuals as depressed or non-depressed. Approximately 20,000 comments will be collected from subreddits, which are subforums on Reddit that each center on a particular topic. The experimental (depressed) group will consist of 10,000 comments from /r/depression. The control (non-depressed) group will consist of 10,000 comments from non-specific subreddits such as /r/news, /r/casualconversation, and /r/memes.

From there, the comments will be preprocessed, or cleaned, by removing stop-words, links, and punctuation, and the remaining text will be split into individual words. After this, the data will be used to train and test machine learning algorithms such as Naive Bayes and support vector machines (SVM).
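The cleaning and classification steps might look something like the sketch below. It is only an illustration, not the project's actual code: the stop-word list is a tiny placeholder, the four training comments are made up, the label convention (1 for the /r/depression group, 0 for the control group) is an assumption, and the Naive Bayes classifier is a hand-written multinomial version with add-one smoothing so that the example needs nothing beyond the Python standard library. An SVM is not shown; in practice both models would more likely come from an existing machine learning library.

```python
import math
import re
import string
from collections import Counter

# Placeholder stop-word list (an assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in",
              "is", "it", "i", "my", "this", "that"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment):
    """Lowercase, strip links and punctuation, split into words, drop stop-words."""
    text = URL_PATTERN.sub(" ", comment.lower())
    text = text.translate(PUNCT_TABLE)
    return [word for word in text.split() if word not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, token_lists, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy comments: 1 = depressed group, 0 = control group (assumed labels).
toy_data = [
    ("I feel hopeless and alone", 1),
    ("I can't sleep and feel empty", 1),
    ("Great game last night", 0),
    ("This meme is hilarious", 0),
]
model = NaiveBayes().fit([preprocess(text) for text, _ in toy_data],
                         [label for _, label in toy_data])

print(preprocess("I feel so empty today, see https://example.com!"))
# ['feel', 'so', 'empty', 'today', 'see']
print(model.predict(preprocess("I feel so alone")))  # 1
print(model.predict(preprocess("hilarious game")))   # 0
```

With only four training comments the predictions mean nothing statistically; the point is just to show how cleaned word lists feed into a classifier. On the real data, part of the 20,000 comments would be held out for testing rather than used for training.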

 

Note: The link title is wrong; it is from an old project idea.
