Detecting Depression in Social Media using Machine Learning

Depression is one of the most widespread mental illnesses in America. With the sheer amount of data available on social media, however, machine learning, a field of artificial intelligence, can be used to detect its early signs. Machine learning is already used in many industries today, most notably in voice-to-text applications like Siri and in Spotify's song recommendations, but its use in public health is only beginning to emerge.

For my research project, I will create a model to help detect depression in social media. Using data scraped from Reddit, a community forum built on user voting, I will train the model to classify individuals as depressed or non-depressed. Approximately 20,000 comments will be collected from subreddits, which are subforums within Reddit centered on a particular topic. The experimental (depressed) group will consist of 10,000 comments from /r/depression. The control (non-depressed) group will consist of 10,000 comments from general-interest subreddits such as /r/news, /r/casualconversation, and /r/memes.

From there, the comments will be preprocessed, or cleaned, by removing stop-words, links, and punctuation, and then split into individual words. After this, the data will be used to train and test machine learning algorithms such as Naive Bayes and support vector machines (SVM).
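
The cleaning and classification steps described above could be sketched roughly as follows. This is only a minimal illustration in plain Python, not the project's actual implementation: the stop-word list, the six toy comments, and the names `preprocess` and `NaiveBayes` are all assumptions made up for the example, and only a small multinomial Naive Bayes classifier is shown (the SVM is left out).

```python
import math
import re
import string
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"a", "an", "the", "i", "is", "am", "and", "to", "of",
              "in", "it", "so", "my", "this", "for", "was", "at"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment):
    """Lowercase, strip links and punctuation, split into words, drop stop-words."""
    text = URL_PATTERN.sub(" ", comment.lower())
    text = text.translate(PUNCT_TABLE)
    return [word for word in text.split() if word not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, comments, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_counts = Counter(labels)       # label -> number of comments
        for comment, label in zip(comments, labels):
            self.word_counts[label].update(preprocess(comment))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, comment):
        tokens = preprocess(comment)
        total_comments = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_comments in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_comments / total_comments)
            for token in tokens:
                if token in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy comments standing in for the scraped Reddit data.
train_comments = [
    "I feel so hopeless and empty, nothing helps.",
    "Can't sleep, feeling worthless and alone again",
    "Everything feels hopeless, I am tired of being alone",
    "Great game last night! http://example.com/highlights",
    "This meme is hilarious, made my day",
    "Interesting news about the election today",
]
train_labels = ["depressed"] * 3 + ["control"] * 3

model = NaiveBayes().fit(train_comments, train_labels)
print(model.predict("I feel hopeless and alone"))
```

With 10,000 comments per group, a held-out portion of the data would be set aside for testing rather than predicting on hand-written sentences as done here.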


