Date of Completion

2019

Document Type

Honors College Thesis

Department

Department of Romance Languages & Linguistics

Thesis Type

College of Arts and Science Honors, Honors College

First Advisor

Julie Roberts

Keywords

Linguistics, Corpus Linguistics, Discourse Analysis, Twitter, Threat Detection, Social Media

Abstract

As social media increases in popularity, its ability to create culturally meaningful tools grows as well. One of the most promising tools is categorization software, which analyzes the linguistic data in social media posts to make predictions. It does so with the help of corpus linguistics, a form of analysis designed to pick out the most frequent and/or significant words in a dataset. This study focuses on software intended to detect threats. While this technology has the potential to flag threatening language used by groups or individuals, the text search strategies it currently uses often produce a high number of false positives, making it too unreliable for effective use. The software is most effective at marking whether a specific word is present in a tweet, not at determining whether that word is actually being used in a threatening way (e.g., "I'm planning on killing him" vs. "this silence is killing me"). Discourse analysis, which looks at the role context plays in language, could minimize these errors by helping researchers refine the software so that it more closely matches how people actually use language. The goal of this project, then, is to investigate ways of combining corpus linguistics and discourse analysis with a Twitter database to improve threat analysis.
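The false-positive problem described in the abstract can be illustrated with a minimal Python sketch. It is not the software examined in the thesis. The keyword list and the function name `flags_keyword` are hypothetical, chosen only to show how a plain word-presence check treats the abstract's two example sentences identically.

```python
import re

# Hypothetical keyword list (an assumption for illustration);
# the abstract does not say which terms the software searches for.
THREAT_KEYWORDS = {"killing"}


def flags_keyword(text, keywords=THREAT_KEYWORDS):
    """Return True if any keyword appears as a whole word in the text."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in keywords for token in tokens)


# The two example sentences quoted in the abstract.
examples = [
    "I'm planning on killing him",
    "this silence is killing me",
]

for text in examples:
    # Both sentences are flagged, although only the first reads as a threat.
    print(flags_keyword(text), "-", text)
```

Because the check looks only at whether the word is present, it cannot separate the literal use from the figurative one. That distinction depends on context, which is where the abstract proposes discourse analysis could help.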

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
