The Language of Legal and Illegal Activity on the Darknet

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity. Given the magnitude of these networks, scalably monitoring their activity necessarily relies on automated tools, and notably on NLP tools. However, little is known about the characteristics of texts communicated through the Darknet, or about how well off-the-shelf NLP tools perform on this domain. This paper addresses this gap and performs an in-depth investigation of the characteristics of legal and illegal text in the Darknet, comparing it to a clear net website with similar content as a control condition. Taking drug-related websites as a test case, we find that texts for selling legal and illegal drugs have several linguistic characteristics that distinguish them from one another, as well as from the control condition, among them the distribution of POS tags and the coverage of their named entities in Wikipedia.
Original language: English
Title of host publication: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Publication date: 2019
Pages: 4271-4279
Publication status: Published - 2019
Externally published: Yes
Event: 57th Annual Meeting of the Association for Computational Linguistics - Florence, Italy
Duration: 1 Jul 2019 to 1 Jul 2019

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics
Country: Italy
City: Florence
Period: 01/07/2019 to 01/07/2019

ID: 239016644