Automated Semantic Analysis for Stylometry – Københavns Universitet

Automated Semantic Analysis for Stylometry

Master's thesis defense by Steffen Hedegaard

Abstract:

We examine the usefulness of semantic frames for authorship attribution, in order to judge whether they can improve upon the state-of-the-art marker "function words".

Combining the information gained from semantic frames and function words reduced the number of misclassifications by 83% compared to a purely function-word-based classifier on a corpus of Victorian Romantic Literature, improving the achieved accuracy from 97.95% to 99.66%.
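The 83% figure can be checked from the two accuracies alone: the error rate falls from 2.05% to 0.34%, and the reduction is measured relative to the baseline error. A minimal sketch of that arithmetic follows; the function name is illustrative and not taken from the thesis.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Fraction of the baseline's misclassifications removed by the improved classifier.

    Both arguments are accuracies in percent (0-100).
    """
    baseline_error = 100.0 - baseline_accuracy   # 2.05 for the function-word classifier
    improved_error = 100.0 - improved_accuracy   # 0.34 for the combined classifier
    return (baseline_error - improved_error) / baseline_error


reduction = relative_error_reduction(97.95, 99.66)
print(f"{reduction:.1%}")  # prints 83.4%
```

This assumes both accuracies were measured on the same set of documents, so that the ratio of error rates equals the ratio of misclassification counts.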

Semantic frames performed poorly on the classical testbed "The Federalist Papers" because its documents are too short. Practical limits were found for the document size that semantic frames require.

Using all frames as features is necessary for semantic frames to be viable for improving upon the use of function words.

---

Supervisor: Jakob Grue Simonsen, DIKU

External examiner: Troels Andreasen, RUC