Improving natural language processing with human data: Eye tracking and other data sources reflecting cognitive text processing

Publication: Book/anthology/thesis/report › Ph.D. thesis › Research

  • Maria Barrett
When humans perform everyday tasks like reading, speaking, and writing, they cognitively also complete many of the tasks that natural language processing strives for computers to replicate. The traces of human cognitive processing can be collected in various data sources such as eye tracking during reading, keystroke logs from typing and acoustic cues, where milliseconds matter.

This thesis shows that there is an unused potential for utilizing eye tracking and other data sources reflecting human cognitive processing of text for natural language processing.

This thesis presents several studies where traces of human text processing can be used to improve a broad range of established natural language processing tasks. The tasks span part-of-speech induction, syntactic parsing, sentiment classification, grammatical error detection and detection of abusive language. The thesis furthermore demonstrates some transfer across related languages by using English eye-tracking recordings to improve French part-of-speech induction.

Technology for recording keystroke logs and prosody features is already common. And the recent advancements of low-cost eye tracking technology promise eye-tracking data to be available in larger quantities, also for low-resource languages. Real-world eye-tracking data poses new challenges compared to laboratory data. One study in this thesis presents first evidence that despite the noise and idiosyncrasies, real-world reading data recorded with a consumer-grade eye tracker can be modelled in machine learning models.

Original language: English
Publisher: Det Humanistiske Fakultet, Københavns Universitet
Number of pages: 171
Status: Published - Oct. 2018

Note regarding the thesis

Ph.D. thesis defended 10 October 2018

ID: 203773341