21 June 2017

High School Students! Don't cheat! You'll get busted eventually

ghost-writing detection

More than 6% of Danish high school pupils cheat on assignments or at their final exams. Some are caught. The majority are not - because there is no system to support the screening. Researchers at the Department of Computer Science in Copenhagen are working on detecting ghost-writing in high school assignments - and the results are promising.

By the end of June the tradition-bound student caps are awarded to the freshly graduated high school students. Loud music and shouts of joy echo through the major Danish cities, announcing that a new generation of students is either heading for its next career step, e.g. a university education, or perhaps taking a well-deserved sabbatical after a hard and hectic year of final exams.

But how well deserved is the cap after all? Research and randomized surveys show that cheating on assignments and exams is an increasing problem. According to a survey carried out by TV2 (a Danish broadcaster), more than 6% of all Danish high school students admit that they have cheated on an assignment.

Plagiarism can be detected through pattern recognition

Common plagiarism, which means copying parts of other pupils' assignments, can already be detected by machine learning. The Lectio platform, which is used by most Danish high schools for study administration and communication with students, has a module that can analyse an assignment handed in by a pupil and compare it to his or her former assignments and to the complexity of assignments contained in Lectio. If the system recognises passages that are identical to other pupils' assignments, it can send a warning to the teacher, who may then take action.
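How such a check for identical passages might work can be sketched in a few lines of Python. This is only an illustration, not a description of the Lectio module: the function names, the five-word window and the 30% threshold are all assumptions made for the example.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word sequences found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_passage_ratio(submission, other, n=5):
    """Share of the submission's word sequences that also occur in the other text."""
    own = shingles(submission, n)
    if not own:
        return 0.0
    return len(own & shingles(other, n)) / len(own)

def flag_for_teacher(submission, archive, threshold=0.3, n=5):
    """Return (author, ratio) pairs for archived texts above the threshold.

    `archive` is a hypothetical dict mapping an author id to an earlier text.
    The threshold is an assumed value; the result is a hint for the teacher,
    not a verdict.
    """
    hits = []
    for author, text in archive.items():
        ratio = shared_passage_ratio(submission, text, n)
        if ratio >= threshold:
            hits.append((author, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A real system would have to cope with millions of assignments and with lightly reworded passages, which this simple overlap count does not attempt.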

Ghost-writing is more difficult to detect

If the pupil has ordered an assignment from another person, it is more difficult for the machine to detect the deviation, since there are probably no identical text passages elsewhere in the system. In this case, new methods using machine learning and big data to analyse the personal writing style of the pupil can solve the task:

On the basis of former assignments handed in by the pupil, the computer establishes a personal profile of the pupil's writing style, based on e.g. typical words and phrases, spelling mistakes, and the use of long vs. short words and sentences. If the analysis comes up with a significantly different writing style compared to the normal style of this pupil, it does not necessarily mean that the pupil has cheated, e.g. by copying other people's work. But the machine may prompt the teacher responsible for reviewing the assignment to take a closer and more critical look at its originality.
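The idea of a personal writing profile can be illustrated with a small Python sketch. It is not the researchers' method: the four features, the function names and the threshold are assumptions chosen for the example, and markers such as spelling mistakes or typical phrases are left out for brevity.

```python
import re
from statistics import mean, pstdev

def style_features(text):
    """Compute a few simple style markers for one text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return {
        "avg_word_length": mean(len(w) for w in words),
        "long_word_share": sum(len(w) > 6 for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "vocabulary_richness": len(set(words)) / len(words),
    }

def build_profile(earlier_texts):
    """Per-feature (mean, standard deviation) over a pupil's earlier texts."""
    rows = [style_features(t) for t in earlier_texts]
    return {
        name: (mean([r[name] for r in rows]), pstdev([r[name] for r in rows]))
        for name in rows[0]
    }

def deviation_score(profile, new_text):
    """Average distance from the profile, measured in standard deviations."""
    feats = style_features(new_text)
    scores = []
    for name, (mu, sigma) in profile.items():
        # Assumed floor on the spread so that a handful of very similar
        # earlier texts does not make every small difference look extreme.
        spread = max(sigma, 0.1 * abs(mu), 0.05)
        scores.append(abs(feats[name] - mu) / spread)
    return mean(scores)

def needs_closer_look(profile, new_text, threshold=3.0):
    """True if the text deviates enough to prompt a manual review (assumed threshold)."""
    return deviation_score(profile, new_text) > threshold
```

In keeping with the text above, a high score only suggests that the teacher take a closer look; it says nothing by itself about whether the pupil has cheated.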

In Denmark the phenomenon of ordering assignments from 'ghost writers' is relatively new, whereas it has been known for some time in e.g. the USA. An industry of 'papermills' has grown up, whose business model is to write assignments for stressed or ambitious students. At university level, more than 7% of American students have admitted to handing in ghost-written assignments.

The newly established Danish service 'Fix-my-assignment', where students can specify the grade they want and, for a fee, get a professionally written assignment within 24 or 48 hours, has recently experienced growing interest from Danish high school students.

Cheating can be detected and fought - e.g. through research in machine learning and big data

The larger the body of material available for comparison, the better the methods for fighting cheating become. The Danish company MaCom, which supplies the Lectio platform, today holds a collection of 15 million assignments handed in digitally by Danish high school students over recent years.

Professor Stephen Alstrup, DIKU

These assignments now form part of a major research project carried out by researchers at the University of Copenhagen in collaboration with MaCom. The purpose is to detect ghost-writing. Professor Stephen Alstrup, who is in charge of a major research project applying big data and digitization within education in the framework of DABAI.DK, explains:

- Back in 2013 we initiated a project in collaboration with MaCom to analyse the individual writing style of high school pupils. One of my master's students, Niels Dalum Hansen, did a project on plagiarism control based on 100,000 digital assignments and was able to establish with 60% probability that a given pupil had cheated on an assignment.

An accuracy of 73.3% has been reached - and we are improving all the time

Today (2017) another master's student is working on the next generation of the same project, using more refined analytical methods to detect ghost-writing. In this project we compare a person's writing style with a cluster of assignments that contains both this person's own assignments and assignments submitted by others. This makes it possible to show whether plagiarised text stems from other persons in the cluster or from another source.

The results are promising, since we can demonstrate deviations or plagiarism with an accuracy of 73% using various advanced machine learning algorithms.

Assignment X is compared to a broad collection of assignments, focusing on the pupil's personal writing style. If Assignment X deviates too much, the method can reveal cheating with 73% certainty.

When the collection of assignments grows even bigger and the methods get more refined, we will be able to look at more markers and analytic elements and carry out even more complex comparisons, which might e.g. include profiling the individual writing style over time.

The purpose of this new project is not first and foremost to play 'Big Brother' and hunt exam-stressed high school pupils - and the output of the machine analysis must never stand alone. The idea is to give teachers a general warning system and a tool to assist in identifying possible cheaters - both to stop this illegal practice and to detect ghost-writing at an early stage.

But just as important is the goal of helping the individual student by supporting his or her learning curve and improving his or her learning style and progression throughout the three years in high school.