An algorithm to sniff out fake news, linguistically.

The Conversation looks at a way that Fatemeh Torabi Asr, a computational linguistics researcher at Simon Fraser University, has devised to use computers to instantaneously identify the language patterns of misleading news stories:

The linguistic characteristics of a written piece can tell us a lot about the authors and their motives. For example, specific words and phrases tend to occur more frequently in a deceptive text compared to one written honestly.

Our analysis of a large collection of fact-checked news articles on a variety of topics shows that, on average, fake news articles use more expressions that are common in hate speech, as well as words related to sex, death and anxiety. Genuine news, on the other hand, contains a larger proportion of words related to work (business) and money (economy).

This suggests that a stylistic approach combined with machine learning might be useful in detecting suspicious news.

Our fake news detector is built based on linguistic characteristics extracted from a large body of news articles. It takes a piece of text and shows how similar it is to the fake news and real news items that it has seen before. (Try it out!)

The main challenge, however, is to build a system that can handle the vast variety of news topics and the quick change of headlines online, because computer algorithms learn from samples and if these samples are not sufficiently representative of online news, the model’s predictions would not be reliable.

This seems like really impressive research. It also seems, from my view as a layperson, like something that could go quite wrong as language keeps evolving. But a step in the right direction, though.