The third kind of lie: science vs. p-hacking.

Fusion goes beyond the three kinds of lies (“Lies, damned lies, and statistics,” according to… someone) and into the awful implications of trusting the data as it lies, lies, lies to us:

In many fields of research right now, scientists collect data until they see a pattern that appears statistically significant, and then they use that tightly selected data to publish a paper. Critics have come to call this p-hacking, and the practice uses a quiver of little methodological tricks that can inflate the statistical significance of a finding. As enumerated by one research group, the tricks can include:

  • “conducting analyses midway through experiments to decide whether to continue collecting data,”
  • “recording many response variables and deciding which to report postanalysis,”
  • “deciding whether to include or drop outliers postanalyses,”
  • “excluding, combining, or splitting treatment groups postanalysis,”
  • “including or excluding covariates postanalysis,”
  • “and stopping data exploration if an analysis yields a significant p-value.”

Add it all up, and you have a significant problem in the way our society produces knowledge.
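The first trick on that list, peeking at the data mid-experiment and stopping the moment a test looks significant, is easy to see in a toy simulation. This sketch is mine, not the research group's, and every parameter in it (batch sizes, the 0.05 cutoff, trial count) is an arbitrary illustrative choice. Even when the data is pure noise, testing after every batch produces "significant" results far more often than the advertised 5%:

```python
# Toy simulation of one p-hacking trick: peeking at the data mid-experiment
# and stopping as soon as a test comes back "significant." All numbers here
# (batch sizes, the 0.05 threshold, the trial count) are illustrative
# choices, not from the article.
import math
import random
import statistics

def two_sided_p(xs):
    """Approximate two-sided p-value for a one-sample t-test of mean 0,
    using the normal approximation (close enough for illustration)."""
    t = statistics.fmean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

def experiment(peek, rng, start=20, step=10, max_n=100):
    """Collect pure-noise data. If peek, test after every batch and stop
    early on p < .05; otherwise run one test at the planned sample size."""
    xs = [rng.gauss(0, 1) for _ in range(start)]
    while len(xs) < max_n:
        if peek and two_sided_p(xs) < 0.05:
            return True  # declare victory early and stop collecting
        xs += [rng.gauss(0, 1) for _ in range(step)]
    return two_sided_p(xs) < 0.05

rng = random.Random(0)
trials = 2000
honest = sum(experiment(False, rng) for _ in range(trials)) / trials
peeking = sum(experiment(True, rng) for _ in range(trials)) / trials
print(f"false positives, one planned test: {honest:.1%}")  # around 5%
print(f"false positives, test-as-you-go:  {peeking:.1%}")  # noticeably higher
```

Fixing the sample size in advance (or correcting for the repeated looks) restores the advertised error rate; peeking without correcting is what turns noise into a publishable finding.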

An average person scrolling through a newsfeed won’t realize that much of the shit that “science” or “a study” says wouldn’t hold up under closer examination, even if it was published in a journal.

And that’s the professional science! To say nothing of all the data-driven decision-making happening in business right now.

The chunk most worth thinking about comes a few grafs later:

Take the ad-supported digital media ecosystem. The idea is brilliant: capture data on people all over the web and then use what you know to show them relevant ads, ads they want to see. Not only that, but because it’s all tracked, unlike broadcast or print media, an advertiser can measure what they’re getting more precisely. And certainly the digital advertising market has grown, taking share from most other forms of media. The spreadsheet makes a ton of sense—which is one reason for the growth predictions that underpin the massive valuations of new media companies.

But scratch the surface, like Businessweek recently did, and the problems are obvious. A large percentage of the traffic to many stories and videos consists of software pretending to be human.

“The art is making the fake traffic look real, often by sprucing up websites with just enough content to make them appear authentic,” Businessweek says. “Programmatic ad-buying systems don’t necessarily differentiate between real users and bots, or between websites with fresh, original work, and Potemkin sites camouflaged with stock photos and cut-and-paste articles.”

Of course, that’s not what high-end media players are doing. But the cheap programmatic ads, fueled by fake traffic, drive down the prices across the digital media industry, making it harder to support good journalism. Meanwhile, users of many sites are rebelling against the business model by installing ad blockers.

Something to consider: those of us who do use ad blockers nowadays have trouble understanding those of us who don’t. The phrase that I’ve seen again and again is “It’s like we’re seeing a whole different internet.”
