When Occam’s Razor Cuts You

The past couple of weeks have been a fun on-again, off-again effort to fix a “legacy” problem, if that term can be applied to a four-month-old site: identifying when Trump himself uses @realdonaldtrump vs. his staff.

Trumpologists have known for a while that he used a Samsung Galaxy S3, aka an Android phone. His staff all used iPhones. Further, multiple analyses of the language style, time of day, etc. (nerd out here and here) validated the connection between the Android and tweets from the thumbs of Trump.

But things change. His Android phone was last seen the morning of March 8th:

Then… nothing. Android showed up briefly for two tweets on March 25th within four minutes of each other (I’m picturing a brief wrestling match with the Secret Service as he pulled the phone from its hiding spot in the limo on the way to Trump National in Potomac Falls, VA… the two tweets were during his ride over, 20 minutes before arrival… maybe while on the Toll Road?) and disappeared again.

Meanwhile, there were 139 other tweets, mostly from the iPhone. Including lots of FAKE NEWS references and other tweets that are universally agreed to have come from him.

So, here in the Fact Cave, we’ve had an algorithm for months that looked at his text. The trouble: it essentially biased the heck out of Android as an indicator. When the Android disappeared, the algo rolled over, showed its belly and promptly failed miserably. Back to the drawing board.

So we got to work. And iterated, tested, iterated and iterated some more.

Meanwhile, Andrew McGill at The Atlantic remembered the golden rule we forgot: Always. Be. Shipping. His excellent code can be found here.

We agreed with most of his approach, but now properly motivated, we blew the dust off perfection, hunkered down and reclassified all the tweets from March 8th forward.

The logic? We kept it simple:

  • Control. In 2016, removing retweets, the Android phone tweeted 1,357 times. Other devices tweeted 2,264 times. For our purposes, this was treated as gospel, where Android = Trump, not Android = staff
  • Words. Trump is very distinctive. We generated a deep count of commonly used words on both accounts. Simple word frequency
  • Hashtags. Trump almost never uses hashtags. His staff uses them frequently. Appearance of hashtags biases heavily to Staff
  • URLs and Photos. Outside of retweets, Trump used either a URL or photo a total of 10 times out of 1,357 tweets. Another heavy bias
  • Others. What we tested and ignored: sentiment (Trump tweets bias negative, but not consistently enough to be a factor), user mentions, use of capitalization, use of exclamation points. None of these were as clear as hashtags and URLs.
  • New platforms. Twitter Ads and Media Studio, both social platforms / products unlikely to be in Trump’s hands on his phone, are an automatic Staff.
  • Then we guessed…

… well, not really. We took those three factors (words, hashtags, and URLs/photos; for each we did a log odds ratio) and threw them into a test that automatically adjusted the weighting and scores for each factor, comparing the results against a random sample of the control (1,000 items) until it settled on the best outcome.
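For the curious, here is a minimal Python sketch of that idea. It is an illustration, not the code running in the Fact Cave: the function names, the 0.5 smoothing constant, the small grid of candidate weights, and the choice to count each word once per tweet are all assumptions. Positive scores lean Trump, negative scores lean Staff.

```python
import itertools
import math
import re

def log_odds_ratio(trump_hits, trump_total, staff_hits, staff_total, smoothing=0.5):
    """Smoothed log odds ratio of a yes/no feature; positive leans Trump."""
    p = (trump_hits + smoothing) / (trump_total + 2 * smoothing)
    q = (staff_hits + smoothing) / (staff_total + 2 * smoothing)
    return math.log(p / (1 - p)) - math.log(q / (1 - q))

def words(text):
    # Strip URLs and hashtags first so they are only counted by their own factors.
    cleaned = re.sub(r"https?://\S+|#\w+", " ", text.lower())
    return set(re.findall(r"[a-z']+", cleaned))

def has_hashtag(text):
    return re.search(r"#\w", text) is not None

def has_url(text):
    return "http" in text.lower()

def train(trump_tweets, staff_tweets):
    """Build one log odds ratio per word, plus one each for hashtags and URLs."""
    def ratio(test):
        return log_odds_ratio(sum(map(test, trump_tweets)), len(trump_tweets),
                              sum(map(test, staff_tweets)), len(staff_tweets))
    vocab = set().union(*(words(t) for t in trump_tweets + staff_tweets))
    return {
        "words": {w: ratio(lambda t, w=w: w in words(t)) for w in vocab},
        "hashtag": ratio(has_hashtag),
        "url": ratio(has_url),
    }

def factor_scores(model, text):
    """Return the (word, hashtag, url) scores for a single tweet."""
    word_score = sum(model["words"].get(w, 0.0) for w in words(text))
    return (word_score,
            model["hashtag"] if has_hashtag(text) else 0.0,
            model["url"] if has_url(text) else 0.0)

def classify(model, weights, text):
    total = sum(w * s for w, s in zip(weights, factor_scores(model, text)))
    return "Trump" if total > 0 else "Staff"

def tune_weights(model, labelled_sample, grid=(0.5, 1.0, 2.0)):
    """Try every weight combination; keep the one with the most correct calls."""
    def correct(weights):
        return sum(classify(model, weights, text) == label
                   for text, label in labelled_sample)
    return max(itertools.product(grid, repeat=3), key=correct)
```

Here `trump_tweets` and `staff_tweets` would be the two halves of the 2016 control set, and `labelled_sample` a random sample of (text, label) pairs drawn from it. An exhaustive grid is the bluntest possible way to "adjust the weighting"; anything smarter would slot into `tune_weights` without touching the rest.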

The best result we could get without relying on the device as a major indicator? The robot can correctly identify Trump tweets 91% of the time, and staff tweets 85% of the time.
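Note that those are two separate numbers, one per author, rather than a single overall accuracy. A rough sketch of how such per-class rates can be computed from a labelled sample (the function name and label strings are assumptions for illustration):

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Share of each true class that the classifier labelled correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}
```

Reporting the two rates separately matters because the control set is lopsided (1,357 Android tweets vs. 2,264 from other devices), so a single blended figure could hide a weak spot on either side.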

Perfect? No. Better than nothing? Yes.

We’re going to keep working on it and getting that number up. And we’ll be faster about it next time.

 
