When Occam’s Razor Cuts You

The past couple of weeks have been a fun on-again, off-again exercise in fixing a “legacy” problem, if that term can be applied to a four-month-old site: IDing when Trump himself uses @realdonaldtrump vs. his staff.

Trumpologists have known for a while that he used a Samsung Galaxy S3, aka an Android phone. His staff all used iPhones. Further, multiple analyses of the language style, time of day, etc. (nerd out here and here) validated the connection between the Android and tweets from the thumbs of Trump.

But things change. His Android phone was last seen the morning of March 8th:

Then… nothing. Android showed up briefly for two tweets on March 25th within four minutes of each other (I’m picturing a brief wrestling match with the Secret Service as he pulled the phone from its hiding spot in the limo on the way to Trump National in Potomac Falls, VA… the two tweets were during his ride over, 20 minutes before arrival… maybe while on the Toll Road?) and disappeared again.

Meanwhile, there were 139 other tweets, mostly from the iPhone. Including lots of FAKE NEWS references and other tweets that are universally agreed to have come from him.

So, here in the Fact Cave, we’ve had an algorithm for months that looked at his text. The trouble: it essentially biased the heck out of Android as an indicator. When the Android disappeared, the algo rolled over, showed its belly and promptly failed miserably. Back to the drawing board.

So we got to work. And iterated, tested, iterated and iterated some more.

Meanwhile, Andrew McGill at The Atlantic remembered the golden rule we forgot: Always. Be. Shipping. His excellent code can be found here.

We agreed with most of his approach, but now properly motivated, we blew the dust off perfection, hunkered down and reclassified all the tweets from March 8th forward.

The logic? We kept it simple:

  • Control. In 2016, removing retweets, the Android phone tweeted 1,357 times. Other devices tweeted 2,264 times. For our purposes, this was treated as gospel, where Android = Trump, not Android = staff
  • Words. Trump is very distinctive. We generated a deep count of commonly used words on both accounts. Simple word frequency
  • Hashtags. Trump almost never uses hashtags. His staff uses them frequently. Appearance of hashtags biases heavily to Staff
  • URLs and Photos. Outside of retweets, Trump used either a URL or photo a total of 10 times out of 1,357 tweets. Another heavy bias
  • Others. What we tested and ignored: sentiment (Trump tweets bias negative, but not consistently enough to be a factor), user mentions, use of capitalization, use of exclamation points. None of these were as clear as hashtags and URLs.
  • New platforms. Twitter Ads and Media Studio, both social platforms / products unlikely to be in Trump’s hands on his phone, are an automatic Staff.
  • Then we guessed…

… well, not really. We took those three factors (for each we did a log odds ratio) and threw them into a test that automatically adjusted the weighting and scores for each of the factors and compared the results against a random sample of the control (1,000 items) until it settled on the best outcome.
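For the curious, here is a rough sketch of what that kind of scoring can look like. It is not our production code. It assumes the three factors are words, hashtags, and URLs/photos, it uses add-one smoothing, it treats hashtags and links as flat penalties rather than their own log odds ratios, and it tunes weights with a plain grid search. Every function name, the weight grid and the zero decision threshold are made up for illustration.

```python
import itertools
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def log_odds(trump_tweets, staff_tweets, smoothing=1.0):
    """Return {word: log odds ratio}. Positive leans Trump, negative leans staff.

    Assumes the combined vocabulary has at least two distinct words.
    """
    trump_counts = Counter(w for t in trump_tweets for w in tokenize(t))
    staff_counts = Counter(w for t in staff_tweets for w in tokenize(t))
    vocab = set(trump_counts) | set(staff_counts)
    trump_total = sum(trump_counts.values()) + smoothing * len(vocab)
    staff_total = sum(staff_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_trump = (trump_counts[word] + smoothing) / trump_total
        p_staff = (staff_counts[word] + smoothing) / staff_total
        scores[word] = (math.log(p_trump / (1 - p_trump))
                        - math.log(p_staff / (1 - p_staff)))
    return scores


def score_tweet(text, word_scores, weights):
    """Combine the three factors into one score; above zero reads as Trump."""
    words = tokenize(text)
    word_score = sum(word_scores.get(w, 0.0) for w in words) / max(len(words), 1)
    has_hashtag = 1.0 if "#" in text else 0.0
    has_link = 1.0 if "http" in text.lower() else 0.0
    return (weights["words"] * word_score
            - weights["hashtag"] * has_hashtag
            - weights["link"] * has_link)


def tune_weights(labeled, word_scores, grid=(0.5, 1.0, 2.0, 4.0)):
    """Try every weight combination on (text, is_trump) pairs; keep the most accurate."""
    best_weights, best_accuracy = None, -1.0
    for w, h, l in itertools.product(grid, repeat=3):
        weights = {"words": w, "hashtag": h, "link": l}
        correct = sum(
            (score_tweet(text, word_scores, weights) > 0) == is_trump
            for text, is_trump in labeled
        )
        accuracy = correct / len(labeled)
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights, best_accuracy
```

In this sketch you would build `word_scores` from the 2016 control set, hold back a random sample of labeled tweets for `tune_weights`, and then run `score_tweet` on everything from March 8th forward.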

The best result we could get without relying on the device as a major indicator? The robot can correctly identify Trump tweets 91% of the time, and staff tweets 85% of the time.

Perfect? No. Better than nothing? Yes.

We’re going to keep working on it and getting that number up. And we’ll be faster about it next time.


Factba.se 2.0. Now with 0.2 More!

Okay, been focusing on quite a few things, but we just pushed out a fairly large update, and we’ve got some news as well. But first, the updates from today and the past two weeks:

  • Full Access to transcripts. We’ve been asked about this repeatedly. Now, you can browse through everything Trump has said that we have in the system in a handy timeline here: https://factba.se/transcripts. In addition, it surfaces some of the behind-the-scenes analytics, like emotion analysis, sentiment analysis, keywords, entities and more. Just click on an item to see a detailed breakout (for example: https://factba.se/transcript/donald-trump-remarks-greek-ceo-march-24-2017).
  • White House Schedule. A simple little doodad. It lists the President’s Schedule (public schedule), broken out as appointments. As analysis comes in, it is linked into the schedule. It’s also available in JSON, CSV, and of course iCal format, as well as in a public Google calendar. https://factba.se/topic/calendar
  • iOS App. This grew out of the consolidated White House feed we did, so everyone can monitor all the White House’s social feeds, website and email list to the press in one spot. We were asked for realtime alerts. Then we thought about an app. Then I said “Hey, how hard can it be to learn to code an iOS app?” Seven days later, with about four hours of sleep total and three cases of Diet Coke, the keyword-friendly-named “Trump White House Consolidated News Release Feed” app was born. A whopping $0.99, which, after a year of use, means we lose money on the push alert costs. But it needed to be done. http://apple.co/2nEVN7Y

Whew, that’s quite a bit for an update. One more piece of news…

Open Data Access. After a fair amount of discussion, we’ve decided to pursue freely distributing the entire Trump dataset via APIs. This will provide data access to:

  • Complete Transcript Library (3MM+ words) + Meta Data
  • The live Trump Twitter Archive
  • The complete screenshot library of his @realdonaldtrump feed
  • Financial records in data form and mapped to company holdings
  • H1B Filings
  • Court Records

We’ve already started doing that with our live feeds and calendars. Anyone who wants to data mine or come up with new ways of using the data will be free to do so.

We need to get the infrastructure in place, and that may take a couple of weeks, but we’ll have managed public APIs that let you get some, or all, of the data for public use, on the condition the work product is shared publicly as well.

The live White House feed is available freely now as:

The President’s Schedule is available similarly as:

That’s enough of an update for today. Onward.

Factba.se v1.8 – We’re almost to v2

We’ll get to the v2 release. But first, a couple of notes:

1. We’ve been a bit overwhelmed by the requests to assist — voluntarily — with the site and information collection. If we’ve been slow to follow up, see #3 below.

2. We pushed live an internal tool that we think would be useful. One of our unexpected challenges was the lack of a centralized place to gather new information. Updates can appear on Twitter, Facebook, YouTube, or the White House site. To that end, we centralized a realtime feed that pulls together:

  • Whitehouse.gov
  • Facebook (DonaldTrump, POTUS, Whitehouse)
  • Instagram (Whitehouse)
  • YouTube (WhiteHouse)
  • Twitter (realDonaldTrump, WhiteHouse, POTUS, VP, Mike_Pence, SeanSpicer)
  • The White House press distribution list (immediate release only)

…and puts it here: https://factba.se/topic/latest . This is the same feed we use to monitor and add all new public statements. The social feeds update realtime. The White House and email are every 60 seconds. You can also plug it in to RSS, hit the JSON directly, or follow it live on the robotic @FactbaseFeed

If you see a source missing, just let us know. As near as we can tell, it’s the only source that monitors everything coming out of the WHPO from all sources.

3. A bit of personal news. Since the election, Factba.se has become an increasing focus of my life in particular. To that end, I left my day job last week as Vice President / Entrepreneur-in-Residence at U.S. News and World Report to dedicate more time and focus on the platform and content. Based on the traffic, it’s getting regular, repeat use in newsrooms. We hope to become only more valuable as time goes on. This includes getting back to the folks in regard to what we need (mostly: tracking down video for documents).

And if you know a good project manager who could take some time to yell at me daily to stop obsessing on minor details, please send them my way… or just randomly call and yell at me. The 120 Jira tickets aren’t going down fast enough :-).




v1.7 Data Is Easy. Messy Data is Hard

“Hey, I wonder if the President ever filed for H-1Bs?” sayeth I this morning. How hard can that be to track down?

Tracking down? Not so bad.

Gathering records that were originally OCR’ed from fax (2001-2006)? A helpful Department of Labor uploading neat data… in Excel… 500,000 rows at a time?

So, a simple data mining exercise ended up being about six hours of glorious frustration dealing with 200+MB XLSX files to free the data into a database. Oh, and please change the field names every two years. And rearrange the columns while you’re at it.
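If you ever face the same shifting-headers problem, one way to tame it is to map every year's header row onto a single canonical schema before loading anything. The sketch below is illustrative only: the field aliases are hypothetical stand-ins rather than the Department of Labor's actual column names, and it works on rows already read out of the spreadsheets as plain lists.

```python
import re

# Hypothetical aliases: each canonical field lists the header spellings
# it might have appeared under in different years' files.
CANONICAL_FIELDS = {
    "employer_name": {"employer_name", "lca_case_employer_name", "name"},
    "job_title": {"job_title", "lca_case_job_title"},
    "wage_from": {"wage_rate_of_pay_from", "lca_case_wage_rate_from", "wage_rate_1"},
}


def normalize_header(raw):
    """Lowercase a header and squash punctuation and spaces into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")


def build_column_map(header_row):
    """Return {canonical field: column index} for one file's header row."""
    lookup = {
        alias: canonical
        for canonical, aliases in CANONICAL_FIELDS.items()
        for alias in aliases
    }
    column_map = {}
    for index, raw in enumerate(header_row):
        canonical = lookup.get(normalize_header(raw))
        # Keep the first column that matches; ignore unknown headers.
        if canonical is not None and canonical not in column_map:
            column_map[canonical] = index
    return column_map


def remap_row(row, column_map):
    """Pull the canonical fields out of a data row, whatever the column order."""
    return {field: row[index] for field, index in column_map.items()}
```

Build the column map once per file, then run every data row through `remap_row` so renamed fields and rearranged columns land in the same database columns.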

So, at 4:50 am ET, the data is free, in a searchable database… all 6.1MM records since the start of the millennium (h/t to FLC for archiving the older records pre-2008) and all 564 companies checked, including for rough typos, then hand checked. And the answer, it turns out, is 162 H-1Bs filed for 288 open positions.
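For anyone wondering what a “rough typos” check can look like, here is a minimal sketch using Python's standard `difflib`. It is an assumption about one workable approach, not a description of our pipeline. The company names are invented, and the 0.9 similarity cutoff is an arbitrary starting point. Anything it flags is only a candidate for the hand check.

```python
import difflib
import re


def normalize_name(name):
    """Uppercase, drop punctuation, and collapse whitespace in a company name."""
    cleaned = re.sub(r"[^A-Z0-9 ]+", "", name.upper())
    return " ".join(cleaned.split())


def match_company(filing_name, known_companies, cutoff=0.9):
    """Return known companies whose normalized name is close to the filing's."""
    normalized = {normalize_name(company): company for company in known_companies}
    hits = difflib.get_close_matches(
        normalize_name(filing_name), list(normalized), n=3, cutoff=cutoff
    )
    return [normalized[hit] for hit in hits]


if __name__ == "__main__":
    known = ["Example Holdings LLC", "Sample Winery LLC"]
    # A filing with a transposed-letter typo and different punctuation.
    print(match_company("EXAMPEL HOLDINGS, L.L.C.", known))
```

Lowering the cutoff catches sloppier typos at the cost of more false positives, which is why the manual pass still matters.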

Note we included Eric Trump’s winery, since it appears President Trump was involved in the organization. Also included are some hotels that license President Trump’s name, as standard licenses from the Trump Organization involve upholding standards set by the organization.

Expect a flurry of releases in the coming days. The last week was spent on the back end refining the composite model for better transcription. Also, we’re archiving and internalizing video as a backup, as videos of Trump campaign speeches have been disappearing from YouTube.

Also, as a side note, we’d love more information on the property at 2265 Aragon St, Sebring, Florida. It’s a 0.25 acre parcel of undeveloped swampland, owned by President Trump since July, 2005. It doesn’t appear on either of his Form 278e financial disclosure forms, but taxes are current as of November, 2016.

Within the next seven days, we should be back on track with speeches and statements automatically appearing on the site. We wanted to make sure that whatever we did, we wouldn’t run into video links dying on us again.

Until then, the salary ranges and positions for the H-1Bs make for an interesting read.

v1.6 – This is Vacation?

Yeesh, okay. Got it in just under the wire before inauguration. Quick note but the basics:

  • Added about 50 more hours of interviews, including interviews from the past two weeks, press conferences, 18 new stump speeches, and three paid speaking transcripts, including Australia.
  • Up to date as of the Lincoln Memorial concerts on 19 January
  • Margaret is better at transcribing. You’ll see this in a lot of the new uploads.
  • We think we have the feed set for the press office transcripts and videos… we’ll know in nine hours
  • Smarter search – searches for things like “deleted tweets” now surface the right topic pages.

Thanks for your patience. Back from vacation Sunday, and full steam on 1.7.

Where’s Version 1.6? Huh?

As always, happening a bit slower than expected. The curveball here: the two founders are on vacation this week. To answer questions to that effect, this was planned and booked before either political convention in 2016. Poor timing.

Also, being in Washington, we take a parochial view of Inauguration regardless of who it is: it majorly messes up traffic. So this was a logical week for vacation months before Factba.se was conceived.

With all that said, the work looks something like this photo, taken this morning (81°F / 22°C, light breeze from the East, partly cloudy, thanks for asking). However, our next update will drop before vacation. Margaret has an upgrade that will speed up video straight to the site. And we’re on standby with the White House Press Office to get feeds of speeches and videos the moment they are available. Plus, some new search refinements, topic pages, and some other pieces.

Onward, and thanks for your patience! And thanks to David Mack @ Buzzfeed for the shout out last night (and thanks as well to Moskovsky Komsomolets).



Version 1.5 Released – Financials, Deleted Tweets, More


Well, that’s 14 hours well spent!

You’ll now find:

Coming shortly:

  • A vastly streamlined system to get his speeches into the system in near realtime
  • A large tranche of video (more than 200 additional …)
  • The fabled Stern interviews. Promise.

We are not slowing down!

Questions? Suggestions? Issues (given these new topic pages are under 3 hours old, it’s possible)? Let us know!

On to v1.6. After a nap.


We’re Not Obsessing…

It’s completely normal to play with line-height and icon shading

for two hours… right?

Skipping to v1.5. A lot of new transcripts including today’s press conference. And an interactive explorer of all his assets: companies, stocks, including a live total of Net Worth, as reported in FEC filings.

No sleep ’til v1.5