Trump / Putin Meeting: Who’s Frustrated? Who’s in Control? Ask the Data

Short version

Our robot’s audio analysis of Trump from his two minutes on camera with Putin: “Disappointed, feeling of missed opportunity. Cold and remote. Pompous, overexcited, empowered.”

Our robot’s audio analysis of Putin from his two minutes on camera with Trump: “Feels in control. Controlled speaking.”

Questions? I hope so. Explanation below. It’s worth it. If not, skip to the infographic.

Long Version

We’re constantly bringing new processes, techniques and tools online at FactSquared. We’ve been using machine learning for analyzing audio, video and text since we launched (all the way back in… January!)

But we’re also very agnostic about tools. We don’t have all the answers, and we watch this space closely for new developments. When something new comes along, we try it. If it adds value, we integrate it into our composite.

“Oh people can come up with statistics to prove anything Kent. Forty percent of all people know that.”
— Homer Simpson
The Simpsons, S05E11

We were going through a round of testing on a new approach right when Donald Trump met with Vladimir Putin in Hamburg, Germany on Friday, July 7, 2017.

In a peanut-butter-meets-chocolate moment, we said: “let’s try this out!”

In keeping with the past blog posts, you get some background and details. It’s like Neil deGrasse Tyson, but not as funny. Or smart. Or handsome. Or charismatic…

…back from the therapist. All better. Picking it back up…

A huge part of what we do, separate from pulling all this data together, is the analysis. Most of it is behind the scenes because it’s a lot of data. 115 datapoints per word. Or, in the average 10-minute speech (1,132 words, at current 30-day moving average of Trump’s speeches and remarks of 113.2 words per minute), 130,180 datapoints. You do not want all that on a page.
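For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above; the variable names are made up for illustration and are not from our pipeline.

```python
# Figures quoted in the post; the names are illustrative only.
DATAPOINTS_PER_WORD = 115
WORDS_PER_MINUTE = 113.2   # 30-day moving average of speeches and remarks
SPEECH_MINUTES = 10

# Round to a whole word count before multiplying.
words = round(WORDS_PER_MINUTE * SPEECH_MINUTES)
datapoints = words * DATAPOINTS_PER_WORD

print(f"{words:,} words -> {datapoints:,} datapoints")
```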

It all feeds our search engine to make the results hyper-accurate, but the goal has always been a way to surface the information that doesn’t overwhelm. You’ll start seeing some of it in the next few days as we get charts and dashboards on the search, and in our daily newsletter (yes, it’s coming).

Text Analysis

Part of all this is text analytics of course. Using established approaches and methodologies, it analyzes the words and groups of words to score how positive or negative a statement is, what emotions it conveys, the topics of conversation, and so on.

For example, when we analyze word usage to determine odd turns of phrase, or how “normal” a statement is in terms of language, it utilizes the Corpus of Contemporary American English, a statistical compilation of 520 million words across books, newspapers, magazines and spoken words from 1990 – 2015 (it’s cool, but bring Dramamine). The raw data we generate is reproducible.

Audio Analysis

The same principle applies to audio analysis, which reliably measures voice stress and compares the frequency, tremors and other tics against things like the Toronto Emotional Speech Set or the Berlin Database of Emotional Speech, and quite a few others. From there, we tailor the models, building on top of the core data. Ditto for video. You get the point.

Taking all the above, for example, our current system generated a composite of the Trump / Putin discussion. It described Trump and Putin as very positive. This was of course challenging as the text analysis was off the translator for Putin’s comments. The text emotion reflected “Joy” and “Agreeable” for both.

But, this provides an analysis of the words, not the person.

The audio analysis, which is important, told a different story. It characterized Trump as being moderately positive, but low energy. Putin was characterized at the midline: neither high nor low energy, neither positive nor negative. Put another way, the robot said Trump was upbeat in tone, happy, but lower energy. The same robot said Putin was a cipher. Neutral across the board.

Something Old, Something New

That brings us current. We’ve been meaning to test an expanded voice analysis tool. The company, BeyondVerbal, had built their analysis off of more than 60,000 samples, far larger than the others. Analysis tools such as these are the embodiment of “more is more.” The much larger sample set lets the analysis be much more finely sliced. So we took it out for a spin.

We used the below for Trump…

…and Putin.

Because the tool specifically measures voice frequencies, the camera noise should not impact it. That being said, we tested it anyway after removing the camera noise…

…and found the analysis, within 1-2%, to be nearly identical. Feel free to test on your own to validate.

Our findings are below in the table. The data indicated a more restrained, less confident Trump, while Putin appeared to be in tight control of his voice and more confident of his position. The data, combined with several dozen other tests, also proved to be an improvement on our audio analysis. So we’ll be integrating it into our composite in the coming days.

Is Trump Going Senile? (Beta)

No, you didn’t catch us opinion-ating.

We’re getting ready to debut a new daily feature that will try to use data to validate assertions, or to uncover insights that would be impossible without the data collection at Factba.se.

It’s not ready to go just yet, and it’s not as pretty as we want it. But the data’s more important than the prettiness.

To that end, please see our first try at this, and let us know what you think. This was based on an article in May in Stat that asserted Trump was potentially going through cognitive decline. The term “senile” was latched on to by the press (and thus this infographic), but it was not used in the article, just the comments. Much of the data cited was due to speaking style and vocabulary.

Well, said we. We have a definitive record spanning 37 years. Let’s take a look.

We focused on two areas: the Flesch-Kincaid Reading Level, which basically scores word and sentence complexity, and rate of speech. It’s worth noting that while the average American speaks between 135 and 160 words per minute, the average New Yorker is close to 200 words per minute.
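For the curious, both measures are simple to compute. The sketch below uses the standard published Flesch-Kincaid grade-level formula. The syllable counter is a crude vowel-group heuristic we are assuming for illustration (real tools use dictionaries or better rules), and none of the function names come from our system.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count runs of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Standard formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

def words_per_minute(word_count: int, seconds: float) -> float:
    """Rate of speech: words spoken divided by elapsed minutes."""
    return word_count / (seconds / 60.0)
```

Short words in short sentences score low (even below zero), while long words in long sentences score high, which is why scripted speeches and off-the-cuff answers can land at very different grade levels.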

We are not offering opinions as to why, but what we can say definitively:

Overall, his rate of speech has dropped consistently;
This has coincided with a decrease in his unscripted public statements (e.g. interviews) and an increase in his scripted public appearances (speeches, remarks);
There is a statistically significant difference in his rate of speech depending on the type of appearance. In interviews and debates, he speaks much faster. In remarks and speeches, much slower (almost half speed);
The complexity of words used in speeches is almost double the grade level of those used in testimony and debates, which are less likely to be scripted.

Statistically, the press conference is a bit of an outlier, though Jennifer (yes, Jennifer) pointed out that he often begins these with a prepared statement, which is included in the vocabulary and the rate of speech, which may skew the results.

That said, let us know what you think.

[Correction 7/1/17: Note the original infographic incorrectly compressed the X axis in terms of years. This does not change or alter the data, but does affect the two timeline charts in the appearance of the data. It has been updated.]

Sigh… Thanks for the Weekend

Life’s little pleasures on a Father’s Day weekend:

Cleaning part of the house… just… right.
Watching a Pixar flick with your kids.
Dunkin’ Donuts without any guilt
Getting a 98-page PDF, in tabular format, dropped on your lap at 5:30 pm on a Friday with a year’s worth of financial data. (PDF Here)

Well, I guess we signed up for this.

We worked through this at the beginning of the year. We luckily had three things going for us:

Semi-consistent numbering on the OGE 278e financial forms
Two previous years of clean data against which to compare the new one
A few handy PDF extraction tools that, while far from perfect, are pretty good at pulling the data out in a non-crappy format.

So, that said, still about 10 hours. But the bright side of being hands on is… you learn a lot. For example:

Ownership. Basically, anything previously with an owner of “Donald J. Trump” is now shifted to one of the following:
- DJT Holdings LLC
- DJT Holdings Managing Member LLC
- DTTM Operations LLC
- DTTM Operations Managing Member LLC
- … or the Donald J Trump Revocable Trust
It’s worth noting that the four LLCs mentioned above are all owned by the Donald J Trump Revocable Trust
As part of moving around assets, a checking and savings account in excess of $50,000,000 was opened at Capital One on April 12, 2017 for the Donald J. Trump Revocable Trust
A bookkeeping thing. The companies listed in his resignation letter from January 19 and the list of resignations in the OGE 278e don’t match or line up neatly. Someone should poke around just to make sure I’s dotted, t’s crossed.

Everything is integrated into our Assets page (https://factba.se/topic/assets).

In addition, we put everything in two Spreadsheets, because nobody should deal with PDFs. We feel strongly about that (with apologies to Adobe).

The OGE 278e Financial Disclosure from June 14, 2017 is completely converted to a spreadsheet here: https://goo.gl/4jL9Bo
It’s embedded below, but save yourself the headache and go straight to the sheet.
We put his income, liabilities and portfolio side by side for 2015, 2016, and 2017 in a spreadsheet here: https://goo.gl/MdbrhC

Everything above is licensed under Creative Commons Attribution 3.0. Put it to good use and crunch away.

https://docs.google.com/spreadsheets/d/1ESZwVWN2yUjkeGe__u0AK3TwSOc2KojhFH9CAFoSzZo/pubhtml?widget=true&headers=false

Feed Me (Transcripts), Seymour…

If there’s one thing about statistical models that’s generally true: they need to be fed.

For about six months now, I’ve been living most waking moments in the words of Donald Trump. I love algorithms, but I check them. And check them again. And again. It’s not even borderline compulsive. We blew past borderline around January. It is compulsive.

Probably the single biggest challenge I face in shaping the models: access to raw materials. We check, and check again, every word. Yes, he was on Oprah in 1988, but we need more than 3:11… we need the whole show for context.

We are constantly updating our backlog of material, with volunteers generously sending in links (I’m looking at you CJ in particular), text, videos that in turn need to be checked and then fed into Margaret, our pseudo-AI that ravenously consumes every word spoken, analyzing the audio, video and text to build her model. This in turn analyzes tweets, transcribes better, and does lots of other cool things.

The single best source of this information is interviews. As opposed to speeches, they are generally unscripted. As opposed to tweets, you get more than 19.6 words at a time (1 year moving average, 3,203 tweets, 62,871 words). Sometime, I’ll have enough time to do a separate post explaining how differently the models view speeches vs. interviews… it’s almost two different people in the output.

However, as Chris Cillizza at CNN pointed out in a recent tweet, these are often unshared, even after the news cycle. Some organizations publish transcripts simultaneously. Most publish just excerpts, noting they’ve been edited. Some share audio and video, but with cuts and jumps. Others… nothing.

RESOLVED: All news outlets that do Trump interviews need to release the full transcript of the interview. It's 2017, man!

— Chris Cillizza (@CillizzaCNN) May 11, 2017

I’m not naming names, but given that the messages coming from The White House can at times appear to contradict each other, this raw material is crucial, both for the historical record, and for building a base of research that others can analyze.

Also, the full, unedited interview can remove potential questions as to whether comments are in context. Personally, I think that is in nearly every case a ridiculous argument, but the argument can’t be made if there are no edits.

So I’m making both a public plea, and an offer: please, in the name of all that is good in the world, once you’ve run your stories and pieces, please publish and share the raw materials. Pull any off-the-record comments, but otherwise, share the raw audio, video and text.

Since everyone has a few things to do nowadays, here’s what we’ll offer for any interview with the President, if time or resources constrain a full transcript or sharing raw video and/or audio.

Factba.se will happily, and freely, transcribe in full any video or audio provided, both via Margaret, and with a human editor to verify.
Factba.se will provide, via a spreadsheet or any other medium, ALL metadata developed. This is the stuff that is behind the scenes (not for long) on our site. If audio, you’ll get back second-by-second audio analysis of voice stress and emotion, which is keyed to Trump (the sotto voce whisper). If video, it will include facial expressions, smile/frown, gestures and other analysis (clothing identification, colors, the two-handed punctuation I myself have as a third-generation bridge-and-tunnel child, etc). It is even learning to pick up when he flushes (complexion change). It will be a lot. But it will be everything.
We will provide the full keyword and entity extraction, by three-sentence pair, section and overall, both for the entire interview, and specifically on just when Trump is speaking.
We will provide the full range of analysis. Grade-level models, sentiment, emotion… all of it.
We will respect any and all embargoes given. We are not meant to be a news organization. If you’d like us to hold until a day, two days, three days after the stories run before integrating and sharing the information, fine. You’re the boss. It’s your interview. You get it back first and control the story.
If a human is in the mix editing, figure two hours per hour of video/audio for transcript. If you don’t mind raw from Margaret (she’s close to 95% dead on now), 90 seconds per hour. We just need a little notice to plan our day to be ready for it if you want a quick turnaround.
We will, of course, link out to your pieces from the text.
If there are any other requests… fine. Our interest is the record, and sharing the resulting analysis.

We’re not looking to create a hippie commune. We are looking, however, to unleash the data that is contained in your excellent work, in a way that does not conflict with your job.

Also, on the off chance Margaret becomes sentient again, you’ll be in her good graces.

— Bill Frischling

When Occam’s Razor Cuts You

The past couple of weeks have been a fun on-again, off-again exercise in fixing a “legacy” problem, if that term can be applied to a four-month-old site: IDing when Trump himself uses @realdonaldtrump, vs his staff.

Trumpologists have known for a while that he used a Samsung Galaxy S3, aka an Android phone. His staff all used iPhones. Further, multiple analyses of the language style, time of day, etc. (nerd out here and here) validated the connection between the Android and tweets from the thumbs of Trump.

But things change. His Android phone was last seen the morning of March 8th:

LinkedIn Workforce Report: January and February were the strongest consecutive months for hiring since August and September 2015

— Donald J. Trump (@realDonaldTrump) March 8, 2017

Then… nothing. Android showed up briefly for two tweets on March 25th within four minutes of each other (I’m picturing a brief wrestling match with the Secret Service as he pulled the phone from its hiding spot in the limo on the way to Trump National in Potomac Falls, VA… the two tweets were during his ride over, 20 minutes before arrival… maybe while on the Toll Road?) and disappeared again.

ObamaCare will explode and we will all get together and piece together a great healthcare plan for THE PEOPLE. Do not worry!

— Donald J. Trump (@realDonaldTrump) March 25, 2017

Watch @JudgeJeanine on @FoxNews tonight at 9:00 P.M.

— Donald J. Trump (@realDonaldTrump) March 25, 2017

Meanwhile, there were 139 other tweets, mostly from the iPhone. Including lots of FAKE NEWS references and other tweets that are universally agreed to have come from him.

So, here in the Fact Cave, we’ve had an algorithm for months that looked at his text. The trouble: it essentially biased the heck out of Android as an indicator. When the Android disappeared, the algo rolled over, showed its belly and promptly failed miserably. Back to the drawing board.

So we got to work. And iterated, tested, iterated and iterated some more.

Meanwhile, Andrew McGill at The Atlantic remembered the golden rule we forgot: Always. Be. Shipping. His excellent code can be found here.

We agreed with most of his approach, but now properly motivated, we blew the dust off perfection, hunkered down and reclassified all the tweets from March 8th forward.

The logic? We kept it simple:

Control. In 2016, removing retweets, the Android phone tweeted 1,357 times. Other devices tweeted 2,264 times. For our purposes, this was treated as gospel, where Android = Trump, not Android = staff
Words. Trump is very distinctive. We generated a deep count of commonly used words on both accounts. Simple word frequency
Hashtags. Trump almost never uses hashtags. His staff uses them frequently. Appearance of hashtags biases heavily to Staff
URLs and Photos. Outside of retweets, Trump used either a URL or photo a total of 10 times out of 1,357 tweets. Another heavy bias
Others. What we tested and ignored: sentiment (Trump tweets bias negative, but not consistently enough to be a factor), user mentions, use of capitalization, use of exclamation points. None of these were as clear as hashtags and URLs.
New platforms. Twitter Ads and Media Studio – both social platforms / products unlikely to be in Trump’s hands on his phone, are an automatic staff.

Then we guessed…

… well, not really. We took those three factors (for each we did a log odds ratio) and threw them into a test that automatically adjusted the weighting and scores for each of the factors and compared it against a random sample of the control (1,000 items) until it settled on the best outcome.
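To make “log odds ratio” a little less hand-wavy, here is a minimal sketch in Python. It is not our production code: the function names, the add-one smoothing and the plain unweighted sum are all assumptions made for illustration, and it skips the automatic weight tuning described above. Each feature (a word, or a flag for “has a hashtag” or “has a URL”) gets a score that is positive if it shows up more in the Android control set and negative if it shows up more everywhere else.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")

def features(tweet: str) -> set:
    """Binary features: lowercase words plus flags for hashtags and URLs."""
    feats = set()
    if HASHTAG_RE.search(tweet):
        feats.add("<HASHTAG>")
    if URL_RE.search(tweet):
        feats.add("<URL>")
    # Strip URLs and hashtags so their pieces are not counted as words.
    text = HASHTAG_RE.sub(" ", URL_RE.sub(" ", tweet)).lower()
    feats.update(re.findall(r"[a-z']+", text))
    return feats

def train(trump_tweets, staff_tweets, smoothing=1.0):
    """Return {feature: log odds ratio}; positive leans Trump, negative staff."""
    trump_counts = Counter(f for t in trump_tweets for f in features(t))
    staff_counts = Counter(f for t in staff_tweets for f in features(t))
    n_trump, n_staff = len(trump_tweets), len(staff_tweets)
    weights = {}
    for f in set(trump_counts) | set(staff_counts):
        # Smoothed share of tweets in each group that contain the feature.
        p_trump = (trump_counts[f] + smoothing) / (n_trump + 2 * smoothing)
        p_staff = (staff_counts[f] + smoothing) / (n_staff + 2 * smoothing)
        weights[f] = (math.log(p_trump / (1 - p_trump))
                      - math.log(p_staff / (1 - p_staff)))
    return weights

def classify(tweet: str, weights: dict, threshold: float = 0.0) -> str:
    """Sum the scores of the tweet's features (equal weighting assumed)."""
    score = sum(weights.get(f, 0.0) for f in features(tweet))
    return "trump" if score > threshold else "staff"
```

You would call train() with the 2016 control set split by device, then classify() each new tweet. A fuller version would weight the word, hashtag and URL groups separately and tune those weights against a held-out sample.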

The best result we could get without relying on the device as a major indicator? The robot can correctly identify Trump tweets 91% of the time, and staff tweets 85% of the time.

Perfect? No. Better than nothing? Yes.

We’re going to keep working on it and getting that number up. And we’ll be faster about it next time.

Factba.se 2.0. Now with 0.2 More!

Okay, been focusing on quite a few things, but we just pushed out a fairly large update, and we’ve got some news as well. But first, the updates from today and the past two weeks:

Full Access to transcripts. We’ve been asked about this repeatedly. Now, you can browse through everything Trump has said that we have in the system in a handy timeline here: https://factba.se/transcripts. In addition, it surfaces some of the behind-the-scenes analytics, like emotion analysis, sentiment analysis, keywords, entities and more. Just click on an item to see a detailed breakout. (for example: https://factba.se/transcript/donald-trump-remarks-greek-ceo-march-24-2017).
White House Schedule. A simple little doodad. It lists the President’s Schedule (public schedule), broken out as appointments. As analysis comes in, it is linked into the schedule. It’s also available in JSON, CSV, and of course iCal format, as well as in a public Google calendar. https://factba.se/topic/calendar
iOS App. This grew out of the consolidated White House feed we did, so everyone can monitor all the White House’s social feeds, website and email list to the press in one spot. We were asked for realtime alerts. Then we thought about an app. Then I said “Hey, how hard can it be to learn to code an iOS app?” Seven days later, with about four hours of sleep total and three cases of Diet Coke, the keyword-friendly-named “Trump White House Consolidated News Release Feed” app was born. A whopping $0.99, which after using for a year, means we lose money on the push alert costs. But it needed to be done. http://apple.co/2nEVN7Y

Whew, that’s quite a bit for an update. One more piece of news…

Open Data Access. After a fair amount of discussion, we’ve decided to pursue freely distributing the entire Trump dataset via APIs. This will provide data access to:

Complete Transcript Library (3MM+ words) + Meta Data
The live Trump Twitter Archive
The complete screenshot library of his @realdonaldtrump feed
Financial records in data form and mapped to company holdings
H1B Filings
Court Records

We’ve already started doing that with our live feeds and calendars. Anyone who wants to data mine or come up with new ways of using the data will be free to do so.

We need to get the infrastructure in place, and that may take a couple of weeks, but we’ll have managed public APIs that let you get some, or all, of the data for public use, on the condition the work product is shared publicly as well.

The live White House feed is available freely now as:

JSON
RSS

The President’s Schedule is available similarly as:

That’s enough of an update for today. Onward.

Factba.se v1.8 – We’re almost to v2

We’ll get to the v2 release. But first, a couple of notes:

1. We’ve been a bit overwhelmed by the requests to assist — voluntarily — with the site and information collection. If we’ve been slow to follow up, see #3 below.

2. We pushed live an internal tool that we think would be useful. One of our unexpected challenges was the lack of a centralized place to gather new information. Updates can appear on Twitter, Facebook, YouTube, the White House site. To that end, we centralized a realtime feed that pulls together:

Whitehouse.gov
Facebook (DonaldTrump, POTUS, Whitehouse)
Instagram (Whitehouse)
YouTube (WhiteHouse)
Twitter (realDonaldTrump, WhiteHouse, POTUS, VP, Mike_Pence, SeanSpicer)
The White House press distribution list (immediate release only)

…and puts it here: https://factba.se/topic/latest . This is the same feed we use to monitor and add all new public statements. The social feeds update realtime. The White House and email are every 60 seconds. You can also plug it in to RSS, hit the JSON directly, or follow it live on the robotic @FactbaseFeed

If you see a source missing, just let us know. As near as we can tell, it’s the only source that monitors everything coming out of the WHPO from all sources.

3. A bit of personal news. Since the election, Factba.se has become an increasing focus of my life in particular. To that end, I left my day job last week as Vice President / Entrepreneur-in-Residence at U.S. News and World Report to dedicate more time and focus on the platform and content. Based on the traffic, it’s getting regular, repeat use in newsrooms. We hope to become only more valuable as time goes on. This includes getting back to the folks in regard to what we need (mostly: tracking down video for documents).

And if you know a good project manager who could take some time to yell at me daily to stop obsessing on minor details, please send them my way… or just randomly call and yell at me. The 120 Jira tickets aren’t going down fast enough :-).

Onward.

@BillFrisch

v1.7 Data Is Easy. Messy Data is Hard

“Hey, I wonder if the President ever filed for H-1Bs?” sayeth I this morning. How hard can that be to track down?

Tracking down? Not so bad.

Gathering records that were originally OCR’ed from fax? (2001-2006)? A helpful Department of Labor uploading neat data… in Excel… 500,000 rows at a time?

So, a simple data mining exercise ended up being about six hours of glorious frustration dealing with 200+MB XLSX to free the data into a database. Oh, and please change the field names every two years. And rearrange the columns while you’re at it.

So, at 4:50 am ET, the data is free, in a searchable database… all 6.1MM records since the start of the millennium (h/t to FLC for archiving the older records pre-2008) and all 564 companies checked, including for rough typos, then hand checked. And the answer, it turns out, is 162 H-1Bs filed for 288 open positions.
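If you ever face the same shifting field names and rearranged columns, the usual trick is to map every known spelling of a header onto one canonical name before loading anything. A minimal sketch in Python follows. The alias names are hypothetical placeholders (the actual Department of Labor headers are not listed in this post), and reading the XLSX itself is left out.

```python
import re

# Hypothetical aliases: swap in the real header spellings from each file.
ALIASES = {
    "employer": {"employer_name", "lca_case_employer_name"},
    "job_title": {"job_title", "lca_case_job_title"},
    "positions": {"total_workers", "nbr_immigrants"},
}

def canonical(header: str) -> str:
    """Lowercase a header and collapse punctuation and spaces to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")

def column_index(header_row):
    """Map each canonical field to its column position in this file."""
    index = {}
    for position, header in enumerate(header_row):
        key = canonical(header)
        for field, names in ALIASES.items():
            if key in names:
                index[field] = position
    return index

def normalize(header_row, data_rows):
    """Return rows as dicts keyed by canonical field, whatever the column order."""
    index = column_index(header_row)
    missing = set(ALIASES) - set(index)
    if missing:
        raise KeyError(f"unmapped fields: {sorted(missing)}")
    return [{field: row[pos] for field, pos in index.items()}
            for row in data_rows]
```

Failing loudly on an unmapped field is deliberate: a renamed column should stop the load rather than silently produce blanks.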

Note we included Eric Trump’s winery, since it appears President Trump was involved in the organization. Also included are some hotels that license President Trump’s name, as standard licenses from the Trump Organization involve upholding standards set by the organization.

Expect a flurry of releases in the coming days. The last week was spent on the back end refining the composite model for better transcription. Also, we’re archiving and internalizing video as a backup, as videos of Trump campaign speeches have been disappearing from Youtube.

Also, as a side note, we’d love more information on the property at 2265 Aragon St, Sebring, Florida. It’s a 0.25 acre parcel of undeveloped swampland, owned by President Trump since July, 2005. It doesn’t appear on either of his Form 278e financial disclosure forms, but taxes are current as of November, 2016.

Within the next seven days, we should be back on track with speeches and statements automatically appearing on the site. We wanted to make sure that whatever we did, we wouldn’t run into video links dying on us again.

Until then, the salary ranges and positions for the H-1Bs make for an interesting read.

v1.65 Inauguration Analysis

Lots of analysis about the inaugural address. How about a mathematical one (semantic, emotion, top keywords).

https://factba.se/topics/inauguration

Quick and dirty. Back to vacation. Visuals later.

v1.6 – This is Vacation?

Yeesh, okay. Got it in just under the wire before inauguration. Quick note but the basics:

Added about 50 more hours of interviews, including interviews from the past two weeks, press conferences, 18 new stump speeches, and three paid speaking transcripts, including Australia.
Up to date as of the Lincoln Memorial concerts on 19 January
Margaret is better at transcribing. You’ll see this in a lot of the new uploads.
We think we have the feed set for the press office transcripts and videos… we’ll know in nine hours
Smarter search – now searches for things like “deleted tweets” surface the right topic pages.

Thanks for your patience. Back from vacation Sunday, and full steam on 1.7.