Our robot’s audio analysis of Trump from his two minutes on camera with Putin: “Disappointed, feeling of missed opportunity. Cold and remote. Pompous, overexcited, empowered.”
Our robot’s audio analysis of Putin from his two minutes on camera with Trump: “Feels in control. Controlled speaking.”
Questions? I hope so. Explanation below. It’s worth it. If not, skip to the Infographic.
We’re constantly bringing new processes, techniques and tools online at FactSquared. We’ve been using machine learning for analyzing audio, video and text since we launched (all the way back in… January!)
But we’re also very agnostic about tools. We don’t have all the answers, and we watch this space closely for new developments. When something new comes along, we try it. If it adds value, we integrate it into our composite.
— Homer Simpson
The Simpsons, S05E11
We were going through a round of testing on a new approach right when Donald Trump met with Vladimir Putin in Hamburg, Germany on Friday, July 7, 2017.
In a peanut-butter-meets-chocolate moment, we said: “let’s try this out!”
In keeping with the past blog posts, you get some background and details. It’s like Neil deGrasse Tyson, but not as funny. Or smart. Or handsome. Or charismatic…
…back from the therapist. All better. Picking it back up…
A huge part of what we do, separate from pulling all this data together, is the analysis. Most of it is behind the scenes because it’s a lot of data. 115 datapoints per word. Or, in the average 10-minute speech (1,132 words, at current 30-day moving average of Trump’s speeches and remarks of 113.2 words per minute), 130,180 datapoints. You do not want all that on a page.
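If you want to check that math yourself, here is a quick sketch in Python. The function and constant names are made up for illustration and are not part of our pipeline; the numbers are the ones quoted above.

```python
# Figures quoted in the paragraph above.
DATAPOINTS_PER_WORD = 115
WORDS_PER_MINUTE = 113.2   # 30-day moving average cited above
SPEECH_MINUTES = 10


def datapoints_for_speech(minutes, words_per_minute, datapoints_per_word):
    """Return (word count, datapoint count) for a speech of the given length."""
    # Round to a whole number of words to avoid floating-point residue.
    words = round(minutes * words_per_minute)
    return words, words * datapoints_per_word


words, datapoints = datapoints_for_speech(
    SPEECH_MINUTES, WORDS_PER_MINUTE, DATAPOINTS_PER_WORD
)
print(f"{words:,} words x {DATAPOINTS_PER_WORD} = {datapoints:,} datapoints")
```

Change the minutes or the words-per-minute rate and the total scales linearly, which is why even a short set of remarks piles up data fast.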
It all feeds our search engine to make the results hyper-accurate, but the goal has always been a way to surface the information that doesn’t overwhelm. You’ll start seeing some of it in the next few days as we get charts and dashboards on the search, and in our daily newsletter (yes, it’s coming).
Part of all this is text analytics, of course. Using established approaches and methodologies, we analyze the words and groups of words to score how positive or negative a statement is, what emotions it conveys, the topics of conversation, and so on.
For example, when we analyze word usage to determine odd turns of phrase, or how “normal” a statement is in terms of language, we use the Corpus of Contemporary American English, a statistical compilation of 520 million words across books, newspapers, magazines and spoken words from 1990 – 2015 (it’s cool, but bring Dramamine). The raw data we generate is reproducible.
The same principle applies to audio analysis, which measures voice stress reliably and compares the frequency, tremors and other tics against things like the Toronto Emotional Speech Set or the Berlin Database of Emotional Speech, and quite a few others. From there, we tailor the models, building on top of the core data. Ditto for video. You get the point.
Taking all the above, for example, our current system generated a composite of the Trump / Putin discussion. It described Trump and Putin as very positive. This was of course challenging, as the text analysis of Putin’s comments was based on the translator’s words rather than his own. The text emotion reflected “Joy” and “Agreeable” for both.
But this provides an analysis of the words, not the person.
The audio analysis, which is important, told a different story. It characterized Trump as being moderately positive, but low energy. Putin was characterized at the midline: neither high nor low energy, neither positive nor negative. Put another way, the robot said Trump was upbeat in tone, happy, but lower energy. The same robot said Putin was a cipher. Neutral across the board.
Something Old, Something New
That brings us current. We’ve been meaning to test an expanded voice analysis tool. The company, BeyondVerbal, had built their analysis off of more than 60,000 samples, far larger than the others. Analysis tools such as these are the embodiment of “more is more.” The much larger sample set lets the analysis be much more finely sliced. So we took it out for a spin.
We used the below for Trump…
Because the tool specifically measures voice frequencies, the camera noise should not impact it. That being said, we tested it anyway after removing the camera noise…
…and found the analysis, within 1-2%, to be nearly identical. Feel free to test on your own to validate.
Our findings are below in the table. The data indicated a more restrained, less confident Trump, while Putin appeared to be in tight control of his voice and more confident of his position. The data, combined with several dozen other tests, also proved to be an improvement on our audio analysis. So we’ll be integrating it into our composite in the coming days.