Data, and nothing but the data. Want opinions on tech, open data, and data science methodology? Cool. College sports? Simple. An acknowledged bias against #umich. Political opinions? Go literally anywhere else for that. Not us.
Donald Trump’s grade level, on Day 986 of his Presidency, declined slightly from the first two analyses, dropping from a grade level of 4.6 to 4.5. In all three analyses, he consistently has the lowest score of all U.S. Presidents since Herbert Hoover, regardless of methodology or sample size.
So a little known fact: companies need to make money. It requires PowerPoints, more PowerPoints, presentations, sales calls, and all sorts of things that are not why data nerds started data nerding. It’s been busy in the Fact Cave. We track every word spoken by Donald Trump… but we’re also three weeks behind fully launching Factba.ses for all Presidential candidates… all 47,141 of them (47,141 is a rough estimate, plus or minus 47,129).
On the side, we transcribe and analyze every word spoken in the U.S. Congress: all the committees, floor speeches, the whole shebang, for our clients, whom we worship and thank daily (דַּיֵּנוּ).
Because that just seems like we’re slacking, we also do the same on 3,000+ earnings calls every quarter for a whole different set of clients (ever watch Billions…? Like that, except non-fictional firms. Unfortunately, also less Taylor Mason, the current reigning übernerd). So our AI Margaret’s been busy. Her meatspace human staff are busy too.
Of course, in the thick of all this business-ing… Donald Trump pulls out “Stable Genius.” In a press conference. In the middle of all this. And the requests start coming in again for us to re-run the analysis of his grade level.
So, okay, fine. We have a responsibility. But that doesn’t mean we can’t whine about it. Thought we’d make it easy for you?
Previously, on “Stable Genius”…
So Donald Trump first pulled out “stable genius” on January 6th, 2018 @ 7:30 am on Twitter…
….to President of the United States (on my first try). I think that would qualify as not smart, but genius….and a very stable genius at that!
It got a lot of press and interrupted a perfectly calm weekend in the Fact Cave, where we’d fallen asleep two hours earlier. It led to cursing (7:30 am on a Saturday?) and a thought experiment in trying to measure intelligence using the words in our database. We’re not going over that again. The blog post is linked here and goes through the replay, including the methodology. Have a question? It’s answered in there.
The long and the short is: you can’t measure IQ, but you can, as a crude proxy, measure vocabulary use with, among other methods, the Flesch-Kincaid Grade Level. We like this for a number of reasons: it’s public, it’s peer-reviewed, it’s been around since the 1970s and, perhaps most important, it was developed for the U.S. Government to measure the readability of written materials to ensure they could be understood. It’s the same test the government still uses to this day to make sure text can be readily understood. So we measure using the U.S. government’s own scoring mechanism.
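For the curious, here is a minimal sketch of the standard Flesch-Kincaid Grade Level formula in Python. The formula itself is the published one; the syllable counter is a rough vowel-group heuristic and the tokenizing regexes are our assumptions for illustration, not a description of Margaret's actual pipeline.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count runs of vowels as syllables."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" ("phrase"), but keep "-le" endings ("stable").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Published formula: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)  # straight apostrophes only
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

if __name__ == "__main__":
    print(round(flesch_kincaid_grade("I watch my words very, very closely."), 1))
```

Short sentences built from short words score low and long, polysyllabic sentences score high. That is all the test measures, which is why we call it a crude proxy.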
Because things like speeches are scripted, and may more accurately be a reflection of a speechwriter’s vocabulary, we narrow the sample to just interviews, press conferences, Q&As… settings where someone is least likely to be on-script. This gets much closer to a person’s true vocabulary.
Since data without a cohort comparison is without context, we compare Donald Trump to his cohort… other Presidents in the 20th and 21st centuries.
Donald Trump has used the phrase publicly seven times since that first tweet: in three other tweets (one deleted), in three press conferences and in one Q&A session after delivering remarks. But it got noticed again quite visibly when it was raised during a press conference with Sauli Niinistö, President of Finland (and man most likely to annoy our spelling bot), on October 2, 2019, where it was put in the context of “those that think” Trump is a stable genius.
There are those that think I’m a very stable genius. Okay? I watch my words very, very closely. — Donald Trump, 10/2/19
Given some of the D.C. hoopla, it kicked off the aforementioned round of “Hey Factba.se, can Margaret re-run the data?” [Disclosure: no one outside the Fact Cave anthropomorphizes our AI by calling her Margaret. We do, because, well, because we’re scared of her coming sentience so we’re very nice to her. We respect our AI overlords. There’s a new Terminator movie coming out after all. She’s gonna get ideas if Trump mentions Arnold Schwarzenegger ever again.]
Previously, we ran it on Day 354 of his Presidency (January 8, 2018) and Day 683 (December 3, 2018). In the first case, we chose the first 30,000 words of everyone’s Presidency, so each President had the same amount, though statistically, it didn’t make a difference. On Day 683 and in this analysis on Day 986 (October 3, 2019), we use all non-scripted remarks — press conferences, interviews, gaggles and so forth, from inauguration up through that day of their respective presidencies (see disclaimer on Gerald Ford, who only made it 895 days).
This yielded a range of 45,244 words for Richard Nixon, up through 859,711 words for George H. W. Bush (if you saw the man’s schedule, you’d know why there’s so many words). Trump had 847,334 words in the analysis.
Statistically, we didn’t expect much of a change, and we weren’t disappointed. Some numbers moved around, but by and large, not much movement. We were surprised, however, to see a slight shift downward in Donald Trump’s score. But the data is the data.
Full data tables and a pretty chart below.
Flesch-Kincaid Grade Level Scores
Compares scores run on Day 354 (January 8, 2018), Day 683 (December 3, 2018) and Day 986 (October 3, 2019) for each U.S. President in the cohort.
George W. Bush
George H. W. Bush
Lyndon B. Johnson
John F. Kennedy
Dwight D. Eisenhower
Harry S. Truman
Franklin D. Roosevelt
Different scoring mechanisms, counts and statistics for the Day 986 analysis on October 3, 2019. Notes with links:
Support of Whistleblowers. Donald Trump has signed an executive order to protect whistleblowers, signed legislation to protect whistleblowers and has praised the concept of whistleblowing six times during his campaign and administration.
15 Press Releases. Trump’s White House has issued 15 press releases citing his support of whistleblowing as one of the achievements of his administration.
FBI Informant. He also offered to be an FBI informant in 1980, per a 1981 FBI memo.
Change in Tone. In the past month, he has referenced, tweeted or retweeted about whistleblowing 45 times as of 10/3/19 @ 2 pm. The 45 references were not supportive.
Just a quick blog post… more of a compendium. Donald Trump has spoken about whistleblowers in the past… and about turning people in to the authorities for wrongdoing. It’s recently that his expressed opinions of whistleblowers have changed.
And we’re going to protect those who protect. And we’re going to protect the people that are protecting us. — Donald Trump, 4/27/17
In addition, on October 26, 2017, he signed S.585, the “Dr. Chris Kirkpatrick Whistleblower Protection Act of 2017.”
He brought up whistleblowing in a speech at the United Nations on September 18, 2017:
We seek a United Nations that regains the trust of the people around the world. In order to achieve this, the United Nations must hold every level of management accountable, protect whistle-blowers and focus on results rather than on process. —Donald Trump, 9/18/17
The concept of whistleblowing was important enough to be cited 15 different times in White House press releases from July 27, 2017 – February 5, 2019, including the 2018 and 2019 White House list of presidential achievements released around the 2018 and 2019 State of the Union addresses. The 15 releases are listed below.
The concept, not the phrase, came up at least three times during his campaign, once in an interview on Morning Joe and twice in speeches, as well as in an early speech from his administration on February 8, 2017. They were in reference to “turning in” potential terrorists or gang members.
The federal government can never be that precise. But you’re in the neighborhoods — you know the bad ones, you know the good ones. I want you to turn in the bad ones. Call Secretary Kelly’s representatives and we’ll get them out of our country and bring them back where they came from, and we’ll do it fast. — Donald Trump, 2/8/17
Though not a quote, there is an FBI memo from 1981 where Trump offered to be an FBI informant. Trump was looking to build in Atlantic City and offered, according to a September 22, 1981 FBI memo, to serve as an FBI informant and station FBI agents in his casino. The full PDF of the memo is available online.
You know the drill. Headline first. Then a march through the methodology. If you want single-serving factoids, go to, well, pretty much anywhere else on the Internet.
Donald Trump arrives in the office almost an hour later than any President since 1933. His average arrival time is 10:42 a.m. for his Presidency, and 11:18 a.m. for 2018. He spends the least amount of time in the office as well, though under 8 hours of office time is a trait shared with 5 of the past 13 presidents.
This is based on an analysis of daily presidential calendars from March 4, 1933 through December 31, 2018, and covered 83,229 pages of records.
So to answer the several dozen questions we’ve received on this topic: you’re not imagining it. But, like all things in life, it’s not so cut and dry.
Hang onto your hopes my friend.
So this question has been asked a few times. But who on earth has time to rummage through a dozen presidential libraries? Apparently, we do.
At the Fact Cave, one of our more popular features is Donald Trump’s daily schedule. We’ve beefed it up of late. You can find monthly stats embedded, miles traveled, record counts, time on camera. Look for the plus sign in the Month bar.
But no good deed goes unpunished. We’ve been asked whether it appeared that Donald Trump was coming to the office later in the day, particularly after his busy election season. We did a quick check. But it’s the holiday season in D.C., which, like August, tends to be a “what, me work?” kind of month, even for Presidents, and certainly in D.C., whether or not it’s shut down.
But the question started taking a life of its own on December 13th…
Trump is not in the West Wing yet this morning, NBC's Hans Nichols just reported on @MSNBC. The Marine guard is not posted out there.
The volume of questions picked up. So we started digging in and, lo and behold, it looked like there were some interesting patterns in Trump’s schedule. But without a cohort, there’s no way to tell if it was a normal pattern. Which leads to our favorite word: cohort.
It’s bound to be a better ride.
So, much like how Factba.se started, we sometimes jump first, then find out if there’s water in the pool. So, really, how many pages could a Presidential schedule be?
83,229 pages since Roosevelt, in a dozen different presidential libraries. 67,173 of those pages are scans of typewritten logs. 13,905 of them are handwritten (in fairness to LBJ, some were typed… but not a lot).
Sure. We were so glad we decided to try this. Around the holidays. Shoot me.
Luckily, it’s good to be friends with an AI that has read about 1.2 million pages of government PDFs in her life. Especially when you helped program her. And pay her AWS bills. So really, other than some truly awful scans, we could pull the vast majority of the data.
…Drinking my vodka…
Seriously? No More Simon & Garfunkel Quotes
You want the blog post, or do you just want to complain about my wandering non sequiturs? Actually, I was thinking of The Bangles version from 1987 (though they played it live as early as 1983). Though nothing but love to Simon & Garfunkel.
On with it…
We were able to locate the schedule of nearly every President from Roosevelt forward, or enough portions to be statistically valid. Only the George H.W. Bush Library doesn’t have either the Daily Guidance or the Daily Diary available online. We had to take what we could get from the National Archives. The George W. Bush Library has the schedule for 338 of his 2,922 days… not ideal, but enough for a good statistical sample. We’re noticing a familial pattern here on data availability.
Because this was a lot, we were going for a statistically valid sample, not perfection (we made that mistake initially and burned two days… story for another time). So we unleashed Margaret. For 31,348 days since March 4, 1933, we had data on 26,564 of them. We ended up eliminating some days for vacation (no start or end appointments), others because the data was missing, and some if Margaret just said “hey, I’m not sure about this day.” We were working with half-century old documents in some cases.
In the end, we ended up with 20,905 days of schedules. That works out to 77.49% of the days with data, or 66.69% of the whole. I think my statistics professor would say that’s a valid sample.
Not All Schedules Are Created Equal
As with any historical data, you work with what you have. So it’s important to understand the two different types.
The “Daily Guidance” or “Daily Schedule”. This is what we have for Trump, Obama and Clinton, and it appears to be the level of depth for what we have for Truman, Eisenhower and Kennedy in that it’s very high level and doesn’t tend to go into breakfasts, dinners and 3-5 minute meetings. It’s the high-level items. For Clinton/Obama/Trump, this is the schedule published to the members of the media. This will sometimes include appointments that were canceled… not often, but it does happen. It also reflects what was scheduled… not the time things actually occurred.
And perhaps most important: it is intended for public consumption at the time it was issued.
The “Daily Diary” we have for nearly everyone else is the log kept by the Presidential Office of Appointments and Scheduling. It is an exact record of what actually happened. In addition, since it is meant for posterity, and can be redacted for security, it is a far more detailed account of the day and will include things that may not have been in the public schedule. Also, these usually aren’t released for a couple of decades, if at all.
The level of detail can also vary. Roosevelt’s daily schedule includes notes from his stenographer, the White House usher, and his personal appointment diary. Reagan’s includes his personal diary entries as well as the official log.
What does this mean? When looking at the office hours, some entries are more detailed, and thus cover more time. Truman’s diary doesn’t have a lot of records once he “left the office,” so to speak, where Roosevelt’s details who he drank and smoked with at midnight. The chart above groups the hours by who had what type of data available.
More details in the methodology at the bottom.
Okay, What Did You Find?
First, we were able to validate what our readers had emailed and tweeted about. Trump starts his day later in the office than any other President since 1933. That’s definitive, regardless of schedule type. For his presidency, his average first scheduled appointment is 10:42 a.m. This is 47 minutes later than the next closest average start time, which belongs to Harry Truman.
With 2018 wrapping up, his yearly average start time for 2018 is now 11:18 a.m. This is the latest average yearly start time since 1933, and 1 hour, 1 minute later than the runner-up, 1948 (Truman). No other president had a year where their first public appointment started after 11 a.m.
Of interest: for those presidents elected to their first term (FDR, Eisenhower, Kennedy, Nixon, Carter, Reagan, Clinton, George W. Bush… no data for either Bush 41 or Obama), most were consistent in arrival time between years, ranging from 19 minutes earlier between the first two years (FDR) to 32 minutes later (Carter… from 6:38 a.m. average time to 7:10 a.m.). Trump’s first public appointment moved 74 minutes, from 10:04 a.m. in 2017 to 11:18 a.m. in 2018. No one else had a shift that large.
Trump also has the shortest day, in terms of the time between his first and last appointments. His average scheduled time, as published, is 6 hours, 26 minutes. This is less than half the average time logged by George H. W. Bush (see below), Gerald Ford or Jimmy Carter, who had a more detailed schedule. But even within the sub-cohort, it was the least amount of scheduled time, on average, followed closely by Harry Truman.
George H. W. Studmuffin
Yes, Bush 41 stands out, but he comes with a large footnote, and not just for his super-sized scheduling.
The Bush 41 Library has not yet gotten around to publishing the entire Daily Diary to the Intertubes. It’s the only Presidential library that doesn’t have at least a partial, if not complete, set of schedules. The National Archives happens to have 41 (get it?) very complete schedules: two days in 1990 and 39 in 1991. It’s not enough to do a monthly/yearly, but it at least gives us an idea of his schedule. [Update 12/31 @ 6:55 pm ET. Thank you for the tip on the Web Archive! Now at 414 days for Bush.]
And the man, in those 41 days, he started between 5 and 6 am on 30 of those days. He’s got one all-nighter in the set of schedules. I’m fairly certain there was a whole section in there where he caught up with Clint Eastwood, got drunk on rye, then proceeded to beat the crap out of Ivan Drago while Eastwood glowered. And he was still up at 5 the next morning for breakfast and phone calls.
Suffice it to say, without any editorializing, day-um.
But also suffice it to say: we were comfortable including it in the overall average only, given it is 41 days. The schedule has a regularity to it missing from all the other schedules. He was a man of routines. But it is still based on 41 days from the National Archives out of his 1,461 days in office. So take the data with appropriate caution. [Update 12/31 @ 6:55 pm ET: now 414 records, revising average start to 7:50 am and average last item at 10:44 p.m., just shy of 15 hours on average per day.]
What About Twitter?
That’s an absolutely fair point. Trump is known to be awake at all hours tweeting on @realDonaldTrump. We’ve even included the distribution data so you can see for yourself.
But here’s the thing: is it actually Trump tweeting? We can work off lots of different types of data. In looking at first/last tweets every day for an indication of schedule, it literally spans 24 hours on @realDonaldTrump. Of 710 days checked, on a midnight->midnight basis, 260 of those days had their first tweet between midnight and 5 a.m. (36.6%). If we arbitrarily assume 5 a.m. is a starting point, then the last tweet of the day occurred between 12-5 a.m. 275 times or 38.7% of the time.
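The first/last tweet check above can be sketched roughly like this. It assumes you already have a list of tweet timestamps in local time; the function names and the 5 a.m. boundary parameter are ours for illustration, not the production code.

```python
from datetime import datetime, timedelta

def early_first_tweet_share(timestamps, cutoff_hour=5):
    """Midnight-to-midnight days: share whose first tweet came before cutoff_hour."""
    first_by_day = {}
    for ts in timestamps:
        day = ts.date()
        if day not in first_by_day or ts < first_by_day[day]:
            first_by_day[day] = ts
    early = sum(1 for ts in first_by_day.values() if ts.hour < cutoff_hour)
    return early / len(first_by_day)

def late_last_tweet_share(timestamps, day_start_hour=5):
    """Days that start at day_start_hour: share whose last tweet fell after midnight."""
    last_by_day = {}
    for ts in timestamps:
        # Shift the clock back so a 1:15 a.m. tweet belongs to the previous "day".
        day = (ts - timedelta(hours=day_start_hour)).date()
        if day not in last_by_day or ts > last_by_day[day]:
            last_by_day[day] = ts
    late = sum(1 for ts in last_by_day.values() if ts.hour < day_start_hour)
    return late / len(last_by_day)

if __name__ == "__main__":
    sample = [  # made-up timestamps, for illustration only
        datetime(2018, 1, 1, 3, 10), datetime(2018, 1, 1, 21, 0),
        datetime(2018, 1, 2, 7, 30), datetime(2018, 1, 3, 1, 15),
    ]
    print(early_first_tweet_share(sample), late_last_tweet_share(sample))
```

The percentages quoted above are the same kind of ratio: 260 of 710 days is 36.6%, and 275 of 710 is 38.7%.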
But was it him? Or Dan Scavino? Or someone from the social media team? Or the press office?
We have a bot that uses as reference data tweets from when he was the only one on his team using an Android to tweet, and everyone else used an iPhone. Using thousands of those tweets, our model is able to predict, with 91% accuracy, if he tweeted something, or if his staff tweeted something, based on the language, use of hashtags and images, and so forth (hint: he rarely uses hashtags or images, but his staff uses CAPITAL LETTERS in almost equal proportions when using the account).
It’s not bad, but not perfect. But using that as a reference, our bots say 54% of those midnight-5 a.m. tweets are staff. That’s way too high a number to assume it’s Trump thumbing the tweet.
Take a look at the data for yourself and see if you can make better heads or tails of it. But for the purposes of this exercise, until a more detailed calendar is available, we’re leaving Twitter out of this, because if we use @realDonaldTrump as a guide to the beginning or end of the day, we’d have to ask if Trump has ever slept. Ever.
So What Does This All Mean?
We can pretty safely say he’s not a morning person, at least as far as his official calendar goes. And we no more think Trump leaves the office at 5:18 p.m. with nary a thought about his job than we think three-hour gaps in the middle of George H.W. Bush’s schedule meant he was refining his knife-fighting skills.
What we can say, pretty definitively, is his official calendar starts later than any time since 1933 by a wide margin. His last meeting isn’t the earliest… that belongs to Eisenhower (3:47 p.m.). But what we basically did is answer the question: is he getting to the office later? The answer: yes.
Also, we now regret never watching Bush 41 arm wrestle Stallone.
So now, the data, methodology, and notes.
We have the data in a handy Google Sheet. It includes the monthly and yearly data for each President and their schedule, along with the original raw data. It also includes the Twitter data referenced above. A summary is below.
Time Zones. All times are adjusted to local time, based on where the President was at that particular time, so first and last meetings are based on where the president was on that day/time vs. Eastern time.
Missing / Incomplete Data.
Days with no public schedule or diary, either due to redaction or vacation where no data was available, were excluded from the calculations.
In cases where the Factba.se platform could not discern a time or date automatically with a high degree of accuracy, the day in question was excluded from the calculation. The percent coverage is noted below.
11:59 pm. For simplification, any day that ended after 11:59 pm was rounded to 11:59 pm in the evening. So if the last item on the schedule was 1:15 am, it was calculated at 11:59 pm for the purposes of the analysis.
Weekends. For Donald Trump, M-F calculations are used, but both M-F and full 7-day averages are available on the Trump 45 Tab. All other Presidents were analyzed using all days, including weekends and holidays.
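A minimal sketch of how the rules above could be applied to one day's first and last appointments. The helper names are hypothetical, and treating "last item earlier than first item" as a day that ran past midnight is our assumption for the example.

```python
from datetime import time

END_OF_DAY = 23 * 60 + 59  # the 11:59 pm cap, in minutes since midnight

def minutes_since_midnight(t: time) -> int:
    return t.hour * 60 + t.minute

def day_span_minutes(first: time, last: time) -> int:
    """Minutes between first and last appointment, applying the 11:59 pm rule."""
    start = minutes_since_midnight(first)
    end = minutes_since_midnight(last)
    if end < start:  # assumption: the last item ran past midnight
        end = END_OF_DAY
    return end - start

def average_clock(times) -> time:
    """Average a list of clock times, rounded to the nearest minute."""
    avg = round(sum(minutes_since_midnight(t) for t in times) / len(times))
    return time(avg // 60, avg % 60)

if __name__ == "__main__":
    # The 2017 -> 2018 shift in average first appointment quoted in the post.
    print(minutes_since_midnight(time(11, 18)) - minutes_since_midnight(time(10, 4)))
    # A day starting at 7:00 am whose last item was at 1:15 am is capped at 11:59 pm.
    print(day_span_minutes(time(7, 0), time(1, 15)))
```

Working in minutes since midnight keeps the averaging simple and makes the cap explicit. A 1:15 am last item counts the same as one at 11:59 pm.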
Presidency: 2,840 days; In Analysis: 2,003 days (70.53%)
Type: Daily Schedule (Public Meetings)
Note: The schedule was partial and filled in based on news reports by the Library. As a result, a number of days have records that just list an evening speech. These were filtered out as they were not part of the official record.
So what is a Federal Government shutdown? Well, for starters, it’s not a shutdown.
It’s the result of the Congressional Budget and Impoundment Control Act of 1974. Among other things, it required that Congress authorize funding for the federal government. This is something that, in the 44 years since the act passed, Congress has failed to do on time in 36.4% of the budget cycles, or 16 of the past 44 years. Counting continuing resolutions in the same cycle, it has happened 22 times.
This did not always mean a shutdown. The first six times this happened, it was business as usual. A series of opinions issued by then-Attorney General Benjamin Civiletti in 1980 and 1981 established that, if a funding gap exists, the agency or agencies impacted must shut down. This, in turn, was his interpretation of the 1884 Antideficiency Act.
The first time any agency was shut down as a result was during the 18-hour shutdown of the FTC on May 1, 1980.
Funding gaps result in shutdowns only when those agencies are without Congressionally-approved funding. Essential services, like the Department of Defense, are largely unaffected.
The longest shutdown: December 22, 2018 – January 25, 2019 at 9 pm ET, under Donald Trump. [Update: Added January 12, 2019]
The previous longest shutdown: December 16, 1995 – January 6, 1996 under Bill Clinton.
The shortest shutdown: February 9, 2018 for about nine hours, under Donald Trump.
The only two presidents who had an actual partial or full shutdown while the same party controlled the executive and legislative branches? Jimmy Carter and Donald Trump.
U.S. Federal Government Funding Gaps
Start | End | President
Dec 22 '18 12:00 AM | Jan 25 '19 9:00 PM | Donald Trump (R)
Feb 9 '18 12:00 AM | Feb 9 '18 9:00 AM | Donald Trump (R)
Jan 20 '18 12:00 AM | Jan 22 '18 11:05 PM | Donald Trump (R)
Oct 1 '13 12:00 AM | Oct 17 '13 12:30 AM | Barack Obama (D)
Dec 16 '95 12:00 AM | Jan 6 '96 12:05 AM | Bill Clinton (D)
Nov 13 '95 12:00 AM | Nov 19 '95 6:40 PM | Bill Clinton (D)
Oct 5 '90 12:00 AM | Oct 10 '90 1:30 AM | George H. W. Bush (R)
Dec 18 '87 12:00 AM | Dec 20 '87 8:00 PM | Ronald Reagan (R)
Oct 16 '86 12:00 AM | Oct 19 '86 1:30 AM | Ronald Reagan (R)
Oct 3 '84 12:00 AM | Oct 5 '84 5:00 PM | Ronald Reagan (R)
Sep 30 '84 12:00 AM | Oct 3 '84 12:00 PM | Ronald Reagan (R)
Nov 10 '83 12:00 AM | Nov 14 '83 12:00 AM | Ronald Reagan (R)
Dec 17 '82 12:00 AM | Dec 21 '82 12:10 AM | Ronald Reagan (R)
Sep 30 '82 12:00 AM | Oct 2 '82 10:00 PM | Ronald Reagan (R)
Nov 20 '81 12:00 AM | Nov 23 '81 6:38 PM | Ronald Reagan (R)
May 1 '80 12:00 AM | May 1 '80 6:00 PM | Jimmy Carter (D)
Sep 30 '79 12:00 AM | Oct 12 '79 8:00 PM | Jimmy Carter (D)
Sep 30 '78 12:00 AM | Oct 18 '78 12:00 AM | Jimmy Carter (D)
Nov 30 '77 12:00 AM | Dec 9 '77 12:00 AM | Jimmy Carter (D)
Oct 31 '77 12:00 AM | Nov 9 '77 12:00 AM | Jimmy Carter (D)
Sep 30 '77 12:00 AM | Oct 13 '77 12:00 AM | Jimmy Carter (D)
Sep 30 '76 12:00 AM | Oct 11 '76 11:59 PM | Gerald Ford (R)
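If you want to check the durations yourself, here is a small sketch that parses the start and end stamps exactly as they are written in the table and returns the gap in hours. Only a few rows are included, and the names `FMT`, `GAPS` and `gap_hours` are ours. It assumes an English locale for the month abbreviations.

```python
from datetime import datetime

FMT = "%b %d '%y %I:%M %p"  # e.g. "Dec 22 '18 12:00 AM"

GAPS = [  # (start, end, president), copied from the table above
    ("Dec 22 '18 12:00 AM", "Jan 25 '19 9:00 PM", "Donald Trump (R)"),
    ("Feb 9 '18 12:00 AM", "Feb 9 '18 9:00 AM", "Donald Trump (R)"),
    ("Dec 16 '95 12:00 AM", "Jan 6 '96 12:05 AM", "Bill Clinton (D)"),
    ("May 1 '80 12:00 AM", "May 1 '80 6:00 PM", "Jimmy Carter (D)"),
]

def gap_hours(start: str, end: str) -> float:
    """Length of a funding gap in hours, from the table's own timestamp format."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600

if __name__ == "__main__":
    for start, end, president in sorted(GAPS, key=lambda g: gap_hours(g[0], g[1])):
        print(f"{gap_hours(start, end):8.2f} hours  {president}  ({start} to {end})")
```

The May 1980 row comes out at 18 hours and the February 2018 row at nine, matching the figures quoted in the text.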
Editor’s Note: We initially listed the 1995 shutdown incorrectly as beginning on December 5th, 1995. It began December 16, 1995. We regret this error.
In keeping with tradition, here’s the headline. Then we’re going to explain it before you get to the good stuff. Deal with it.
The Trump Organization derives revenue from licensing the Trump name to place on buildings. Based on analyzing more than 33,000 transactions, the brand results in the properties selling for 18.2% less when compared to similar properties in the same ZIP code. The downward trend has been underway since 2014, prior to Donald Trump’s election.
That’s more of a mouthful than normal. And if you want to cheat, you can skip to the new Factba.se Trump Brand Index, which will track this constantly based on real estate transactions.
And it’s late. So let’s get explainin’…
It’s Been a While Since You Blogged. What Gives?
Building a business gets in the way of doing deep research. The whole “need to eat and send kids to college thing.” It’s Sunday at about 2 a.m. while I’m writing this. And get off my lawn.
This Has Nothing to Do with The Above.
Right, back on target.
We’ve read some articles that reference the perceived change in the value of the Trump brand. There was a piece in The Washington Post earlier this month discussing merchandising. The Wall Street Journal has been taking a look. Even Realtor.com wrote about it. The deepest consistent dive we’ve seen is from CityRealty, which has been trending condos compared to Trump-branded condos in the New York area. If you haven’t seen their Trump Report, it’s well thought out, and they clearly have a better production budget than we do.
But what we’ve seen, with CityRealty as the exception, are qualitative stories. That is, there hasn’t been a comprehensive look, across all Trump-branded properties, to see if he is, in fact, profiting from the Presidency.
This is important for two reasons:
Trump has not placed his assets in a blind trust. He still maintains control of his properties and his companies, even if he has elected not to exercise that control, vs. a blind trust, where he could not exercise that control even if he wanted to. This can lead to issues affecting diplomacy, among other challenges.
The Trump Organization has built a significant, but non-quantifiable, portion of its business in licensing the Trump name. Most new construction involving the Trump brand in the past decade is via licensing the Trump name and/or handling management.
Therefore, the value of the Trump brand isn’t something abstract. The brand is the product. The Trump Organization, which is currently ranked 40th on Crain’s list of the largest private companies in New York, generates revenue from licensing the brand in the hospitality and real estate verticals. The expectation is the brand brings a premium.
However, measuring that performance, and its relative value within a privately held company is difficult, even with the public financial disclosures he must file. These disclosures are not a P&L. They are ranges of values and are not required to indicate a profit or a loss. And there’s no public record of membership in Trump clubs, nor occupancy in Trump-branded hotels.
Great, So It’s Impossible…
Not quite. But pretty damn hard. But at least it was a starting point.
What we can measure is the resale value of units in Trump-branded buildings. This information is fragmented but public. We can also measure the performance of sales of condominiums in the same ZIP code, and further measure the performance of similar condos, that is, condos in the same price band, or quartile, as Trump-branded condos.
So we had a direction, but we didn’t quite have an idea of what we were in for.
First, we had to figure out what was in and what was out. Trump hotels and clubs were clearly out of the evaluation. Trump Bay Street in Jersey City is a rental complex and was similarly removed. The handful of villas at Trump National in California were too small a sample, as were transactions at Trump Park Residences in Westchester.
We made a decision to similarly exclude four buildings on Riverside Drive in Manhattan (120, 200, 220 and 240 respectively) that are currently involved in a lawsuit to remove the Trump brand. It would be difficult to gauge the impact of the Trump brand when it may or may not be removed.
That left 20 buildings branded Trump in New York (9), Florida (7), New Jersey (1), Illinois (1), Hawaii (1) and Connecticut (1) with a total of 5,483 condo units between them all. This range spans 9 different counties and 14 different ZIP codes. So we had our sample. The rest is cake, right?
An Open Data Rant
So, real estate transactions are public records. They should be easy to get at. And I should be sleeping. Enough fantasy.
Fat chance. Eight of the buildings were easy, thanks to the Open Data Connecticut site and, in particular, the completely and thoroughly awesome New York City Department of Finance, which has everything up to date in handy spreadsheets with easy historical data. And, small (but large) miracles: standardized data fields and columns across data back 15 years. If you see Dr. Jacques Jiha, give him a smooch and tell him he’s our hero.
But we’re taking a whole paragraph out to yell at Miami-Dade. They have this nice comparison tool. But they went out of their way to make it next to impossible to pull down any data. Literally anti-bot tech the likes of which Google would be proud of. Their Microsoft server pages were tinfoil-hat-level paranoid. And they limit the data to 2015.
From there, we had to fill in square footage data for about 50,000 units. We won’t bore you with that slog, though it was a combination of our awesome researchers, and teaching Margaret, our AI, how to slog through Google results to fish it out. (Hint: Google address + # + unit + city + sq ft. Pull first results page. Teach bot to find square footage and read addresses. Two solid matches on reserved list of real estate sites = victory. Repeat tens of thousands of times).
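The "two solid matches" rule in that hint can be sketched as follows. This is an illustration under loud assumptions: the trusted domains are placeholders, the regex is ours, and the search results are assumed to arrive already fetched as (domain, snippet) pairs. It is not Margaret's actual code.

```python
import re
from collections import Counter

# Placeholder allow-list; the real reserved list of sites is not published.
TRUSTED_SITES = {"listings.example", "realty.example", "homes.example"}

# Matches "1,250 sq ft", "850 sqft", "1250 sq. ft." or "1250 square feet".
SQFT_PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d{3,6})\s*(?:sq\.?\s*ft\.?|square\s+feet)",
    re.IGNORECASE,
)

def extract_sqft(snippet: str):
    """Return the first square-footage figure in a snippet, or None."""
    match = SQFT_PATTERN.search(snippet)
    return int(match.group(1).replace(",", "")) if match else None

def confirmed_sqft(results, needed=2):
    """Accept a figure only when `needed` trusted sites agree on it."""
    votes = Counter()
    for domain, snippet in results:
        if domain not in TRUSTED_SITES:
            continue
        sqft = extract_sqft(snippet)
        if sqft is not None:
            votes[sqft] += 1
    for sqft, count in votes.most_common(1):
        if count >= needed:
            return sqft
    return None
```

Requiring agreement between two independent listings is a cheap way to throw out typos and mismatched units. Anything that fails the check would go to a human researcher.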
The square footage is important. You can’t really compare sales of different sizes, high floor vs. low floor. But you can use a common metric used in the real estate industry: price per square foot. It smooths out these swings between $200,000 and $2,000,000 condos by breaking down the value into comparable figures.
Almost There. Promise.
We invested probably 200 person-hours in this. Bear with us a bit longer.
So we had the data. We had to filter out bad data, non arm’s-length transactions (zero or below market sales, usually to family members), foreclosures, and obvious outliers, like a 600,000 square foot apartment in Midtown, or miracle 1,000 sq ft apartments in Murray Hill selling for $150,000. I don’t think so.
That ended up with a bit more than 33,000 real estate transactions since 2010.
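To make the filtering and the price-per-square-foot step concrete, here is a minimal sketch. The `Sale` record and both cutoffs are hypothetical, picked only so the examples from the text fall on the expected side of the line. The thresholds actually used are not published here.

```python
from dataclasses import dataclass

@dataclass
class Sale:
    price: float              # recorded sale price in dollars
    sqft: float               # unit size in square feet
    foreclosure: bool = False

# Illustrative cutoffs only (assumptions, not the thresholds used in the analysis).
MIN_PPSF = 200.0
MAX_SQFT = 20_000.0

def price_per_sqft(sale: Sale) -> float:
    return sale.price / sale.sqft

def is_usable(sale: Sale) -> bool:
    """Drop foreclosures, zero-dollar transfers and obviously bad records."""
    if sale.foreclosure or sale.price <= 0 or sale.sqft <= 0:
        return False
    if sale.sqft > MAX_SQFT:  # e.g. the 600,000 square foot "apartment"
        return False
    return price_per_sqft(sale) >= MIN_PPSF  # e.g. 1,000 sq ft for $150,000

if __name__ == "__main__":
    small = Sale(price=200_000, sqft=400)
    large = Sale(price=2_000_000, sqft=4_000)
    # Very different sale prices, same price per square foot.
    print(price_per_sqft(small), price_per_sqft(large))
```

A $200,000 unit and a $2,000,000 unit can land on the same price per square foot, which is what makes the comparison across buildings workable.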
From there, we sliced four ways:
Data Sample. We compared sales of condos in the same ZIP code to the sale of Trump-branded condos. We also ran the same comparison by quartile.
To do this, we established which quartile each Trump building was in. In Jersey City and Honolulu, it landed in the 3rd quartile, meaning on average, sales of units fell in the 50-75% range. So these are close to the upper tier, but not in the upper tier based on sale price in that ZIP code.
Similarly, Trump Parc and Trump Parc East on Central Park and Trump Palace on East 69th Street landed in the 50-75% range. That may sound wrong, but remember Central Park South shares a ZIP with Billionaire’s Row, where Michael Dell paid $9,200 per square foot recently. Apparently Central Park South is a step below that.
Billionaires. They’re just like us. Except they’re not.
The remainder were all in the 4th quartile… at the top of the chart.
Time. We ran data by month for the full sample and by quartile. There was not enough data to accurately subslice geography by month. We sliced New York State and City by quarter. Every other geo we sliced by year.
Geography. See above.
Index. This is what we’ll use going forward. The index selects a start point (January 2010) where the difference in average square foot price between Trump-branded condos and non-Trump condos is established in a ratio. This sets the index at 100, similar to how Zillow, or the U.S. Government, indexes markets.
Then that ratio is tracked moving forward, establishing the difference from the start point. Numbers above 100 are positive. Below 100, negative.
We also cross-checked our percentile bands, means and medians against median sale prices and trends provided by Redfin, Streeteasy, and Zillow, and further checked our trendlines against Zillow’s ZHVI index for condos. The lines all checked out. Please, double check our math below, just in case (see downloads).
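The index described above can be sketched in a few lines. This is one plausible reading of it, assuming the ratio is Trump price per square foot divided by market price per square foot, rebased so the first period equals 100. The function name and the three periods of numbers are invented for illustration.

```python
def brand_index(trump_ppsf, market_ppsf):
    """Rebase the Trump/market price-per-square-foot ratio to 100 at the start."""
    base = trump_ppsf[0] / market_ppsf[0]
    return [100 * (t / m) / base for t, m in zip(trump_ppsf, market_ppsf)]


# Hypothetical average price per square foot for three periods.
trump = [1000.0, 1000.0, 900.0]
market = [1000.0, 1250.0, 1000.0]

index = brand_index(trump, market)
print([round(value, 1) for value in index])
```

Because the index tracks a ratio, a cooling market alone doesn't move it. In the second period the Trump price is flat, but the market rose, so the index falls below 100.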
Enough Already. Get To it.
Okay, okay. You got the headline, so let’s dive in.
First, high-end condo sales are cool across the country. That means, in general, expensive units are taking longer to sell, or selling for less.
The index, however, takes that into account. The question is: are Trump-branded units moving higher, lower, or with the market?
So far, Florida is the shining spot for the Trump brand. On a year-over-year basis, Trump buildings are holding their own against the market. The Trump brand was selling for a 19.8% discount in Florida in 2010. Now, they sell for a 2.1% premium as of 2018. Note in Florida, we don’t have data for the 1,446 units in Sunny Isles Beach (right near where the Rascal House used to be) until 2015, so that may skew the data. Trump Hollywood and Trump Plaza of the Palm Beaches are definitely trending lower than their respective ZIP codes and quartiles, where Sunny Isles Beach is not.
Writer’s note: Sunny Isles Beach will always be North Miami Beach to me (#winstontowers, shuffleboard, the smell of onions frying, רעדן אויף ייִדיש).
Other properties individually show an upward trend or are stable against the index. Trump Palace and Trump Parc East in New York City and Trump Plaza Residences have all increased. Trump Waikiki, on low volume, is up significantly. And Trump Plaza in New Rochelle and Trump World Tower appear stable.
In New York City, the Trump brand is trending below the market at a faster pace. In 2010, the Trump brand carried a 4.5% discount on comparable units. In the first quarter of 2018, it carries a 16.6% discount when compared to similar units. This is particularly acute in Trump Tower, where the brand appears to result in a 39.8% discount against its cohort in 2017. There have been no sales recorded in 2018 by New York City to measure against.
The Trump Brand
Overall, the Trump brand is trending down in the United States, as measured by 1,845 transactions in Trump-branded properties since 2010, compared against 31,618 transactions overall in the same ZIP codes. Rather than commanding a premium, paying to license the Trump name for real estate, as of Q1 2018, results in an 18.2% discount when compared to properties in its quartile.
If we compare against all transactions, including lower-end units not associated with the Trump brand, it has gone from commanding a 59.4% premium at the beginning of 2010 to a 17.2% premium today.
What is clear from the data is this trend pre-dates by more than a year Trump’s announcement to run for President on June 16, 2015. The trend has continued, and in New York City, in particular, accelerated, since his election. No matter if we sliced the data by quartile or with all ZIPs, or sliced by geography, the same downward trend appears at least a year prior to his candidacy.
It’s important to note what the above means. The downward trend in the Trump brand began well before his entry into presidential politics in the 2016 cycle. There is not an inflection point in his candidacy and rise to the Oval Office. We can’t say if his election helps, hurts, slowed the decline or increased it, other than indicators in New York. The brand was in decline before he began his run.
We wanted to pursue this further outside the United States, but that data is difficult to come by in Panama, the Philippines, India, and even Canada.
To answer the initial question of this thought exercise: the Trump brand is not being served by the Trump presidency in the United States, in the sense that it has not resulted in an uptick in the value of Trump-branded units. There is no sign, outside of Sunny Isles Beach properties, that his being president is increasing the value of the Trump name. The opposite, in fact, is proving to be true. But the opposite predates his run.
Though Trump won 83.6% of the counties in the 2016 election, not a single county with a Trump-branded condominium property voted for him. The closest was in Palm Beach County, which he lost 58-41.
It has nothing to do with anything, but was an interesting datapoint. His brand is marketed and sold, as far as condos, entirely in counties that did not vote for him.
One Last Point
As we always note, we could be wrong. We’re open to different analyses and opinions, but we expect data in the response. The world has enough opinions, so we try to stick to… facts. Facts that can be independently verified and proven.
“Data can be manipulated to tell any story you want,” said Eric Trump, the president’s son and an executive vice president of the organization. “The fact remains, our buildings sell for the highest prices per square foot of any properties in the world,” he said. “That is undeniable.”
So, we’d like to say the following:
One: Our data supports CityRealty’s analysis and expands that analysis out to properties in Westchester, Jersey City, Stamford, Chicago, Florida, and Hawaii. Other than previously noted movement in Florida, the trend is down and has been since 2014.
Three: So we’re all on the same page, we’re open-sourcing all the data used for this analysis, and encourage others (We’re looking at you, Miami-Dade!) to do the same. You’ll find download links on the Factba.se Trump Brand Index page, and at the bottom of this page.
That’s all the raw data, line-by-line, every transaction, including data we excluded for incomplete/incorrect data (in case someone wants to tackle that!). So, that’s 48,000+ line items of data, plus all the resulting output we’re using in this post and in our Brand Index.
As a side note, each line item includes a link to the source of the data. So please, don’t take our word for it. Trust, but verify.
Before making a claim, please share the data. Use ours, or provide your own to be verified.
That’s all we got folks. Nothing to see here.
Research for this analysis was done by Cheryl Ley, Cori Lovas, Maite Dizon and Paula Boyland. A thank you to Chad Smolinski and Matt Koll for their guidance on statistics. Note: only I am to blame for the ridiculous writing.
So, as always. First the headline, then you need to eat your vegetables to get the details.
By any metric to measure vocabulary, using more than a half dozen tests with different methodologies, Donald Trump has the most basic, most simplistically constructed, least diverse vocabulary of any President in the last 90 years. This is by a statistically significant margin in each case.
Okay, the headline’s out of the way. On to the vegetables, so you understand why we checked this, and the methodology.
(And with our apologies for the simplistic charts. The Google Sheets plug in is quick and dirty… but the data’s all there for you at the bottom)
Why Are You Blogging on a Sunday Night?
Well, the Golden Globes are on. Also…
I usually try to unplug over the weekend. And by unplug, I mean “catch up on everything I was supposed to do during the week but didn’t because who the hell can get work done during office hours.” You know, by relaxing and stuff.
So the emails that started coming in Saturday morning around 8 a.m. kind of interfered with that plan. I ignored them for all of 20 seconds before seeing what the heck was going on. In general, when something is going on, the emails tend to clump together. The phone wasn’t going to stop vibrating by force of will alone.
Turned out, it was a number of folks asking if I’d seen the “genius” tweet, and if Factba.se had ever run an intelligence test.
Now, when someone emails me at an ungodly hour (and prior to 11 am on a weekend more than qualifies, given my normal bedtime is defined as “Thursday”) to ask about a tweet, I put the darker thoughts out of my mind and do my best not to get upset.
But I was awake. May as well spoil it. The tweet in question (a three-parter, which is more unusual of late since the character limit was upped):
Now that Russian collusion, after one year of intense study, has proven to be a total hoax on the American public, the Democrats and their lapdogs, the Fake News Mainstream Media, are taking out the old Ronald Reagan playbook and screaming mental stability and intelligence…..
….Actually, throughout my life, my two greatest assets have been mental stability and being, like, really smart. Crooked Hillary Clinton also played these cards very hard and, as everyone knows, went down in flames. I went from VERY successful businessman, to top T.V. Star…..
…spanning 11 minutes. (Sorry about that last one… one of my favorite Road Runners).
The quote that seemed to stick out in everyone’s mind was the last one: “I think that would qualify as not smart, but genius….and a very stable genius at that!”
Okay, I was awake.
Apparently, the intellectual exercise would be to parse the phrase “genius” and whether it could be proven or disproven.
Into the Den of Snopes
Measuring intelligence is normally done through a simple method with no agreed upon standard: an IQ test, a loosely-defined standardized test, variations of which have been in use for more than a century. The most common one in modern use is the Wechsler Adult Intelligence Scale (WAIS) v4, in use since 2008.
However, there is no peer-reviewed method to look at writing / speeches / etc to assess intelligence. The closest is a 2006 study, which used a historiometric method.
Suffice it to say, that method is fine, but it takes a doctorate and an expert. We don’t have presidential scholars at Factba.se. We’re a bunch of data schmoos. Also, this particular study was ripped off and faked enough in the past 15 years that it has multiple Snopes pages (here, here, and here) and it rates its own Wikipedia page. Again, the study is fine. Making stuff up around it isn’t.
However, the ability to measure the complexity of vocabulary, the diversity and its comprehension level is something we do all the time here in the Fact Cave, courtesy of Margaret, our platform’s AI. In fact, it’s done every time we add a word into the platform, automagically. The most common metric, the Flesch-Kincaid Grade Level, was actually developed for the military in the 1970s as a way to check that training materials were appropriate and could be understood by its personnel. It is used as a measurement in legislation to ensure documents such as insurance policies can be understood.
There are a number of competing algorithms. They use different approaches, but all try to do one of two things:
Grade Level. Establish the grade level at which the text could be understood
Reading Ease. Essentially the same thing, but with a normalized statistical score vs. a U.S.-centric grade level.
At Factba.se, Margaret runs every single bit of text automatically through the following algorithms:
… and about a dozen others, including difficult word count, etc. We’re also testing the Lexile Framework.
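For reference, the two best-known formulas (Flesch-Kincaid Grade Level and Flesch Reading Ease) are simple enough to sketch. This is not Margaret’s code. It takes word, sentence and syllable counts as given, since counting syllables reliably is the hard part, and the example counts are made up.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
ease = flesch_reading_ease(100, 5, 150)
print(round(grade, 2), round(ease, 2))
```

Both formulas lean on the same two inputs: average sentence length and average syllables per word. Shorter sentences and shorter words push the grade level down and the reading ease up.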
As a side benefit, recreationally, we built a database of interviews, speeches and press conferences for previous presidents, leaning heavily on what’s available publicly from presidential libraries, and the wonderful collections at the University of California, Santa Barbara’s American Presidency Project. One of the reasons we did this is to provide a point of contrast. Looking at a single datapoint can tell you everything and nothing. A nice cohort comparison… that’s better.
Importantly, as we’ve blogged earlier, we like to focus on a person’s own words if possible, not speechwriters. The UCSB archive in particular gave us a rich trove of Presidential press conferences back to Herbert Hoover in 1929. So we could look at just what a president said. Unscripted (or as close an approximation as is possible for a president).
Okay. We had the algorithms. We had the text. On to…
As mentioned previously, we narrowed our samples from Hoover forward to just press conferences, presidential debates and interviews. Of course, within those, we only use words spoken by the President, nothing else.
This left us with a deep sample for each, but spread out. We ran the analysis two ways:
Complete. Whatever we have, we have. On the low end, it’s 44,705 words for Gerald Ford, up to 1,124,164 words for Bill Clinton. Trump clocked in second at 915,801 words.
Equal Sample. We then ran the same test on 30,000 words, plus or minus 1% (actual range was 30,003 – 30,253 words), where we looked only within the person’s presidency (no pre-election debates) and started from Inauguration Day forward, adding sentences until we hit 30,000, then stopped and analyzed those.
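The equal sample is easy to picture in code. The sketch below assumes the transcript is already split into sentences in chronological order and counts words by splitting on whitespace. Both are simplifications, and the function name is ours for this example only.

```python
def equal_sample(sentences, target_words=30_000):
    """Add whole sentences in order until the word target is reached."""
    sample, total = [], 0
    for sentence in sentences:
        if total >= target_words:
            break
        sample.append(sentence)
        total += len(sentence.split())
    return sample, total


# Tiny demonstration with a 4-word target instead of 30,000.
demo = ["one two three", "four five", "six seven eight nine", "ten"]
picked, count = equal_sample(demo, target_words=4)
print(picked, count)
```

Since only whole sentences are added, the sample usually overshoots the target by a few words, which is why the real ranges sit slightly above 30,000.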
In addition, we’ve been testing the Lexile framework. It’s a free test so we’re limited to 1,000 words. But we took the first 1,000 words (in full sentence format) from the equal sample and tested those.
It’s important to note: for the two presidents where social media existed, this was not included. This was strictly utilizing the responses given by a president in an interview, during a press conference, or in a political debate.
It statistically made no difference which way we analyzed it, or which method. It affected some scores and some of the ranks, but not the position of Donald Trump on that list. In each case, he ranked last of the past 15 presidents.
By every metric and methodology tested, Donald Trump’s vocabulary and grammatical structure is significantly more simple, and less diverse, than any President since Herbert Hoover, when measuring “off-script” words, that is, words far less likely to have been written in advance for the speaker.
Significant is not editorializing. The gap between Trump and the next closest president (in most indices, Harry Truman, known historically for a folksy, simple pattern of speech), is larger than any other gap using Flesch-Kincaid. Statistically speaking, there is a significant gap.
This gap appears both when using the complete corpus available to us for all presidents, and the more limited 30,000 word set to use an equal data set for each. In either data set, Donald Trump consistently clocks in at the bottom of the list. Depending on the scale used, it’s between a 3rd and 7th grade reading level.
Using the same one used by the Department of Defense, the grade level on the equal sample is 4.6. That’s between a fourth and fifth grade level.
The next closest is Truman at 5.9, followed by Bush 41 at 6.7. The top three: Herbert Hoover (11.3), Jimmy Carter (10.7) and Barack Obama (9.7).
In terms of word diversity and structure, Trump averages 1.33 syllables per word, while all others average 1.42 – 1.57 syllables. In terms of variety of vocabulary, in the 30,000-word sample, Trump was at the bottom, with 2,605 unique words in that sample while all others averaged 3,068 – 3,869. The exception: Bill Clinton, who clocked in at 2,752 unique words in our sample.
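A unique-word count like the one above can be approximated as follows. The tokenizer here (lowercase, letters and straight apostrophes only) is an assumption for the sketch. A production pipeline would likely normalize text more carefully.

```python
import re


def vocabulary_counts(text: str):
    """Return (unique_words, total_words) using a simple lowercase tokenizer."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)), len(words)


unique, total = vocabulary_counts("The wall. The big, big wall!")
print(unique, total)
```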
That’s a fair question. So what? Vocabulary is not a proxy for intelligence. In IQ Tests, vocabulary is a component, but only a component. However, it is used as a proxy for a number of things:
Doctors use it to measure symptoms of degenerative brain diseases (note: as blogged previously, we see no downward trend over 40 years in Trump’s vocabulary. For unscripted, it’s very consistent).
Psychologists use vocabulary as a measure of intellectual curiosity and a person’s reading ability.
But also, it should be pointed out:
Politicians strive to get a clear, concise message in front of the public. That includes keeping it short and simple.
Other than Donald Trump, all presidents in this cohort were either career politicians, or in the case of Eisenhower, a very public figure and military leader for decades before running for president (historians argue whether a general at Eisenhower’s level would already be considered a politician before running for office, due to the need to navigate very political waters at that level).
Back to so what? In answer to those who emailed the equivalent of “is the president a stable genius”, the answer is “we don’t know.” Short of IQ tests, there’s no way to know for sure.
But what we can say is, compared to the 14 presidents who preceded him, by every measure, his use of words when off script is significantly less diverse, and simpler, than all presidents who preceded him back to Herbert Hoover.
As always, feel free to dispute the analysis, but come prepared with data. We don’t need more opinions. But more analysis with supporting data is always welcome.
[Update: 9/27/17: Audio has been removed per DMCA notice from SiriusXM. Think it should be public? Feel free to let @SternShow and @SiriusXM know.]
[Update: 9/30/17: TrumpOnStern.com kindly pointed out we missed two Stern shows. Donald Trump appeared on November 9, 1995 for 22 minutes and January 20, 1994 for at least 8 minutes (the audio is not complete). The post below reflects the data without those two shows included.]
Be careful what you wish for. It could screw up your month.
So… the Howard Stern / Donald Trump interviews. It’s been a bit of an obsession of ours. But not for the reasons you might think.
There have been some articles written before the election about Howard Stern, primarily by Andrew Kaczynski and Nate McDermott at Buzzfeed and later at CNN, Virginia Heffernan at Politico, David Fahrenthold at The Washington Post and others, including Mother Jones and The Atlantic.
These all quoted excerpts from these interviews. By our count, we found about 20 minutes of audio total covering about a dozen interviews.
If you’ve listened to Howard Stern before, you know you can find something salacious without a great deal of effort, and the interviews with Donald Trump were no exception.
However, the stories (with the exception of Heffernan’s excellent piece) didn’t address what we thought were two key points.
Howard Stern is an excellent interviewer. Guests can spend two hours or longer speaking with Stern. His staff preps him well and they are impeccably researched, and move from making out with girls to port security in Dubai effortlessly. Howard Stern gets people to speak about things that, in any other context, they would never discuss.
Based on our research, no one has spent more time interviewing Donald Trump publicly than Howard Stern, in terms of the length of the interviews, the number, and the period of time they span.
We wanted that record for our database. It’s a gaping hole.
But therein lies the problem. Howard Stern has done, conservatively, more than 8,000 shows since the 1980s, and that number is probably low. Based on the normal length, that’s at least 30,000 hours of audio and likely a minimum of 50,000,000 words. And there’s no definitive record. If Stern has the list, it hasn’t been shared.
We’ve found snippets and pieces before. But, per our mission, we want to ensure that anything in our database is the full transcript, versus an excerpt. As such, we were interested in the full record of conversations between Donald Trump and Howard Stern from the 1990s forward. To make sure we had it all, we wanted the whole show to check.
Our research indicated he was on the show dozens of times, but not the details, exact dates, etc. We reached out to people who operate fan sites, particularly marksfriggin.com, and on the Internet, particularly via Reddit. Stern fans are known for collecting recordings of old shows, so we were hoping to find the full recordings.
We were insulted in ways both creative and thorough, but kept trying. In short, we struck out. By the spring, we had shifted our focus to building out the features on the site.
Out of the blue, early in the morning September 5th, about 3 1/2 months after we had moved on, we received an email with a Dropbox link from an anonymous Yahoo account. We looked and to our surprise, it was several dozen MP3s with the entire show, end-to-end, which allowed us to verify we were capturing the entire interview. We copied the MP3s and quickly emailed back to ask a couple of clarifying questions. We were not-so-politely told to leave them alone.
Between the files and extensive research on marksfriggin.com and other sites, we were able to verify 35 unique interviews, beginning May 8, 1993 on Howard Stern’s E! interview show, through August 25, 2015. There were other MP3s, but they contained Stern talking about Trump, or a time when Trump was supposed to dial in but couldn’t, or in one case, a re-run. We filtered those out.
So we got to work, transcribing, proofreading, cross-checking. This is harder than it sounds. Our transcription robots are good. But the show is fast paced (235 words per minute by our measure), filled with crosstalk, music and other sound effects in the background, noise. It mixes clean audio with phone audio. It’s the greatest hits list of “things that mess with algorithms.”
Combine that with a mixed bag of recording methods, and our robot was none too happy with us. So it involved a lot more manual work than we like.
The transcripts are complete but we’ll be working them towards perfection for some time. But they’re just about there, and married to the audio, and run through our usual battery of audio, text and voice analysis. (And please, when in doubt, listen to the audio).
But after investing more than two weeks, there’s just too much to do. We’ll keep tweaking in our spare time, but there’s only so many hours of the day before you start writing your blog posts at 3:45 am. Just sayin’.
Yeah Yeah Yeah. Whatcha Got?
Donald Trump’s time on Howard Stern totals 15 hours, 8 minutes and 52 seconds, with 104,357 words spoken by Donald Trump. This is 21% longer than his first book, “The Art of the Deal” (86,575 words). Hell, it’s almost half as long as the Frost / Nixon Interviews.
Based on our records, this is far more time than Trump has spent in interviews with any other journalist or media personality, including Morning Joe, Sean Hannity, Bill O’Reilly, Chris Matthews, Larry King, Don Imus… any of them. This is in terms of the number of interviews, the length, the time period.
Trump has spent far more time, over a far longer period of time, speaking in greater depth with Howard Stern than any other interviewer. No one has spent more time interviewing Donald Trump in a public setting than Howard Stern, and in particular spanning more than two decades. Having these interviews in our database provides a crucial perspective.
We stopped counting after more than 500 unique questions and answers. Yes, lots of questions about sex, positions, his views on women, and things you don’t find in any other interviews (AIDS, Chlamydia, group sex, groping in public… our robot keyworded a lot of new things… we chose not to teach our AI some things. It leads to scary things). But also, lots on North Korea, Iraq, infrastructure and taxation. The Port of Dubai security was a real question. And it was answered.
Some of the stories Trump told repeat themselves across multiple years. He discusses a great deal about his personal life. And most of the interviews had a specific hook: boxing matches Trump was promoting, new books, The Apprentice and, toward the end of the series, a great deal more about politics.
We also had to develop a custom taxonomy and classification. A good many of the questions and answers are, in Stern’s style, leading. For example, an oft-quoted excerpt from a 42-minute interview had the following segment:
Donald Trump: My daughter is beautiful, Ivanka. She…
Howard Stern: By the way, your daughter.
Donald Trump: She’s beautiful.
Howard Stern: Can I say this? A piece of ass.
Donald Trump: Yeah.
He didn’t say his daughter was “a piece of ass.” However, he did not argue the point.
This follows a pattern throughout the interviews of Stern making a statement as a question and Trump either confirming or denying the statement without repeating it. Trump first explicitly stated he wouldn’t answer a question on September 23, 2004, his 20th interview with Stern. As the interviews evolved closer to 2015, the rate of objections increased.
The interviews begin on May 8, 1993, before Tiffany and Barron were born, Eric was 9, Ivanka was 12 and Don Jr. 16. He had just divorced his first wife, Ivana, and was dating Marla Maples. The last interview was on August 25, 2015, two months after he announced his 2016 presidential run. He and Melania had been married a decade, his children were married and he had starred in two famous television shows.
So Is This Everything?
We are almost sure we have them all. Daily records of Stern’s show prior to 1997 are difficult to find. Is it possible we missed one? Absolutely. But we’re pretty sure we’ve got them all. If we’re wrong, we’d love to know the dates and get to work transcribing.
Also, please check the audio. We think we did a good job tagging who is speaking. But when in doubt, hit play. And if we’re wrong, let us know so we can fix it.
So let’s get the headline out of the way. Donald Trump is not at all comfortable discussing God. That’s based on more than three hours of video covering more than 424 distinct segments spanning more than 200 events.
That’s why you probably clicked here. Now, you get a data science explainer before you get the data. We’re so bait-and-switch.
As part of a set of new features we’re deploying (see our Emotion Subtitles), we generated a huge amount of data from our new approach to Voice Stress Analysis. Each second of audio and video gets individually analyzed, as well as 10-second segments, sequential segments, and the entire speech, interview or press conference.
This compilation opened up an interesting opportunity for analysis. Since our data is extensively tagged and structured, we could document, statistically, exactly what makes him relax, and what makes him tense. So we thought: cool.
A Word about Voice Stress Analysis
You’ll read a lot about voice stress analysis. So let’s address one thing here: it’s not a lie detector test. This is hotly debated, and we prefer to stick with the known. It has not been proven definitively that increases in voice stress indicate lies. If a person believes a lie, they will be relaxed. If a person steps on a tack, stress will increase even if telling the truth.
What this does definitely detect is a level of comfort, stress and/or anxiety. The higher the frequency (due to muscles contracting, including muscles in the neck that affect the voice box, thus the frequency), the greater the indication of stress. By measuring patterns when this occurs, we can identify statements and topics where a person is not comfortable with what they are saying. Coupled with identification of the underlying feelings and measuring factors such as word choice and rate of speech, among several dozen others (we gather 115 datapoints per word), it’s a powerful way to uncover how someone feels about what they are saying.
That’s why the next part is important: we have hundreds of hours of Trump documented, transcribed and keyworded. A bad day is possible. 200 bad days on the same topic? Unlikely. In fact, we did a basic statistical model and found the odds of having “a bad day” on 200 or more unique days exactly when a particular topic was being discussed was… some big number. Excel showed one of those 1e12 things and we just moved on.
Back to Why You’re Here.
So we ran the data. The methodology is important, which we’ll explain in detail:
Eliminate Bias. To remove bias, we selected only topics that Trump has discussed publicly 200 or more times, according to our database. Every one of those topics / subjects was checked and is reflected below.
Find Midpoint. For each interview, speech, event, and so on, an individual middle (median) point was established for just Trump’s voice. So if he was having a relaxed day, we measured when topics moved the stress above or below that midpoint. If he was having a bad day, same thing.
Phrase subjectivity. For phrases, we freely admit this was subjective. We checked our database for frequently used phrases and it found thousands. It’s a literal beast, so “I am going” appears in the list of three-word phrases. We punted and googled “Trump catch phrases” and selected about a dozen. We made a subjective choice to add “Make America Great Again” into the mix, as well as “Thanks”, “Thank You”, “God Bless You” and “God Bless America” into our checks, based on the findings in our topical analysis.
“You’re Fired.” We eliminated “You’re fired” since most of the references were short, pre-recorded clips from the television show vs. a real-world situation.
Short Segments. We eliminated any segment less than four seconds long, as that can add anomalous spikes, and we want the phrase or topic in context.
Sample size. This got us to 170.23 hours of video, spanning 30,899 unique segments (1- to 3-word sentences are a unit in our database based on size), from 1980 through this week, covering 1,634,208 words.
<nerd>This then fed into our algorithm, which is an Adaptive Empirical Mode Decomposition (AEMD) process, to check for deviations outside of 8-12Hz. This is widely recognized as the normal frequency range to monitor. When it goes above 12Hz, it’s considered stress…</nerd>
A reminder… but again, we use the midpoint from a particular event, to account for the fact that being President probably is stress in and of itself.
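To make the midpoint idea concrete, here is a simplified sketch for a single event. It is not our production algorithm. The tuple layout, the stress values and the choice to compute the median after dropping short segments are all assumptions made for the example.

```python
from statistics import median


def deviations_from_midpoint(segments, min_seconds=4.0):
    """Measure each segment's stress against the event's own median.

    segments: list of (topic, seconds, stress) tuples for one event.
    Segments shorter than min_seconds are dropped first.
    """
    kept = [seg for seg in segments if seg[1] >= min_seconds]
    midpoint = median([stress for _, _, stress in kept])
    return [(topic, stress - midpoint) for topic, _, stress in kept]


# Hypothetical segments from one event (stress values are invented).
event = [
    ("God", 6.0, 11.5),
    ("Iraq", 8.0, 9.0),
    ("thanks", 2.0, 13.0),   # under four seconds, so it is dropped
    ("cities", 10.0, 8.5),
    ("veterans", 5.0, 10.0),
]
result = deviations_from_midpoint(event)
print(result)
```

Positive numbers mean the topic pushed stress above that day’s baseline, negative numbers below it. A tense day and a relaxed day are each measured against themselves.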
One note: you will see topics on the table below with less than 200 citations. Our check of topics included print interviews, his writings and tweets, indicating it is a topic he frequently discusses, but may be represented less than 200 times in the audio and video.
And from that data…
Back to the Lede
Trump is clearly, statistically, uncomfortable expressing gratitude. When he thanks people, based on 67 unique segments where thanking someone was the topic, and another 105 phrase references to thanking someone, he is consistently at an elevated stress level, indicating anxiety.
Similarly, when discussing God as a topic (424 unique segments), he is also uncomfortable, with his voice indicating stress and anxiety well above the midpoint established contextually in the conversation. Note this is specific to discussing God, vs any particular religion or religion itself.
Rounding out the top list of uncomfortable topics and phrases:
“Make America Great Again” (32 segments)
“Build the Wall” / “Build That Wall” (153 segments)
The White House (as an institution – 323 segments)
Veterans (402 segments)
Law Enforcement (194 segments)
The Wall (as a topic – 790 segments)
Okay, but what puts him at ease? On what topics is he comfortable?
The top of the list is what our system classified as “inner cities” but in looking at specific references, it’s discussions of urban planning, cities and infrastructure. He’s well below the stress midpoint when on this topic (145 references). A good number of these references were in interviews pre-dating his Presidency as well.
The Middle East is strongly represented on the list of topics where he is comfortable: Iraq (420 references), Iran (406 references), Syria (281 references) and the Middle East in general (305 references) are all topics he is clearly relaxed, not anxious, when discussing.
Rounding out this list of topics where he is comfortable:
War (248 references)
The New York Times (89 references)
Terrorism (366 references)
“A lot of money” (275 references)
“Many many” (126 references)
So was there anything else surprising?
Personally, for me, there were a few things, but the world doesn’t need another opinion right now, so take a look at the data below and decide for yourself. If you disagree with anything in the methodology, let us know. But be warned: we make available all our data on request, and will continue to do so. If you disagree with the points above, we’re happy to send you the algo and all the underlying data for you to verify the results for yourself, or to run through a different process. The world not needing another opinion doesn’t just apply to me :-). We’re all about data and verifiable facts at Factba.se, so you’re welcome to think we’re wrong, but be ready for us to challenge you to prove we’re wrong.
We’re constantly bringing new processes, techniques and tools online at FactSquared. We’ve been using machine learning for analyzing audio, video and text since we launched (all the way back in… January!)
But we’re also very agnostic about tools. We don’t have all the answers, and we watch this space closely for new developments. When something new comes along, we try it. If it adds value, we integrate it into our composite.
“Oh people can come up with statistics to prove anything Kent. Forty percent of all people know that.”
— Homer Simpson
The Simpsons, S05E11
We were going through a round of testing on a new approach right when Donald Trump met with Vladimir Putin in Hamburg, Germany on Friday, July 7, 2017.
In a peanut-butter-meets-chocolate moment, we said: “let’s try this out!”
In keeping with the past blog posts, you get some background and details. It’s like Neil deGrasse Tyson, but not as funny. Or smart. Or handsome. Or charismatic…
…back from the therapist. All better. Picking it back up…
A huge part of what we do, separate from pulling all this data together, is the analysis. Most of it is behind the scenes because it’s a lot of data. 115 datapoints per word. Or, in the average 10-minute speech (1,132 words, at current 30-day moving average of Trump’s speeches and remarks of 113.2 words per minute), 130,180 datapoints. You do not want all that on a page.
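If you want to check that math, here is the arithmetic as a tiny Python sketch. The two constants come straight from the paragraph above; the function name is made up for illustration.

```python
DATAPOINTS_PER_WORD = 115   # from the post
WORDS_PER_MINUTE = 113.2    # 30-day moving average quoted in the post


def datapoints_for_speech(minutes: float):
    """Return (word count, datapoint count) for a speech of the given length."""
    words = round(WORDS_PER_MINUTE * minutes)
    return words, words * DATAPOINTS_PER_WORD


words, datapoints = datapoints_for_speech(10)
print(words, datapoints)  # 1132 130180
```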
It all feeds our search engine to make the results hyper-accurate, but the goal has always been a way to surface the information that doesn’t overwhelm. You’ll start seeing some of it in the next few days as we get charts and dashboards on the search, and in our daily newsletter (yes, it’s coming).
Part of all this is text analytics of course. Using established approaches and methodologies, it analyzes the words and groups of words to score how positive or negative a statement is, what emotions it conveys, the topics of conversation, and so on.
For example, when we analyze word usage to determine odd turns of phrase, or how “normal” a statement is in terms of language, the system draws on the Corpus of Contemporary American English, a statistical compilation of 520 million words across books, newspapers, magazines and spoken word from 1990 – 2015 (it’s cool, but bring Dramamine). The raw data we generate is reproducible.
The same principle applies to audio analysis, which measures voice stress reliably, as well as comparing the frequency, tremors and other tics against things like the Toronto Emotional Speech Set or the Berlin Database of Emotional Speech, and quite a few others. From there, we tailor the models, building on top of the core data. Ditto for video. You get the point.
Taking all the above, for example, our current system generated a composite of the Trump / Putin discussion. It described Trump and Putin as very positive. This was of course challenging, as the text analysis for Putin’s comments ran off the translator’s words rather than his own. The text emotion reflected “Joy” and “Agreeable” for both.
But, this provides an analysis of the words, not the person.
The audio analysis, which is important, told a different story. It characterized Trump as being moderately positive, but low energy. Putin was characterized at the midline: neither high nor low energy, neither positive nor negative. Put another way, the robot said Trump was upbeat in tone, happy, but lower energy. The same robot said Putin was a cipher. Neutral across the board.
Something Old, Something New
That brings us current. We’ve been meaning to test an expanded voice analysis tool. The company, BeyondVerbal, had built their analysis off of more than 60,000 samples, far larger than the others. Analysis tools such as these are the embodiment of “more is more.” The much larger sample set lets the analysis be much more finely sliced. So we took it out for a spin.
We used the below for Trump…
Because the tool specifically measures voice frequencies, the camera noise should not impact it. That being said, we tested it anyway after removing the camera noise…
…and found the analysis, within 1-2%, to be nearly identical. Feel free to test on your own to validate.
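If you do run your own comparison, one simple way to check "nearly identical" is a relative-difference test between the two runs. A minimal sketch follows; the score names and values are invented for illustration and are not output from the tool, and the 2% tolerance is just the upper end of the range we quote above.

```python
def relative_difference(original: float, cleaned: float) -> float:
    """Relative change between two scores, as a fraction of the original."""
    if original == 0:
        raise ValueError("original score must be non-zero")
    return abs(cleaned - original) / abs(original)


def within_tolerance(original_scores, cleaned_scores, tolerance=0.02):
    """True when every score in the cleaned run is within `tolerance` of the original."""
    return all(
        relative_difference(original_scores[name], cleaned_scores[name]) <= tolerance
        for name in original_scores
    )


# Hypothetical scores for the raw clip and the noise-removed clip.
raw = {"valence": 50.0, "arousal": 40.0}
denoised = {"valence": 50.5, "arousal": 39.6}
print(within_tolerance(raw, denoised))
```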
Our findings are below in the table. The data indicated a more restrained, less confident Trump, while Putin appeared to be in tight control of his voice and more confident of his position. The data, combined with several dozen other tests, also proved to be an improvement on our audio analysis. So we’ll be integrating it into our composite in the coming days.