Feed Me (Transcripts), Seymour…

If there’s one thing that’s generally true about statistical models, it’s this: they need to be fed.

For about six months now, I’ve been living most waking moments in the words of Donald Trump. I love algorithms, but I check them. And check them again. And again. It’s not even borderline compulsive. We blew past borderline around January. It is compulsive.

Probably the single biggest challenge I face in shaping the models is access to raw materials. We check, and check again, every word. Yes, he was on Oprah in 1988, but we need more than a 3:11 clip… we need the whole show for context.

We are constantly updating our backlog of material, with volunteers generously sending in links (I’m looking at you in particular, CJ), text and videos. All of it needs to be checked and then fed into Margaret, our pseudo-AI, who ravenously consumes every word spoken, analyzing the audio, video and text to build her model. That model in turn analyzes tweets, improves transcription, and does lots of other cool things.

The single best source of this information is interviews. Unlike speeches, they are generally unscripted. Unlike tweets, they give you more than 19.6 words at a time (1-year moving average, 3,203 tweets, 62,871 words). At some point I’ll have enough time to write a separate post explaining how differently the models view speeches and interviews… the output almost looks like two different people.
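The 19.6 figure is plain division: 62,871 words over 3,203 tweets. For anyone who wants to reproduce a trailing one-year average from their own tweet archive, here is a minimal Python sketch. The function name `mean_words_per_tweet`, the `(date, text)` input shape and the whitespace word count are all assumptions for illustration, not how Margaret actually tokenizes or stores tweets, and the sample entries are hypothetical placeholders rather than real tweets.

```python
from datetime import date, timedelta


def mean_words_per_tweet(tweets, as_of, window_days=365):
    """Average words per tweet over the trailing window ending at as_of.

    tweets: iterable of (date, text) pairs (hypothetical input format).
    Words are counted with a plain whitespace split, an assumption;
    a real tokenizer may count differently.
    """
    start = as_of - timedelta(days=window_days)
    # Keep only tweets dated after the window start, up to and including as_of.
    counts = [len(text.split()) for day, text in tweets if start < day <= as_of]
    if not counts:
        return 0.0
    return sum(counts) / len(counts)


# Check of the post's own totals: 62,871 words across 3,203 tweets.
print(round(62_871 / 3_203, 1))  # 19.6

# Hypothetical placeholder data; the third entry falls outside the window.
sample = [
    (date(2017, 5, 1), "alpha beta gamma delta"),
    (date(2017, 5, 2), "alpha beta"),
    (date(2015, 1, 1), "outside the trailing window entirely"),
]
print(mean_words_per_tweet(sample, as_of=date(2017, 5, 3)))  # 3.0
```

Evaluating the same function at a series of `as_of` dates gives the moving average over time; each point looks back one year from its own date.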

However, as Chris Cillizza at CNN pointed out in a recent tweet, full interviews often go unshared, even after the news cycle has moved on. Some organizations publish transcripts alongside their stories. Most publish just excerpts, noting they’ve been edited. Some share audio and video, but with cuts and jumps. Others… nothing.

I’m not naming names, but given that the messages coming from the White House can at times appear to contradict each other, this raw material is crucial, both for the historical record and for building a base of research that others can analyze.

Also, the full, unedited interview can settle any question of whether comments were taken out of context. Personally, I think that is a ridiculous argument in nearly every case, but the argument can’t be made at all if there are no edits.

So I’m making both a public plea and an offer: please, in the name of all that is good in the world, once you’ve run your stories and pieces, publish and share the raw materials. Pull any off-the-record comments, but otherwise, share the raw audio, video and text.

Since everyone has a few things to do nowadays, here’s what we’ll offer for any interview with the President, if time or resources rule out a full transcript or sharing the raw video and/or audio.

  1. Factba.se will happily, and at no cost, transcribe in full any video or audio provided, both via Margaret and with a human editor to verify.
  2. Factba.se will provide, via a spreadsheet or any other medium, ALL metadata developed. This is the stuff that is behind the scenes (not for long) on our site. For audio, you’ll get back second-by-second analysis of voice stress and emotion, which is keyed to Trump (the sotto voce whisper). For video, it will include facial expressions, smiles and frowns, gestures and other analysis (clothing identification, colors, the two-handed punctuation I myself have as a third-generation bridge-and-tunnel child, etc.). It is even learning to pick up when he flushes (a change in complexion). It will be a lot. But it will be everything.
  3. We will provide the full keyword and entity extraction, by three-sentence pair, by section and overall, both for the entire interview and for just the portions where Trump is speaking.
  4. We will provide the full range of analysis: grade-level models, sentiment, emotion… all of it.
  5. We will respect any and all embargoes given. We are not meant to be a news organization. If you’d like us to wait one, two or three days after your stories run before integrating and sharing the information, fine. You’re the boss. It’s your interview. You get it back first and control the story.
  6. If a human editor is involved, figure two hours of work per hour of video or audio for the transcript. If you don’t mind raw output from Margaret (she’s close to 95% dead on now), figure 90 seconds per hour. If you want a quick turnaround, we just need a little notice so we can plan our day around it.
  7. We will, of course, link out to your pieces from the text.
  8. If there are any other requests… fine. Our interest is the record, and sharing the resulting analysis.

We’re not looking to create a hippie commune. We are looking, however, to unleash the data that is contained in your excellent work, in a way that does not conflict with your job.

Also, on the off chance Margaret becomes sentient again, you’ll be in her good graces.

— Bill Frischling

2 thoughts on “Feed Me (Transcripts), Seymour…”

  1. You do an awful lot of work so you can shed bad light on our president, which is disrespectful. So what if his daily schedule starts 30 minutes later in the morning than Obama’s? Why do his 2011 tweets have any business here? Who decides whether it’s a negative story or positive? Many I saw that were marked thumbs-down negative were accurate posts! If you are trying to show he’s deceitful because his numbers are exaggerated, that does not make him wrong. He just doesn’t know the exact number. I was shocked to discover such an appalling website trolling the President of the United States!

    1. We present neither positive nor negative. We present data. We also make the data public. If you believe there’s a bias, we certainly want to know, but present the data, not opinions. Opinions are not what we deal in.

      As for the 2011 tweets, we do believe that political figures should be transparent. Statements he made to millions on Twitter would certainly qualify as part of the record.

      Similarly, he has compared himself to other presidents no fewer than 121 times, per our database. In doing so, he has himself invited comparisons between his performance and that of other presidents.
