Why predict who wrote @realDonaldTrump and @POTUS tweets?
In late December of 2016, I stumbled across this tweet:
The U.S. Consumer Confidence Index for December surged nearly four points to 113.7, THE HIGHEST LEVEL IN MORE THAN 15 YEARS! Thanks Donald!
— Donald J. Trump (@realDonaldTrump) December 28, 2016
I don’t know about you, but I found it really odd that Mr. Trump would be thanking himself like that. So I did a little Googling and found a very extensive article written by data scientist David Robinson, concluding that Mr. Trump doesn’t actually write many of the tweets from his personal account. The nerd in me just couldn’t resist testing out this hypothesis. You can read about how this works in much more detail here.
How do you predict who wrote the tweets?
That’s a long story, and is spelled out in this explanation. The really short, short version is that I believe that for years, tweets posted using an Android device were all written by Mr. Trump, while during the election, those posted with an iPhone were generally written by someone else (i.e., members of his staff). I developed my own deep-learning model and trained it to classify the author (really, the device the text was posted from) using all the iPhone and Android tweets from @realDonaldTrump during the election. I tested the model on over 14,000 more tweets, and it had an accuracy of just over 98%. I then used this model to classify every tweet in the archive as written either by Mr. Trump or by someone else (most likely, members of his staff). The same model is applied to tweets from @POTUS, under the hypothesis that he writes a number of those as well. The data-analysis server checks for new tweets every few minutes and posts the results on didtrumptweetit.com.
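As a rough illustration of the idea (and not the deep-learning model used on this site), here is a minimal sketch of a text classifier trained on device labels. It swaps in a tiny bag-of-words naive Bayes classifier written with only the Python standard library. The class name, the helper function, and the example tweets are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class DeviceClassifier:
    """Bag-of-words naive Bayes; a stand-in, not the site's actual model."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # label -> number of tweets
        self.word_counts = {}               # label -> Counter of tokens
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihoods with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, labeled by the device that posted them.
train_texts = [
    "Crooked media is so dishonest, sad!",
    "The failing press is a total disaster, bad!",
    "Join us tomorrow in Ohio #MakeAmericaGreatAgain",
    "Thank you Florida! Tickets here #MAGA",
]
train_labels = ["android", "android", "iphone", "iphone"]

model = DeviceClassifier().fit(train_texts, train_labels)
prediction = model.predict("So dishonest and sad!")
print(prediction)
```

The key design point carries over to any model: the training label is the posting device, which serves as a proxy for the author.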
What is “Deep Learning”?
It’s a type of machine learning that uses sophisticated neural networks to tease out subtle features from huge bodies of information. For a short but readable introduction for the layperson, check out this short piece entitled “Machine Learning for dummies – explained in under 3 minutes.” And no, if you asked the question, you certainly aren’t a dummy.
Wasn’t there a different model here?
Yes. You can read about it here. But as I note here, it showed some inadequacies over time that I needed to address. The new “N3CNN” model in use here is much more accurate. It went live on August 14, 2018.
If Mr. Trump uses an Android device, why do you need to make predictions in the first place?
Great question. In the database of @realDonaldTrump tweets, there are over 13,000 that were posted using methods other than an Android or iPhone device. Who wrote those? No @POTUS tweets have come from an Android phone as of yet, so who writes those?
Perhaps most interestingly, only a handful of tweets have been made from an Android phone since March 8, 2017. All others have been from iPhones and web-based systems. That may have been in response to a letter sent to the White House legal counsel by the House Committee on Oversight and Government Reform on March 9, which read in part:
President Trump uses at least two Twitter accounts: an official White House account (@POTUS) and an account that predates his inauguration (@realDonaldTrump). Many of the messages sent from these accounts are likely to be presidential records and therefore must be preserved. It has been reported, however, that President Trump has deleted tweets, and if those tweets are not archived it could pose a violation of the Presidential Records Act.
One speculation is that Mr. Trump stopped using his Android device to be in compliance with the Presidential Records Act. Regardless of the circumstances, it isn’t enough to merely report how the tweet was made in order to know who made it.
How accurate are your predictions?
I used a variety of standard machine-learning tests to establish the accuracy and validity of the classification model I built, all of which are presented here. The “executive summary” is that the model is about 98% accurate in correctly identifying tweets written by Mr. Trump, as validated on a testing set of over 14,000 tweets. Thousands of those were written by Dan Scavino Jr., so I could be certain Mr. Trump hadn’t written them.
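For readers unfamiliar with how an accuracy figure like this is computed, here is a minimal sketch. It compares predicted labels with known labels on a held-out test set and reports accuracy, precision, and recall. The function name, the label names, and the eight toy labels are invented for illustration and are not the site's actual test data.

```python
def evaluate(y_true, y_pred, positive="trump"):
    """Count correct and incorrect predictions and derive summary metrics."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1          # correctly attributed to Mr. Trump
        elif guess == positive:
            fp += 1          # attributed to Mr. Trump, but written by staff
        elif truth == positive:
            fn += 1          # written by Mr. Trump, but attributed to staff
        else:
            tn += 1          # correctly attributed to staff
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Invented toy labels for a held-out test set.
y_true = ["trump", "trump", "trump", "staff", "staff", "staff", "staff", "trump"]
y_pred = ["trump", "trump", "staff", "staff", "staff", "staff", "trump", "trump"]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead when one class is much larger than the other, which is why precision and recall are usually reported alongside it.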
Caveat emptor: these are all just predictions from a mathematical model. I do not represent these predictions as statements of fact or claims, legal or otherwise. I will not be liable to you for consequences arising from your use of the web pages and data provided at didtrumptweetit.com.
Can I see your code?
Yes. I grabbed the relevant parts and posted them here.
Why don’t you classify retweets?
There is really no way to know who clicked the “retweet” button.
How do you know Mr. Trump is actually the author of the tweets from Android phones?
This really is the central question of this entire study. After all, my model is based on the hypothesis that Mr. Trump wrote almost all of the Android tweets, and other people wrote most of the others. Mr. Trump has never stated this is the case, but I can give you an argument based on four pieces of evidence:
- The extensive linguistic and sentiment analyses by David Robinson (here and here) show that tweets made from an Android phone are clearly written by different authors than tweets made from other devices.
- This model was trained on a small subset of all the tweets from @realDonaldTrump. However, it is remarkably accurate at classifying a tweet as coming from an Android device (over 98% correct for the tweets made after Mr. Trump took office). If there weren’t different author(s) for Android and non-Android tweets, then classification models could do no better than random guessing (50% accuracy when the two classes are equally common).
- I analyzed the times of day of tweets, as functions of how they were made and how my model classified them. Before any Android tweets showed up, 92% of @realDonaldTrump tweets were made during the working hours of 9 AM to 6 PM. From the first Android tweet (Feb. 5, 2013) until Mr. Trump announced his candidacy for President, 85% of Android-based tweets were made outside working hours, while 91% of the non-Android tweets were still made during working hours.
- Mr. Trump has been repeatedly photographed using a Samsung Galaxy S3, which was released in mid-2012 (9 months before Android tweets started showing up).
Android-based tweets are undeniably different from all the others. Now either many different people had Android phones that they were using almost exclusively outside of work to write @realDonaldTrump tweets, or only Mr. Trump did. Occam’s razor suggests the latter.
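The time-of-day comparison in the list above can be sketched as follows. This is a minimal illustration, not the analysis code behind the percentages quoted. The function name, the source labels, and the timestamps are invented, and the sketch assumes timestamps are already in local time.

```python
from collections import defaultdict
from datetime import datetime


def working_hours_share(tweets, start_hour=9, end_hour=18):
    """Return {source: fraction of tweets posted from start_hour up to end_hour}."""
    totals = defaultdict(int)
    inside = defaultdict(int)
    for source, timestamp in tweets:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
        totals[source] += 1
        if start_hour <= hour < end_hour:   # 9:00 AM up to, not including, 6:00 PM
            inside[source] += 1
    return {source: inside[source] / totals[source] for source in totals}


# Invented (source, local timestamp) pairs.
tweets = [
    ("android", "2014-03-01 06:30"),
    ("android", "2014-03-01 22:15"),
    ("android", "2014-03-02 07:05"),
    ("android", "2014-03-02 13:00"),
    ("web", "2014-03-01 10:00"),
    ("web", "2014-03-01 15:45"),
    ("web", "2014-03-02 11:20"),
    ("web", "2014-03-02 20:00"),
]

shares = working_hours_share(tweets)
print(shares)
```

A large gap between the shares for different sources is the kind of signal the argument above relies on.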
What platform or language is used to do all this?
In a word, Python. It’s really powerful, runs on almost any computer, has an immense library of tools, and best of all, it is completely free. I use Python and Twitter’s API to download new tweets as they are posted. Python’s toolkits sklearn (for machine learning) and nltk (the Natural Language Toolkit) are used extensively to perform the data analysis. All data are stored in a SQL database, since Python interfaces with it seamlessly. And posts are uploaded using Python’s interfaces to WordPress. There are more details provided here.
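For a sense of how the storage step might look, here is a minimal sketch using Python’s built-in sqlite3 module. The choice of SQLite, the table layout, the column names, and the sample tweet are all assumptions for illustration; the actual database engine and schema are not described here.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tweets (
        tweet_id         TEXT PRIMARY KEY,
        account          TEXT NOT NULL,
        source           TEXT NOT NULL,
        posted_at        TEXT NOT NULL,
        text             TEXT NOT NULL,
        predicted_author TEXT
    )
    """
)


def store_tweet(conn, tweet):
    """Insert a tweet once; a repeat of the same tweet_id is ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT OR IGNORE INTO tweets
                (tweet_id, account, source, posted_at, text, predicted_author)
            VALUES
                (:tweet_id, :account, :source, :posted_at, :text, :predicted_author)
            """,
            tweet,
        )


# Invented placeholder record, not a real tweet.
sample = {
    "tweet_id": "1",
    "account": "realDonaldTrump",
    "source": "Twitter for Android",
    "posted_at": "2017-01-22 08:00:00",
    "text": "Example tweet text",
    "predicted_author": "trump",
}

store_tweet(conn, sample)
store_tweet(conn, sample)  # simulates a second poll returning the same tweet
row_count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(row_count)
```

Keying the table on the tweet ID means a server that polls every few minutes can re-fetch recent tweets without creating duplicates.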
Can I use your archive or machine learning predictions for…?
Yes, of course! Use them any way you see fit, as long as it doesn’t involve breaking the law. However, under the licensing terms of this website (see below), you must acknowledge didtrumptweetit.com as the source. Seems fair, considering how much work I put into this, right?
Twitter only lets you grab the last 3200 tweets. How did you build the database of all Mr. Trump’s tweets?
Short answer: I got help. All @realDonaldTrump tweets prior to mid-November 2016 were gathered from here. I encourage you to visit http://trumptwitterarchive.com, the page that uses that database, for another presentation of the tweet archive, as well as a great variety of statistics and analyses of these tweets. Between mid-Nov. 2016 and the inauguration, I updated this site daily. Starting on Jan. 22, 2017, I processed @realDonaldTrump and @POTUS tweets in real time using my own Python scripting and Twitter’s API.
Why are you using WordPress?
It’s a very easy platform for hosting and searching. In particular, I can upload posts directly from the Python programs I use to download and process data. WordPress allows back-dating of posts, so I can time-stamp each post with the exact time the tweet was made. And the WordPress search feature makes it effortless to find all the tweets that use a common word or phrase.
Why are ads showing up here?
I’m covering all the costs for this project out of my own pocket, and would like to raise the funds needed to cover them. I’ve placed ads along the side and bottom of the site using Google’s AdSense to help raise that money. Any revenue on top of those costs will be donated evenly on a regular basis to the following:
- The American Civil Liberties Union.
- Planned Parenthood.
- The Southern Poverty Law Center.
- The Natural Resources Defense Council.
As this site just started up, it will take a while before I can start making donations, but rest assured, I will report the dates and amounts here, as they happen.
Who are you?
I’m a scientist turned data scientist. I’ve been doing professional research for over 20 years, and have an extensive CV of peer-reviewed publications, presentations, and research grants (just to establish some bona fides). Like any good scientist, once I started thinking about the question of author identification in Mr. Trump’s tweets, I couldn’t let it go. This website is the result of that intrigue and investigation.
Can I contact you?
Absolutely. Please email didtrumptweetit (at) gmail.com. If you use these data for any research or media purposes, please send me a link to what you did so I can include it in the media page. Plus, I love seeing what people do with this.
NOTICE: Any use of this database must be properly acknowledged, e.g. by referencing the website URL.
This database has been built with open-source software, obtained and used under the Apache-2.0 and MIT licenses. All Twitter data was obtained according to the Twitter Developer Agreement. All analyses presented in this website, and methods used or created to present these media, are licensed as follows.
Copyright 2017 DidTrumpTweetIt.com
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.