Why predict who wrote @realDonaldTrump and @POTUS tweets?
As I explain here, Data Scientist David Robinson presented very compelling evidence back in the fall of 2016 that Trump doesn’t actually write many of the tweets from his personal account. The nerd in me just couldn’t resist identifying who is actually writing on his feed and under his name.
When he took office, Trump also took control of the @POTUS Twitter account. According to its main page, all tweets are by his director of social media, Dan Scavino Jr., unless signed “-DJT.”
As of April 23, 2017, there were only 7 actual @POTUS tweets that were signed this way. However, there are numerous examples of @POTUS tweets that are identical to tweets first posted on @realDonaldTrump using an Android phone, which strongly suggests Trump wrote them. As with his personal account, identifying who is writing the @POTUS tweets was a challenge I just couldn’t pass up.
How do you predict who wrote the tweets?
That’s a long story, and it is spelled out in this explanation. The really short, short version is that we believe tweets posted using an Android device were written by Trump, while during the election, those posted with an iPhone were generally written by someone else (i.e. members of his staff). I developed my own set of features in tweets, and trained machine learning classifiers using all the iPhone and Android tweets from @realDonaldTrump during the election. I then applied this model to classify every tweet in the archive (except for retweets and quotes) as being written either by Trump or by someone else (most likely, members of his staff). The same model is applied to tweets from @POTUS, under the hypothesis that he writes a number of those as well. The data-analysis server also checks for new tweets every 5 minutes, and posts the results on didtrumptweetit.com.
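To make the idea concrete, here is a minimal sketch of how device-labeled training data could be assembled. It is not the code behind this site: the feature names in `extract_features` are hypothetical stand-ins (the real feature set is described only in the linked explanation), and the `"source"` and `"text"` keys are assumed names for the fields of a tweet record.

```python
import re
import string


def label_from_source(source):
    """Map a tweet's posting source to a training label.

    1 = Android (assumed to be Trump), 0 = iPhone (assumed to be staff),
    None = any other source, which is left out of the training set.
    """
    s = source.lower()
    if "android" in s:
        return 1
    if "iphone" in s:
        return 0
    return None


def extract_features(text):
    """Compute a few simple, hypothetical features from the tweet text."""
    words = [w.strip(string.punctuation) for w in text.split()]
    return {
        "n_chars": len(text),
        "n_exclaim": text.count("!"),
        # Words of two or more letters written entirely in capitals.
        "n_allcaps": sum(1 for w in words if len(w) > 1 and w.isalpha() and w.isupper()),
        "has_url": int(bool(re.search(r"https?://", text))),
        "has_hashtag": int("#" in text),
    }


def build_training_set(tweets):
    """Return (features, labels) for tweets posted from Android or iPhone."""
    X, y = [], []
    for tweet in tweets:
        label = label_from_source(tweet["source"])
        if label is None:
            continue  # web clients etc. carry no device-based label
        X.append(extract_features(tweet["text"]))
        y.append(label)
    return X, y
```

The resulting feature dictionaries and labels could then be handed to any standard classifier; which classifier and which features work best is exactly what the linked explanation covers.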
If Trump uses an Android device, why do you need to make predictions in the first place?
Great question. In the database of @realDonaldTrump tweets, there are over 13,000 that were posted using methods other than an Android or iPhone device. Who wrote those? No @POTUS tweets have come from an Android phone as of yet, so who writes those?
Perhaps most interestingly, only a handful of tweets have been made from an Android phone since March 8, 2017. All others have come from iPhones and web-based systems. That may have been in response to a letter sent to the White House legal counsel by the House Committee on Oversight and Government Reform on March 9, which read in part:
President Trump uses at least two Twitter accounts: an official White House account (@POTUS) and an account that predates his inauguration (@realDonaldTrump). Many of the messages sent from these accounts are likely to be presidential records and therefore must be preserved. It has been reported, however, that President Trump has deleted tweets, and if those tweets are not archived it could pose a violation of the Presidential Records Act.
One speculation is that Trump stopped using his Android device to be in compliance with the Presidential Records Act. Regardless of the circumstances, it isn’t enough to merely report how the tweet was made in order to know who made it.
How accurate are your predictions?
I used a variety of standard machine learning tests to establish the accuracy and validity of the classification model I built, all of which are presented here, but the “executive summary” is that the model is about 98% accurate in identifying tweets written by Trump.
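For readers unfamiliar with how such a number is read, the sketch below shows one common sanity check: comparing a model's accuracy against the score obtained by always guessing the most frequent label. The function names are illustrative, and this is not the full set of tests referred to above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy achieved by always predicting the most common label."""
    if not y_true:
        raise ValueError("need at least one label")
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)
```

An accuracy figure is only meaningful when it clearly exceeds this baseline: if 90% of the labeled tweets came from one device, a model that always guessed that device would already score 90%.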
Caveat emptor: these are all just predictions from a mathematical model. I do not represent these predictions as statements of fact or claims, legal or otherwise. I will not be liable to you for consequences arising from your use of the web pages and data provided at didtrumptweetit.com.
Why don’t you classify retweets and quotes?
There is really no way to know who clicked the “retweet” button, copied and pasted someone else’s tweet, or posted a quote that someone else wrote. I can only model the syntactic patterns of tweets Trump wrote. However, in general, a copied-and-pasted tweet followed by a short “thanks” was written by Trump.
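As an illustration of the kind of filter this implies, the sketch below flags tweets that look like retweets or quotes from their text alone. The rules are assumptions made for the example (a leading "RT @" or a leading quotation mark); the actual criteria used for this archive are not spelled out here.

```python
def is_retweet_or_quote(text):
    """Heuristically flag tweets that start as a retweet or a quotation.

    Assumed rules for illustration only:
    - manual or API retweets begin with "RT @"
    - quoted text begins with a straight or curly double quotation mark
    """
    stripped = text.lstrip()
    if stripped.startswith("RT @"):
        return True
    return stripped.startswith(('"', "\u201c"))


def classifiable(tweets):
    """Keep only the tweet texts that the author model should score."""
    return [t for t in tweets if not is_retweet_or_quote(t)]
```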
How do you know Trump is actually the author of the tweets from Android phones?
This really is the central question of this entire study. After all, my model is based on the hypothesis that Trump wrote almost all of the Android tweets, and other people wrote most of the others. Trump has never stated this is the case, but I can give you an argument based on four pieces of evidence:
- The extensive linguistic and sentiment analyses by David Robinson (here and here) show that tweets made from an Android phone are clearly written by different authors than tweets made from other devices.
- This model was trained on a small subset of all the tweets from @realDonaldTrump. However, it is remarkably accurate at classifying a tweet as coming from an Android device (over 98% correct for the tweets made after Trump took office). If there weren’t different author(s) for Android and non-Android tweets, then classification models could not reliably do better than chance (50% accuracy when the two classes are balanced).
- I analyzed the times of day of tweets, as functions of how they were made and how my model classified them. Before any Android tweets showed up, 92% of @realDonaldTrump tweets were made between the working hours of 9 AM and 6 PM. From the first Android tweet (Feb. 5, 2013) until Trump announced his candidacy for President, 85% of Android-based tweets were made outside working hours, while 91% of the non-Android tweets were still made during working hours.
- Trump has been repeatedly photographed using a Samsung Galaxy S3, which was released in mid-2012 (9 months before Android tweets started showing up).
Android-based tweets are undeniably different than all the others. Now either many different people had Android phones that they were using almost exclusively outside of work to write @realDonaldTrump tweets, or only Mr. Trump did. Occam’s razor suggests the latter.
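The time-of-day comparison in the third bullet boils down to a simple calculation, sketched below. It assumes the timestamps have already been converted to the local time zone of interest, and it treats 9 AM as inside working hours and 6 PM as outside; both choices are assumptions made for the example.

```python
from datetime import datetime


def working_hours_fraction(timestamps, start_hour=9, end_hour=18):
    """Fraction of timestamps with start_hour <= hour < end_hour.

    `timestamps` is a list of datetime objects in local time.
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    inside = sum(1 for ts in timestamps if start_hour <= ts.hour < end_hour)
    return inside / len(timestamps)
```

Running this separately on the Android and non-Android tweets for a given period gives the two percentages being compared.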
What platform or language is used to do all this?
In a word, Python. It’s really powerful, runs on almost any computer, has an immense library of tools, and best of all, it is completely free. I use Python and Twitter’s API to download new tweets as they are posted. Python’s toolkits sklearn (for machine learning) and nltk (the Natural Language Toolkit) are used extensively to perform the data analysis. All data are stored in a SQL database, since Python interfaces with it seamlessly. And posts are uploaded using Python’s interfaces to WordPress. There are more details provided here.
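As a small illustration of how easily Python talks to a SQL database, here is a sketch using the `sqlite3` module from the standard library. The choice of SQLite, the table layout, and the column names are all assumptions for the example; the database engine and schema actually used for this site are not stated here.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            id INTEGER PRIMARY KEY,
            account TEXT NOT NULL,
            created_at TEXT NOT NULL,
            source TEXT,
            text TEXT NOT NULL,
            predicted_author TEXT
        )
        """
    )
    return conn


def save_tweet(conn, tweet):
    """Insert a tweet record, replacing any existing row with the same id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT OR REPLACE INTO tweets
                (id, account, created_at, source, text, predicted_author)
            VALUES
                (:id, :account, :created_at, :source, :text, :predicted_author)
            """,
            tweet,
        )


def count_by_author(conn, account):
    """Return {predicted_author: number_of_tweets} for one account."""
    rows = conn.execute(
        "SELECT predicted_author, COUNT(*) FROM tweets "
        "WHERE account = ? GROUP BY predicted_author",
        (account,),
    ).fetchall()
    return dict(rows)
```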
Why didn’t you use Deep Learning?
You really need massive data sets to train deep learning models. I only have a training set of 6,400 tweets. That said, if I find some extra time I might grab the deep layers of a model already trained on text, and tinker.
Can I use your archive or machine learning predictions for…?
Yes, of course! Use them any way you see fit, as long as it doesn’t involve breaking the law. However, under the licensing terms of this website (see below), you must acknowledge didtrumptweetit.com as the source. Seems fair, considering how much work I put into this, right?
Twitter only lets you grab the last 3200 tweets. How did you build the database of all Trump’s tweets?
Short answer: I got help. All @realDonaldTrump tweets prior to mid-November, 2016 were gathered from here. We encourage you to visit http://trumptwitterarchive.com, the page that uses that database, for another presentation of the tweet archive, as well as a great variety of statistics and analyses of these tweets. Between mid-Nov. 2016 and the inauguration, I updated this site daily. Starting on Jan 22, 2017, I processed @realDonaldTrump and @POTUS tweets in real-time using my own python scripting and twitter’s API.
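Combining an older archive with freshly downloaded tweets is mostly a matter of avoiding duplicates. The sketch below shows one way to do that, assuming each tweet is a dictionary with a unique numeric `"id"` key; the function name and record layout are hypothetical.

```python
def merge_tweets(archive, recent):
    """Merge two lists of tweet records, keeping one record per tweet id.

    When the same id appears in both lists, the record from `recent`
    wins. The result is sorted by id.
    """
    merged = {tweet["id"]: tweet for tweet in archive}
    for tweet in recent:
        merged[tweet["id"]] = tweet
    return [merged[tweet_id] for tweet_id in sorted(merged)]
```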
Why are you using WordPress?
It’s a very easy platform for hosting and searching. In particular, I can upload posts directly from the python programs I use to download and process data. WordPress allows back-dating of posts, so I can time tag a tweet exactly when the tweet was posted. And the WordPress search feature makes it effortless to find all the tweets that use a common word or phrase.
Why are ads showing up here?
I’m covering all the costs for this project out of my own pocket, and would like to raise the funds needed to cover them. I’ve placed ads along the side and bottom of the site using Google’s AdSense to help raise that money. Any revenue on top of those costs will be donated evenly on a regular basis to the following:
- The American Civil Liberties Union.
- Planned Parenthood.
- The Southern Poverty Law Center.
- The Natural Resources Defense Council.
As this site just started up, it will take a while before I can start making donations, but rest assured, I will report the dates and amounts here, as they happen.
Who are you?
I’m a scientist turned data scientist. I’ve been doing professional research for over 20 years, and have an extensive CV of peer-reviewed publications, presentations, and research grants (just to establish some bona fides). Like any good scientist, once I started thinking about the question of author identification in Trump’s tweets, I couldn’t let it go. This website is the result of that intrigue and investigation.
Can I contact you?
Absolutely. Please email didtrumptweetit (at) gmail.com. If you use these data for any research or media purposes, please send us a link to what you did so we can include it in our media page.
NOTICE: Any use of this database must be properly acknowledged, e.g. by referencing the website URL.
This database has been built with open-source software, obtained and used via the Apache-2.0 and MIT licensing agreements. All twitter data was obtained according to the Twitter Developer Agreement. All analyses presented in this website, and methods used or created to present these media, are licensed as follows.
Copyright 2017 DidTrumpTweetIt.com
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.