Tuesday, June 13, 2017

Twitter bot data exercise



It's been fairly well documented that there is a substantial fraction of 'robot' Twitter accounts that are used for various purposes:

1.) Boosting importance of a tweeter by having a large number of followers
2.) Disseminating information in volume that would make a view on a topic seem popular

In particular, item #2 was alleged to have been perpetrated by Russian Twitter bots during the 2016 election. Ever since the Russian invasion of Crimea, I have been fascinated by the use of bots and false commentators. Perhaps the most humorous of these came in the form of comments. I would read a comment on the Crimean invasion by Joe Blow from Ohio, and it would be written like a Russian speaker trying to speak English - with all sorts of "tells" like dropped articles. Eventually both CNN and the New York Times began to filter their comments sections - I'm assuming so as to not be overwhelmed.

Periodically, I check in on Tweets with hashtags or lead names that might split the political spectrum and look at the approaches that both sides take.  Typically very similar themes are repeated by either side, not surprisingly.   Someone creates a 'narrative' and then it gets repeated.   This doesn't mean that the repeaters are necessarily bots, but they're propagating one particular slant.  

If I suspect a topic is garnering bots, I start to look at how many followers the tweeters have, and there is more than a fair share of tweeters who look like bots.

How to identify?   Typically they repeat the same line, and have very few followers.

So, rather than be anecdotal, I decided to take a data sample.  

On June 12th, the 9th Circuit Court of Appeals announced a decision on Donald Trump's "Muslim Ban," as it's come to be known. The decision went against Trump. Predictably, the two sides of the political coin formed into pro and anti 9th Circuit teams and promoted their views on Twitter.

This was something of an ideal case to sample Twitter, as it's not a major case, but sure to invite commentary. I figured that, since we've been told that the Russians help out the right-wingers with Twitter bots, I would find a large number of tweets against the 9th Circuit from accounts with a low count of followers.

Here are the ground rules:

Tweets have to be 'lone' tweets: they can't reference an article or be a retweet. The tweets cannot originate from a news source, like a local radio station.

The tweet has to be easily classified into a pro 9th circuit group or an anti 9th circuit group.

What did I expect? I expected to see a lot of anti 9th Circuit tweets from accounts with a low count of followers. This follows the narrative that Russia favors the Trump followers.

I sampled for about 2 hours on June 13th, the day after the 9th Circuit ruling.

What I found surprised me.   There were 72 anti 9th Circuit tweets, and 38 pro 9th Circuit tweets.   OK, so I have a short attention span, but that's enough to get some sampling.

I defined a "bot" as an account that had 20 or fewer followers. By that definition, 45% of the pro 9th Circuit tweeters were 'bots', versus only 18% of the anti 9th Circuit tweeters.
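For anyone who wants to repeat the exercise, the rule above is easy to sketch in Python. This is only an illustration: the function name, the record layout, and the toy records are made up for the example and are not the tweets I sampled.

```python
from collections import Counter

# The rule from the post: 20 or fewer followers counts as a "bot".
BOT_FOLLOWER_THRESHOLD = 20


def bot_share(tweets):
    """Return {stance: fraction of tweets from accounts at or below the threshold}.

    `tweets` is an iterable of (stance, follower_count) pairs; this layout is
    an assumption made for the sketch.
    """
    totals = Counter()
    bots = Counter()
    for stance, followers in tweets:
        totals[stance] += 1
        if followers <= BOT_FOLLOWER_THRESHOLD:
            bots[stance] += 1
    return {stance: bots[stance] / totals[stance] for stance in totals}


# Made-up toy records, NOT the sampled data: (stance, follower count).
toy_sample = [
    ("pro", 5), ("pro", 300),
    ("anti", 12), ("anti", 950), ("anti", 20), ("anti", 21),
]
shares = bot_share(toy_sample)
```

Note that an account with exactly 20 followers counts as a 'bot' here, matching "20 or fewer."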

This was truly surprising, but the trend kept up.  

What explains the dynamic?  

Random chance is one possibility, but with a gap this large the p-value starts to become prohibitive.
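One way to put a number on "random chance" is Fisher's exact test on the 2x2 table of bot versus non-bot accounts by side. The sketch below is one reasonable choice of test, not necessarily the only one. The counts of 17 and 13 'bots' are an assumption: they are back-calculated from the rounded 45% of 38 and 18% of 72, so treat the result as approximate.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# ASSUMED counts, reconstructed from the rounded percentages in the post:
# pro:  17 'bots' out of 38 tweets (about 45%)
# anti: 13 'bots' out of 72 tweets (about 18%)
p_value = fisher_exact_two_sided(17, 38 - 17, 13, 72 - 13)
print(f"two-sided p-value: {p_value:.4f}")
```

With those assumed counts the p-value should land below the usual 0.05 cutoff, which fits the feeling that chance alone is a stretch. It says nothing about why the gap exists, and a 110-tweet hand sample is hardly a random one.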

Perhaps the people who say that the Russian hacking is designed to create polarization are right. If there is a trend favoring a certain sentiment, bots are launched that advocate the opposite sentiment.

Other factors? Benefactors who are progressive and promote opposing accounts? Real users weighing in late to the show? I don't know. This is just some data sampling. Correlation doesn't imply causation, but still.


