Wow! If Heartbleed was an 11 (on scale of 10) Shellshock is probably a 12 as it will affect many more computers! http://t.co/c7ivKcdMHa
Twitter is giving traditional media a run for its money in many respects, especially when it comes to getting the news out. Over the last few years a common pattern has emerged: news breaks first on Twitter or a comparable social media platform and is only later picked up by traditional media such as TV, radio, and newspapers. In fact, most traditional media powerhouses have started incorporating social media into their portfolio. They use it both to reach a younger, tech-savvy audience and to learn about events as soon as they appear on social media. Twitter is by far the most popular social network for breaking news, for the user and community interactions that follow, and for official corporate accounts to interact with their user base at large. With this in mind, we analyze and assess Twitter's impact on raising awareness of critical Information Security (InfoSec) events.
The questions we tried to answer were:
- How effective is the Twitter platform at raising awareness about InfoSec in general, and about high-profile events in particular?
- Can InfoSec professionals and organizations, who face a constant uphill battle to keep up with what now seems like an endless barrage of computer and network attacks, use Twitter to their advantage?
2014 has already seen some very high-profile vulnerability disclosures and data breaches. The intent of this exercise was to do a cross-sectional study of one such disclosure, the ‘Shellshock’ vulnerability, which made its appearance on Twitter and in mainstream media in late September 2014. We chose this disclosure because it was very high profile and had the potential to affect a lot of easy targets on the web and on intranets (don’t discount internal threats!). It also came on the heels of another high-profile vulnerability disclosure, ‘Heartbleed’.
Data at a glance
For this study we downloaded some 330,000 tweets using Twitter’s search API. They span the several weeks during which this vulnerability disclosure generated the most activity on both social and mainstream media.
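The article doesn't include its download script. The sketch below shows one common way to page through Twitter's v1.1 standard search API. Each request asks for tweets older than the oldest one already received, by passing `max_id` set to that tweet's id minus one. `paginate_search` and `fake_fetch` are hypothetical names, and the real HTTP/OAuth request is deliberately left out. `fetch_page` is assumed to wrap that request and return a list of tweet dicts with integer `id` fields.

```python
from typing import Callable, Dict, Iterator, List, Optional

Tweet = Dict[str, object]

def paginate_search(
    fetch_page: Callable[[str, Optional[int]], List[Tweet]],
    query: str,
    max_pages: int = 1000,
) -> Iterator[Tweet]:
    """Yield search results page by page, walking backwards in time via max_id.

    `fetch_page(query, max_id)` is a HYPOTHETICAL wrapper around the search
    endpoint: it should return tweets with id <= max_id (the newest tweets
    when max_id is None), and an empty list once nothing older is available.
    """
    max_id: Optional[int] = None
    for _ in range(max_pages):
        page = fetch_page(query, max_id)
        if not page:
            break  # no older tweets left in the search index
        yield from page
        # Request strictly older tweets next, so no tweet is fetched twice.
        max_id = min(int(t["id"]) for t in page) - 1


# Usage with a fake backend holding ids 10..1 and returning 3 tweets per page.
def fake_fetch(query: str, max_id: Optional[int]) -> List[Tweet]:
    ids = [i for i in range(10, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i, "text": f"tweet {i} about {query}"} for i in ids[:3]]

ids = [t["id"] for t in paginate_search(fake_fetch, "shellshock")]
```

Paging on `max_id` instead of a page number means newly arriving tweets don't shift the window and cause duplicates.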
Because Twitter conversations mix tweets, retweets, favorites, replies, and combinations thereof, it was useful to look at the overall conversation map. This gave us a high-level understanding of the scope of the conversations and of the interactions within them.
Figure 1 shows a succinct view of the data we sampled. The numbers in the boxes show the number of tweets in each category (e.g. original tweets are tweets that are neither retweets nor replies). For comparison's sake we also show what percentage of its parent and grandparent categories each category makes up. What is clear is that there was little interaction beyond retweeting. There were few back-and-forth conversations (replies and replies to replies made up less than 3%), and few tweets were favorited (less than 10%). Among retweets, it was mostly original tweets that got retweeted rather than replies to them. All this suggests that the InfoSec Twitter crowd is a small, close-knit group that is not very interactive, at least on Twitter and at least when it comes to discussing InfoSec events.
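Here is a minimal sketch of how tweets could be sorted into the categories of Figure 1. It assumes each tweet is the JSON object returned by Twitter's v1.1 API. In that format a retweet carries a `retweeted_status` object, and a reply has a non-null `in_reply_to_status_id`. The function names are hypothetical, quote tweets are ignored, and this is not necessarily how the figure itself was produced.

```python
from collections import Counter
from typing import Any, Dict, Iterable

Tweet = Dict[str, Any]

def categorize(tweet: Tweet) -> str:
    """Classify a v1.1-style tweet dict as 'retweet', 'reply' or 'original'."""
    if tweet.get("retweeted_status") is not None:
        return "retweet"
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    return "original"

def category_shares(tweets: Iterable[Tweet]) -> Dict[str, float]:
    """Share (in percent) of each category among all tweets."""
    counts = Counter(categorize(t) for t in tweets)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}

def retweet_sources(tweets: Iterable[Tweet]) -> Counter:
    """For retweets only: was the retweeted tweet an original or a reply?"""
    return Counter(
        categorize(t["retweeted_status"])
        for t in tweets
        if categorize(t) == "retweet"
    )
```

For example, a sample of one original, one reply, and two retweets of the original gives shares of 25% original, 25% reply, and 50% retweet. `retweet_sources` then reports that both retweets point at an original tweet.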
To show how minuscule this activity is by the standards of high-profile Twitter events, we compared it with the July 8th 2014 soccer World Cup semi-final between Germany and Brazil (which Germany won 7–1). That single match (lasting a little over 90 minutes) generated more than 35 million tweets, with a peak rate of approximately 580,000 tweets/minute. That peak minute alone exceeds our entire sample spanning several weeks. We don’t have a comparable conversation map for that match, but the sheer difference in tweet volume makes clear how small the ‘Shellshock’ activity was next to a high-profile sporting event.
Note: The numbers were calculated from the data we downloaded using the Twitter search API. The search API does not return all tweets, only a sample of them (anywhere from 1% to 40%, without indicating the true volume). We assumed that the distribution in our sample is representative of the distribution across all tweets related to ‘shellshock’.
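The World Cup comparison can be put in numbers as follows. Our Shellshock count is a sample, so the true volume is higher and these ratios overstate the gap by an unknown factor:

```latex
\frac{35 \times 10^{6}\ \text{tweets in one 90-minute match}}{3.3 \times 10^{5}\ \text{tweets in our multi-week sample}} \approx 106,
\qquad
\frac{5.8 \times 10^{5}\ \text{tweets in the peak minute}}{3.3 \times 10^{5}\ \text{tweets in our sample}} \approx 1.76
```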
Timeline and Activity Trend
The one argument Twitter has going for it over traditional media is the speed with which news is received and propagated. There are various reasons behind such a claim, but let’s first see whether it holds in the InfoSec world. Below are two graphs. Figure 2 shows the tweets and retweets per day related to ‘shellshock’ during the period when Shellshock dominated InfoSec news. Figure 3 shows the Google Trends data for searches for the phrase ‘shellshock’ over the same period.
What we saw immediately was a remarkable similarity between the two trends. Activity peaked on Sept 25th and 26th and gradually declined over the next two weeks. We also saw a recurring drop in activity over weekends: Sat 27th and Sun 28th, again on Oct 4th and 5th, and once more on the 11th and 12th. This strongly suggests that the InfoSec community, and people interested in InfoSec more generally, were monitoring and searching for information about ‘Shellshock’. At the same time they were discussing it and exchanging information on Twitter, and this activity happened mostly on workdays.
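The weekday/weekend pattern can be checked directly from the raw data. This sketch assumes v1.1-style `created_at` strings (e.g. `Thu Sep 25 14:02:11 +0000 2014`). It counts tweets per UTC day and compares average weekday and weekend volume. The function names are hypothetical. Bucketing by UTC shifts some late-evening US activity into the next day, and days with no tweets simply do not appear in the counts.

```python
from collections import Counter
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable

# Format of the `created_at` field in v1.1 tweet JSON.
CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"

def tweets_per_day(created_at: Iterable[str]) -> Counter:
    """Count tweets per calendar day (UTC, since timestamps are +0000)."""
    return Counter(datetime.strptime(s, CREATED_AT_FMT).date() for s in created_at)

def weekday_weekend_means(per_day: Dict[date, int]) -> Dict[str, float]:
    """Average daily tweet volume on weekdays (Mon-Fri) vs weekends (Sat-Sun)."""
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    return {
        "weekday": float(mean(weekday)) if weekday else 0.0,
        "weekend": float(mean(weekend)) if weekend else 0.0,
    }
```

Given three tweets on Thursday Sept 25th, two on Friday the 26th and one on Saturday the 27th, `weekday_weekend_means` returns 2.5 for weekdays and 1.0 for weekends.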
Looking at the account creation dates of the users who took part in this conversation (Figure 4), we don’t see a large spike in new accounts around the time of the event. This suggests that the conversation was driven largely by seasoned Twitter users.
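The "no spike in new accounts" observation could also be turned into a single number: the share of unique participating accounts created shortly before the event. The sketch below assumes v1.1 user objects with `id` and `created_at` fields. It takes 2014-09-24 (Shellshock's public disclosure) as the event date and uses an arbitrary 30-day window. `share_of_new_accounts` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta
from typing import Any, Dict, Iterable

CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"  # v1.1 `created_at` format

def share_of_new_accounts(
    users: Iterable[Dict[str, Any]], event_day: date, window_days: int = 30
) -> float:
    """Fraction of unique users whose account was created no earlier than
    `window_days` before `event_day` (accounts created afterwards count too)."""
    created: Dict[Any, date] = {}
    for u in users:  # keyed by user id, so repeat appearances count once
        created[u["id"]] = datetime.strptime(u["created_at"], CREATED_AT_FMT).date()
    if not created:
        return 0.0
    cutoff = event_day - timedelta(days=window_days)
    recent = sum(1 for d in created.values() if d >= cutoff)
    return recent / len(created)
```

A low value supports the reading that the participants were mostly established accounts.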
To gauge the diversity of the conversation, we looked at the top 10 languages used in tweets (Figure 5) and the top 10 user time zones (Figure 6).
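A minimal sketch of the top-10 counts using `collections.Counter`. It assumes v1.1 tweet dicts, where `lang` is the machine-detected language code of the tweet and `user.time_zone` is the profile time zone. The article doesn't say whether time zones were counted per tweet or per user. This sketch counts each unique user once, which is an assumption. Function names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Tweet = Dict[str, Any]

def top_n(values: Iterable[Any], n: int = 10) -> List[Tuple[Any, int]]:
    """The n most common non-empty values, most frequent first."""
    return Counter(v for v in values if v).most_common(n)

def top_languages(tweets: Iterable[Tweet], n: int = 10) -> List[Tuple[str, int]]:
    """Count the `lang` code of every tweet."""
    return top_n((t.get("lang") for t in tweets), n)

def top_time_zones(tweets: Iterable[Tweet], n: int = 10) -> List[Tuple[str, int]]:
    """Count the profile `time_zone` once per unique user, not once per tweet."""
    zone_by_user: Dict[Any, Any] = {}
    for t in tweets:
        user = t.get("user") or {}
        zone_by_user[user.get("id")] = user.get("time_zone")
    return top_n(zone_by_user.values(), n)
```

Users with no time zone set are skipped rather than counted as a separate bucket.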
Unsurprisingly, English dominated, with Japanese and Spanish a distant second and third. The United States of America (USA) and Europe dominated the top user time zones. This was expected, given the high concentration of IT industry in these regions and, as a consequence, greater awareness of InfoSec there. What was surprising was how little Asia, APAC, South America, and Africa contributed to the conversation. Does this point to relatively lower InfoSec awareness in these regions? Or does it simply reflect Twitter being less popular there? Given that Twitter has a huge worldwide user base, we suspect the former more than the latter.
Twitter does let users geo-tag individual tweets with a location, but fewer than 1,000 of the roughly 330,000 tweets we sampled carried this information. We therefore did not pursue a per-tweet location analysis.
Individual tweets can contain hashtags, links to external URLs, and mentions of other users. Analyzing this information gives us valuable insight into the internal and external resources used in these tweets.
Figure 7 shows the top 10 hashtags found across all tweets. The ‘shellshock’ hashtag took a very comfortable first place, and the previous high-profile vulnerability, ‘heartbleed’, also made the top 10. The top 10 hashtags in unique tweets alone (sans retweets) show a similar distribution, so we don’t plot them separately here. Figure 8 shows the top 10 URLs across all tweets. It was a bit of a surprise to see a CNET URL in the top spot. A single tweet, retweeted some 11K times, pushed that CNET URL to the top; it is the same tweet quoted at the start of this article. When we removed retweets, the top URL spot went to a very detailed explanation of the vulnerability by Troy Hunt.
Figure 9 shows the top 10 users mentioned across all tweets. The account ‘whsaito’ took the top spot because of a single tweet that was retweeted almost 11K times. This is the same tweet that contains the CNET URL and appears at the top of this article.
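A sketch of how the hashtag, URL, and mention rankings could be computed, with and without retweets. It assumes v1.1 tweet dicts whose `entities` object holds `hashtags` (with `text`), `urls` (with `expanded_url`), and `user_mentions` (with `screen_name`). Hashtags and screen names are lower-cased here so that `#Shellshock` and `#shellshock` count together. That normalization is an assumption, not something the article states. `top_entities` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Tweet = Dict[str, Any]

def top_entities(
    tweets: Iterable[Tweet], n: int = 10, include_retweets: bool = True
) -> Dict[str, List[Tuple[str, int]]]:
    """Top-n hashtags, expanded URLs and mentioned users across the tweets."""
    hashtags: Counter = Counter()
    urls: Counter = Counter()
    mentions: Counter = Counter()
    for t in tweets:
        # include_retweets=False gives the "sans retweets" view.
        if not include_retweets and t.get("retweeted_status") is not None:
            continue
        ent = t.get("entities") or {}
        hashtags.update(h["text"].lower() for h in ent.get("hashtags", []))
        # expanded_url is the original link behind Twitter's t.co wrapper.
        urls.update(u["expanded_url"] for u in ent.get("urls", []) if u.get("expanded_url"))
        mentions.update(m["screen_name"].lower() for m in ent.get("user_mentions", []))
    return {
        "hashtags": hashtags.most_common(n),
        "urls": urls.most_common(n),
        "mentions": mentions.most_common(n),
    }
```

Every retweet repeats the original tweet's entities, so a single heavily retweeted tweet can dominate the counts. That is what happened with the CNET URL above. Comparing `include_retweets=True` against `False` makes that effect visible.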
Figure 10 shows the top 10 most active user accounts by number of tweets posted. Because these accounts tweeted about the vulnerability multiple times, they can be thought of as the most active in raising awareness of it.
To round out this discussion, Figure 11 shows the top 10 user accounts whose tweets were most retweeted, replied to, and favorited, ranked by number of tweets.
We already looked at a high-level interaction map in Figure 1; now let’s dig into it a bit deeper. For starters, how many followers do InfoSec Twitter users tend to have? Figure 12 below shows a kernel density plot of the follower counts of each unique user in the data sample. What stands out is that the InfoSec crowd does not have a large following among Twitter users: most accounts have fewer than 1,000 followers. There are, however, a few high-profile accounts with followers in the millions (the InfoSec rock stars).
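The article's plots were made in R with ggplot2. For readers who want to reproduce the idea in Python, here is a dependency-free sketch of a Gaussian kernel density estimate over unique users' follower counts. It assumes v1.1 user objects with `id` and `followers_count`. It also uses the normal-reference rule of thumb for the bandwidth, and works on a log scale because follower counts span several orders of magnitude. The bandwidth rule and the log transform are assumptions, not details taken from Figure 12. Function names are hypothetical.

```python
import math
from statistics import stdev
from typing import Any, Dict, Iterable, List

def unique_follower_counts(tweets: Iterable[Dict[str, Any]]) -> List[int]:
    """followers_count for each unique user; the last value seen wins."""
    by_user: Dict[Any, int] = {}
    for t in tweets:
        by_user[t["user"]["id"]] = t["user"]["followers_count"]
    return list(by_user.values())

def gaussian_kde(samples: List[float], grid: List[float]) -> List[float]:
    """Gaussian kernel density estimate of `samples`, evaluated at each grid point.

    Bandwidth uses the normal-reference rule of thumb h = 1.06 * sd * n**(-1/5).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    h = 1.06 * stdev(samples) * n ** (-1 / 5)
    if h == 0:
        raise ValueError("all samples are identical; bandwidth would be zero")
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
            for x in grid]

# Typical use: density of log10(1 + followers); the +1 keeps zero-follower
# accounts defined. A grid of 0..7 in steps of 0.1 covers 1 to 10 million followers.
# log_f = [math.log10(1 + f) for f in unique_follower_counts(tweets)]
# density = gaussian_kde(log_f, [i / 10 for i in range(71)])
```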
Figure 13 above shows how many times unique tweets tend to be retweeted and favorited, which gives a rough measure of how popular InfoSec tweets are. Most tweets are not retweeted or favorited more than 100 times, a very small number compared to some of the high-profile trending activity on Twitter. This is a pity, as it again points to the limited reach of InfoSec-related tweets.
For an interesting comparison, we looked at whether a user’s number of followers had any impact on how often their tweets were retweeted or favorited. Conventional wisdom suggests a positive correlation: the more followers, the more retweets or favorites. Our data, however, suggest the opposite. Figure 14 seems to indicate that the fewer followers you have, the more often your tweets get retweeted or favorited. One possible explanation for this counterintuitive result is that we did not take a PageRank-like effect into account, i.e. a tweet being retweeted by someone with many more followers than the original account. We leave this for future work.
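The article doesn't say which statistic, if any, underlies Figure 14. One way to put a number on the followers-versus-retweets relationship is Spearman's rank correlation. It is a reasonable choice here because both quantities are heavily skewed and it only assumes a monotonic relationship. The sketch below computes it without external libraries, giving tied values the average of their ranks. Function names are hypothetical.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sx * sy)
```

With v1.1 data one might pair `t["user"]["followers_count"]` with `t["retweet_count"]` over original tweets only. Restricting to originals avoids counting one tweet's retweet total once per retweet. A negative rho would be consistent with what Figure 14 suggests.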
So what did we learn? Twitter has the potential to reach large numbers of users and organizations, even outside the traditional InfoSec community, and to raise awareness about high-profile security incidents and vulnerability disclosures. However, as things stand, the InfoSec world is very close-knit and draws little attention outside its own sphere. InfoSec tweets about high-profile events such as the ‘Shellshock’ vulnerability tend to reach a restricted, niche user base. A high-profile sporting event such as a soccer World Cup match or the Super Bowl reaches far more people. This was evident from the fact that most tweets related to Shellshock were seen, retweeted, replied to, or favorited by only a very small fraction of the Twitter user base.
If InfoSec organizations and professionals want to use Twitter to their advantage, they have their work cut out for them. They need to draw more people and organizations from outside the core InfoSec community into the conversation: the more conversation, the more awareness. InfoSec will never be as popular as soccer or the Super Bowl, but attracting enough attention from a non-InfoSec crowd would be a step in the right direction.
- For those interested, the data was downloaded using a Python script and Twitter’s search API.
- It was analyzed and plotted using R and ggplot2.
- An interesting continuation of this analysis would be to put the data into a graph structure to explore the conversations in more detail and see whether we can discover any interesting clusters in them.
- Another possible research path is performing text analytics on the data to find clusters based on the words occurring in the tweets.
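The graph idea connects to the PageRank effect mentioned earlier. Here is a minimal sketch of both, assuming v1.1 tweet dicts with `user.screen_name`, `retweeted_status`, and `in_reply_to_screen_name`. It builds a directed graph with an edge from each user to the users they retweeted or replied to. It then ranks accounts with a plain power-iteration PageRank (damping 0.85, rank from nodes without outgoing edges spread evenly). This is a generic PageRank, not a method from the article, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, Set

Graph = Dict[str, Set[str]]

def build_interaction_graph(tweets: Iterable[Dict[str, Any]]) -> Graph:
    """Directed graph: user -> users they retweeted or replied to."""
    graph: Graph = {}
    for t in tweets:
        src = t["user"]["screen_name"]
        targets = graph.setdefault(src, set())
        rt = t.get("retweeted_status")
        dst = rt["user"]["screen_name"] if rt is not None else t.get("in_reply_to_screen_name")
        if dst and dst != src:  # skip self-replies (threads)
            targets.add(dst)
            graph.setdefault(dst, set())  # every target must also be a node
    return graph

def pagerank(graph: Graph, damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank; returned scores sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Edges point from the retweeter to the original author, so rank flows toward accounts whose content others pass along. Comparing these scores with raw follower counts is one way to probe the counterintuitive pattern in Figure 14. A library such as networkx offers the same computation for larger graphs.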