Twitter’s New Restrictions: Implications for Social Analytics Platforms

Twitter’s mercurial boss, Elon Musk, has announced a host of restrictions on viewing content on the platform. The restrictions place rather stringent limits on both paid and free accounts. The official explanation from the king of controversy is that Twitter wants to limit the scraping of data by bots and AI tools. While many people have focused on the effect of these changes on Twitter’s user base, there are significant implications for social analytics platforms that source their data from the world’s premier micro-blogging platform. We sat down with ZENPULSAR’s Co-Founder and Head of Data Science, Pavel Dudko, to discuss what these changes mean for the field of social media analytics. Below are some of the key questions discussed during this interview.

Q: First of all, Pavel, what are these rather unexpected limits on Twitter content viewing all about?

PD: Musk says this is to prevent excessive data scraping. However, a lot of people take that official explanation with a grain of salt. Some allege that Twitter is trying to fight the onslaught of bots; others believe he’s trying to pull off another publicity stunt. My belief is that this is mostly linked to some serious technical problems that Twitter is currently struggling with. I think they simply don’t want to admit it, but there are signs that their software infrastructure is strained, and these restrictions are designed to ease the load. They are likely working hard in the background to address the issue, and I’d speculate that when (and if) they are able to fix their technical problems, these restrictions will be either quietly scrapped or scaled back.

Having said that, it’s possible that these restrictions are driven by more than one factor. Their technical problems could be the main reason, but you know Musk: he might be trying to kill multiple birds with one stone. As we speak, Meta is preparing to launch “its own Twitter”, Meta Threads. This is planned for this week, possibly tomorrow (ZP: the interview was taking place on 4 July). By announcing these restrictions, Musk has certainly stolen the spotlight in the social media world. If he scraps the restrictions very soon, this may turn out to have been an attempt to dampen the effect of the Meta Threads launch.

Q: In the event that these restrictions become a permanent or long-term feature, what effect will they have on social media listening and social media analytics platforms that use Twitter to source their data?

PD: Tools that rely only or mostly on Twitter will undoubtedly be affected in a major way. For us, it is a relatively minor concern. To train our NLP models that extract finance-related sentiment from social media, we use 18 different sources, of which Twitter is only one. Besides Twitter, our primary sources include Reddit, LinkedIn, SeekingAlpha, Weibo, Facebook, YouTube, and Medium. Additionally, we source data from no fewer than 10 smaller regional finance-related forums and news sites.

I’d say that our coverage of data sources is unmatched in the social analytics industry. This gives us a great hedge against what Musk has just done with those Twitter limits. The vast majority of other social analytics tools, particularly those with a finance slant, rely exclusively or very heavily on Twitter. If these restrictions aren’t lifted soon, these platforms will be hammered.

Q: What about the quality of NLP models that use Twitter data in light of these restrictions? Wouldn’t these limits have a negative impact in this area?

PD: Again, this is another area where we at ZENPULSAR stand to gain an edge. As you know, we have always focused heavily on the bot detection process. Our Twitter bot detection rate is around 98%. The majority of our competitors offer tools that either cannot filter bots out or have relatively weak bot detection capabilities.

This might not seem obvious, but it has some major implications for the quality of the NLP algorithms trained on Twitter data. Tools with weak or no bot detection will now see a massive difference in the amount and nature of the Twitter data they use to train their NLP models before and after the restrictions. Before the restrictions, the vast majority of the data they sourced from Twitter was bot-generated. After the restrictions, the amount and proportion of bot-generated data available to train their models will plummet. Since NLP algorithms rely heavily on training with historical data, I’d hazard a guess that this massive before-versus-after difference will significantly weaken their models. Many of these models may even be rendered useless.

With our models, since we have always had the ability to identify and filter out bots, the difference between the data before and after the restrictions will be minimal. As a result, our NLP models will hardly be affected.
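As a rough illustration of the before-versus-after argument above, the sketch below compares how much the bot share of a training set shifts with and without bot filtering. It is not ZENPULSAR’s actual pipeline: the `bot_share` helper and all post counts are hypothetical, chosen only to show the arithmetic, and the model assumes the filter never removes genuine human posts. The 98% figure is the detection rate quoted in the interview.

```python
def bot_share(human_posts: int, bot_posts: int, detection_rate: float = 0.0) -> float:
    """Fraction of the training set that is bot-generated after filtering.

    detection_rate is the fraction of bot posts removed by the filter.
    Simplifying assumption: no human posts are removed by mistake.
    """
    remaining_bots = bot_posts * (1.0 - detection_rate)
    return remaining_bots / (human_posts + remaining_bots)


# Hypothetical post counts, for illustration only (not real Twitter data).
before = {"human_posts": 400, "bot_posts": 600}  # bot-heavy feed before the limits
after = {"human_posts": 400, "bot_posts": 100}   # far fewer bot posts after the limits

for label, rate in [("no bot filter", 0.0), ("98% bot filter", 0.98)]:
    share_before = bot_share(**before, detection_rate=rate)
    share_after = bot_share(**after, detection_rate=rate)
    shift = abs(share_before - share_after) * 100  # in percentage points
    print(f"{label}: bot share {share_before:.1%} -> {share_after:.1%} "
          f"(shift of {shift:.1f} points)")
```

With these made-up numbers, the unfiltered training set moves from a 60% bot share to 20%, while the filtered set stays within a few percentage points of where it started. That is the sense in which a strong bot filter keeps the training data comparable across the change.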

This is a critical point that gives us a great competitive edge when it comes to the quality of the NLP models used in social analytics tools like PUMP. When these restrictions were first announced, I immediately knew that rather than pulling my hair out, I should pat our great team on the back for devoting years of focused work to ZENPULSAR’s bot detection mechanism.