Transforming Social Media Data into Value

22 February, 2023
Social Media (SM) sentiment measurement tools, such as our PUMP platform, are increasingly relied upon by traders, fund managers, and analysts in a variety of financial markets. This makes data quality in these tools critically important: an SM listening tool is only usable if it delivers high-quality data.

It is no secret that SM, and the signals generated from it, contain massive amounts of noise and distortion. For an analytics tool to be of value, this noise needs to be purged from the data. In this article, we describe the main noise-reduction approaches PUMP uses to deliver the best-quality SM data. We will not go into the technicalities of the data purification techniques we use, which come from the domains of AI, NLP, and advanced data mining. Instead, this article provides a non-technical, user-friendly look at PUMP’s noise reduction approaches.
Identifying Bot Traffic
SM bots have become ubiquitous on many platforms, particularly on Twitter, which is among the leading SM sources for traders and financial analysts. Though Twitter is adamant that bots represent less than 5% of all accounts, bot-generated posts make up a far larger share of the platform’s content: by some estimates, bots may account for nearly 30% of all US-generated Twitter traffic.

Distinguishing bot-generated traffic from human-generated traffic is critical for noise reduction. PUMP uses a range of NLP-based techniques to separate bot-driven SM content from human-driven content. Our system analyses both the content produced by the originating account and the network that links the account to other SM accounts. In many cases, a bot’s network linkage structure is distinctly different from that of a human account.
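
To make the idea of network linkage structure a little more concrete, here is a minimal Python sketch. It is not PUMP’s actual method, which is not described in this article. It assumes a toy follow graph and a single hypothetical feature, reciprocity (the share of followed accounts that follow back), as one example of a linkage signal a classifier might consider. The account names and the `reciprocity` function are illustrative only.

```python
from typing import Dict, Set


def reciprocity(account: str, follows: Dict[str, Set[str]]) -> float:
    """Fraction of the accounts that `account` follows which follow it back."""
    following = follows.get(account, set())
    if not following:
        return 0.0
    mutual = sum(1 for other in following if account in follows.get(other, set()))
    return mutual / len(following)


# Toy follow graph (hypothetical data): each key follows the accounts in its set.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "promo_bot": {"alice", "bob", "carol"},
}

for name in follows:
    print(name, round(reciprocity(name, follows), 2))
```

In this toy graph the three human-style accounts follow each other back, while the hypothetical `promo_bot` follows everyone and is followed by no one. A single feature like this would never be enough on its own; in practice it would be one input among many, alongside the content analysis mentioned above.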
Classifying Signals Correctly into 3 Main Categories
When a particular asset, be it a crypto coin, a stock, or a commodity, is mentioned on SM and picked up by PUMP, the critical step is to classify the mention correctly into one of three categories: bullish, bearish, or neutral. This is at the core of SM analytics, as misclassified signals are of no use to traders.

PUMP uses several techniques from the family of deep learning algorithms to correctly classify content. The advanced sentiment classification models we use are continually trained on our internal datasets for precision improvements.
Classifying Accounts into Ordinary, Influencer, and Market Analyst Types
Another important data calibration step in PUMP is correctly classifying SM accounts into ordinary, influencer, and market analyst types. Influencers naturally drive a lot of sentiment online. Therefore, it is important to identify these sources and account for their effect on the overall sentiment.

Identification of market analyst accounts is also critical. These users often have more informed and robust sentiment. A tweet predicting a strong correction for Bitcoin sent out by a wannabe technical analyst (an ordinary user) is clearly not the same in terms of sentiment quality as a similar prediction from an established market analyst.

Again, under the hood, PUMP uses a number of NLP-based content analysis models to correctly identify the influencer and analyst sources.
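
As a rough illustration of why account type matters, the Python sketch below combines classified signals into one score while giving analyst and influencer accounts more weight than ordinary ones. Everything in it is an assumption made for illustration: the weights, the numeric values assigned to bullish, neutral, and bearish, and the `weighted_sentiment` function are hypothetical and do not describe how PUMP actually aggregates sentiment.

```python
# Hypothetical weights per account type; real values would need calibration.
ACCOUNT_WEIGHTS = {"ordinary": 1.0, "influencer": 3.0, "analyst": 5.0}

# Hypothetical numeric encoding of the three sentiment categories.
SENTIMENT_VALUES = {"bullish": 1, "neutral": 0, "bearish": -1}


def weighted_sentiment(signals):
    """Weighted average sentiment in [-1, 1] for (account_type, label) pairs."""
    weighted_sum = 0.0
    total_weight = 0.0
    for account_type, label in signals:
        weight = ACCOUNT_WEIGHTS[account_type]
        weighted_sum += weight * SENTIMENT_VALUES[label]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Two ordinary users are bullish, one established analyst is bearish.
signals = [
    ("ordinary", "bullish"),
    ("ordinary", "bullish"),
    ("analyst", "bearish"),
]
print(round(weighted_sentiment(signals), 3))
```

With these made-up weights, the single analyst outweighs the two ordinary accounts and the combined score comes out negative, which mirrors the point above that a prediction from an established analyst carries more weight than one from an ordinary user.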
Classifying Sources into Trusted vs Not Trusted
A quality SM sentiment tool shouldn’t be about sentiment charts and quantitative data alone. Industry news forms an important part of an asset’s fundamental analysis. Thus, we pay particular attention to delivering the most relevant and useful news to PUMP users.

A key part of the noise reduction procedures within PUMP is classifying news into trusted and… well, not so trusted sources. In the massive ocean of daily news content, it is easy to get lost in news from less than trustworthy sources. Let’s also be honest: Google isn’t always great at prioritising the most trustworthy news sources.

Many news websites and blogs that are based on hype and keyword acrobatics rise to the top of Google rankings when you search for a specific term. This issue is particularly prevalent in the cryptocurrency sphere, though, to a lesser degree, it also happens with regard to stocks, commodities, and forex.

PUMP’s AI-based models estimate whether a news item originates from a trustworthy publication. You may then filter the news feed for each asset to show trusted sources only. This gives you the best quality of insight. You also save time by not having to judge for yourself whether a particular news piece is worth your trust.
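
The filtering step described above can be pictured with a short Python sketch. It assumes each news item already carries a trust score produced by some upstream model; the `NewsItem` class, the `trust_score` field, the 0.8 threshold, and the sample sources are all hypothetical and are not taken from PUMP itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NewsItem:
    title: str
    source: str
    trust_score: float  # hypothetical model output in [0, 1]


def trusted_only(items: List[NewsItem], threshold: float = 0.8) -> List[NewsItem]:
    """Keep only items whose source trust score meets the threshold."""
    return [item for item in items if item.trust_score >= threshold]


# Made-up feed for a single asset.
feed = [
    NewsItem("Exchange publishes quarterly report", "established-outlet.example", 0.93),
    NewsItem("This coin will 100x by Friday!!!", "hype-blog.example", 0.12),
]

for item in trusted_only(feed):
    print(item.source, "-", item.title)
```

A hard threshold is the simplest possible design. A real feed might instead rank items by score or let the user adjust the cut-off.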

The noise reduction procedures described above aren’t the only methods we use. However, they are the fundamental procedures on which PUMP is built. Under the hood, a large number of methods from AI, NLP, and classification-oriented data mining are in continual use.

In this article, we tried to steer clear of diving into all that techy stuff, though we promise to dedicate a future article to satisfying the curiosity of the technical folk as well. The procedures above are critical to one of PUMP’s key tasks: reducing the massive noise present in today’s SM data.