ZENPULSAR Social Media Sentiment API Datasets for Crypto & Stocks

21 February, 2023
ZENPULSAR has recently released three API dataset products, available at https://datarade.ai/data-providers/zenpulsar/profile. Datarade is a major data marketplace used by professionals and businesses of all kinds to trade valuable data. Our datasets have been attracting interest from both B2B and B2C customers, and this article covers them in detail. All three products are based on Social Media sentiment statistics collected via PUMP, our Social Media analytics platform.
Our Datasets

Dataset "Social Media Pulse — Crypto"
Our data-centric AI platform, PUMP, monitors multiple Social Media networks in real time, tracking and analysing activity related to crypto assets. It detects emerging viral narratives likely to form trends and impact the assets. PUMP clears out the noise of Social Media with unmatched speed and accuracy. It identifies viral narratives related to the assets you track and spots early signals you can act on before other market players.

Social Media Pulse for Crypto provides detailed time series sentiment data relevant to cryptocurrency assets. The data is extracted from Twitter, Reddit, Seeking Alpha, and Telegram.

The dataset covers 30 major crypto assets, with more assets being added continually as we integrate them into our system.

The image below shows the attribute table for the dataset.
For a detailed look at the product, please visit the Social Media Pulse — Crypto page at Datarade.
Dataset “Social Media Momentum – Crypto”
Social Media Momentum for Crypto tracks mentions of crypto assets on Social Media and evaluates their popularity. Like the Pulse dataset above, this product covers data collected from Twitter, Reddit, Telegram, and Seeking Alpha. It measures how popularity changes among different groups of users (influencers, bots, and retail investors).

It covers the same crypto assets as the Pulse dataset above.

The dataset’s attribute table is shown below.
The Momentum and Pulse datasets for crypto serve different analytical focuses. Pulse is best for an in-depth look at a specific asset of interest to you. You can view detailed time series data on the asset and drill down into all the stats, such as posts, likes, reposts, and audience reach, using various filters (e.g., bots vs. humans, specific Social Media source, account type, time interval, and more).
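To make "drilling down with filters" concrete, here is a minimal sketch of how you might slice Pulse-style records locally once you have them. The `PulseRecord` shape, its field names, and the `drill_down` helper are hypothetical illustrations, not the actual Pulse attribute table or API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical record shape for one time bucket of Pulse-style data; the real
# attribute table on Datarade may use different field names.
@dataclass
class PulseRecord:
    asset: str
    source: str        # e.g. "twitter", "reddit", "telegram", "seekingalpha"
    account_type: str  # e.g. "influencer", "analyst", "retail"
    is_bot: bool
    posts: int
    likes: int
    reposts: int
    reach: int

STATS = ("posts", "likes", "reposts", "reach")

def drill_down(records: Iterable[PulseRecord], asset: str,
               source: Optional[str] = None,
               account_type: Optional[str] = None,
               humans_only: bool = False) -> Dict[str, int]:
    """Sum engagement stats for one asset after applying optional filters."""
    totals = {stat: 0 for stat in STATS}
    for r in records:
        if r.asset != asset:
            continue
        if source is not None and r.source != source:
            continue
        if account_type is not None and r.account_type != account_type:
            continue
        if humans_only and r.is_bot:
            continue
        for stat in STATS:
            totals[stat] += getattr(r, stat)
    return totals
```

Calling `drill_down(records, "BTC", source="twitter", humans_only=True)` would, for example, return Twitter engagement for BTC with bot activity excluded.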

Momentum, on the other hand, is better suited to getting an overview of the crypto market: it ranks crypto assets by each of the detailed stats within the dataset. The Momentum dataset is excellent for a bird's-eye view of what is happening with cryptos in terms of their relative position in the market.

We would describe the Pulse and Momentum datasets as complementary products. At the start of your daily analysis, you can use Momentum to see which assets deserve your attention. Having narrowed down your target asset list, you can then use the Pulse data to study the asset(s) in depth.
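The "Momentum first, Pulse second" workflow boils down to ranking assets by one metric and keeping the top few. A minimal sketch, assuming you have already loaded a Momentum-style snapshot into a dictionary; the `snapshot` contents and the metric names `mentions` and `growth` are made up for illustration.

```python
from typing import Dict, List

def shortlist(snapshot: Dict[str, Dict[str, float]], metric: str,
              top_n: int = 3) -> List[str]:
    """Rank assets by one Momentum-style metric (highest first) and keep the top_n."""
    ranked = sorted(snapshot.items(), key=lambda item: item[1][metric], reverse=True)
    return [asset for asset, _ in ranked[:top_n]]

# Hypothetical snapshot: asset -> stats; field names are illustrative only.
snapshot = {
    "BTC": {"mentions": 900, "growth": 0.05},
    "ETH": {"mentions": 700, "growth": 0.30},
    "SOL": {"mentions": 300, "growth": 0.80},
    "ADA": {"mentions": 100, "growth": -0.10},
}
print(shortlist(snapshot, "growth", top_n=2))  # ['SOL', 'ETH']
```

Note how the ranking depends on the metric: by raw mentions BTC leads, while by growth in mentions SOL does. The shortlisted tickers are the ones you would then study in detail with Pulse.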

For details on the Momentum dataset, please visit the Social Media Momentum — Crypto page at Datarade.
Dataset "Social Media Pulse — Equities"
The Social Media Pulse — Equities dataset essentially mirrors the Crypto Pulse product described above, with the obvious difference being that it tracks stocks rather than cryptocurrencies.

It currently covers 26 major stocks and one stock market index. Just as with the crypto datasets, we are continually adding more assets as we integrate them into our system.

Details of this product are available on the Social Media Pulse — Equities page at Datarade.
Our Data Quality
A Social Media sentiment tracking product is only as good as the quality of its data. ZENPULSAR makes the quality of Social Media signal measurement a top priority. We employ a number of AI-based data mining and analysis models, an area of core strength and a key part of our value proposition. Below, we briefly touch upon some of the methods we use for data selection and purification.

Selection of asset-relevant social media posts. This is done through iterative use of information retrieval methods such as keyword extraction and topic modelling (LDA, BERTopic, etc.).
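As a rough illustration of what "iterative" keyword-based selection can look like, the sketch below starts from seed keywords, finds matching posts, and repeatedly adds terms that frequently co-occur in them. This is a simplified stand-in chosen for clarity; it is not LDA, BERTopic, or ZENPULSAR's actual pipeline, and the stopword list and thresholds are arbitrary assumptions.

```python
import re
from collections import Counter
from typing import List, Set

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "on", "in", "for", "my", "it"}

def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (keeping $ and #), stopwords removed, order preserved."""
    return [t for t in re.findall(r"[a-z0-9$#]+", text.lower()) if t not in STOPWORDS]

def expand_keywords(posts: List[str], seeds: Set[str], rounds: int = 2,
                    per_round: int = 2, min_count: int = 2) -> Set[str]:
    """Grow a keyword set by adding terms that frequently co-occur in matching posts."""
    keywords = set(seeds)
    for _ in range(rounds):
        matching = [p for p in posts if keywords & set(tokenize(p))]
        # dict.fromkeys de-duplicates tokens per post while keeping a stable order
        counts = Counter(
            t for p in matching for t in dict.fromkeys(tokenize(p)) if t not in keywords
        )
        new_terms = [t for t, c in counts.most_common() if c >= min_count][:per_round]
        if not new_terms:
            break
        keywords.update(new_terms)
    return keywords

def select_relevant(posts: List[str], keywords: Set[str]) -> List[str]:
    """Keep posts that contain at least one keyword."""
    return [p for p in posts if keywords & set(tokenize(p))]
```

Starting from the seed `{"bitcoin"}`, the expansion can pick up related terms such as a ticker ("btc") that the seed alone would miss, which is why the process is run in rounds rather than once.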

Finance-related classification. To filter finance-relevant samples from large volumes of posts and news, we employ state-of-the-art NLP models (XLM-RoBERTa).

Bot detection. Some of the key techniques we use to identify whether content originates from bots or humans include:
  • NLP-based content analysis: we employ transformer models, such as Google mT5 and XLM-RoBERTa, trained on bot post datasets.
  • Heuristics-based features (posting speed, statistical characteristics based on NER analysis results, etc.). These features are fed to a Support Vector Machine (SVM) classifier.
  • The format of recent posts from the same user. Many bots generate posts from templates, assembling and transforming pieces of text, so the model can extract features from these recurring formats to improve detection.
  • Analysis of network topology (bots occupy differently structured networks than human accounts), specifically an account's centrality characteristics within the account network (betweenness centrality, Katz centrality, PageRank).
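Two of the heuristic signals listed above, posting rhythm and templated post formats, are easy to sketch with the Python standard library. This is a minimal illustration under our own assumptions: the feature names are hypothetical, `difflib.SequenceMatcher` is used as a simple stand-in for template detection, and the SVM step that would consume such features is omitted.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List

def posting_features(timestamps: List[float]) -> Dict[str, float]:
    """Mean and spread of the gaps (in seconds) between consecutive posts."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return {"mean_gap": 0.0, "gap_std": 0.0}
    # Very regular gaps (low gap_std) can hint at scheduled, automated posting
    return {"mean_gap": float(mean(gaps)), "gap_std": float(pstdev(gaps))}

def template_similarity(posts: List[str]) -> float:
    """Mean pairwise text similarity of recent posts; values near 1.0 suggest templates."""
    pairs = list(combinations(posts, 2))
    if not pairs:
        return 0.0
    return mean(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
```

In a setup like the one described, numbers such as `gap_std` and `template_similarity` would become columns of the feature vector passed to the classifier, alongside the NLP-based and network-based signals.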

Identification of influencers, market analysts, and abnormal accounts. To identify specific account types, we use the following techniques:
  • NLP-based content analysis: transformer models like Google mT5 or XLM-RoBERTa trained on influencer post datasets.
  • Analysis of the account-following network, specifically an account's centrality within the network (betweenness centrality, Katz centrality, PageRank, eigenvector centrality).
  • Thresholds on the number of followers and on Reddit karma.
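PageRank, one of the centrality measures named above, can be computed on a follow graph by power iteration. The sketch below combines it with a follower-count threshold to show why both criteria are useful together. The damping factor, cutoffs, and the `flag_influencers` helper are illustrative assumptions, not ZENPULSAR's settings.

```python
from typing import Dict, List, Tuple

def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank; an edge (a, b) means account a follows account b."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    following: Dict[str, List[str]] = {n: [] for n in nodes}
    for a, b in edges:
        following[a].append(b)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by accounts that follow nobody is spread evenly over all accounts
        dangling = sum(rank[v] for v in nodes if not following[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in following[v]:
                new[w] += damping * rank[v] / len(following[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

def flag_influencers(rank: Dict[str, float], followers: Dict[str, int],
                     min_rank: float, min_followers: int) -> List[str]:
    """Accounts that pass both a centrality cutoff and a follower-count threshold."""
    return sorted(a for a, r in rank.items()
                  if r >= min_rank and followers.get(a, 0) >= min_followers)
```

In a small graph where four accounts follow `star` and `star` follows back only `u1`, both `star` and `u1` end up with high PageRank, but only `star` clears a 10,000-follower threshold. Centrality alone would also flag the account that merely sits next to an influencer.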

Sentiment detection. We utilise transformer-based models (FinBERT, CryptoBERT, and CryptoRoBERTa) fine-tuned on our internal datasets. The models were trained on cryptocurrency and stock data collected from Social Media, and the classifier outputs three classes: bearish, neutral, and bullish.
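Per-post class labels become useful time series only after aggregation. The sketch below buckets labelled posts by hour and computes a simple net sentiment score, (bullish − bearish) / total. The score definition and the `hourly_net_sentiment` helper are our own illustrative assumptions, not the formula used in ZENPULSAR's datasets; the classifier itself is not shown.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

LABELS = ("bearish", "neutral", "bullish")

def hourly_net_sentiment(preds: Iterable[Tuple[datetime, str]]) -> Dict[datetime, float]:
    """Map each hour to (bullish - bearish) / total posts, a score in [-1, 1]."""
    counts: Dict[datetime, Dict[str, int]] = defaultdict(lambda: {l: 0 for l in LABELS})
    for ts, label in preds:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label}")
        # Truncate the timestamp to the start of its hour
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[hour][label] += 1
    return {
        hour: (c["bullish"] - c["bearish"]) / sum(c.values())
        for hour, c in sorted(counts.items())
    }
```

Neutral posts count toward the denominator, so an hour dominated by neutral chatter scores close to zero even if a few posts are strongly bullish.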

Use Cases for the Datasets
All three of our datasets may be used for:
  • Identifying assets for Alpha generation in your portfolio
  • Using sentiment signals to predict short-term asset price movements for day trading and other forms of active and frequent trading
  • Identifying suitable assets for long-term investment, typically by using extended historical time intervals in the sentiment data analysis to spot established correlations between sentiments and prices
  • Portfolio management and diversification for hedge funds and other forms of investment funds
  • Quantitative investing
  • Using the Social Media sentiment data within fundamental analysis of the target assets

ZENPULSAR’s Social Media sentiment API datasets cover rich data spanning 5 years. The data is available in hourly, daily, weekly, and monthly time frames. Delivered in JSON format, the datasets can be easily integrated into your own web and internal apps and analytics packages. The datasets and their details are listed on our Datarade profile page. Whether you are an individual trader, fund manager, market analyst, or long-term investor, these datasets are designed to help you maximise returns and generate Alpha by delivering the best-quality Social Media sentiment data!
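Because the data is delivered as JSON, rolling it up to a coarser time frame takes only a few lines of Python. A minimal sketch, assuming a hypothetical payload of hourly rows with `timestamp`, `asset`, and a count field such as `posts`; the real schema is described on the Datarade product pages and may differ.

```python
import json
from collections import defaultdict
from datetime import datetime
from typing import Dict

def daily_totals(payload: str, asset: str, field: str) -> Dict[str, int]:
    """Roll hourly JSON rows up into per-day sums of a count field for one asset."""
    totals: Dict[str, int] = defaultdict(int)
    for row in json.loads(payload):
        if row["asset"] != asset:
            continue
        # Assumes naive ISO-8601 timestamps such as "2023-02-21T09:00:00"
        day = datetime.fromisoformat(row["timestamp"]).date().isoformat()
        totals[day] += row[field]
    return dict(sorted(totals.items()))
```

Summing is only appropriate for count fields such as posts, likes, or reposts. Ratio-style fields, such as sentiment shares, would need to be re-weighted by post volume rather than added.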