Realtime Twitter Sentiment Analysis using Python | NLP

Hackers Realm

Jun 27, 2022 | 5 min read

Updated: Jun 1, 2023

Uncover the pulse of Twitter in real-time with Python! This tutorial delves into NLP techniques for sentiment analysis, allowing you to analyze and understand the sentiment behind tweets as they happen. Explore the power of natural language processing, harness real-time data streams, and gain valuable insights into public opinion. Enhance your skills in text classification and sentiment analysis, and stay ahead of trends with this hands-on project tutorial. #TwitterSentimentAnalysis #Python #NLP #RealtimeAnalysis #TextClassification


In this project tutorial, we are going to use the Tweepy module to receive tweets in realtime for a specific keyword, and the pretrained Flair model to detect the sentiment of those tweets.


Dataset Information

The objective of this task is to detect the sentiment of tweets in realtime for a specific keyword. Tweepy is used to fetch the tweets for that keyword, and Flair provides the pretrained sentiment analysis model used to classify them.

Install Modules

!pip install tweepy --ignore-installed
!pip install flair

Commands to install the Tweepy and Flair modules
--ignore-installed - Option that reinstalls the library even if it's already installed

Configuration

The link below opens the Twitter developer portal, where you can create a developer account that lets you receive over 500 thousand tweets. Follow the portal documentation to generate the keys, then copy them into this project.

Get Keys from here: https://developer.twitter.com/en/portal/dashboard

bearer = "<<KEY>>"
consumer_key = "<<KEY>>"
consumer_secret = "<<KEY>>"
access_token = "<<KEY>>"
access_token_secret = "<<KEY>>"

You must use your keys generated from your developer account
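
Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch (not part of the original project), the keys could instead be read from environment variables. The variable names below, such as TWITTER_BEARER_TOKEN, and the helper load_twitter_keys are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical environment variable names; use whatever names you exported.
KEY_VARS = {
    "bearer": "TWITTER_BEARER_TOKEN",
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_keys(env=None):
    """Return a dict with the five keys, or raise if any variable is missing."""
    env = os.environ if env is None else env
    missing = [var for var in KEY_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in KEY_VARS.items()}
```

With the variables exported, keys = load_twitter_keys() returns a dictionary, and keys["bearer"], keys["consumer_key"] and so on can be used in place of the hardcoded strings.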

Import Modules

import tweepy
import re
import time
from flair.models import TextClassifier
from flair.data import Sentence

tweepy - Module to extract the tweets in realtime
re - Regular expression module used to find particular patterns and process them
time - Module to implement time functions
TextClassifier - Text classification module from Flair
Sentence - Flair module to process sentences

## initialize tweepy
api = tweepy.Client(bearer, consumer_key, consumer_secret, access_token, access_token_secret)
api.get_me()

Response(data=<User id=1493221119410970626 name=Aswin S username=aswintechguy>, includes={}, errors=[], meta={})

Here we establish the connection to the account using the keys
To see if the connection was successful, call api.get_me() and check the output data
If an error is returned, regenerate the keys, save them in the configuration and try again

Now we will extract tweets in realtime, just for exploration purposes

## get tweets in realtime
response = api.search_recent_tweets('#crypto')

tweets = response.data
for tweet in tweets:
    print(tweet.text)
    print('-----------------------------------------------')

@altcryptocom @binance #SHUMO Bullish as ever 🚀🚀🚀.. 💥💥💥 make sure you #HODL a bag $SHUMO💰💰 #SHUMO to the world 🔥🔥🔥 #SHUMOARMY https://t.co/Qyfbu3BLTN #Shiba #crypto #expo #eth #100x @ShumoOfficial https://t.co/uxi3Z4WthZ https://t.co/boGtlucWgF https://t.co/T6vWRQnpLQ
-----------------------------------------------
#IMXUSDT Bull Alert! 15X Volume Price: 1.436 5-min %: 2.4% Volume: $2,095,578 #crypto #whale #btc #eth #IMX $IMX https://t.co/Bk61uU9eed
-----------------------------------------------
RT @Btcexpertindia: Russia Ukraine war: Investors' wealth tumbles over 5.91 lakh cr in morning trade. Risk is everywhere in life, in Stock…
-----------------------------------------------
RT @Carefultea1: Make sure to tune in with @christse @cardstack for this webinar. Not to miss for anyone interessed in #Web3. Save the date…
-----------------------------------------------
RT @bezoge: We're up to something. 👀

We want to reward you but it will take some work! If we reach 500 people in the #AMA, we'll give a…
-----------------------------------------------
RT @airdropinspect: New airdrop: ArcadeLand (USDT) Total Reward: 2,000 USDT Rate: ⭐️⭐️⭐️⭐️ Winners: 750 Random & Top 100 Distribution: with…
-----------------------------------------------
RT @NwcPublic: ❗️NWC APP RELEASE LOADING❗️ The most essential trading tools at the end of your fingertips. 2 days to go ⏳ Everything #cr…
-----------------------------------------------
RT @CryptoTownEU: 🚀 Airdrop: Ape Rocket 💰 Value: 20,000,000 $APEROCKET 👥 Referral: 5,000,000 $APEROCKET 📊 Exchange: Latoken 📼 Audit: Audit…
-----------------------------------------------
RT @MindMusic_BSC: 💥 TUNE IN! 💥 Join the live AMA over on telegram tonight at 20:00 UTC for lots of exciting news and updates, including t…
-----------------------------------------------
RT @JohnHunterGems: Acabo de subir en mi canal de Telegram el libro “DAY TRADING En un semana “ Muy buen libro para empezar en este mun…
-----------------------------------------------

Extracted tweets with the keyword #crypto
Most of the tweets are truncated (note the trailing …), so only the beginning of each tweet is shown
Special characters, punctuation, user handles and other noisy tokens must be removed for better results
Pre-trained models like Flair already apply their own preprocessing steps, but for this project tutorial we will preprocess the data anyway

Now we will define a function to preprocess the text

def preprocess_text(text):
    # convert to lower case
    text = text.lower()
    # remove user handles (raw strings avoid invalid-escape warnings)
    text = re.sub(r"@\w*", "", text)
    # remove http links
    text = re.sub(r"http\S+", "", text)
    # replace digits and special characters with spaces (keep letters and #)
    text = re.sub(r"[^a-zA-Z#]", " ", text)
    # remove the standalone retweet marker "rt", not "rt" inside other words
    text = re.sub(r"\brt\b", "", text)
    # collapse repeated whitespace into a single space
    text = re.sub(r"\s+", " ", text)

    return text

Simplifying the text helps to process the data quicker and get better results.
User handles, http links, digits, special characters and the rt marker are all irrelevant for this project, so removing them should improve the prediction results.

Now let us see the difference

tweet.text

'RT @JohnHunterGems: Acabo de subir en mi canal de Telegram el libro “DAY TRADING \nEn un semana “ \n\nMuy buen libro para empezar en este mun…'

preprocess_text(tweet.text)

' acabo de subir en mi canal de telegram el libro day trading en un semana muy buen libro para empezar en este mun '

Simplified text only leaving meaningful words
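
One detail worth checking in the retweet-marker step is whether the pattern removes only the standalone token rt or also the letters rt inside ordinary words. The following sketch compares the two behaviours on a made-up sentence; the helper names strip_rt_anywhere and strip_rt_token are hypothetical.

```python
import re


def strip_rt_anywhere(text):
    # Removes every occurrence of "rt", including inside other words.
    return re.sub(r"rt", "", text)


def strip_rt_token(text):
    # Removes "rt" only when it stands alone as a word (\b is a word boundary).
    return re.sub(r"\brt\b", "", text)


sample = "rt smart alert about crypto"
print(strip_rt_anywhere(sample))  # ' sma ale about crypto'
print(strip_rt_token(sample))     # ' smart alert about crypto'
```

The word-boundary version leaves words such as "smart" and "alert" intact, which keeps more of the original wording available to the sentiment model.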

## create sentiment analysis function
classifier = TextClassifier.load('en-sentiment')
def get_sentiment(tweet):
    sentence = Sentence(tweet)
    classifier.predict(sentence)
    return str(sentence.labels).split("\'")[3]

2022-03-07 14:08:41,483 loading file /root/.flair/models/sentiment-en-mix-distillbert_4.pt

TextClassifier.load('en-sentiment') - Loads Flair's pretrained English sentiment model
str(sentence.labels).split("\'")[3] - This will return the exact label without the confidence score
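
Parsing the printed form of sentence.labels depends on how a particular Flair version formats labels as text. As a hedged alternative sketch, the label can be read through attributes instead. This assumes each label object exposes value and score attributes, as Flair's Label objects do; the helper first_label is hypothetical, and SimpleNamespace stands in for a real Flair label so the snippet runs without downloading the model.

```python
from types import SimpleNamespace


def first_label(labels):
    """Return (value, score) of the first label, or (None, None) if empty."""
    if not labels:
        return None, None
    label = labels[0]
    return label.value, label.score


# Stand-in for sentence.labels after classifier.predict(sentence);
# the score here is invented purely for the demonstration.
fake_labels = [SimpleNamespace(value="POSITIVE", score=0.98)]
value, score = first_label(fake_labels)
print(value, score)
```

With a real sentence, the call would be first_label(sentence.labels), which gives both the label and its confidence score without any string splitting.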

Now let us try the get_sentiment function

get_sentiment(tweet.text)

'POSITIVE'

POSITIVE means that the text carries a positive sentiment

Realtime Twitter Sentiments

Now we will process the tweets in realtime

## preprocess the tweets
def preprocess_text(text):
    # convert to lower case
    text = text.lower()
    # remove user handles (raw strings avoid invalid-escape warnings)
    text = re.sub(r"@\w*", "", text)
    # remove http links
    text = re.sub(r"http\S+", "", text)
    # replace digits and special characters with spaces (keep letters and #)
    text = re.sub(r"[^a-zA-Z#]", " ", text)
    # remove the standalone retweet marker "rt", not "rt" inside other words
    text = re.sub(r"\brt\b", "", text)
    # collapse repeated whitespace into a single space
    text = re.sub(r"\s+", " ", text)

    return text
    
## create sentiment analysis function
classifier = TextClassifier.load('en-sentiment')
def get_sentiment(tweet):
    sentence = Sentence(tweet)
    classifier.predict(sentence)
    return str(sentence.labels).split("\'")[3]

## get realtime sentiments
while True:
    # get the latest tweets (10 tweets per request)
    tweets = api.search_recent_tweets('#crypto').data

    for tweet in tweets:
        original_tweet = tweet.text
        clean_tweet = preprocess_text(original_tweet)
        sentiment = get_sentiment(clean_tweet)
        print('------------------------Tweet-------------------------------')
        print(original_tweet)
        print('------------------------------------------------------------')
        print('Sentiment:', sentiment)
        # pause so each result stays readable before the next one
        time.sleep(1)
        print('\n\n')

------------------------Tweet-------------------------------
RT @Jennifersperdu3: Plain & Simple Truth want to be financial free & do whatever it is u want to do in life then BUY #Shibadoge @realshib…
------------------------------------------------------------
Sentiment: NEGATIVE

------------------------Tweet-------------------------------
RT @Cryptoskyrun: Guys❗️ The day we’ve been waiting for is approaching🚨 @NunuSpiritsNFT #TGE COMING SOON🤩 📌#NunuSpirits TGE will happen on…
------------------------------------------------------------
Sentiment: NEGATIVE

------------------------Tweet-------------------------------
RT @brettmurphynet: #affiliate #affiliatemarketing #deal #blogger #business #cryptocurrency #deals #discount #gifts #marketing #shopping #d…
------------------------------------------------------------
Sentiment: POSITIVE

------------------------Tweet-------------------------------
RT @airdropinspect: New airdrop: Lifetise (USDT) Total Reward: 2,000 USDT Rate:⭐️⭐️⭐️⭐️ Winners: 700 Random & Top 50 Distribution: within a…
------------------------------------------------------------
Sentiment: POSITIVE

time.sleep(1) - Delays 1 second before displaying next result, to view the results better
We can see the predicted sentiment for each tweet; the predictions are made on the preprocessed text, which is intended to give better results
You may remove the split("\'")[3] to see the confidence score of the sentiment
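
Because the loop repeats the same search on every pass, consecutive requests may return tweets that were already printed. A minimal sketch of one way to skip repeats is to remember the ids that have already been seen. It assumes each tweet object has id and text attributes, and the names new_tweets_only and fake_batch are hypothetical.

```python
from types import SimpleNamespace


def new_tweets_only(tweets, seen_ids):
    """Return tweets whose id is not in seen_ids, and record their ids."""
    fresh = []
    for tweet in tweets or []:  # tolerate None when a search returns no data
        if tweet.id not in seen_ids:
            seen_ids.add(tweet.id)
            fresh.append(tweet)
    return fresh


# Stand-in tweets so the sketch runs without calling the Twitter API.
fake_batch = [
    SimpleNamespace(id=1, text="first"),
    SimpleNamespace(id=2, text="second"),
]
seen = set()
print([t.text for t in new_tweets_only(fake_batch, seen)])  # ['first', 'second']
print([t.text for t in new_tweets_only(fake_batch, seen)])  # []
```

Inside the loop, the result of api.search_recent_tweets('#crypto').data could be passed through new_tweets_only together with a set created once before the loop starts. Tweepy's search_recent_tweets also accepts a since_id parameter, which is another way to request only newer tweets.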

Final Thoughts

You may use other sentiment analysis tools like VADER or TextBlob to obtain different results.
Simplifying and filtering the text gives cleaner data to process, which can lead to better results.
Flair's sentiment model is trained using word embeddings, which lets it capture more meaningful information from the text.

In this project tutorial, we have explored the Realtime Twitter Sentiment Analysis process as a deep learning project. The data was obtained in realtime and preprocessed accordingly to detect the sentiments in the tweets.


Thanks for reading the article!!!

Check out more project videos from the YouTube channel Hackers Realm