Hackers Realm

Realtime Twitter Sentiment Analysis using Python | NLP

Updated: Jun 1, 2023

Uncover the pulse of Twitter in realtime with Python! This tutorial covers NLP techniques for sentiment analysis, letting you analyze the sentiment behind tweets as they are posted. Explore natural language processing, work with realtime data streams, and gain insight into public opinion. Build your skills in text classification and sentiment analysis, and stay ahead of trends with this hands-on project tutorial. #TwitterSentimentAnalysis #Python #NLP #RealtimeAnalysis #TextClassification


In this project tutorial, we will use the Tweepy module to fetch tweets in realtime for a specific keyword, and a pretrained model from Flair to detect the sentiment of each tweet.


You can watch the step by step explanation video tutorial down below


Dataset Information

The objective of this task is to detect the sentiment of tweets in realtime for a specific keyword. Tweepy is used to fetch the tweets for that keyword. Flair is an NLP library that provides a pretrained sentiment analysis model, which is used to classify each tweet.



Install Modules


!pip install tweepy --ignore-installed
!pip install flair
  • Commands to install the Tweepy and Flair modules (the leading ! runs them as shell commands from a notebook cell)

  • --ignore-installed - pip option that reinstalls the package even if a version is already installed


Configuration


Create a developer account on the Twitter developer portal, which lets you retrieve over 500 thousand tweets. Follow the portal's documentation to generate the keys, then copy them into the configuration for this project.



bearer = "<<KEY>>"
consumer_key = "<<KEY>>"
consumer_secret = "<<KEY>>"
access_token = "<<KEY>>"
access_token_secret = "<<KEY>>"
  • Replace each <<KEY>> placeholder with the keys generated from your developer account
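
Pasting keys directly into a notebook makes them easy to leak when the notebook is shared. As an alternative, the keys can be read from environment variables. The following is a minimal sketch, not part of the original project: the variable names and the load_twitter_keys helper are hypothetical and can be renamed freely.

```python
import os

# Hypothetical environment variable names; any names work as long as
# they match what you export in your shell or notebook environment.
REQUIRED_KEYS = (
    "TWITTER_BEARER_TOKEN",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(env=os.environ):
    """Return the required keys as a dict, failing early if any is missing."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

The returned values can then be assigned to bearer, consumer_key and the other configuration variables instead of hard-coded strings.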



Import Modules


import tweepy
import re
import time
from flair.models import TextClassifier
from flair.data import Sentence
  • tweepy - module to access the Twitter API and fetch tweets

  • re - regular expression module, used to find and replace patterns in text

  • time - module providing time functions, used here to pause between results

  • TextClassifier - Flair class for loading and running text classification models

  • Sentence - Flair class that wraps a piece of text for processing


## initialize tweepy
api = tweepy.Client(
    bearer_token=bearer,
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
)
api.get_me()

Response(data=<User id=1493221119410970626 name=Aswin S username=aswintechguy>, includes={}, errors=[], meta={})

  • Here we create the API client for the account using the keys

  • To see if the connection was successful, use api.get_me() and check the output data

  • If an error is returned, regenerate the keys, update the configuration and try again



Now we will fetch a few recent tweets, just for exploration purposes

## get tweets in realtime
response = api.search_recent_tweets('#crypto')

tweets = response.data
for tweet in tweets:
    print(tweet.text)
    print('-----------------------------------------------')

@altcryptocom @binance #SHUMO Bullish as ever 🚀🚀🚀.. 💥💥💥 make sure you #HODL a bag $SHUMO💰💰 #SHUMO to the world 🔥🔥🔥 #SHUMOARMY https://t.co/Qyfbu3BLTN #Shiba #crypto #expo #eth #100x @ShumoOfficial https://t.co/uxi3Z4WthZ https://t.co/boGtlucWgF https://t.co/T6vWRQnpLQ
-----------------------------------------------
#IMXUSDT Bull Alert! 15X Volume Price: 1.436 5-min %: 2.4% Volume: $2,095,578 #crypto #whale #btc #eth #IMX $IMX https://t.co/Bk61uU9eed
-----------------------------------------------
RT @Btcexpertindia: Russia Ukraine war: Investors' wealth tumbles over 5.91 lakh cr in morning trade. Risk is everywhere in life, in Stock…
-----------------------------------------------
RT @Carefultea1: Make sure to tune in with @christse @cardstack for this webinar. Not to miss for anyone interessed in #Web3. Save the date…
-----------------------------------------------
RT @bezoge: We're up to something. 👀

We want to reward you but it will take some work! If we reach 500 people in the #AMA, we'll give a…
-----------------------------------------------
RT @airdropinspect: New airdrop: ArcadeLand (USDT) Total Reward: 2,000 USDT Rate: ⭐️⭐️⭐️⭐️ Winners: 750 Random &amp; Top 100 Distribution: with…
-----------------------------------------------
RT @NwcPublic: ❗️NWC APP RELEASE LOADING❗️ The most essential trading tools at the end of your fingertips. 2 days to go ⏳ Everything #cr
-----------------------------------------------
RT @CryptoTownEU: 🚀 Airdrop: Ape Rocket 💰 Value: 20,000,000 $APEROCKET 👥 Referral: 5,000,000 $APEROCKET 📊 Exchange: Latoken 📼 Audit: Audit…
-----------------------------------------------
RT @MindMusic_BSC: 💥 TUNE IN! 💥 Join the live AMA over on telegram tonight at 20:00 UTC for lots of exciting news and updates, including t…
-----------------------------------------------
RT @JohnHunterGems: Acabo de subir en mi canal de Telegram el libro “DAY TRADING En un semana “ Muy buen libro para empezar en este mun…
-----------------------------------------------

  • Tweets fetched for the keyword #crypto

  • Many of the tweets are truncated (ending in …), so only the beginning of the text is shown

  • Special characters, punctuation, user handles and links should be removed for better results

  • Pre-trained models like Flair already apply their own preprocessing steps, but for this project tutorial we will preprocess the data anyway
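
The sample output above also contains HTML entities such as &amp;. If they are left in, a cleanup that replaces special characters with spaces turns &amp; into the stray word "amp". One possible way to avoid that is to decode the entities first with Python's standard html module. This is a small sketch; the decode_entities helper name is hypothetical and the step is not part of the original notebook.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML entities such as '&amp;' back into plain characters."""
    return html.unescape(text)


# fragment taken from one of the tweets printed above
sample = "Winners: 750 Random &amp; Top 100"
print(decode_entities(sample))  # Winners: 750 Random & Top 100
```

The decoded text can then be passed on to the remaining preprocessing steps.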


Now we will define a function to preprocess the text

def preprocess_text(text):
    # convert to lower case
    text = text.lower()
    # remove user handles
    text = re.sub(r"@\w*", "", text)
    # remove http links
    text = re.sub(r"http\S+", "", text)
    # replace digits and special characters with spaces (letters and # are kept)
    text = re.sub(r"[^a-zA-Z#]", " ", text)
    # remove the standalone retweet marker "rt" without touching words that contain it
    text = re.sub(r"\brt\b", "", text)
    # collapse repeated whitespace into a single space
    text = re.sub(r"\s+", " ", text)

    return text
  • Simplifying the text helps to process the data quicker and get better results.

  • User handles, http links, digits, special characters and the rt retweet marker carry no sentiment for this project, so removing them improves the prediction results.
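
One detail worth checking in a cleanup like this is how the retweet marker is removed. A bare pattern such as rt also matches inside ordinary words, while a word-boundary pattern removes only the standalone token. The sketch below compares the two on a made-up sample string; it is an illustration, not part of the original notebook.

```python
import re

# made-up, already lower-cased text used only for illustration
sample = "rt smart traders start early"

# bare pattern: also removes the "rt" inside "smart" and "start"
bare = re.sub(r"rt", "", sample)

# word-boundary pattern: only the standalone "rt" token is removed
bounded = re.sub(r"\brt\b", "", sample)

print(repr(bare))     # ' sma traders sta early'
print(repr(bounded))  # ' smart traders start early'
```

Damaged words like "sma" are unknown to the sentiment model, so the word-boundary form is the safer choice.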


Now let us see the difference

tweet.text

'RT @JohnHunterGems: Acabo de subir en mi canal de Telegram el libro “DAY TRADING \nEn un semana “ \n\nMuy buen libro para empezar en este mun…'

preprocess_text(tweet.text)

' acabo de subir en mi canal de telegram el libro day trading en un semana muy buen libro para empezar en este mun '

  • Simplified text only leaving meaningful words


## create sentiment analysis function
classifier = TextClassifier.load('en-sentiment')
def get_sentiment(tweet):
    sentence = Sentence(tweet)
    classifier.predict(sentence)
    # the first label holds the prediction; .value is the label text
    return sentence.labels[0].value

2022-03-07 14:08:41,483 loading file /root/.flair/models/sentiment-en-mix-distillbert_4.pt

  • TextClassifier.load('en-sentiment') - Loads Flair's pretrained English sentiment model (downloaded on first use)

  • sentence.labels[0].value - Returns just the label text ('POSITIVE' or 'NEGATIVE') without the confidence score, which is available separately as sentence.labels[0].score


Now let us try the get_sentiment function

get_sentiment(tweet.text)

'POSITIVE'

  • POSITIVE means that the model detected a positive sentiment in the text
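
A single label says little about a keyword, but the labels of several tweets can be tallied into an overall picture. The following sketch uses only the standard library; the summarize_sentiments helper is hypothetical and the labels are made up, standing in for the values returned by get_sentiment.

```python
from collections import Counter


def summarize_sentiments(labels):
    """Return the share of each label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# made-up labels standing in for get_sentiment results on four tweets
labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
print(summarize_sentiments(labels))  # {'POSITIVE': 0.75, 'NEGATIVE': 0.25}
```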


Realtime Twitter Sentiments


Now we will process the tweets in realtime

## preprocess the tweets
def preprocess_text(text):
    # convert to lower case
    text = text.lower()
    # remove user handles
    text = re.sub(r"@\w*", "", text)
    # remove http links
    text = re.sub(r"http\S+", "", text)
    # replace digits and special characters with spaces (letters and # are kept)
    text = re.sub(r"[^a-zA-Z#]", " ", text)
    # remove the standalone retweet marker "rt" without touching words that contain it
    text = re.sub(r"\brt\b", "", text)
    # collapse repeated whitespace into a single space
    text = re.sub(r"\s+", " ", text)

    return text

## create sentiment analysis function
classifier = TextClassifier.load('en-sentiment')
def get_sentiment(tweet):
    sentence = Sentence(tweet)
    classifier.predict(sentence)
    # the first label holds the prediction; .value is the label text
    return sentence.labels[0].value


## get realtime sentiments
while True:
    # get tweets (10 tweets); data is None when nothing matches
    tweets = api.search_recent_tweets('#crypto').data or []

    if not tweets:
        # nothing matched, wait before asking again
        time.sleep(1)
        continue

    for tweet in tweets:
        original_tweet = tweet.text
        clean_tweet = preprocess_text(original_tweet)
        sentiment = get_sentiment(clean_tweet)
        print('------------------------Tweet-------------------------------')
        print(original_tweet)
        print('------------------------------------------------------------')
        print('Sentiment:', sentiment)
        time.sleep(1)
        print('\n\n')

------------------------Tweet-------------------------------
RT @Jennifersperdu3: Plain &amp; Simple Truth want to be financial free &amp; do whatever it is u want to do in life then BUY #Shibadoge @realshib…
------------------------------------------------------------
Sentiment: NEGATIVE

------------------------Tweet-------------------------------
RT @Cryptoskyrun: Guys❗️ The day we’ve been waiting for is approaching🚨 @NunuSpiritsNFT #TGE COMING SOON🤩 📌#NunuSpirits TGE will happen on…
------------------------------------------------------------
Sentiment: NEGATIVE

------------------------Tweet-------------------------------
RT @brettmurphynet: #affiliate #affiliatemarketing #deal #blogger #business #cryptocurrency #deals #discount #gifts #marketing #shopping #d
------------------------------------------------------------
Sentiment: POSITIVE

------------------------Tweet-------------------------------
RT @airdropinspect: New airdrop: Lifetise (USDT) Total Reward: 2,000 USDT Rate:⭐️⭐️⭐️⭐️ Winners: 700 Random &amp; Top 50 Distribution: within a…
------------------------------------------------------------
Sentiment: POSITIVE


  • time.sleep(1) - Pauses for 1 second before displaying the next result, making the output easier to follow

  • Each tweet is printed with its predicted sentiment, and cleaning the text before classification helps the model give better results

  • To see the confidence score as well, print sentence.labels[0].score inside get_sentiment or return it alongside the label
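
Because the loop asks for the latest tweets on every pass, consecutive requests can return tweets that were already printed. One simple way to skip them is to remember the ids that have been seen. The sketch below is an assumption-laden illustration: the only_new helper is hypothetical, and the stand-in objects only mimic the .id and .text attributes of the tweet objects. The search endpoint also accepts a since_id parameter, which could serve the same purpose on the API side.

```python
from types import SimpleNamespace


def only_new(tweets, seen_ids):
    """Return the tweets whose id is not yet in seen_ids and record their ids."""
    fresh = []
    for tweet in tweets:
        if tweet.id not in seen_ids:
            seen_ids.add(tweet.id)
            fresh.append(tweet)
    return fresh


# stand-ins for tweet objects; only .id and .text are assumed
batch_1 = [SimpleNamespace(id=1, text="a"), SimpleNamespace(id=2, text="b")]
batch_2 = [SimpleNamespace(id=2, text="b"), SimpleNamespace(id=3, text="c")]

seen = set()
print([t.id for t in only_new(batch_1, seen)])  # [1, 2]
print([t.id for t in only_new(batch_2, seen)])  # [3]
```

In the realtime loop, a seen set would be created once before while True, and the inner loop would iterate over only_new(tweets, seen) instead of tweets.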


Final Thoughts

  • You may use other pre-trained tools like VADER or TextBlob to obtain different results.

  • Simplifying and filtering the text produces cleaner data to process, giving better results.

  • Flair's sentiment model is built on word embeddings, which help it capture more meaningful information from the text.


In this project tutorial, we explored realtime Twitter sentiment analysis as a deep learning project. The tweets were fetched in realtime and preprocessed before detecting their sentiment.


Get the project notebook from here


Thanks for reading the article!!!


Check out more project videos from the YouTube channel Hackers Realm
