Abstractive text summarization is a natural language processing (NLP) task that aims to generate a concise and coherent summary of a given document or piece of text. It involves understanding the main ideas and important details of the source text and then composing a summary that captures the essence of the original content.
In this project tutorial, we will use the T5 Transformer model to generate summaries that are fluent, coherent, and able to capture important information even when it is not explicitly present in the source text. Transformer models leverage attention mechanisms to capture the contextual relationships between words and generate high-quality summaries.
You can watch the video-based tutorial with a step-by-step explanation down below.
Install Modules
We will install the Transformers module and PyTorch.
pip install transformers
pip install torch
This will install the latest versions of the modules along with the necessary dependencies.
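If you want to confirm that the installation worked and see which versions were installed, an optional quick check looks like this:
import torch
import transformers

# print the installed versions of both libraries
print('transformers:', transformers.__version__)
print('torch:', torch.__version__)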
Import Modules
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration, T5Config
torch - the core module of the PyTorch library, which is widely used for deep learning and scientific computing tasks.
transformers - used for various NLP tasks, providing pre-trained models, tokenization tools, model architectures, training and fine-tuning utilities, inference capabilities, and seamless integration with popular deep learning frameworks.
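As a side note, the transformers library also exposes a high-level pipeline API. The sketch below (not used in this tutorial) shows how the same summarization task could be done in a couple of lines with the 't5-small' checkpoint:
from transformers import pipeline

# high-level alternative: the pipeline handles tokenization, generation and decoding internally
summarizer = pipeline('summarization', model='t5-small')
# result = summarizer(text, min_length=30, max_length=120)  # 'text' is the article defined later in this tutorial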
Initialize Pretrained Model
We will initialize the T5 Transformer model along with its tokenizer.
# initialize the pretrained model
model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')
device = torch.device('cpu')
First, it downloads the T5ForConditionalGeneration model, which is specifically designed for conditional generation tasks. The main purpose of this model is to generate text based on input prompts or conditions. It can be used for a variety of conditional generation tasks, including text summarization, translation, question answering, text completion, and more.
Then the from_pretrained() method loads the pre-trained weights for the t5-small model, which is a smaller version of the T5 model architecture.
Next it will create a T5Tokenizer object by calling the from_pretrained() method. The tokenizer is initialized with the vocabulary and settings specific to the 't5-small' model.
Then we set the device to CPU, indicating that we want to perform computations on the CPU rather than a GPU. The CPU is sufficient here because we are only summarizing a small article in this project tutorial.
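If you do have a GPU available and want to use it, an optional variation is to pick the device dynamically and move the model to it:
# optional: use a GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)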
Next, we will provide the input text for which we want to generate a summary.
# input text
text = """
Back in the 1950s, the fathers of the field, Minsky and McCarthy, described artificial intelligence as any task performed by a machine that would have previously been considered to require human intelligence.
That's obviously a fairly broad definition, which is why you will sometimes see arguments over whether something is truly AI or not.
Modern definitions of what it means to create intelligence are more specific. Francois Chollet, an AI researcher at Google and creator of the machine-learning software library Keras, has said intelligence is tied to a system's ability to adapt and improvise in a new environment, to generalise its knowledge and apply it to unfamiliar scenarios.
"Intelligence is the efficiency with which you acquire new skills at tasks you didn't previously prepare for," he said.
"Intelligence is not skill itself; it's not what you can do; it's how well and how efficiently you can learn new things."
It's a definition under which modern AI-powered systems, such as virtual assistants, would be characterised as having demonstrated 'narrow AI', the ability to generalise their training when carrying out a limited set of tasks, such as speech recognition or computer vision.
Typically, AI systems demonstrate at least some of the following behaviours associated with human intelligence: planning, learning, reasoning, problem-solving, knowledge representation, perception, motion, and manipulation and, to a lesser extent, social intelligence and creativity.
AlexNet's performance demonstrated the power of learning systems based on neural networks, a model for machine learning that had existed for decades but that was finally realising its potential due to refinements to architecture and leaps in parallel processing power made possible by Moore's Law. The prowess of machine-learning systems at carrying out computer vision also hit the headlines that year, with Google training a system to recognise an internet favorite: pictures of cats.
The next demonstration of the efficacy of machine-learning systems that caught the public's attention was the 2016 triumph of the Google DeepMind AlphaGo AI over a human grandmaster in Go, an ancient Chinese game whose complexity stumped computers for decades. Go has about possible 200 moves per turn compared to about 20 in Chess. Over the course of a game of Go, there are so many possible moves that are searching through each of them in advance to identify the best play is too costly from a computational point of view. Instead, AlphaGo was trained how to play the game by taking moves played by human experts in 30 million Go games and feeding them into deep-learning neural networks.
"""
Preprocess Input Text
Next, we will preprocess the provided input text.
## preprocess the input text
preprocessed_text = text.strip().replace('\n','')
t5_input_text = 'summarize: ' + preprocessed_text
The strip() method is used to remove any leading or trailing white spaces from the text variable.
We also need to remove any newlines that are present, so we use the replace() method to strip newline characters from the input text.
Next, we add the prefix 'summarize: ' to the preprocessed text. It is a common convention for text-to-text models like T5 to include a task-specific prefix to indicate the desired task. In this case, the prefix 'summarize: ' tells the model to generate a summary of the input text.
The resulting preprocessed text, with the prefix added, is stored in the t5_input_text variable.
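Note that replacing newlines with an empty string fuses the sentences around each line break together (you can see this in the output below, e.g. "intelligence.That's"). An optional variation is to replace each newline with a single space instead:
# optional variation: replace newlines with a space so sentences do not run together
t5_input_text = 'summarize: ' + text.strip().replace('\n', ' ')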
Next we will display the preprocessed text
t5_input_text
'summarize: Back in the 1950s, the fathers of the field, Minsky and McCarthy, described artificial intelligence as any task performed by a machine that would have previously been considered to require human intelligence.That's obviously a fairly broad definition, which is why you will sometimes see arguments over whether something is truly AI or not.Modern definitions of what it means to create intelligence are more specific. Francois Chollet, an AI researcher at Google and creator of the machine-learning software library Keras, has said intelligence is tied to a system's ability to adapt and improvise in a new environment, to generalise its knowledge and apply it to unfamiliar scenarios."Intelligence is the efficiency with which you acquire new skills at tasks you didn't previously prepare for," he said."Intelligence is not skill itself; it's not what you can do; it's how well and how efficiently you can learn new things."It's a definition under which modern AI-powered systems, such as virtual assistants, would be characterised as having demonstrated 'narrow AI', the ability to generalise their training when carrying out a limited set of tasks, such as speech recognition or computer vision.Typically, AI systems demonstrate at least some of the following behaviours associated with human intelligence: planning, learning, reasoning, problem-solving, knowledge representation, perception, motion, and manipulation and, to a lesser extent, social intelligence and creativity.AlexNet's performance demonstrated the power of learning systems based on neural networks, a model for machine learning that had existed for decades but that was finally realising its potential due to refinements to architecture and leaps in parallel processing power made possible by Moore's Law. The prowess of machine-learning systems at carrying out computer vision also hit the headlines that year, with Google training a system to recognise an internet favorite: pictures of cats.The next demonstration of the efficacy of machine-learning systems that caught the public's attention was the 2016 triumph of the Google DeepMind AlphaGo AI over a human grandmaster in Go, an ancient Chinese game whose complexity stumped computers for decades. Go has about possible 200 moves per turn compared to about 20 in Chess. Over the course of a game of Go, there are so many possible moves that are searching through each of them in advance to identify the best play is too costly from a computational point of view. Instead, AlphaGo was trained how to play the game by taking moves played by human experts in 30 million Go games and feeding them into deep-learning neural networks.'
Next, we will check the length of the preprocessed text.
len(t5_input_text.split())
410
This is the number of words in the preprocessed text.
Next, we convert the preprocessed text into a tokenized format suitable for input to the model.
tokenized_text = tokenizer.encode(t5_input_text, return_tensors='pt', max_length=512, truncation=True).to(device)
We will tokenize the t5_input_text using a tokenizer and return the tokenized text as PyTorch tensors.
With max_length=512 and truncation=True, the input is limited to a maximum of 512 tokens, which is helpful when summarizing large articles or long input text.
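If you want to verify how many tokens were actually passed to the model after tokenization and truncation, you can inspect the tensor's shape (an optional check):
# the second dimension is the number of tokens fed to the model (at most 512)
print(tokenized_text.shape[1])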
Generate Summary
Next, we will generate the summary.
summary_ids = model.generate(tokenized_text, min_length=30, max_length=120)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
The model.generate() function takes the tokenized_text as input and returns a tensor containing the token IDs of the generated summary. The min_length and max_length arguments constrain the length of the generated summary in tokens.
The tokenizer.decode() function takes the generated summary tokens and converts them back into human-readable text. The skip_special_tokens=True argument ensures that special tokens (such as padding or end-of-sequence tokens) are excluded from the decoded summary.
Next we will print the result
summary
'artificial intelligence is a task performed by a machine that would have previously been considered to require human intelligence. it's a definition under which modern AI-powered systems, such as virtual assistants, would be characterised as having demonstrated 'narrow AI' the ability to generalise training when carrying out a limited set of tasks, such as speech recognition or computer vision.'
This is the summary generated for the given input text.
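The generate() method also accepts additional decoding parameters that can noticeably change the output. As an optional sketch (the values here are only illustrative starting points to experiment with), beam search combined with an n-gram repetition constraint often produces more fluent summaries:
# optional: beam search with a few commonly tuned decoding parameters
summary_ids = model.generate(
    tokenized_text,
    num_beams=4,              # explore 4 candidate sequences in parallel
    no_repeat_ngram_size=3,   # avoid repeating the same 3-gram
    length_penalty=2.0,       # values > 1.0 nudge the model towards longer summaries
    min_length=30,
    max_length=120,
    early_stopping=True
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)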
Final Thoughts
Transformer-based models, such as the T5 model that we have used in this project tutorial, have revolutionized the field of natural language processing, including text summarization. These models can capture complex linguistic patterns and generate human-like summaries.
Abstractive summarization differs from extractive summarization, as it generates summaries by understanding the meaning of the text and paraphrasing it in a concise manner, rather than selecting and stitching together existing sentences or phrases.
Transformer models can generate summaries that are more fluent, coherent, and can capture important information even if it is not explicitly present in the source text.
While Transformer models have shown impressive results in abstractive summarization, they still face challenges. One common issue is generating summaries that faithfully represent the input while avoiding factual inaccuracies or introducing biases. Ensuring the faithfulness and accuracy of generated summaries remains an ongoing research area.
Evaluating the quality of abstractive summaries is subjective and challenging. Metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) are commonly used but have limitations. Human evaluation and domain-specific evaluation criteria can provide more nuanced insights into the quality of summaries.
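For a quick automated check, you could compute ROUGE scores with the rouge-score package (pip install rouge-score). The sketch below compares the generated summary against a hypothetical human-written reference summary that you would need to supply yourself:
from rouge_score import rouge_scorer

# hypothetical reference summary used as the ground truth; replace with a real one
reference_summary = "A human-written summary of the article goes here."
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference_summary, summary)
print(scores['rougeL'].fmeasure)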
In this project tutorial, we have explored the T5 Transformer model for generating a summary of the given text.
Get the project notebook from here
Thanks for reading the article!!!
Check out more project videos from the YouTube channel Hackers Realm