Recently, I took an AI course, and this is the sixth assignment. The main topics covered include:
Learn to use LSTM
Use SpaCy
Homework Requirements
Train a text classification on the TweetEval emotion recognition dataset using LSTMs and GRUs.
Build an LSTM model: Follow the example described here. Use the same architecture, but:
only use the last output of the LSTM in the loss function
use an embedding dim of 128
use a hidden dim of 256.
Use SpaCy to split words: Use spaCy to split the tweets into words.
Select the Top 5000 words: Limit your vocabulary (i.e., the words that you converted to an index) to the most frequent 5000 words and replace all other words with a placeholder index (e.g., 1001).
Train the model and calculate accuracy: Evaluate the accuracy on the test set. (Note: If the training takes too long, try to use only a fraction of the training data.)
Build and train a GRU model: Do the same, but this time use GRUs instead of LSTMs.
Task 0: Download the Dataset
In this section, we need to do the following:
Download the dataset
Use pandas to convert the dataset into the format we need
Download the Dataset
Refer to this link to download the required data: TweetEval
After downloading, you will see the following structure; the emotion folder contains the data we will use this time:
.
├── README.md
├── TweetEval_Tutorial.ipynb
├── datasets
│   ├── README.txt
│   ├── emoji
│   ├── emotion                 # This is the data we need
│   │   ├── mapping.txt         # Emotion corresponding to each number, e.g. {0:'angry', 1:'happy'}
│   │   ├── test_labels.txt     # Emotion labels (the answers) for the test data, e.g. 0
│   │   ├── test_text.txt       # Content of the test data, e.g. "I'm so angry"
│   │   ├── train_labels.txt    # Emotion labels (the answers) for the training data, e.g. 0
│   │   ├── train_text.txt      # Content of the training data, e.g. "I'm so angry"
│   │   ├── val_labels.txt      # Emotion labels (the answers) for the validation data, e.g. 0
│   │   └── val_text.txt        # Content of the validation data, e.g. "I'm so angry"
...
First, import the packages we will use:

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler

# others
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

# dataset (leftover from the previous CNN assignment; not used here)
import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import Flowers102

# read file
import pandas as pd

# label (leftover from the previous assignment; not used here)
from scipy.io import loadmat
import json
Next, we convert the data into the format we need. This time we use pandas to process the data and read it into variables for later use.
Make sure to change the root path to the folder path of your git clone!
# Set the relative path of each file first
root = '../../Data/tweeteval/datasets/emotion/'
mapping_file = os.path.join(root, 'mapping.txt')
test_labels_file = os.path.join(root, 'test_labels.txt')
test_text_file = os.path.join(root, 'test_text.txt')
train_labels_file = os.path.join(root, 'train_labels.txt')
train_text_file = os.path.join(root, 'train_text.txt')
val_labels_file = os.path.join(root, 'val_labels.txt')
val_text_file = os.path.join(root, 'val_text.txt')
# Use pandas to read the label files and the mapping
mapping_pd = pd.read_csv(mapping_file, sep='\t', header=None)
test_label_pd = pd.read_csv(test_labels_file, sep='\t', header=None)
train_label_pd = pd.read_csv(train_labels_file, sep='\t', header=None)
val_label_pd = pd.read_csv(val_labels_file, sep='\t', header=None)
# Split the text files on '\n' and drop the trailing empty string
# (each file ends with a newline, so the last element is empty; dropping it keeps the length consistent with the labels)
test_dataset = open(test_text_file).read().split('\n')[:-1]    # remove last empty line
train_dataset = open(train_text_file).read().split('\n')[:-1]  # remove last empty line
val_dataset = open(val_text_file).read().split('\n')[:-1]      # remove last empty line
# Print the length of the dataset
print(f'len(train_dataset)= {len(train_dataset)}')
print(f'len(train_label_pd)= {len(train_label_pd)}')
print(f'=== train_label_pd === \n{train_label_pd.value_counts()}')
print(f'len(test_dataset)= {len(test_dataset)}')
print(f'len(test_label_pd)= {len(test_label_pd)}')
print(f'=== test_label_pd === \n{test_label_pd.value_counts()}')
Task 1 + 5: Build the LSTM and GRU Models
Build an LSTM model: Follow the example described here. Use the same architecture, but:
only use the last output of the LSTM in the loss function
use an embedding dim of 128
use a hidden dim of 256.
Build and train a GRU model: Do the same, but this time use GRUs instead of LSTMs.
From the official example, we can learn how to build an LSTM model, which basically includes the following elements:
hidden_dim: The dimension of the hidden layer, representing the number of neurons in the hidden layer.
word_embeddings: Converts each word in the input sentence into word vectors.
nn.Embedding(vocab_size, embedding_dim): the embedding layer takes two arguments:
vocab_size: The size of the dictionary, i.e., the total number of words we have. In this example, we will input 5001 words: 5000 common words + 1 unrecognized word.
embedding_dim: The size of the vector that each word or symbol is mapped to. For instance, if embedding_dim is 6 and the input index vector is [1, 2, 3, 5], the embedding layer maps each of the four indices to its own 6-dimensional vector, producing a 4×6 matrix (see the shape sketch after this list).
lstm(input_size, hidden_size, dropout)
input_size: The dimension of the input, which is our word vector dimension.
hidden_size: The dimension of the hidden layer, representing the number of neurons in the hidden layer.
dropout: The dropout proportion; the default is 0, meaning no dropout is used. (Note that nn.LSTM applies dropout only between stacked layers, so it has no effect on a single-layer LSTM.)
hidden2tag(in_features, out_features)
in_features: The input dimension, which here is the hidden state dimension (hidden_dim).
out_features: The output dimension, which is the dimension of our emotion labels.
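To make the shapes concrete, here is a small standalone sketch (not part of the assignment code; the index values are made up) showing what nn.Embedding and nn.LSTM return for a single four-word sentence, using the assignment's dimensions of 128 and 256:

import torch
import torch.nn as nn

# a "sentence" of 4 word indices drawn from a 5001-word vocabulary (5000 common words + 1 placeholder)
sentence = torch.tensor([12, 7, 4999, 5000], dtype=torch.long)

embedding = nn.Embedding(num_embeddings=5001, embedding_dim=128)
lstm = nn.LSTM(input_size=128, hidden_size=256)

embeds = embedding(sentence)                            # shape (4, 128): one 128-dim vector per word
lstm_out, _ = lstm(embeds.view(len(sentence), 1, -1))   # shape (4, 1, 256): (seq_len, batch, hidden_dim)
print(embeds.shape, lstm_out.shape)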
Putting these pieces together, the LSTMTagger adapted to the assignment looks like this:

class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        # Convert the input word indices into word vectors
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        # Convert the input words into word vectors (the sentence is already an index vector)
        embeds = self.word_embeddings(sentence)
        # Feed the word vectors to the LSTM to get the output and hidden state of the LSTM layer
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        # Take only the last output of the LSTM
        last_output = lstm_out[-1].view(1, -1)
        # Map the last LSTM output to the tag space
        tag_space = self.hidden2tag(last_output)
        # Use log_softmax to convert to (log) probabilities
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
GRU and LSTM are similar; the only lines we need to modify are the following:
class GRUTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        ...
        # Here !!! Change to GRU
        self.gru = nn.GRU(embedding_dim, hidden_dim, dropout=dropout)

    def forward(self, sentence):
        ...
        # Here !!! Change to GRU
        # Feed the word vectors to the GRU to get the output and hidden state of the GRU layer
        gru_out, _ = self.gru(embeds.view(len(sentence), 1, -1))
        # To satisfy the assignment requirements, take only the last output
        last_output = gru_out[-1].view(1, -1)
        ...
After making the above modifications, the GRU model is complete.
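For reference, here is a minimal sketch of the full GRUTagger, assembled from the LSTMTagger above with the two changes applied (the complete listing is not shown in the original snippets, so treat this as an assumption; it reuses the same imports):

class GRUTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        super(GRUTagger, self).__init__()
        self.hidden_dim = hidden_dim
        # Convert the input word indices into word vectors
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # The GRU takes word embeddings as inputs and outputs hidden states of size hidden_dim
        self.gru = nn.GRU(embedding_dim, hidden_dim, dropout=dropout)
        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        gru_out, _ = self.gru(embeds.view(len(sentence), 1, -1))
        # Only the last output is used, as the assignment requires
        last_output = gru_out[-1].view(1, -1)
        tag_space = self.hidden2tag(last_output)
        return F.log_softmax(tag_space, dim=1)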
Task 2 + 3: Split Words Using SpaCy, Find Top 5000 Words
In Task 0 we already loaded the data into list variables, with each entry being one sentence (tweet). Now we need to do a few things:
2. Split Words Using SpaCy: Use spaCy to split the tweets into words.
3. Select Top 5000 Words: Limit your vocabulary (i.e., the words that you converted to an index) to the most frequent 5000 words and replace all other words with a placeholder index (e.g., 1001).
Install SpaCy
We need to execute the following commands to install the SpaCy package:
# If you are using Python3
pip install -U spacy

# If you are using Anaconda
conda install -c conda-forge spacy
As we are analyzing English text, we need to download the English model. Execute the following command:
python -m spacy download en_core_web_sm
Only then can we import the spacy package in the notebook and use the English model.
If the above command is not executed, you will encounter an error here!!
nlp = spacy.load("en_core_web_sm")
import spacy
from collections import Counter

# use spaCy's English model to tokenize sentences
nlp = spacy.load("en_core_web_sm")  # <=== If the above command was not executed, you will get an error here!!
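Before building the vocabulary, it helps to see what the token flags used below actually mark. Here is a quick illustrative check (the example sentence is made up):

# inspect spaCy's token attributes on a made-up example
doc = nlp("I can't wait for the weekend! #excited")
for token in doc:
    print(f"{token.text!r:12} punct={token.is_punct} stop={token.is_stop} space={token.is_space}")

# keep only "content" tokens, exactly as the vocabulary-building step below does
kept = [t.text for t in doc if not t.is_punct and not t.is_stop and not t.is_space]
print(kept)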
Prepare a Dictionary of Top 5000 Common Words
We need to identify the top 5000 common words and create a dictionary for this purpose:
First, prepare a string concatenating all sentences.
Then, send the entire string to spaCy for tokenization, filtering out punctuation (punct), stop words, and whitespace.
Use the Counter package to count the words, which facilitates identifying the top 5000 common words.
# join all the sentences together
# e.g. ['today is good', 'today is bad'] => 'today is good today is bad'
text = ' '.join(train_dataset)

# use spaCy to tokenize the text
doc = nlp(text)

# count word frequencies, filtering out punctuation, stop words and whitespace
word_freq = Counter(token.text for token in doc
                    if not token.is_punct and
                       not token.is_stop and
                       not token.is_space)
word_freq
Next, we can select the top 5000 words based on the number of times they appear:
# select the top 5000 most common words
most_common_words = word_freq.most_common(5000)

# Build a dictionary mapping words to indexes, e.g. {'hello': 0, 'like': 1, ...}
vocab = {word[0]: idx for idx, word in enumerate(most_common_words)}
Convert Sentences to Tensors
With the vocab dictionary at hand, we can now convert sentences into an index format based on this dictionary. For example:
Original sentence: I like apple
Converted into index format: [100, 3923, 123]
But what if we encounter a word that we don’t understand or is not included in the dictionary?
Here, we also need a placeholder_index.
When a word in our sentence is not in the vocab dictionary, we convert that word to the placeholder_index.
We set this as 5000, representing an unrecognizable word. For example:
# Convert words to indexes, using the placeholder index 5000 for words that are not in the vocabulary
placeholder_index = 5000

# Store the result of converting the entire dataset to indexes
indexed_dataset = []

# Iterate over the entire dataset
for tweet in train_dataset:
    # Build an empty list to store the result for the current sentence (e.g. "I like apple" -> [100, 3923, 123])
    indexed_words = []
    # Use spaCy to split the sentence into words
    for token in nlp(tweet):
        # filter out punctuation, stop words and whitespace
        if not token.is_punct and not token.is_stop and not token.is_space:
            word = token.text
            # If the word is among the top 5000 words in vocab, convert it to its index
            if word in vocab:
                indexed_words.append(vocab[word])
            # Otherwise, convert it to the index of the placeholder token
            else:
                indexed_words.append(placeholder_index)
    indexed_dataset.append(indexed_words)
Based on the above, we can wrap this code into a function, which will make it convenient to convert sentences into index format later during training:
# convert a sentence to a sequence of indexes
def prepare_sentence_sequence(seq, to_ix):
    idx = []
    # use spaCy to tokenize the sentence
    for token in nlp(seq):
        # filter out punctuation, stop words and whitespace
        if not token.is_punct and not token.is_stop and not token.is_space:
            word = token.text
            # if the token is among the top 5000 words in the vocab, add its index to the list
            if word in to_ix:
                idx.append(to_ix[word])
            else:
                # else add the index of the placeholder token
                idx.append(placeholder_index)
    return torch.tensor(idx, dtype=torch.long)  # convert the list to a tensor
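A quick usage sketch (the sentence is made up, and the exact indices depend on your training data). Note that the training code later passes a dictionary named word_to_ix; it is assumed here to simply be the vocab built above:

# assumed alias: the later code calls this dictionary word_to_ix
word_to_ix = vocab

example = "I like apples and long weekends"   # made-up sentence
print(prepare_sentence_sequence(example, word_to_ix))
# -> a 1-D LongTensor of vocabulary indices, with 5000 wherever a word falls outside the top-5000 vocab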
Convert Labels to Tensors
Next, we need to handle the labels. The labels also need to be converted into vectors so that the model's output can be compared with the correct answer:
We want to put the model's output, e.g. [0.1, 0.2, 0.3, 0.4], and the correct answer, e.g. [1, 0, 0, 0], into the loss function to compute the loss.
Therefore, we need a function that converts a label into vector form; this function is one_hot_encode.
# val is the index of the label (e.g. 2); to_ix is the label dictionary (e.g. {0:'angry', 1:'happy'})
def one_hot_encode(val, to_ix):
    # create an empty list to store the result
    result = []
    # iterate over the label dictionary
    for k, v in to_ix.items():
        # if the label index matches the current key, we found the correct label
        if val == k:
            # append 1 to the list
            result.append(1)
        else:
            # append 0 to the list if the index does not match
            result.append(0)
    return torch.tensor(result, dtype=torch.float32)  # convert the list to a tensor
With this function in place, let's read the label mapping and quickly verify that one_hot_encode works:
# Because the label is a number, we need to convert it to a vector
mapping = dict(zip(mapping_pd[0], mapping_pd[1]))
# Returns {0:'angry', 1:'happy', 2:'optimism', 3:'sadness'}
print(mapping)
print(f'ans=2; vector={one_hot_encode(2, tag_to_ix)}')
Result: We successfully converted 2 into the vector [0, 0, 1, 0]!
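One note: tag_to_ix, which is passed to one_hot_encode here and reused throughout the training code, is never defined in the snippets shown. Given that one_hot_encode iterates over its keys and len(tag_to_ix) is later used as the number of emotion classes, it is presumably just the same index-to-label dictionary as mapping, i.e. something like:

# assumed definition (not shown in the original snippets); same contents as `mapping` above
tag_to_ix = dict(zip(mapping_pd[0], mapping_pd[1]))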
Task 4: Train the Model and Calculate Accuracy
Train the model and calculate accuracy: Evaluate the accuracy on the test set. (Note: If the training takes too long, try to use only a fraction of the training data.)
Try Your Hand
Before starting to train the model, we need to first understand what our model’s input and output look like. Let’s see what the model predicts before it’s trained!
# See what the scores are before training
# Since we don't need to train here, the code is wrapped in torch.no_grad()
# (model and loss_function are assumed to be an LSTMTagger instance and a loss function created beforehand)

# Take the sentence at index 1 as an example
sentence_idx = 1
# Prints: My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
print(f'First Sentense = {train_dataset[sentence_idx]}')

with torch.no_grad():
    # Convert the sentence into index format as a tensor
    inputs = prepare_sentence_sequence(train_dataset[sentence_idx], word_to_ix)
    print(f'Sentense to tensor = {inputs}')  # Prints: tensor([1070,  340, 2015, 2016,   45, 2017])

    # Then convert the answer to a tensor
    labels = one_hot_encode(train_label_pd[0][sentence_idx], tag_to_ix)
    print(f'Sentense of result to tensor = {labels}')  # Prints: tensor([1., 0., 0., 0.])

    # Send the inputs to the model and get the model's prediction
    outputs = model(inputs)
    print(f'tag_scores = {outputs}')  # Prints: tensor([[-1.3280, -1.4272, -1.4998, -1.3026]])

# Take the maximum probability value and get its index
_, preds = torch.max(outputs, 1)
print(f'preds = {preds}')  # Prints: preds = tensor([3])

# Take out the index of the maximum probability value
result_idx = torch.argmax(outputs).item()
print(f'result = {result_idx}, ans = {train_label_pd[0][sentence_idx]}')  # Prints: result = 3, ans = 0

# Calculate the loss to see the difference between output and label.
# outputs[0] is used because the output has an extra batch dimension
loss = loss_function(outputs[0], labels)
print(f'loss = {loss}')
Result
First Sentense = My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
Sentense to tensor = tensor([1070,  340, 2015, 2016,   45, 2017])
Sentense of result to tensor = tensor([1., 0., 0., 0.])
tag_scores = tensor([[-1.3280, -1.4272, -1.4998, -1.3026]])
loss = 1.32795250415802
preds = tensor([3])
result = 3, ans = 0
Looks like it’s running pretty smoothly, right?
Here we go!
For reference, the training log will look something like this:

Epoch 0/29
----------
train Loss: 1.2157 Acc: 0.4642 Time elapsed: 25 sec.
test Loss: 1.2095 Acc: 0.4553 Time elapsed: 32 sec.

Epoch 1/29
----------
train Loss: 1.1019 Acc: 0.5333 Time elapsed: 58 sec.
test Loss: 1.1816 Acc: 0.4708 Time elapsed: 65 sec.

Epoch 2/29
----------
train Loss: 1.0151 Acc: 0.5812 Time elapsed: 92 sec.
test Loss: 1.1603 Acc: 0.4898 Time elapsed: 99 sec.
...
Training complete in 17m 5s
Best val Acc: 0.599578
Does this look familiar? Yes, it does! If you have followed the article Flower102 Dataset - Training with Transfer Learning + Batch Normalization for CNN, the same kind of training is used here.
This lets us observe both the training and testing results to see whether there is any overfitting.
Even if overfitting occurs, this method still keeps the best model.
So let's get started with the train_model function; the places we need to change are marked with !!! comments:
def train_model(model, criterion, optimizer, scheduler, num_epochs=1):
    # The time when training starts
    since = time.time()

    # Create a temporary folder to store the best model
    with TemporaryDirectory() as tempdir:
        # The path where the best model is stored
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')
        # Initially store the (untrained) model
        torch.save(model.state_dict(), best_model_params_path)
        # The current best accuracy, which will be updated whenever a better accuracy is found
        best_acc = 0.0

        # Start training for n epochs
        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)

            # Each epoch has a training and a validation phase
            for phase in ['train', 'test']:
                if phase == 'train':
                    model.train()
                else:
                    model.eval()

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                for input, label in zip(dataloaders[phase], resultloaders[phase]):
                    # ===== !!! Here !!! ======
                    # Use the functions created in Task 2+3 to convert sentences to indices and labels to vectors
                    # e.g., tensor([1070, 340, 2015, 2016, 45, 2017])
                    inputs_vector = prepare_sentence_sequence(input, word_to_ix)
                    # e.g., tensor([1., 0., 0., 0.])
                    labels_vector = one_hot_encode(label, tag_to_ix)
                    # ===== !!! End !!! ======

                    # zero the parameter gradients
                    optimizer.zero_grad()

                    # forward
                    # track history only if in train
                    with torch.set_grad_enabled(phase == 'train'):
                        # As in the earlier test, get the predicted score tensor for each emotion
                        outputs = model(inputs_vector)  # e.g., tensor([[-1.3948, -1.4476, -1.3804, -1.3261]])

                        # ===== !!! Here !!! ======
                        # Get the index of the maximum value
                        pred = torch.argmax(outputs).item()  # e.g., 2
                        # Only the inner layer [-1.3948, -1.4476, -1.3804, -1.3261] is compared with [0, 0, 1, 0]
                        loss = criterion(outputs[0], labels_vector)
                        # ===== !!! End !!! ======

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # statistics
                    running_loss += loss.item()
                    if pred == label:
                        running_corrects += 1

                if phase == 'train':
                    scheduler.step()

                # Calculate the loss and accuracy for each epoch
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects / dataset_sizes[phase]
                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f} Time elapsed: {round((time.time() - since))} sec.')

                # If a better accuracy is found, save the model
                if phase == 'test' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:4f}')

        # Load the best model weights
        model.load_state_dict(torch.load(best_model_params_path))
    return model
You will find that there are not many places to change; at most these:
...
# the input and label conversion
inputs_vector = prepare_sentence_sequence(input, word_to_ix)
labels_vector = one_hot_encode(label, tag_to_ix)
...
# get the prediction (index of the maximum score)
pred = torch.argmax(outputs).item()
# Calculate the loss
loss = criterion(outputs[0], labels_vector)
...
Now we can start training the model!
Training
Let’s first prepare the dataset for training:
# Before we do that, let's prepare the dataset for the model to use
dataloaders = {'train': train_dataset, 'test': test_dataset}
resultloaders = {'train': train_label_pd[0].tolist(), 'test': test_label_pd[0].tolist()}
dataset_sizes = {x: len(dataloaders[x]) for x in ['train', 'test']}
Firstly, let’s build the LSTM model!
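EMBEDDING_DIM and HIDDEN_DIM are used below but never defined in the snippets shown; following the assignment requirements, they are presumably:

EMBEDDING_DIM = 128  # embedding dim required by the assignment
HIDDEN_DIM = 256     # hidden dim required by the assignment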
# Build the model
# vocab_size needs +1 because words not in the vocab are replaced with index 5000, so the embedding needs one extra slot
model_LSTM = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix)+1, len(tag_to_ix), dropout=0.5)

loss_function_LSTM = nn.CrossEntropyLoss()
optimizer_LSTM = optim.SGD(model_LSTM.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler_LSTM = lr_scheduler.StepLR(optimizer_LSTM, step_size=7, gamma=0.1)

# Start training
modelLSTM = train_model(model_LSTM, loss_function_LSTM, optimizer_LSTM, exp_lr_scheduler_LSTM, num_epochs=30)
Result
Epoch 2/29
----------
train Loss: 0.9885 Acc: 0.5840 Time elapsed: 97 sec.
test Loss: 1.1279 Acc: 0.5236 Time elapsed: 104 sec.

Epoch 3/29
----------
train Loss: 0.8893 Acc: 0.6371 Time elapsed: 132 sec.
test Loss: 1.1053 Acc: 0.5369 Time elapsed: 139 sec.

Epoch 4/29
----------
train Loss: 0.7683 Acc: 0.7003 Time elapsed: 168 sec.
test Loss: 1.0772 Acc: 0.5658 Time elapsed: 175 sec.
...
test Loss: 1.1330 Acc: 0.6059 Time elapsed: 1040 sec.

Training complete in 17m 20s
Best val Acc: 0.610134
Then let’s build the GRU model!
# vocab_size needs +1 because words not in the vocab are replaced with index 5000, so the embedding needs one extra slot
modelGRU = GRUTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix)+1, len(tag_to_ix), dropout=0.5)

loss_function_gru = nn.CrossEntropyLoss()
optimizer_gru = optim.SGD(modelGRU.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler_gru = lr_scheduler.StepLR(optimizer_gru, step_size=7, gamma=0.1)

# Start training
modelGRU = train_model(modelGRU, loss_function_gru, optimizer_gru, exp_lr_scheduler_gru, num_epochs=30)
Result
Epoch 3/29
----------
train Loss: 0.8445 Acc: 0.6702 Time elapsed: 131 sec.
test Loss: 1.1211 Acc: 0.5327 Time elapsed: 138 sec.

Epoch 4/29
----------
train Loss: 0.6843 Acc: 0.7393 Time elapsed: 166 sec.
test Loss: 1.1305 Acc: 0.5707 Time elapsed: 173 sec.
...
test Loss: 1.3237 Acc: 0.6073 Time elapsed: 1003 sec.

Training complete in 16m 43s
Best val Acc: 0.608726