Introduction

Recently, I took an AI course, and this is the sixth assignment. The main topics covered are:

  1. Learn to use LSTM
  2. Use SpaCy

Homework Requirements

Train a text classification model on the TweetEval emotion recognition dataset using LSTMs and GRUs.

  1. Build an LSTM model: Follow the example described here. Use the same architecture, but:
    1. only use the last output of the LSTM in the loss function
    2. use an embedding dim of 128
    3. use a hidden dim of 256.
  2. Use SpaCy to split words: Use spaCy to split the tweets into words.
  3. Select the Top 5000 words: Limit your vocabulary (i.e., the words that you converted to an index) to the most frequent 5000 words and replace all other words with a placeholder index (e.g., 1001).
  4. Train the model and calculate accuracy: Evaluate the accuracy on the test set. (Note: If the training takes too long, try to use only a fraction of the training data.)
  5. Build and train a GRU model: Do the same, but this time use GRUs instead of LSTMs.

Task 0: Download the Dataset

In this section, we need to do the following:

  1. Download the dataset
  2. Use pandas to convert the dataset into the format we need

Download the Dataset

  1. Refer to this link to download the required data: TweetEval

git clone https://github.com/cardiffnlp/tweeteval.git

  2. After downloading, you will see the following structure; the emotion folder contains the data we will use this time:
.
├── README.md
├── TweetEval_Tutorial.ipynb
├── datasets
│   ├── README.txt
│   ├── emoji
│   ├── emotion                  # This is the data we need
│   │   ├── mapping.txt          # Maps label numbers to emotion names, e.g. {0: 'anger', 1: 'joy'}
│   │   ├── test_labels.txt      # Emotion labels for the test data (the answers), e.g. 0
│   │   ├── test_text.txt        # Text of the test data, e.g. "I'm so angry"
│   │   ├── train_labels.txt     # Emotion labels for the training data (the answers), e.g. 0
│   │   ├── train_text.txt       # Text of the training data, e.g. "I'm so angry"
│   │   ├── val_labels.txt       # Emotion labels for the validation data (the answers), e.g. 0
│   │   └── val_text.txt         # Text of the validation data, e.g. "I'm so angry"
...

Convert the Data Format

First, we import the required packages:

# PyTorch
import torch.nn.functional as F
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

# others
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

# dataset
import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import Flowers102

# read file
import pandas as pd

# label
from scipy.io import loadmat
import json

Then we convert the data into the format we need. This time we use pandas to process the data and read it into variables for later use.

Make sure to change the root path to the folder path of your git clone!

# Set the relative path of each file first
root = '../../Data/tweeteval/datasets/emotion/'
mapping_file = os.path.join(root, 'mapping.txt')
test_labels_file = os.path.join(root, 'test_labels.txt')
test_text_file = os.path.join(root, 'test_text.txt')
train_labels_file = os.path.join(root, 'train_labels.txt')
train_text_file = os.path.join(root, 'train_text.txt')
val_labels_file = os.path.join(root, 'val_labels.txt')
val_text_file = os.path.join(root, 'val_text.txt')

# Use pandas to read the label files
mapping_pd = pd.read_csv(mapping_file, sep='\t', header=None)
test_label_pd = pd.read_csv(test_labels_file, sep='\t', header=None)
train_label_pd = pd.read_csv(train_labels_file, sep='\t', header=None)
val_label_pd = pd.read_csv(val_labels_file, sep='\t', header=None)

# Split the text files on '\n' and drop the trailing empty element,
# because each file ends with a newline; after dropping it, the length matches the number of labels
test_dataset = open(test_text_file).read().split('\n')[:-1]   # remove last empty line
train_dataset = open(train_text_file).read().split('\n')[:-1] # remove last empty line
val_dataset = open(val_text_file).read().split('\n')[:-1]     # remove last empty line


# Print the length of the dataset
print(f'len(train_dataset)= {len(train_dataset)}')
print(f'len(train_label_pd)= {len(train_label_pd)}')
print(f'=== train_label_pd === \n{train_label_pd.value_counts()}')
print(f'len(test_dataset)= {len(test_dataset)}')
print(f'len(test_label_pd)= {len(test_label_pd)}')
print(f'=== test_label_pd === \n{test_label_pd.value_counts()}')

Result

len(train_dataset)= 3257
len(train_label_pd)= 3257
=== train_label_pd ===
0 1400
3 855
1 708
2 294
Name: count, dtype: int64
len(test_dataset)= 1421
len(test_label_pd)= 1421
=== test_label_pd ===
0 558
3 382
1 358
2 123
Name: count, dtype: int64

Task 1 + 5: Build LSTM, GRU Models

  1. Build an LSTM model: Follow the example described here. Use the same architecture, but:
    1. only use the last output of the LSTM in the loss function
    2. use an embedding dim of 128
    3. use a hidden dim of 256.
  2. Build and train a GRU model: Do the same, but this time use GRUs instead of LSTMs.

From the official example, we can learn how to build an LSTM model, which basically includes the following elements:

  • hidden_dim: The dimension of the hidden layer, i.e., the number of neurons in the hidden layer.
  • word_embeddings: Converts each word in the input sentence into a word vector (see the short example after this list).
    • nn.Embedding(vocab_size, embedding_dim):
      • vocab_size: The size of the dictionary, i.e., the total number of word indices. In this example it is 5001: 5000 common words + 1 placeholder for unrecognized words.
      • embedding_dim: The size of the vector each word index is mapped to. For instance, if embedding_dim is 6 and the input index vector is [1, 2, 3, 5], each of the four indices is mapped to its own 6-dimensional vector, so the result is a 4×6 matrix of word vectors.
    • lstm(input_size, hidden_size, dropout)
      • input_size: The dimension of the input, which is our word vector dimension (embedding_dim).
      • hidden_size: The dimension of the hidden layer, i.e., the number of neurons in the hidden layer.
      • dropout: The dropout proportion; the default is 0, meaning no dropout is used. (PyTorch only applies this dropout between stacked recurrent layers, so it has no effect on a single-layer LSTM.)
    • hidden2tag(in_features, out_features)
      • in_features: The input dimension, which is the hidden state dimension (hidden_dim).
      • out_features: The output dimension, which is the number of emotion labels.
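
To make the embedding step concrete, here is a minimal sketch (toy dimensions chosen just for illustration, not the assignment's 128) showing that nn.Embedding maps each index to its own vector:

import torch
import torch.nn as nn

# Toy embedding: 10 possible indices, each mapped to a 6-dimensional vector
embedding = nn.Embedding(num_embeddings=10, embedding_dim=6)

indices = torch.tensor([1, 2, 3, 5], dtype=torch.long)  # index vector for a 4-word sentence
vectors = embedding(indices)
print(vectors.shape)  # torch.Size([4, 6]) -- one 6-dimensional vector per word

With those pieces in mind, the LSTMTagger for the assignment looks like this: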
class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        # Convert the input word into a word vector
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        # Convert the input words into word vectors. The sentence is already an index vector
        embeds = self.word_embeddings(sentence)
        # Use the index vector as the input of the LSTM model to get the output and hidden state of the LSTM layer
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        # Take only the last output of the LSTM
        last_output = lstm_out[-1].view(1, -1)
        tag_space = self.hidden2tag(last_output)  # Use the last output of the LSTM model to map to the tag space
        tag_scores = F.log_softmax(tag_space, dim=1)  # Use log_softmax to convert to (log-)probabilities
        return tag_scores

GRU and LSTM are similar; the only things to modify are:

class GRUTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        ...
        # Here !!! Change to GRU
        self.gru = nn.GRU(embedding_dim, hidden_dim, dropout=dropout)

    def forward(self, sentence):
        ...
        # Here !!! Change to GRU
        # Use the index vector as the input of the GRU model to get the output and hidden state of the GRU layer
        gru_out, _ = self.gru(embeds.view(len(sentence), 1, -1))
        # Selecting the last output only, to satisfy the assignment requirement
        last_output = gru_out[-1].view(1, -1)
        ...

After the above modifications, the complete GRU model is as follows:

class GRUTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        super(GRUTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, dropout=dropout)  # <== Here !
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        gru_out, _ = self.gru(embeds.view(len(sentence), 1, -1))  # <== Here !
        last_output = gru_out[-1].view(1, -1)  # <== Here !
        tag_space = self.hidden2tag(last_output)
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
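
Before moving on, a quick sanity check never hurts. The sketch below is my own illustration (dummy indices, the assignment's embedding and hidden dimensions, and a vocabulary of 5001 = 5000 words + 1 placeholder); it just confirms that the forward pass returns one row of log-probabilities over the four emotion classes:

EMBEDDING_DIM = 128
HIDDEN_DIM = 256

# Instantiate the GRU tagger defined above with 4 emotion classes
model_check = GRUTagger(EMBEDDING_DIM, HIDDEN_DIM, vocab_size=5001, tagset_size=4)

dummy_sentence = torch.tensor([12, 7, 4999, 5000], dtype=torch.long)  # hypothetical index vector for a 4-word tweet
with torch.no_grad():
    scores = model_check(dummy_sentence)
print(scores.shape)  # torch.Size([1, 4]) -- one log-probability per emotion class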

Task 2 + 3: Split Words Using SpaCy, Find Top 5000 Words

We have already placed the necessary data into a list variable in Task 0, with each data entry being a sentence. Now, we need to do a few things:
2. Split Words Using SpaCy: Use spaCy to split the tweets into words.
3. Select Top 5000 Words: Limit your vocabulary (i.e., the words that you converted to an index) to the most frequent 5000 words and replace all other words with a placeholder index (e.g., 1001).

Install SpaCy

We need to execute the following commands to install the SpaCy package:

# If you are using Python3 
pip install -U spacy
# If you are using Anaconda
conda install -c conda-forge spacy

As we are analyzing English text, we need to download the English model. Execute the following command:

python -m spacy download en_core_web_sm

Only then can we import the spacy package in the notebook and use the English model.

If the above command is not executed, you will encounter an error at this line!!
nlp = spacy.load("en_core_web_sm")

import spacy 
from collections import Counter

# use spacy to tokenize the sentence with english model
nlp = spacy.load("en_core_web_sm") # <=== If the above command is not executed, you will encounter an error here!!

Prepare a Dictionary of Top 5000 Common Words

We need to identify the top 5000 common words and create a dictionary for this purpose:

  1. First, prepare a single string by concatenating all the sentences.
  2. Then, pass the entire string to spaCy for tokenization, filtering out punctuation (punct), stop words, and spaces.
  3. Use Counter (from collections) to count the words, which makes it easy to identify the top 5000 common words.
# Join all the sentences together
# e.g. ['today is good', 'today is bad'] => 'today is good today is bad'
text = ' '.join(train_dataset)

# Use spaCy to tokenize the text with the English model
doc = nlp(text)

# Count the words, filtering out punctuation, stop words, and whitespace
word_freq = Counter(token.text for token in doc
                    if not token.is_punct and
                       not token.is_stop and
                       not token.is_space)
word_freq

Result

Counter({'@user': 2019,  # '@user' appears 2019 times
'like': 212,
'amp': 148,
'people': 126,
'know': 96,
'think': 92,
'sad': 90,
'got': 85,
'day': 81,
'u': 80,
'time': 78,
'✨': 75,
'😂': 75,
'want': 74,
'life': 73,
'going': 69,
'feel': 67,
'angry': 66,
'2': 65,
...})

Next, we can select the top 5000 words based on the number of times they appear:

# select the top 5000 most common words 
most_common_words = word_freq.most_common(5000)
# Build a dictionary mapping words to indexes e.g. {'hello':0, 'like':1 ...}
vocab = {word[0]: idx for idx, word in enumerate(most_common_words)}

Convert Sentences to Tensors

With the vocab dictionary at hand, we can now convert sentences into an index format based on this dictionary. For example:

  • Original sentence: I like apple
  • Converted into index format: [100, 3923, 123]

But what if we encounter a word that we don’t understand or is not included in the dictionary?

  • Here, we also need a placeholder_index.
  • When a word in our sentence is not in the vocab dictionary, we convert that word to the placeholder_index.
  • We set this to 5000, representing an unrecognized word (the vocabulary itself uses indices 0–4999, so 5000 is free). For example:
    • Original sentence: I like jifw8evjk
    • Converted into index format: [100, 3923, 5000]
# Convert words to indexes; use the placeholder index 5000 for words that are not in the vocabulary
placeholder_index = 5000
# Store the result of the entire dataset converted to indexes
indexed_dataset = []
# Iterate over the entire dataset
for tweet in train_dataset:
    # An empty list to store the result for the current sentence (e.g. "I like apple" -> [100, 3923, 123])
    indexed_words = []
    # Use spaCy to split the sentence into words
    for token in nlp(tweet):
        # Filter out punctuation, stop words, and spaces
        if not token.is_punct and not token.is_stop and not token.is_space:
            word = token.text
            # If the word is in the top 5000 words in the vocab, convert it to its index
            if word in vocab:
                indexed_words.append(vocab[word])
            # Otherwise, convert it to the placeholder index
            else:
                indexed_words.append(placeholder_index)
    indexed_dataset.append(indexed_words)

# Print the converted data
print(indexed_dataset)

Result

[[2013, 3615, 269, 3616, 3617, 1426, 717, 86], [1069, 339, 2014, 2015, 44, 2016], ...] 

Based on the above, we can wrap this code into a function, which will make it convenient to convert sentences into index format later during training:

# Convert a sentence into a sequence of indexes
def prepare_sentence_sequence(seq, to_ix):
    idx = []
    # Use spaCy to tokenize the sentence
    for token in nlp(seq):
        # Filter out punctuation, stop words, and spaces
        if not token.is_punct and not token.is_stop and not token.is_space:
            word = token.text
            # If the token is in the top 5000 words in the vocab, add its index to the list
            if word in to_ix:
                idx.append(to_ix[word])
            else:
                # Otherwise add the placeholder index
                idx.append(placeholder_index)
    return torch.tensor(idx, dtype=torch.long)  # convert the list to a tensor
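
A quick usage check of this helper (the tweet below reuses the made-up word from the earlier example; the actual indices depend on the vocabulary you built):

# Hypothetical example: stop words/punctuation are dropped, unknown words become the placeholder 5000
print(prepare_sentence_sequence('I am so angry about this jifw8evjk', vocab))
# e.g. tensor([  17, 5000]) -- 'angry' gets its vocab index, 'jifw8evjk' becomes 5000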

Convert Labels to Tensors

Next, we need to process the labels. They also need to be converted into vectors so that the model's output can be compared with the correct answer:

  • We usually expect the model's output to look like this: [0.1, 0.2, 0.3, 0.4]
    • The values represent the probabilities of {0: 'anger', 1: 'joy', 2: 'optimism', 3: 'sadness'} respectively
  • When the answer is anger, we want the model's output to be as close to [1, 0, 0, 0] as possible
    • In other words, we need to one-hot encode the label into vector form before we can compare them
    • This lets us put the model's output [0.1, 0.2, 0.3, 0.4] and the correct answer [1, 0, 0, 0] into the loss function to compute the loss
    • So we need a function that converts a label into vector form; that function is one_hot_encode
# val is the index of the label (e.g. 2); to_ix is the label dictionary (e.g. {0: 'anger', 1: 'joy', ...})
def one_hot_encode(val, to_ix):
    # create an empty list to store the result
    result = []
    # iterate over the label dictionary
    for k, v in to_ix.items():
        # if the index of the label is the same as the dictionary key, we found the correct label
        if val == k:
            # append 1 to the list
            result.append(1)
        else:
            # append 0 to the list if the index is not the same
            result.append(0)
    return torch.tensor(result, dtype=torch.float32)  # convert list to tensor
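
As a side note (my addition, not part of the assignment code), PyTorch already ships an equivalent helper, torch.nn.functional.one_hot, which produces the same vector:

import torch
import torch.nn.functional as F

# Equivalent to the hand-written one_hot_encode above, for the 4 emotion classes
one_hot = F.one_hot(torch.tensor(2), num_classes=4).float()
print(one_hot)  # tensor([0., 0., 1., 0.])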

With that in place, we can build the label dictionary from mapping.txt and check that the conversion works:

# Because the label is a number, we need a dictionary to convert it to a vector
tag_to_ix = dict(zip(mapping_pd[0], mapping_pd[1]))  # Returns {0: 'anger', 1: 'joy', 2: 'optimism', 3: 'sadness'}
print(tag_to_ix)
print(f'ans=2; vector={one_hot_encode(2, tag_to_ix)}')

Result: We successfully converted 2 into the vector [0, 0, 1, 0]!

{0: 'anger', 1: 'joy', 2: 'optimism', 3: 'sadness'}
ans=2; vector=tensor([0., 0., 1., 0.])

Task 4: Train the Model and Calculate Accuracy

  1. Train the Model and Calculate Accuracy: Evaluate the accuracy on the test set. (Note: If the training takes too long, try to use only a fraction of the training data.)

Try Your Hand

Before starting to train the model, we need to first understand what our model’s input and output look like. Let’s see what the model predicts before it’s trained!
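
The snippet below refers to a few names that have not appeared in the code shown so far: word_to_ix, model, and loss_function (the real models are only built in the Training section). Here is a minimal, hypothetical setup, consistent with the assignment, that makes the snippet runnable on its own:

# Hypothetical setup for the demo below (the actual models are built in the Training section)
EMBEDDING_DIM = 128   # required by the assignment
HIDDEN_DIM = 256      # required by the assignment

word_to_ix = vocab    # the word -> index dictionary built in Task 2+3

# vocab_size is len(word_to_ix) + 1 to leave room for the placeholder index 5000
model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix) + 1, len(tag_to_ix))
loss_function = nn.CrossEntropyLoss()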

# See what the scores are before training
# Here we don't need to train, so the code is wrapped in torch.no_grad()
# Take one sentence as an example
sentence_idx = 1
# Prints: My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
print(f'First Sentence = {train_dataset[sentence_idx]}')

with torch.no_grad():
    # Convert the sentence into index format as a tensor
    inputs = prepare_sentence_sequence(train_dataset[sentence_idx], word_to_ix)
    print(f'Sentence to tensor = {inputs}')  # Prints: tensor([1070, 340, 2015, 2016, 45, 2017])
    # Then convert the answer to a tensor
    labels = one_hot_encode(train_label_pd[0][sentence_idx], tag_to_ix)
    print(f'Sentence of result to tensor = {labels}')  # Prints: tensor([1., 0., 0., 0.])

    # Send the inputs to the model and get the model's prediction
    outputs = model(inputs)
    print(f'tag_scores = {outputs}')  # Prints: tensor([[-1.3280, -1.4272, -1.4998, -1.3026]])

    # Take the maximum probability value and its index
    _, preds = torch.max(outputs, 1)
    print(f'preds = {preds}')  # Prints: preds = tensor([3])

    # Take out the index of the maximum probability value as a Python int
    result_idx = torch.argmax(outputs).item()
    print(f'result = {result_idx}, ans = {train_label_pd[0][sentence_idx]}')  # Prints: result = 3, ans = 0

    # Calculate the loss to see the difference between output and label. outputs[0] is used because the output has an extra batch dimension
    loss = loss_function(outputs[0], labels)
    print(f'loss = {loss}')

Result

First Sentence = My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
Sentence to tensor = tensor([1070, 340, 2015, 2016, 45, 2017])
Sentence of result to tensor = tensor([1., 0., 0., 0.])
tag_scores = tensor([[-1.3280, -1.4272, -1.4998, -1.3026]])
preds = tensor([3])
result = 3, ans = 0
loss = 1.32795250415802

Looks like it’s running pretty smoothly, right?
Here we go!

When we train the model, the output for each epoch will look like this:
Epoch 0/29 
----------
train Loss: 1.2157 Acc: 0.4642 Time elapsed: 25 sec.
test Loss: 1.2095 Acc: 0.4553 Time elapsed: 32 sec.

Epoch 1/29
----------
train Loss: 1.1019 Acc: 0.5333 Time elapsed: 58 sec.
test Loss: 1.1816 Acc: 0.4708 Time elapsed: 65 sec.

Epoch 2/29
----------
train Loss: 1.0151 Acc: 0.5812 Time elapsed: 92 sec.
test Loss: 1.1603 Acc: 0.4898 Time elapsed: 99 sec.
...
Training complete in 17m 5s
Best val Acc: 0.599578

Does the above training output look familiar? Yes, it does! If you have followed the article Flower102 Dataset - Training with Transfer Learning + Batch Normalization for CNN, this is the same style of training loop.
It lets us observe both the training and testing results in every epoch to check for overfitting.
Even if overfitting occurs, this approach still preserves the best model.

So let's get started with the train_model function; I'll mark the places we need to change with !!!!:

def train_model(model, criterion, optimizer, scheduler, num_epochs=1):
    # The time when training starts
    since = time.time()
    # Create a temporary folder to store the best model
    with TemporaryDirectory() as tempdir:
        # The path where the best model is stored
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')
        # Initially store the best model
        torch.save(model.state_dict(), best_model_params_path)
        # The current best accuracy, which will be updated if a better accuracy is found
        best_acc = 0.0

        # Start training for n epochs
        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)

            # Each epoch has a training and validation phase
            for phase in ['train', 'test']:
                if phase == 'train':
                    model.train()
                else:
                    model.eval()

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                for input, label in zip(dataloaders[phase], resultloaders[phase]):
                    # ===== !!! Here !!! ======
                    # Use the functions created in Task 2+3 to convert sentences to indices and labels to vectors
                    # e.g., tensor([1070, 340, 2015, 2016, 45, 2017])
                    inputs_vector = prepare_sentence_sequence(input, word_to_ix)
                    # e.g., tensor([1., 0., 0., 0.])
                    labels_vector = one_hot_encode(label, tag_to_ix)
                    # ===== !!! End !!! ======

                    # zero the parameter gradients
                    optimizer.zero_grad()

                    # forward
                    # track history if only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        # Similar to the earlier test
                        # Get the predicted result tensor for each emotion
                        outputs = model(inputs_vector)  # (e.g., tensor([[-1.3948, -1.4476, -1.3804, -1.3261]]))

                        # ===== !!! Here !!! ======
                        # Get the index of the maximum value
                        pred = torch.argmax(outputs).item()  # (e.g., 2)
                        # outputs has an extra batch dimension, so compute the loss between outputs[0]
                        # (e.g., [-1.3948, -1.4476, -1.3804, -1.3261]) and the one-hot label (e.g., [0, 0, 1, 0])
                        loss = criterion(outputs[0], labels_vector)
                        # ===== !!! End !!! ======

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # statistics
                    running_loss += loss.item()
                    if pred == label:
                        running_corrects += 1

                if phase == 'train':
                    scheduler.step()
                # Calculate the loss and accuracy for each epoch
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects / dataset_sizes[phase]
                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f} Time elapsed: {round((time.time() - since))} sec.')

                # If a better accuracy is found, save the model
                if phase == 'test' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:4f}')

        # Load the best model weights
        model.load_state_dict(torch.load(best_model_params_path))
        return model

You will find that there are not many places to change; essentially just:

...
# the input and label conversion
inputs_vector = prepare_sentence_sequence(input, word_to_ix)
labels_vector = one_hot_encode(label, tag_to_ix)
...
# Then get the prediction result tensor for each emotion
pred = torch.argmax(outputs).item()
# Calculate the loss
loss = criterion(outputs[0], labels_vector)
...

Now we can start training the model!

Training

Let’s first prepare the dataset for training:

# Before we do that, let's prepare the dataset for the model to use 
dataloaders = {'train': train_dataset, 'test': test_dataset}
resultloaders = {'train': train_label_pd[0].tolist(), 'test': test_label_pd[0].tolist()}
dataset_sizes = {x: len(dataloaders[x]) for x in ['train', 'test']}
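
Each element fed to train_model is simply a (tweet, label) pair; here is a quick peek (the printed values depend on your data):

# Look at one (tweet, label) pair as train_model will see it
for tweet, label in zip(dataloaders['train'], resultloaders['train']):
    print(tweet, '->', label)
    break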

Firstly, let’s build the LSTM model!

# Build the model
# vocab_size needs +1 because words that are not in the vocab are replaced with the placeholder index 5000
model_LSTM = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix)+1, len(tag_to_ix), dropout=0.5)
loss_function_LSTM = nn.CrossEntropyLoss()
optimizer_LSTM = optim.SGD(model_LSTM.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler_LSTM = lr_scheduler.StepLR(optimizer_LSTM, step_size=7, gamma=0.1)

# Start training
model_LSTM = train_model(model_LSTM, loss_function_LSTM, optimizer_LSTM, exp_lr_scheduler_LSTM, num_epochs=30)

Result

Epoch 2/29
----------
train Loss: 0.9885 Acc: 0.5840 Time elapsed: 97 sec.
test Loss: 1.1279 Acc: 0.5236 Time elapsed: 104 sec.

Epoch 3/29
----------
train Loss: 0.8893 Acc: 0.6371 Time elapsed: 132 sec.
test Loss: 1.1053 Acc: 0.5369 Time elapsed: 139 sec.

Epoch 4/29
----------
train Loss: 0.7683 Acc: 0.7003 Time elapsed: 168 sec.
test Loss: 1.0772 Acc: 0.5658 Time elapsed: 175 sec.
...
test Loss: 1.1330 Acc: 0.6059 Time elapsed: 1040 sec.

Training complete in 17m 20s
Best val Acc: 0.610134

Then let’s build the GRU model!

# vocab_size needs +1 because words that are not in the vocab are replaced with the placeholder index 5000
modelGRU = GRUTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix)+1, len(tag_to_ix), dropout=0.5)
loss_function_gru = nn.CrossEntropyLoss()
optimizer_gru = optim.SGD(modelGRU.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler_gru = lr_scheduler.StepLR(optimizer_gru, step_size=7, gamma=0.1)

# Start training
modelGRU = train_model(modelGRU, loss_function_gru, optimizer_gru, exp_lr_scheduler_gru, num_epochs=30)

Result

Epoch 3/29
----------
train Loss: 0.8445 Acc: 0.6702 Time elapsed: 131 sec.
test Loss: 1.1211 Acc: 0.5327 Time elapsed: 138 sec.

Epoch 4/29
----------
train Loss: 0.6843 Acc: 0.7393 Time elapsed: 166 sec.
test Loss: 1.1305 Acc: 0.5707 Time elapsed: 173 sec.
...
test Loss: 1.3237 Acc: 0.6073 Time elapsed: 1003 sec.

Training complete in 16m 43s
Best val Acc: 0.608726
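
As a final check (my own sketch, not part of the assignment code), the best model returned by train_model can be used to classify a single tweet by reusing the helper functions above:

# Hypothetical inference example with the trained GRU model
tweet = "I can't believe they cancelled the show, this is so frustrating"

modelGRU.eval()
with torch.no_grad():
    inputs = prepare_sentence_sequence(tweet, word_to_ix)  # tweet -> index tensor
    outputs = modelGRU(inputs)                             # log-probabilities, shape (1, 4)
    pred = torch.argmax(outputs).item()                    # index of the most likely emotion

print(f'Predicted emotion: {tag_to_ix[pred]}')             # e.g. 'anger' (actual output depends on training)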