Train a text classification model on the TweetEval emotion recognition dataset using LSTMs and GRUs.
Build an LSTM model: Follow the example described here. Use the same architecture, but:
only use the last output of the LSTM in the loss function
use an embedding dim of 128
use a hidden dim of 256.
Tokenize with spaCy: Use spaCy to split the tweets into words.
Pick the top-5000 words: Limit your vocabulary (i.e. the words that you converted to an index) to the most frequent 5000 words and replace all other words with a placeholder index (e.g. 1001).
Train the model and measure accuracy: Evaluate the accuracy on the test set. (Note: If the training takes too long, try to use only a fraction of the training data.)
Build and train a GRU model: Do the same, but this time use GRUs instead of LSTMs.
# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler

# others
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

# dataset
import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import Flowers102

# read file
import pandas as pd

# label
from scipy.io import loadmat
import json
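The tokenization and vocabulary steps below also rely on spaCy and `collections.Counter`, which are missing from the import block above. A minimal setup sketch, assuming the small English pipeline is installed:

```python
# NLP: spaCy for tokenization, Counter for word frequencies
import spacy
from collections import Counter

# Assumes `python -m spacy download en_core_web_sm` has been run beforehand
nlp = spacy.load('en_core_web_sm')
```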
Next, we convert the data into the format we need. Here we use pandas to process the data and read it into variables so it is easy to work with later.
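The loading code itself is not shown here. Below is a minimal sketch of what it could look like, assuming the file layout of the TweetEval repository (`datasets/emotion/train_text.txt`, `train_labels.txt`, and the corresponding test files) and the variable names used later on (`train_dataset` as a list of tweets, `train_label_pd` as a single-column pandas DataFrame of label ids):

```python
# Assumed paths, following the layout of the cardiffnlp/tweeteval repository
DATA_DIR = 'datasets/emotion'

def read_lines(path):
    # One tweet (or one label) per line
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f]

train_dataset = read_lines(f'{DATA_DIR}/train_text.txt')   # list of tweet strings
test_dataset = read_lines(f'{DATA_DIR}/test_text.txt')

# Labels as single-column DataFrames, so train_label_pd[0][i] is the i-th label id
train_label_pd = pd.read_csv(f'{DATA_DIR}/train_labels.txt', header=None)
test_label_pd = pd.read_csv(f'{DATA_DIR}/test_labels.txt', header=None)
```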
# Model definition. Only the LSTM layer, the output layer and forward() were shown in the
# original snippet; the class wrapper follows the PyTorch sequence-model tutorial referenced above.
class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        # Convert every word in the sentence into its embedding vector
        # (at this point `sentence` is already a tensor of word indices)
        embeds = self.word_embeddings(sentence)
        # Feed the embeddings through the LSTM to get its outputs and hidden state
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        # Take only the last output of the LSTM, as required by the task
        last_output = lstm_out[-1].view(1, -1)
        # Map the last LSTM output into tag space...
        tag_space = self.hidden2tag(last_output)
        # ...and turn the tag scores into log-probabilities
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
In Task 0 we already put all the data we need into lists, where every entry is one sentence. Now there are a few things to do:
2. Tokenize with spaCy: Use spaCy to split the tweets into words.
3. Pick the top-5000 words: Limit your vocabulary (i.e. the words that you converted to an index) to the most frequent 5000 words and replace all other words with a placeholder index (e.g. 1001).
# Join all the sentences together,
# e.g. ['today is good', 'today is bad'] => 'today is good today is bad'
text = ' '.join(train_dataset)

# Use spaCy to tokenize the joined text
doc = nlp(text)

# Count word frequencies, filtering out punctuation, stop words and whitespace
word_freq = Counter(
    token.text for token in doc
    if not token.is_punct and not token.is_stop and not token.is_space
)
word_freq
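The step that turns `word_freq` into the actual 5000-word vocabulary used below (`vocab`, also referred to as `word_to_ix` later on) is not shown. A minimal sketch, assuming indices 0-4999 for the most frequent words:

```python
# Keep only the 5000 most frequent words and give each one an index 0..4999
vocab = {word: idx for idx, (word, _) in enumerate(word_freq.most_common(5000))}

# Later snippets refer to the same mapping as word_to_ix
word_to_ix = vocab
```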
# Convert words to indices; out-of-vocabulary words get the placeholder index 5000,
# since the top-5000 words occupy indices 0-4999
placeholder_index = 5000

# Holds the whole dataset converted into index lists
indexed_dataset = []
for tweet in train_dataset:
    # Result for the current sentence (e.g. "I like apple" -> [100, 3923, 123])
    indexed_words = []
    # Split the sentence into tokens with spaCy
    for token in nlp(tweet):
        # Make sure the token is not punctuation, a stop word, or whitespace
        if not token.is_punct and not token.is_stop and not token.is_space:
            word = token.text
            if word in vocab:
                # The word is among our 5000 most frequent words: use its index
                indexed_words.append(vocab[word])
            else:
                # Otherwise use the placeholder index
                indexed_words.append(placeholder_index)
    indexed_dataset.append(indexed_words)
Based on the explanation above, we can wrap this code into a function, so that during training we can conveniently convert a sentence into a list of indices:
# Convert a sentence into a sequence of word indices
def prepare_sentence_sequence(seq, to_ix):
    idx = []
    # Use spaCy to tokenize the sentence
    for token in nlp(seq):
        # Filter out punctuation, stop words and whitespace
        if not token.is_punct and not token.is_stop and not token.is_space:
            word = token.text
            # If the token is among the top-5000 words in the vocab, add its index to the list
            if word in to_ix:
                idx.append(to_ix[word])
            else:
                # Otherwise add the placeholder index
                idx.append(placeholder_index)
    # Convert the list into a tensor
    return torch.tensor(idx, dtype=torch.long)
Converting the labels to tensors
Next we deal with the labels. They also need to be converted into tensors, so that the model's output can be compared against the ground truth:
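The helper used for this (`one_hot_encode`) and the label mapping `tag_to_ix` are not shown in the write-up. A minimal sketch that matches the `tensor([1., 0., 0., 0.])` output below, assuming the labels are TweetEval's integer ids (0 = anger, 1 = joy, 2 = optimism, 3 = sadness):

```python
# The emotion subset has four classes; the labels file already contains the ids 0-3
tag_to_ix = {0: 0, 1: 1, 2: 2, 3: 3}

def one_hot_encode(label, to_ix):
    # Build a one-hot float vector, e.g. label 0 -> tensor([1., 0., 0., 0.])
    vec = torch.zeros(len(to_ix), dtype=torch.float)
    vec[to_ix[label]] = 1.0
    return vec
```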
# See what the scores are before training.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
sentence_idx = 1  # pick a sample sentence to test
# prints: My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
print(f'First Sentence = {train_dataset[sentence_idx]}')

with torch.no_grad():
    # Convert the sentence into its index form as a tensor
    inputs = prepare_sentence_sequence(train_dataset[sentence_idx], word_to_ix)
    print(f'Sentence to tensor = {inputs}')
    # prints: tensor([1070, 340, 2015, 2016, 45, 2017])

    # Convert the ground-truth label into a tensor
    labels = one_hot_encode(train_label_pd[0][sentence_idx], tag_to_ix)
    print(f'Sentence of result to tensor = {labels}')
    # prints: tensor([1., 0., 0., 0.])

    # Feed the inputs into the model to get its predictions
    outputs = model(inputs)
    print(f'tag_scores = {outputs}')
    # prints: tensor([[-1.3280, -1.4272, -1.4998, -1.3026]])

# Take the index of the largest score as the predicted class
result_idx = torch.argmax(outputs).item()
print(f'result = {result_idx}, ans = {train_label_pd[0][sentence_idx]}')
# prints: result = 3, ans = 0

# Compute the loss to see the gap between output and label;
# outputs[0] is used because the output has an extra outer dimension
loss = loss_function(outputs[0], labels)
print(f'loss = {loss}')
Output:
First Sentence = My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
Sentence to tensor = tensor([1070,  340, 2015, 2016,   45, 2017])
Sentence of result to tensor = tensor([1., 0., 0., 0.])
tag_scores = tensor([[-1.3280, -1.4272, -1.4998, -1.3026]])
loss = 1.32795250415802
preds = tensor([3])
result = 3, ans = 0
Looks like everything runs pretty smoothly, right?
Now let's get started for real!
Preparing the training function
Here, in every training epoch, I want to:
Print the training loss and accuracy to keep an eye on how the model is doing.
Keep the best model seen so far.
Measure the training time.
The expected output looks something like this:
Epoch 0/29
----------
train Loss: 1.2157 Acc: 0.4642 Time elapsed: 25 sec.   -- training loss and accuracy, to monitor the model
test Loss: 1.2095 Acc: 0.4553 Time elapsed: 32 sec.    -- test loss and accuracy, to monitor the model

Epoch 1/29
----------
train Loss: 1.1019 Acc: 0.5333 Time elapsed: 58 sec.
test Loss: 1.1816 Acc: 0.4708 Time elapsed: 65 sec.

Epoch 2/29
----------
train Loss: 1.0151 Acc: 0.5812 Time elapsed: 92 sec.
test Loss: 1.1603 Acc: 0.4898 Time elapsed: 99 sec.
...
Training complete in 17m 5s   -- total training time over all epochs
Best val Acc: 0.599578        -- best accuracy; the model with this accuracy is the one we keep
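One piece the write-up does not show is how the model, loss function and optimizer are set up before the training loop. Below is a minimal sketch consistent with the snippets above; the class name `LSTMTagger` comes from the reconstruction of the model code, while the SGD settings and the `dataloaders`/`resultloaders` dictionaries are assumptions. Using `nn.CrossEntropyLoss` on the (already log-softmaxed) outputs with one-hot targets matches the loss value printed in the quick test, since log-probabilities are unchanged by a second log-softmax, but the exact choice of loss is also an assumption.

```python
EMBEDDING_DIM = 128    # from the task description
HIDDEN_DIM = 256       # from the task description
VOCAB_SIZE = 5000 + 1  # top-5000 word indices plus the placeholder index 5000
NUM_CLASSES = 4        # TweetEval emotion: anger, joy, optimism, sadness

# Assumed model/loss/optimizer setup
model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, VOCAB_SIZE, NUM_CLASSES)
criterion = nn.CrossEntropyLoss()  # with one-hot targets this reproduces the printed loss
loss_function = criterion          # the quick-test snippet uses this name
optimizer = optim.SGD(model.parameters(), lr=0.1)

# The training loop iterates over plain lists of sentences and labels
dataloaders = {'train': train_dataset, 'test': test_dataset}
resultloaders = {'train': train_label_pd[0].tolist(), 'test': test_label_pd[0].tolist()}
num_epochs = 30  # matches the "Epoch 0/29" log above
```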
# Train for num_epochs epochs
for epoch in range(num_epochs):
    print(f'Epoch {epoch}/{num_epochs - 1}')
    print('-' * 10)

    # Each epoch has a training and validation phase
    for phase in ['train', 'test']:
        if phase == 'train':
            model.train()
        else:
            model.eval()

        running_loss = 0.0
        running_corrects = 0

        # Iterate over the data.
        for input, label in zip(dataloaders[phase], resultloaders[phase]):
            # ===== !!! Here !!! ======
            # Use the functions built in Tasks 2+3 to convert the sentence into
            # index form and the label into a vector.
            # e.g. tensor([1070, 340, 2015, 2016, 45, 2017])
            inputs_vector = prepare_sentence_sequence(input, word_to_ix)
            # e.g. tensor([1., 0., 0., 0.])
            labels_vector = one_hot_encode(label, tag_to_ix)
            # ===== !!! End !!! ======

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward
            # track history only if in train
            with torch.set_grad_enabled(phase == 'train'):
                # Same as in the quick test above:
                # get the prediction scores for every emotion
                # (e.g. tensor([[-1.3948, -1.4476, -1.3804, -1.3261]]))
                outputs = model(inputs_vector)

                # ===== !!! Here !!! ======
                # Take the index of the highest score as the prediction (e.g. 2)
                pred = torch.argmax(outputs).item()
                # outputs has an extra outer dimension, so compute the loss between the inner
                # vector [-1.3948, -1.4476, -1.3804, -1.3261] and e.g. [0, 0, 1, 0]
                loss = criterion(outputs[0], labels_vector)
                # ===== !!! End !!! ======

                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

            # statistics
            running_loss += loss.item()
            if pred == label:
                running_corrects += 1
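The bookkeeping that produces the per-epoch log lines and keeps the best model is not shown above. A minimal sketch of what it could look like, with the setup lines placed once before the epoch loop and the statistics placed inside the `for phase in ['train', 'test']:` loop right after the batch loop; the names `since`, `best_acc` and `best_model_wts` are assumptions:

```python
import copy

# Assumed bookkeeping, initialised once before the epoch loop:
since = time.time()
best_acc = 0.0
best_model_wts = copy.deepcopy(model.state_dict())

# Inside the phase loop, right after the batch loop:
epoch_loss = running_loss / len(dataloaders[phase])
epoch_acc = running_corrects / len(dataloaders[phase])
print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f} '
      f'Time elapsed: {int(time.time() - since)} sec.')

# Keep a copy of the weights whenever the test accuracy improves
if phase == 'test' and epoch_acc > best_acc:
    best_acc = epoch_acc
    best_model_wts = copy.deepcopy(model.state_dict())

# After all epochs:
time_total = time.time() - since
print(f'Training complete in {int(time_total // 60)}m {int(time_total % 60)}s')
print(f'Best val Acc: {best_acc:4f}')
model.load_state_dict(best_model_wts)  # restore the best weights
```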
Epoch 2/29
----------
train Loss: 0.9885 Acc: 0.5840 Time elapsed: 97 sec.
test Loss: 1.1279 Acc: 0.5236 Time elapsed: 104 sec.

Epoch 3/29
----------
train Loss: 0.8893 Acc: 0.6371 Time elapsed: 132 sec.
test Loss: 1.1053 Acc: 0.5369 Time elapsed: 139 sec.

Epoch 4/29
----------
train Loss: 0.7683 Acc: 0.7003 Time elapsed: 168 sec.
test Loss: 1.0772 Acc: 0.5658 Time elapsed: 175 sec.
...
test Loss: 1.1330 Acc: 0.6059 Time elapsed: 1040 sec.

Training complete in 17m 20s
Best val Acc: 0.610134
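The second training run below should, per the last task in the list, be the GRU variant; its model definition is not included in the write-up. A minimal sketch, assuming it mirrors the LSTM classifier with `nn.GRU` swapped in:

```python
class GRUTagger(nn.Module):
    """Hypothetical GRU counterpart of the LSTM classifier above."""

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, dropout=0.0):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # Same architecture as before, with nn.GRU in place of nn.LSTM
        self.gru = nn.GRU(embedding_dim, hidden_dim, dropout=dropout)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        gru_out, _ = self.gru(embeds.view(len(sentence), 1, -1))
        # Again, only the last output feeds the classifier
        last_output = gru_out[-1].view(1, -1)
        tag_space = self.hidden2tag(last_output)
        return F.log_softmax(tag_space, dim=1)
```

Everything else (vocabulary, label encoding, and the training loop) can stay exactly the same; only the model instance changes.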
Epoch 3/29
----------
train Loss: 0.8445 Acc: 0.6702 Time elapsed: 131 sec.
test Loss: 1.1211 Acc: 0.5327 Time elapsed: 138 sec.

Epoch 4/29
----------
train Loss: 0.6843 Acc: 0.7393 Time elapsed: 166 sec.
test Loss: 1.1305 Acc: 0.5707 Time elapsed: 173 sec.
...
test Loss: 1.3237 Acc: 0.6073 Time elapsed: 1003 sec.

Training complete in 16m 43s
Best val Acc: 0.608726