Preface

I recently took an AI course. This is the fourth assignment, and it covers the following topics:

  1. Pick a dataset and train a model on it.
  2. Transfer Learning - Fine Tuning.
  3. Batch Normalization in CNN.

The main references are the following sites:

  1. Flower102 Dataset
  2. Transfer Learning
  3. DataSet of Pytorch
  4. Models for transfer learning
  5. Shannon’s Blog of Transfer Learning
  6. Resnet18

Assignment Requirements

Task:

  1. Choose a dataset: Check out the torchvision DataSet of Pytorch and decide one dataset that you want to use (no CIFAR, no ImageNet, no FashionMNIST).
  2. Show example images and the dataset size: Show some example images of the dataset in the notebook and print the dataset size.
  3. Build a CNN with Batch Normalization: Design a CNN to predict on the dataset. Use a similar architecture like last time, but this time also include batch normalization layers.
  4. Train the model and report the test accuracy: Train the model on the dataset and measure the accuracy on hold out test data.
  5. Transfer learning with ResNet18: Now use transfer learning to use a pre-trained ResNet18 on the dataset as follows:
    1. Keep the pre-trained weights fixed: ResNet18 as fixed feature extractor.
    2. Fine-tune ResNet18: ResNet18 finetuned on the training data.
  6. Fine-tune EfficientNet_B5: Repeat step 4 but now use EfficientNet_B5 instead of ResNet18.
  7. Compare the approaches and print the accuracy: Compare the accuracy of the different approaches on the test data and print out the training times for each approach.

Task 0 - Import Packages

First, import the packages we need:

# CNN 
import torch.nn.functional as F
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn

# others
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

# dataset
import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import Flowers102

# label
from scipy.io import loadmat
import json

cudnn.benchmark = True
plt.ion() # interactive mode

Task 1 - Choose a Dataset

Choose a dataset: Check out the torchvision DataSet of Pytorch and decide one dataset that you want to use (no CIFAR, no ImageNet, no FashionMNIST).

To get a feel for transfer learning and keep training fast, we use Flowers102 as our dataset. Flowers102 does not come with human-readable class names; most examples I found online load a pre-built .json or .txt file that maps each label index to a flower name.

# Set the dataset download path and the batch size (how many samples are used per training step)
batch_size = 4
data_dir = '../../Data/flowers-102'
# Build the class-name mapping for the dataset
json_data = '{"21": "fire lily", "3": "canterbury bells", "45": "bolero deep blue", "1": "pink primrose", "34": "mexican aster", "27": "prince of wales feathers", "7": "moon orchid", "16": "globe-flower", "25": "grape hyacinth", "26": "corn poppy", "79": "toad lily", "39": "siam tulip", "24": "red ginger", "67": "spring crocus", "35": "alpine sea holly", "32": "garden phlox", "10": "globe thistle", "6": "tiger lily", "93": "ball moss", "33": "love in the mist", "9": "monkshood", "102": "blackberry lily", "14": "spear thistle", "19": "balloon flower", "100": "blanket flower", "13": "king protea", "49": "oxeye daisy", "15": "yellow iris", "61": "cautleya spicata", "31": "carnation", "64": "silverbush", "68": "bearded iris", "63": "black-eyed susan", "69": "windflower", "62": "japanese anemone", "20": "giant white arum lily", "38": "great masterwort", "4": "sweet pea", "86": "tree mallow", "101": "trumpet creeper", "42": "daffodil", "22": "pincushion flower", "2": "hard-leaved pocket orchid", "54": "sunflower", "66": "osteospermum", "70": "tree poppy", "85": "desert-rose", "99": "bromelia", "87": "magnolia", "5": "english marigold", "92": "bee balm", "28": "stemless gentian", "97": "mallow", "57": "gaura", "40": "lenten rose", "47": "marigold", "59": "orange dahlia", "48": "buttercup", "55": "pelargonium", "36": "ruby-lipped cattleya", "91": "hippeastrum", "29": "artichoke", "71": "gazania", "90": "canna lily", "18": "peruvian lily", "98": "mexican petunia", "8": "bird of paradise", "30": "sweet william", "17": "purple coneflower", "52": "wild pansy", "84": "columbine", "12": "colt\'s foot", "11": "snapdragon", "96": "camellia", "23": "fritillary", "50": "common dandelion", "44": "poinsettia", "53": "primula", "72": "azalea", "65": "californian poppy", "80": "anthurium", "76": "morning glory", "37": "cape flower", "56": "bishop of llandaff", "60": "pink-yellow dahlia", "82": "clematis", "58": "geranium", "75": "thorn apple", "41": "barbeton daisy", "95": "bougainvillea", "43": "sword lily", "83": "hibiscus", "78": "lotus lotus", "88": "cyclamen", "94": "foxglove", "81": "frangipani", "74": "rose", "89": "watercress", "73": "water lily", "46": "wallflower", "77": "passion flower", "51": "petunia"}'
# load data
cat_to_name = json.loads(json_data)
# Convert the keys to int and subtract 1: the dataset's labels start at 0, but this JSON starts at 1
cat_to_name = {int(k)-1:v for k,v in cat_to_name.items()}
# Sort by key, convert back to a dict, and print it
class_names = dict(sorted(cat_to_name.items()))
print(class_names)

The result:

{0: 'pink primrose', 1: 'hard-leaved pocket orchid', 2: 'canterbury bells', 3: 'sweet pea', 4: 'english marigold', 5: 'tiger lily', 6: 'moon orchid', 7: 'bird of paradise', 8: 'monkshood', 9: 'globe thistle', 10: 'snapdragon', 11: "colt's foot", 12: 'king protea', 13: 'spear thistle', 14: 'yellow iris', 15: 'globe-flower', 16: 'purple coneflower', 17: 'peruvian lily', 18: 'balloon flower', 19: 'giant white arum lily', 20: 'fire lily', 21: 'pincushion flower', 22: 'fritillary', 23: 'red ginger', 24: 'grape hyacinth', 25: 'corn poppy', 26: 'prince of wales feathers', 27: 'stemless gentian', 28: 'artichoke', 29: 'sweet william', 30: 'carnation', 31: 'garden phlox', 32: 'love in the mist', 33: 'mexican aster', 34: 'alpine sea holly', 35: 'ruby-lipped cattleya', 36: 'cape flower', 37: 'great masterwort', 38: 'siam tulip', 39: 'lenten rose', 40: 'barbeton daisy', 41: 'daffodil', 42: 'sword lily', 43: 'poinsettia', 44: 'bolero deep blue', 45: 'wallflower', 46: 'marigold', 47: 'buttercup', 48: 'oxeye daisy', 49: 'common dandelion', 50: 'petunia', 51: 'wild pansy', 52: 'primula', 53: 'sunflower', 54: 'pelargonium', 55: 'bishop of llandaff', 56: 'gaura', 57: 'geranium', 58: 'orange dahlia', 59: 'pink-yellow dahlia', 60: 'cautleya spicata', 61: 'japanese anemone', 62: 'black-eyed susan', 63: 'silverbush', 64: 'californian poppy', 65: 'osteospermum', 66: 'spring crocus', 67: 'bearded iris', 68: 'windflower', 69: 'tree poppy', 70: 'gazania', 71: 'azalea', 72: 'water lily', 73: 'rose', 74: 'thorn apple', 75: 'morning glory', 76: 'passion flower', 77: 'lotus lotus', 78: 'toad lily', 79: 'anthurium', 80: 'frangipani', 81: 'clematis', 82: 'hibiscus', 83: 'columbine', 84: 'desert-rose', 85: 'tree mallow', 86: 'magnolia', 87: 'cyclamen', 88: 'watercress', 89: 'canna lily', 90: 'hippeastrum', 91: 'bee balm', 92: 'ball moss', 93: 'foxglove', 94: 'bougainvillea', 95: 'camellia', 96: 'mallow', 97: 'mexican petunia', 98: 'bromelia', 99: 'blanket flower', 100: 'trumpet creeper', 101: 'blackberry lily'}

Here I mostly follow the official Transfer Learning tutorial, adapt it to the dataset I want, and start downloading the files:

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        # Randomly pick a rectangular region of the image, crop it,
        # then resize the crop to 224x224 pixels.
        transforms.RandomResizedCrop(224),
        # Flip the image horizontally with a given probability, usually a number
        # between 0 and 1, e.g. 0.5 for a 50% chance. Default value is 0.5.
        transforms.RandomHorizontalFlip(),
        # Convert the image to a Tensor.
        transforms.ToTensor(),
        # Normalize the pixel values: the first argument is the per-channel mean,
        # the second is the per-channel standard deviation.
        # Why [0.485, 0.456, 0.406]? See: https://www.geeksforgeeks.org/how-to-normalize-images-in-pytorch/
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        # No random region selection here; simply resize the whole image to the given size.
        transforms.Resize(256),
        # Keep the center of the image and crop it to the target size.
        # Used for validation/test data so that evaluation images look similar,
        # without the randomness of RandomResizedCrop.
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
# Download the training split to data_dir/train and apply data_transforms["train"]
train_datasets = Flowers102(root=data_dir+"/train", split="train", download=True, transform=data_transforms["train"])
# Download the validation split to data_dir/val and apply data_transforms["val"]
val_datasets = Flowers102(root=data_dir+"/val", split="val", download=True, transform=data_transforms["val"])

# Download the Flowers102 train and val splits into data_dir
image_datasets = {x: Flowers102(root=data_dir, split=x, download=True, transform=data_transforms[x])
                  for x in ['train', 'val']}

# Wrap the datasets in DataLoaders with the chosen batch_size
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}

# Dataset sizes per split (used later when printing sizes and averaging loss/accuracy)
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

print("device: ", device)
print("image_datasets function call: ", dir(image_datasets["train"]))

With that, the first task is done: the dataset we want has been downloaded.

Task 2 - Show Example Images and the Dataset Size

  1. Show example images and the dataset size: Show some example images of the dataset in the notebook and print the dataset size.

Following the official Transfer Learning tutorial, we first build an imshow helper and display one batch:

def imshow(inp, title=None):
    """Display image for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

# x.item() extracts the class index from the tensor; look up its name in the class_names dict
imshow(out, title=[class_names[x.item()] for x in classes])

print(inputs.shape)
print("dataset_sizes: ", dataset_sizes)

The result:

Task 3 & 4 - CNN + Batch Normalization

  1. Build a CNN with Batch Normalization: Design a CNN to predict on the dataset. Use a similar architecture like last time, but this time also include batch normalization layers.
  2. Train the model and report the test accuracy: Train the model on the dataset and measure the accuracy on hold out test data.

According to Professor Hung-yi Lee's lectures, batch normalization is usually applied before the activation function (see that chapter if you are interested). Simply put, batch normalization performs feature normalization batch by batch.

Why do feature normalization?
It brings different features into a similar numerical range, so that during gradient descent w1 and w2 influence the loss to a similar degree, instead of one weight's effect on the loss dwarfing the other's.

The effect looks roughly like the figure below.
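
In addition to the figure, here is a tiny numeric illustration (not part of the original post, reusing the packages imported in Task 0) of what nn.BatchNorm2d does: each channel is standardized over the batch and spatial dimensions, so features end up in a comparable range:

# Tiny illustration of the per-channel standardization performed by BatchNorm2d
x = torch.randn(8, 3, 4, 4) * 5 + 10   # fake batch whose channels have mean ~10 and std ~5
bn = nn.BatchNorm2d(3, affine=False)   # affine=False so only the normalization itself is visible
y = bn(x)                              # a freshly built module is in training mode, so batch statistics are used
print(x.mean(dim=(0, 2, 3)), x.std(dim=(0, 2, 3)))   # roughly 10 and 5 per channel
print(y.mean(dim=(0, 2, 3)), y.std(dim=(0, 2, 3)))   # roughly 0 and 1 per channel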

Building the Network

Note that, depending on your dataset's image size and the number of hidden layers, you need to make two adjustments!

  1. In the fully connected layer, the input size depends on how many times your hidden layers apply convolution and max pooling.
  2. The number of outputs of the final layer must match the number of categories in your dataset.

Pay attention to the parts of the code marked with the comment arrow <====.

Our CNN architecture is shown below. You can uncomment the dropout lines if you want to try dropout,
but in my case dropout did not bring higher accuracy. (A small shape-check sketch right after the network definition verifies the 128 * 26 * 26 figure.)

import torch.nn as nn

class NewNet(nn.Module):
    def __init__(self):
        super(NewNet, self).__init__()
        # Layer 1: 3x3 kernel, depth = 32, 224-3+1=222 => 222x222 pixels
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.bn1 = nn.BatchNorm2d(32)
        # self.dropout1 = nn.Dropout(0.5)  # optional dropout

        # Layer 2: max pooling with a 2x2 kernel, 222/2=111 => 111x111 pixels
        self.pool = nn.MaxPool2d(2, 2)

        # Layer 3: 3x3 kernel, depth = 64, 111-3+1=109 => 109x109 pixels
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.bn2 = nn.BatchNorm2d(64)
        # self.dropout2 = nn.Dropout(0.5)  # optional dropout

        # Layer 4: max pooling with a 2x2 kernel, 109/2=54 => 54x54 pixels
        self.pool = nn.MaxPool2d(2, 2)

        # Layer 5: 3x3 kernel, depth = 128, 54-3+1=52 => 52x52 pixels
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.bn3 = nn.BatchNorm2d(128)
        # self.dropout3 = nn.Dropout(0.5)  # optional dropout

        # Layer 6: max pooling with a 2x2 kernel, 52/2=26 => 26x26 pixels
        self.pool = nn.MaxPool2d(2, 2)

        # Flattened input: 128 channels of 26x26 pixels => 128*26*26
        self.fc1 = nn.Linear(128 * 26 * 26, 2048)  # <==== adjust 128 * 26 * 26 to match your conv/pooling stack
        self.fc2 = nn.Linear(2048, 1024)
        self.fc3 = nn.Linear(1024, 512)
        self.fc4 = nn.Linear(512, 102)  # <==== 102 = number of categories in the dataset

    def forward(self, x):
        # We put the batch normalization before the activation function.
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool(x)
        # x = self.dropout1(x)  # optional dropout
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)
        # x = self.dropout2(x)  # optional dropout
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.pool(x)
        # x = self.dropout3(x)  # optional dropout
        x = x.view(-1, 128 * 26 * 26)  # <==== adjust 128 * 26 * 26 to match your conv/pooling stack
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

net = NewNet()
net.to(device)
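
To double-check the 128 * 26 * 26 figure instead of deriving it by hand, here is a small sketch (not part of the original notebook) that pushes a dummy 224x224 input through the convolution and pooling stack on the CPU and prints the resulting shape:

# Throwaway CPU copy of the network, used only for checking tensor shapes
with torch.no_grad():
    tmp = NewNet().eval()
    dummy = torch.zeros(1, 3, 224, 224)              # one fake RGB image, 224x224
    feats = tmp.pool(F.relu(tmp.bn1(tmp.conv1(dummy))))
    feats = tmp.pool(F.relu(tmp.bn2(tmp.conv2(feats))))
    feats = tmp.pool(F.relu(tmp.bn3(tmp.conv3(feats))))
    print(feats.shape)                               # torch.Size([1, 128, 26, 26])
    print(feats.flatten(1).shape[1])                 # 86528 == 128 * 26 * 26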

Then specify the optimizer and the loss function:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Building the Training Function

We build a function that runs the training loop:

def train(epoch, start_time):
    net.train()
    cur_count = 0
    running_loss = 0.0
    for batch_idx, data in enumerate(dataloaders["train"], 0):
        cur_count += len(data)
        inputs, labels = data[0].to(device), data[1].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if batch_idx % 100 == 99:
            print(f'[{epoch}, {batch_idx + 1:5d}] loss: {running_loss / 100:.3f} time elapsed: {round((time.time() - start_time))} sec.')
            running_loss = 0.0

Building the Testing Function

def test():
    net.eval()  # set model to evaluation mode
    correct = 0
    total = 0
    class_correct = [0] * len(class_names)
    class_total = [0] * len(class_names)
    with torch.no_grad():
        for data in dataloaders["val"]:
            images, labels = data[0].to(device), data[1].to(device)
            outputs = net(images)

            # select the top-1 prediction (pass a larger k for top-k accuracy)
            _, predicted = torch.topk(outputs, 1, dim=1)

            # check whether the true label is among the predicted labels
            for i in range(len(labels)):
                total += 1
                class_total[labels[i]] += 1

                if labels[i] in predicted[i]:
                    correct += 1
                    class_correct[labels[i]] += 1

    class_accuracies = [class_correct[i] / class_total[i] for i in range(len(class_names))]
    accuracy = correct / total
    return accuracy, class_accuracies
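
The membership test labels[i] in predicted[i] generalizes directly to top-k accuracy: just pass a larger k to torch.topk. A toy illustration with made-up logits (not part of the original notebook):

# Toy logits for a batch of 2 samples and 4 classes
logits = torch.tensor([[0.1, 2.0, 0.3, 1.5],
                       [3.0, 0.2, 2.5, 0.1]])
labels = torch.tensor([3, 2])

_, top3 = torch.topk(logits, 3, dim=1)   # indices of the 3 highest-scoring classes per sample
print(top3)                              # tensor([[1, 3, 2], [0, 2, 1]])
hits = sum(int(labels[i] in top3[i]) for i in range(len(labels)))
print(hits / len(labels))                # top-3 accuracy on this toy batch: 1.0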

Running the Training

To keep an eye on how training is going, we print the training status every 100 batches and run the test every 5 epochs.

num_epochs = 100
start_time = time.time()

accuracy, class_accuracies = test()
print(f'Accuracy on test data (top-1): {100 * accuracy:.2f}%')
for epoch in range(0, num_epochs - 1):
    print(f"============ Epoch: {epoch} ==========")
    train(epoch, start_time)

    # Run test() every 5 epochs to check the training progress
    if epoch % 5 == 0:
        accuracy, class_accuracies = test()
        # print accuracies
        print(f'Accuracy on test data (top-1): {100 * accuracy:.2f}%')

print(f'Finished Training. Total elapsed time: {round((time.time() - start_time) / 60, 1)} min')

The result:

Accuracy on test data (top-1): 0.0019%
============ Epoch: 0 ==========
[0, 100] loss: 4.984 time elapsed: 40 sec.
[0, 200] loss: 4.876 time elapsed: 47 sec.
...
Accuracy on test data (top-1): 35.59%
Finished Training. Total elapsed time: 67 min

Task 5 & 4 - Transfer Learning: ResNet18

  1. Train the model and report the test accuracy: Train the model on the dataset and measure the accuracy on hold out test data.
  2. Transfer learning with ResNet18: Now use transfer learning to use a pre-trained ResNet18 on the dataset as follows:
    1. Keep the pre-trained weights fixed: ResNet18 as fixed feature extractor.
    2. Fine-tune ResNet18: ResNet18 finetuned on the training data.

Building the Training & Testing Function

This follows the official Transfer Learning example; I copied it over directly.

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    # Record the start time so the log shows how long each epoch takes
    since = time.time()

    # Create a temporary directory to store the best model parameters
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')
        # Not trained yet, but save the current model first
        torch.save(model.state_dict(), best_model_params_path)
        best_acc = 0.0  # best accuracy so far; updated whenever a better model shows up

        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)

            # Each epoch runs a training phase followed by a validation phase
            for phase in ['train', 'val']:
                # Decide whether we are training or validating
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()  # Set model to evaluate mode

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    # Move the batch to the GPU (or whatever `device` is)
                    inputs = inputs.to(device)
                    labels = labels.to(device)

                    # Zero the parameter gradients
                    optimizer.zero_grad()

                    # Forward propagation
                    # track history if only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)  # take the highest-scoring class as the prediction
                        loss = criterion(outputs, labels)  # measure the gap between prediction and target

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # loss.item() is the per-sample average, so multiply by the batch size
                    # (here 4) to accumulate the total loss of this batch
                    running_loss += loss.item() * inputs.size(0)
                    # Count how many predictions in this batch were correct
                    running_corrects += torch.sum(preds == labels.data)

                # Only adjust the learning rate during training;
                # the scheduler changes the learning rate as training progresses
                if phase == 'train':
                    scheduler.step()

                # After the whole epoch, compute the epoch loss and accuracy
                # Avg. loss = total loss / size of the whole split
                epoch_loss = running_loss / dataset_sizes[phase]
                # Avg. acc = total number correct / size of the whole split
                epoch_acc = running_corrects.float() / dataset_sizes[phase]

                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f} Time elapsed: {round((time.time() - since))} sec.')

                # During validation, if the accuracy beats the best so far, save the model parameters
                if phase == 'val' and epoch_acc > best_acc:
                    # Update the best accuracy
                    best_acc = epoch_acc
                    # deep copy the model
                    torch.save(model.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:4f}')

        # load best model weights
        # Return the best model found so far
        model.load_state_dict(torch.load(best_model_params_path))
        return model

Using Transfer Learning

The assignment asks for ResNet18 for transfer learning. According to the official documentation, resnet18 defaults to the IMAGENET1K_V1 weights when no argument is given, but to make it explicit which pre-trained weights we use, we pass the argument anyway.
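
For reference (not in the original post), printing the pre-trained classifier shows the 512-to-1000 linear layer we are about to replace:

# The original ImageNet classification head of ResNet18
print(models.resnet18(weights='IMAGENET1K_V1').fc)   # Linear(in_features=512, out_features=1000, bias=True)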

model_ft = models.resnet18(weights='IMAGENET1K_V1')

# num_ftrs is the number of input features for the last layer.
# Grab the input size of the final layer
num_ftrs = model_ft.fc.in_features

# Here the size of each output sample is set to 102.
# model_ft.fc is the final layer of the model, and used for classification.
# Build our own final layer: num_ftrs inputs and 102 outputs (this case has 102 classes)
model_ft.fc = nn.Linear(num_ftrs, 102)
model_ft = model_ft.to(device)

# Loss function: CrossEntropyLoss
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
# Optimizer: SGD with learning rate = 0.001 and momentum = 0.9
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
# Every 7 epochs, multiply the learning rate by 0.1
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

That is roughly how the transfer learning setup looks.

Why adjust the learning rate?
Lowering the learning rate every fixed number of epochs is a common strategy known as learning rate decay (or learning rate scheduling). Its effects are:

  1. Better stability: a relatively large learning rate early in training helps the model converge quickly, but near the optimum a large learning rate can make the model oscillate around it or overshoot. Periodically lowering the learning rate makes the later stages of training more stable and brings the model closer to the optimum.

  2. Less overfitting: periodically lowering the learning rate also helps keep the model from overfitting the training set. With a smaller learning rate the parameter updates are more cautious, so the model is less likely to latch onto noise in the training data.

In practice, the exact settings of the schedule (for example the values of step_size and gamma) are usually tuned by experiment and experience to reach the best performance, and they typically depend on the dataset size, the model architecture, the difficulty of the problem, and other factors. The sketch below shows what the schedule used here does to the learning rate.
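
A minimal sketch (using a dummy parameter, not part of the original notebook, reusing the imports from Task 0) of how StepLR(step_size=7, gamma=0.1) changes the learning rate across epochs:

# Dummy optimizer just to observe the schedule; in real training opt.step()
# follows loss.backward() and sched.step() is called once per epoch
params = [nn.Parameter(torch.zeros(1))]
opt = optim.SGD(params, lr=0.001, momentum=0.9)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)

for epoch in range(15):
    print(epoch, sched.get_last_lr())   # 0.001 for epochs 0-6, 1e-4 for epochs 7-13, 1e-5 from epoch 14
    opt.step()
    sched.step()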

Start Training

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=25)

The result: 89.41% accuracy, pretty good.

Epoch 0/24
----------
train Loss: 4.4280 Acc: 0.0657 Time elapsed: 33 sec.
val Loss: 2.9901 Acc: 0.3118 Time elapsed: 58 sec.

Epoch 1/24
----------
train Loss: 3.3046 Acc: 0.2353 Time elapsed: 87 sec.
val Loss: 1.6604 Acc: 0.5941 Time elapsed: 112 sec.

Epoch 2/24
----------
train Loss: 2.5080 Acc: 0.4029 Time elapsed: 141 sec.
val Loss: 1.2243 Acc: 0.6951 Time elapsed: 166 sec.

Epoch 3/24
----------
train Loss: 1.9871 Acc: 0.5196 Time elapsed: 195 sec.
val Loss: 0.9578 Acc: 0.7216 Time elapsed: 219 sec.

Epoch 4/24
----------
train Loss: 1.5865 Acc: 0.6225 Time elapsed: 249 sec.
val Loss: 0.6911 Acc: 0.8108 Time elapsed: 273 sec.
...
val Loss: 0.3919 Acc: 0.8912 Time elapsed: 1313 sec.

Training complete in 21m 53s
Best val Acc: 0.894118

Using ResNet18 as a Fixed Feature Extractor

The assignment also requires using ResNet18 as a fixed feature extractor: every parameter is frozen and only the final layer stays trainable. In other words, leave the weights that someone else already trained alone. The only change is to set requires_grad to False on every parameter of the pre-trained model; ResNet18 then serves purely as a fixed feature extractor (a quick sanity check after the code block below confirms this).

# Everything else is the same as before
model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')

# !!! Add these two lines: with requires_grad = False the pre-trained parameters are never updated
for param in model_conv.parameters():
    param.requires_grad = False

# Everything else is the same as before
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 102)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

# Then start training
model_conv_SGD = train_model(model_conv, criterion, optimizer_conv,
                             exp_lr_scheduler, num_epochs=25)
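
As a quick sanity check (not part of the original notebook), listing the trainable parameters confirms that only the replaced final layer receives gradient updates:

# Only the new classification head should be trainable
trainable = [name for name, p in model_conv.named_parameters() if p.requires_grad]
print(trainable)                                                           # ['fc.weight', 'fc.bias']
print(sum(p.numel() for p in model_conv.parameters() if p.requires_grad))  # 512*102 + 102 = 52326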

The result: 79.11% accuracy, a bit worse, but training this way is faster.

Epoch 0/24
----------
train Loss: 4.6979 Acc: 0.0176 Time elapsed: 24 sec.
val Loss: 3.9863 Acc: 0.1235 Time elapsed: 48 sec.

Epoch 1/24
----------
train Loss: 4.0589 Acc: 0.1137 Time elapsed: 72 sec.
val Loss: 3.1125 Acc: 0.3608 Time elapsed: 95 sec.

Epoch 2/24
----------
train Loss: 3.4935 Acc: 0.2304 Time elapsed: 119 sec.
val Loss: 2.5003 Acc: 0.4912 Time elapsed: 142 sec.

Epoch 3/24
----------
train Loss: 3.1030 Acc: 0.3422 Time elapsed: 165 sec.
val Loss: 2.1583 Acc: 0.5510 Time elapsed: 189 sec.

Epoch 4/24
----------
train Loss: 2.7367 Acc: 0.4402 Time elapsed: 212 sec.
val Loss: 1.7064 Acc: 0.6304 Time elapsed: 236 sec.
...
val Loss: 1.0910 Acc: 0.7824 Time elapsed: 1179 sec.

Training complete in 19m 39s
Best val Acc: 0.791176

Task 6 & 4 - Transfer Learning: EfficientNet_B5

  1. Train the model and report the test accuracy: Train the model on the dataset and measure the accuracy on hold out test data.
  2. Fine-tune EfficientNet_B5: Repeat step 4 but now use EfficientNet_B5 instead of ResNet18.

Next, per the assignment, we swap ResNet18 for a different pre-trained model. You may need to install efficientnet_pytorch via pip first:

pip install efficientnet_pytorch

Then run the following code:

from efficientnet_pytorch import EfficientNet

# Load the pre-trained EfficientNet-B5 model
model_ft = EfficientNet.from_pretrained('efficientnet-b5')

# As before, get the number of input features of the model's final layer
num_ftrs = model_ft._fc.in_features
# Build a new layer with num_ftrs inputs and 102 outputs (this case has 102 classes);
# note that efficientnet_pytorch exposes the classifier as `_fc`, not `fc`
model_ft._fc = nn.Linear(num_ftrs, 102)
# Move the model to the GPU
model_ft = model_ft.to(device)

# Loss function
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

# Start training
model_ft_effb5 = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                             num_epochs=25)

The result: a best accuracy of 82.16%, which does not seem to be an improvement.

Epoch 0/24
----------
train Loss: 6.2860 Acc: 0.0157 Time elapsed: 156 sec.
val Loss: 5.6637 Acc: 0.0382 Time elapsed: 200 sec.

Epoch 1/24
----------
train Loss: 4.8955 Acc: 0.1039 Time elapsed: 322 sec.
val Loss: 4.5101 Acc: 0.2392 Time elapsed: 365 sec.

Epoch 2/24
----------
train Loss: 3.8566 Acc: 0.2422 Time elapsed: 485 sec.
val Loss: 3.6194 Acc: 0.4265 Time elapsed: 529 sec.

Epoch 3/24
----------
train Loss: 3.0979 Acc: 0.3637 Time elapsed: 653 sec.
val Loss: 2.8613 Acc: 0.5539 Time elapsed: 696 sec.

Epoch 4/24
----------
train Loss: 2.4323 Acc: 0.4725 Time elapsed: 818 sec.
val Loss: 2.2894 Acc: 0.6725 Time elapsed: 863 sec.
...


Epoch 24/24
----------
train Loss: 1.3168 Acc: 0.7343 Time elapsed: 4769 sec.
val Loss: 1.3167 Acc: 0.8167 Time elapsed: 4812 sec.

Training complete in 80m 12s
Best val Acc: 0.821569

Task 7 - Discussion

  1. Compare the approaches and print the accuracy: Compare the accuracy of the different approaches on the test data and print out the training times for each approach.
We have tested several approaches along the way.

Their results are roughly as follows:

Model                                Accuracy   Training Time   Verdict
Custom CNN                           35%        over 1 hour     worst
ResNet18 (fine-tuned)                89.41%     21 min          highest accuracy
ResNet18 (fixed feature extractor)   79.11%     19 min          shortest time
EfficientNet-B5 (fine-tuned)         82.16%     80 min          so-so

Conclusion

  • With transfer learning, the accuracy improves markedly and the training time drops substantially.
  • Also, in this case, not fixing the pre-trained weights gives better accuracy, although it takes somewhat longer because gradient descent has to update all the parameters.