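Introduction

I recently enrolled in an AI course, and this is its third assignment. I mainly referred to the following resources:

How to build a CNN with PyTorch: PyTorch Tutorial
How to use TensorBoard: PyTorch TensorBoard Tutorial
How to use TensorBoard in Colab: TensorBoard in Colab Tutorial

The goal of this post is to understand CNNs, build a deeper network, speed up training with a GPU, and finally show the training loss and the misclassified examples in TensorBoard.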

Environment and assignment requirements

Environment:

Python 3.10.9

PyTorch 2.0.1

Assignment requirements

Task:

Build a CNN first: Train the same network as in the PyTorch CNN tutorial.
Build a CNN that meets the following requirements: Change now the network architecture as follows and train the network:
1. Conv layer with 3x3 kernel and depth = 8, ReLu activation
2. Conv layer with 3x3 kernel and depth = 16, ReLu activation
3. Max pooling with 2x2 kernel
4. Conv layer with 3x3 kernel and depth = 32, ReLu activation
5. Conv layer with 3x3 kernel and depth = 64, ReLu activation
6. Max pooling with 2x2 kernel
7. Fully connected with 4096 nodes, ReLu activation
8. Fully connected with 1000 nodes, ReLu activation
9. Fully connected with 10 nodes, no activation
Use the GPU and compare with the CPU: Run the training on the GPU and compare the training time to CPU.
Log the training loss to TensorBoard: Log the training loss in tensorboard.
Count a prediction as correct if the true label is among the top three outputs: Change the test metric as follows: A prediction is considered "correct" if the true label is within the top three outputs of the network. Print the accuracy on the test data (with respect to this new definition).
Plot five random misclassified examples in TensorBoard: Randomly take 5 examples on which the network was wrong on the test data (according to the new definition of correct) and plot them to tensorboard together with the true label.
Show TensorBoard in the notebook: Show the tensor board widget at the end of your notebook.

Bonus: See if you can get better by using a deeper network (or another architecture).

Preparation

First, load the required packages.

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import random
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
import time

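# Save the TensorBoard logs to ./board/result
writer = SummaryWriter('./board/result')

Download the training and test datasets into a ./data folder next to the current file. For normalization, the mean and the standard deviation are both set to 0.5, which maps pixel values from [0, 1] to [-1, 1].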

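# Define the image transform: convert the image to a tensor, then normalize it
transform = transforms.Compose([
    transforms.ToTensor(),  # convert the image to a PyTorch tensor with values in [0, 1]
    # Each pixel has three channels (red, green, blue), each in [0, 1] after ToTensor().
    # Normalize computes (x - mean) / std for every channel.
    # Subtracting mean = 0.5 shifts [0, 1] to [-0.5, 0.5].
    # Dividing by std = 0.5 then stretches [-0.5, 0.5] to [-1, 1].
    # Note: 0.5 is a convenient constant here, not the measured mean or standard deviation of the data.
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))  # normalize the image data
])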

# Define the training batch size
batch_size = 4

# Download and load the CIFAR-10 training set
trainset = torchvision.datasets.CIFAR10(
    root='./data',  # root folder where the data is stored
    train=True,  # load the training split
    download=True,  # download the data if it is not there yet
    transform=transform  # apply the transform defined above
)

# Create a DataLoader for the training data to handle batching and loading
trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size=batch_size,  # size of each batch
    shuffle=True,  # shuffle the data to add randomness to training
    num_workers=2  # use several worker processes to speed up data loading
)

# Download and load the CIFAR-10 test set with the same transform
testset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,  # load the test split
    download=True,
    transform=transform
)

# Create a DataLoader for the test data
testloader = torch.utils.data.DataLoader(
    testset,
    batch_size=batch_size,
    shuffle=False
)

# Put all class names into a tuple
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
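
As a quick sanity check of the normalization step, here is a minimal sketch that applies the same arithmetic, `(x - mean) / std`, to a few plain Python floats. The helper name `normalize` is hypothetical and only stands in for the per-channel operation; it is not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Apply (x - mean) / std, the per-channel formula used for normalization."""
    return (x - mean) / std

# Pixel values after ToTensor() lie in [0, 1].
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The endpoints 0.0 and 1.0 land on -1.0 and 1.0, and the midpoint 0.5 lands on 0.0, which is the [-1, 1] range described in the text.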

Task 1 + 2: Build the CNN

Build a CNN first: Train the same network as in the PyTorch CNN tutorial.
Build a CNN that meets the following requirements: Change now the network architecture as follows and train the network.

Build the CNN

Task 2. Build a CNN that meets the following requirements: Change now the network architecture as follows and train the network:

Conv layer with 3x3 kernel and depth = 8, ReLu activation
Conv layer with 3x3 kernel and depth = 16, ReLu activation
Max pooling with 2x2 kernel
Conv layer with 3x3 kernel and depth = 32, ReLu activation
Conv layer with 3x3 kernel and depth = 64, ReLu activation
Max pooling with 2x2 kernel
Fully connected with 4096 nodes, ReLu activation
Fully connected with 1000 nodes, ReLu activation
Fully connected with 10 nodes, no activation

import torch.nn as nn

class NewNet(nn.Module):
    def __init__(self):
        super(NewNet, self).__init__()
        # Input: 3 channels (RGB), 32*32 pixels; output is 32-3+1=30, so 30*30 pixels
        self.conv1 = nn.Conv2d(3, 8, 3)  # Layer 1: 3x3 kernel and depth = 8
        # in_channels is the previous depth (8); output is 30-3+1=28, so 28*28 pixels
        self.conv2 = nn.Conv2d(8, 16, 3)  # Layer 2: 3x3 kernel and depth = 16
        # 28/2=14, so 14*14 pixels
        self.pool = nn.MaxPool2d(2, 2)  # Layer 3: Max pooling with 2x2 kernel
        # in_channels is the previous depth (16); output is 14-3+1=12, so 12*12 pixels
        self.conv3 = nn.Conv2d(16, 32, 3)  # Layer 4: 3x3 kernel and depth = 32
        # in_channels is the previous depth (32); output is 12-3+1=10, so 10*10 pixels
        self.conv4 = nn.Conv2d(32, 64, 3)  # Layer 5: 3x3 kernel and depth = 64
        # Layer 6 is one more max pooling (self.pool is reused): 10/2=5, so 5*5 pixels
        # The final feature map has 64 channels of 5*5 pixels, so 64*5*5 inputs
        self.fc1 = nn.Linear(64 * 5 * 5, 4096)  # Layer 7: Fully connected with 4096 nodes
        self.fc2 = nn.Linear(4096, 1000)  # Layer 8: Fully connected with 1000 nodes
        self.fc3 = nn.Linear(1000, 10)  # Layer 9: Fully connected with 10 nodes

    def forward(self, x):
        x = F.relu(self.conv1(x))  # ReLU activation
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool(x)
        x = x.view(-1, 64 * 5 * 5)  # flatten
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # no activation on the output layer
        return x
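
The input size of the first fully connected layer depends on how each layer shrinks the image. The sketch below traces the spatial size through the network with the standard formulas for unpadded, stride-1 convolutions and 2x2 pooling. The helper names `conv_out` and `pool_out` are hypothetical, written for this illustration only.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output width/height of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of max pooling (rounding down)."""
    return (size - kernel) // stride + 1

size = 32                 # CIFAR-10 images are 32x32
size = conv_out(size)     # conv1 -> 30
size = conv_out(size)     # conv2 -> 28
size = pool_out(size)     # pool  -> 14
size = conv_out(size)     # conv3 -> 12
size = conv_out(size)     # conv4 -> 10
size = pool_out(size)     # pool  -> 5
flat_features = 64 * size * size
print(size, flat_features)  # 5 1600
```

The result, 64 * 5 * 5 = 1600, matches the first argument of `nn.Linear` in `fc1`.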

Task 3 + 4: GPU and loss on TensorBoard

Use the GPU and compare with the CPU: Run the training on the GPU and compare the training time to CPU.
Log the training loss to TensorBoard: Log the training loss in tensorboard.

Use the GPU to speed up the network

I am on a Mac, so I use mps. If you are on Windows with an NVIDIA GPU, use cuda instead.
Initialize the loss function and the optimizer

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
net = NewNet()
net.to(device)

# Let's use a Classification Cross-Entropy loss and SGD with momentum.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Build the training loop

Now write the training loop and log the results to TensorBoard.

start_time = time.time()

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        # move the data to the GPU or the CPU
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        # outputs and loss are already on `device`, so no extra .to(device) is needed
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:    # print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f} time elapsed: {round((time.time() - start_time) / 60)} min')
            # ...log the running loss
            # running_loss is the sum over 200 mini-batches, so divide by 200 to get the mean loss
            writer.add_scalar('training loss',
                            running_loss / 200,
                            epoch * len(trainloader) + i)
            running_loss = 0.0

# Measure the time so that the CPU and the GPU can be compared
print(f'Finished Training. Total elapsed time: {round((time.time() - start_time) / 60, 1)} min')

The output looks like this:

[1,  3600] loss: 1.977 time elapsed: 1 min
[1,  3800] loss: 2.021 time elapsed: 1 min
[1,  4000] loss: 1.933 time elapsed: 1 min
[1,  4200] loss: 1.922 time elapsed: 1 min
[1,  4400] loss: 1.902 time elapsed: 1 min
[1,  4600] loss: 1.836 time elapsed: 1 min
[1,  4800] loss: 1.788 time elapsed: 1 min
[1,  5000] loss: 1.818 time elapsed: 1 min
...
[1, 12000] loss: 1.479 time elapsed: 2 min
[1, 12200] loss: 1.469 time elapsed: 2 min
[1, 12400] loss: 1.485 time elapsed: 2 min
Finished Training. Total elapsed time: 2.2 min
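
The logging logic in the training loop can be looked at in isolation. The following is a minimal sketch, with made-up loss values rather than real results, of how the mean loss over each window of 200 mini-batches and the global step `epoch * steps_per_epoch + i` are derived. The function name `windowed_means` is hypothetical.

```python
def windowed_means(losses, steps_per_epoch, window=200):
    """Yield (global_step, mean_loss) once per `window` mini-batches."""
    running = 0.0
    for n, loss in enumerate(losses):
        epoch, i = divmod(n, steps_per_epoch)
        if i == 0:
            running = 0.0  # the running sum restarts at the start of every epoch
        running += loss
        if i % window == window - 1:
            yield epoch * steps_per_epoch + i, running / window
            running = 0.0

fake_losses = [2.0] * 200 + [1.0] * 200   # made-up values, not real results
points = list(windowed_means(fake_losses, steps_per_epoch=400))
print(points)  # [(199, 2.0), (399, 1.0)]
```

Dividing by the window size is what turns the accumulated sum into a per-batch mean, so the divisor has to match the logging interval.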

Next, run the same training on the CPU to compare the time.

# Use CPU
device = torch.device("cpu")
# Note: this keeps training the same net; re-create it with NewNet() for a like-for-like timing
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

import time
start_time = time.time()

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        # inputs, labels = data, put the inputs and labels on the device (cpu or gpu)
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)

        loss = criterion(outputs, labels)
        loss.backward() # compute the gradients

        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:    # print every 200 mini-batches
            # divide by 200 (the logging interval) to get the mean loss
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f} time elapsed: {round((time.time() - start_time) / 60)} min')

            running_loss = 0.0

print(f'Finished Training. Total elapsed time: {round((time.time() - start_time) / 60, 1)} min')

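Saving the trained model

I save the model to ./model/cifar_net.pth and then load it back, so it does not need to be retrained next time.

import os

PATH = './model/cifar_net.pth'
os.makedirs(os.path.dirname(PATH), exist_ok=True)  # torch.save does not create missing folders
torch.save(net.state_dict(), PATH)
net = NewNet()
net.load_state_dict(torch.load(PATH)) # load the weights from the saved file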

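Evaluate the model on the test data

Task 5. Count a prediction as correct if the true label is among the top three outputs: Change the test metric as follows: A prediction is considered "correct" if the true label is within the top three outputs of the network. Print the accuracy on the test data (with respect to this new definition).

According to the assignment, we need to do the following:

TODO 1 Change the definition of accuracy: a prediction counts as correct if the true label is among the top three outputs.
TODO 2 Print the accuracy. Here I print both the per-class accuracy and the overall accuracy.
TODO 3 Record the misclassified images together with their outputs and labels, because five of them will be picked at random later.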

correct = 0
total = 0

class_correct = [0] * len(classes)  # number of correct predictions per class
class_total = [0] * len(classes)    # number of samples per class

# Stores the image, output and label of every wrong prediction
all_errors = []

# We are not training, so there is no need to compute gradients
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)

        # We do not need the values, so they go into _; we need the indices, so they go into predicted
        _, predicted = torch.topk(outputs, 3, dim=1)


        # testloader yields batches, so loop over the samples one by one (4 per batch here)
        for i in range(len(labels)):
            # Example Print out => Predicted: tensor([3, 5, 2]) Actual: 3 Correct: True
            print(f'Predicted: {predicted[i]} Actual: {labels[i]} \t Correct: {labels[i] in predicted[i]}')
            total += 1
            # Add 1 to the class_total entry of class labels[i]
            class_total[labels[i]] += 1

            # Check whether labels[i] is in predicted[i]; labels holds four values, so index it with i
            if labels[i] in predicted[i]:
                correct += 1
                class_correct[labels[i]] += 1
            else:
                # Record the misclassified image, its output and its label
                all_errors.append((images[i], outputs[i], labels[i]))

# Compute the accuracy of each class
class_accuracies = [class_correct[i] / class_total[i] for i in range(len(classes))]

# Compute the accuracy under the new definition
accuracy = correct / total
The output looks like this:

Predicted: tensor([3, 5, 2]) Actual: 3 	 Correct: True
Predicted: tensor([8, 0, 1]) Actual: 8 	 Correct: True
Predicted: tensor([8, 1, 9]) Actual: 8 	 Correct: True
Predicted: tensor([8, 0, 1]) Actual: 0 	 Correct: True
Predicted: tensor([4, 2, 6]) Actual: 6 	 Correct: True
Predicted: tensor([6, 3, 5]) Actual: 6 	 Correct: True
Predicted: tensor([1, 9, 5]) Actual: 1 	 Correct: True
Predicted: tensor([2, 6, 4]) Actual: 6 	 Correct: True
Predicted: tensor([3, 5, 2]) Actual: 3 	 Correct: True
Predicted: tensor([1, 8, 9]) Actual: 1 	 Correct: True
...
Predicted: tensor([5, 7, 2]) Actual: 5 	 Correct: True
Predicted: tensor([4, 2, 3]) Actual: 1 	 Correct: False
Predicted: tensor([7, 4, 2]) Actual: 7 	 Correct: True

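Print the accuracy

Now we can print the accuracy. As a baseline, with 10 classes a random guess would be right about 10% of the time for top-1 and about 30% of the time for top-3, so the trained network should score well above that.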

print(f'Accuracy on test data (top-3): {100 * accuracy:.2f}%')

# print each class accuracy
for i in range(len(classes)):
    print(f'Accuracy for class {classes[i]}: {100 * class_accuracies[i]:.2f}%')

# print the number of misclassified images
print(f'Total misclassified images: {len(all_errors)}')

The result is as follows:

Accuracy on test data (top-3): 91.60%
Accuracy for class plane: 89.50%
Accuracy for class car: 96.30%
Accuracy for class bird: 85.20%
Accuracy for class cat: 91.80%
Accuracy for class deer: 92.40%
Accuracy for class dog: 91.40%
Accuracy for class frog: 88.30%
Accuracy for class horse: 91.50%
Accuracy for class ship: 96.20%
Accuracy for class truck: 93.40%

Total misclassified images: 840

Task 6: Five random misclassified images

Task 6. Plot five random misclassified examples in TensorBoard: Randomly take 5 examples on which the network was wrong on the test data (according to the new definition of correct) and plot them to tensorboard together with the true label.

Image display helper

To print the images later, we need a function that displays an image. The torchvision datasets output PILImage images in the range [0, 1], and we converted them to tensors normalized to [-1, 1]. To display an image we therefore have to undo the normalization, that is, map [-1, 1] back to [0, 1], which can be done with $\frac{x}{2} + 0.5$.

# functions to show an image
# If one_channel is True, the image is averaged over its channels and shown as a grayscale image.
# If one_channel is False, the image is assumed to have three channels (a color image) and is shown in color.
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        # plt.imshow expects the axes in the order (height, width, channels),
        # but the tensor is stored as (channels_0, height_1, width_2).
        # np.transpose with (1, 2, 0) reorders the axes to (height_1, width_2, channels_0).
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
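
The `img / 2 + 0.5` line is the inverse of the earlier normalization with mean 0.5 and standard deviation 0.5. A minimal sketch on plain floats, with a hypothetical helper name `unnormalize`, shows the mapping back to the displayable range:

```python
def unnormalize(y, mean=0.5, std=0.5):
    """Invert (x - mean) / std; with mean = std = 0.5 this is y / 2 + 0.5."""
    return y * std + mean

for value in (-1.0, 0.0, 1.0):
    print(value, "->", unnormalize(value))
```

The values -1.0, 0.0 and 1.0 map back to 0.0, 0.5 and 1.0, the range matplotlib expects for float images.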

Randomly pick 5 errors

We have already collected the images, predictions and labels of all errors. As the task requires, we now pick 5 of the misclassified images and plot them.

def plot_classes_preds(all_errors):
    # Randomly pick five misclassified images
    random_errors = random.sample(all_errors, 5)

    # Create one large matplotlib figure; figsize sets the width and height of the Figure, in inches
    fig = plt.figure(figsize=(12, 10))

    for idx, (image, output, label) in enumerate(random_errors):
        # Arguments: number of rows, number of columns, subplot index (starting from 1);
        # the five images are placed side by side in the 12 x 10 inch figure
        # xticks and yticks set the axis ticks; pass empty lists to hide them
        ax = fig.add_subplot(1, 5, idx+1, xticks=[], yticks=[])
        # Show the image in color (CIFAR-10 images have three channels)
        matplotlib_imshow(image, one_channel=False)
        # output is a 1-D tensor with 10 scores, so dim=0 takes the top 3 along its only axis
        preds = torch.topk(output, 3, dim=0).indices  # see the appendix for dim
        pred_classes = [classes[p] for p in preds] # turn the indices into class names, giving a list of three strings
        # Set the title of this image to show the true class and the predicted classes
        ax.set_title("\n(label: {0})\n({1})".format(
            classes[label],
            ", ".join(pred_classes)),
            color="red")

    # Return the finished figure (five images in total)
    return fig

# Call the function
plot_classes_preds(all_errors)
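
One detail worth noting: `random.sample` raises `ValueError` when asked for more items than the list holds. That cannot happen with the 840 errors here, but a very accurate model could leave fewer than five. A minimal sketch of a guarded version follows; the helper name `pick_errors` and its `seed` parameter are assumptions for this illustration, not part of the assignment code.

```python
import random

def pick_errors(errors, k=5, seed=None):
    """Return up to k distinct items chosen at random from errors."""
    rng = random.Random(seed)  # a seed makes the choice reproducible
    return rng.sample(errors, min(k, len(errors)))

print(len(pick_errors(list(range(840)))))  # 5
print(len(pick_errors(["a", "b"])))        # 2
```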

The result looks like the figure below.

Add the figure to TensorBoard


# Add the figure to TensorBoard
fig = plot_classes_preds(all_errors)
writer.add_figure("predictions vs. actuals", fig)

Task 7: Show TensorBoard in the notebook

Task 7. Show TensorBoard in the notebook: Show the tensor board widget at the end of your notebook.


# Show TensorBoard in the notebook
# Load the TensorBoard extension so that TensorBoard can run inside the notebook.
%load_ext tensorboard
# Start TensorBoard and point it at the log folder.
%tensorboard --logdir board

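Appendix

Normalization vs. standardization

Ref: Preprocessing Data: feature standardization and normalization

What is the difference between normalization and standardization?

Normalization: rescales the data proportionally so that it falls into a small, specific interval such as [0, 1] or [-1, 1].
- Formula: $\frac{x_i - \min(x)}{\max(x) - \min(x)}$
Standardization: rescales the data so that it has mean 0 and variance 1; extreme values may therefore fall outside the interval [0, 1].
- Formula: $\frac{x_i - \mu}{\mathrm{sd}(x)}$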
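
To make the difference concrete, here is a minimal sketch that applies both formulas to one made-up feature column containing an outlier. The function names `min_max` and `standardize` are hypothetical, and the population standard deviation (`pstdev`) is an assumption about which variant of sd is meant.

```python
from statistics import mean, pstdev

def min_max(xs):
    """Min-max normalization: (x - min) / (max - min), result in [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Standardization: (x - mean) / sd, result has mean 0 and variance 1."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]

feature = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up column with one outlier
print(min_max(feature))
print(standardize(feature))
```

The min-max result always stays inside [0, 1], while the standardized outlier lands outside that interval, as described in the text.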

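What do the two have in common?

Both rescale a single feature (a column), not the feature vector of a single sample (a row).

Why normalize?

Better accuracy: the objective functions of many learning algorithms assume that all features are centered at zero and have variances of the same order of magnitude. If the variance of one feature is several orders of magnitude larger than that of the others, it dominates the objective and the learner cannot learn from the other features as we would expect. Normalization makes features of different dimensions numerically comparable, which can improve the accuracy of a classifier.
Faster convergence: after normalization the search for the optimum becomes noticeably smoother, so it is easier to converge correctly to the optimal solution.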

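The dim argument

The dim argument determines along which dimension the values are ranked and the largest ones are taken.
Let's illustrate the difference with an example:

With dim=0, the top values are taken within each column.
With dim=1, the top values are taken within each row.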

import torch

# Create an example tensor
output = torch.tensor([[0.2, 0.6, 0.9, 0.5, 0.3],
                       [0.4, 0.1, 0.8, 0.7, 0.2],
                       [0.5, 0.8, 0.9, 0.1, 0.2]])

# Get the top 3 values of each column and their indices
top_values_col, top_indices_col = torch.topk(output, 3, dim=0)

# Get the top 3 values of each row and their indices
top_values_row, top_indices_row = torch.topk(output, 3, dim=1)

print("Output tensor:")
print(output)

print("Top 3 values and indices per column:")
print(top_values_col)
print(top_indices_col)

print("Top 3 values and indices per row:")
print(top_values_row)
print(top_indices_row)

Output tensor:
tensor([[0.2000, 0.6000, 0.9000, 0.5000, 0.3000],
        [0.4000, 0.1000, 0.8000, 0.7000, 0.2000],
        [0.5000, 0.8000, 0.9000, 0.1000, 0.2000]])
Top 3 values and indices per column:
# Already sorted from largest to smallest
tensor([[0.5000, 0.8000, 0.9000, 0.7000, 0.3000],
        [0.4000, 0.6000, 0.9000, 0.5000, 0.2000],
        [0.2000, 0.1000, 0.8000, 0.1000, 0.2000]])
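# Indices of each column's values from largest to smallest. For example, in the first column (0.2, 0.4, 0.5), 0.5 is the largest, so the indices in order are 2, 1, 0
# That is why the first column reads [[2...], [1...], [0...]]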
tensor([[2, 2, 0, 1, 0], 
        [1, 0, 2, 0, 1],
        [0, 1, 1, 2, 2]])
Top 3 values and indices per row:
# Already sorted from largest to smallest
tensor([[0.9000, 0.6000, 0.5000],
        [0.8000, 0.7000, 0.4000],
        [0.9000, 0.8000, 0.5000]])
# Indices of each row's values from largest to smallest. For example, in the first row (0.2, 0.6, 0.9, 0.5, 0.3), 0.9 is the largest, so the indices in order are 2, 1, 3
# That is why the first row reads [2, 1, 3]
tensor([[2, 1, 3],
        [2, 3, 0],
        [2, 1, 0]])