Introduction

I recently enrolled in an AI course, and this is the third assignment. It mainly refers to the following websites:

The main purpose of this article is to understand CNNs, build a deeper network, use the GPU to improve training efficiency, and finally display the training loss and mispredicted examples on TensorBoard.

Environment Setup and Homework Requirements

Environment setup:

  • Python 3.10.9
  • PyTorch 2.0.1

Homework Requirements

Task:

  1. First build a CNN: Train the same network as in the PyTorch CNN tutorial.
  2. Build a CNN that meets the following requirements: Change the network architecture as follows and train the network:
    1. Conv layer with 3x3 kernel and depth = 8, ReLU activation
    2. Conv layer with 3x3 kernel and depth = 16, ReLU activation
    3. Max pooling with 2x2 kernel
    4. Conv layer with 3x3 kernel and depth = 32, ReLU activation
    5. Conv layer with 3x3 kernel and depth = 64, ReLU activation
    6. Max pooling with 2x2 kernel
    7. Fully connected with 4096 nodes, ReLU activation
    8. Fully connected with 1000 nodes, ReLU activation
    9. Fully connected with 10 nodes, no activation
  3. Use GPU and compare with CPU results: Run the training on the GPU and compare the training time to CPU.
  4. Log Training Loss to TensorBoard: Log the training loss in TensorBoard.
  5. Modify the criterion for correctness to include predictions in the top three outputs: Change the test metric as follows: A prediction is considered „correct“ if the true label is within the top three outputs of the network. Print the accuracy on the test data (with respect to this new definition).
  6. Randomly select five examples of incorrect predictions and display them on TensorBoard: Randomly take 5 examples on which the network was wrong on the test data (according to the new definition of correct) and plot them to TensorBoard together with the true label.
  7. Display TensorBoard in the notebook: Show the TensorBoard widget at the end of your notebook.
  • Bonus: See if you can improve results by using a deeper network (or another architecture).

Preliminary Preparation

  1. First, load the necessary packages
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import random
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
import time

# Store the TensorBoard logs in ./board/result
writer = SummaryWriter('./board/result')

  2. Download the training and testing datasets into a `./data` folder created in the current directory. For normalization, set both the mean and the standard deviation to 0.5, which maps the image range from [0, 1] to [-1, 1].
# Define the image transformation: convert images to tensors and normalize them
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to PyTorch tensors
    # Each pixel has three channels (red, green, blue), and ToTensor scales the channel values to [0, 1].
    # We normalize the three channels to map that range to [-1, 1]:
    # subtracting mean=0.5 shifts the values so they are centered at 0,
    # and dividing by std=0.5 stretches them so they cover [-1, 1].
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))  # Normalize image data
])

# Define batch size for training
batch_size = 4

# Download and load the CIFAR-10 training dataset
trainset = torchvision.datasets.CIFAR10(
    root='./data',        # Root directory for storing data
    train=True,           # Load training data
    download=True,        # Download data (if not already downloaded)
    transform=transform   # Apply the previously defined image transformation
)

# Create a DataLoader for the training data for batch processing and data loading
trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size=batch_size,  # Set the size of each batch
    shuffle=True,           # Randomly shuffle data to increase the randomness of training
    num_workers=2           # Use multiple worker processes to speed up data loading
)

# Download and load the CIFAR-10 test dataset, with the same data transformation and data loading settings
testset = torchvision.datasets.CIFAR10(
    root='./data',
    train=False,    # Load test data
    download=True,
    transform=transform
)

# Create a DataLoader for the test data
testloader = torch.utils.data.DataLoader(
    testset,
    batch_size=batch_size,
    shuffle=False
)

# Place all class names in a tuple
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Task 1+2 Build a CNN

  1. First build a CNN: Train the same network as in the PyTorch CNN tutorial.
  2. Build a CNN that meets the following requirements: Change the network architecture as follows and train the network.

Build a CNN

Task 2. Build a CNN that meets the following requirements: Change the network architecture as follows and train the network:

  1. Conv layer with 3x3 kernel and depth = 8, ReLU activation
  2. Conv layer with 3x3 kernel and depth = 16, ReLU activation
  3. Max pooling with 2x2 kernel
  4. Conv layer with 3x3 kernel and depth = 32, ReLU activation
  5. Conv layer with 3x3 kernel and depth = 64, ReLU activation
  6. Max pooling with 2x2 kernel
  7. Fully connected with 4096 nodes, ReLU activation
  8. Fully connected with 1000 nodes, ReLU activation
  9. Fully connected with 10 nodes, no activation
import torch.nn as nn

class NewNet(nn.Module):
    def __init__(self):
        super(NewNet, self).__init__()
        # The input depth is 3 (RGB) and the images are 32*32 pixels
        self.conv1 = nn.Conv2d(3, 8, 3)    # Layer 1: 3x3 kernel and depth = 8, output 30*30 (32-3+1=30)
        # The input depth is the previous layer's 8; output is 28*28 (30-3+1=28)
        self.conv2 = nn.Conv2d(8, 16, 3)   # Layer 2: 3x3 kernel and depth = 16
        # 28/2=14, thus 14*14 pixels after pooling
        self.pool = nn.MaxPool2d(2, 2)     # Layer 3: Max pooling with 2x2 kernel
        # The input depth is the previous layer's 16; output is 12*12 (14-3+1=12)
        self.conv3 = nn.Conv2d(16, 32, 3)  # Layer 4: 3x3 kernel and depth = 32
        # The input depth is the previous layer's 32; output is 10*10 (12-3+1=10)
        self.conv4 = nn.Conv2d(32, 64, 3)  # Layer 5: 3x3 kernel and depth = 64
        # A second max pooling follows in forward(): 10/2=5, thus 5*5 pixels
        # Therefore the flattened input to the first fully connected layer is 64*5*5
        self.fc1 = nn.Linear(64 * 5 * 5, 4096)  # Layer 6: Fully connected with 4096 nodes
        self.fc2 = nn.Linear(4096, 1000)        # Layer 7: Fully connected with 1000 nodes
        self.fc3 = nn.Linear(1000, 10)          # Layer 8: Fully connected with 10 nodes

    def forward(self, x):
        x = F.relu(self.conv1(x))  # ReLU activation
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.pool(x)
        x = x.view(-1, 64 * 5 * 5)  # Flatten the tensor
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
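
Before training, it is worth double-checking the size bookkeeping in the comments above. A quick way to do this (my own addition, not part of the assignment) is to push a dummy CIFAR-10-sized batch through the network and confirm the output shape:

```python
# Sanity check (hypothetical helper code): one fake 32x32 RGB image through the network
check_net = NewNet()
dummy = torch.randn(1, 3, 32, 32)
out = check_net(dummy)
print(out.shape)  # expected: torch.Size([1, 10]), confirming the 64*5*5 flatten size
```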

Task 3 + 4 GPU and Loss on TensorBoard

  1. Use GPU and Compare CPU Results: Run the training on the GPU and compare the training time to CPU.
  2. Log Training Loss on TensorBoard: Log the training loss in TensorBoard.

Accelerate Network Using GPU

Since I am using a Mac, I pass mps; if you have an NVIDIA GPU (for example on Windows or Linux), pass cuda instead.
Initialize the network, the loss function, and the optimizer.

# For an NVIDIA GPU use:
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
net = NewNet()
net.to(device)

# Let's use a Classification Cross-Entropy loss and SGD with momentum.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
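
If the notebook should run unchanged on both Apple and NVIDIA machines, a small device-selection block can pick the best available backend automatically. This is my own convenience sketch, not something the assignment asks for:

```python
# Prefer CUDA if present, then Apple's MPS backend, otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f'Using device: {device}')
```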

Build the Training Loop

Now we write the training loop and log the loss to TensorBoard.

start_time = time.time()

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        # move the inputs and labels to the device (cpu or gpu)
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)  # outputs (and the loss) are already on the device
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:  # print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f} time elapsed: {round((time.time() - start_time) / 60)} min')
            # ...log the running loss to TensorBoard;
            # it is accumulated over 200 mini-batches, so / 200 gives the average loss
            writer.add_scalar('training loss',
                              running_loss / 200,
                              epoch * len(trainloader) + i)
            running_loss = 0.0

# calculate the total training time, rounded to 1 decimal place
print(f'Finished Training. Total elapsed time: {round((time.time() - start_time) / 60, 1)} min')

Result

[1,  3600] loss: 1.977 time elapsed: 1 min
[1,  3800] loss: 2.021 time elapsed: 1 min
[1,  4000] loss: 1.933 time elapsed: 1 min
[1,  4200] loss: 1.922 time elapsed: 1 min
[1,  4400] loss: 1.902 time elapsed: 1 min
[1,  4600] loss: 1.836 time elapsed: 1 min
[1,  4800] loss: 1.788 time elapsed: 1 min
[1,  5000] loss: 1.818 time elapsed: 1 min
...
[1, 12000] loss: 1.479 time elapsed: 2 min
[1, 12200] loss: 1.469 time elapsed: 2 min
[1, 12400] loss: 1.485 time elapsed: 2 min
Finished Training. Total elapsed time: 2.2 min

Then we can run the same training loop on the CPU to compare the training time.

# Use CPU
device = torch.device("cpu")
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

start_time = time.time()

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        # move the inputs and labels to the device (the cpu here)
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:  # print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 200:.3f} time elapsed: {round((time.time() - start_time) / 60)} min')
            running_loss = 0.0

print(f'Finished Training. Total elapsed time: {round((time.time() - start_time) / 60, 1)} min')

Save Training Results

I save the model to ./model/cifar_net.pth and then load it back later, so that I don't have to retrain next time.

import os

os.makedirs('./model', exist_ok=True)  # make sure the folder exists before saving
PATH = './model/cifar_net.pth'
torch.save(net.state_dict(), PATH)
net = NewNet()
net.load_state_dict(torch.load(PATH))  # load the weights from the saved file
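
One small habit worth adding after reloading the weights (my own addition, not required by the assignment) is to switch the network to evaluation mode. This model has no dropout or batch-norm layers, so it does not change the results here, but it avoids surprises if the architecture is extended later:

```python
net.eval()  # disable training-specific behaviour such as dropout; harmless for this model
```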

Evaluate the Model Using Test Data

Task 5. Modify the Criterion for Correctness to Include Predictions in the Top Three Outputs: Change the test metric as follows: A prediction is considered „correct“ if the true label is within the top three outputs of the network. Print the accuracy on the test data (with respect to this new definition).

According to the assignment requirements, we need to do the following:

  1. TODO 1 Adjust the definition of accuracy to consider a prediction correct if the answer is among the top three outputs.
  2. TODO 2 Print the accuracy, here I print out the accuracy “for each category” and “overall accuracy.”
  3. TODO 3 Since we need to record the wrong images, outputs, and labels, we first record them and then randomly select five wrong ones for later use.
correct = 0
total = 0

class_correct = [0] * len(classes)  # Used to record the number of correct predictions for each class
class_total = [0] * len(classes)    # Used to record the total number of samples for each class

# Used to store all the misclassified images, outputs, and labels
all_errors = []

# Since we are not training, we don't need to compute gradients
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)

        # We use _, predicted because we don't need the values, only the indices of the top-3 results
        _, predicted = torch.topk(outputs, 3, dim=1)

        # Since testloader yields a batch, we loop through the individual samples (here, 4 per batch)
        for i in range(len(labels)):
            # Example print => Predicted: tensor([3, 5, 2]) Actual: 3 Correct: True
            print(f'Predicted: {predicted[i]} Actual: {labels[i]} \t Correct: {labels[i] in predicted[i]}')
            total += 1
            # Increment class_total for the class with index labels[i]
            class_total[labels[i]] += 1

            # Check whether labels[i] is among the top-3 predictions predicted[i]
            if labels[i] in predicted[i]:
                correct += 1
                class_correct[labels[i]] += 1
            else:
                # Record the misclassified image, output, and label
                all_errors.append((images[i], outputs[i], labels[i]))

# Calculate the accuracy for each class
class_accuracies = [class_correct[i] / class_total[i] for i in range(len(classes))]

# Calculate the new overall accuracy
accuracy = correct / total

Result

Predicted: tensor([3, 5, 2]) Actual: 3 	 Correct: True
Predicted: tensor([8, 0, 1]) Actual: 8 Correct: True
Predicted: tensor([8, 1, 9]) Actual: 8 Correct: True
Predicted: tensor([8, 0, 1]) Actual: 0 Correct: True
Predicted: tensor([4, 2, 6]) Actual: 6 Correct: True
Predicted: tensor([6, 3, 5]) Actual: 6 Correct: True
Predicted: tensor([1, 9, 5]) Actual: 1 Correct: True
Predicted: tensor([2, 6, 4]) Actual: 6 Correct: True
Predicted: tensor([3, 5, 2]) Actual: 3 Correct: True
Predicted: tensor([1, 8, 9]) Actual: 1 Correct: True
...
Predicted: tensor([5, 7, 2]) Actual: 5 Correct: True
Predicted: tensor([4, 2, 3]) Actual: 1 Correct: False
Predicted: tensor([7, 4, 2]) Actual: 7 Correct: True

Now we can print the accuracy. For reference, with 10 classes a random top-1 guess is right about 10% of the time, and under the top-3 definition a random guess is right about 30% of the time (3 out of 10 classes), so the numbers below should be read against that baseline.

print(f'Accuracy on test data (top-3): {100 * accuracy:.2f}%')

# print each class accuracy
for i in range(len(classes)):
    print(f'Accuracy for class {classes[i]}: {100 * class_accuracies[i]:.2f}%')

# print the number of misclassified images
print(f'Total misclassified images: {len(all_errors)}')

Result

Accuracy on test data (top-3): 91.60%
Accuracy for class plane: 89.50%
Accuracy for class car: 96.30%
Accuracy for class bird: 85.20%
Accuracy for class cat: 91.80%
Accuracy for class deer: 92.40%
Accuracy for class dog: 91.40%
Accuracy for class frog: 88.30%
Accuracy for class horse: 91.50%
Accuracy for class ship: 96.20%
Accuracy for class truck: 93.40%

Total misclassified images: 840

Task 6 Randomly Select 5 Mispredicted Images

Task 6. Randomly select five examples that were incorrectly predicted by the model and display them in TensorBoard:
Randomly take 5 examples on which the network was wrong on the test data (according to the new definition of correct) and plot them to TensorBoard together with the true label.

Setting up the Image Transformation Function

In order to display the images later, we need a helper function for showing images. The torchvision dataset outputs PIL images in the range [0, 1], which our transform converts to tensors normalized to [-1, 1]. To display an image we have to undo that normalization and map [-1, 1] back to [0, 1], which is done with $\frac{x}{2} + 0.5$ (for example, $x = -1$ maps to 0 and $x = 1$ maps to 1).

# Function to show an image
# If one_channel is True, the function assumes the input image is single-channel (usually grayscale) and displays it with a grayscale colormap.
# If one_channel is False, the function assumes the input image is three-channel (color) and displays it in color.
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5  # unnormalize: map [-1, 1] back to [0, 1]
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        # matplotlib expects the image as (height, width, channels),
        # while the PyTorch tensor is laid out as (channels, height, width),
        # so we use np.transpose to reorder the axes from (C, H, W) to (H, W, C)
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
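
As a quick sanity check of this helper (my own addition, not part of the assignment), we can grab one test batch, arrange it into a grid with torchvision, and display it un-normalized:

```python
# Show one batch of test images using the helper above
dataiter = iter(testloader)
images, labels = next(dataiter)
img_grid = torchvision.utils.make_grid(images)  # arrange the batch into a single image grid
matplotlib_imshow(img_grid, one_channel=False)
plt.show()
print(' '.join(classes[l] for l in labels))  # the corresponding class names
```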

Randomly Select 5 Errors

We have just collected all the images, predictions, and labels for errors. According to the task requirements, we need to randomly select 5 images with incorrect predictions and print them out.

def plot_classes_preds(all_errors):
    # Randomly select five misclassified images
    random_errors = random.sample(all_errors, 5)

    # Create a large matplotlib figure; figsize specifies the width and height of the figure in inches
    fig = plt.figure(figsize=(12, 10))

    for idx, (image, output, label) in enumerate(random_errors):
        # Parameters: number of rows, number of columns, subplot index (starting from 1), placing the five images side by side in the 12-by-10-inch figure
        # xticks and yticks set the axis ticks; pass empty lists to hide the coordinates
        ax = fig.add_subplot(1, 5, idx + 1, xticks=[], yticks=[])
        # Display the image (shown as the grayscale mean of its channels because one_channel=True)
        matplotlib_imshow(image, one_channel=True)
        # Since output is a 1-D tensor with 10 scores, we use dim=0 to extract the top 3 entries
        preds = torch.topk(output, 3, dim=0).indices  # See the explanation of dim in the notes below
        pred_classes = [classes[p] for p in preds]  # Convert indices to class names (a list of three strings)
        # Set the title for the current image, showing the true label and the top-3 predicted classes
        ax.set_title("\n(label: {0})\n({1})".format(
            classes[label],
            ", ".join(pred_classes)),
            color="red")

    # Finally, return the whole prepared figure (containing all 5 images)
    return fig

# Call the function
plot_classes_preds(all_errors)

Result

Put the figure into TensorBoard

# put on tensorBoard
fig = plot_classes_preds(all_errors)
writer.add_figure("predictions vs. actuals", fig)
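
TensorBoard only picks up events once the writer has flushed them to disk. If the figure does not show up immediately, flushing the writer usually helps (a small optional step, not required by the assignment):

```python
writer.flush()  # make sure all pending scalars and figures are written to the log directory
```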

Task 7 Show TensorBoard in the Notebook

  1. Display TensorBoard in the notebook: Show the TensorBoard widget at the end of your notebook.
# Displaying TensorBoard in the notebook
# %load_ext loads the TensorBoard notebook extension; %tensorboard starts TensorBoard and points it at the log folder.
# (Trailing comments on the magic lines themselves would be passed as arguments and cause errors.)
%load_ext tensorboard
%tensorboard --logdir board

Supplementary Information

Normalization vs. Standardization

Normalization vs. Standardization: What’s the Difference?

  • Normalization: Scaling data proportionally into a small, specific range such as [0, 1] or [-1, 1].
    • Formula: $\frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$
  • Standardization: Rescaling data to a distribution with a mean of 0 and a standard deviation of 1, so extreme values may fall outside [0, 1] (a small numeric example follows this list).
    • Formula: $\frac{x_i - \mu}{\mathrm{sd}(x)}$
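
To make the difference concrete, here is a tiny numeric illustration of my own (not from the assignment), using NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Min-max normalization squeezes the values into [0, 1]
normalized = (x - x.min()) / (x.max() - x.min())
print(normalized)    # roughly [0, 0.11, 0.22, 0.33, 1]

# Standardization gives mean 0 and standard deviation 1; values can fall outside [0, 1]
standardized = (x - x.mean()) / x.std()
print(standardized)  # roughly [-0.95, -0.63, -0.32, 0, 1.90]
```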

What Both Have in Common

  • Both techniques scale individual features (columns) and not the feature vectors of individual samples (rows).

Why Normalize Data?

  1. Improved Precision: Many machine learning algorithms are based on objective functions that assume all features have zero mean and the same order of magnitude for variances. If the variance of a feature is orders of magnitude larger than that of other features, it will dominate the learning algorithm and prevent it from learning correctly. Therefore, normalization is done to make different dimensions of features comparable, significantly improving the classifier’s accuracy.
  2. Faster Convergence: After normalization, the process of finding the optimal solution is noticeably smoother, making it easier to converge to the optimal solution.

What does dim mean?

The dim parameter tells PyTorch along which dimension torch.topk should rank the values and return the maxima. Let's look at the difference with an example:

  • If you set dim=0, it will look at the maximum values for the entire column.
  • If you set dim=1, it will look at the maximum values for the entire row.
import torch

# Create an example tensor
output = torch.tensor([[0.2, 0.6, 0.9, 0.5, 0.3],
                       [0.4, 0.1, 0.8, 0.7, 0.2],
                       [0.5, 0.8, 0.9, 0.1, 0.2]])

# Get the top 3 maximum values and their indices per column
top_values_col, top_indices_col = torch.topk(output, 3, dim=0)

# Get the top 3 maximum values and their indices per row
top_values_row, top_indices_row = torch.topk(output, 3, dim=1)

print("Output tensor:")
print(output)

print("Top 3 values and indices per column:")
print(top_values_col)
print(top_indices_col)

print("Top 3 values and indices per row:")
print(top_values_row)
print(top_indices_row)
Output tensor:
tensor([[0.2000, 0.6000, 0.9000, 0.5000, 0.3000],
        [0.4000, 0.1000, 0.8000, 0.7000, 0.2000],
        [0.5000, 0.8000, 0.9000, 0.1000, 0.2000]])
Top 3 values and indices per column:
# Already sorted from largest to smallest
tensor([[0.5000, 0.8000, 0.9000, 0.7000, 0.3000],
        [0.4000, 0.6000, 0.9000, 0.5000, 0.2000],
        [0.2000, 0.1000, 0.8000, 0.1000, 0.2000]])
# The indices in each column, ordered from largest to smallest value. For example, in the first column (0.2, 0.4, 0.5), 0.5 is the largest,
# so the indices in order are 2, 1, 0. This is why you see [[2...], [1...], [0...]] in the output.
tensor([[2, 2, 0, 1, 0],
        [1, 0, 2, 0, 1],
        [0, 1, 1, 2, 2]])
Top 3 values and indices per row:
# Already sorted from largest to smallest
tensor([[0.9000, 0.6000, 0.5000],
        [0.8000, 0.7000, 0.4000],
        [0.9000, 0.8000, 0.5000]])
# The indices in each row, ordered from largest to smallest value. For example, in the first row (0.2, 0.6, 0.9, 0.5, 0.3),
# 0.9 is the largest, so the indices in order are 2, 1, 3. This is why you see [2, 1, 3] in the output.
tensor([[2, 1, 3],
        [2, 3, 0],
        [2, 1, 0]])
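
This is exactly why the evaluation loop above calls `torch.topk(outputs, 3, dim=1)`: `outputs` has shape (batch_size, 10), so dim=1 ranks the 10 class scores within each row, i.e. separately for every sample, while the per-image call in `plot_classes_preds` uses dim=0 because there the output is a single 1-D tensor of 10 scores.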