Preface

I recently took an AI course. This is the fourth assignment, and it covers the following main topics:

  1. Selecting a dataset and training a model on it.
  2. Transfer learning and fine-tuning.
  3. Batch normalization in CNNs.

The main references are the following websites:

  1. Flower102 dataset
  2. Transfer Learning
  3. PyTorch dataset
  4. Transfer Learning Model
  5. Shannon’s Transfer Learning Blog
  6. ResNet18

Assignment Requirements

Tasks

  1. Choose a dataset: look at the datasets provided by PyTorch’s torchvision and decide which one you want to use (excluding
    CIFAR, ImageNet, and FashionMNIST).
  2. Print images and dataset size: show some sample images of the dataset in your notebook and print the size of the dataset.
  3. Construct a CNN using batch normalization: design a CNN to make predictions on the dataset. Use an architecture similar to last time’s, but this time
    also include batch normalization layers.
  4. Train the model and print the test accuracy: train a model on the dataset and measure the accuracy on held-out test data.
  5. Use ResNet18 for transfer learning: use a pre-trained ResNet18 on the dataset in two ways:
    1. Fixed feature extractor: ResNet18 is used as a fixed feature extractor, without changing the pre-trained weights.
    2. Fine-tuning: ResNet18 is fine-tuned on the training data.
  6. Fine-tuning using EfficientNet_B5: repeat step 4, but now use EfficientNet_B5 instead of ResNet18.
  7. Compare the methods: compare the accuracy of the different methods on the test data and print out the training time for each method.

Task 0 - Importing Packages

Let’s start by importing the required packages:

# CNN 
import torch.nn.functional as F
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn

# others
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

# dataset
import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import Flowers102

# label
from scipy.io import loadmat
import json

cudnn.benchmark = True
plt.ion() # interactive mode

Task 1 - Selecting a DataSet

Select a dataset: check out the torchvision datasets of PyTorch and decide which dataset you want to use (no
CIFAR, no ImageNet, no FashionMNIST).

To experience transfer learning while keeping training fast, we use Flowers102 as our dataset. Flowers102 does not provide text label names, so, like most examples I found online, I read them from a pre-written .json or .txt file that describes the name of each label index.

# Specify the batch size (the number of samples processed at once) and the path the data is downloaded to.
batch_size = 4
data_dir = '... /... /Data/flowers-102'
# Build the classes_name of the dataset
json_data = '{"21": "fire lily", "3": "canterbury bells", "45": "bolero deep blue", "1": "pink primrose", "34": "mexican aster", "27": "prince of wales feathers", "7": "moon orchid", "16": "globe-flower", "25": "grape hyacinth", "26": "corn poppy", "79": "toad lily", "39": "siam tulip", "24": "red ginger", "67": "spring crocus", "35": "alpine sea holly", "32": "garden phlox", "10": "globe thistle", "6": "tiger lily", "93": "ball moss", "33": "love in the mist", "9": "monkshood", "102": "blackberry lily", "14": "spear thistle", "19": "balloon flower", "100": "blanket flower", "13": "king protea", "49": "oxeye daisy", "15": "yellow iris", "61": "cautleya spicata", "31": "carnation", "64": "silverbush", "68": "bearded iris", "63": "black-eyed susan", "69": "windflower", "62": "japanese anemone", "20": "giant white arum lily", "38": "great masterwort", "4": "sweet pea", "86": "tree mallow", "101": "trumpet creeper", "42": "daffodil", "22": "pincushion flower", "2": "hard-leaved pocket orchid", "54": "sunflower", "66": "osteospermum", "70": "tree poppy", "85": "desert-rose", "99": "bromelia", "87": "magnolia", "5": "english marigold", "92": "bee balm", "28": "stemless gentian", "97": "mallow", "57": "gaura", "40": "lenten rose", "47": "marigold", "59": "orange dahlia", "48": "buttercup", "55": "pelargonium", "36": "ruby-lipped cattleya", "91": "hippeastrum", "29": "artichoke", "71": "gazania", "90": "canna lily", "18": "peruvian lily", "98": "mexican petunia", "8": "bird of paradise", "30": "sweet william", "17": "purple coneflower", "52": "wild pansy", "84": "columbine", "12": "colt\'s foot", "11": "snapdragon", "96": "camellia", "23": "fritillary", "50": "common dandelion", "44": "poinsettia", "53": "primula", "72": "azalea", "65": "californian poppy", "80": "anthurium", "76": "morning glory", "37": "cape flower", "56": "bishop of llandaff", "60": "pink-yellow dahlia", "82": "clematis", "58": "geranium", "75": "thorn apple", "41": "barbeton daisy", "95": "bougainvillea", "43": "sword lily", "83": "hibiscus", "78": "lotus lotus", "88": "cyclamen", "94": "foxglove", "81": "frangipani", "74": "rose", "89": "watercress", "73": "water lily", "46": "wallflower", "77": "passion flower", "51": "petunia"}'
# load data
cat_to_name = json.loads(json_data)
# Turn the key into an int, because the label of the dataset starts from 0. But this json starts from 1, so we have to -1
cat_to_name = {int(k)-1:v for k,v in cat_to_name.items()}
# sort and print
class_names = dict(sorted(cat_to_name.items()))
print(class_names)

Result

{0: 'pink primrose', 1: 'hard-leaved pocket orchid', 2: 'canterbury bells', 3: 'sweet pea', 4: 'english marigold', 5: 'tiger lily', 6: 'moon orchid', 7: 'bird of paradise', 8: 'monkshood', 9: 'globe thistle', 10: 'snapdragon', 11: "colt's foot", 12: 'king protea', 13: 'spear thistle', 14: 'yellow iris', 15: 'globe-flower', 16: 'purple coneflower', 17: 'peruvian lily', 18: 'balloon flower', 19: 'giant white arum lily', 20: 'fire lily', 21: 'pincushion flower', 22: 'fritillary', 23: 'red ginger', 24: 'grape hyacinth', 25: 'corn poppy', 26: 'prince of wales feathers', 27: 'stemless gentian', 28: 'artichoke', 29: 'sweet william', 30: 'carnation', 31: 'garden phlox', 32: 'love in the mist', 33: 'mexican aster', 34: 'alpine sea holly', 35: 'ruby-lipped cattleya', 36: 'cape flower', 37: 'great masterwort', 38: 'siam tulip', 39: 'lenten rose', 40: 'barbeton daisy', 41: 'daffodil', 42: 'sword lily', 43: 'poinsettia', 44: 'bolero deep blue', 45: 'wallflower', 46: 'marigold', 47: 'buttercup', 48: 'oxeye daisy', 49: 'common dandelion', 50: 'petunia', 51: 'wild pansy', 52: 'primula', 53: 'sunflower', 54: 'pelargonium', 55: 'bishop of llandaff', 56: 'gaura', 57: 'geranium', 58: 'orange dahlia', 59: 'pink-yellow dahlia', 60: 'cautleya spicata', 61: 'japanese anemone', 62: 'black-eyed susan', 63: 'silverbush', 64: 'californian poppy', 65: 'osteospermum', 66: 'spring crocus', 67: 'bearded iris', 68: 'windflower', 69: 'tree poppy', 70: 'gazania', 71: 'azalea', 72: 'water lily', 73: 'rose', 74: 'thorn apple', 75: 'morning glory', 76: 'passion flower', 77: 'lotus lotus', 78: 'toad lily', 79: 'anthurium', 80: 'frangipani', 81: 'clematis', 82: 'hibiscus', 83: 'columbine', 84: 'desert-rose', 85: 'tree mallow', 86: 'magnolia', 87: 'cyclamen', 88: 'watercress', 89: 'canna lily', 90: 'hippeastrum', 91: 'bee balm', 92: 'ball moss', 93: 'foxglove', 94: 'bougainvillea', 95: 'camellia', 96: 'mallow', 97: 'mexican petunia', 98: 'bromelia', 99: 'blanket flower', 100: 'trumpet creeper', 101: 
'blackberry lily'}

Here I mainly follow the official Transfer Learning tutorial, adapting it to the dataset I want, and start downloading the files:

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        # Crop a randomly chosen rectangular area of the image,
        # then resize the crop to the specified size of 224x224 pixels.
        transforms.RandomResizedCrop(224),
        # Flip the image horizontally with probability p (a number from 0 to 1). The default is 0.5, i.e. a 50% chance.
        transforms.RandomHorizontalFlip(),
        # Convert the image into a Tensor
        transforms.ToTensor(),
        # Normalize the image values: the first parameter is the per-channel mean, the second is the per-channel standard deviation (std)
        # The reason for setting [0.485, 0.456, 0.406] can be referred from: https://www.geeksforgeeks.org/how-to-normalize-images-in-pytorch/
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        # No random area is selected here: the whole image is resized so that its shorter side is 256 pixels.
        transforms.Resize(256),
        # Keep the central 224x224 part of the image.
        # Used for validation or test data so that evaluation is deterministic, without the randomness of RandomResizedCrop.
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
# We download the training data into the data_dir/train folder, and use data_transforms["train"] for data transformation.
train_datasets = Flowers102(root=data_dir+"/train", split="train", download=True, transform=data_transforms["train"])
# We download the validation data into the data_dir/val folder, and use data_transforms["val"] for data transformation.
val_datasets = Flowers102(root=data_dir+"/val", split="val", download=True, transform=data_transforms["val"])

# Download the flowers102 dataset for both the train and val splits.
image_datasets = {x: Flowers102(root=data_dir, split=x, download=True, transform=data_transforms[x])
                  for x in ['train', 'val']}

# Convert to DataLoader format, and specify batch_size
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}

# Number of images in each split (printed later and used to average loss and accuracy)
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

# Use Apple's MPS backend when available, otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

print("device: ", device)
print("image_datasets function call: ", dir(image_datasets["train"]))
This completes the first Task, which is to download the dataset we want.

Task 2 - Printing out images and dataset sizes

  1. Print images and dataset size: display some sample images of the dataset in the notebook and print the dataset size.

Following the official Transfer Learning tutorial, we first create an imshow helper that displays a batch of images:

def imshow(inp, title=None):
    """Display image for Tensor."""
    # Tensor layout is (C, H, W); matplotlib expects (H, W, C)
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    # Undo the normalization applied by transforms.Normalize
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

# x.item() extracts the label index from the tensor; class_names maps that index to the English flower name
imshow(out, title=[class_names[x.item()] for x in classes])

print(inputs.shape)
print("dataset_sizes: ", dataset_sizes)
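The imshow helper undoes the normalization before plotting. As a small sketch in plain Python (the function names and the sample pixel are illustrative assumptions, not part of the tutorial), the following shows that `std * x + mean` is the inverse of the `(x - mean) / std` step applied by transforms.Normalize:

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize(pixel):
    """Apply (value - mean) / std to each RGB channel."""
    return [(v - m) / s for v, m, s in zip(pixel, MEAN, STD)]


def denormalize(pixel):
    """Invert the normalization and clip to the displayable range [0, 1]."""
    return [min(max(v * s + m, 0.0), 1.0) for v, m, s in zip(pixel, MEAN, STD)]


pixel = [0.5, 0.25, 0.75]  # hypothetical RGB pixel in [0, 1]
restored = denormalize(normalize(pixel))
print(all(abs(a - b) < 1e-9 for a, b in zip(pixel, restored)))  # True
```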

Result

Task 3 & 4 - CNN + Batch Normalization

  1. Construct a CNN using batch normalization: design a CNN to make predictions on the dataset. Use an architecture similar to last time’s, but this time also include batch normalization layers.
  2. Train the model on the dataset and measure the accuracy on held-out test data.

As Prof. Hongyi Li mentions in Transfer Learning, batch normalization is usually performed before the activation function; you can refer to that section if you are interested. Batch normalization is simply feature normalization computed over each mini-batch.

Why do we need feature normalization?
It gives different features similar value ranges, so that during gradient descent the weights w1 and w2 affect the loss on a comparable scale, instead of w1 affecting the loss much more than w2. The effect looks something like the following.

[Figure: Origin]
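As a rough illustration of the normalization step, here is a minimal plain-Python sketch that normalizes each feature of a small batch to zero mean and unit variance. The function name and the toy batch are assumptions made for this example; the learnable scale and shift that nn.BatchNorm2d also applies are left out, and in a CNN the statistics are computed per channel rather than per column.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of `batch` using the batch statistics."""
    n = len(batch)
    num_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(num_features)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(num_features)]
            for row in batch]


# Two features on very different scales
batch = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
normalized = batch_norm(batch)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization both columns cover nearly the same range, even though the raw values differed by a factor of 1000.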

Build the Network

Note that depending on the size of the images in the dataset and the number of hidden layers, you have to make two adjustments:

  1. In the first fully connected layer, the input size is determined by how many convolutions and max-poolings your hidden layers perform.
  2. The number of outputs of the last layer must match the number of classes in your dataset.

Please note the parts of the code marked with the comment arrow <====.

Our CNN architecture is as follows. You can enable dropout by uncommenting the corresponding lines,
but in my tests dropout did not result in higher accuracy.

import torch.nn as nn

class NewNet(nn.Module):
    def __init__(self):
        super(NewNet, self).__init__()
        # Layer 1: 3x3 kernel, depth = 32, 224-3+1=222 => 222x222 pixel
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.bn1 = nn.BatchNorm2d(32)
        # self.dropout1 = nn.Dropout(0.5) # Apply dropout as needed

        # Layer 2: Max pooling with 2x2 kernel, 222/2=111 => 111x111 pixel
        self.pool = nn.MaxPool2d(2, 2)

        # Layer 3: 3x3 kernel, depth = 64, 111-3+1=109 => 109x109 pixel
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.bn2 = nn.BatchNorm2d(64)
        # self.dropout2 = nn.Dropout(0.5) # Apply dropout as needed

        # Layer 4: Max pooling with 2x2 kernel, 109/2=54 (rounded down) => 54x54 pixel
        self.pool = nn.MaxPool2d(2, 2)

        # Layer 5: 3x3 kernel, depth = 128, 54-3+1=52 => 52x52 pixel
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.bn3 = nn.BatchNorm2d(128)
        # self.dropout3 = nn.Dropout(0.5) # Apply dropout as needed

        # Layer 6: Max pooling with 2x2 kernel, 52/2=26 => 26x26 pixel
        self.pool = nn.MaxPool2d(2, 2)

        # Final depth is 128 and the feature map is 26x26 pixel => 128*26*26 inputs
        self.fc1 = nn.Linear(128 * 26 * 26, 2048) # <==== Adjust according to the hidden layer, 128 * 26 * 26
        self.fc2 = nn.Linear(2048, 1024)
        self.fc3 = nn.Linear(1024, 512)
        self.fc4 = nn.Linear(512, 102) # <==== 102 according to the number of classes in the dataset

    def forward(self, x):
        # We put the batch normalization before the activation function.
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool(x)
        # x = self.dropout1(x) # Apply dropout as needed
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)
        # x = self.dropout2(x) # Apply dropout as needed
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.pool(x)
        # x = self.dropout3(x) # Apply dropout as needed
        x = x.view(-1, 128 * 26 * 26) # <==== Adjust according to the hidden layer, 128 * 26 * 26
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

net = NewNet()
net.to(device)
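The sizes in the comments of the network can be double-checked with a small helper. This is only a sketch in plain Python: the function names are made up for this example, and it assumes no padding, stride 1 for the convolutions, and 2x2 max-pooling with stride 2, as in the network above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 224
for _ in range(3):  # three conv + pool stages
    size = conv_out(size, kernel=3)
    size = pool_out(size)

print(size)               # 26
print(128 * size * size)  # 86528 inputs to the first fully connected layer
```

If you change the input resolution or add a stage, rerunning this calculation gives the new value for the line marked with <====.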

Then specify the loss function and optimizer:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Create a Training Func

We need to create a function that trains the model for one epoch, as follows:

def train(epoch, start_time):
    net.train()  # set model to training mode
    running_loss = 0.0
    for batch_idx, data in enumerate(dataloaders["train"], 0):
        # move the batch to the selected device
        inputs, labels = data[0].to(device), data[1].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        # (outputs and loss already live on the same device as the model and the inputs)
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics every 100 batches
        running_loss += loss.item()
        if batch_idx % 100 == 99:
            print(f'[{epoch}, {batch_idx + 1:5d}] loss: {running_loss / 100:.3f} time elapsed: {round((time.time() - start_time))} sec.')
            running_loss = 0.0

Build Testing Func

def test():
    net.eval()  # set model to evaluation mode
    correct = 0
    total = 0
    class_correct = [0] * len(class_names)
    class_total = [0] * len(class_names)
    with torch.no_grad():
        for data in dataloaders["val"]:
            images, labels = data[0].to(device), data[1].to(device)
            outputs = net(images)

            # select the top-1 prediction (increase k for top-k accuracy)
            _, predicted = torch.topk(outputs, 1, dim=1)

            # check whether the true label is among the predicted labels
            for i in range(len(labels)):
                label = labels[i].item()
                total += 1
                class_total[label] += 1

                if label in predicted[i].tolist():
                    correct += 1
                    class_correct[label] += 1

    # guard against classes that never appear in the evaluated split
    class_accuracies = [class_correct[i] / class_total[i] if class_total[i] else 0.0
                        for i in range(len(class_names))]
    accuracy = correct / total
    return accuracy, class_accuracies
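The check performed inside the test function can be sketched without PyTorch. The example below is an illustration in plain Python with made-up scores and labels; the function name topk_accuracy is hypothetical. A sample counts as correct when its true label is among the k highest-scoring classes.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores in this row
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            correct += 1
    return correct / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0
    [0.2, 0.3, 0.5],  # highest score: class 2
]
labels = [1, 1, 0]

print(topk_accuracy(scores, labels, k=1))  # 1 of 3 samples correct
print(topk_accuracy(scores, labels, k=2))  # 2 of 3 samples correct
```

With k=1 this reduces to ordinary accuracy, which is what the test function above reports.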

Run Training

To follow the progress during training, we print the training loss every 100 batches and evaluate the model on the held-out data every 5 epochs.

num_epochs = 100
start_time = time.time()

# accuracy of the untrained model, as a baseline
accuracy, class_accuracies = test()
print(f'Accuracy on test data (top-1): {100 * accuracy:.2f}%')
for epoch in range(num_epochs):
    print(f"============ Epoch: {epoch} ==========")
    train(epoch, start_time)
    # every 5 epochs, evaluate on the held-out data
    if epoch % 5 == 0:
        accuracy, class_accuracies = test()
        # print accuracies
        print(f'Accuracy on test data (top-1): {100 * accuracy:.2f}%')

print(f'Finished Training. Total elapsed time: {round((time.time() - start_time) / 60, 1)} min')

Result

Accuracy on test data (top-1): 0.0019%
============ Epoch: 0 ==========
[0, 100] loss: 4.984 time elapsed: 40 sec.
[0, 200] loss: 4.876 time elapsed: 47 sec.
...
Accuracy on test data (top-1): 35.59%
Finished Training. Total elapsed time: 67 min

Task 5 & 4 - Transfer Learning: ResNet18

  1. Train the model and print the test accuracy: train a model on the dataset and measure the accuracy on held-out test data.
  2. Use ResNet18 for transfer learning: use a pre-trained ResNet18 on the dataset in two ways:
    1. Fixed feature extractor: freeze the parameters of the ResNet18 model (the pre-trained weights are left unchanged) and only train the last layer.
    2. Fine-tuning: fine-tune the whole ResNet18 model.

Build the Training & Testing Function

Following the official Transfer Learning tutorial, we first create the train_model function, which handles both training and validation:

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    # Set the start time for logging to monitor the training time for each epoch
    since = time.time()

    # Create a temporary folder to store the best model parameters
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')
        # Haven't started training yet, but we first save the current model
        torch.save(model.state_dict(), best_model_params_path)
        best_acc = 0.0  # Best accuracy so far; updated whenever a higher value is found

        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)

            # Each epoch has a training phase followed by a validation phase
            for phase in ['train', 'val']:
                # Determine whether it's the training or validation phase
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()   # Set model to evaluate mode

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    # Move to the selected device
                    inputs = inputs.to(device)
                    labels = labels.to(device)

                    # Zero the parameter gradients
                    optimizer.zero_grad()

                    # Forward propagation
                    # track history only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)  # Select the class with the highest score as the prediction
                        loss = criterion(outputs, labels)  # Calculate the difference between the answer and prediction

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # loss.item() is the mean loss over the batch, so multiply by the batch size to get the summed loss
                    running_loss += loss.item() * inputs.size(0)
                    # Calculate how many are correct in a batch
                    running_corrects += torch.sum(preds == labels.data)

                # Adjust the learning rate only during training
                # scheduler is a learning rate (lr) adjuster used to modify the lr value during model training
                if phase == 'train':
                    scheduler.step()

                # After an entire epoch, calculate the loss and accuracy for that epoch
                # Avg. loss = total loss / size of the entire dataset
                epoch_loss = running_loss / dataset_sizes[phase]
                # Avg. Acc = total number of correct answers / size of the entire dataset
                epoch_acc = running_corrects.float() / dataset_sizes[phase]

                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f} Time elapsed: {round((time.time() - since))} sec.')

                # During validation, if the accuracy is better than the current best, save the model parameters
                if phase == 'val' and epoch_acc > best_acc:
                    # Update the current best accuracy
                    best_acc = epoch_acc
                    # Save the best model parameters to disk
                    torch.save(model.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:4f}')

        # Load the best model weights found during training before returning the model
        model.load_state_dict(torch.load(best_model_params_path))
    return model
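The reason the batch loss is multiplied by the batch size before averaging can be shown with a tiny example. This sketch uses plain Python with invented numbers; the function name epoch_mean_loss is hypothetical. It relies on the fact that nn.CrossEntropyLoss returns the mean loss over the batch by default.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Average per-sample loss, weighting each batch mean by its batch size."""
    running_loss = 0.0
    for loss, size in zip(batch_losses, batch_sizes):
        running_loss += loss * size  # summed loss of this batch
    return running_loss / sum(batch_sizes)


batch_losses = [2.0, 1.0, 4.0]  # mean loss of each batch (made-up values)
batch_sizes = [4, 4, 2]         # the last batch is smaller

weighted = epoch_mean_loss(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)
print(weighted)  # 2.0
print(naive)     # larger, because the small last batch is over-weighted
```

When every batch has the same size the two values agree, but the weighted form stays correct when the dataset size is not a multiple of the batch size.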

Use Transfer Learning

As the assignment requires, I use ResNet18 for transfer learning. According to the [official description](https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet18.html#torchvision.models.ResNet18_Weights), the default pre-trained weights for ResNet18 (ResNet18_Weights.DEFAULT) are IMAGENET1K_V1. Calling models.resnet18() without a weights argument loads no pre-trained weights at all, so we pass the parameter explicitly, which also makes it clear which weights we are using.

model_ft = models.resnet18(weights='IMAGENET1K_V1')

# num_ftrs is the number of input features for the last layer.
# Retrieve the number of input features for the last layer
num_ftrs = model_ft.fc.in_features

# Here the size of each output sample is set to 102.
# model_ft.fc is the final layer of the model, used for classification.
# Creating the final layer ourselves, setting the input number as num_ftrs, and output number as 102 (since there are 102 classes in this case)
model_ft.fc = nn.Linear(num_ftrs, 102)
model_ft = model_ft.to(device)

# The loss function uses CrossEntropyLoss
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
# Optimizer uses SGD with learning rate = 0.001, momentum = 0.9
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
# Multiply the learning rate by 0.1 every 7 epochs to decay the lr
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

Why adjust the learning rate?
Reducing the learning rate every fixed number of epochs is a common strategy called learning rate decay or learning rate scheduling. Its effects are:

  1. Improved model stability: a relatively large learning rate at the beginning of training helps the model converge quickly. Near the optimal solution, however, a large learning rate may cause the model to oscillate or overshoot. Periodically decreasing the learning rate makes the later stages of training more stable and lets the model settle closer to the optimal solution.

  2. Less overfitting: periodically decreasing the learning rate can help keep the model from overfitting the training set. With a smaller learning rate, the model adjusts its parameters more carefully and is less likely to fit the noise in the training set.

In practice, the settings of the schedule (e.g., the values of step_size and gamma) are usually tuned through trial and experience. They typically depend on the size of your dataset, the model architecture, the difficulty of the problem, and other factors.
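To see what step_size=7 and gamma=0.1 mean over 25 epochs, the schedule can be written as a one-line formula. This is a plain-Python sketch of the step decay rule, not the scheduler's actual implementation; the function name step_lr is made up for this example.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate used in `epoch` when it is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 6, 7, 13, 14, 20, 21, 24):
    print(epoch, f"{step_lr(0.001, epoch):.0e}")
```

Starting from 0.001, the learning rate stays constant for epochs 0 to 6, drops by a factor of ten at epochs 7, 14, and 21, and is around 1e-06 for the last four epochs.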

Start Training

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=25)

Result: Accuracy on test data (top-1): 89.41%

Epoch 0/24
----------
train Loss: 4.4280 Acc: 0.0657 Time elapsed: 33 sec.
val Loss: 2.9901 Acc: 0.3118 Time elapsed: 58 sec.

Epoch 1/24
----------
train Loss: 3.3046 Acc: 0.2353 Time elapsed: 87 sec.
val Loss: 1.6604 Acc: 0.5941 Time elapsed: 112 sec.

Epoch 2/24
----------
train Loss: 2.5080 Acc: 0.4029 Time elapsed: 141 sec.
val Loss: 1.2243 Acc: 0.6951 Time elapsed: 166 sec.

Epoch 3/24
----------
train Loss: 1.9871 Acc: 0.5196 Time elapsed: 195 sec.
val Loss: 0.9578 Acc: 0.7216 Time elapsed: 219 sec.

Epoch 4/24
----------
train Loss: 1.5865 Acc: 0.6225 Time elapsed: 249 sec.
val Loss: 0.6911 Acc: 0.8108 Time elapsed: 273 sec.
...
val Loss: 0.3919 Acc: 0.8912 Time elapsed: 1313 sec.

Training complete in 21m 53s
Best val Acc: 0.894118

Using ResNet18 as a fixed feature extractor

The assignment requires ResNet18 to be used as a fixed feature extractor, so all pre-trained parameters must be frozen and only the parameters of the last layer are trained. Simply put, we don’t change the weights of other people’s models. The only change needed is to set requires_grad to False on every parameter of the model before replacing the last layer; the newly created last layer has requires_grad set to True by default. This way, we can use ResNet18 as a fixed feature extractor.
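To get a feel for how little is trained in this setup, the number of parameters in the new last layer can be worked out by hand. The sketch below is plain Python; the function name is hypothetical, and it assumes the last layer of ResNet18 takes 512 input features (the value of fc.in_features) and has 102 outputs as in this post.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


head_params = linear_param_count(512, 102)
print(head_params)  # 52326
```

Only these roughly 52 thousand parameters receive gradient updates, compared with roughly 11 million frozen parameters in the rest of ResNet18.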

model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')

# !!! Add these two lines to set requires_grad to False, so that the pre-trained parameters will not be updated
for param in model_conv.parameters():
    param.requires_grad = False

num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 102)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

# We start to train
model_conv_SGD = train_model(model_conv, criterion, optimizer_conv,
exp_lr_scheduler, num_epochs=25)

Result: Accuracy on test data: 79.11%

Epoch 0/24
----------
train Loss: 4.6979 Acc: 0.0176 Time elapsed: 24 sec.
val Loss: 3.9863 Acc: 0.1235 Time elapsed: 48 sec.

Epoch 1/24
----------
train Loss: 4.0589 Acc: 0.1137 Time elapsed: 72 sec.
val Loss: 3.1125 Acc: 0.3608 Time elapsed: 95 sec.

Epoch 2/24
----------
train Loss: 3.4935 Acc: 0.2304 Time elapsed: 119 sec.
val Loss: 2.5003 Acc: 0.4912 Time elapsed: 142 sec.

Epoch 3/24
----------
train Loss: 3.1030 Acc: 0.3422 Time elapsed: 165 sec.
val Loss: 2.1583 Acc: 0.5510 Time elapsed: 189 sec.

Epoch 4/24
----------
train Loss: 2.7367 Acc: 0.4402 Time elapsed: 212 sec.
val Loss: 1.7064 Acc: 0.6304 Time elapsed: 236 sec.
...
val Loss: 1.0910 Acc: 0.7824 Time elapsed: 1179 sec.

Training complete in 19m 39s
Best val Acc: 0.791176

Task 6 & 4 - Transfer Learning: EfficientNet_B5

  1. Train the model and print the test accuracy: train a model on the dataset and measure the accuracy on held-out test data.
  2. Fine-tuning using EfficientNet_B5: repeat the fine-tuning, but now use EfficientNet_B5 instead of ResNet18.

We need to install the efficientnet_pytorch package first:

pip install efficientnet_pytorch

Then we can import the package and use it:

from efficientnet_pytorch import EfficientNet

# Load the pre-trained EfficientNet-B5 model
model_ft = EfficientNet.from_pretrained('efficientnet-b5')

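# We obtain the number of input features for the last layer
num_ftrs = model_ft._fc.in_features
# Build a new last layer: the input number is num_ftrs, and the output number is 102 (because there are 102 classes in this case)
# Note: in efficientnet_pytorch the classifier attribute is `_fc`; assigning to `fc` would leave the original 1000-class head in place
model_ft._fc = nn.Linear(num_ftrs, 102)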
# Put the model on the GPU
model_ft = model_ft.to(device)

# Loss function
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

# Start training
model_ft_effb5 = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=25)

Result: Accuracy on test data: 82.15%

Epoch 0/24
----------
train Loss: 6.2860 Acc: 0.0157 Time elapsed: 156 sec.
val Loss: 5.6637 Acc: 0.0382 Time elapsed: 200 sec.

Epoch 1/24
----------
train Loss: 4.8955 Acc: 0.1039 Time elapsed: 322 sec.
val Loss: 4.5101 Acc: 0.2392 Time elapsed: 365 sec.

Epoch 2/24
----------
train Loss: 3.8566 Acc: 0.2422 Time elapsed: 485 sec.
val Loss: 3.6194 Acc: 0.4265 Time elapsed: 529 sec.

Epoch 3/24
----------
train Loss: 3.0979 Acc: 0.3637 Time elapsed: 653 sec.
val Loss: 2.8613 Acc: 0.5539 Time elapsed: 696 sec.

Epoch 4/24
----------
train Loss: 2.4323 Acc: 0.4725 Time elapsed: 818 sec.
val Loss: 2.2894 Acc: 0.6725 Time elapsed: 863 sec.
...


Epoch 24/24
----------
train Loss: 1.3168 Acc: 0.7343 Time elapsed: 4769 sec.
val Loss: 1.3167 Acc: 0.8167 Time elapsed: 4812 sec.

Training complete in 80m 12s
Best val Acc: 0.821569

Task 7 - Discussion

Compare these different methods and print out the accuracy: Compare the accuracy of the different methods on the test data and print out the training time for each method.

In the experiments above, we compared the following approaches:

  • The CNN we built ourselves
  • Transfer learning with ResNet18 (fine-tuned, and as a fixed feature extractor)
  • Transfer learning with EfficientNet-B5

Their results are as follows:

Model | Accuracy | Training Time | Result
Self-built CNN | 35% | more than 1 hour | Worst
ResNet18 (fine-tuned) | 89.41% | 21 minutes | Highest accuracy
ResNet18 (fixed feature extractor) | 79.11% | 19 minutes | Shortest training time
EfficientNet-B5 | 82.15% | 80 minutes | Mediocre
Conclusion

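  • With transfer learning on ResNet18, the accuracy improves markedly (from about 35% to 79-89%) and the training time drops from more than an hour to about 20 minutes. EfficientNet-B5 also beats the self-built CNN in accuracy, but it took the longest to train (80 minutes).
  • Moreover, in this case the accuracy is better when the pre-trained parameters are not frozen (fine-tuning), although training takes slightly longer because gradients are computed and applied for every layer.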