Assignments from Prof. Hung-yi Lee's 2021 Machine Learning course; see ML 2021 Spring for details.

NTU students all seem to have very good English, haha, so this is also my first attempt at writing the comments and notes bilingually.

Homework 1: COVID-19 Cases Prediction (Regression)

Simple Baseline

Getting the TA's sample code to run is all that is needed; the annotated code is below.

Download Data

If the Google drive links are dead, you can download data from kaggle, and upload data manually to the workspace.

tr_path = 'covid.train.csv'   # path to training data
tt_path = 'covid.test.csv' # path to testing data

# Google Colab special command
!gdown --id '19CCyCgJrUxtvgZF53vnctJiOJ23T5mqF' --output covid.train.csv
!gdown --id '1CE240jLm2npU-tdz81-oVKEF3T2yfT1O' --output covid.test.csv
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=19CCyCgJrUxtvgZF53vnctJiOJ23T5mqF
To: /content/covid.train.csv
100% 2.00M/2.00M [00:00<00:00, 202MB/s]
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1CE240jLm2npU-tdz81-oVKEF3T2yfT1O
To: /content/covid.test.csv
100% 651k/651k [00:00<00:00, 129MB/s]

Import Some Packages

# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For data preprocess
import numpy as np
import csv
import os

# For plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

# set a random seed for reproducibility
myseed = 42069

# if True, causes cuDNN to only use deterministic convolution algorithms,
# i.e. algorithms which, given the same input, and when run on the same software and hardware, always produce the same output
# Performance: nondeterministic algorithms > deterministic algorithms (in most cases)
torch.backends.cudnn.deterministic = True

# if True, causes cuDNN to benchmark multiple convolution algorithms and select the fastest.
# Set it to True when the network structure is static, i.e. the input (batch size, image size, input channels) is fixed.
# HW1 involves no convolutions, so it is set to False here.
torch.backends.cudnn.benchmark = False

# set the random seed for numpy, torch, torch.cuda
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

Some Utilities

You do not need to modify this part.

def get_device():
    ''' Get device (if GPU is available, use GPU) '''
    return 'cuda' if torch.cuda.is_available() else 'cpu'

def plot_learning_curve(loss_record, title=''):
    ''' Plot learning curve of your DNN (train & dev loss) '''
    total_steps = len(loss_record['train'])
    x_1 = range(total_steps)
    x_2 = x_1[::len(loss_record['train']) // len(loss_record['dev'])]
    figure(figsize=(6, 4))   # set the width and height of the figure (in inches)
    plt.plot(x_1, loss_record['train'], c='tab:red', label='train')
    plt.plot(x_2, loss_record['dev'], c='tab:cyan', label='dev')
    plt.ylim(0.0, 5.)        # limit the y-axis range to 0.0~5.0
    plt.xlabel('Training steps')
    plt.ylabel('MSE loss')
    plt.title('Learning curve of {}'.format(title))
    plt.legend()             # place a legend (what each line represents) on the Axes
    plt.show()


def plot_pred(dv_set, model, device, lim=35., preds=None, targets=None):
    ''' Plot prediction of your DNN '''
    if preds is None or targets is None:
        model.eval()              # sets the module in evaluation mode
        preds, targets = [], []
        for x, y in dv_set:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():  # context manager that disables gradient calculation (requires_grad=False, for speed)
                pred = model(x)
                preds.append(pred.detach().cpu())  # Tensor.detach(): returns a new Tensor, detached from the current graph. Tensor.cpu(): returns a copy of this object in CPU memory.
                targets.append(y.detach().cpu())
        preds = torch.cat(preds, dim=0).numpy()    # concatenates the given sequence of tensors along the given dimension (here: list -> Tensor -> ndarray)
        targets = torch.cat(targets, dim=0).numpy()

    figure(figsize=(5, 5))
    plt.scatter(targets, preds, c='r', alpha=0.5)  # scatter plot
    plt.plot([-0.2, lim], [-0.2, lim], c='b')
    plt.xlim(-0.2, lim)
    plt.ylim(-0.2, lim)
    plt.xlabel('ground truth value')
    plt.ylabel('predicted value')
    plt.title('Ground Truth v.s. Prediction')
    plt.show()

Preprocess

We have three kinds of datasets:

  • train: for training
  • dev: for validation
  • test: for testing (w/o target value)

Dataset

The COVID19Dataset below does:

  • read .csv files
  • extract features
  • split covid.train.csv into train/dev sets
  • normalize features

Finishing TODO below might make you pass medium baseline.

class COVID19Dataset(Dataset):
    ''' Dataset for loading and preprocessing the COVID19 dataset '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read data into numpy arrays
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)  # drop the header row and the id column to keep only the numeric values

        # By default, use every column except the last (indices 0~92) as features
        if not target_only:
            feats = list(range(93))
        else:
            # TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
            # feats = list(range(40)) + [57] + [75]
            pass

        # The last column differs: the training set has a target column (index 93), the test set does not
        if mode == 'test':
            # Testing data
            # data: 893 x 93 (40 states + day 1 (18) + day 2 (18) + day 3 (17))
            # the test set has no label, so only the features need processing
            data = data[:, feats]                # all rows, <feats> columns
            self.data = torch.FloatTensor(data)  # ndarray -> Tensor (Float32)
        elif mode in ['train', 'dev']:
            # Training data (train/dev sets)
            # data: 2700 x 94 (40 states + day 1 (18) + day 2 (18) + day 3 (18))
            target = data[:, -1]   # all rows, last column (the target)
            data = data[:, feats]  # all rows, <feats (default: 0~92)> columns

            # Splitting training data into train & dev sets
            # len(train set) : len(dev set) = 9 : 1 (out of every 10 samples, 9 go to train and 1 to dev)
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]

            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen)
        # normalization usually improves training:
        # (i-th feature - mean of the i-th feature) / (std of the i-th feature)
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:
            # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)

DataLoader

A DataLoader loads data from a given Dataset into batches.

def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    ''' Generates a dataset, then wraps it in a dataloader. '''
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)  # Construct dataset
    # num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
    # pin_memory (bool, optional) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.
    dataloader = DataLoader(
        dataset, batch_size,
        shuffle=(mode == 'train'), drop_last=False,
        num_workers=n_jobs, pin_memory=True)                            # Construct dataloader
    return dataloader

Deep Neural Network

NeuralNet is an nn.Module designed for regression.
The DNN consists of 2 fully-connected layers with ReLU activation.
This module also includes a function cal_loss for calculating the loss.

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        # nn.Sequential is a sequential container: modules are added to it in the order they are passed in the constructor.
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        # reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'.
        # 'none': no reduction will be applied,
        # 'mean': the sum of the output will be divided by the number of elements in the output (default),
        # 'sum': the output will be summed.
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        # squeeze(1) removes the trailing dimension of size 1: (batch_size, 1) -> (batch_size,)
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L1/L2 regularization here
        return self.criterion(pred, target)

Train/Dev/Test

Training

def train(tr_set, dv_set, model, config, device):
    ''' DNN training '''

    n_epochs = config['n_epochs']  # Maximum number of epochs

    # Setup optimizer
    optimizer = getattr(torch.optim, config['optimizer'])(
        model.parameters(), **config['optim_hparas'])

    min_mse = 1000.
    loss_record = {'train': [], 'dev': []}  # for recording training loss
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()                           # set model to training mode
        for x, y in tr_set:                     # iterate through the dataloader
            optimizer.zero_grad()               # set gradient to zero
            x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
            pred = model(x)                     # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)  # compute loss
            mse_loss.backward()                 # compute gradient (backpropagation)
            optimizer.step()                    # update model with optimizer
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, test your model on the validation (development) set.
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save model if your model improved
            min_mse = dev_mse
            print(f'Saving model (epoch = {epoch + 1 : 4d}, loss = {min_mse : .4f})')
            # model.state_dict(): Returns a dictionary containing a whole state of the module.
            torch.save(model.state_dict(), config['save_path'])  # Save model to specified path <config['save_path']>
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Early stopping: stop training if the model has not improved for "config['early_stop']" consecutive epochs.
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record

Validation

def dev(dv_set, model, device):
    model.eval()                                               # set model to evaluation mode
    total_loss = 0
    for x, y in dv_set:                                        # iterate through the dataloader
        x, y = x.to(device), y.to(device)                      # move data to device (cpu/cuda)
        with torch.no_grad():                                  # disable gradient calculation
            pred = model(x)                                    # forward pass (compute output)
            mse_loss = model.cal_loss(pred, y)                 # compute loss
        total_loss += mse_loss.detach().cpu().item() * len(x)  # accumulate loss
    total_loss = total_loss / len(dv_set.dataset)              # compute averaged loss

    return total_loss

Testing

def test(tt_set, model, device):
    model.eval()                               # set model to evaluation mode
    preds = []
    for x in tt_set:                           # iterate through the dataloader
        x = x.to(device)                       # move data to device (cpu/cuda)
        with torch.no_grad():                  # disable gradient calculation
            pred = model(x)                    # forward pass (compute output)
            preds.append(pred.detach().cpu())  # collect predictions
    preds = torch.cat(preds, dim=0).numpy()    # concatenate all predictions and convert to a numpy array (list -> Tensor -> ndarray)
    return preds

Setup Hyper-parameters

config contains hyper-parameters for training and the path to save your model.

device = get_device()                 # get the current available device ('cpu' or 'cuda')
os.makedirs('models', exist_ok=True)  # the trained model will be saved to ./models/; exist_ok: only create the directory if it does not already exist
target_only = False                   # TODO: Using 40 states & 2 tested_positive features

# TODO: How to tune these hyper-parameters to improve your model's performance?
config = {
    'n_epochs': 3000,                 # maximum number of epochs
    'batch_size': 270,                # mini-batch size for dataloader
    'optimizer': 'SGD',               # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                 # hyper-parameters for the optimizer (depends on which optimizer you are using)
        'lr': 0.001,                  # learning rate of SGD
        'momentum': 0.9               # momentum for SGD
    },
    'early_stop': 200,                # early stopping epochs (the number of epochs since your model's last improvement)
    'save_path': 'models/model.pth'   # your model will be saved here
}

Load data and model

tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)
Finished reading the train set of COVID19 Dataset (2430 samples found, each dim = 93)
Finished reading the dev set of COVID19 Dataset (270 samples found, each dim = 93)
Finished reading the test set of COVID19 Dataset (893 samples found, each dim = 93)
model = NeuralNet(tr_set.dataset.dim).to(device)  # Construct model and move to device

Start Training!

model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)
Saving model (epoch =    1, loss =  78.8524)
Saving model (epoch =    2, loss =  37.6170)
Saving model (epoch =    3, loss =  26.1203)
Saving model (epoch =    4, loss =  16.1862)
Saving model (epoch =    5, loss =  9.7153)
Saving model (epoch =    6, loss =  6.3701)
Saving model (epoch =    7, loss =  5.1802)
Saving model (epoch =    8, loss =  4.4255)
Saving model (epoch =    9, loss =  3.8009)
Saving model (epoch =   10, loss =  3.3691)
Saving model (epoch =   11, loss =  3.0943)
Saving model (epoch =   12, loss =  2.8176)
Saving model (epoch =   13, loss =  2.6274)
Saving model (epoch =   14, loss =  2.4542)
Saving model (epoch =   15, loss =  2.3012)
Saving model (epoch =   16, loss =  2.1766)
Saving model (epoch =   17, loss =  2.0641)
Saving model (epoch =   18, loss =  1.9399)
Saving model (epoch =   19, loss =  1.8978)
Saving model (epoch =   20, loss =  1.7950)
Saving model (epoch =   21, loss =  1.7164)
Saving model (epoch =   22, loss =  1.6455)
Saving model (epoch =   23, loss =  1.5912)
Saving model (epoch =   24, loss =  1.5599)
Saving model (epoch =   25, loss =  1.5197)
Saving model (epoch =   26, loss =  1.4698)
Saving model (epoch =   27, loss =  1.4189)
Saving model (epoch =   28, loss =  1.3992)
Saving model (epoch =   29, loss =  1.3696)
Saving model (epoch =   30, loss =  1.3442)
Saving model (epoch =   31, loss =  1.3231)
Saving model (epoch =   32, loss =  1.2834)
Saving model (epoch =   33, loss =  1.2804)
Saving model (epoch =   34, loss =  1.2471)
Saving model (epoch =   36, loss =  1.2414)
Saving model (epoch =   37, loss =  1.2138)
Saving model (epoch =   38, loss =  1.2083)
Saving model (epoch =   41, loss =  1.1591)
Saving model (epoch =   42, loss =  1.1484)
Saving model (epoch =   44, loss =  1.1209)
Saving model (epoch =   47, loss =  1.1122)
Saving model (epoch =   48, loss =  1.0937)
Saving model (epoch =   50, loss =  1.0842)
Saving model (epoch =   53, loss =  1.0655)
Saving model (epoch =   54, loss =  1.0613)
Saving model (epoch =   57, loss =  1.0524)
Saving model (epoch =   58, loss =  1.0394)
Saving model (epoch =   60, loss =  1.0267)
Saving model (epoch =   63, loss =  1.0248)
Saving model (epoch =   66, loss =  1.0099)
Saving model (epoch =   70, loss =  0.9829)
Saving model (epoch =   72, loss =  0.9817)
Saving model (epoch =   73, loss =  0.9743)
Saving model (epoch =   75, loss =  0.9671)
Saving model (epoch =   78, loss =  0.9643)
Saving model (epoch =   79, loss =  0.9597)
Saving model (epoch =   85, loss =  0.9549)
Saving model (epoch =   86, loss =  0.9535)
Saving model (epoch =   90, loss =  0.9467)
Saving model (epoch =   92, loss =  0.9432)
Saving model (epoch =   93, loss =  0.9231)
Saving model (epoch =   95, loss =  0.9127)
Saving model (epoch =  104, loss =  0.9117)
Saving model (epoch =  107, loss =  0.8994)
Saving model (epoch =  110, loss =  0.8935)
Saving model (epoch =  116, loss =  0.8882)
Saving model (epoch =  124, loss =  0.8872)
Saving model (epoch =  128, loss =  0.8724)
Saving model (epoch =  134, loss =  0.8722)
Saving model (epoch =  139, loss =  0.8677)
Saving model (epoch =  146, loss =  0.8654)
Saving model (epoch =  156, loss =  0.8642)
Saving model (epoch =  159, loss =  0.8528)
Saving model (epoch =  167, loss =  0.8494)
Saving model (epoch =  173, loss =  0.8492)
Saving model (epoch =  176, loss =  0.8461)
Saving model (epoch =  178, loss =  0.8403)
Saving model (epoch =  182, loss =  0.8375)
Saving model (epoch =  199, loss =  0.8295)
Saving model (epoch =  212, loss =  0.8273)
Saving model (epoch =  235, loss =  0.8252)
Saving model (epoch =  238, loss =  0.8233)
Saving model (epoch =  251, loss =  0.8211)
Saving model (epoch =  253, loss =  0.8205)
Saving model (epoch =  258, loss =  0.8175)
Saving model (epoch =  284, loss =  0.8143)
Saving model (epoch =  308, loss =  0.8136)
Saving model (epoch =  312, loss =  0.8075)
Saving model (epoch =  324, loss =  0.8045)
Saving model (epoch =  400, loss =  0.8040)
Saving model (epoch =  404, loss =  0.8010)
Saving model (epoch =  466, loss =  0.7998)
Saving model (epoch =  525, loss =  0.7993)
Saving model (epoch =  561, loss =  0.7945)
Saving model (epoch =  584, loss =  0.7903)
Saving model (epoch =  667, loss =  0.7896)
Saving model (epoch =  717, loss =  0.7823)
Saving model (epoch =  776, loss =  0.7812)
Saving model (epoch =  835, loss =  0.7797)
Saving model (epoch =  866, loss =  0.7771)
Saving model (epoch =  919, loss =  0.7770)
Saving model (epoch =  933, loss =  0.7748)
Saving model (epoch =  965, loss =  0.7705)
Saving model (epoch =  1027, loss =  0.7674)
Saving model (epoch =  1119, loss =  0.7647)
Saving model (epoch =  1140, loss =  0.7643)
Saving model (epoch =  1196, loss =  0.7620)
Saving model (epoch =  1234, loss =  0.7616)
Saving model (epoch =  1243, loss =  0.7582)
Finished training after 1444 epochs
plot_learning_curve(model_loss_record, title='deep model')


del model   # delete model variable
model = NeuralNet(tr_set.dataset.dim).to(device)
ckpt = torch.load(config['save_path'], map_location='cpu') # Load your best model
model.load_state_dict(ckpt)
plot_pred(dv_set, model, device) # Show prediction on the validation set


Testing

The predictions of your model on testing set will be stored at pred.csv.

def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

preds = test(tt_set, model, device)  # predict COVID-19 cases with your model
save_pred(preds, 'pred.csv')         # save prediction file to pred.csv
Saving results to pred.csv

Medium Baseline

The Simple Baseline code is marked with several TODOs; the TA says that completing them may be enough to reach the Medium Baseline. The TODOs are:

  1. TODO: Using 40 states & 2 tested_positive features (indices = 57 & 75)
  2. TODO: How to modify this model to achieve better performance?
  3. TODO: you may implement L1/L2 regularization here
  4. TODO: How to tune these hyper-parameters to improve your model’s performance?

The most important one is TODO 1; making the change the TA suggests is enough to reach the Medium Baseline:

feats = list(range(40)) + [57] + [75]

Inspecting these two columns shows their names are tested_positive and tested_positive.1, i.e. the day-1 and day-2 positive rates; what we want to predict is column 93, the day-3 tested_positive.2.
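
To double-check, you can print the column names at those indices (a quick sketch; it assumes pandas is available and that covid.train.csv is in the working directory):

import pandas as pd

df = pd.read_csv('covid.train.csv')
cols = df.columns[1:]       # drop the id column so the indices match the feats above
print(cols[57], cols[75])   # tested_positive, tested_positive.1 (day 1 and day 2)
print(cols[93])             # the target: tested_positive.2 (day 3)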

Strong Baseline

The TA gives the following hints:

  1. Feature selection (what other features are useful?)
  2. DNN architecture (layers? dimension? activation function?)
  3. Training (mini-batch? optimizer? learning rate?)
  4. L2 regularization
  5. There are some mistakes in the sample code, can you find them?

In short, optimize from three angles: 1. Data, 2. Network Structure, 3. Optimization.

For this problem the most important part is still feature selection.

On hint 1:

You can use the SelectKBest function, or analyze each feature's correlation with tested_positive.2 (method 2; a rough sketch is given after the SelectKBest code below).

import pandas as pd

data = pd.read_csv(tr_path).iloc[:, 1:]  # read_csv already consumes the header row, so only the id column needs dropping
x = data.iloc[:, 0:93]
y = data.iloc[:, 93]
# min-max normalization
x = (x - x.min()) / (x.max() - x.min())

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression

best_features = SelectKBest(score_func=f_regression, k=5)
fit = best_features.fit(x, y)
df_scores = pd.DataFrame(fit.scores_)
df_columns = pd.DataFrame(x.columns)

# Concat two dataframes for better visualization
feature_scores = pd.concat([df_columns, df_scores], axis=1)
feature_scores.columns = ['Specs', 'Score']  # Naming the dataframe columns
feature_scores.nlargest(15, 'Score')         # Print 15 best features
# feature_scores.nlargest(15, 'Score').index.values

The last (15th-ranked) feature's score is much lower than the rest, so drop it.

This gives the improved feature set:

feats = [75, 57, 42, 60, 78, 43, 61, 79, 40, 58, 76, 41, 59, 77]
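
The correlation-based alternative (method 2 above) could look roughly like this. It is only a sketch: it reuses x and y from the SelectKBest snippet, and the 0.5 threshold is my own arbitrary choice, not something from the assignment.

# Method 2 (sketch): rank features by absolute Pearson correlation with the target
corr = x.corrwith(y).abs().sort_values(ascending=False)
print(corr.head(15))

# keep the integer indices of features whose |correlation| exceeds a hand-picked threshold
selected = [x.columns.get_loc(c) for c in corr[corr > 0.5].index]
print(selected)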

On hint 2:

For this assignment a shallower network actually works better (less is more), unlike in later homeworks. The structure used:

self.net = nn.Sequential(
    nn.Linear(input_dim, 64),
    nn.BatchNorm1d(64),   # batch normalization to speed up training
    nn.LeakyReLU(),       # switch the activation function
    nn.Dropout(p=0.35),   # dropout to reduce overfitting; note: place it after BN, not before
    nn.Linear(64, 1)
)

On hint 3:

config = {
    'n_epochs': 10000,     # with early stopping, a large value does no harm
    'batch_size': 200,     # slightly tuned batch size
    'optimizer': 'Adam',   # use the Adam optimizer
    'optim_hparas': {      # use the default hyper-parameters
        'lr': 0.001,
        #'momentum': 0.9,
        #'weight_decay': 5e-4,
    },
    'early_stop': 500,     # since the final training uses all the data, a larger value matters little
    'save_path': 'models/model.pth'
}

On hint 4:

The TA suggests L2 regularization, but in practice the improvement is small.

Loss function with L1 regularization

Loss function with L2 regularization
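
Written out (standard forms, where $L_{\text{MSE}}$ is the original MSE loss, $w_i$ are the model weights and $\lambda$ is the regularization strength):

$L_{\text{L1}} = L_{\text{MSE}} + \lambda \sum_i \lvert w_i \rvert$

$L_{\text{L2}} = L_{\text{MSE}} + \lambda \sum_i w_i^2$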

def cal_loss(self, pred, target):
    ''' Calculate loss '''
    # TODO: you may implement L1/L2 regularization here
    regularization_lambda = 0.00075
    regularization_loss = 0
    for param in self.parameters():  # use self.parameters() rather than the global model
        # regularization_loss += torch.sum(abs(param))  # L1 regularization loss
        regularization_loss += torch.sum(param ** 2)    # L2 regularization loss
    return self.criterion(pred, target) + regularization_lambda * regularization_loss

On hint 5:

One of the mistakes: the evaluation metric on Kaggle is RMSE, while cal_loss returns plain MSE. Change the return value to

return torch.sqrt(self.criterion(pred, target))

In the end this passed the public strong baseline; the private score missed it by 0.00x.

Homework 2: TIMIT Framewise Phoneme Classification

Phoneme Classification - Simple Baseline

First, note that the evaluation metric is accuracy, so higher is better.

Just getting the TA's sample code to run passes the simple baseline on both the public and private leaderboards.

P.S. A single run is fairly slow: around 15 minutes on Colab even with a GPU.

CODE:

Phoneme Classification - Strong Baseline

The TA gives some hints in the slides:

  1. Model architecture (layers? dimension? activation function?)
  2. Training (batch size? optimizer? learning rate? epoch?)
  3. Tips (batch norm? dropout? regularization?)

It again comes down to Data, Structure, and Optimization.
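
For the architecture hint, a deeper classifier with batch norm and dropout along the lines below worked reasonably well for me. This is only a sketch: the layer widths and dropout rate are illustrative choices, while 429 and 39 are the input/output dimensions used by the HW2 sample code (11 concatenated frames of 39 MFCC features, 39 phoneme classes).

import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(429, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(1024, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(512, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(128, 39)   # 39 phoneme classes
        )

    def forward(self, x):
        return self.net(x)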

1 - The dataset is on the order of a million frames, so VAL_RATIO is set to 5%:

VAL_RATIO = 0.05

2 - Use RAdam for optimization:

from torch.optim import RAdam

# fix random seed for reproducibility
same_seeds(0)

# get device
device = get_device()
print(f'DEVICE: {device}')

# training parameters
num_epoch = 100 # number of training epoch
learning_rate = 0.0001 # learning rate

# the path where checkpoint saved
model_path = './model.ckpt'

# create model, define a loss function, and optimizer
model = Classifier().to(device)
criterion = nn.CrossEntropyLoss() # auto append soft-max to network
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
optimizer = RAdam(model.parameters(), lr=learning_rate)

Hessian Matrix

Homework 3: Convolutional Neural Network - Image Classification (CNN)

Simple Baseline

Running the code as-is passes the simple baseline on both leaderboards (a good random seed needs to be fixed); the annotated original code is below:

TA’s Slide: Build a convolutional neural network using labeled images with provided codes.

In other words, simply using the provided CNN is enough to pass the simple baseline.

Medium Baseline

TA’s Slide: Improve the performance using labeled images with different model architectures or data augmentations.

The TA suggests two directions: different model architectures or data augmentations.

Still within the same scope: Data, Structure, Optimization.

Here we try data augmentation.

(It seems the model-architecture route would mean something like a ResNet?)

Augmentations used: horizontal flip, rotation, AutoAugment.

from torchvision.transforms.transforms import RandomHorizontalFlip
train_tfm1 = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

train_tfm2 = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ToTensor(),
])

train_tfm3 = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

train_tfm4 = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.AutoAugment(),
    transforms.ToTensor(),
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

Then concatenate the original data with the augmented data, which gives enough training data.

train_set1 = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm1)
train_set2 = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm2)
train_set3 = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm3)
train_set4 = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm4)
...
train_set = ConcatDataset([train_set1, train_set2, train_set3, train_set4])
...

Then tune the learning rate and n_epochs over and over...

Result: passed the medium baseline on both leaderboards.

Note 1: there is also the trick of normalizing with ImageNet's mean and std, but I did not use it well; adding it did not improve my results.

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Note 2: for examples of the various transform methods, see the PyTorch tutorial ILLUSTRATION OF TRANSFORMS.

Strong Baseline

TA’s Slide: Improve the performance with additional unlabeled images.

The TA turned most of the labeled images in this homework's dataset into unlabeled ones, precisely so that students practice semi-supervised learning.

So let's try semi-supervised learning.

First, build a PseudoDataset:

class PseudoDataset(Dataset):
    def __init__(self, dataset, indices, labels=[]):
        self.dataset = dataset
        self.indices = indices
        self.targets = labels

    def __getitem__(self, idx):
        subset = self.dataset[self.indices[idx]]
        imgs, _ = subset
        if len(self.targets) > 0:
            return imgs, self.targets[idx]
        else:
            return subset

    def __len__(self):
        return len(self.indices)

Then modify get_pseudo_labels:

def get_pseudo_labels(dataset, model, threshold=0.65):
    # This function generates pseudo-labels for a dataset using the given model.
    # It returns a dataset (here a PseudoDataset) containing the images whose prediction confidence exceeds a given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Construct a data loader.
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)

    # Iterate over the dataset by batches.
    pseudo_probs = []
    pseudo_labels = []

    for batch in tqdm(data_loader):
        img, _ = batch

        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))  # model's output

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)
        probs_max, preds = probs.max(1)     # max probabilities, predicted labels

        # ---------- TODO ----------
        # Filter the data and construct a new dataset.
        pseudo_probs.extend(probs_max.cpu().numpy().tolist())
        pseudo_labels.extend(preds.cpu().numpy().tolist())

    pseudo_indices = [i for i, v in enumerate(pseudo_probs) if v >= threshold]
    pseudo_set = PseudoDataset(dataset, pseudo_indices, [pseudo_labels[i] for i in pseudo_indices])

    print(f"Pseudo images above Confidence {threshold:.2f}: {len(pseudo_indices)}")

    model.train()
    return pseudo_set

Add semi-supervised learning to the training loop.

Two things to note:

  1. Start semi-supervised learning only after the best validation accuracy reaches a certain level; otherwise the model's labels for the unlabeled data are unreliable.
  2. To keep semi-supervised learning from hurting efficiency, re-run the pseudo-labeling only once every semi_turns epochs.
for epoch in range(n_epochs):
    # ---------- TODO ----------
    # In each epoch, relabel the unlabeled dataset for semi-supervised learning.
    # Then you can combine the labeled dataset and pseudo-labeled dataset for the training.
    if do_semi and best_acc > 0.7 and epoch % semi_turns == 0:
        # Obtain pseudo-labels for unlabeled data using trained model.
        pseudo_set = get_pseudo_labels(unlabeled_set, model, threshold=threshold)

        # Construct a new dataset and a data loader for training.
        # This is used in semi-supervised learning only.
        concat_dataset = ConcatDataset([train_set, pseudo_set])
        # biased_sampler = BiasedSampler(concat_dataset, batch_size=batch_size, minor_ratio=0.9)
        # train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, sampler=biased_sampler, num_workers=2, pin_memory=True)
        train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)

    # ---------- Training ----------
    model.train()

    train_loss = []
    train_accs = []

    for batch in tqdm(train_loader):
        ...

    # ---------- Validation ----------
    model.eval()

    valid_loss = []
    valid_accs = []

    for batch in tqdm(valid_loader):
        ...

    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    if valid_acc > best_acc:
        best_acc = valid_acc
        ...
    ...

Finally I also tweaked the transforms and the CNN architecture, but the results were still unsatisfying.

One remaining idea that seems workable is adding a Sampler: the dataloader over the newly concatenated dataset should sample mostly from the labeled dataset and only occasionally from the pseudo-labeled one; a sketch follows.
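
One way to realize this is PyTorch's built-in WeightedRandomSampler instead of the commented-out BiasedSampler; the sketch below gives each labeled sample three times the sampling weight of a pseudo-labeled one (the 3:1 ratio is an arbitrary choice I have not tuned).

from torch.utils.data import WeightedRandomSampler

# labeled samples come first in the ConcatDataset, followed by the pseudo-labeled ones
weights = [3.0] * len(train_set) + [1.0] * len(pseudo_set)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

# note: a custom sampler is mutually exclusive with shuffle=True
train_loader = DataLoader(concat_dataset, batch_size=batch_size, sampler=sampler,
                          num_workers=8, pin_memory=True)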

But I found that pushing further was just spinning my wheels, so I'll come back and try again after finishing the later lectures; this is where it stops for now.

Small knowledge points from the course

How to reduce loss

data Augmentation

cross validation

Avoid an excessive gap between the public testing set and the private testing set.

k-fold cross validation


overfitting


An extreme example: the model is 100% accurate on the training set (loss = 0) but nearly 0% accurate on the test set (very large loss).

Reason: the model is too flexible.

You can constrain the model by adjusting the number of layers, or add more training data (not enough data? data augmentation can help).

But don't over-constrain the model.

mismatch


e.g.

Hung-yi Lee: Thanks, everyone, for putting in so much effort last Friday clicking on this video just to make the model's prediction wrong; that day (2021-02-26) became the most-viewed day of this year!

(Hahaha, this cracked me up.)

flat minima/sharp minima

Good minima vs. bad minima.

vanilla GD/GD with momentum

I feel like I learned this in optimization theory, but I can't quite remember it.

It's a lot like the conjugate gradient method!
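
The update rule from the lecture, with $g^{t}$ the gradient at step $t$, $\lambda$ the momentum coefficient and $\eta$ the learning rate (starting from $m^{0} = 0$):

$m^{t} = \lambda m^{t-1} - \eta g^{t-1}$

$\theta^{t} = \theta^{t-1} + m^{t}$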


Adaptive Learning Rate (AdaGrad)

Define

root mean square (RMS)
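
Roughly, the AdaGrad update from the lecture: each parameter $\theta_i$ gets its own denominator, the root mean square of all its past gradients $g_i^{0}, \dots, g_i^{t}$:

$\theta_i^{t+1} = \theta_i^{t} - \dfrac{\eta}{\sigma_i^{t}}\, g_i^{t}, \qquad \sigma_i^{t} = \sqrt{\dfrac{1}{t+1} \sum_{k=0}^{t} \left(g_i^{k}\right)^2}$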

Scheduling

Decay


warmup


learning rate adapts dynamically (RMSProp)

RMSProp: an RMS that weights recent gradients more heavily than older ones.
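
The RMSProp update replaces AdaGrad's all-history average with an exponentially weighted one, controlled by a hyper-parameter $0 < \alpha < 1$:

$\theta_i^{t+1} = \theta_i^{t} - \dfrac{\eta}{\sigma_i^{t}}\, g_i^{t}, \qquad \sigma_i^{t} = \sqrt{\alpha \left(\sigma_i^{t-1}\right)^2 + (1-\alpha)\left(g_i^{t}\right)^2}$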

RMSProp + Momentum (Adam)

Cross-entropy

The choice of loss function changes the difficulty of the optimization!
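
For classification, with one-hot target $\hat{y}$ and softmax output $y'$ (notation as in the glossary below), the cross-entropy is

$e = -\sum_i \hat{y}_i \ln y_i'$

and minimizing it is equivalent to maximizing likelihood; compared with MSE it gives a much friendlier error surface for gradient descent.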

Batch Normalization (BN)

notation

$\hat{y}$: y hat, the label/target

$e^x$: exponential of x

$\sum$: summation

$y'$: y prime, the network output

logit: the input to softmax

$\tilde{x}$: x tilde, the normalized x

element-wise product: multiplying corresponding elements

inference: i.e. testing
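
For reference, the normalization step itself (standard form; $\mu$ and $\sigma$ are the per-dimension mean and standard deviation computed over the batch):

$\tilde{x}^{i} = \dfrac{x^{i} - \mu}{\sigma}$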