Implementing an Artificial Neural Network (ANN)

This post tackles a multiclass classification problem: classifying images of handwritten digits from the MNIST data. The data is available here (https://www.kaggle.com/c/digit-recognizer).

 

We will implement an artificial neural network (ANN) with PyTorch. The implementation proceeds as follows.

 

1. Loading and inspecting the data

2. Preprocessing the data

3. Defining the model

4. Training and validating the model

 

1. Loading and inspecting the data

 

In:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt

 

▷ We use pandas to load the data, numpy to preprocess it, torch to implement the network, and matplotlib for visualization.

 

In:

df_total = pd.read_csv('../input/train.csv', dtype = np.float32)

 

▷ Load the data with pandas' read_csv(). Passing np.float32 as the dtype argument stores every column in 32 bits instead of pandas' default 64-bit types, which roughly halves memory usage.
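
The memory saving can be checked on a small scale. The sketch below is only an illustration: the in-memory CSV (csv_text) is a hypothetical stand-in for train.csv with four pixel columns instead of 784, and the variable names are not from the original post.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: a label column plus 4 pixel columns.
csv_text = "label,pixel0,pixel1,pixel2,pixel3\n" + "\n".join(
    f"{i % 10},0,128,255,64" for i in range(1000)
)

# Without dtype, pandas parses the integer columns as int64.
df_default = pd.read_csv(io.StringIO(csv_text))
# With dtype=np.float32, every column is stored in 32 bits.
df_float32 = pd.read_csv(io.StringIO(csv_text), dtype=np.float32)

bytes_default = df_default.memory_usage(index=False).sum()
bytes_float32 = df_float32.memory_usage(index=False).sum()
print(bytes_default, bytes_float32)
```

For this toy table the float32 frame takes half the bytes of the default one; the same ratio should hold for the full 42,000 × 785 table.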

 

In:

print(df_total.columns)
print(df_total.index)

 

Out:

Index(['label', 'pixel0', 'pixel1', 'pixel2', 'pixel3', 'pixel4', 'pixel5',
       'pixel6', 'pixel7', 'pixel8',
       ...
       'pixel774', 'pixel775', 'pixel776', 'pixel777', 'pixel778', 'pixel779',
       'pixel780', 'pixel781', 'pixel782', 'pixel783'],
      dtype='object', length=785)
RangeIndex(start=0, stop=42000, step=1)

 

▷ The data has 785 columns: one label column and 784 (28×28) pixel columns, each holding the value of the pixel at one position in the image. The data has 42,000 rows.

 

2. Preprocessing the data

 

In:

arr_feature = df_total.loc[:, df_total.columns != 'label'].values/255
arr_target = df_total.label.values

arr_feature_train, arr_feature_test, arr_target_train, arr_target_test = train_test_split(arr_feature, 
                                                                                          arr_target, 
                                                                                          test_size = 0.2)

 

▷ The feature and label parts of df_total are separated into arr_feature and arr_target. arr_feature is normalized by dividing the pixel values by 255, which scales them into the range [0, 1].

 

▷ train_test_split() splits the data 8:2 into a training set (arr_feature_train, arr_target_train) and a test set (arr_feature_test, arr_target_test).
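
The normalization and the 8:2 split can be tried on synthetic data. This is a minimal sketch, not the post's own code: the random pixels and labels are made up, and the random_state and stratify arguments are additions that the original call does not use (they make the split reproducible and keep the digit proportions equal in both sets).

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 100 "images" of 784 pixels with values 0..255.
pixels = rng.integers(0, 256, size=(100, 784)).astype(np.float32)
labels = np.repeat(np.arange(10), 10).astype(np.float32)

features = pixels / 255  # scale pixel values into [0, 1]

x_train, x_test, y_train, y_test = train_test_split(
    features,
    labels,
    test_size=0.2,
    random_state=42,  # assumption: fixed seed for a reproducible split
    stratify=labels,  # assumption: same class proportions in both sets
)
print(x_train.shape, x_test.shape)
```

With 100 samples, this yields 80 training rows and 20 test rows.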

 

In:

ts_feature_train = torch.from_numpy(arr_feature_train)
ts_target_train = torch.from_numpy(arr_target_train).type(torch.LongTensor)

ts_feature_test = torch.from_numpy(arr_feature_test)
ts_target_test = torch.from_numpy(arr_target_test).type(torch.LongTensor)

print(type(ts_feature_train))
print(type(ts_target_train))
print(type(ts_feature_test))
print(type(ts_target_test))

 

Out:

<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>

 

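▷ Convert each data set from a NumPy array to a tensor. The labels are additionally cast to torch.LongTensor, because the loss function used later expects integer class indices.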
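
A small sketch of the conversion, using made-up arrays (arr_x and arr_y are hypothetical names). It also shows a detail that is easy to miss: torch.from_numpy() shares memory with the source array, so an in-place change to the array is visible in the tensor.

```python
import numpy as np
import torch

arr_x = np.array([[0.0, 0.5], [1.0, 0.25]], dtype=np.float32)
arr_y = np.array([3.0, 7.0], dtype=np.float32)  # labels read as float32, as in the post

ts_x = torch.from_numpy(arr_x)                         # keeps dtype float32
ts_y = torch.from_numpy(arr_y).type(torch.LongTensor)  # cast to int64 class indices
print(ts_x.dtype, ts_y.dtype)

# from_numpy() does not copy: the tensor and the array share the same buffer.
arr_x[0, 0] = 9.0
shared_value = ts_x[0, 0].item()
print(shared_value)
```

The cast with .type() creates a new tensor, so ts_y no longer shares memory with arr_y.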

 

In:

ds_train = torch.utils.data.TensorDataset(ts_feature_train, ts_target_train)
ds_test = torch.utils.data.TensorDataset(ts_feature_test, ts_target_test)

 

▷ Combine the feature and label tensors of each data set into a tensor dataset (TensorDataset).

 

In:

batch_size = 256

ldr_train = torch.utils.data.DataLoader(ds_train, 
                                        batch_size = batch_size, 
                                        shuffle = True)
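# Evaluate on the held-out ds_test; with ds_train here, the reported
# test metrics would be measured on training data
ldr_test = torch.utils.data.DataLoader(ds_test, 
                                       batch_size = batch_size, 
                                       shuffle = False)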

 

batch_size: batch size for the training and test sets

 

▷ Wrap the training and test sets in data loaders (DataLoader). The batch_size and shuffle arguments control how the batches are composed.
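
To see what a tensor dataset and a data loader actually return, the following sketch uses random tensors in place of the MNIST data (the sizes and names are illustrative assumptions).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.rand(1000, 784)
labels = torch.randint(0, 10, (1000,))

dataset = TensorDataset(features, labels)
x0, y0 = dataset[0]  # indexing returns one (feature, label) pair

loader = DataLoader(dataset, batch_size=256, shuffle=True)
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(len(loader), batch_shapes)
```

With 1,000 samples and a batch size of 256, the loader yields four batches, and the last one holds the remaining 232 samples. len(loader) is therefore the number of batches, which is what the training loop later divides the accumulated loss by.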

 

In:

def get_image(data, idx):
    plt.imshow(data[idx].reshape(28, 28))
    plt.axis('off')
    plt.show()

# Show the first training image of each digit 0..9
for i in range(10):
    get_image(arr_feature_train, np.where(arr_target_train == i)[0][0])

 

Out:

 

▷ The output shows the first image of each digit in the training set. np.where() is used to find the index of the first occurrence of each digit.

 

▷ reshape() turns the 784 pixels of an image into a 28×28 array.
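
Both calls are easy to verify on a toy example. The arrays below are made up for illustration and are not taken from the data set.

```python
import numpy as np

targets = np.array([2, 0, 1, 0, 2, 1])

# np.where() returns a tuple of index arrays; [0][0] is the first matching index.
first_index = {d: int(np.where(targets == d)[0][0]) for d in range(3)}
print(first_index)

# A flat vector of 784 values becomes a 28x28 grid, filled row by row.
flat = np.arange(784, dtype=np.float32)
image = flat.reshape(28, 28)
print(image.shape, image[1, 0])
```

Because reshape() fills rows first, element 28 of the flat vector lands at row 1, column 0 of the image.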

 

3. Defining the model

 

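The architecture of the model is as follows: an input of 784 features passes through fully connected layers of 512, 256, 128, and 64 units to a 10-class output.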

 

 

In:

class ANN(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.fc_1 = nn.Linear(28*28, 512)
        self.fc_2 = nn.Linear(512, 256)
        self.fc_3 = nn.Linear(256, 128)
        self.fc_4 = nn.Linear(128, 64)
        self.fc_5 = nn.Linear(64, 10)
        
        self.dropout = nn.Dropout(p = 0.2)
        self.log_softmax = F.log_softmax
        
    def forward(self, x):
        x = self.dropout(F.relu(self.fc_1(x)))
        x = self.dropout(F.relu(self.fc_2(x)))
        x = self.dropout(F.relu(self.fc_3(x)))
        x = self.dropout(F.relu(self.fc_4(x)))
        
        x = self.log_softmax(self.fc_5(x), dim = 1)
        
        return x

 

▷ nn.Linear() builds the fully connected layers according to the architecture above.
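
As a rough sanity check on the layer sizes, the sketch below rebuilds the same chain of nn.Linear layers outside the ANN class and counts the trainable parameters (the sizes list and variable names are illustrative).

```python
import torch
import torch.nn as nn

sizes = [28 * 28, 512, 256, 128, 64, 10]
layers = [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Each nn.Linear holds a weight matrix (n_out x n_in) and a bias vector (n_out).
n_params = sum(p.numel() for layer in layers for p in layer.parameters())
print(n_params)

# A batch of 32 flattened images maps from 784 to 512 features in the first layer.
out = layers[0](torch.zeros(32, 784))
print(out.shape)
```

The five layers add up to 575,050 parameters, most of them (401,920) in the first layer.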

 

▷ nn.Dropout() randomly drops nodes during forward propagation. The drop probability is set to 0.2. Dropout is only active in training mode.
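
The effect of dropout can be observed directly on a tensor of ones. This is a small sketch with an arbitrary seed and input size, not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1, 10000)

# Training mode: about 20% of the values are zeroed,
# and the survivors are scaled by 1 / (1 - p) = 1.25.
dropout.train()
y_train = dropout(x)
zero_fraction = (y_train == 0).float().mean().item()
kept_value = y_train.max().item()

# Evaluation mode: dropout is a no-op.
dropout.eval()
y_eval = dropout(x)
print(zero_fraction, kept_value, torch.equal(y_eval, x))
```

The rescaling in training mode keeps the expected activation unchanged, which is why no correction is needed when dropout is switched off for evaluation.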

 

▷ F.log_softmax() computes the log softmax at the last layer. This prepares the output for computing the cross-entropy loss.

 

▷ F.relu() applies the ReLU function as the activation function. ReLU is applied to every layer except the last.

 

4. Training and validating the model

 

In:

model = ANN()
loss_fun = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr = 0.0001)

 

▷ Defining loss_fun as nn.NLLLoss() and applying it to the log softmax output above yields the cross-entropy loss.
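
That log softmax followed by nn.NLLLoss() equals the cross-entropy loss can be confirmed numerically. The logits and targets below are random illustrative values; nn.CrossEntropyLoss is used here only as a reference for the comparison.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw outputs of a last layer, 4 samples
target = torch.tensor([1, 0, 9, 3])   # true class indices

# Route used in this post: log softmax, then negative log-likelihood.
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

# Reference: cross-entropy computed directly from the raw logits.
loss_ce = nn.CrossEntropyLoss()(logits, target)

# By hand: mean of the negative log-probability of each true class.
loss_manual = -F.log_softmax(logits, dim=1)[torch.arange(4), target].mean()
print(loss_nll.item(), loss_ce.item(), loss_manual.item())
```

All three values agree up to floating-point error, so keeping log softmax inside the model and pairing it with nn.NLLLoss() is one valid way to obtain the cross-entropy loss.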

 

▷ The model is optimized with optim.Adam(). The learning rate is set to 0.0001.

 

In:

# (1) Set training parameters and variables
epoch = 25
train_loss, test_loss = [], []

for e in range(epoch):

    # (2) Train the model
    running_loss_train = 0
    
    for image, label in ldr_train:
        optimizer.zero_grad()
        
        log_pred = model(image)
        loss = loss_fun(log_pred, label)
        
        loss.backward()
        optimizer.step()
        
        running_loss_train += loss.item()
    
    # (3) Validate the model
    running_loss_test = 0
    accuracy = 0
    
    with torch.no_grad():
        model.eval()
        
        for image, label in ldr_test:
            log_pred = model(image)
            # .item() accumulates a plain float instead of a tensor
            running_loss_test += loss_fun(log_pred, label).item()
            pred = torch.exp(log_pred)
            top_prob, top_class = pred.topk(1, dim = 1)
            equal = (top_class == label.view(*top_class.shape))
            accuracy += torch.mean(equal.type(torch.FloatTensor)).item()
            
    model.train()
    
    train_loss.append(running_loss_train/len(ldr_train))
    test_loss.append(running_loss_test/len(ldr_test))
    
    print("Epoch: {}/{}.. ".format(e + 1, epoch),
          "Training Loss: {:.3f}.. ".format(train_loss[-1]),
          "Test Loss: {:.3f}.. ".format(test_loss[-1]),
          "Test Accuracy: {:.3f}".format(accuracy/len(ldr_test)))

 

(1) Setting training parameters and variables

 

epoch: number of passes over the full training data (each pass consists of forward and backward propagation)

train_loss, test_loss: record the loss on the training set and the test set after each epoch

 

(2) Training the model

 

running_loss_train: accumulates the training-set loss over the batches of one epoch

 

▷ optimizer.zero_grad() resets the gradients of the parameters managed by the optimizer. This is necessary because PyTorch accumulates gradients across backward passes rather than overwriting them. The gradients are therefore reset once per batch, so that each update uses only the gradients of the current batch.
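
The accumulation is easy to reproduce with a single parameter. The sketch below uses a toy loss (2 * w) and an SGD optimizer purely for illustration; the post itself uses Adam.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# The gradient of (2 * w) with respect to w is 2.
(2 * w).sum().backward()
grad_first = w.grad.item()

# A second backward pass without resetting adds to the stored gradient.
(2 * w).sum().backward()
grad_accumulated = w.grad.item()

# After zero_grad(), the next backward pass starts from a clean slate.
optimizer.zero_grad()
(2 * w).sum().backward()
grad_after_reset = w.grad.item()

print(grad_first, grad_accumulated, grad_after_reset)
```

Without the reset, the second batch would be updated with the sum of both gradients (4 instead of 2 here).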

 

▷ Calling model() performs forward propagation and returns the log softmax value of each class. Passing log_pred and label to loss_fun() gives the cross-entropy loss.

 

▷ model.eval() switches the model to evaluation mode, which disables dropout during validation.

 

▷ loss.backward() performs backpropagation, and optimizer.step() updates the model's parameters.

 

▷ Adding loss.item() of every training batch to running_loss_train gives the total loss for the epoch, which is later divided by the number of batches to obtain the mean loss per batch.

 

▷ model.train() switches the model back to training mode, which re-enables dropout for the next epoch.

 

(3) Validating the model

 

running_loss_test: accumulates the test-set loss over the batches of one epoch

accuracy: accumulates the model's accuracy, obtained by comparing the model's predictions with the true labels

 

▷ torch.no_grad() disables gradient tracking: operations inside the with block do not build a computation graph, and their results have requires_grad set to False. It does not erase gradients already stored in tensors. Since validation needs no gradients, the whole validation loop is wrapped in with torch.no_grad(), which saves memory and speeds up computation.
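
What torch.no_grad() does and does not do can be checked on a single tensor. The values below are arbitrary and serve only as an illustration.

```python
import torch

w = torch.tensor([3.0], requires_grad=True)
(w * w).sum().backward()
grad_before = w.grad.item()  # d(w^2)/dw = 2w = 6

with torch.no_grad():
    y = w * 2  # computed without building a computation graph
tracked_inside = y.requires_grad

# The gradient stored earlier is untouched by the no_grad block.
grad_after = w.grad.item()

# Outside the block, tracking is active again.
z = w * 2
tracked_outside = z.requires_grad

print(grad_before, tracked_inside, grad_after, tracked_outside)
```

The stored gradient survives the block; only the operations performed inside it go untracked.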

 

▷ The trained model makes its predictions through model(), and passing log_pred and label to loss_fun() gives the cross-entropy loss.

 

▷ Adding the loss value of every test batch to running_loss_test gives the total loss for the epoch.

 

▷ torch.exp() converts log_pred into probabilities, and topk() picks the class with the highest probability as the prediction.

 

▷ label.view() reshapes label to the same shape as top_class. The comparison top_class == label then gives 1 where the prediction matches the true label and 0 otherwise, and the mean of these values is the accuracy.
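
The accuracy computation can be traced on four hand-made predictions. The probabilities and labels below are invented for illustration. The sketch also shows why the view() call matters: comparing a (4, 1) tensor with a (4,) tensor broadcasts to a 4×4 matrix instead of an element-wise comparison.

```python
import torch

log_pred = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                   [0.1, 0.8, 0.1],
                                   [0.3, 0.3, 0.4],
                                   [0.6, 0.3, 0.1]]))
label = torch.tensor([0, 1, 0, 0])

pred = torch.exp(log_pred)
top_prob, top_class = pred.topk(1, dim=1)          # top_class has shape (4, 1)
equal = top_class == label.view(*top_class.shape)  # label reshaped from (4,) to (4, 1)
accuracy = equal.float().mean().item()

# Without view(), broadcasting compares every prediction with every label.
broadcast_shape = tuple((top_class == label).shape)

# An equivalent shortcut: argmax of the log-probabilities, no exp() needed.
accuracy_argmax = (log_pred.argmax(dim=1) == label).float().mean().item()

print(accuracy, broadcast_shape, accuracy_argmax)
```

Three of the four predictions match, so both routes give an accuracy of 0.75. Since exp() is monotonic, taking the argmax of log_pred directly selects the same class.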

 

Out:

Epoch: 1/25..  Training Loss: 1.829..  Test Loss: 0.833..  Test Accuracy: 0.775
Epoch: 2/25..  Training Loss: 0.755..  Test Loss: 0.473..  Test Accuracy: 0.864
Epoch: 3/25..  Training Loss: 0.535..  Test Loss: 0.365..  Test Accuracy: 0.894
Epoch: 4/25..  Training Loss: 0.438..  Test Loss: 0.310..  Test Accuracy: 0.910
Epoch: 5/25..  Training Loss: 0.376..  Test Loss: 0.270..  Test Accuracy: 0.922
Epoch: 6/25..  Training Loss: 0.329..  Test Loss: 0.238..  Test Accuracy: 0.930
Epoch: 7/25..  Training Loss: 0.293..  Test Loss: 0.210..  Test Accuracy: 0.937
Epoch: 8/25..  Training Loss: 0.265..  Test Loss: 0.193..  Test Accuracy: 0.942
Epoch: 9/25..  Training Loss: 0.242..  Test Loss: 0.171..  Test Accuracy: 0.949
Epoch: 10/25..  Training Loss: 0.225..  Test Loss: 0.157..  Test Accuracy: 0.954
Epoch: 11/25..  Training Loss: 0.209..  Test Loss: 0.143..  Test Accuracy: 0.958
Epoch: 12/25..  Training Loss: 0.191..  Test Loss: 0.131..  Test Accuracy: 0.961
Epoch: 13/25..  Training Loss: 0.177..  Test Loss: 0.121..  Test Accuracy: 0.965
Epoch: 14/25..  Training Loss: 0.164..  Test Loss: 0.114..  Test Accuracy: 0.967
Epoch: 15/25..  Training Loss: 0.158..  Test Loss: 0.104..  Test Accuracy: 0.970
Epoch: 16/25..  Training Loss: 0.148..  Test Loss: 0.098..  Test Accuracy: 0.972
Epoch: 17/25..  Training Loss: 0.138..  Test Loss: 0.093..  Test Accuracy: 0.973
Epoch: 18/25..  Training Loss: 0.130..  Test Loss: 0.090..  Test Accuracy: 0.974
Epoch: 19/25..  Training Loss: 0.122..  Test Loss: 0.078..  Test Accuracy: 0.978
Epoch: 20/25..  Training Loss: 0.114..  Test Loss: 0.075..  Test Accuracy: 0.979
Epoch: 21/25..  Training Loss: 0.110..  Test Loss: 0.069..  Test Accuracy: 0.980
Epoch: 22/25..  Training Loss: 0.105..  Test Loss: 0.065..  Test Accuracy: 0.982
Epoch: 23/25..  Training Loss: 0.098..  Test Loss: 0.060..  Test Accuracy: 0.983
Epoch: 24/25..  Training Loss: 0.096..  Test Loss: 0.056..  Test Accuracy: 0.984
Epoch: 25/25..  Training Loss: 0.089..  Test Loss: 0.053..  Test Accuracy: 0.985

 

In:

plt.plot(train_loss, label = 'Training loss')
plt.plot(test_loss, label = 'Test loss')
plt.xlabel('Epoch')
plt.ylabel('Cross Entropy Loss')
plt.legend(frameon = True)

 

Out:

 

▷ As training proceeds, the loss on both the training set and the test set decreases.

 

▷ In the last epoch, the logged test accuracy reached 98.5%, which is quite satisfying. Of course, measuring the model's true performance requires evaluating it on unseen data, that is, data that played no part in training. That step is omitted here.

 


Reference:

"MNIST: Introduction to ComputerVision with PyTorch," Abhinand, https://www.kaggle.com/abhinand05/mnist-introduction-to-computervision-with-pytorch.