๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

Deep Learning/Model

์ˆœํ™˜ ์‹ ๊ฒฝ๋ง(Recurrent Neural Network) ๊ตฌํ˜„

ํŒŒ์ดํ† ์น˜๋ฅผ ์ด์šฉํ•˜์—ฌ ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง(Recurrent Neural Network)๋ฅผ ๊ตฌํ˜„ํ•  ๊ฒƒ์ด๋‹ค. ๊ตฌํ˜„ ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

1. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

2. ๋ชจ๋ธ ์„ค์ •

3. ๋ชจ๋ธ ํ•™์Šต

4. ํ•™์Šต ๊ฒฐ๊ณผ

 

1. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

 

In:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

string = "To climb steep hills requires a slow pace at first."

chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ?!.,:;'01"
char_list = [i for i in chars]

n_letter = len(chars)

def string_to_onehot(string):
    # Start and end tokens occupy the last two slots of char_list.
    start = np.zeros(shape = n_letter, dtype = int)
    end = np.zeros(shape = n_letter, dtype = int)
    
    start[-2] = 1
    end[-1] = 1
    
    for i in string:
        idx = char_list.index(i)
        
        # One-hot vector for the current character.
        zero = np.zeros(shape = n_letter, dtype = int)
        zero[idx] = 1
        
        start = np.vstack([start, zero])
        
    # Append the end token as the final row.
    output = np.vstack([start, end])
    
    return output

def onehot_to_word(onehot):
    # Map a one-hot (or score) vector back to the character with the largest value.
    onehot_ = onehot.numpy()
    
    return char_list[onehot_.argmax()]

 

โ–ท ํ•™์Šต์— ์ด์šฉ๋  ๋ฌธ์žฅ์€ "To climb steep hills requires a slow pace at first."๋กœ RNN์„ ํ†ตํ•ด ์ด๋ฅผ ํ•™์Šตํ•œ ๋’ค, ์˜ˆ์ธกํ•  ๊ฒƒ์ด๋‹ค.

 

โ–ท ๋ชจ๋ธ์ด ๋ฌธ์žฅ์„ ์ธ์‹ํ•  ์ˆ˜ ์žˆ๋„๋ก ์›ํ•ซ ์ธ์ฝ”๋”ฉ์„ ํ•˜์˜€๋‹ค. char_list๋Š” ๋ฌธ์žฅ์—์„œ ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๋Š” ๋ชจ๋“  ๊ธ€์ž์˜ ๊ฒฝ์šฐ๋กœ, ๊ฐ ๊ธ€์ž์˜ ํ•ด๋‹น ์œ„์น˜์— 1 ๋˜๋Š” 0์œผ๋กœ ๋ฐ”๊ฟ€ ์ˆ˜ ์žˆ๋Š” ๊ธฐ์ค€์„ ์ œ์‹œํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๊ธ€์ž B๋Š” char_list์˜ ๋‘ ๋ฒˆ์งธ ์œ„์น˜์— ํ•ด๋‹นํ•˜๋ฏ€๋กœ ์›ํ•ซ ์ธ์ฝ”๋”ฉ์˜ ๊ฒฐ๊ณผ๋Š” [0, 1, ... , 0]๊ฐ€ ๋œ๋‹ค.

 

โ–ท string_to_onehot ํ•จ์ˆ˜๋Š” ๋ฌธ์žฅ์„ ์ž…๋ ฅ ๋ฐ›์•„์„œ ์›ํ•ซ ์ธ์ฝ”๋”ฉ๋œ ๊ฒฐ๊ณผ๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค.

 

โ–ท onehot_to_word ํ•จ์ˆ˜๋Š” ์›ํ•ซ ์ธ์ฝ”๋”ฉ๋œ ๊ฒฐ๊ณผ๋ฅผ ๋‹ค์‹œ ์›๋ž˜์˜ ๋ฌธ์žฅ์œผ๋กœ ๋งŒ๋“ค์–ด ์ถœ๋ ฅํ•œ๋‹ค.

 

2. ๋ชจ๋ธ ์„ค์ •

 

In:

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Input-to-hidden, hidden-to-hidden, and hidden-to-output weights.
        self.ih = nn.Linear(input_size, hidden_size)
        self.hh = nn.Linear(hidden_size, hidden_size)
        self.io = nn.Linear(hidden_size, output_size)
        self.act_fn = nn.Tanh()
        
    def forward(self, input, hidden):
        # h_t = tanh(W_ih x_t + W_hh h_{t-1}); y_t = W_io h_t
        hidden = self.act_fn(self.ih(input) + self.hh(hidden))
        output = self.io(hidden)
        
        return output, hidden
    
    def init_hidden(self):
        # Zero vector used as the hidden state at the first time step.
        return torch.zeros(1, self.hidden_size)
 
epochs = 1000
n_hidden = 50
learning_rate = 0.01

rnn = RNN(n_letter, n_hidden, n_letter)

loss_func = nn.MSELoss()
optimizer = optim.Adam(rnn.parameters(), lr = learning_rate)

 

โ–ท RNN์€ ์ธํ’‹, ํžˆ๋“ , ์•„์›ƒํ’‹ ๋ ˆ์ด์–ด๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. ๊ฐ ๋ ˆ์ด์–ด์—์„œ์˜ ๊ฒฐ๊ณผ๊ฐ€ ์ „๋‹ฌ๋  ๋•Œ, ์„ธ ๊ฐ€์ง€ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ์กฐ์ ˆํ•˜๊ฒŒ ๋œ๋‹ค.  ์ธํ’‹ ๋ ˆ์ด์–ด์—์„œ ํžˆ๋“  ๋ ˆ์ด์–ด๋กœ, ํžˆ๋“  ๋ ˆ์ด์–ด์—์„œ ํžˆ๋“  ๋ ˆ์ด์–ด๋กœ, ํžˆ๋“  ๋ ˆ์ด์–ด์—์„œ ์•„์›ƒํ’‹ ๋ ˆ์ด์–ด๋กœ ์ „๋‹ฌ๋  ๋•Œ์ด๋‹ค. RNN ํด๋ž˜์Šค ๋‚ด๋ถ€์˜ __init__ ํ•จ์ˆ˜๋Š” ์ด๋ฅผ ๊ตฌํ˜„ํ•ด ๋†“์€ ๊ฒฐ๊ณผ์ด๋‹ค.

 

โ–ท ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋Š” ํ•˜์ดํผ๋ณผ๋ฆญ ํƒ„์  ํŠธ๋กœ ์„ค์ •ํ•˜์˜€๋‹ค.

 

โ–ท init_hidden ํ•จ์ˆ˜๋Š” ์ดˆ๊ธฐ ํžˆ๋“  ๋ ˆ์ด์–ด ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•œ ๊ฒƒ์ด๋‹ค.

 

โ–ท ์—ํญ, ํžˆ๋“  ๋ ˆ์ด์–ด์˜ ์ˆ˜์™€ ํ•™์Šต๋ฅ ์„ epochs, n_hidden, learning_rate๋ฅผ ํ†ตํ•ด ์„ค์ •ํ•œ ๋’ค, RNN ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜์—ฌ RNN์˜ ๊ตฌ์กฐ๋ฅผ ์ƒ์„ฑํ•˜์˜€๋‹ค.

 

โ–ท ๋ชจ๋ธ์˜ ์†์‹ค ํ•จ์ˆ˜๋Š” MSE๋กœ, ์ตœ์ ํ™”์—๋Š” Adam ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ์„ค์ •ํ•˜์˜€๋‹ค.

 

3. ๋ชจ๋ธ ํ•™์Šต

 

In:

# One-hot encode the training sentence and convert it to a float tensor.
one_hot = torch.from_numpy(string_to_onehot(string)).float()

for i in range(epochs):
    rnn.zero_grad()
    total_loss = 0
    hidden = rnn.init_hidden()
    
    # Feed each character in and compare the prediction against the next character.
    for j in range(one_hot.size()[0]-1):
        input_char = one_hot[j:j+1, :]
        target = one_hot[j+1]
        
        output, hidden = rnn(input_char, hidden)
        loss = loss_func(output.view(-1), target.view(-1))
        total_loss += loss
        
    # Backpropagate the loss accumulated over all time steps.
    total_loss.backward()
    optimizer.step()
    
    if i % 10 == 0:
        print(total_loss)

 

Out:

tensor(1.6622, grad_fn=<AddBackward0>)
tensor(0.5862, grad_fn=<AddBackward0>)
tensor(0.3629, grad_fn=<AddBackward0>)
tensor(0.2178, grad_fn=<AddBackward0>)
tensor(0.1408, grad_fn=<AddBackward0>)
tensor(0.0931, grad_fn=<AddBackward0>)
tensor(0.0806, grad_fn=<AddBackward0>)
tensor(0.0560, grad_fn=<AddBackward0>)
tensor(0.0423, grad_fn=<AddBackward0>)
tensor(0.0358, grad_fn=<AddBackward0>)
tensor(0.0256, grad_fn=<AddBackward0>)
tensor(0.0201, grad_fn=<AddBackward0>)
tensor(0.0204, grad_fn=<AddBackward0>)
tensor(0.0149, grad_fn=<AddBackward0>)
tensor(0.0115, grad_fn=<AddBackward0>)
tensor(0.0093, grad_fn=<AddBackward0>)
tensor(0.0079, grad_fn=<AddBackward0>)
tensor(0.0092, grad_fn=<AddBackward0>)
tensor(0.0072, grad_fn=<AddBackward0>)
tensor(0.0058, grad_fn=<AddBackward0>)
tensor(0.0082, grad_fn=<AddBackward0>)
tensor(0.0051, grad_fn=<AddBackward0>)
tensor(0.0043, grad_fn=<AddBackward0>)
tensor(0.0035, grad_fn=<AddBackward0>)
tensor(0.0031, grad_fn=<AddBackward0>)
tensor(0.0027, grad_fn=<AddBackward0>)
tensor(0.0033, grad_fn=<AddBackward0>)
tensor(0.0053, grad_fn=<AddBackward0>)
tensor(0.0027, grad_fn=<AddBackward0>)
tensor(0.0022, grad_fn=<AddBackward0>)
tensor(0.0019, grad_fn=<AddBackward0>)
tensor(0.0017, grad_fn=<AddBackward0>)
tensor(0.0015, grad_fn=<AddBackward0>)
tensor(0.0014, grad_fn=<AddBackward0>)
tensor(0.0013, grad_fn=<AddBackward0>)
tensor(0.0016, grad_fn=<AddBackward0>)
tensor(0.0041, grad_fn=<AddBackward0>)
tensor(0.0020, grad_fn=<AddBackward0>)
tensor(0.0014, grad_fn=<AddBackward0>)
tensor(0.0011, grad_fn=<AddBackward0>)
tensor(0.0010, grad_fn=<AddBackward0>)
tensor(0.0009, grad_fn=<AddBackward0>)
tensor(0.0008, grad_fn=<AddBackward0>)
tensor(0.0008, grad_fn=<AddBackward0>)
tensor(0.0007, grad_fn=<AddBackward0>)
tensor(0.0025, grad_fn=<AddBackward0>)
tensor(0.0020, grad_fn=<AddBackward0>)
tensor(0.0012, grad_fn=<AddBackward0>)
tensor(0.0013, grad_fn=<AddBackward0>)
tensor(0.0007, grad_fn=<AddBackward0>)
tensor(0.0006, grad_fn=<AddBackward0>)
tensor(0.0005, grad_fn=<AddBackward0>)
tensor(0.0005, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0011, grad_fn=<AddBackward0>)
tensor(0.0013, grad_fn=<AddBackward0>)
tensor(0.0006, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0003, grad_fn=<AddBackward0>)
tensor(0.0003, grad_fn=<AddBackward0>)
tensor(0.0003, grad_fn=<AddBackward0>)
tensor(0.0003, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0029, grad_fn=<AddBackward0>)
tensor(0.0006, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0022, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0005, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0003, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0024, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0003, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(9.7875e-05, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(0.0025, grad_fn=<AddBackward0>)
tensor(0.0004, grad_fn=<AddBackward0>)
tensor(0.0002, grad_fn=<AddBackward0>)
tensor(0.0001, grad_fn=<AddBackward0>)
tensor(9.9047e-05, grad_fn=<AddBackward0>)
tensor(8.2265e-05, grad_fn=<AddBackward0>)
tensor(7.6619e-05, grad_fn=<AddBackward0>)

 

โ–ท one_hot์€ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์›ํ•ซ ์ธ์ฝ”๋”ฉ์œผ๋กœ ๋ฐ”๊พผ ํ›„, ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ํ…์„œ์˜ ํ˜•ํƒœ๋กœ ๋ฐ”๊พธ์–ด ๋†“์€ ๊ฒƒ์ด๋‹ค.

 

โ–ท for i in range(epochs) ๋ฌธ ์ดํ•˜๋Š” ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•œ ์ฝ”๋“œ์ด๊ณ , ๋ชจ๋ธ์˜ ํ•™์Šต ๊ฒฐ๊ณผ๊ฐ€ ์—ํญ์ด ๋งค 10ํšŒ ์ง€๋‚  ๋–„๋งˆ๋‹ค ์ถœ๋ ฅํ•˜๋„๋ก ๊ตฌํ˜„ํ•˜์˜€๋‹ค.

 

โ–ท ํ•™์Šต ๋ฌธ์žฅ์˜ ๊ธธ์ด๊ฐ€ ์งง์•„, MSE๊ฐ€ 0์— ์ˆ˜๋ ดํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

4. ํ•™์Šต ๊ฒฐ๊ณผ

 

In:

# Start-token input: a one-hot vector with 1 in the second-to-last position.
start = torch.zeros(1, len(char_list))
start[:, -2] = 1

with torch.no_grad():
    hidden = rnn.init_hidden()
    input_char = start
    output_string = ""
    
    for i in range(len(string)):
        output, hidden = rnn(input_char, hidden)
        output_string += onehot_to_word(output.data)
        # Feed the raw output back in as the next input.
        input_char = output
        
print(output_string)

 

Out:

To climb steep hills ples ereerereareirii1i pilim

 

โ–ท ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ๋ฌธ์žฅ์˜ ์ฒซ ์‹œ์ž‘์„ ๋‚˜ํƒ€๋‚ด๋Š” [0, ..., 1, 0]์„ ์ž…๋ ฅ๊ฐ’์œผ๋กœ ์ฃผ์—ˆ๋‹ค.

 

โ–ท rnn.init_hidden ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜์—ฌ ์ดˆ๊ธฐ ํžˆ๋“  ๋ ˆ์ด์–ด๋ฅผ ์ƒ์„ฑํ•˜๊ณ , ์ž…๋ ฅ๊ฐ’๊ณผ ํžˆ๋“  ๋ ˆ์ด์–ด๋ฅผ rnn.forward ํ•จ์ˆ˜์— ์ธ์ž๋กœ ์ฃผ์–ด ์ถœ๋ ฅ๊ฐ’๊ณผ ํžˆ๋“  ๋ ˆ์ด์–ด๋ฅผ ์–ป๋Š”๋‹ค. ์ด ์ถœ๋ ฅ๊ฐ’๊ณผ ํžˆ๋“  ๋ ˆ์ด์–ด๋Š” ๋‹ค์Œ ์ถœ๋ ฅ๊ฐ’์„ ์–ป๊ธฐ ์œ„ํ•œ rnn.forward ํ•จ์ˆ˜์˜ ์ธ์ž๋กœ ์‚ฌ์šฉ๋˜๊ณ , ์ด๋Ÿฌํ•œ ๊ณผ์ •์„ ๊ธฐ์กด ํ•™์Šต ๋ฌธ์žฅ์˜ ๊ธธ์ด๋งŒํผ ์‹ค์‹œํ•˜์—ฌ ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๋ฅผ ์–ป๋Š”๋‹ค.

 

โ–ท ๋‹จ์ˆœํ•œ  ๋ฒ„์ „์ธ ๋งŒํผ ๋งŒ์กฑ์Šค๋Ÿฌ์šด ๊ฒฐ๊ณผ๋Š” ์•„๋‹ˆ์ง€๋งŒ, ์›๋ž˜ ๋ฌธ์žฅ๊ณผ ๋น„์Šทํ•œ ํ˜•ํƒœ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

 


Reference:

์ตœ๊ฑดํ˜ธ, ใ€ŒํŒŒ์ดํ† ์น˜ ์ฒซ๊ฑธ์Œใ€, ํ•œ๋น›๋ฏธ๋””์–ด(2019)