๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

Deep Learning/PyTorch

์ž๋™ ๋ฏธ๋ถ„(Automatic differentiation) ์‚ฌ์šฉ๋ฒ•

ํŒŒ์ดํ† ์น˜์˜ ์ž๋™ ๋ฏธ๋ถ„(Auto differentiation)์„ ์ด์šฉํ•œ ๋ณ€ํ™”๋„(Gradient) ๊ณ„์‚ฐ ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณผ ๊ฒƒ์ด๋‹ค. ๋‹ค๋ฃฐ ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

1. ์ž๋™ ๋ฏธ๋ถ„ ์ค€๋น„

2. ๋ณ€ํ™”๋„ ๊ณ„์‚ฐ

 

1. ์ž๋™ ๋ฏธ๋ถ„ ์ค€๋น„

 

In:

import torch

x = torch.ones(2, 2, requires_grad=True)

print(x)

 

Out:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

 

โ–ท torch.ones()์— ํ…์„œ ํฌ๊ธฐ์— ๋Œ€ํ•œ ์ธ์ž์™€ requires_grad ์ธ์ž๋ฅผ ์ฃผ์–ด ํ…์„œ๋ฅผ ์ƒ์„ฑํ•˜์˜€๋‹ค. ๊ฒฐ๊ณผ ์ฐฝ์— requires_grad=True๊ฐ€ ๋‚˜ํƒ€๋‚œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋Š” ์ดํ›„ ์—ญ์ „ํŒŒ ๊ณผ์ •์„ ์ˆ˜ํ–‰ ํ›„, ํ•ด๋‹น ํ…์„œ์˜ ๋ณ€ํ™”๋„๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค.

 

In:

y = x + 2

print(y)

 

Out:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

 

โ–ท x์— ๋ง์…ˆ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜์—ฌ y๋ฅผ ๋งŒ๋“ค์—ˆ๋‹ค. ์ฝ”๋“œ ๊ฒฐ๊ณผ์— ์—ฐ์‚ฐ ์ˆ˜ํ–‰ ๊ฒฐ๊ณผ์™€ grad_fn์ด <AddBackward0>์ธ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. grad_fn์—๋Š” ํ…์„œ๊ฐ€ ์—ฐ์‚ฐ ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ๊ณ , ์ด ์ •๋ณด๋Š” ์—ญ์ „ํŒŒ ๊ณผ์ •์— ์‚ฌ์šฉ๋  ๊ฒƒ์ด๋‹ค.

 

In:

y = x * 2

print(y)

 

Out:

tensor([[2., 2.],
        [2., 2.]], grad_fn=<MulBackward0>)

 

โ–ท y๊ฐ€ x์— 2๋ฅผ ๊ณฑํ•˜์—ฌ ๋งŒ๋“ค์–ด์กŒ๋‹ค. ์ฝ”๋“œ ๊ฒฐ๊ณผ์˜ grad_fn์ด <MulBackward0>๋กœ ๋‚˜ํƒ€๋‚œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ฆ‰, y๋Š” ๊ณฑ์…ˆ์— ๋Œ€ํ•œ ์—ฐ์ƒ ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ๋‹ค.

 

In:

x = torch.randn(2, 2)
y = ((x * 3) / (x - 1))

print(y)
print(y.requires_grad)

y.requires_grad_(True)

print(y)
print(y.requires_grad)

 

Out:

tensor([[-3.7594,  7.5623],
        [ 1.2762,  1.2044]])
False
tensor([[-3.7594,  7.5623],
        [ 1.2762,  1.2044]], requires_grad=True)
True

 

โ–ท ํ…์„œ๋ฅผ ์ƒ์„ฑํ•  ๋•Œ, requires_grad ์ธ์ž์˜ ๊ธฐ๋ณธ๊ฐ’์€ False์ด๊ธฐ ๋•Œ๋ฌธ์—, ์ฒซ ๋ฒˆ์งธ ํ…์„œ์˜ ์ถœ๋ ฅ ๊ฒฐ๊ณผ์—์„œ๋Š” requires_grad์— ๋Œ€ํ•œ ์ •๋ณด๊ฐ€ ๋‚˜ํƒ€๋‚˜ ์žˆ์ง€ ์•Š๋‹ค.  torch.requires_grad_()๋ฅผ ์ด์šฉํ•˜์—ฌ requires_grad ์ธ์ž์— ์ž…๋ ฅ๊ฐ’์„ ์ค„ ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ์ ์šฉํ•œ y์˜ ๊ฒฐ๊ด๊ฐ’์—๋Š” requires_grad๊ฐ€ True๋กœ ๋‚˜ํƒ€๋‚˜ ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

2. ๋ณ€ํ™”๋„ ๊ณ„์‚ฐ

 

In:

x = torch.randn(2, 2, requires_grad=True)
y = x + 2
z = (y * y).sum()

z.backward()

print(x)
print(y)
print(z)

print(x.grad)
print(y.grad)
print(z.grad)

 

Out:

tensor([[-0.0246, -0.5667],
        [-0.0226,  0.1128]], requires_grad=True)
tensor([[1.9754, 1.4333],
        [1.9774, 2.1128]], grad_fn=<AddBackward0>)
tensor(14.3310, grad_fn=<SumBackward0>)
tensor([[3.9509, 2.8667],
        [3.9548, 4.2257]])
None
None

 

โ–ท z์— torch.backward()๋ฅผ ์ด์šฉํ•˜์—ฌ ์—ญ์ „ํŒŒ ๊ณผ์ •์„ ์ˆ˜ํ–‰ํ•˜์˜€๋‹ค. x.grad๋Š” x์˜ ๋ณ€ํ™”๋„์ธ dz/dx์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

 

โ–ท ํ…์„œ y์™€ z์˜ ๊ฒฝ์‚ฌ๋„์˜ ์ถœ๋ ฅ ๊ฒฐ๊ณผ๋Š” None์ด ๋‚˜์™”๋‹ค. ์ด๋Š” y, z์˜ requires_grad๋Š” False์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

 

โ–ถ ๋งŒ์•ฝ z๊ฐ€ ์Šค์นผ๋ผ๊ฐ€ ์•„๋‹Œ ๋ฒกํ„ฐ๋ผ๋ฉด z.backward()๋ฅผ ์‹คํ–‰ํ•  ๊ฒฝ์šฐ, "RuntimeError: grad can be implicitly created only for scalar outputs"๋ผ๋Š” ๋ฌธ๊ตฌ๊ฐ€ ๋œจ๋ฉฐ ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

 

In:

x = torch.randn(2, 2, requires_grad=True)
y = x + 2
z = y * y

y.backward(z)

print(x.grad)

 

Out:

tensor([[ 2.1441,  8.8653],
        [10.3739,  2.6593]])

 

โ–ท z๊ฐ€ ์Šค์นผ๋ผ๊ฐ€ ์•„๋‹Œ ๊ฒฝ์šฐ, ์—ญ์ „ํŒŒ ๊ณผ์ •์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด torch.backward()์— z๋ฅผ ์ธ์ž๋กœ ๋„ฃ์–ด ์ฃผ์–ด์•ผ ํ•œ๋‹ค.

 

์ด์™ธ์˜ ์ž๋™ ๋ฏธ๋ถ„์— ๊ด€ํ•œ ์ •๋ณด๋Š” ์—ฌ๊ธฐ(https://pytorch.org/docs/stable/autograd.html#function)์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 


Reference:

"Autograd: Automatic Differentiation," PyTorch Tutorials, https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py.
