

๊น์Šค ์ƒ˜ํ”Œ๋ง(Gibbs sampling)

๊น์Šค ์ƒ˜ํ”Œ๋ง(Gibbs sampling)์— ๋Œ€ํ•ด ์•Œ์•„๋ณผ ๊ฒƒ์ด๋‹ค. ๋‹ค๋ฃฐ ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

1. ๊น์Šค ์ƒ˜ํ”Œ๋ง

2. ๊น์Šค ์ƒ˜ํ”Œ๋ง์˜ ์˜ˆ์ œ

 

1. ๊น์Šค ์ƒ˜ํ”Œ๋ง

 

๊น์Šค ์ƒ˜ํ”Œ๋ง์€ Metropolis Hastings(์ดํ•˜ MH) ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ํŠน๋ณ„ํ•œ ํ˜•ํƒœ๋กœ, ์ œ์•ˆ ๋ถ„ํฌ(Proposal distribution)๋ฅผ ์ž์‹ ์˜ Full conditional distribution๋กœ ๋‘์–ด ์ƒ˜ํ”Œ๋งํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ์ด๋ ‡๊ฒŒ ํ•จ์œผ๋กœ์จ, ๊ฐ ์‹œํ–‰์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด Acceptance probability๋Š” 1์ด ๋˜๋Š” ์„ฑ์งˆ์„ ๊ฐ€์ง€๊ฒŒ ๋œ๋‹ค. ๋‹ค์Œ์˜ ์ฆ๋ช…์„ ํ†ตํ•˜์—ฌ ์ด๋ฅผ ํ™•์ธํ•ด๋ณด์ž.

 

 

โ–ท Taking the proposal distribution to be the full conditional posterior makes the detailed balance condition hold automatically. There is therefore no need to correct the transition probability through alpha: the candidate generated at each iteration can simply be accepted.

 

๊น์Šค ์ƒ˜ํ”Œ๋ง์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

โ–ท ์œ„์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ชจ์ˆ˜๊ฐ€ 2๊ฐœ์ธ ๊ฒฝ์šฐ์— ๋Œ€ํ•œ ๊ฒƒ์ด๊ณ , ์ด๋ฅผ ์ผ๋ฐ˜ํ™”ํ•˜๋ฉด ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ชจ์ˆ˜์˜ ์‚ฌํ›„๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ์„ ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

2. ๊น์Šค ์ƒ˜ํ”Œ๋ง์˜ ์˜ˆ์ œ

 

Problem)

 

์•„๋ž˜ ๋ชจ๋ธ์˜ ๋ชจ์ˆ˜์ธ mu์™€ sigma^2๋ฅผ ์‚ฌํ›„๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜์‹œ์˜ค.

 

 

Solution)

 

 

โ–ท Begin by writing out the full joint posterior, the product of the likelihood and the two priors. To obtain the full conditional posterior of each parameter, treat every parameter other than the one of interest as a constant in the full joint posterior and identify the resulting distribution. Because both priors are conjugate, each full conditional posterior belongs to the same family as the corresponding prior: normal for mu and inverse-gamma for sigma^2.
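Carrying out that derivation gives the full conditionals below, which are exactly what update_mu() and update_sig2() implement in the code that follows:

$$
\begin{aligned}
\mu \mid \sigma^2, y &\sim N(\mu_1, \sigma_1^2),
\qquad \sigma_1^2 = \left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)^{-1},
\qquad \mu_1 = \sigma_1^2 \left(\frac{n\bar{y}}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}\right)\\
\sigma^2 \mid \mu, y &\sim \text{Inverse-Gamma}\!\left(\nu_0 + \frac{n}{2},\ \beta_0 + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right)
\end{aligned}
$$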

 

๊ฐ ๋ชจ์ˆ˜์˜ ์‚ฌํ›„๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๊น์Šค ์ƒ˜ํ”Œ๋ง ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ ๋ณด์ž.

 

In:

# Draw mu from its full conditional N(mu_1, sig2_1)
update_mu = function(n, ybar, sig2, mu_0, sig2_0) {
  sig2_1 = 1.0 / (n / sig2 + 1.0 / sig2_0)            # full conditional variance
  mu_1 = sig2_1 * (n * ybar / sig2 + mu_0 / sig2_0)   # full conditional mean
  return(rnorm(n = 1, mean = mu_1, sd = sqrt(sig2_1)))
}

# Draw sig2 from its full conditional Inverse-Gamma(nu_1, beta_1),
# sampled as the reciprocal of a Gamma(shape = nu_1, rate = beta_1) draw
update_sig2 = function(n, y, mu, nu_0, beta_0) {
  nu_1 = nu_0 + n / 2.0
  sumsq = sum( (y - mu)^2 )
  beta_1 = beta_0 + sumsq / 2.0
  out_gamma = rgamma(n = 1, shape = nu_1, rate = beta_1)
  return(1.0 / out_gamma)
}

# Gibbs sampler: alternately draw sig2 and mu from their full conditionals
gibbs_sampl = function(y, n_iter, init, prior) {
  ybar = mean(y)
  n = length(y)
  
  mu_out = numeric(n_iter)
  sig2_out = numeric(n_iter)
  
  mu_now = init$mu   # initial value of mu
  
  for (i in 1:n_iter) {
    # draw sig2 given the current value of mu
    sig2_now = update_sig2(n = n, 
                           y = y, 
                           mu = mu_now, 
                           nu_0 = prior$nu_0, 
                           beta_0 = prior$beta_0)
    # draw mu given the sig2 just drawn
    mu_now = update_mu(n = n, 
                       ybar = ybar, 
                       sig2 = sig2_now, 
                       mu_0 = prior$mu_0, 
                       sig2_0 = prior$sig2_0)
    
    sig2_out[i] = sig2_now
    mu_out[i] = mu_now
  }
  
  return(cbind(mu = mu_out, sig2 = sig2_out))
}

 

โ–ท update_mu() and update_sig2() carry out the per-parameter updates, each drawing from the corresponding full conditional.

 

โ–ท gibbs_sampl() follows the Gibbs sampling algorithm to generate one draw per iteration, for the requested number of iterations.

 

In:

y = c(1.2, 1.4, -0.5, 0.3, 0.9, 2.3, 1.0, 0.1, 1.3, 1.9)

ybar = mean(y)
n = length(y)

# Prior hyperparameters
prior = list()
prior$mu_0 = 0.0     # prior mean of mu
prior$sig2_0 = 1.0   # prior variance of mu

prior$n_0 = 2.0      # prior effective sample size for sig2
prior$s2_0 = 1.0     # prior point estimate of sig2
prior$nu_0 = prior$n_0 / 2.0                  # Inverse-Gamma shape
prior$beta_0 = prior$n_0 * prior$s2_0 / 2.0   # Inverse-Gamma rate

# Initial value for the chain
init = list()
init$mu = 0.0

library('coda')

post_sampl = gibbs_sampl(y = y, n_iter = 1e3, init = init, prior = prior)

plot(as.mcmc(post_sampl))     # trace plots and posterior densities

summary(as.mcmc(post_sampl))  # numerical posterior summaries

 

โ–ท y is the data from which the posterior is computed. prior holds the hyperparameters of the prior distributions.

 

โ–ท init supplies the initial value for Gibbs sampling, and gibbs_sampl() generates the draws from the posterior.

 

โ–ท Using the coda package, plot() and summary() let us inspect the sampling process and its results.

 

Out:

Iterations = 1:1000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

       Mean     SD Naive SE Time-series SE
mu   0.9188 0.2758 0.008723       0.008723
sig2 0.9065 0.4727 0.014949       0.015826

2. Quantiles for each variable:

       2.5%    25%    50%   75% 97.5%
mu   0.3451 0.7533 0.9268 1.093 1.454
sig2 0.3796 0.5968 0.7957 1.085 2.179

 


Reference:

"Bayesian Statistics: From Concept to Data AnalysisTechniques and Models," Coursera, www.coursera.org/learn/bayesian-statistics/.

"(๊ธฐ๊ณ„ ํ•™์Šต, Machine Learning) Week 10 Sampling Based Inference | Lecture 7 Gibbs Sampling," AAILab Kaist, www.youtube.com/watch?v=Q9EYGw5QbHc&t=633s.