๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

Statistics/Bayesian Statistics

๋ฉ”ํŠธ๋กœํด๋ฆฌ์Šค ํ—ค์ด์ŠคํŒ…์Šค ์•Œ๊ณ ๋ฆฌ์ฆ˜(Metropolis-Hastings algorithm)

Metropolis-Hastings(์ดํ•˜ MH) ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋Œ€ํ•ด ์•Œ์•„๋ณผ ๊ฒƒ์ด๋‹ค.

 

MH ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ MCMC(Markov Chain Monte-Carlo)์˜ ์ผ๋ฐ˜์ ์ธ ํ˜•ํƒœ๋กœ์จ ํŠน์ • ๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ์ •์ƒ๋ถ„ํฌ๋กœ ๊ฐ–๋Š” ์ฒด์ธ์„ ๋ฐœ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ํŠน์ • ๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๋‹ค๋ฃฐ ๋‚ด์šฉ์œผ๋กœ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

1. MH ์•Œ๊ณ ๋ฆฌ์ฆ˜

2. Random walk MH ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ตฌํ˜„

 

1. MH ์•Œ๊ณ ๋ฆฌ์ฆ˜

 

MH ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

โ–ท q๋Š” ์ œ์•ˆ ๋ถ„ํฌ(Proposal distribution)๋ฅผ ์˜๋ฏธํ•˜๊ณ , g๋Š” ์šฐ๋ฆฌ์˜ ๋ชฉ์  ๋ถ„ํฌ(Target distribution)์—์„œ ์ •๊ทœํ™” ์ƒ์ˆ˜(Normalizing constant)๋ฅผ ์ œ์™ธํ•œ ๋ถ€๋ถ„์ด๋‹ค. ์ฆ‰, ๋ชฉ์  ๋ถ„ํฌ์™€ g(theta)๋Š” ๋น„๋ก€ ๊ด€๊ณ„๊ฐ€ ์„ฑ๋ฆฝํ•œ๋‹ค.

 

โ–ท ์ดˆ๊ธฐ๊ฐ’ theta 0๋ฅผ ์ •ํ•˜๊ณ , q(theta i-1 | theta star)๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•œ ๋’ค, alpha๋ฅผ ๊ตฌํ•œ๋‹ค. alpha๊ฐ€ 1 ์ด์ƒ์ด๋ฉด theta star๋ฅผ theta i๋กœ ์ •ํ•˜๊ณ , ์•„๋‹Œ ๊ฒฝ์šฐ์—๋Š” ๊ธฐ์กด์˜ theta i-1๋ฅผ theta i๋กœ ์ •ํ•œ๋‹ค.

 

โ–ท ์ด ๊ณผ์ •์„ ํ†ตํ•ด ์ƒ์„ฑํ•œ theta ์ค‘ ์ˆ˜๋ ดํ•˜๊ธฐ ์‹œ์ž‘ํ•œ ๋ถ€๋ถ„๋ถ€ํ„ฐ ๋ชฉ์  ๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ๋กœ์จ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

MH ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ํ•ต์‹ฌ์ธ alpha๊ฐ€ ์–ด๋–ป๊ฒŒ ๋งŒ๋“ค์–ด์กŒ๋Š”์ง€ ์•Œ์•„๋ณด์ž.

 

 

โ–ท ์œ„์˜ ๊ทธ๋ฆผ์€ ๋‘ ๋…ธ๋“œ๊ฐ„์˜ ์ฒด์ธ์„ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ด๋‹ค. ๋ชฉ์  ๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ(Markov chain)์ด ์ •์ƒ ์ƒํƒœ(Stationary state)๊ฐ€ ๋˜์–ด์•ผ ํ•œ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด  ๋ฏธ์„ธ ๊ท ํ˜• ์กฐ๊ฑด(Detailed balance condition)์„ ๋งŒ์กฑํ•˜๋„๋ก ์ œ์•ฝ์„ ์ค€๋‹ค. ์ œ์•ฝ์œผ๋กœ์จ ์œ„์˜ ์‹๊ณผ ๊ฐ™์ด ๊ฐ ์ƒํƒœ์˜ ์ „์ดํ™•๋ฅ (Transition probability)์— r์„ ๊ณฑํ•˜์—ฌ ๊ฐ™๋„๋ก ๋งŒ๋“ ๋‹ค.

 

โ–ท alpha์˜ ๋ถ„์ž q(theta i | theta star)×g(theta star)๊ฐ€ ๋ถ„๋ชจ q(theta star | theta i)×g(theta i)๋ณด๋‹ค ํฌ๋ฉด ๋ถ„์ž์— ๊ณฑํ•ด์ง€๋Š” r์€ ์ž‘๊ฒŒ, ๋ถ„๋ชจ์— ๊ณฑํ•ด์ง„ r์€ ํฌ๊ฒŒ ๋งŒ๋“ค์–ด ์ค˜์•ผ ๋ฏธ์„ธ ๊ท ํ˜• ์กฐ๊ฑด์ด ์„ฑ๋ฆฝํ•˜๊ฒŒ ๋œ๋‹ค. ๋”ฐ๋ผ์„œ r theta i -> theta star์€ 1๋กœ r theta star -> theta i๋Š” alpha์˜ ์—ญ์ˆ˜๋กœ ๋‘”๋‹ค. ๋ฐ˜๋Œ€์˜ ๊ฒฝ์šฐ์—๋Š” r theta i -> theta star์€ alpha๋กœ r theta i -> theta star๋Š” 1๋กœ ๋‘”๋‹ค. ์ด๋ฅผ ์ •๋ฆฌํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

โ–ท ์ด์ „ ์ƒํƒœ๋กœ๋ถ€ํ„ฐ ๋‹ค์Œ ์ƒํƒœ์˜ ์˜ˆ์ธก์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๋ฏ€๋กœ,  r theta i -> theta star๋ฅผ ํ†ตํ•ด ๋‹ค์Œ ์ƒํƒœ๋ฅผ ๊ฒฐ์ •ํ•˜์—ฌ์•ผ ํ•œ๋‹ค. ์ฆ‰, alpha์— ๋”ฐ๋ฅธ r theta i -> theta star๊ฐ€ ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ›์•„๋“ค์ผ์ง€ ๋ง์ง€(Accept or not)์— ๋Œ€ํ•ด ๊ฒฐ์ •ํ•œ๋‹ค. ์ด๋ฅผ ์ •๋ฆฌํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

2. Random walk MH ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ตฌํ˜„

 

๋ฌธ์ œ)

 

๊ฐ€๋Šฅ๋„ ํ•จ์ˆ˜๊ฐ€ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๊ณ , ์ •๊ทœ๋ถ„ํฌ์˜ ํ‰๊ท ์€ t ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹คํ•œ๋‹ค. ๋ฐ์ดํ„ฐ๋กœ 1.2, 1.4, -0.5, 0.3, 0.9, 2.3, 1.0, 0.1, 1.3, 1.9๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ํ‰๊ท ์˜ ์‚ฌํ›„๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜์—ฌ๋ผ.

 

ํ’€์ด)

 

 

โ–ท ์ •๊ทœํ™” ์ƒ์ˆ˜๋ฅผ ๋ชจ๋ฅธ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๊ณ , ์‚ฌ์ „๋ถ„ํฌ์™€ ๊ฐ€๋Šฅ๋„ ํ•จ์ˆ˜์˜ ๊ณฑ์„ ์ด์šฉํ•˜์—ฌ ์‚ฌํ›„๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•  ๊ฒƒ์ด๋‹ค. Random walk MH ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•ด๋ณด์ž. ์•„๋ž˜์˜ ์ฝ”๋“œ๋Š” R์„ ์ด์šฉํ•œ Random walk MH ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ตฌํ˜„ํ•œ ๊ฒƒ์ด๋‹ค.

 

In:

 

log_g <- function(mu, n, x_bar) {
  mu2 <- mu^2
  
  return(n * (x_bar * mu - mu2 / 2.0) - log(1.0 + mu2))
}

mh_sampl <- function(n_data, x_bar, n_iter, mu_init, prop_sd) {
  mu_out <- numeric(n_iter)
  accpt_cnt <- 0
  
  mu_now <- mu_init
  log_g_now <- log_g(mu_now, n_data, x_bar)
  
  for (i in 1:n_iter) {
    mu_cand <- rnorm(1, mu_now, prop_sd)
    log_cand <- log_g(mu_cand, n_data, x_bar)
    log_alpha <- log_cand - log_g_now
    alpha <- exp(log_alpha)
    
    if (runif(1) < alpha) {
      mu_now <- mu_cand
      accpt_cnt <- accpt_cnt + 1
      log_g_now <- log_cand
    }
    
    mu_out[i] <- mu_now
  }
  
  return(list(mu = mu_out, accpt_rate = accpt_cnt/n_iter))
}

 

โ–ท ๊ตฌํ˜„ ๊ณผ์ •์—์„œ alpha๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด ์•ฝ๊ฐ„์˜ ํŠธ๋ฆญ์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค. alpha๋ฅผ ๊ตฌํ•  ๋•Œ, ๋กœ๊ทธ๋ฅผ ์ทจํ•˜์—ฌ ๊ณ„์‚ฐํ•œ ๋’ค, ๋‹ค์‹œ ์ง€์ˆ˜๋ฅผ ํ†ตํ•ด ์›๋ž˜์˜ alpha๋กœ ๋ฐ”๊ฟ”์ฃผ์—ˆ๋‹ค. ์ด๋Š” ๊ณ„์‚ฐ๊ณผ์ • ์ค‘ ๋„ˆ๋ฌด ์ž‘์€ ์ˆซ์ž๋กœ ์ธํ•ด ๋ฐœ์ƒํ•˜๋Š” ๋ฐ˜์˜ฌ๋ฆผ ์˜ค๋ฅ˜(Rounding error)๋ฅผ ๋ง‰๊ธฐ ์œ„ํ•จ์ด๋‹ค.

 

โ–ท ์ œ์•ˆ ๋ถ„ํฌ๋กœ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋Š”๋ฐ, ์ด๋Š” alpha๋ฅผ ๊ณ„์‚ฐ์„ ํŽธ๋ฆฌํ•˜๊ฒŒ ํ•ด์ค€๋‹ค. ์™œ๋ƒํ•˜๋ฉด alpha์˜ ๋ถ„์ž ๋ฐ ๋ถ„๋ชจ์˜ ์ œ์•ˆ ๋ถ„ํฌ์— ๋”ฐ๋ฅธ ํ™•๋ฅ ์ด ๊ฐ™์•„์ ธ ์ƒ์‡„๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

 

In:

 

x <- c(1.2, 1.4, -0.5, 0.3, 0.9, 2.3, 1.0, 0.1, 1.3, 1.9)

x_bar <- mean(x)
n_data <- length(x)

library(coda)

post_sampl <- mh_sampl(n_data = n_data, x_bar = x_bar, n_iter = 1e4, mu_init = 0.0, prop_sd = 1)

traceplot(as.mcmc(post_sampl$mu))

 

Out:

 

 

โ–ท ์œ„์˜ ๊ทธ๋ฆผ์€ ๊ฐ ๋ฐ˜๋ณต ์ˆ˜ํ–‰์˜ ์ง„ํ–‰์— ๋”ฐ๋ฅธ ์ƒ์„ฑ๋œ mu์˜ ๋ณ€ํ™”๋ฅผ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ด๋‹ค. ํŠน์ • ๊ตฌ๊ฐ„์—์„œ ๋™์ผํ•œ mu๋กœ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” alpha๊ฐ€ 1๋ณด๋‹ค ์ž‘์€ ๊ฒฝ์šฐ, alpha์˜ ํ™•๋ฅ ๋กœ ์ˆ˜๋ฝ ๋˜๋Š” ๊ฑฐ์ ˆํ•˜๊ฒŒ(Accept or reject) ๋˜๋Š”๋ฐ, ๊ฑฐ์ ˆ์ด ๋ฐ˜๋ณต๋˜์–ด ๋‚˜ํƒ€๋‚œ ๊ฒฐ๊ณผ์ด๋‹ค.

 

In:

 

str(post_sampl)

 

Out:

 

List of 2
 $ mu        : num [1:1000] 0.54 0.54 0.54 0.54 0.54 ...
 $ accpt_rate: num 0.347

 

โ–ท ํ—ˆ์šฉ ๋น„์œจ์ด 34.7%์ธ ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ํ—ˆ์šฉ ๋น„์œจ์€ 23 ~ 50%๊ฐ€ ์ ๋‹นํ•˜๋‹ค๊ณ  ํ•œ๋‹ค.

 

MH ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์™€ ์‹ค์ œ ์‚ฌํ›„๋ถ„ํฌ์™€์˜ ์ฐจ์ด๋ฅผ ํ™•์ธํ•ด๋ณด์ž.

 

In:

 

plot(density(post_sampl$mu, adjust = 2.5), main = '', xlim = c(-1.0, 3.0), ylim = c(0, 1.2), xlab = expression(mu))

curve(dt(x = x, df = 1), lty = 2, add = T) # prior for mu

curve(0.017 * exp(log_g(mu = x, n = n_data, x_bar = x_bar)), from = -1.0, to = 3.0, add = TRUE, col = 'blue')

 

Out:

 

 

โ–ท ํŒŒ๋ž‘์„ ์€ ์‹ค์ œ ์‚ฌํ›„๋ถ„ํฌ๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ , ๊ฒ€์ •์„ ์€ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ๋ฐ€๋„ ์ถ”์ •(Density estimation)์„ ํ†ตํ•ด ์ƒ์„ฑํ•œ ์„ ์ด๋‹ค. ์ ์„ ์€ ์‚ฌ์ „๋ถ„ํฌ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

 

โ–ท ๋น„๊ต์  ํŒŒ๋ž‘์„ ๊ณผ ๊ฒ€์ •์„ ์ด ๋น„์Šทํ•œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ƒ์„ฑ ๋ฐ์ดํ„ฐ์˜ ๊ฐฏ์ˆ˜๋ฅผ ๋Š˜๋ฆฐ๋‹ค๋ฉด ์‹ค์ œ ์‚ฌํ›„๋ถ„ํฌ์— ๋” ๊ฐ€๊นŒ์šด ๋ถ„ํฌ๊ฐ€ ๋‚˜ํƒ€๋‚  ๊ฒƒ์ด๋‹ค.

 


Reference:

"Bayesian Statistics: From Concept to Data AnalysisTechniques and Models," Coursera, https://www.coursera.org/learn/bayesian-statistics/.