๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ


How to Use JAGS (Just Another Gibbs Sampler)

R์˜ JAGS(Just Another Gibbs Sampler)์˜ ์‚ฌ์šฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณผ ๊ฒƒ์ด๋‹ค. JAGS๋ฅผ ํ†ตํ•œ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ ๊ณผ์ •์€ 4๋‹จ๊ณ„๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค.

 

1. Specify the model

2. Set up the model

3. Run the MCMC(Markov Chain Monte Carlo) sampler

4. Post processing

 

๋‹ค์Œ์˜ ๋ชจ๋ธ์— ์ด๋ฅผ ๋‹จ๊ณ„๋ณ„๋กœ ์ ์šฉํ•˜์—ฌ ์‚ฌํ›„๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ๋ณด์ž.

 

 

1. Specify the model

 

In:

library(rjags)

mod_string = " model {
  for (i in 1:n) {
    y[i] ~ dnorm(mu, 1.0/sig2)  # likelihood: normal with mean mu and precision 1/sig2
  }
  
  mu ~ dt(0.0, 1.0/1.0, 1)  # prior on mu: t with 1 degree of freedom (standard Cauchy)
  sig2 = 1.0                # variance fixed at 1
} "

 

โ–ท ์œ„์˜ ์ฝ”๋“œ์™€ ๊ฐ™์ด ๊ฐ€๋Šฅ๋„ ํ•จ์ˆ˜์™€ ์‚ฌ์ „๋ถ„ํฌ์— ๋Œ€ํ•ด ์ •์˜ํ•œ๋‹ค.

 

โ–ท R๊ณผ JAGS์˜ ๊ฐ€์žฅ ์ฃผ์š”ํ•œ ์ฐจ์ด๋Š” ๋ถ„ํฌ ํ•จ์ˆ˜์˜ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•์— ์žˆ๋‹ค. R์—์„œ๋Š” ํ‰๊ท ๊ณผ ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ rnorm()์˜ ๋ชจ์ˆ˜๋กœ ๋ฐ›์ง€๋งŒ, JAGS์—์„œ๋Š” ํ‘œ์ค€ํŽธ์ฐจ ๋Œ€์‹  ๋ถ„์‚ฐ์˜ ์—ญ์ˆ˜์ธ Precision์„ ๋ชจ์ˆ˜๋กœ ๋ฐ›๋Š”๋‹ค. ์ด์™ธ์˜ JAGS์˜ ํ•จ์ˆ˜์™€ ๊ด€๋ จ๋œ ๋‚ด์šฉ์€ ๋ฉ”๋‰ด์–ผ(https://web.sgh.waw.pl/~atoroj/ekonometria_bayesowska/jags_user_manual.pdf)์„ ํ†ตํ•ด ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

2. Set up the model

 

In:

set.seed(777)

y = c(1.2, 1.4, -0.5, 0.3, 0.9, 
      2.3, 1.0, 0.1, 1.3, 1.9)
n = length(y)

data = list(y = y, n = n)
params = c('mu')

inits = function() {
  list(mu = 0.0)
}

mod = jags.model(textConnection(mod_string), 
                 data = data, 
                 inits = inits)

 

Out:

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 10
   Unobserved stochastic nodes: 1
   Total graph size: 15

Initializing model

  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%

 

โ–ท ์œ„์˜ ์ฝ”๋“œ์™€ ๊ฐ™์ด ๋ฐ์ดํ„ฐ์™€ ์ดˆ๊ธฐ๊ฐ’์„ ์„ค์ •ํ•œ ๋’ค, jags.model()์„ ์ด์šฉํ•˜์—ฌ ๋ชจ๋ธ์˜ ์ฝ”๋“œ๋ฅผ ์ปดํŒŒ์ผ ํ•œ๋‹ค.

 

3. Run the MCMC sampler

 

In:

update(mod, 500)  # burn-in: 500 iterations whose samples are discarded

mod_sim = coda.samples(model = mod, 
                       variable.names = params, 
                       n.iter = 1000)

 

Out:

|**************************************************| 100%
|**************************************************| 100%

 

โ–ท update()๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚จ๋‹ค. ์ฆ‰, ์ดˆ๊ธฐ๊ฐ’๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜์—ฌ ์ฃผ์–ด์ง„ Iteration ๋™์•ˆ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ๊ณผ์ •์„ ๊ฑฐ์นœ๋‹ค. ์ด๋•Œ, ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ๋Š” ์ €์žฅ๋˜์ง€ ์•Š๋Š”๋‹ค. ์ด ๊ณผ์ •์€ ๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ์ด ์ •์ƒ ์ƒํƒœ(Stationary state)์— ๋„๋‹ฌํ•˜๋„๋ก ๋„์›€์„ ์ค€๋‹ค.

 

โ–ท coda.samples()๋ฅผ ์ด์šฉํ•˜์—ฌ ํ•™์Šต๋œ ๋ชจ๋ธ๋กœ๋ถ€ํ„ฐ n.iter๋งŒํผ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค.

 

4. Post processing

 

In:

summary(mod_sim)

plot(mod_sim)

 

Out:

Iterations = 1501:2500
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE Time-series SE 
       0.88664        0.31683        0.01002        0.01315 

2. Quantiles for each variable:

  2.5%    25%    50%    75%  97.5% 
0.2810 0.6746 0.8952 1.0971 1.5277 

 

 

โ–ท summary()์™€ plot()์„ ์ด์šฉํ•˜์—ฌ ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๋ถ„์„ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

 


Reference:

"Bayesian Statistics: From Concept to Data Analysis" and "Bayesian Statistics: Techniques and Models," Coursera, https://www.coursera.org/learn/bayesian-statistics/.