

Hierarchical Model

๋‹ค์Œ์˜ ์˜ˆ์ œ๋ฅผ ํ†ตํ•ด ๊ณ„์ธต์  ๋ชจ๋ธ(Hierarchical model)์˜ ํŠน์ง•์„ ์•Œ์•„๋ณด๊ณ , ๋ชจ๋ธ๋ง ๊ฒฐ๊ณผ์— ๋Œ€ํ•ด ๋ถ„์„ํ•ด ๋ณด์ž.

 

Problem)

 

์น™์ด‰์„ ์ƒ์‚ฐํ•˜๋Š” 5๊ฐœ์˜ ๊ณต์žฅ์ด ์žˆ๋‹ค. ๊ฐ ๊ณต์žฅ์—์„œ ์ƒ์‚ฐ๋œ ์น™์ด‰ ๊ณผ์ž 1๊ฐœ์— ๋ฐ•ํ˜€ ์žˆ๋Š” ์ดˆ์ฝ”์นฉ ๊ฐœ์ˆ˜๊ฐ€ ํฌ์•„์†ก ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๊ณ , ํฌ์•„์†ก ๋ถ„ํฌ์˜ ๋ชจ์ˆ˜๋Š” ๊ฐ๋งˆ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค. cookies ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ์น™์ด‰ ๊ณผ์ž๊ฐ€ ์ƒ์‚ฐ๋  ๋•Œ, ๋ฐ•ํ˜€ ์žˆ๋Š” ์ดˆ์ฝ”์นฉ ๊ฐœ์ˆ˜์— ๋Œ€ํ•œ ๋ชจ๋ธ๋ง์„ ์ˆ˜ํ–‰ํ•œ ํ›„, ๋ถ„์„ํ•˜์‹œ์˜ค.

 

Solution)

 

 

โ–ท ์œ„์˜ ๋ฌธ์ œ์— ๋Œ€ํ•ด ํฌ๊ฒŒ 3๊ฐ€์ง€ ๋ชจ๋ธ๋ง ๋ฐฉ๋ฒ•์œผ๋กœ ์ ‘๊ทผ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

 

(1) Fully independent model: assume all observations are independent and fit a single Poisson model. The limitation is that this ignores both the differences between factories and the similarity among cookies produced at the same factory.

 

(2) Separate model: fit a separate model for the cookies produced at each factory. The limitation is that the parameters of each model are estimated without using any data from cookies produced at the other factories.

 

(3) Location dependent model: estimate a separate Poisson parameter for each factory, assume each parameter follows a gamma distribution, and let the two hyperparameters of that gamma distribution be shared across factories, yielding a single model. A single model therefore uses all of the data while still reflecting the characteristics of the cookies produced at each factory.

 

โ–ถ ์œ„์˜ Location dependent model๊ณผ One-way ANOVA๋Š” ๋น„์Šทํ•œ ํ˜•ํƒœ๋ฅผ ๊ฐ€์ง„ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋Š”๋ฐ, One-way ANOVA๋Š” ๋ชจ์ˆ˜์˜ ๋ถ„ํฌ๊ฐ€ ๊ณ ์ •๋œ(Fixed) ์ดˆ๋ชจ์ˆ˜๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋Š” ์ ์—์„œ Location dependent model๊ณผ ๋‹ค๋ฅด๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

The graphical representation of the Location dependent model is as follows.
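In equation form, the same hierarchy (which the JAGS model in step 2 implements) can be written roughly as:

chips[i] | lam[location[i]] ~ Poisson(lam[location[i]]),   i = 1, ..., 150
lam[j] | alpha, beta ~ Gamma(shape = alpha, rate = beta),  j = 1, ..., 5
alpha = mu^2 / sig^2,   beta = mu / sig^2
mu ~ Gamma(shape = 2, rate = 1/5),   sig ~ Exponential(rate = 1)

Here mu and sig are the mean and standard deviation of the gamma distribution shared by the lam[j], so the five factories borrow strength from one another through these two hyperparameters.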

 

 

์„ธ ๋ชจ๋ธ๋ง ๋ฐฉ๋ฒ• ์ค‘ ๊ณ„์ธต์  ํฌ์•„์†ก-๊ฐ๋งˆ ๋ชจ๋ธ์„ ํ†ตํ•ด ๋ถ„์„์„ ํ•  ๊ฒƒ์ด๋‹ค. ์ˆ˜ํ–‰ ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

1. Check the data

2. Modeling

3. Model checking

4. Model prediction and analysis

 

1. Check the data

 

In:

data = read.table(file = '../input/cookies.txt', header = T)   # columns: chips, location

boxplot(chips ~ location, data = data)   # chip counts by factory

 

Out:

(boxplot of chip counts by factory)

โ–ท ๋ฐ์ดํ„ฐ๋Š” ์น™์ด‰ 1๊ฐœ์— ํฌํ•จ๋œ ์ดˆ์ฝ”์นฉ์˜ ๊ฐฏ์ˆ˜์ธ chips์™€ ์ƒ์‚ฐ๊ณต์žฅ์„ ์˜๋ฏธํ•˜๋Š” location์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค.

 

▷ The chip counts of the cookies produced at the different factories clearly follow different distributions.
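As a quick additional check, the sample size per factory and the layout of the data can be confirmed directly:

In:

table(data$location)   # number of cookies observed per factory
head(data)             # first few rows: chips, location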

 

2. Modeling

 

In:

library(rjags)

mod_string = " model {
  # likelihood: chip count of each cookie, given its factory's rate
  for (i in 1:length(chips)) {
    chips[i] ~ dpois(lam[location[i]])
  }
  
  # factory-specific rates share a common gamma distribution
  for (j in 1:max(location)) {
    lam[j] ~ dgamma(alpha, beta)
  }
  
  # priors on the mean and standard deviation of that gamma distribution
  mu ~ dgamma(2.0, 1.0/5.0)
  sig ~ dexp(1.0)
  
  # re-parameterization: gamma with mean mu and standard deviation sig
  alpha = mu^2 / sig^2
  beta = mu / sig^2
} "

data_jags = as.list(data)

params = c('lam', 'mu', 'sig')

mod = jags.model(textConnection(mod_string), data = data_jags, n.chains = 3)

update(mod, 1e3)   # burn-in

mod_sim = coda.samples(model = mod, variable.names = params, n.iter = 5e3)
mod_comb_sim = as.mcmc(do.call(rbind, mod_sim))   # combine the three chains

 

Out:

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 150
   Unobserved stochastic nodes: 7
   Total graph size: 315

Initializing model

  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
  |**************************************************| 100%
  |**************************************************| 100%

 

▷ Instead of assigning priors to alpha and beta directly, they were re-parameterized in terms of mu and sig (the mean and standard deviation of the gamma distribution), and the priors were placed on mu and sig.
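This works because a Gamma(shape = alpha, rate = beta) distribution has mean alpha/beta and standard deviation sqrt(alpha)/beta, so setting alpha = mu^2/sig^2 and beta = mu/sig^2 gives a gamma distribution with mean mu and standard deviation sig. A minimal sanity check of this identity, with arbitrary test values for mu and sig:

In:

mu_test  = 9
sig_test = 2
x = rgamma(1e6, shape = mu_test^2 / sig_test^2, rate = mu_test / sig_test^2)
c(mean(x), sd(x))   # should be close to mu_test and sig_test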

 

โ–ท ์ด์™ธ์˜ ์ฝ”๋“œ์— ๋Œ€ํ•œ ๋‚ด์šฉ์€ ์ƒ๋žตํ•˜๋„๋ก ํ•˜๊ฒ ๋‹ค. JAGS์˜ ์‚ฌ์šฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ณ ์ž ํ•œ๋‹ค๋ฉด, ๋‹ค์Œ์˜ ํฌ์ŠคํŒ…์„ ์ฐธ๊ณ ํ•˜๊ธธ ๋ฐ”๋ž€๋‹ค.

 

 

How to Use JAGS (Just Another Gibbs Sampler)

rooney-song.tistory.com

 

3. Model checking

 

In:

gelman.diag(mod_sim)

 

Out:

Potential scale reduction factors:

       Point est. Upper C.I.
lam[1]          1          1
lam[2]          1          1
lam[3]          1          1
lam[4]          1          1
lam[5]          1          1
mu              1          1
sig             1          1

Multivariate psrf

1

 

▷ The potential scale reduction factor is approximately 1 for every parameter, indicating that the chains have converged.
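Other standard diagnostics from the coda package can be run on the same objects as a supplementary check, for example:

In:

effectiveSize(mod_sim)        # effective number of independent draws per parameter
autocorr.diag(mod_comb_sim)   # autocorrelation of the combined samples
plot(mod_sim)                 # trace and density plots for visual inspection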

 

4. Model prediction and analysis

 

In:

post_params = colMeans(mod_comb_sim)   # posterior means: lam[1]-lam[5], mu, sig

y_hat = rep(post_params[1:5], each = 30)   # fitted mean per cookie (assumes the data are ordered by factory, 30 cookies each)
resid = data$chips - y_hat

plot(jitter(y_hat), resid)   # residuals against fitted values

var(resid[y_hat < 7])    # residual variance for the factory with the smallest rate
var(resid[y_hat > 11])   # residual variance for the factory with the largest rate

 

Out:

[1] 6.447126
[1] 13.72414

 

▷ The residual variance increases with the predicted value y_hat. This is expected, since a Poisson model's variance equals its mean.
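A rough empirical check of this mean-variance relationship, comparing the sample mean and variance of the chip counts within each factory:

In:

aggregate(chips ~ location, data = data,
          FUN = function(x) c(mean = mean(x), var = var(x)))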

 

In:

summary(mod_sim)

 

Out:

Iterations = 2001:7000
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD Naive SE Time-series SE
lam[1]  9.283 0.5359 0.004376       0.004595
lam[2]  6.227 0.4580 0.003740       0.004430
lam[3]  9.523 0.5440 0.004442       0.004543
lam[4]  8.939 0.5327 0.004349       0.004337
lam[5] 11.762 0.6196 0.005059       0.005693
mu      9.106 0.9866 0.008055       0.012361
sig     2.078 0.7131 0.005823       0.012066

2. Quantiles for each variable:

         2.5%    25%    50%    75%  97.5%
lam[1]  8.271  8.921  9.269  9.633 10.375
lam[2]  5.374  5.907  6.216  6.530  7.166
lam[3]  8.495  9.143  9.511  9.887 10.601
lam[4]  7.934  8.571  8.923  9.292 10.000
lam[5] 10.593 11.337 11.752 12.177 13.018
mu      7.239  8.492  9.066  9.694 11.165
sig     1.093  1.574  1.949  2.423  3.798

 

▷ The lambda of the fifth factory is estimated to be the largest.
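Because draws of all five rates are available jointly, this comparison can be quantified directly; for example, a sketch of the posterior probability that factory 5 has the largest rate:

In:

lam_mat = as.matrix(mod_comb_sim)[, paste0('lam[', 1:5, ']')]
mean(apply(lam_mat, 1, which.max) == 5)   # posterior probability that factory 5 has the largest lambda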

 

Let us now examine the posterior predictive distribution; here, the distribution of lambda for a hypothetical new factory.

 

In:

n_sim = nrow(mod_comb_sim)

# lambda for a new factory, drawn from its gamma distribution using the posterior draws of mu and sig
lam_pred = rgamma(n = n_sim, 
                  shape = mod_comb_sim[, 'mu']^2/mod_comb_sim[, 'sig']^2, 
                  rate = mod_comb_sim[, 'mu']/mod_comb_sim[, 'sig']^2)
hist(lam_pred)

 

Out:

(histogram of lam_pred, the predictive distribution of lambda for a new factory)

๊ฐ ๊ณต์žฅ๋ณ„ ์‚ฌํ›„์˜ˆ์ธก๋ถ„ํฌ๋ฅผ ํ™•์ธํ•˜์—ฌ ๋ณด์ž.

 

In:

# posterior predictive chip counts for a cookie from each factory
y_pred_1 = rpois(n_sim, lambda = mod_comb_sim[, 'lam[1]'])

plot(density(y_pred_1, bw = 1), col = 'red', ylim = c(0, 0.15), main = NA)

y_pred_2 = rpois(n_sim, lambda = mod_comb_sim[, 'lam[2]'])

lines(density(y_pred_2, bw = 1), col = 'blue')

y_pred_3 = rpois(n_sim, lambda = mod_comb_sim[, 'lam[3]'])

lines(density(y_pred_3, bw = 1), col = 'green')

y_pred_4 = rpois(n_sim, lambda = mod_comb_sim[, 'lam[4]'])

lines(density(y_pred_4, bw = 1), col = 'yellow')

y_pred_5 = rpois(n_sim, lambda = mod_comb_sim[, 'lam[5]'])

lines(density(y_pred_5, bw = 1), col = 'black')

 

Out:

(overlaid density plots of the posterior predictive chip counts for the five factories)

▷ The posterior predictive distributions clearly differ across factories, and the fifth factory has the largest mean.

 

▷ The posterior predictive samples generated for the plot above can be used to evaluate a variety of hypotheses. For example, the probability that a cookie produced at the first factory contains more chocolate chips than one produced at the second factory, or the probability that a cookie from the first factory contains fewer than 7 chips, as sketched below.
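A minimal sketch of these two probability estimates, using the posterior predictive samples generated above:

In:

mean(y_pred_1 > y_pred_2)   # P(a factory-1 cookie has more chips than a factory-2 cookie)
mean(y_pred_1 < 7)          # P(a factory-1 cookie has fewer than 7 chips)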

 

▶ The strength of a hierarchical model is that many different hypotheses can be analyzed from a single model. For this reason hierarchical models are also used in meta-analysis, where data obtained from different sources or at different times are analyzed together.

 


Reference:

"Bayesian Statistics: From Concept to Data AnalysisTechniques and Models," Coursera, https://www.coursera.org/learn/bayesian-statistics/.