๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

Statistics/Bayesian Statistics

๋ฒ ์ด์ง€์•ˆ ์„ ํ˜• ํšŒ๊ท€(Bayesian linear regression)

R์˜ car ํŒจํ‚ค์ง€์˜ Leihardt ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ฒ ์ด์ง€์•ˆ ์„ ํ˜• ํšŒ๊ท€(Bayesian linear regression) ๋ถ„์„์„ ์ˆ˜ํ–‰ํ•  ๊ฒƒ์ด๋‹ค. ์ˆ˜ํ–‰ ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

1. ๋ฐ์ดํ„ฐ ํ™•์ธ

2. ๋ชจ๋ธ๋ง

3. ๋ชจ๋ธ ํ™•์ธ

4. ์ž”์ฐจ ๋ถ„์„

 

1. ๋ฐ์ดํ„ฐ ํ™•์ธ

 

In:

library(car)

data('Leinhardt')

pairs(Leinhardt)

 

Out:

[Figure: scatterplot matrix (pairs plot) of the Leinhardt data]

โ–ท Leihardt ๋ฐ์ดํ„ฐ๋Š” ์—ฐ์†ํ˜• ๋ณ€์ˆ˜์ธ income, infant, region๊ณผ ๋ฒ”์ฃผํ˜• ๋ณ€์ˆ˜์ธ oil๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค.

 

โ–ท income๊ณผ infant๊ฐ€ ๋น„์„ ํ˜•์ ์ธ ๊ด€๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์„ ํ˜•๋ชจ๋ธ์€ ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ณ€์ˆ˜๊ฐ„ ์„ ํ˜•๊ด€๊ณ„๋ฅผ ๊ฐ€์ •ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‘ ๋ณ€์ˆ˜์— ๋กœ๊ทธ๋ฅผ ์ทจํ•˜์—ฌ ๋ณ€ํ˜•ํ•  ๊ฒƒ์ด๋‹ค.

 

In:

Leinhardt$log_income = log(Leinhardt$income)
Leinhardt$log_infant = log(Leinhardt$infant)

plot(log_infant ~ log_income, data = Leinhardt)

 

Out:

[Figure: scatterplot of log_infant against log_income]

โ–ท ์œ„์˜ ๊ทธ๋ฆผ์€ income๊ณผ infant์— ๋กœ๊ทธ๋ฅผ ์ทจํ•œ ํ›„, ์‹œ๊ฐํ™”ํ•œ ๋ชจ์Šต์ด๋‹ค. ์„ ํ˜•๊ด€๊ณ„๊ฐ€ ๋‚˜ํƒ€๋‚˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

2. ๋ชจ๋ธ๋ง

 

infant๋ฅผ ์ข…์†๋ณ€์ˆ˜๋กœ, income์„ ๋…๋ฆฝ๋ณ€์ˆ˜๋กœ ๋‘์–ด ๋ฒ ์ด์ง€์•ˆ ์„ ํ˜•ํšŒ๊ท€ ๋ถ„์„์„ ํ•  ๊ฒƒ์ด๋‹ค. ๋ชจ๋ธ์— ๋Œ€ํ•œ ๊ทธ๋ž˜ํ”„  ํ‘œํ˜„(Graphical representation)์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

 

โ–ท beta0, ..., betak์™€ sigma^2๋Š” ํŠน์ • ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๊ณ  ์žˆ๊ณ , ๊ฐœ๋ณ„ ๊ด€์ธก์น˜ yi๋Š” ๋…๋ฆฝ์ด๋‹ค.

 

โ–ท beta์˜ ๋ถ„ํฌ๋ฅผ ๋ผํ”Œ๋ผ์Šค ๋ถ„ํฌ(Laplace distribution)๋กœ ๋‘๋ฉด LASSO(Least Absolute Shrinkage and Selection Operator) ํšŒ๊ท€๋ฅผ ๋ชจ๋ธ๋งํ•  ์ˆ˜ ์žˆ๋‹ค. LASSO๋Š” ๋ชจ์ˆ˜์˜ ์‚ฌ์ „๋ถ„ํฌ๋กœ ๋ผํ”Œ๋ผ์Šค ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ์ˆ˜๊ฐ€ 0์ธ ์ง€์ ์„ ์„ ํ˜ธ(Favor)ํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ๋ชจ๋ธ๋ง ๊ณผ์ •์—์„œ ๋ณ€์ˆ˜ ์„ ํƒ์„ ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ํŠน์ง•์ด ์žˆ๋‹ค.

 

beta์˜ ์‚ฌ์ „๋ถ„ํฌ๋ฅผ ์ •๊ทœ๋ถ„ํฌ๋กœ, sigma^2๋ฅผ ์—ญ๊ฐ๋งˆ๋ถ„ํฌ๋กœ ๋‘๊ณ , R์˜ JAGS๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ชจ๋ธ๋ง์„ ํ•  ๊ฒƒ์ด๋‹ค.

 

In:

library(rjags)

data = na.omit(Leinhardt)

mod_string = " model {
  for (i in 1:n) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = b[1] + b[2]*log_income[i]
  }
  
  for (j in 1:2) {
    b[j] ~ dnorm(0.0, 1.0/1.0e6)
  }
  
  prec ~ dgamma(5.0/2.0, 5.0*10.0/2.0)
  sig2 = 1.0/prec
  sig = sqrt(sig2)
} "

data_jags = list(y = data$log_infant, 
                 n = nrow(data), 
                 log_income = data$log_income)

params = c('b', 'sig')

inits = function() {
  list('b' = rnorm(2, 0.0, 100.0), 
       'prec' = rgamma(1, 1.0, 1.0))
}

mod = jags.model(textConnection(mod_string), 
                 data = data_jags, 
                 inits = inits, 
                 n.chains = 3)

update(mod, 1e3)

mod_sim = coda.samples(model = mod, 
                       variable.names = params, 
                       n.iter = 1e4)

mod_comb_sim = do.call(rbind, mod_sim)

 

Out:

  |**************************************************| 100%
  |**************************************************| 100%  

 

โ–ท ์œ„์˜ ์ฝ”๋“œ์— ๋Œ€ํ•œ ๋‚ด์šฉ์€ ์ƒ๋žตํ•˜๋„๋ก ํ•˜๊ฒ ๋‹ค. JAGS์˜ ์‚ฌ์šฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ณ ์ž ํ•œ๋‹ค๋ฉด, ๋‹ค์Œ์˜ ํฌ์ŠคํŒ…์„ ํ†ตํ•ด ์ฐธ๊ณ ํ•˜๊ธธ ๋ฐ”๋ž€๋‹ค.

 

 

How to Use JAGS (Just Another Gibbs Sampler)

rooney-song.tistory.com

 

3. ๋ชจ๋ธ ํ™•์ธ

 

๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ์˜ ์ˆ˜๋ ด ๊ณผ์ •์„ trace plot์„ ํ†ตํ•ด ํ™•์ธํ•˜์—ฌ ๋ณด์ž.

 

In:

plot(mod_sim)

 

Out:

[Figure: trace plots and posterior density estimates for b[1], b[2], and sig]

โ–ท b[1], b[2], sig์˜ ๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ์ด ์ˆ˜๋ ดํ•˜๊ณ  ์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

โ–ถ trace plot์˜ ์ƒ‰๊น”์ด ๋‹ค๋ฅธ ์„ ์€ ์„œ๋กœ ๋‹ค๋ฅธ ์ดˆ๊ธฐ๊ฐ’์„ ํ†ตํ•œ ์ˆ˜๋ ด ๊ณผ์ •์„ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•œ ๊ฒƒ์ด๋‹ค.

 

โ–ถ ์˜ค๋ฅธ์ชฝ์˜ ๊ทธ๋ž˜ํ”„๋Š” ๊ฐ ๋ชจ์ˆ˜์˜ ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์‚ฌํ›„๋ถ„ํฌ๋ฅผ ์ถ”์ •ํ•œ ๊ฒฐ๊ณผ์ด๋‹ค.

 

Let us check whether the Markov chains have converged using the Gelman-Rubin diagnostic.

 

In:

gelman.diag(mod_sim)

 

Out:

Potential scale reduction factors:

     Point est. Upper C.I.
b[1]          1       1.01
b[2]          1       1.01
sig           1       1.00

Multivariate psrf

1

 

โ–ท ๋ชจ๋“  ๊ฒฐ๊ณผ๊ฐ€ 1๋กœ ๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ์ด ์ˆ˜๋ ดํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

 

์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์˜ ์ž๊ธฐ์ƒ๊ด€์„ฑ์„ ํ™•์ธํ•˜์—ฌ ๋ณด์ž.

 

In:

autocorr.diag(mod_sim)

 

Out:

             b[1]       b[2]          sig
Lag 0  1.00000000 1.00000000  1.000000000
Lag 1  0.95097734 0.95103631  0.030863049
Lag 5  0.78000619 0.77872439 -0.003679779
Lag 10 0.61439905 0.61215120  0.007809461
Lag 50 0.07377375 0.07599907  0.003517834

 

โ–ท ์„ธ ๋ชจ์ˆ˜ ๋ชจ๋‘ ์‹œ์ฐจ๊ฐ€ 50์ธ ์ง€์ ์—์„œ๋Š” ์ž๊ธฐ์ƒ๊ด€์„ฑ์ด ๊ฑฐ์˜ ๋‚˜ํƒ€๋‚˜์ง€ ์•Š๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์˜ ESS(Effective Sample Size)๋ฅผ ์•Œ์•„๋ณด์ž.

 

In:

effectiveSize(mod_sim)

 

Out:

      b[1]       b[2]        sig 
  723.2895   722.9372 27469.2788 

 

โ–ท b[1]๊ณผ b[2]๋Š” ESS๊ฐ€ ์•ฝ 700์—ฌ๊ฐœ, sig๋Š” 27,000์—ฌ๊ฐœ์ธ ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ๋‹ค. ์ด ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์˜ ์ˆ˜๊ฐ€ ๊ฐ ๋ชจ์ˆ˜๋ณ„๋กœ 30,000๊ฐœ์ธ ๊ฒƒ์„ ๊ฐ์•ˆํ•˜๋ฉด b[1]๊ณผ b[2]์˜ ESS๋Š” ์ƒ๋Œ€์ ์œผ๋กœ sig์— ๋น„ํ•ด ์ž‘๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

 

โ–ท b[1]๊ณผ b[2]์˜ ์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ฐ ๋ชจ์ˆ˜์˜ ํ‰๊ท ์„ ๊ตฌํ•œ๋‹ค๋ฉด ๊ดœ์ฐฎ์ง€๋งŒ, 95% ์‹ ์šฉ๊ตฌ๊ฐ„์„ ๊ตฌํ•˜๊ธฐ์—๋Š” ESS๊ฐ€ ๋ถ€์กฑํ•œ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ด์— ๋Œ€ํ•œ ๊ธฐ์ค€์€ raftery.diag()๋ฅผ ์ด์šฉํ•˜์—ฌ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

์ƒ์„ฑ๋œ ๋ฐ์ดํ„ฐ์˜ ์š”์•ฝ๋œ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•˜์—ฌ ๋ณด์ž.

 

In:

summary(mod_sim)

 

Out:

Iterations = 1001:11000
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean      SD  Naive SE Time-series SE
b[1]  7.1775 0.44692 0.0025803      0.0166242
b[2] -0.5170 0.07236 0.0004178      0.0026928
sig   0.9709 0.06848 0.0003953      0.0004135

2. Quantiles for each variable:

        2.5%     25%     50%     75%   97.5%
b[1]  6.3027  6.8773  7.1805  7.4734  8.0663
b[2] -0.6600 -0.5649 -0.5177 -0.4683 -0.3749
sig   0.8485  0.9227  0.9669  1.0142  1.1145

 

4. ์ž”์ฐจ(Residual) ๋ถ„์„

 

์ž”์ฐจ ๋ถ„์„์„ ํ†ตํ•ด ๋ชจ๋ธ์˜ ๊ฐ€์ •์„ ๋งŒ์กฑํ•˜๋Š”์ง€ ํ™•์ธํ•˜์—ฌ ๋ณด์ž.

 

In:

X = cbind(rep(1.0, data_jags$n), data_jags$log_income)
post_params = colMeans(mod_comb_sim)

yhat = drop(X %*% post_params[1:2])
resid = data_jags$y - yhat

plot(resid)
plot(yhat, resid)
qqnorm(resid)

 

Out:

[Figure: residuals by index, residuals vs. fitted values, and normal Q-Q plot]

โ–ท ์ฒซ ๋ฒˆ์งธ ๊ทธ๋ฆผ์€ ์ž”์ฐจ์˜ ๋ถ„ํฌ๋ฅผ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ด๋‹ค. ์ถ”์„ธ(Trend)๋‚˜ ํŒจํ„ด์„ ํ™•์ธํ•  ์ˆ˜ ์—†๋‹ค. ๋‹ค๋งŒ, ์ž”์ฐจ์˜ ํ‘œ์ค€ํŽธ์ฐจ์˜ 2๋ฐฐ ์ด์ƒ ํฐ ๋‘ ๊ฐœ์˜ ์ด์ƒ์น˜(Outlier)๊ฐ€ ์กด์žฌํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

 

โ–ท ๋‘ ๋ฒˆ์งธ ๊ทธ๋ฆผ์€ ์˜ˆ์ธกํ•œ ๊ฒฐ๊ณผ์™€ ์ž”์ฐจ์˜ ๊ด€๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ด๋‹ค. ์ถ”์„ธ๋‚˜ ํŒจํ„ด์„ ํ™•์ธํ•  ์ˆ˜ ์—†๊ณ , ์ž”์ฐจ์˜ ๋Œ€๋žต์ ์ธ ํ‰๊ท ์ด 0์ž„์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ์˜ˆ์ธก ๊ฒฐ๊ณผ๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ ๋ถ„์‚ฐ์ด ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒƒ๊ณผ ์•ž์˜ ๊ฒฐ๊ณผ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋‘ ๊ฐœ์˜ ์ด์ƒ์น˜๊ฐ€ ์กด์žฌํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์™„๋ฒฝํžˆ ์„ ํ˜•ํšŒ๊ท€์˜ ๊ฐ€์ •์„ ๋งŒ์กฑํ•œ๋‹ค ๋ณด๊ธฐ ์–ด๋ ต๋‹ค.

 

โ–ท ์„ธ ๋ฒˆ์งธ ๊ทธ๋ฆผ์€ ์ž”์ฐจ๊ฐ€ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•œ ์ •๊ทœํ™•๋ฅ  ๊ทธ๋ฆผ(Q-Q plot)์ด๋‹ค. ๋Œ€์ฒด์ ์œผ๋กœ ์ง์„ ์˜ ํ˜•ํƒœ๋ฅผ ๋„์ง€๋งŒ, ๋งˆ์ง€๋ง‰ ๋‘ ์ ์€ ์ง์„ ์—์„œ ๋ฒ—์–ด๋‚˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” ์•ž์˜ ๊ทธ๋ฆผ์—์„œ ๋ณด์•˜๋‹ค์‹œํ”ผ ๋‘ ๊ฐœ์˜ ์ด์ƒ์ ์— ๋”ฐ๋ฅธ ๊ฒฐ๊ณผ๋ผ๋Š” ๊ฒƒ์„ ์ถ”๋ก ํ•  ์ˆ˜ ์žˆ๋‹ค.

 


Reference:

"Bayesian Statistics: From Concept to Data Analysis" and "Bayesian Statistics: Techniques and Models," Coursera, https://www.coursera.org/learn/bayesian-statistics/.