λ¨μ ν¬μμ‘ νκ·(Simple poisson regression) λͺ¨λΈμ λ€μκ³Ό κ°λ€.
β· κ°λ₯λλ ν¬μμ‘ λΆν¬λ‘ μ νκ³ , κ°λ₯λ λͺ¨μμ λ‘κ·Έλ₯Ό μ·¨ν κ²μ λνμ¬ λ 립λ³μμ μ νκ²°ν©μΌλ‘ μ μνλ€. μ΄λ, μμ μμμλ νλμ λ 립λ³μμ λν μ νκ²°ν©μΌλ‘ νννμμ§λ§, λ 립λ³μκ° μ¬λ¬κ°μ΄λ©΄ μ΄ λ 립λ³μλ€μ μ νκ²°ν©μ ν΅ν΄ κ°λ₯λ λͺ¨μμ λ‘κ·Έλ₯Ό μ·¨ν κ°μ λνμ¬ μ μνλ€.
β· ν¬μμ‘ νκ· λͺ¨λΈμ μμΈ‘μ κ°λ³ λ°μ΄ν°μ κ°λ₯λμ νκ· , μ¦, κ°λ₯λμ λͺ¨μλ₯Ό ν΅ν΄ μ΄λ£¨μ΄μ§λ€.
β· λ² μ΄μ§μ ν¬μμ‘ νκ·(Bayesian poisson regression)λ λ 립λ³μμ μ νκ²°ν©λ λͺ¨μ betaμ μ¬μ λΆν¬λ₯Ό μ μνλ€λ μ μμ ν¬μμ‘ νκ·μ μ°¨μ΄κ° μλ€.
Rμ COUNT ν¨ν€μ§μ badhealth λ°μ΄ν°λ₯Ό μ΄μ©νμ¬ λ² μ΄μ§μ ν¬μμ‘ νκ· λΆμμ μνν κ²μ΄λ€. μν κ³Όμ μ λ€μκ³Ό κ°λ€.
1. λ°μ΄ν° νμΈ
2. λͺ¨λΈλ§
3. λͺ¨λΈ νμΈ
4. λͺ¨λΈ μμΈ‘ λ° λΆμ
1. λ°μ΄ν° νμΈ
In:
library(COUNT)
data('badhealth')
pairs(badhealth)
Out:
β· numvisitκ³Ό ageλ μ°μν λ³μ, badhλ λ²μ£Όν λ³μλ‘ λ°μ΄ν°κ° ꡬμ±λμ΄ μλ€.
2. λͺ¨λΈλ§
numvisitμ μ’ μλ³μλ‘, ageμ badhλ₯Ό λ 립λ³μλ‘ λμ΄ λ² μ΄μ§μ ν¬μμ‘ νκ· λΆμμ ν κ²μ΄λ€.
In:
library(rjags)
mod_string = " model {
for (i in 1:length(numvisit)) {
numvisit[i] ~ dpois(lam[i])
log(lam[i]) = b0 + b_badh*badh[i] + b_age*age[i] + b_badh_age*age[i]*badh[i]
}
b0 ~ dnorm(0.0, 1.0/1e6)
b_badh ~ dnorm(0.0, 1.0/1e4)
b_age ~ dnorm(0.0, 1.0/1e4)
b_badh_age ~ dnorm(0.0, 1.0/1e4)
} "
data_jags = as.list(badhealth)
params = c('b0', 'b_badh', 'b_age', 'b_badh_age')
mod = jags.model(textConnection(mod_string),
data = data_jags,
n.chains = 3)
update(mod, 1e3)
mod_sim = coda.samples(model = mod,
variable.names = params,
n.iter = 5e3)
mod_comb_sim = as.mcmc(do.call(rbind, mod_sim))
Out:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 1127
Unobserved stochastic nodes: 4
Total graph size: 3665
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|**************************************************| 100%
|**************************************************| 100%
β· λ 립λ³μλ‘μ¨ b_badhμ b_ageμ κ΅νΈμμ©μ κ³ λ €νμ¬ κ΅μ°¨νμ μΆκ°νμλ€.
β· μ΄μΈμ μ½λμ λν λ΄μ©μ μλ΅νλλ‘ νκ² λ€. JAGSμ μ¬μ©λ²μ λν΄ μμλ³΄κ³ μ νλ€λ©΄, λ€μμ ν¬μ€ν μ μ°Έκ³ νκΈΈ λ°λλ€.
3. λͺ¨λΈ νμΈ
Gelman-Rubin diagosticμ ν΅ν΄ λ§λ₯΄μ½ν 체μΈμ΄ μλ ΄νμλμ§ νμΈνμ¬ λ³΄μ.
In:
gelman.diag(mod_sim)
Out:
Potential scale reduction factors:
Point est. Upper C.I.
b0 1.02 1.03
b_age 1.01 1.03
b_badh 1.01 1.02
b_badh_age 1.01 1.02
Multivariate psrf
1.01
β· λͺ¨λ λͺ¨μμ κ²°κ³Όκ° 1λ‘ μλ ΄νκ³ μλ€λ κ²μ νμΈν μ μλ€.
4. λͺ¨λΈ μμΈ‘ λ° λΆμ
μμ λͺ¨λΈμ ν΅ν΄ μμΈ‘μ νκ³ , μ€μ κ²°κ³Όμ λΉκ΅ν΄λ³΄μ.
In:
X = as.matrix(badhealth[, -1])
X = cbind(X, with(badhealth, badh*age))
post_coef = apply(mod_comb_sim, 2, median)
log_lam_hat = post_coef['b0'] + X %*% post_coef[c('b_badh', 'b_age', 'b_badh_age')]
lam_hat = exp(log_lam_hat)
plot(lam_hat, badhealth$numvisit)
Out:
β· λͺ¨λΈμ μμΈ‘κ²°κ³Όκ° λ°μ΄ν°μ λ³λμ μ μ€λͺ νμ§ λͺ»νλ κ²μ νμΈν μ μλ€. νμ§λ§ lamda hatμ΄ μ¦κ°ν μλ‘ μ€μ κ²°κ³Όλ μ΄λ μ λ μ¦κ°νλ κ²μ νμΈν μ μλ€.
β· ν¬μμ‘ λΆν¬μ νΉμ±μΌλ‘ νκ· κ³Ό λΆμ°μ΄ λͺ¨μλ‘ ννλλ€. μ¦, μμΈ‘κ°μΈ νκ· μ΄ ν¬λ©΄ ν΄μλ‘ λΆμ°λ μ¦κ°νκ² λκ³ , μμ κ²°κ³Όμμλ μ΄λ¬ν νΉμ±μ νμΈν μ μλ€.
βΆ μμ κ²½μ°μ κ°μ΄ ν¬μμ‘ νκ·κ° λ°μ΄ν°μ λ³λμ μ μ€λͺ νμ§ λͺ»ν λ, λμμΌλ‘ μμ΄ν νκ·λ₯Ό μ¬μ©ν μ μλ€.
λͺ¨λΈμ κ²°κ³Όμ λνμ¬ λΆμνμ¬λ³΄μ.
In:
summary(mod_sim)
Out:
Iterations = 2001:7000
Thinning interval = 1
Number of chains = 3
Sample size per chain = 5000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
b0 0.340722 0.082551 6.740e-04 0.0054368
b_age 0.008652 0.002115 1.727e-05 0.0001386
b_badh 1.566322 0.177987 1.453e-03 0.0127482
b_badh_age -0.010900 0.004126 3.369e-05 0.0003004
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
b0 0.178886 0.283849 0.340609 0.39707 0.498929
b_age 0.004571 0.007236 0.008664 0.01009 0.012775
b_badh 1.209814 1.445276 1.567056 1.68999 1.904278
b_badh_age -0.018683 -0.013763 -0.010927 -0.00807 -0.002601
β· b_ageμ b_badhμ μΆμ μΉμ νκ· κ³Ό λ°±λΆμμ λͺ¨λ μμ κ°μΈ κ²μ νμΈν μ μλ€. μ΄λ numvisitμ νκ· κ³Ό μ΄λ€ λ³μκ°μ κ΄κ³κ° μμ κ΄κ³λ₯Ό λνλΈλ€κ³ λ³Ό μ μλ€. μ¦, b_ageμ΄λ b_badhκ° μ¦κ°νλ€λ©΄ νκ· μ μΌλ‘ numvisitλ μ¦κ°νλ€κ³ ν΄μν μ μλ€.
β· b_ageμ b_badhμ κ΅μ°¨νμ μμμΈ κ²μΌλ‘ λνλ¬λ€. μ΄λ b_ageμ b_badhμ λν 보μ κ°μΌλ‘ ν΄μν μ μλ€.
X1(age: 35, badh: 0)κ³Ό X2(age: 35, badh: 1)μ μ¬ν μμΈ‘ λΆν¬λ₯Ό ꡬνκ³ , λμ λΉκ΅νμ¬ λ³΄μ.
In:
x_1 = c(0, 35, 0)
x_2 = c(1, 35, 35)
log_lam_1 = mod_comb_sim[, 'b0'] + mod_comb_sim[, c('b_badh', 'b_age', 'b_badh_age')] %*% x_1
log_lam_2 = mod_comb_sim[, 'b0'] + mod_comb_sim[, c('b_badh', 'b_age', 'b_badh_age')] %*% x_2
lam_1 = exp(log_lam_1)
lam_2 = exp(log_lam_2)
n_sim = length(log_lam_1)
y_1 = rpois(n = n_sim, lambda = lam_1)
y_2 = rpois(n = n_sim, lambda = lam_2)
plot(table(factor(y_1, levels = 0:18))/n_sim, pch = 2, ylab = 'Posterior Prob.', xlab = 'Num. of Visits')
points(table(y_2 + 0.1)/n_sim, col = 'red')
Out:
β· κ²μ μμ λΆν¬λ X1, λΉ¨κ°μμ λΆν¬λ X2μ μ¬νμμΈ‘λΆν¬λ₯Ό μλ―Ένλ€. λ μ¬νμμΈ‘λΆν¬μμ 보λ€μνΌ X2μ μ¬νμμΈ‘νκ· μ΄ X1μ μ¬νμμΈ‘νκ· λ³΄λ€ ν° κ²μ μ μ μλ€.
X2μ numvisitμ΄ X1μ numvisitλ³΄λ€ ν΄ νλ₯ μ ꡬνμ¬ λ³΄μ.
In:
mean(y_2 > y_1)
Out:
[1] 0.9157333
β· μ¬νμμΈ‘λΆν¬λ‘λΆν° X2μ numvisitμ΄ X1μ numvisitλ³΄λ€ ν΄ νλ₯ μ μ½ 91%μΈ κ²μΌλ‘ λνλ¬λ€.
Reference:
"Bayesian Statistics: From Concept to Data AnalysisTechniques and Models," Coursera, https://www.coursera.org/learn/bayesian-statistics/.
'Statistics > Bayesian Statistics' μΉ΄ν κ³ λ¦¬μ λ€λ₯Έ κΈ
Bayesian linear model for New York air quality measurements (0) | 2020.09.01 |
---|---|
κ³μΈ΅μ λͺ¨λΈ(Hierarchical model) (0) | 2020.08.22 |
λ² μ΄μ§μ λ‘μ§μ€ν± νκ·(Bayesian logistic regression) (0) | 2020.08.19 |
DIC(Deviance Information Criterion) (0) | 2020.08.18 |
λ² μ΄μ§μ μ ν νκ·(Bayesian linear regression) (0) | 2020.08.17 |