

Credible Interval

This post looks at the credible interval. It covers the following topics.

 

1. Definition of a credible interval

2. An example of a credible interval

 

1. Definition of a credible interval

 

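A credible interval is defined as follows: given the posterior distribution of a parameter θ after observing data y, a 100(1 − α)% credible interval is an interval C that satisfies P(θ ∈ C | y) = 1 − α.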

 

 

▷ From the frequentist viewpoint the parameter is a fixed quantity, so the interpretation of a confidence interval does not match our intuition: the stated confidence level describes how the interval-building procedure behaves over repeated sampling, not the probability that the parameter lies in one particular interval. A credible interval instead rests on a posterior distribution for the parameter, so its interpretation agrees with intuition. That is, we can make a direct statement about the probability that the parameter lies in the credible interval.

 

2. An example of a credible interval

 

Problem)

 

Suppose the probability θ that a coin lands heads has a uniform prior distribution, and the likelihood follows a Bernoulli distribution. The coin is tossed once and lands heads. Use this result to find a credible interval from the posterior distribution.

 

Solution)
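The posterior for this problem follows from Bayes' theorem. What follows is a sketch of that derivation, assuming the prior is Uniform(0, 1) and writing y = 1 for the single observed head; the symbols θ and y are chosen to match the axis labels in the plotting code.

```latex
\begin{aligned}
f(\theta) &= 1, \qquad 0 \le \theta \le 1 \\
f(y \mid \theta) &= \theta^{y} (1 - \theta)^{1 - y}, \qquad y \in \{0, 1\} \\
f(\theta \mid y = 1)
  &= \frac{f(y = 1 \mid \theta)\, f(\theta)}{\int_0^1 f(y = 1 \mid t)\, f(t)\, dt}
   = \frac{\theta}{\int_0^1 t \, dt}
   = 2\theta, \qquad 0 \le \theta \le 1
\end{aligned}
```

This is the Beta(2, 1) density, whose posterior CDF is F(θ | y) = θ².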

 

 

▷ Once the posterior distribution f(θ | y) = 2θ has been obtained, the credible interval can be computed in two ways. The first is the ETI (Equal Tailed Interval). For a 95% credible interval, for example, the interval runs from the 2.5% quantile to the 97.5% quantile of the posterior, leaving 2.5% of the posterior probability in each tail. The second is the HPDI (Highest Posterior Density Interval). The HPDI is the shortest credible interval at the given level, found by including the regions of highest posterior density first.
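Both intervals can be computed numerically. The following is a minimal sketch, not the original author's code: it assumes the posterior is Beta(2, 1) (equivalent to f(θ | y) = 2θ) and finds the HPDI by searching for the shortest interval that holds 95% of the posterior probability, which is valid for unimodal or monotone posteriors. The names `level`, `interval_width`, and `p_low` are introduced here for illustration.

```r
# Posterior from the example: Beta(2, 1), i.e. f(theta | y) = 2 * theta.
shape1 <- 2
shape2 <- 1
level  <- 0.95
alpha  <- 1 - level

# ETI: cut alpha / 2 of the posterior probability from each tail.
eti <- qbeta(c(alpha / 2, 1 - alpha / 2), shape1, shape2)

# HPDI: every interval [q(p), q(p + level)] with 0 <= p <= alpha holds
# `level` posterior probability; pick the p that makes it shortest.
interval_width <- function(p) {
  qbeta(pmin(p + level, 1), shape1, shape2) - qbeta(p, shape1, shape2)
}
p_low <- optimize(interval_width, interval = c(0, alpha), tol = 1e-10)$minimum
hpdi  <- qbeta(c(p_low, pmin(p_low + level, 1)), shape1, shape2)

print(round(eti, 3))
print(round(hpdi, 3))
```

Because the density 2θ increases all the way to θ = 1, the search pushes the lower tail probability to its upper limit, so the HPDI ends at (approximately) 1 and all of the excluded 5% sits in the lower tail.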

 

▷ The plots produced by the R code below visualize the two credible intervals. The 95% ETI is (0.158, 0.987) and the 95% HPDI is (0.224, 1). From this result we can say that, given the observed data, the probability that the parameter lies in each credible interval is 95%.
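The 95% claim can be checked directly from the posterior CDF. This is a small sketch assuming the Beta(2, 1) posterior and the rounded endpoints quoted above; `coverage` is a helper name introduced here for illustration.

```r
# Rounded interval endpoints as reported in the text.
eti  <- c(0.158, 0.987)
hpdi <- c(0.224, 1)

# Posterior probability of an interval under Beta(2, 1), whose CDF is theta^2.
coverage <- function(interval) {
  pbeta(interval[2], 2, 1) - pbeta(interval[1], 2, 1)
}

print(coverage(eti))   # 0.987^2 - 0.158^2
print(coverage(hpdi))  # 1 - 0.224^2

# Interval widths: the HPDI should be the shorter of the two.
print(diff(eti))
print(diff(hpdi))
```

Both probabilities come out close to 0.95, with the small gap due to rounding the endpoints to three decimals, and the HPDI is narrower than the ETI, as expected from its definition as the shortest interval.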

 

<R code for visualizing the credible intervals>

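# Grid over the parameter space and the posterior density f(theta | y) = 2 * theta
x <- seq(from = 0, to = 1, length.out = 100)
y <- 2*x

# Plot 1: posterior density with the 95% equal-tailed interval shaded
plot(x, y, main = 'Equal Tailed Interval: (0.158, 0.987)',
     xlab = expression(theta), ylab = expression('f(' ~ theta ~'|y)'),
     type = 'l')

abline(h = 0)

# Grid points that fall inside the interval
region_x <- x[0.158 <= x & x <= 0.987]
region_y <- y[0.158 <= x & x <= 0.987]

# Close the polygon by dropping both ends down to the horizontal axis
region_x <- c(region_x[1], region_x, region_x[length(region_x)])
region_y <- c(0, region_y, 0)

polygon(region_x, region_y, density = 10, col = 'blue')

# Plot 2: posterior density with the 95% HPD interval shaded
plot(x, y, main = 'HPD Interval: (0.224, 1)',
     xlab = expression(theta), ylab = expression('f(' ~ theta ~'|y)'),
     type = 'l')

abline(h = 0)

# The HPD interval extends to the upper bound theta = 1
region_x <- x[0.224 <= x]
region_y <- y[0.224 <= x]

# Close the polygon by dropping both ends down to the horizontal axis
region_x <- c(region_x[1], region_x, region_x[length(region_x)])
region_y <- c(0, region_y, 0)

polygon(region_x, region_y, density = 10, col = 'red')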

 

