
All posts (80)

Anime Recommendation Using Collaborative Filtering
The goal is to use users' anime rating data to decide which of the anime a user has not yet watched should be recommended. The anime data was obtained here (www.kaggle.com/CooperUnion/anime-recommendations-database). We implement three representative collaborative filtering approaches directly in R and apply them to make recommendations. The implementation follows the papers listed in the References below.
1. Data preprocessing
2. Collaborative filtering
2-1. User-based collaborative filtering
2-2. Item-based collaborative filtering
2-3. Matrix factorization collaborative filtering
3. Performance comparison
4. Recommendation results
1. Data preprocessing
In:
library(dplyr)
library(tidyr)
..

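The post implements these methods in R, and that code is not part of this excerpt. As a rough illustration of the first method only, here is a minimal Python sketch of user-based collaborative filtering under our own assumptions: a tiny hypothetical rating matrix (0 means "not rated"), cosine similarity computed over co-rated items, and a mean-centered weighted average over the k most similar users who rated the target item. The names `user_similarity`, `predict_user_based`, and `recommend` are hypothetical, and the referenced papers' exact formulation may differ.

```python
import math

# Hypothetical toy data (not from the Kaggle dataset):
# rows are users, columns are anime; 0 means "not rated".
RATINGS = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]


def mean_rating(row):
    """Average of the ratings a user actually gave (zeros are ignored)."""
    rated = [r for r in row if r > 0]
    return sum(rated) / len(rated) if rated else 0.0


def user_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = [(x, y) for x, y in zip(a, b) if x > 0 and y > 0]
    if not common:
        return 0.0
    dot = sum(x * y for x, y in common)
    norm_a = math.sqrt(sum(x * x for x, _ in common))
    norm_b = math.sqrt(sum(y * y for _, y in common))
    return dot / (norm_a * norm_b)


def predict_user_based(ratings, user, item, k=2):
    """Predict ratings[user][item] from the k most similar users who rated the item."""
    target = ratings[user]
    neighbors = [
        (user_similarity(target, other), other)
        for idx, other in enumerate(ratings)
        if idx != user and other[item] > 0
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    # Mean-centered weighted average: each neighbor contributes its deviation
    # from its own mean rating, weighted by its similarity to the target user.
    num = sum(sim * (other[item] - mean_rating(other)) for sim, other in top)
    den = sum(abs(sim) for sim, _ in top)
    base = mean_rating(target)
    return base if den == 0 else base + num / den


def recommend(ratings, user, n=2, k=2):
    """Rank the items the user has not rated by predicted rating (best first)."""
    unseen = [j for j, r in enumerate(ratings[user]) if r == 0]
    scored = [(predict_user_based(ratings, user, j, k), j) for j in unseen]
    scored.sort(reverse=True)
    return [j for _, j in scored[:n]]


if __name__ == "__main__":
    # Predicted ratings for the items user 1 has not rated yet.
    for j, r in enumerate(RATINGS[1]):
        if r == 0:
            print(j, round(predict_user_based(RATINGS, 1, j), 3))
    print(recommend(RATINGS, 1))
```

With this toy matrix, user 1 has not rated items 1 and 2, and `recommend(RATINGS, 1)` ranks them as `[2, 1]`. Mean-centering is used so that users who rate everything high or low do not dominate the prediction; item-based filtering would apply the same idea to columns instead of rows.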
[Level 2] Joystick
Coding Test Practice - Joystick: Complete an alphabetic name using a joystick. Initially, the name consists only of A's; e.g., if the name to complete has three letters it starts as AAA, and if it has four letters, AAAA. Moving the joystick in each direction does the following. ▲ - .. (programmers.co.kr)
def cnt_up_down(chr):
    # Presses needed going up from 'A' vs. going down through 'Z'
    cnt_from_a = abs(ord('A')-ord(chr))
    cnt_from_z = abs(ord('Z')-ord(chr)+1)
    return(min(cnt_from_a, cnt_from_z))

def solution(name):
    cnt = 0
    # Total ▲/▼ presses for every letter of the name
    for i in name:
        cnt += cnt_up_down(i)
    lst_name = list(name)
    len_name = len(lst_name)
    if ls..

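The excerpt stops before the left/right (cursor) part of the solution. For illustration only, here is one common way to solve the whole problem in Python; it is not the author's truncated code, and the function names are our own. The vertical count mirrors `cnt_up_down`; the horizontal part assumes the usual approach of, for each position `i`, skipping the run of 'A's that follows it and comparing "go right to i, then turn back" with "go left first, then turn back".

```python
def letter_moves(ch):
    """Minimum ▲/▼ presses to turn 'A' into ch (up from 'A' or down through 'Z')."""
    up = ord(ch) - ord('A')
    down = ord('Z') - ord(ch) + 1
    return min(up, down)


def solution(name):
    n = len(name)
    vertical = sum(letter_moves(ch) for ch in name)

    # Baseline: walk straight right to the last position.
    horizontal = n - 1
    for i in range(n):
        # nxt = first index after i that is not part of a run of 'A's.
        nxt = i + 1
        while nxt < n and name[nxt] == 'A':
            nxt += 1
        # Right to i, back to 0, then wrap left to nxt.
        right_first = 2 * i + (n - nxt)
        # Wrap left to nxt, back to 0, then right to i.
        left_first = i + 2 * (n - nxt)
        horizontal = min(horizontal, right_first, left_first)

    return vertical + horizontal


if __name__ == "__main__":
    print(solution("JEROEN"))
    print(solution("JAN"))
```

For example, `solution("JAN")` returns 23 and `solution("JEROEN")` returns 56. Checking every turning point `i` keeps the search O(n²) in the worst case, which is fine for the short names this problem uses.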
[Level 2] Making the Largest Number
Coding Test Practice - Making the Largest Number (programmers.co.kr)
def solution(num, k):
    max_num = []
    for i, n in enumerate(num):
        # Drop smaller preceding digits while removals remain
        while len(max_num) > 0 and max_num[-1] < n and k > 0:
            max_num.pop()
            k -= 1
        # No removals left: keep the rest of the string as-is
        if k == 0:
            max_num += list(num[i:])
            break
        max_num.append(n)
    # Digits were non-increasing: remove the remaining k from the end
    max_num = max_num[:-k] if k > 0 else max_num
    return(''.join(max_num))
▷ This problem is solved with a greedy algorithm: starting from the second digit of the given string, each digit is compared with the digits before it to build up the largest possible number.
▷ In the code above..

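To make the greedy step visible, the following sketch re-implements the same stack-based idea and also records which digits are kept after each digit is read. The function name `largest_number_with_trace` and the `trace` list are our own additions for illustration.

```python
def largest_number_with_trace(num, k):
    """Greedy stack: drop a smaller earlier digit whenever a larger one arrives."""
    stack = []
    trace = []  # (digit read, digits kept after processing it)
    for i, d in enumerate(num):
        # Pop while the last kept digit is smaller than d and removals remain.
        while stack and stack[-1] < d and k > 0:
            stack.pop()
            k -= 1
        if k == 0:
            # No removals left: keep everything from position i onward.
            stack.extend(num[i:])
            trace.append((d, ''.join(stack)))
            break
        stack.append(d)
        trace.append((d, ''.join(stack)))
    # Removals left over (digits were non-increasing): drop them from the end.
    # The guard matters because stack[:-0] would be an empty list.
    if k > 0:
        stack = stack[:-k]
    return ''.join(stack), trace


if __name__ == "__main__":
    result, steps = largest_number_with_trace("4177252841", 4)
    for digit, kept in steps:
        print(digit, kept)
    print(result)
```

For "4177252841" with k = 4, the kept digits collapse from "41" to "7" when the first 7 arrives (both 4 and 1 are dropped), and the final result is "775841". Comparing single-character strings with `<` works here because the digits '0' to '9' are ordered the same way as their numeric values.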
How to Use Basic Functions
Let's look at how to use R's built-in basic functions.
□ if ( ) { } else if ( ) { } else { }
In:
char = 'A'
if (char == 'B') {
  print('if')
} else if (char == 'C') {
  print('else if')
} else {
  print('else')
}
Out:
[1] "else"
□ for ( ) { }
In:
for (i in 1:5) {
  print(i)
}
Out:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
□ ifelse
In:
char = 'A'
ifelse(char == 'A', 'char is A', 'char is not A')
Out:
[1] "char is A"
▷ The first argument of the ifelse function is the condition ..

King County House Price Analysis Using a Bayesian Network
The goal of this project is to use a Bayesian network to identify and analyze the causal relationships among the various factors that affect house prices in King County. The data comes from here (www.kaggle.com/harlfoxem/housesalesprediction), and the analysis and modeling proceed in the order below. All code is written in R.
1. Data preprocessing
2. Visualization and correlation analysis
3. Multiple regression analysis
4. Bayesian network modeling
1. Data preprocessing
In:
# Statistics
library(car)
# Data manipulation
library(dplyr)
library(tidyr)
# Visualization
library(ggplot2)
librar..

How to Use caret
Let's look at how to apply various machine learning techniques with the caret package.
In:
df_iris = iris
str(df_iris)
Out:
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels ..

How to Use dplyr
Let's learn and apply dplyr's main functions: select, filter, mutate, summarise, group_by, sample_n, and sample_frac.
In:
library(dplyr)
df_iris = iris
summary(df_iris)
Out:
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3...

How to Handle Missing Values
Let's look at how to handle missing values.
□ Checking for missing values
In:
library(MASS)
library(dplyr)  # provides the %>% pipe
df_car = Cars93
df_car %>% sapply(function(x) sum(is.na(x)))
Out:
      Manufacturer              Model               Type
                 0                  0                  0
         Min.Price              Price          Max.Price
                 0                  0                  0
          MPG.city        MPG.highway            AirBags
                 0                  0                  0
        DriveTrain          Cylinders         EngineSize
                 0                  0                  0
        Horsepower                RPM       Rev.per.mile
                 0                  0                  0
   Man.trans.avail Fuel.tank.capacity         Passengers
                 0                  0                  0
            Length          Wheelbase              Width
                 0                  0                  0
       Turn.circle     Rear.seat.room Luggag..

Reading/Writing CSV Files
Let's create a CSV file, write it to a specific path, and then read it back.
In:
library(dplyr)  # provides the %>% pipe
df_car = mtcars
df_car %>% head()
Out:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Va..

How to Handle String Data
We will look at how to handle string data using R's built-in functions and the stringr package.
□ Concatenating strings
In:
library(stringr)  # provides str_c
paste('Rooney', 'Song', sep = '_')
paste0('Rooney', 'Song')
str_c('Rooney', 'Song', sep = '_')
Out:
[1] "Rooney_Song"
[1] "RooneySong"
[1] "Rooney_Song"
▷ The paste and paste0 functions join two strings together. The difference is that paste0 joins them directly without inserting anything in between, whereas paste separates them with a single space by default; its sep argument specifies the character inserted between the two strings.
▷ The str_c function is st..

How to Handle Time-Series Data
We will look at how to handle time-series data using R's built-in functions and the lubridate package.
□ Handling time-series data with base functions
In:
date_form = c('%Y%m%d', '%Y.%m.%d', '%Y~%m~%d', '%Y-%m-%d')
date_1 = as.Date('20201019', tryFormats = date_form)
date_2 = as.Date('2020.10.19', tryFormats = date_form)
date_3 = as.Date('2020~10~19', tryFormats = date_form)
print(date_1)
print(date_2)
print(date_3)
print(class(date_1))
print(class(date_2)..

How to Use tidyr
Let's learn and apply the tidyr package's main functions: gather, spread, separate, and unite.
In:
library(dplyr)
library(tidyr)
df_iris = iris
df_iris$id = 1:nrow(df_iris)
df_iris = df_iris[, c(6, 1:5)]
str(df_iris)
Out:
'data.frame': 150 obs. of 6 variables:
 $ id : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ ..