

Crawling with Selenium and BeautifulSoup

Using BeautifulSoup and Selenium, I crawl the FIFA Online 4 data center for the number of users of each player, by position, from May to October, and then look at which players are preferred at each position. Because static crawling alone cannot retrieve all of the site's data, I combined BeautifulSoup, which handles static crawling, with Selenium, which handles dynamic crawling. The process is as follows.

 

1. Data Collection Process

2. Crawling

3. Visualization

 

1. Data Collection Process

 

To get the number of users of each player by position, open the FIFA Online data center (fifaonline4.nexon.com/datacenter/dailysquad) and go through the following steps.

 

(1) Select a date

 

 

▷ You can set the date by clicking the area outlined with the red dotted border.

 

 

▷ When you change the date, the part of the URL outlined in red changes, as shown in the figure above. We will collect data for each date by changing the date assigned to strDate.
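A minimal sketch of generating one URL per date, assuming the strDate parameter takes dates in YYYY.MM.DD form as in the URL above. The helper name daily_urls is hypothetical.

```python
import datetime

BASE_URL = 'http://fifaonline4.nexon.com/datacenter/dailysquad?strDate={}'

def daily_urls(start, end):
    """Yield (date_string, url) for every date from start up to, but not including, end."""
    day = datetime.timedelta(days=1)
    current = start
    while current < end:
        # strDate uses dots as separators, e.g. 2020.05.20
        date_str = current.strftime('%Y.%m.%d')
        yield date_str, BASE_URL.format(date_str)
        current += day

for date_str, url in daily_urls(datetime.date(2020, 5, 20), datetime.date(2020, 5, 23)):
    print(url)
```

The end date is exclusive, which matches the range((end_date - start_date).days) loop used in the crawler.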

 

(2) Select a position

 

 

▷ Selecting one of the tabs in the red-outlined area shows the player information and user counts for that position.

 

To summarize the data collection process: select a date, select a position, and then read the player data. Note that selecting a position tab does not change the URL; the page updates the displayed data by itself. Therefore, dynamic crawling is required to get the player data for each position.

 

2. Crawling

 

In:

import pandas as pd
import time
import datetime
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

 

▷ Libraries used: pandas to turn the crawl results into a DataFrame, time (for pauses) and datetime (for dates) to handle time-related data, bs4 for static crawling, and selenium for dynamic crawling.

 

In:

def get_player_pref(bs, date):
    """Parse the currently selected position tab into a DataFrame.

    bs   : BeautifulSoup object of the rendered page (driver.page_source)
    date : date string of the page, e.g. '2020.05.20'
    """
    container = bs.select_one('#divBestUsePositionPlayer')
    player_pref = container.select('div.item_list.swiper-slide')

    rows = []
    for order, p in enumerate(player_pref, start=1):
        # The season is encoded in the icon file name: .../season/<season>.png
        img_src = p.select_one('span.icon img')['src']
        season = img_src.rsplit('/', 1)[-1].replace('.png', '')

        rows.append({
            'name': p.find('span', class_='name').text,
            'rank': order,
            'overall': p.find('span', class_='ovr').text,
            'position': p.find('span', class_='position').text.strip(),
            'price': p.select_one('span.price.span_bp1').text,
            'pay': p.find('span', class_='pay').text,
            'num_user': p.find('p', class_='txt').text,
            'season': season,
        })

    # Label of the active position tab (column kept as 'tap' because the R code uses it)
    tap = bs.select_one('#middle > div > div > div.daily > div.position_pl.daily_wrap > '
                        'div.position_pl__cont.content_wrap > div.position_tab > '
                        'div.tab_item.active > a').text

    df_player_pref = pd.DataFrame(rows, columns=['name', 'rank', 'overall', 'position',
                                                  'price', 'pay', 'num_user', 'season'])
    df_player_pref.insert(0, 'date', date)
    df_player_pref.insert(0, 'tap', tap)

    return df_player_pref

 

▷ The first argument of get_player_pref is the parsed page (a BeautifulSoup object), and the second is the date of that page. For the page of a given date, it extracts the player data of the currently selected position by static parsing, converts it to a DataFrame, and returns it.
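The season label is recovered from the icon's image URL. Stripping the serialized `<img>` tag with string replacements breaks if the attribute order or quoting of the tag changes; reading only the src attribute is more tolerant. A sketch of such a helper, where the name season_from_icon_src and the file name 'xyz' are illustrative placeholders, not real site values:

```python
from urllib.parse import urlparse
import posixpath

def season_from_icon_src(src):
    """Return the icon file name without its extension.

    e.g. 'http://host/.../season/xyz.png' -> 'xyz'
    """
    path = urlparse(src).path            # drops any query string such as '?v=2'
    filename = posixpath.basename(path)  # last path segment, e.g. 'xyz.png'
    stem, _ext = posixpath.splitext(filename)
    return stem

# 'xyz' is a placeholder, not a real season ID
print(season_from_icon_src('http://s.nx.com/s2/game/fo4/obt/externalAssets/season/xyz.png'))  # prints xyz
```

With BeautifulSoup it would be called as season_from_icon_src(p.select_one('span.icon img')['src']).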

 

In:

from selenium.webdriver.firefox.service import Service

# Newer Selenium 4 releases take the geckodriver path through a Service object
# (the executable_path argument is no longer accepted)
firefox_driver = r'C:/firefoxdriver/geckodriver.exe'
driver = webdriver.Firefox(service=Service(firefox_driver))

time_unit = datetime.timedelta(days = 1)

end_date = datetime.date.today()
start_date = datetime.datetime.strptime('2020-05-20', '%Y-%m-%d').date()

frames = []  # one DataFrame per (date, position tab)

for i in range((end_date-start_date).days):
    # strDate uses dots as separators, e.g. 2020.05.20
    str_date = (start_date + i*time_unit).strftime('%Y.%m.%d')
    
    url = 'http://fifaonline4.nexon.com/datacenter/dailysquad?strDate={}'.format(str_date)
    driver.get(url)
    
    time.sleep(5)  # let the page load and keep the request rate low
    
    for j in range(1, 17):  # the 16 position tabs
        pos_xpath = '//*[@id="middle"]/div/div/div[3]/div[7]/div[2]/div[1]/div[{}]/a'.format(j)
        
        # Wait up to 100 s until the j-th tab is present and clickable
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, pos_xpath)))
        
        # Click through JavaScript so overlapping elements cannot intercept the click
        driver.execute_script('arguments[0].click();', element)
        
        time.sleep(2.5)  # wait for the tab's data to render
        
        bs = BeautifulSoup(driver.page_source, 'lxml')
        frames.append(get_player_pref(bs, str_date))

driver.quit()

df_player_pref = pd.concat(frames)

 

▷ I chose Firefox as the browser for dynamic crawling because, in my runs, it raised fewer errors than Chrome. ChromeDriver always clicks at the center of the target element, so when an unexpected element happens to sit at that point, the click is more likely to fail.

 

▷ During crawling, I used time.sleep to add pauses, for two reasons. First, it takes some time for the data of the changed page to load. Second, without pauses the server may treat the traffic as a DDoS attack and block access.

 

▷ WebDriverWait waits until the page is ready for the next step of the crawl. Its first argument is the driver, and the second is the maximum number of seconds to wait before giving up (it then raises TimeoutException). The condition to wait for is passed to its until method, e.g. EC.presence_of_element_located or EC.element_to_be_clickable.
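To make the timeout semantics concrete, here is a simplified, hypothetical re-implementation of the polling idea behind WebDriverWait(...).until(...): call a condition repeatedly until it returns a truthy value or the timeout expires. This is not Selenium's code. Selenium polls every 0.5 s by default and raises its own TimeoutException, while this sketch raises the built-in TimeoutError.

```python
import time

def wait_until(condition, timeout, poll_frequency=0.5):
    """Poll condition() until it returns a truthy value, and return that value.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise TimeoutError('condition not met within {} s'.format(timeout))
        time.sleep(poll_frequency)

# Usage: a condition that only becomes ready on its third call
calls = {'n': 0}

def ready():
    calls['n'] += 1
    return 'element' if calls['n'] >= 3 else None

print(wait_until(ready, timeout=5, poll_frequency=0.01))  # prints element
```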

 

▷ To click a position tab, I used execute_script. Its first argument is the JavaScript to run; any further arguments are passed to the script as arguments[0], arguments[1], and so on. Here the tab element is passed as arguments[0], so 'arguments[0].click();' clicks that element directly in the page.

 

▷ The page_source attribute returns the page HTML after the dynamic event. I parse it with BeautifulSoup and pass the result to get_player_pref, which returns the selected position's player data as a DataFrame. These DataFrames are combined into df_player_pref with pd.concat.
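Calling pd.concat inside the loop copies the accumulated DataFrame on every iteration, so the cost grows quadratically with the number of (date, position) pairs. A common alternative, sketched below with made-up rows standing in for get_player_pref output (fake_tab is a hypothetical helper), is to append each per-tab DataFrame to a list and concatenate once at the end.

```python
import pandas as pd

def fake_tab(tap, date, names):
    """Stand-in for get_player_pref: one small DataFrame per position tab (made-up data)."""
    return pd.DataFrame({'tap': tap, 'date': date, 'name': names,
                         'rank': range(1, len(names) + 1)})

frames = []  # collect per-tab results here
for date in ['2020.05.20', '2020.05.21']:
    for tap in ['ST', 'GK']:
        frames.append(fake_tab(tap, date, ['Player A', 'Player B']))

# One concat at the end; ignore_index gives a clean 0..N-1 index
df_player_pref = pd.concat(frames, ignore_index=True)
print(df_player_pref.shape)  # (8, 4)
```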

 

▷ In the nested loop, the outer loop selects the date and the inner loop selects the position, so we get the player data of every position from May 20 up to the day before the run (range excludes end_date).
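Because of the fixed sleeps, the loop has a lower bound on its running time. A back-of-the-envelope sketch, using the 5 s page wait, 16 tabs, and 2.5 s per-tab wait from the code above and ignoring page loads and WebDriverWait time; the end date 2020-10-20 is only an illustrative example, and min_crawl_seconds is a hypothetical helper.

```python
import datetime

def min_crawl_seconds(start, end, n_tabs=16, page_sleep=5.0, tab_sleep=2.5):
    """Return (days, DataFrames, seconds) implied by the fixed time.sleep calls alone."""
    n_days = (end - start).days          # the crawl loop excludes `end`
    per_day = page_sleep + n_tabs * tab_sleep
    return n_days, n_days * n_tabs, n_days * per_day

days, frames, seconds = min_crawl_seconds(datetime.date(2020, 5, 20), datetime.date(2020, 10, 20))
print(days, frames, seconds / 3600)  # 153 2448 1.9125
```

So each date costs at least 5 + 16 × 2.5 = 45 s, and five months of data needs roughly two hours of sleeping alone.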

 

In:

df_player_pref.to_csv('../input/player_preference.csv', encoding = 'utf-8-sig')

 

▷ After crawling finishes, save the completed DataFrame as a CSV file.

 

The completed DataFrame after preprocessing looks like this.

 

 

3. Visualization

 

Using about five months of per-player user counts, let's visualize player preferences by position. The visualization was done in R.

 

In:

#####################
# Library & Setting #
#####################

library(readr)
library(ggplot2)
library(scales)
library(dplyr)
library(forcats)
library(stringr)
library(tidytext)
library(showtext)

font_add_google('Nanum Gothic', 'Gothic')
showtext_auto()

theme_set(theme_minimal() + 
            theme(plot.title = element_text(face = 'bold', colour = 'grey10'), 
                  plot.subtitle = element_text(colour = 'grey25'), 
                  panel.grid.major = element_line(colour = 'grey90', size = 1), 
                  panel.grid.minor = element_line(colour = 'grey80', size = 0.5, linetype = 'dashed'), 
                  legend.position = 'top', 
                  legend.spacing.x = unit(0.125, 'cm'), 
                  legend.background = element_rect(fill = NULL, linetype = 'dotted'), 
                  strip.background = element_blank(), 
                  strip.text = element_text(face = 'bold', colour = 'grey25', size = 11.25)))

################################
# Data Loading & Preprocessing #
################################

df_player = read_csv('C:/Users/user/Desktop/Project/Player_Preference_Analysis/Data/player_preference.csv') %>% 
  select(-1)  # drop the pandas index column (named X1 or ...1 depending on the readr version)

df_player$price = gsub(',', '', df_player$price)
df_player$price = gsub(' ', '', df_player$price)
df_player$price = as.numeric(gsub('BP', '', df_player$price))

df_player$num_user = gsub(',', '', df_player$num_user)

df_player$num_user = lapply(str_extract_all(df_player$num_user, '[0-9]+'), function(x) x[1]) %>% 
  unlist() %>% 
  as.numeric()

df_player$date = as.Date(df_player$date, '%Y.%m.%d')

#######
# EDA #
#######

# player preference

df_temp = df_player %>% 
  group_by(tap, name, season) %>% 
  summarise(total_num_user = sum(num_user)) %>% 
  arrange(tap, -total_num_user)

length(unique(df_player$position))

FWD = c('CF', 'LW', 'RW', 'ST')
MID = c('CAM', 'CDM', 'CM', 'LM', 'RM')
DEF = c('CB', 'LB', 'LWB', 'RB', 'RWB')

df_temp$position = ifelse(df_temp$tap %in% FWD, 'FWD', 
                          ifelse(df_temp$tap %in% MID, 'MID', 
                                 ifelse(df_temp$tap %in% DEF, 'DEF', 
                                        ifelse(df_temp$tap %in% 'GK', 'GK', 'Total'))))

# Rank players within each position tab (1 = most users)
df_temp = df_temp %>% 
  group_by(tap) %>% 
  mutate(rank = rank(-total_num_user)) %>% 
  ungroup()

df_temp$season_name = paste(df_temp$name, '/', df_temp$season)

df_temp %>% 
  filter(rank <= 10, position == 'FWD') %>% 
  mutate(season_name = reorder_within(season_name, total_num_user, tap)) %>% 
  ggplot(aes(fct_reorder(season_name, total_num_user), total_num_user)) + 
  geom_bar(stat = 'identity', colour = 'red', fill = 'red', alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  facet_wrap(~ tap, scales = 'free') + 
  coord_flip() + 
  scale_x_reordered() + 
  labs(x = NULL, y = 'Num. of Users')

df_temp %>% 
  filter(rank <= 10, position == 'MID') %>% 
  mutate(season_name = reorder_within(season_name, total_num_user, tap)) %>% 
  ggplot(aes(fct_reorder(season_name, total_num_user), total_num_user)) + 
  geom_bar(stat = 'identity', colour = 'green', fill = 'green', alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  facet_wrap(~ tap, scales = 'free') + 
  coord_flip() + 
  scale_x_reordered() + 
  labs(x = NULL, y = 'Num. of Users') + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

df_temp %>% 
  filter(rank <= 10, position == 'DEF') %>% 
  mutate(season_name = reorder_within(season_name, total_num_user, tap)) %>% 
  ggplot(aes(fct_reorder(season_name, total_num_user), total_num_user)) + 
  geom_bar(stat = 'identity', colour = 'blue', fill = 'blue', alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  facet_wrap(~ tap, scales = 'free') + 
  coord_flip() + 
  scale_x_reordered() + 
  labs(x = NULL, y = 'Num. of Users') + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

df_temp %>% 
  filter(rank <= 10, position == 'GK') %>% 
  ggplot(aes(fct_reorder(season_name, total_num_user), total_num_user)) + 
  geom_bar(stat = 'identity', colour = 'black', fill = 'black', alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  facet_wrap(~ tap, scales = 'free') + 
  coord_flip() + 
  scale_x_reordered() + 
  labs(x = NULL, y = 'Num. of Users')

df_temp %>% 
  filter(rank <= 10, position == 'Total') %>% 
  ungroup() %>% 
  mutate(tap = 'Total') %>% 
  mutate(season_name = reorder_within(season_name, total_num_user, tap)) %>% 
  ggplot(aes(fct_reorder(season_name, total_num_user), total_num_user)) + 
  geom_bar(stat = 'identity', colour = 'grey', fill = 'grey', alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  facet_wrap(~ tap, scales = 'free') + 
  coord_flip() + 
  scale_x_reordered() + 
  labs(x = NULL, y = 'Num. of Users')

 

Out:

 

▷ Here, a player's number of users is the sum of that player's daily user counts over all crawled dates.
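The R code sums num_user over all dates for each (tap, name, season) and then ranks players within each tab. For readers following along in Python, the same aggregation in pandas, sketched on made-up numbers:

```python
import pandas as pd

# Made-up daily rows: two dates, one tab
df_player = pd.DataFrame({
    'tap':      ['ST', 'ST', 'ST', 'ST'],
    'name':     ['A', 'B', 'A', 'B'],
    'season':   ['s1', 's1', 's1', 's1'],
    'num_user': [100, 300, 250, 20],
})

# Total users per player over all crawled dates
df_temp = (df_player.groupby(['tap', 'name', 'season'], as_index=False)['num_user']
           .sum()
           .rename(columns={'num_user': 'total_num_user'}))

# Rank within each tab, 1 = most used (ties get the average rank, like R's rank())
df_temp['rank'] = df_temp.groupby('tap')['total_num_user'].rank(ascending=False)
print(df_temp.sort_values(['tap', 'rank']))
```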

 

▷ Son Heung-min ranks near the top at CF, LW, RW, LM, and RM. In other words, he could be classified as "world class" in FIFA Online. As a fan who has watched every one of his goals since his Hamburg days, I can't help but be delighted by this result.

 

Let's look at which seasons users prefer.

 

In:

df_player %>% 
  filter(tap != '전체') %>%  # '전체' is the "Total" tab
  group_by(season) %>% 
  summarise(total_num_user = sum(num_user)) %>% 
  arrange(-total_num_user) %>% 
  ggplot(aes(reorder(season, total_num_user), total_num_user)) + 
  geom_bar(stat = 'identity', colour = 'steelblue', fill = 'steelblue', alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  coord_flip() + 
  labs(x = NULL, y = 'Num. of Users')

 

Out:

 

▷ Here, a season's number of users is the total number of users of players from that season, summed over all dates and positions.

 

Let's look at how much users spend on players at each position.

 

In:

df_player %>% 
  filter(tap != '전체') %>%  # '전체' is the "Total" tab
  mutate(all_price = price*num_user, 
         position_cat = if_else(tap %in% FWD, 'FWD', 
                               if_else(tap %in% MID, 'MID', 
                                      if_else(tap %in% DEF, 'DEF', 'GK')))) %>% 
  group_by(tap, position_cat) %>% 
  summarise(total_price = sum(all_price), 
            total_num_user = sum(num_user)) %>% 
  mutate(mean_price = total_price/total_num_user) %>% 
  ggplot(aes(reorder(tap, mean_price), mean_price)) + 
  geom_bar(stat = 'identity', aes(colour = position_cat, fill = position_cat), alpha = 0.5, size = 1) + 
  scale_y_continuous(label = comma) + 
  scale_colour_manual(values = c('blue', 'red', 'black', 'green')) + 
  scale_fill_manual(values = c('blue', 'red', 'black', 'green')) + 
  labs(x = NULL, y = 'Price', fill = NULL, colour = NULL) + 
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

 

Out:

 

▷ The price above is the average amount one user spends on a player at each position, that is, player prices weighted by their user counts.
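Concretely, the R code weights each player's price by how many users field that player: for a tab, mean price = Σ(price × num_user) / Σ num_user. A small worked example in Python with made-up prices:

```python
from collections import defaultdict

rows = [  # (tap, price in BP, num_user): made-up values
    ('ST', 1000, 3),
    ('ST', 4000, 1),
    ('GK', 500, 2),
]

total_price = defaultdict(float)
total_users = defaultdict(int)
for tap, price, num_user in rows:
    total_price[tap] += price * num_user  # all_price in the R code
    total_users[tap] += num_user

mean_price = {tap: total_price[tap] / total_users[tap] for tap in total_price}
print(mean_price)  # {'ST': 1750.0, 'GK': 500.0}
```

For ST this is (1000 × 3 + 4000 × 1) / 4 = 1750, so the expensive player pulls the average up only in proportion to how many users actually field him.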

 

▷ As expected, value is highest for forwards, followed by midfielders, defenders, and goalkeepers. In particular, ST and CF, the positions that score the most goals, are by far the most valuable.