
[Recommender System] Menu Recommendation System (2)

H J 2023. 3. 28. 22:38

Continuing from the previous post ~


Last time, some items correctly pulled up other foods from the same category while others did not, so I searched hard for the reason and...!

It turns out that CountVectorizer, with its default token_pattern, only keeps tokens of two or more characters, so single-character tokens are never counted..
That default probably exists because single-character tokens rarely carry much meaning in English..

But my dataset is in Korean, and it has lots of single-character names such as 떡 (rice cake), 국 (soup), 탕 (stew), 전 (savory pancake), 찜 (steamed dish), 면 (noodles), and so on..

I also confirmed that text containing a comma, a space, or / is not recognized as a single token, so I changed those data values separately
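To see why those names vanish, here is a minimal sketch that applies the regular expression scikit-learn documents as CountVectorizer's default token_pattern, using only Python's re module. The tokenize helper, the KEEP_SINGLE_PATTERN name, and the sample strings are made up for illustration and are not from my actual code.

```python
import re

# The default token_pattern documented for scikit-learn's CountVectorizer:
# a token must be at least two word characters long.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"

# Hypothetical alternative pattern that also keeps one-character tokens.
KEEP_SINGLE_PATTERN = r"(?u)\b\w+\b"


def tokenize(text, pattern=DEFAULT_PATTERN):
    """Return the tokens a regex-based tokenizer would extract from text."""
    return re.findall(pattern, text)


print(tokenize("떡"))                       # [] -> a one-character name is dropped
print(tokenize("떡_"))                      # ['떡_'] -> "_" counts as a word character
print(tokenize("국,탕"))                    # [] -> the comma splits it into two 1-char pieces
print(tokenize("국_탕"))                    # ['국_탕']
print(tokenize("국,탕", KEEP_SINGLE_PATTERN))  # ['국', '탕']
```

This also shows why appending or substituting "_" works: the underscore matches \w, so the padded name passes the two-character minimum and stays one token. Passing a pattern like KEEP_SINGLE_PATTERN as token_pattern to CountVectorizer would probably be another way to keep single-character tokens, though I have not tried it on this dataset.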

 

import pandas as pd

df = pd.read_csv('food_list.csv', encoding='cp949')
# Keep only the needed columns: '구분' (category) and '음식명' (food name).
# .copy() makes data an independent frame, so the .loc writes below do not trigger SettingWithCopyWarning.
data = df[['구분', '음식명']].copy()

# Rename categories that contain "," or are one character long,
# and food names that contain "/" or are one character long.
for i in range(len(data)):
    if ',' in data.loc[i, '구분']:
        data.loc[i, '구분'] = data.loc[i, '구분'].replace(',', '_')

    if len(data.loc[i, '구분']) == 1:
        data.loc[i, '구분'] = data.loc[i, '구분'] + '_'

    if len(data.loc[i, '음식명']) == 1:
        data.loc[i, '음식명'] = data.loc[i, '음식명'] + '_'

    if ' / ' in data.loc[i, '음식명']:
        data.loc[i, '음식명'] = data.loc[i, '음식명'].replace(' / ', '_')

    if '/' in data.loc[i, '음식명']:
        data.loc[i, '음식명'] = data.loc[i, '음식명'].replace('/', '_')

I looked up how to change values in a DataFrame like this and used data.loc to update them..!
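The same cleanup rules can also be written as one small pure-Python function, which makes them easy to check on a few strings before touching the DataFrame. This is only a sketch: normalize_label and its separators parameter are hypothetical names, and the sample strings are invented.

```python
def normalize_label(text, separators=(" / ", "/", ",")):
    """Replace separator characters with "_" and pad one-character labels."""
    for sep in separators:
        text = text.replace(sep, "_")
    # A one-character label would be dropped by the default tokenizer,
    # so pad it with "_" to make it two characters long.
    if len(text) == 1:
        text += "_"
    return text


print(normalize_label("국"))                   # 국_
print(normalize_label("밥,죽"))                # 밥_죽
print(normalize_label("김치찌개 / 된장찌개"))  # 김치찌개_된장찌개
print(normalize_label("비빔밥"))               # 비빔밥 (unchanged)

# With pandas, the function could be applied to a whole column at once, e.g.
#   data['구분'] = data['구분'].map(normalize_label)
```

Note that this version applies every separator to whatever string it receives, whereas the loop above only replaces commas in the category column and slashes in the food name column; passing a narrower separators tuple would reproduce that split.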


Ta-da ~

Before
After

Now, the next thing to do is.. !

 

When the input is an inexact food name, I still have to recommend similar foods, so I will probably need some natural language processing..?? 🚀👽
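As one possible starting point before reaching for heavier NLP tools, the standard library's difflib can already match a slightly misspelled name against a list of known names. This is only a rough sketch and not something I have settled on: suggest_foods and the menu list are hypothetical, and the cutoff of 0.6 is simply difflib's default.

```python
from difflib import get_close_matches


def suggest_foods(query, food_names, n=3, cutoff=0.6):
    """Return up to n known food names whose spelling is close to query."""
    return get_close_matches(query, food_names, n=n, cutoff=cutoff)


menu = ["김치찌개", "된장찌개", "떡볶이", "비빔밥"]  # made-up sample names

print(suggest_foods("김치찌게", menu))  # ['김치찌개'] (개 mistyped as 게)
print(suggest_foods("피자", menu))      # [] (nothing similar enough)
```

difflib compares whole Hangul syllables, so a one-letter typo inside a syllable counts as a full character mismatch. That may turn out to be too coarse for short names.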