[DA][Python] (1st Design) Redesigning the KakaoTalk Chat Data Preprocessing Code (Not the Final Version)

by Sun A 2024. 7. 5.

 

To start with the conclusion: the code designed as described below produced many errors, and the feedback I received was that I had split it into too many functions and that the function names were ambiguous. I therefore rewrote the code in a second design.


Previous posts

[DA][Python] Preprocessing KakaoTalk chat data (1)
sundery.tistory.com

[DA][Python] Preprocessing KakaoTalk chat data (2)
sundery.tistory.com


Previous code


  
import pandas as pd
# ํ…์ŠคํŠธ ํŒŒ์ผ ๊ฒฝ๋กœ
kakao_file = "C:/Users/wkdtj/OneDrive/desktop/kakao2-data.txt"
# ํŒŒ์ผ ์ฝ๊ธฐ
with open(kakao_file, 'r', encoding='utf-8') as file:
lines = file.readlines()
data = []
current_date = None
current_weekday = None
#์‹œ๊ฐ„ ๋ณ€ํ™˜ ํ•จ์ˆ˜ (์˜ค์ „, ์˜คํ›„ => ์˜ค์ „ ์—†์• ๊ณ  ์˜คํ›„์— +12)
def convert_to_24hr(time_str):
if '์˜ค์ „' in time_str:
time = time_str.replace('์˜ค์ „ ', '')
hours, minutes = map(int, time.split(':'))
if hours == 12: # ์˜ค์ „ 12์‹œ๋Š” 00์‹œ๋กœ ๋ณ€ํ™˜
hours = 0
elif '์˜คํ›„' in time_str:
time = time_str.replace('์˜คํ›„ ', '')
hours, minutes = map(int, time.split(':'))
if hours != 12: # ์˜คํ›„ 12์‹œ๋Š” ๊ทธ๋Œ€๋กœ ๋‘๊ณ , ๋‚˜๋จธ์ง€๋Š” 12๋ฅผ ๋”ํ•จ
hours += 12
return f"{hours:02}:{minutes:02}"
# ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ
for line in lines:
line = line.strip()
# ๋‚ ์งœ ํ˜•์‹: "--------------- YYYY๋…„ M์›” D์ผ ์š”์ผ ---------------"
if line.startswith('---------------') and '๋…„' in line and '์›”' in line and '์ผ' in line:
parts = line.strip('- ').split(' ')
year = parts[0][:-1] # 'YYYY๋…„'์—์„œ 'YYYY' ์ถ”์ถœ
month = parts[1][:-1] # 'M์›”'์—์„œ 'M' ์ถ”์ถœ
day = parts[2][:-1] # 'D์ผ'์—์„œ 'D' ์ถ”์ถœ
current_weekday = parts[3] #์š”์ผ์—์„œ ์š”์ผ ์ถ”์ถœ
current_date = f"{year} {month.zfill(2)} {day.zfill(2)}"
# ๋Œ€ํ™” ํ˜•์‹: "[๋ณด๋‚ธ ์‚ฌ๋žŒ] [์‹œ๊ฐ„] ๋ฉ”์‹œ์ง€"
elif line.startswith('[') and '] [' in line:
try:
sender, rest = line.split('] [', 1)
name = sender[1:] # ์•ž์˜ '[' ์ œ๊ฑฐ
time, message = rest.split('] ', 1)
time_24hr = convert_to_24hr(time) # ์‹œ๊ฐ„์„ 24์‹œ๊ฐ„ ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜
if current_date: # ํ˜„์žฌ ๋‚ ์งœ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ์—๋งŒ ์ถ”๊ฐ€
data.append([current_date, time_24hr, name, message, current_weekday])
except ValueError:
continue # ํ˜•์‹์— ๋งž์ง€ ์•Š๋Š” ๋ผ์ธ์€ ๊ฑด๋„ˆ๋œ€
#๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ƒ์„ฑ
df = pd.DataFrame(data, columns=['Date', 'Time', 'Name', 'Message', 'Day_of_Week'])
# ๋‚ ์งœ์™€ ์‹œ๊ฐ„์„ ๋ถ„๋ฆฌ
df['Year'] = df['Date'].str.split(' ').str[0].astype(int)
df['Month'] = df['Date'].str.split(' ').str[1].astype(int)
df['Day'] = df['Date'].str.split(' ').str[2].astype(int)
# ์ปฌ๋Ÿผ ์ˆœ์„œ ๋ณ€๊ฒฝ
df = df[['Year', 'Month', 'Day', 'Day_of_Week', 'Time', 'Name', 'Message']]
# ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ถœ๋ ฅ
print(df)

That was the code I had originally written.

The feedback I got back was that, because it uses only one function, the code is hard to reuse, and that what it defines is not concise.

So I am going to design the code anew and follow that design. The design I am writing now could also be wrong, so if it runs into errors this post might never go up 🥲

 

Code design

  • My initial plan was actually to build the DataFrame first and then write the functions, but that turned out to be difficult for me because there is still a lot I do not know. I ran into various errors and struggled to think in terms of functions while writing the code. Since my first approach had been to clean the data line by line and then build the frame, changing that way of thinking was hard as well. So the design changed to one that again checks each line in a for loop, but uses functions inside that loop.
  • I plan to split the work into as many functions as I can.

 

1. Date output function

- Raw data

--------------- 2023년 3월 24일 금요일 ---------------
  • Year, Month, Day, and Day_of_Week can be extracted from this line.
  • Functions needed
    1. String cleaning (processed_date_line)
    2. Converting the Date value into its individual fields (read_date)
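
The two helpers listed for the date line could look roughly like this. It is only a minimal sketch: the design gives the names `processed_date_line` and `read_date` but not their signatures, so the string argument and the tuple return value are assumptions.

```python
def processed_date_line(line):
    """Strip the dashes and spaces around a date separator line.

    Assumed input:  '--------------- 2023년 3월 24일 금요일 ---------------'
    Assumed output: '2023년 3월 24일 금요일'
    """
    return line.strip().strip('- ')


def read_date(date_text):
    """Split cleaned date text into (year, month, day, day_of_week)."""
    year, month, day, day_of_week = date_text.split(' ')
    # Drop the trailing unit character ('년', '월', '일') before converting.
    return int(year[:-1]), int(month[:-1]), int(day[:-1]), day_of_week


cleaned = processed_date_line('--------------- 2023년 3월 24일 금요일 ---------------')
print(read_date(cleaned))  # (2023, 3, 24, '금요일')
```

Returning integers right away would make the later `astype(int)` step on the DataFrame unnecessary, but that is a design choice, not something the original plan states.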

 

2. Time output function

- Raw data

[아빵] [오후 2:16] 🍊 삶의 교훈
  • Functions needed
    1. String cleaning (processed_message_line)
    2. Time format conversion (covert_24hr)
    3. Returning the Time value (read_time)
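
One way the three time-related helpers might fit together is sketched below. The signatures and the `(name, time_text, message)` tuple are assumptions, since the design only lists the function names. The conversion helper is spelled `convert_24hr` here, whereas the list above writes it as `covert_24hr`.

```python
def processed_message_line(line):
    """Split '[name] [time] message' into (name, time_text, message).

    Raises ValueError if the line does not match the assumed format.
    """
    sender, rest = line.strip().split('] [', 1)
    time_text, message = rest.split('] ', 1)
    return sender[1:], time_text, message  # sender[1:] drops the leading '['


def convert_24hr(time_text):
    """Convert '오전 H:MM' (AM) or '오후 H:MM' (PM) to 24-hour 'HH:MM'."""
    period, clock = time_text.split(' ')
    hours, minutes = map(int, clock.split(':'))
    if period == '오전':
        hours = hours % 12        # 12 AM becomes 00
    elif period == '오후':
        hours = hours % 12 + 12   # 12 PM stays 12, 1 PM becomes 13
    else:
        raise ValueError(f'unrecognized time text: {time_text!r}')
    return f'{hours:02}:{minutes:02}'


def read_time(line):
    """Return the 24-hour time of one message line."""
    _, time_text, _ = processed_message_line(line)
    return convert_24hr(time_text)


print(read_time('[아빵] [오후 2:16] 🍊 삶의 교훈'))  # 14:16
```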

 

3. Name output function

- Raw data

[아빵] [오후 2:16] 🍊 삶의 교훈
  • Functions needed
    1. String cleaning (processed_message_line)
    2. Returning the Name value (read_name)

 

4. Message output function

[아빵] [오후 2:16] 🍊 삶의 교훈
  • Functions needed
    1. String cleaning (processed_message_line)
    2. Returning the Message value (read_message)
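
The Name and Message readers share the same cleaning step, so they can be sketched together. In this hypothetical version a regular expression plays the role of the cleaning step. The pattern, the `_match` helper, and the choice to pass the raw line to each reader are all assumptions made for illustration.

```python
import re

# Assumed line shape: '[name] [time] message'
MESSAGE_LINE = re.compile(r'^\[(?P<name>.+?)\] \[(?P<time>.+?)\] (?P<message>.*)$')


def _match(line):
    """Return the regex match for a message line, or raise ValueError."""
    match = MESSAGE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f'not a message line: {line!r}')
    return match


def read_name(line):
    """Return the sender name of one message line."""
    return _match(line).group('name')


def read_message(line):
    """Return the message text of one message line."""
    return _match(line).group('message')


line = '[아빵] [오후 2:16] 🍊 삶의 교훈'
print(read_name(line))     # 아빵
print(read_message(line))  # 🍊 삶의 교훈
```

Both readers end up as one-line wrappers around the same parse, which may be part of why the later feedback said the functions were split too finely.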

 

5. Function for adding data to the data list

  • Append the values produced above to the data = [] list created earlier.
  • Functions needed
    1. Function that adds data to the list (add_data)
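
A minimal sketch of `add_data` follows. It assumes the list is passed in explicitly instead of being a global, and that the date arrives as a `(year, month, day, day_of_week)` tuple. Both are illustrative choices that the design does not specify.

```python
def add_data(data, date_fields, time, name, message):
    """Append one row to `data`.

    The column order follows the original script's final DataFrame:
    Year, Month, Day, Day_of_Week, Time, Name, Message.
    """
    year, month, day, day_of_week = date_fields
    data.append([year, month, day, day_of_week, time, name, message])


data = []
add_data(data, (2023, 3, 24, '금요일'), '14:16', '아빵', '🍊 삶의 교훈')
print(data)  # [[2023, 3, 24, '금요일', '14:16', '아빵', '🍊 삶의 교훈']]
```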

 

6. Building the DataFrame

  • Convert the data list filled in above into a DataFrame and print it.

 

์ƒ์„ฑํ•ด์•ผ ํ•˜๋Š” ํ•จ์ˆ˜ ์š”์•ฝ ๋ฐ ์ˆœ์„œ ์ •๋ฆฌ

  1. ๋ฌธ์ž ์ •์ œ (processed_date_line)
  2. ๋ฌธ์ž ์ •์ œ (processed_message_line)
  3. Date ๊ฐ’์˜ ๊ฐ ๋ณ€์ˆ˜๊ฐ’ ๋ณ€ํ™˜ (read_date)
  4. ์‹œ๊ฐ„ ํ˜•์‹ ๋ณ€ํ™˜ ํ•จ์ˆ˜ (covert_24hr)
  5. Time ๊ฐ’ ๋ฐ˜ํ™˜ (read_time)
  6. Name ๊ฐ’ ๋ฐ˜ํ™˜ (read_name)
  7. Message ๊ฐ’ ๋ฐ˜ํ™˜ (read_message)
  8. ๋ฆฌ์ŠคํŠธ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š” ํ•จ์ˆ˜ (add_data)

9. ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ํ™” ํ•˜์—ฌ ์ถœ๋ ฅ

 


Following this design, the next post will write these functions in the order listed.