
Big Data Analysis Engineer Practical Exam, Type 1 Notes (2025 Type 1 sample problems + DataManim 100 preprocessing-problem practice)

by 달님🌙 2025. 6. 13.
0. Big Data Analysis Engineer Practical Exam

 

 

1. Study Plan

 

 

- Tue 6/17 (D-4, start): watch the YouTube 어답터 core lectures for Type 1; solve DataManim Type 1 problems 1–30
- Wed 6/18 (D-3): solve DataManim Type 1 problems 31–60
- Thu 6/19 (D-2): watch the YouTube 어답터 core lectures for Type 2
- Fri 6/20 (D-1): watch the YouTube 어답터 core lectures for Type 3 (2025 + 2024); solve past exam 8, Types 1–3; solve past exam 9, Types 1–3
- Sat 6/21 (D-DAY): memorize the essential code

 

 

 

 

 

2. Past Exam Problems

 

Exam 9, Type 1

# Problem 1
import pandas as pd
# df = pd.read_csv("loan.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert_v2/main/part4/ch9/loan.csv")
# (1) total loan = credit loan + secured loan
df["총대출액"] = df["신용대출"] + df["담보대출"]
# print(df.info())
# (2) sum of total loans by region code and gender, gender spread into columns
grouped = df.groupby(["지역코드", "성별"])["총대출액"].sum().unstack()
# print(grouped)
# (3) absolute male/female difference, sorted descending
grouped["차이"] = abs(grouped[1] - grouped[2])
grouped = grouped.sort_values("차이", ascending=False)
# print(grouped.head())
# Answer: 4100000278
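# (Aside, not from the original solution) A hedged sketch: the same region-by-gender
# table can also be built with pivot_table instead of groupby + unstack; the column
# names below are the loan.csv columns already used above.
pivot = pd.pivot_table(df, index="지역코드", columns="성별", values="총대출액", aggfunc="sum")
pivot["차이"] = (pivot[1] - pivot[2]).abs()
# print(pivot.sort_values("차이", ascending=False).head())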

# Problem 2
import pandas as pd
# df = pd.read_csv("crime.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert_v2/main/part4/ch9/crime.csv")
# print(df)
# (1) split into occurrence counts (발생건수) and arrest counts (검거건수),
#     then arrest rate = arrests / occurrences
bs = df[df["구분"] == "발생건수"].iloc[:, 2:].reset_index(drop=True)
gg = df[df["구분"] == "검거건수"].iloc[:, 2:].reset_index(drop=True)

final = gg / bs
# print(final)

# (2) for each row, find the column with the highest arrest rate,
#     then sum the arrest counts in those columns
#     (use names other than `list` and `sum` so the built-ins stay usable)
max_cols = final.idxmax(axis=1)
# print(max_cols)

total = 0
for i, item in enumerate(max_cols):
    total = total + gg.loc[i, item]
print(total)
# print(gg)
# print(961 + 713 + 812 + 1300 + 1350 + 1300 + 1363)  # brute-force check by hand
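# (Aside, not from the original solution) A hedged vectorized sketch of the same sum:
# pick, row by row, the arrest count in the column where the arrest rate is highest.
import numpy as np
alt_total = gg.to_numpy()[np.arange(len(gg)), gg.columns.get_indexer(max_cols)].sum()
# print(alt_total)  # should match `total` above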

# Problem 3
import pandas as pd
# df = pd.read_csv("hr.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert_v2/main/part4/ch9/hr.csv")
# print(df)
# (1) fill missing 만족도 (satisfaction) with the column mean
# print(df.isnull().sum())
mean = df["만족도"].mean()
# print(mean)
df["만족도"] = df["만족도"].fillna(mean)
# print(df["만족도"].isnull().sum())

# (2) fill missing 근속연수 (years of service) with the mean of its 부서/성과등급 group
gm = df.groupby(['부서', '성과등급'])["근속연수"].transform("mean").astype(int)
df["근속연수"] = df["근속연수"].fillna(gm)
# print(df)

# (3) salary per year of service; among the top 3, read the 근속연수 of the 3rd row
df["연봉_근속연수"] = df["연봉"] / df["근속연수"]
df_year = df.sort_values("연봉_근속연수", ascending=False)
year = df.nlargest(3, "연봉_근속연수")
A = year.iloc[-1]["근속연수"]
print(A)
# A = 1


# (4) salary per satisfaction point; among the top 2, read the 교육참가횟수 of the 2nd row
df["연봉_만족도"] = df["연봉"] / df["만족도"]
# print(df)
df_like = df.sort_values("연봉_만족도", ascending=False)
like = df.nlargest(2, "연봉_만족도")
print(like)
B = like.iloc[-1]["교육참가횟수"]
print(B)

# B = 6
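# (Aside) Hedged note: nlargest(k, col) should return the same rows as
# sort_values(col, ascending=False).head(k) apart from tie-breaking, so the
# df_year / df_like sort_values lines above are just redundant sanity checks.
# print(df_like.head(2).equals(like))  # usually True unless ties reorder rows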

 

 

 Exam 8, Type 1, Problems 1–3

# Past exam 8, Type 1 solutions
import pandas as pd
# df = pd.read_csv("drinks.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert/main/part4/ch8/drinks.csv")

# Problem 1
continent = df.groupby("continent")['beer_servings'].mean()
# print(continent)
# top_continent = continent.idxmax()
# print(top_continent)
# Answer: Europe

df = df[df['continent'] == "Europe"]
df = df.sort_values('beer_servings', ascending=False)
# print(df.head())
# Answer: 313

# Problem 2
import pandas as pd
# df = pd.read_csv("tourist.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert/main/part4/ch8/tourist.csv")

# print(df.columns)
# print(df.info())
df['방문객 합계'] = df['관광'] + df['공무'] + df['사업'] + df['기타']
df['관광객 비율'] = df['관광'] / df['방문객 합계']
# high = df.sort_values('관광객 비율', ascending=False)
# print(high)
# Answer (1): 사업 -> 203
df = df.sort_values('관광', ascending=False)
# print(df)
# Answer (2): 국가42 -> 238
# Answer (3): 441

# Problem 3
import pandas as pd

# df = pd.read_csv("chem.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert/main/part4/ch8/chem.csv")
print(df)
# (1) min-max scaling
# Memorize: the scalers live in sklearn.preprocessing.
# If the path slips your mind, import sklearn and inspect __all__.
# Declare the scaler first, then call fit_transform.
# Remember the double brackets: df[['col']] keeps a DataFrame!
import sklearn
# print(sklearn.__all__)
# print(sklearn.preprocessing.__all__)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
co_scaled = scaler.fit_transform(df[['co']])
nmhc_scaled = scaler.fit_transform(df[['nmhc']])
# print(df['co'])
# print(co_scaled)
# (2) standard deviations of the scaled columns
co_std = co_scaled.std()
nmhc_std = nmhc_scaled.std()
# print(co_std, nmhc_std)
# (3) rounded difference
print(round(co_std - nmhc_std, 3))
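# (Aside, not from the original solution) A hedged fallback sketch if the sklearn
# import path slips your mind: min-max scaling is just (x - min) / (max - min).
co_mm = (df['co'] - df['co'].min()) / (df['co'].max() - df['co'].min())
nmhc_mm = (df['nmhc'] - df['nmhc'].min()) / (df['nmhc'].max() - df['nmhc'].min())
# Note: .std() on the scaler's numpy output uses ddof=0, while pandas Series.std()
# defaults to ddof=1, so pass ddof=0 to reproduce the same rounded difference.
# print(round(co_mm.std(ddof=0) - nmhc_mm.std(ddof=0), 3))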

 

Exam 7

# Exam 7, Type 1


# Problem 1
import pandas as pd
# df = pd.read_csv("student_assessment.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert/main/part4/ch7/student_assessment.csv")

# print(df.isnull().sum())  # score has missing values
# print(df.shape)  # 2565, 4
df = df.dropna()
# print(df.shape)
# print(df)
# print(df['id_assessment'].value_counts())
# print(dir(df))  # most-taken assessment: id 12, 33 rows
df = df[df['id_assessment'] == 12]
# print(df)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df['score'] = scaler.fit_transform(df[['score']])
# print(help(StandardScaler.fit_transform))
df = df.sort_values('score', ascending=False)
# Answer: 2.183
# print(df)
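# (Aside, not part of the original solution) Hedged note: StandardScaler divides by
# the population std (ddof=0), while pandas Series.std() defaults to ddof=1, so a
# manual z-score needs ddof=0 to reproduce the same max (~2.183). `raw` below is a
# hypothetical copy of the score column taken before the fit_transform overwrite above.
# raw = df['score'].copy()                      # take this copy before scaling
# z = (raw - raw.mean()) / raw.std(ddof=0)
# print(round(z.max(), 3))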


# Problem 2
import pandas as pd
# df = pd.read_csv("stock_market.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert/main/part4/ch7/stock_market.csv")
print(df.shape)  # 1000, 80
df_corr = df.corr()['close'].abs()
# print(df_corr)
df_corr = df_corr.loc['DE1':'DE77']
# print(df_corr)
corr = df_corr.idxmax()
# print(corr)  # DE14
# print(df["DE14"].mean())  # Answer: -0.0004



# Problem 3

import pandas as pd
# df = pd.read_csv("air_quality.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lovedlim/bigdata_analyst_cert/main/part4/ch7/air_quality.csv")
# print(df)
print(df.shape)
# IQR outlier bounds for CO2
i1 = df['CO2'].quantile(0.25)
i3 = df['CO2'].quantile(0.75)
IQR = i3 - i1
low = i1 - 1.5 * IQR
up = i3 + 1.5 * IQR
print(i1, i3, low, up)
df = df[(df["CO2"] < low) | (df["CO2"] > up)]
print(df.shape)
# Answer: 304 outliers!
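# (Aside, not from the original solution) Hedged alternative, run before the df
# overwrite above: summing a boolean mask gives the count without reassigning df.
# print(((df["CO2"] < low) | (df["CO2"] > up)).sum())  # -> 304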

 

Exam 6


 

3. Practice Problem Solutions

 

- Type 1 official sample (체험) problems

# Use the print() function when you want to see output
# e.g.) print(df.head())

# No need to set a working directory with getcwd(), chdir(), etc.
# Local drive paths (C:, etc.) are not accessible from file paths

import pandas as pd

df = pd.read_csv("data/employee_performance.csv")

# User code

# Problem 1: fill missing 고객만족도 with the column mean
ms = df['고객만족도'].mean()
df['고객만족도'] = df['고객만족도'].fillna(ms)
# print(df.isnull().sum())

# Problem 2: drop rows with missing 근속연수
df = df.dropna(subset=['근속연수'])
# print(help(df.dropna))


# Problem 3: third quartile (0.75 quantile) of 고객만족도, as an int
# quantile_3 = df['고객만족도'].quantile(0.75)
# print(int(quantile_3))

# print(dir(df))
# print(df['고객만족도'].sort_values().iloc[713])  # 952
# print(952 * 0.75)


# Problem 4: rounded mean 연봉 of the 부서 with the second-highest average salary
# print(int(df.groupby('부서')['연봉'].mean().sort_values(ascending=False).iloc[1]))
# Shortcut version: check the department names, then average 'Sales' by hand
print(df['부서'].unique())
# print(int(df[df['부서'] == 'Sales']['연봉'].mean() + 0.5))


# print(df.isnull().sum())
# Do not submit on this screen; after solving, submit the result values on the answer-submission page
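# (Aside) Hedged note on the int(x + 0.5) trick in problem 4: Python's round() uses
# banker's rounding (round-half-to-even), so adding 0.5 and truncating is a simple
# round-half-up for positive values; check which convention the grader expects.
# print(round(2.5), int(2.5 + 0.5))  # 2 vs 3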

 

- Type 1: DataManim pandas practice tutorial

 

Problems 1–30

https://www.datamanim.com/dataset/99_pandas/pandasMain.html

import pandas as pd
# 1
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/lol.csv', sep='\t')
# print(df)

# 2
# print(df.head(5))


# 3
# print(df.shape)
# print('rows:', df.shape[0])
# print('cols:', df.shape[1])



# 4
# print(df.columns)

# 5
# print(df.columns[5])


# 6
# print(df.iloc[:,5].dtype)


# 7
# print(df.index)

# 8
# print(df.iloc[2,5])



dataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/Jeju.csv'
df = pd.read_csv(dataUrl, encoding = 'euc-kr')


# 9
# print(type(df))

# 10
# print(df.tail(3))



# 11
# print(help(df))
# print(dir(df))


# print(df.select_dtypes(exclude=object).columns)




# 12
# print(df.select_dtypes(include=object).columns)





# 13
# print(df.isnull().sum())





# 14
# print(df.columns, df.columns.sum(), df.dtypes)  # wrong answer
# print(df.info())





# 15
# print(df.describe())






# 16
# print(df['거주인구'])





# 17
# print(df['평균 속도'].quantile(0.75) - df['평균 속도'].quantile(0.25))



# 18
# print(df['읍면동명'].nunique())

# print(dir(df))





# 19
# print(df['읍면동명'].unique())





# 20
dataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv'
df = pd.read_csv(dataUrl)
# print(df)





# 21
# print(df.loc[df['quantity'] == 3].head())  # yes, head() defaults to n=5 rows





# 22
# print(df.loc[df['quantity'] == 3].head().reset_index(drop=True))






# 23
# print(df[['quantity', 'item_price']])  # selecting multiple columns takes double brackets
# print(df.columns)
# print(df[['order_id', 'quantity', 'item_price']])  # double brackets whether it's two columns or three







# 24
df['new_price'] = df['item_price'].str[1:].astype('float')
# print(df['new_price'].head())





# 25
# print(len(df.loc[df['new_price'] <= 5]))







# 26
# print(df.loc[df['item_name'] == 'Chicken Salad Bowl'].reset_index(drop=True))






# 27

# print(df.loc[(df['new_price'] <= 9) & (df['item_name'] == 'Chicken Salad Bowl')])
# print(df.loc[(df.new_price <= 9) & (df.item_name == 'Chicken Salad Bowl')].head())


# 28
# print(df.sort_values('new_price').reset_index(drop=True).head())







# 29

# print(df.loc[df.item_name.str.contains('Chips')].head())



# 30
# print(df.iloc[:, ::2])

# Then what about even-numbered rows?
# print(df.iloc[::2, :])
# And odd-numbered columns?
print(df.iloc[:, 1::2].head())

 

 

Problems 31–60

# 31
# print(df.sort_values('new_price', ascending=False).reset_index(drop=True))


# 32
# print(df.loc[(df.item_name == 'Steak Salad') |(df.item_name == 'Bowl')])



# 33
# print(df.loc[(df.item_name == 'Steak Salad') |(df.item_name == 'Bowl')].drop_duplicates('item_name'))



# 34
# print(df.loc[(df.item_name == 'Steak Salad') |(df.item_name == 'Bowl')].drop_duplicates('item_name', keep ='last'))



# 35
# print(df.loc[df.new_price >= df.new_price.mean()])


# 36
# df.loc[df.item_name == 'Izze','item_name'] = 'Fizzy Lizzy'
# print(df.head())
# df.loc[df.item_name == 'Izze', 'item_name']



# 37
# print(df.choice_description.isnull().sum())


# 38
df.loc[df.choice_description.isnull(), 'choice_description'] = 'NoData'
# print(df.head())

# 39
df.loc[df.choice_description.str.contains('Black')]


# 40
# print(len(df.loc[~df.choice_description.str.contains('Vegetables')]))


# 41
# print(df[df.item_name.str.startswith('N')])

# 42
# print(df[df.item_name.str.len() >=15])

# 43
lst = [1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
# print(df.loc[df.new_price.isin(lst)])



# 03_Grouping



# 44

df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/AB_NYC_2019.csv')
# print(df.head())

# 45
ans = df.groupby('host_name').size().head()
# print(ans)

# Alternative without groupby (value_counts, then sort by index)
ans = df.host_name.value_counts().sort_index()
# print(ans)


# 46 
# ans = df.groupby('host_name').size().to_frame().rename(columns={0:'counts'})
# print(ans.sort_values('counts', ascending=False))
# print(df)

# 47
# print(df.groupby(['neighbourhood_group','neighbourhood'], as_index = False).size())

# 48
# print(df.groupby(['neighbourhood_group', 'neighbourhood'],as_index=False).size().groupby(['neighbourhood_group'],as_index=False).max())

# 49
# print(df[['neighbourhood_group', 'price']].groupby('neighbourhood_group').agg(['mean', 'var', 'max', 'min']))

# 50
# print(df[['neighbourhood_group', 'reviews_per_month']].groupby('neighbourhood_group').agg(['mean', 'var', 'max', 'min']))

# 51
# print(df.groupby(['neighbourhood','neighbourhood_group']).price.mean())


# 52

# print(df.groupby(['neighbourhood', 'neighbourhood_group']).price.mean().unstack())


# 53
# print(df.groupby(['neighbourhood', 'neighbourhood_group']).price.mean().unstack().fillna(-999))


# 54
# print(df[df.neighbourhood_group == 'Queens'].groupby('neighbourhood').price.agg(['mean','var','max','min']))

# 55
# ans = df[['neighbourhood_group', 'room_type']].groupby(['neighbourhood_group', 'room_type']).size().unstack()
# ans.loc[:,:]  = ans.values / ans.sum(axis=1).values.reshape(-1,1)
# print(ans)
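# (Aside) Hedged note: the same row-normalization can be written with DataFrame.div,
# which aligns on the index and avoids the .values / reshape round-trip.
# print(ans.div(ans.sum(axis=1), axis=0))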

# Apply, Map

# 56
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/BankChurnersUp.csv')
print(df.shape)


# 57
dic = {
'Unknown' : 'N',
'Less than $40K' : 'a',
'$40K - $60K' : 'b',
'$60K - $80K' : 'c',
'$80K - $120K' : 'd',
'$120K +' : 'e'
       }

df['newIncome'] = df.Income_Category.map(lambda x : dic[x])
# print(df.head())
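# (Aside) Hedged note: Series.map also accepts the dict directly, so the lambda is
# optional; keys missing from the dict would become NaN instead of raising a KeyError.
# df['newIncome'] = df.Income_Category.map(dic)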


# 58
def change(x) :
  if x == 'Unknown' : return 'N'
  elif  x == 'Less than $40K' : return 'a'
  elif  x == '$40K - $60K' : return 'b'
  elif  x == '$60K - $80K' : return 'c'
  elif  x == '$80K - $120K' : return 'd'
  elif  x == '$120K +' : return 'e'

df['newIncome'] = df.Income_Category.apply(change)
# print(df['newIncome'])

# 59
df['AgeState'] = df.Customer_Age.map(lambda x  : x//10 *10)
ans = df['AgeState'].value_counts().sort_index()
print(ans)

# 60
df['newEduLevel'] = df.Education_Level.map(lambda x : 1 if 'Graduate' in x else 0)
print(df['newEduLevel'].value_counts())

 

 

 

 

4. References

 

 

https://dataq.goorm.io

https://www.youtube.com/watch?v=ucKYrlbTN2s&list=PLSlDi2AkDv82Qv7B3WiWypQSFmOCb-G_-&index=17

 

 

 

https://www.datamanim.com/dataset/03_dataq/re.html

 

 

https://www.kaggle.com/datasets/agileteam/bigdatacertificationkr

 

Big Data Certification KR: 퇴근후딴짓's Big Data Analysis Engineer practical-exam community on Kaggle (Python, R tutorial code)
 

 

 

 

https://www.inflearn.com/course/%EB%B9%85%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%B6%84%EC%84%9D%EA%B8%B0%EC%82%AC-%EC%8B%A4%EA%B8%B0#curriculum

 

[퇴근후딴짓] Big Data Analysis Engineer Practical Exam (Types 1, 2, 3) course on Inflearn

 

