Gradient Boosting Guide

백준파이썬개발자:프로젝트골드 2024. 3. 3. 16:33

This is the gradient boosting code.

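As with random forest, the model only takes numeric data. Missing values need care: scikit-learn's GradientBoostingClassifier has traditionally rejected NaN, so NULL values should be imputed or encoded as a numeric sentinel (for example -1) before fitting. HistGradientBoostingClassifier is the variant that handles NaN natively.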
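As a minimal sketch of the input requirement above, the snippet below fills missing values with each column's median before fitting. The toy array, the choice of median imputation, and the small `n_estimators` are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two numeric features with some missing entries.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
    [6.0, np.nan],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Replace every NaN with the median of its column.
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

# The imputed matrix is fully numeric, so the classifier can be fitted on it.
model = GradientBoostingClassifier(n_estimators=10, random_state=0)
model.fit(X_filled, y)
predictions = model.predict(X_filled)
```

Categorical columns would additionally need to be encoded as numbers (for example with one-hot or ordinal encoding) before this step.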

from sklearn.ensemble import GradientBoostingClassifier

# `train` is assumed to be a pandas DataFrame with `id` and `target` columns
X_train = train.drop(['id', 'target'], axis=1)

gb = GradientBoostingClassifier(n_estimators=100, max_depth=3, min_samples_leaf=4,
                                max_features=0.2, random_state=0)
gb.fit(X_train, train.target)
features = X_train.columns.values  # feature names, reused by the plots below
print("----- Training Done -----")

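n_estimators: the number of boosting stages; 100 stages (trees) are used here.
max_depth: limits the maximum depth of each tree to 3.
min_samples_leaf: the minimum number of samples required at a leaf node, set to 4.
max_features: the fraction of features considered when looking for the best split, set to 0.2. It applies to each split, not once per tree.
random_state: fixes the random seed at 0 so that results are reproducible.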
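To make the parameter list above concrete, here is a small sketch that fits the same configuration on synthetic data and inspects the fitted model. The dataset from `make_classification` and the name `gb_demo` are assumptions for illustration, standing in for the `train` DataFrame used in the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the real training data: 300 rows, 20 numeric features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

gb_demo = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                     min_samples_leaf=4, max_features=0.2,
                                     random_state=0)
gb_demo.fit(X, y)

# One row of fitted trees per boosting stage.
n_stages = gb_demo.estimators_.shape[0]

# max_features=0.2 with 20 features resolves to 4 features tried per split.
features_per_split = gb_demo.max_features_

# One importance score per feature; the scores are normalized.
importances = gb_demo.feature_importances_
print(n_stages, features_per_split, importances.sum())
```

Because `random_state` is fixed, rerunning the snippet with the same scikit-learn version should reproduce the same importances.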

This is the visualization code.

# Assumption: `go` and `py` are the usual Plotly aliases; the post does not show these imports
import plotly.graph_objs as go
import plotly.offline as py
py.init_notebook_mode(connected=True)

# Scatter plot of the importance of each feature
trace = go.Scatter(
    y=gb.feature_importances_,
    x=features,
    mode='markers',
    marker=dict(
        sizemode='diameter',
        sizeref=1,
        size=13,
        # size=rf.feature_importances_,
        # color=np.random.randn(500),  # set color equal to a variable
        color=gb.feature_importances_,  # color each marker by its importance
        colorscale='Portland',
        showscale=True
    ),
    text=features
)
data = [trace]

layout = go.Layout(
    autosize=True,
    title='Gradient Boosting Machine Feature Importance',
    hovermode='closest',
    xaxis=dict(
        ticklen=5,
        showgrid=False,
        zeroline=False,
        showline=False
    ),
    yaxis=dict(
        title='Feature Importance',
        showgrid=False,
        zeroline=False,
        ticklen=5,
        gridwidth=2
    ),
    showlegend=False
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='scatter2010')

The code below sorts the features by how much each one contributed to training and plots them as a horizontal bar chart.

# Sort (importance, feature) pairs in ascending order, then split them back into two lists
x, y = (list(values) for values in zip(*sorted(zip(gb.feature_importances_, features),
                                               reverse=False)))
trace2 = go.Bar(
    x=x,
    y=y,
    marker=dict(
        color=x,
        colorscale='Viridis',
        reversescale=True
    ),
    name='Gradient Boosting Classifier Feature importance',
    orientation='h',
)

layout = dict(
    title='Barplot of Feature importances',
    width=900, height=2000,
    yaxis=dict(
        showgrid=False,
        showline=False,
        showticklabels=True,
    ))

fig1 = go.Figure(data=[trace2])
fig1['layout'].update(layout)
py.iplot(fig1, filename='plots')
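The `zip(*sorted(zip(...)))` idiom used for the bar chart is compact but easy to misread, so here is a small sketch of what it does, using toy values in place of `gb.feature_importances_` and `features`. All names and numbers in it are made up for illustration.

```python
# Toy stand-ins for gb.feature_importances_ and the feature names.
importances = [0.10, 0.50, 0.40]
names = ["feat_a", "feat_b", "feat_c"]

# Step 1: pair each importance with its name and sort ascending by importance.
pairs = sorted(zip(importances, names))

# Step 2: "unzip" the sorted pairs back into two parallel lists.
sorted_importances, sorted_names = (list(column) for column in zip(*pairs))

# An arguably more readable alternative: sort (name, importance) pairs by a key.
by_importance = sorted(zip(names, importances), key=lambda pair: pair[1])

print(sorted_importances)
print(sorted_names)
```

Ascending order suits a horizontal bar chart, since Plotly draws the first category at the bottom and so the most important feature ends up at the top.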