Random Forest Guide

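scikit-learn's RandomForestClassifier accepts only numeric input, so categorical columns must be encoded before fitting. Missing values (NaN) are handled natively only in recent scikit-learn releases (1.4 and later); on older versions they have to be imputed or replaced with a sentinel value first.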
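As a rough illustration of preparing data that is not yet all numeric, here is a minimal sketch. The toy DataFrame, its column names, and the choice of -1 as the missing-value sentinel are all assumptions made for this example; they are not from the dataset used in this post.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: one numeric column with gaps, one string column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0, 52.0, np.nan, 41.0, 29.0],
    "city": ["seoul", "busan", "seoul", "daegu", "busan", "seoul", "daegu", "busan"],
    "target": [0, 1, 0, 1, 1, 0, 1, 0],
})

X = df.drop("target", axis=1)
# Replace missing values with a sentinel (-1 is an assumption; imputation also works).
X["age"] = X["age"].fillna(-1)
# One-hot encode the string column so that every feature is numeric.
X = pd.get_dummies(X, columns=["city"], dtype=int)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, df["target"])
preds = model.predict(X)
print(X.columns.tolist())
```

Filling with a sentinel keeps the code working on older scikit-learn versions too, at the cost of the model treating -1 as just another value of the feature.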

from sklearn.ensemble import RandomForestClassifier

# `train` is assumed to be a pandas DataFrame with 'id' and 'target' columns
X_train = train.drop(['id', 'target'], axis=1)
rf = RandomForestClassifier(n_estimators=150, max_depth=8, min_samples_leaf=4, max_features=0.2, n_jobs=-1, random_state=0)
rf.fit(X_train, train.target)
features = X_train.columns.values  # feature names, in column order
print("----- Training Done -----")

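This initializes the RandomForestClassifier model and sets its main parameters.

n_estimators: the number of trees; here the forest uses 150 decision trees.
max_depth: limits the maximum depth of each tree to 8.
min_samples_leaf: the minimum number of samples that must remain in a leaf node, set to 4. A split is only considered if it leaves at least 4 samples on each side.
max_features: the fraction of features considered when searching for the best split, set to 0.2. The subset is redrawn at every split, not once per tree.
n_jobs: the number of jobs to run in parallel. Setting it to -1 uses all available processors.
random_state: fixes the random seed at 0 so that results are reproducible.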
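To see what these settings do to a fitted forest, the following sketch checks them on synthetic data. The data comes from make_classification with 20 features, which is an assumption made for illustration and not the dataset used in this post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 500 rows, 20 numeric features.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

params = dict(n_estimators=150, max_depth=8, min_samples_leaf=4,
              max_features=0.2, n_jobs=-1, random_state=0)
rf_a = RandomForestClassifier(**params).fit(X, y)
rf_b = RandomForestClassifier(**params).fit(X, y)

# n_estimators: number of fitted trees in the forest.
n_trees = len(rf_a.estimators_)
# max_depth: no tree should grow deeper than the limit.
deepest = max(tree.get_depth() for tree in rf_a.estimators_)
# max_features=0.2 with 20 features resolves to 4 candidate features per split.
features_per_split = rf_a.estimators_[0].max_features_
# random_state: two fits with the same seed should give the same importances.
same_importances = np.allclose(rf_a.feature_importances_,
                               rf_b.feature_importances_)
print(n_trees, deepest, features_per_split, same_importances)
```

Lowering max_features makes the trees less correlated with each other, which is usually the reason to set it well below 1.0.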

Next, visualize the feature importances.

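# Assumed imports (not shown in the original post): plotly's offline notebook API
import plotly.graph_objs as go
import plotly.offline as py
py.init_notebook_mode(connected=True)

# Scatter plot of feature importances, one marker per feature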
trace = go.Scatter(
    y = rf.feature_importances_,
    x = features,
    mode='markers',
    marker=dict(
        sizemode = 'diameter',
        sizeref = 1,
        size = 13,
        #size= rf.feature_importances_,
        #color = np.random.randn(500), #set color equal to a variable
        color = rf.feature_importances_,
        colorscale='Portland',
        showscale=True
    ),
    text = features
)
data = [trace]

layout= go.Layout(
    autosize= True,
    title= 'Random Forest Feature Importance',
    hovermode= 'closest',
    xaxis= dict(
        ticklen= 5,
        showgrid=False,
        zeroline=False,
        showline=False
    ),
    yaxis=dict(
        title= 'Feature Importance',
        showgrid=False,
        zeroline=False,
        ticklen= 5,
        gridwidth= 2
    ),
    showlegend= False
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig,filename='scatter2010')

Plotting the features ranked by importance

# Sort (importance, feature) pairs in ascending order, then unzip into two lists
x, y = (list(t) for t in zip(*sorted(zip(rf.feature_importances_, features),
                                     reverse=False)))
trace2 = go.Bar(
    x=x ,
    y=y,
    marker=dict(
        color=x,
        colorscale = 'Viridis',
        reversescale = True
    ),
    name='Random Forest Feature importance',
    orientation='h',
)

layout = dict(
    title='Barplot of Feature importances',
    width = 900, height = 2000,
    yaxis=dict(
        showgrid=False,
        showline=False,
        showticklabels=True,
#         domain=[0, 0.85],
    ))

fig1 = go.Figure(data=[trace2])
fig1['layout'].update(layout)
py.iplot(fig1, filename='plots')
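The sorting idiom used for the bar plot can be hard to read at first, so here is a small sketch of what it produces, alongside a pandas equivalent for printing the top features as text. The importance values and feature names below are made up for illustration.

```python
import pandas as pd

# Hypothetical values standing in for rf.feature_importances_ and features.
importances = [0.10, 0.55, 0.05, 0.30]
names = ["f_a", "f_b", "f_c", "f_d"]

# Same idiom as above: pair up, sort ascending by importance, then unzip.
x, y = (list(t) for t in zip(*sorted(zip(importances, names), reverse=False)))

# Equivalent ranking with pandas, largest first, keeping only the top 2.
ranked = pd.Series(importances, index=names).sort_values(ascending=False)
top2 = ranked.head(2)

print(x)  # importances in ascending order
print(y)  # feature names in the matching order
print(top2)
```

Ascending order suits the horizontal bar chart because plotly draws the first item at the bottom, so the most important feature ends up at the top.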
