2021-10-6 (Completing the Ttareungi Project 11)

Summary

Ensemble of tree-based models

After experimenting with several models, the tree-based models turned out to perform comparatively well. So today I tried building an ensemble out of them.

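import numpy as np
import lightgbm as lgb
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Candidate models; rf is commented out because it was dropped from the blend.
models = [
    ('xgboost', XGBRegressor(random_state=2021)),
    ('lgb', lgb.LGBMRegressor(random_state=2021)),
    # ('rf', RandomForestRegressor(n_jobs=-1, random_state=2021))
]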
params = {
    'xgboost': {
        "gamma": np.linspace(0,0.5,20),
        "max_depth": range(2, 6), 
        "learning_rate" : [0.001,0.01,0.1],
        "n_estimators": [80,100,150,170]
    },
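    'lgb': {
        # NOTE: "gamma" is an XGBoost parameter name; LightGBM's counterpart is
        # "min_split_gain", so this entry probably has no effect on the lgb model.
        "gamma": [0.0],
        "max_depth": range(2, 6),
        "learning_rate": [0.001, 0.01, 0.1],
        "n_estimators": [80, 100]
    },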
    'rf': {
        "max_depth": range(2, 6),
        "min_samples_split": range(2, 6),
        "min_samples_leaf": range(2, 6), 
        "n_estimators": [80,100,120],
        }
}

# X_train, X_test, Y_train, Y_test, test, x_columns and evaluate() are not
# defined in this post; they are assumed to come from earlier code in the series.
kfold = KFold(n_splits=10, shuffle=True, random_state=2021)
Ensamble_predictions = np.zeros(test.shape[0])
for model_name, model in models:
    # Tune each model with a 10-fold grid search on the training split.
    param_grid = params[model_name]
    grid = GridSearchCV(model, cv=kfold, n_jobs=-1, param_grid=param_grid, scoring='neg_mean_squared_error')
    grid = grid.fit(X_train[x_columns], Y_train)

    # Score the refit best estimator on the hold-out split.
    model = grid.best_estimator_
    valid_predictions = model.predict(X_test[x_columns])
    score = evaluate(Y_test, valid_predictions)['mse'][0]
    print("{0},score:{1},best_params_:{2} ".format(model_name, score, grid.best_params_))

    # Both remaining models get the same weight (0.5 each), so no branch is needed.
    Ensamble_predictions += model.predict(test[x_columns]) * 0.5

xgboost,score:1307.199983522324,best_params_:{'gamma': 0.0, 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 80} 
lgb,score:1160.1852960080262,best_params_:{'gamma': 0.0, 'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 80}

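The ensemble I previously submitted to Dacon also included the random forest, with weights of 0.5 for lgb and 0.25 each for xgb and rf. After submitting it, the score got worse by about 2. Because that was worse than using lgb alone, I dropped rf, the weakest of the three, and proceeded as in the code above. I probably won't be able to submit until tomorrow.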
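The weighting described above (0.5 / 0.25 / 0.25, then 0.5 / 0.5 after dropping rf) can be sketched as a small helper. This is only an illustration: `blend_predictions` is a hypothetical name that does not appear in the project code, and the arrays are made-up toy numbers rather than real predictions.

```python
import numpy as np


def blend_predictions(predictions, weights):
    """Weighted average of per-model prediction arrays.

    predictions: dict mapping model name -> array of predictions
    weights:     dict mapping model name -> non-negative weight
    Weights are normalised so they always sum to 1.
    """
    missing = set(predictions) - set(weights)
    if missing:
        raise ValueError(f"no weight given for: {sorted(missing)}")
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    first = np.asarray(next(iter(predictions.values())), dtype=float)
    blended = np.zeros_like(first)
    for name, pred in predictions.items():
        blended += np.asarray(pred, dtype=float) * (weights[name] / total)
    return blended


# Toy predictions (hypothetical values, not results from the project).
toy_preds = {
    "lgb": np.array([100.0, 200.0]),
    "xgboost": np.array([120.0, 180.0]),
    "rf": np.array([80.0, 220.0]),
}

# Three-model blend: lgb 0.5, xgboost 0.25, rf 0.25.
three_way = blend_predictions(
    toy_preds, {"lgb": 0.5, "xgboost": 0.25, "rf": 0.25}
)

# Two-model blend after dropping rf: 0.5 each.
two_way = blend_predictions(
    {name: toy_preds[name] for name in ("lgb", "xgboost")},
    {"lgb": 0.5, "xgboost": 0.5},
)
```

Keeping the weights in a dict makes it easy to drop a model or change its share without touching the training loop.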

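I haven't reached a final conclusion yet, but so far, results that combine several models, such as stacking and ensembling, have generally not produced any notable improvement. For now, using a single model looks like the better choice.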
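One way to check this without spending a submission is to score the blend itself on the hold-out split and compare it with the best single model. The sketch below is an assumption about how that could be done, not part of the project code: `scan_blend_weight` and `mse` are hypothetical helpers, and the data is synthetic.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between two 1-D arrays."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def scan_blend_weight(y_true, pred_a, pred_b, steps=11):
    """Try weights w in [0, 1] for w * pred_a + (1 - w) * pred_b.

    Returns (best_w, best_mse). Because w = 0 and w = 1 are on the grid,
    the result is never worse than the better single model on this data.
    """
    best_w, best_score = None, float("inf")
    for w in np.linspace(0.0, 1.0, steps):
        score = mse(y_true, w * pred_a + (1.0 - w) * pred_b)
        if score < best_score:
            best_w, best_score = float(w), score
    return best_w, best_score


# Synthetic stand-ins for hold-out targets and two models' predictions.
rng = np.random.default_rng(2021)
y_valid = rng.uniform(0, 300, size=200)
pred_lgb = y_valid + rng.normal(0, 30, size=200)
pred_xgb = y_valid + rng.normal(0, 35, size=200)

best_w, best_score = scan_blend_weight(y_valid, pred_lgb, pred_xgb)
single_best = min(mse(y_valid, pred_lgb), mse(y_valid, pred_xgb))
print(f"best lgb weight: {best_w:.1f}, blend mse: {best_score:.1f}, "
      f"best single mse: {single_best:.1f}")
```

If the best weight lands at 0 or 1, the blend adds nothing over a single model on that split. Note that weights picked this way are tuned on the hold-out data, so the resulting score is somewhat optimistic.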
