
How to fix "Invalid parameter min_samples_split for estimator MultiOutputClassifier"

  • David  · Tech Community  · 6 years ago

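    I get the following error when I try to run my machine learning pipeline with grid search. I'm not sure where it comes from, since the grid search parameters appear to be named correctly and their values look valid.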

    "ValueError: Invalid parameter min_samples_split for estimator MultiOutputClassifier(estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                max_depth=None, max_features='auto', max_leaf_nodes=None,
                min_impurity_decrease=0.0, min_impurity_split=None,
                min_samples_leaf=1, min_samples_split=2,
                min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
                oob_score=False, random_state=None, verbose=0,
                warm_start=False),
               n_jobs=None). Check the list of available parameters with `estimator.get_params().keys()`.
    "
    
    
    
    
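    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.model_selection import GridSearchCV
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.pipeline import FeatureUnion, Pipeline

    # tokenize and StartingVerbExtractor are my own helpers (not shown in this post)
    model = Pipeline([
        ('features', FeatureUnion([

            ('text_pipeline', Pipeline([
                ('vect', CountVectorizer(tokenizer=tokenize)),
                ('tfidf', TfidfTransformer())
            ])),

            ('starting_verb', StartingVerbExtractor())
        ])),

        ('clf', MultiOutputClassifier(RandomForestClassifier()))
    ])

    parameters = {
        'features__text_pipeline__vect__ngram_range': ((1, 1), (1, 2)),
        'features__text_pipeline__vect__max_df': (0.5, 0.75, 1.0),
        'features__text_pipeline__vect__max_features': (None, 5000, 10000),
        'clf__n_estimators': [50, 100, 200],
        'clf__min_samples_split': [2, 3, 4]
    }

    cv = GridSearchCV(model, param_grid=parameters, verbose=2, n_jobs=4)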
    
  •   David    6 years ago

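    Found the culprit. The RandomForestClassifier is wrapped in a MultiOutputClassifier, which holds it under its estimator parameter, so the forest's hyperparameters need the prefix clf__estimator__ instead of clf__. I changed the parameters to: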

    parameters = {
        'features__text_pipeline__vect__ngram_range': ((1, 1), (1, 2)),
        'features__text_pipeline__vect__max_df': (0.5, 0.75, 1.0),
        'features__text_pipeline__vect__max_features': (None, 5000, 10000),
        'clf__estimator__n_estimators': [50, 100, 200],
        'clf__estimator__min_samples_split': [2, 3, 4]
    }
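
    For anyone hitting the same message, here is a minimal sketch of how the valid names can be checked before starting a grid search, following the hint at the end of the error text. It uses a reduced, hypothetical pipeline that keeps only the clf step (the features step is left out so the snippet runs on its own), and it assumes that grid search applies each candidate through set_params, so a key that set_params rejects would fail in the search as well.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Reduced, hypothetical pipeline: only the 'clf' step from the question is kept.
pipeline = Pipeline([
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Every name returned by get_params() is a usable param_grid key.
valid_names = set(pipeline.get_params().keys())
print(sorted(name for name in valid_names if "min_samples_split" in name))

# The key from the question skips the wrapper's 'estimator' level and is rejected.
try:
    pipeline.set_params(clf__min_samples_split=3)
    rejected = False
except ValueError:
    rejected = True

# The key from the answer reaches the wrapped RandomForestClassifier.
pipeline.set_params(clf__estimator__min_samples_split=3)
forest = pipeline.named_steps["clf"].estimator
print(rejected, forest.min_samples_split)
```

    Each double underscore descends one level: clf selects the pipeline step, estimator selects the classifier held by MultiOutputClassifier, and the last part names the hyperparameter itself.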