
How to combine sklearn estimators using another estimator?

  • sds Niraj Rajbhandari  · Tech Community  · 7 years ago

    I want to train LogisticRegression and RandomForestClassifier and combine their predicted probabilities using GaussianNB:

    import numpy as np  # needed for np.transpose below
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    
    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)
    
    logit = LogisticRegression(random_state=0)
    logit.fit(X, y)
    
    randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
    randf.fit(X, y)
    
    X1 = np.transpose([logit.predict_proba(X)[:,0], randf.predict_proba(X)[:,0]])
    
    nb = GaussianNB()
    nb.fit(X1, y)
    

    How do I turn this into a Pipeline so that I can pass it to cross_validate and GridSearchCV?

    I suppose I could define my own class that implements fit and predict_proba.

    1 answer  |  as of 7 years ago
  •   sds Niraj Rajbhandari    7 years ago

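    No, there is nothing built into sklearn that does what you want without writing some custom code. You can run the base classifiers side by side with FeatureUnion and chain the whole thing with Pipeline, but you need a custom class that forwards predict_proba to the transform method.

    Something like this: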

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.pipeline import Pipeline, FeatureUnion
    
    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)
    
    # This is the custom transformer that will convert 
    # predict_proba() to pipeline friendly transform()
    # (mixin listed before BaseEstimator, the order sklearn recommends)
    class PredictProbaTransformer(TransformerMixin, BaseEstimator):
        def __init__(self, clf=None):
            self.clf = clf
    
        def fit(self, X, y):
            if self.clf is not None:
                self.clf.fit(X, y)
    
            return self
    
        def transform(self, X):
    
            if self.clf is not None:
                # Drop the 2nd column but keep 2d shape
                # because FeatureUnion wants that 
                return self.clf.predict_proba(X)[:,[0]]
    
            return X
    
        # This method is important for correct working of pipeline
        def fit_transform(self, X, y):
            return self.fit(X, y).transform(X)
    
    logit = LogisticRegression(random_state=0)
    randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
    
    pipe = Pipeline([
        ('stack', FeatureUnion([
            ('logit', PredictProbaTransformer(logit)),
            ('randf', PredictProbaTransformer(randf)),
            # You can add more classifiers with the same custom wrapper
        ])),
        ('nb', GaussianNB()),
    ])
    
    pipe.fit(X, y)
    

    Now you can simply call pipe.predict() and everything will be handled correctly.
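    Since the question asks about cross_validate and GridSearchCV, here is a minimal sketch of how the pipeline could be passed to both. It repeats the wrapper so that it runs on its own. The step names ('stack', 'logit', 'randf', 'clf') come from the code above; the grid values, the cv settings and the accuracy scoring are only illustrative assumptions, not something from the original question.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    """Wrapper from the answer: exposes clf.predict_proba() as transform()."""

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the first probability column, as a 2-D array
            return self.clf.predict_proba(X)[:, [0]]
        return X


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(LogisticRegression(random_state=0))),
        ('randf', PredictProbaTransformer(
            RandomForestClassifier(n_estimators=100, max_depth=2,
                                   random_state=0))),
    ])),
    ('nb', GaussianNB()),
])

# Cross-validate the whole stack; the base classifiers are refit on each fold.
cv_results = cross_validate(pipe, X, y, cv=5, scoring='accuracy')
print(cv_results['test_score'].mean())

# Nested parameters are addressed as <step>__<transformer>__clf__<param>.
# The candidate values below are illustrative assumptions.
param_grid = {
    'stack__logit__clf__C': [0.1, 1.0, 10.0],
    'stack__randf__clf__max_depth': [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring='accuracy')
search.fit(X, y)
print(search.best_params_)
```

    Because the wrapped classifier is an ordinary constructor parameter (clf), get_params() and set_params() can reach inside it, which is what lets GridSearchCV tune the base models through the pipeline.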

    For more information on FeatureUnion, see my other answer to a similar question.
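    As an aside that goes beyond the original answer: newer scikit-learn releases (0.22 and later, as far as I know) include sklearn.ensemble.StackingClassifier, which covers a similar use case without a custom wrapper. The sketch below assumes such a version is installed. It is not an exact equivalent of the pipeline above: the final estimator is trained on cross-validated predictions of the base models rather than on in-sample ones, and the probability columns it keeps may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

stack = StackingClassifier(
    estimators=[
        ('logit', LogisticRegression(random_state=0)),
        ('randf', RandomForestClassifier(n_estimators=100, max_depth=2,
                                         random_state=0)),
    ],
    final_estimator=GaussianNB(),   # combines the base models' outputs
    stack_method='predict_proba',   # feed probabilities to the final estimator
    cv=5,                           # folds used to build the training features
)

stack.fit(X, y)
proba = stack.predict_proba(X)

# The stacked model is a regular estimator, so it can be cross-validated too.
scores = cross_validate(stack, X, y, cv=5)
print(scores['test_score'].mean())
```

    Whether this or the FeatureUnion approach fits better depends on whether you want the out-of-fold training that StackingClassifier performs internally.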
