
How to use the output of sklearn pipeline elements

  • Saurabh Jain · Tech Community · 8 years ago

    I have three features:

    feature_one -> number of tokens in the given sentence.  
    feature_two -> number of verbs in the given sentence.  
    feature_three -> number of tokens minus number of verbs in the given sentence (feature_one - feature_two).
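Here is a minimal sketch of how the three features relate for a single sentence. The functions `compute_number_of_tokens` and `compute_number_of_verbs` are hypothetical stand-ins. Whitespace splitting and a tiny hard-coded verb list replace real tokenization and POS tagging.

```python
# Hypothetical stand-ins for the real helpers: whitespace tokenization and a
# tiny hand-written verb list. A real implementation would use a POS tagger.
TOY_VERBS = {"is", "runs", "eat", "ate", "likes"}


def compute_number_of_tokens(sentence: str) -> int:
    # feature_one: number of whitespace-separated tokens
    return len(sentence.split())


def compute_number_of_verbs(sentence: str) -> int:
    # feature_two: number of tokens that appear in the toy verb list
    return sum(1 for token in sentence.lower().split() if token in TOY_VERBS)


def sentence_features(sentence: str) -> tuple:
    one = compute_number_of_tokens(sentence)
    two = compute_number_of_verbs(sentence)
    three = one - two  # feature_three = feature_one - feature_two
    return one, two, three


print(sentence_features("The dog runs and the cat ate"))  # (7, 2, 5)
```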
    

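    I have written custom transformers for feature_one and feature_two. I would like to run the pipeline below so that feature_three is computed from the outputs of feature_one and feature_two: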

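    from sklearn.pipeline import FeatureUnion, Pipeline

    Pipeline([
        # The input to feature_one and feature_two is a list of sentences.
        ("feature", FeatureUnion([
            ("feature_one", feature_one_transformer()),
            ("feature_two", feature_two_transformer()),
        ])),

        # feature_three should consume the output of the FeatureUnion above.
        ("feature_three", feature_three_transformer()),
    ])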
    

    feature_one_transformer and feature_two_transformer:

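    import pandas
    from sklearn.base import BaseEstimator, TransformerMixin


    class feature_one_transformer(BaseEstimator, TransformerMixin):

        def __init__(self):
            pass

        def fit(self, x, y=None):
            return self

        def transform(self, sentence_list):
            number_of_tokens_in_sentence_list = list()

            for sentence in sentence_list:
                # compute_number_of_tokens is the asker's own helper (not shown)
                number_of_tokens = compute_number_of_tokens(sentence)

                number_of_tokens_in_sentence_list.append(number_of_tokens)

            # One column per transformer; FeatureUnion stacks them side by side.
            return pandas.DataFrame(number_of_tokens_in_sentence_list)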
    

    class feature_two_transformer(BaseEstimator, TransformerMixin):
        def __init__(self):
            pass

        def fit(self, x, y=None):
            return self

        def transform(self, sentence_list):
            number_of_verbs_in_sentence_list = list()

            for sentence in sentence_list:
                # compute_number_of_verbs_in_sentence is the asker's own helper (not shown)
                number_of_verbs = compute_number_of_verbs_in_sentence(sentence)

                number_of_verbs_in_sentence_list.append(number_of_verbs)

            return pandas.DataFrame(number_of_verbs_in_sentence_list)
    

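    Can someone tell me how to write the custom transformer for feature_three? I also need to know how to use it in the pipeline so that it uses the results of the feature_one and feature_two transformers. Thanks a lot.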
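Here is one way to do this, sketched under some assumptions. FeatureUnion runs feature_one and feature_two on the same list of sentences and concatenates their outputs column by column. The next Pipeline step therefore receives a single 2-D array: column 0 holds the token count and column 1 holds the verb count. A feature_three transformer can subtract those columns and append the result. In this sketch, `count_tokens` and `count_toy_verbs` are hypothetical stand-ins for feature_one_transformer and feature_two_transformer, each wrapped in FunctionTransformer. The verb list is a toy, not a real POS tagger.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


class feature_three_transformer(BaseEstimator, TransformerMixin):
    """Append feature_one - feature_two as a third column.

    Assumes the input is the FeatureUnion output:
    column 0 = number of tokens, column 1 = number of verbs.
    """

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = np.asarray(X)
        difference = X[:, [0]] - X[:, [1]]  # keeps shape (n_samples, 1)
        return np.hstack([X, difference])


# Hypothetical stand-ins for the asker's feature_one / feature_two transformers
def count_tokens(sentences):
    return np.array([[len(s.split())] for s in sentences])


def count_toy_verbs(sentences):
    verbs = {"runs", "ate"}  # toy verb list, not a real POS tagger
    return np.array([[sum(t in verbs for t in s.split())] for s in sentences])


pipeline = Pipeline([
    ("feature", FeatureUnion([
        ("feature_one", FunctionTransformer(count_tokens)),
        ("feature_two", FunctionTransformer(count_toy_verbs)),
    ])),
    ("feature_three", feature_three_transformer()),
])

features = pipeline.fit_transform(["the dog runs", "the cat ate the fish"])
print(features)
# [[3 1 2]
#  [5 1 4]]
```

If only feature_three should reach the next estimator, return `difference` instead of the stacked array. The column positions follow the order of the entries in the FeatureUnion list, so reordering them changes which column is subtracted from which.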

  • Oriol Mirosa · 8 years ago

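    from sklearn.base import BaseEstimator, TransformerMixin


    class features_transformer(BaseEstimator, TransformerMixin):
        """Add all three features as DataFrame columns in a single step."""

        def __init__(self, variable):
            # Name of the column that holds the sentences
            self.variable = variable

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            X = X.copy()  # avoid mutating the caller's DataFrame
            X['number_of_tokens'] = X[self.variable].apply(compute_number_of_tokens)
            X['number_of_verbs'] = X[self.variable].apply(compute_number_of_verbs)
            # feature_three is derived directly from the two columns computed above
            X['tokens_minus_verbs'] = X['number_of_tokens'] - X['number_of_verbs']

            return X

    # X is a pandas DataFrame with a 'sentences' column
    new_X = features_transformer('sentences').fit_transform(X)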
    