
Displaying the names of one-hot encoded variables alongside feature importances

  •  1
  • Walter U.  · Tech Community  · 8 years ago

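    After training and validating my algorithm, how do I correctly display the names of the one-hot encoded features? I would like to neatly display each feature's name together with its importance. Here is what I have tried: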

    Displaying the feature importances:

    grid_search.best_estimator_.feature_importances_
    array([  7.67359589e-02,   7.20731884e-02,   4.38667330e-02,
             1.69222269e-02,   1.51816327e-02,   1.66947835e-02,
             1.56858183e-02,   3.43347923e-01,   5.95555727e-02,
             7.65422356e-02,   1.11224727e-01,   1.02677088e-02,
             1.32720377e-01,   1.06447326e-04,   4.45207929e-03,
             4.62258699e-03])
    

    Getting the one-hot category names:

    cat_one_hot_attribs = list(encoder.classes_)
    print(cat_one_hot_attribs)
    ['<1H OCEAN', 'INLAND', 'ISLAND', 'NEAR BAY', 'NEAR OCEAN']
    

    Getting the remaining names (the other columns):

    num_attribs = list(X_train)
    
    ['longitude',
     'latitude',
     'housing_median_age',
     'total_rooms',
     'total_bedrooms',
     'population',
     'households',
     'median_income',
     'rooms_per_household',
     'bedrooms_per_household',
     'population_per_household',
     0,
     1,
     2,
     3,
     4]
    

    Now I do the following:

    attributes = num_attribs + cat_one_hot_attribs
    
    feature_importance = grid_search.best_estimator_.feature_importances_
    print(pd.DataFrame(sorted(zip(feature_importance, attributes), reverse=True)))
    

    But I get the following:

             0                         1
    0   0.343348             median_income
    1   0.132720                         1
    2   0.111225  population_per_household
    3   0.076736                 longitude
    4   0.076542    bedrooms_per_household
    5   0.072073                  latitude
    6   0.059556       rooms_per_household
    7   0.043867        housing_median_age
    8   0.016922               total_rooms
    9   0.016695                population
    10  0.015686                households
    11  0.015182            total_bedrooms
    12  0.010268                         0
    13  0.004623                         4
    14  0.004452                         3
    15  0.000106                         2
    

    Can anyone suggest a way to display this correctly? Many thanks.

    Edit:

    Following @cásááá's answer, I tried the following:

    feature_importance = grid_search.best_estimator_.feature_importances_
    
    cat_one_hot_attribs = list(encoder.classes_)
    
    num_attribs = list(X_train)
    attributes = num_attribs + cat_one_hot_attribs
    
    vals = sorted(zip(feature_importance, attributes), key=lambda x: x[0], reverse=True)
    df = pd.DataFrame(vals)
    print(df)
    

    1 answer  |  last active 7 years ago
  •  2
  •   cs95 abhishek58g    8 years ago

    Break it down. First, sort with a key, making sure that only feature_importance is considered when ordering.

    Setup:

    import pandas as pd
    import numpy as np
    
    feature_importance = np.array([  7.67359589e-02,   7.20731884e-02,   4.38667330e-02,
         1.69222269e-02,   1.51816327e-02,   1.66947835e-02,
         1.56858183e-02,   3.43347923e-01,   5.95555727e-02,
         7.65422356e-02,   1.11224727e-01,   1.02677088e-02,
         1.32720377e-01,   1.06447326e-04,   4.45207929e-03,
         4.62258699e-03])
    
    cat_one_hot_attribs = ['<1H OCEAN', 'INLAND', 'ISLAND', 'NEAR BAY', 'NEAR OCEAN']
    
    num_attribs = ['longitude',
     'latitude',
     'housing_median_age',
     'total_rooms',
     'total_bedrooms',
     'population',
     'households',
     'median_income',
     'rooms_per_household',
     'bedrooms_per_household',
     'population_per_household',
     0,
     1,
     2,
     3,
     4]
    
    attributes = num_attribs
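
Note that attributes here is just num_attribs, without cat_one_hot_attribs appended. A minimal sketch of why that matters, using short made-up lists in place of the real ones: zip stops at its shortest input, so any names appended beyond the number of importances are silently dropped.

```python
# Hypothetical, shortened stand-ins for the real lists.
importances = [0.5, 0.3, 0.2]        # one value per model input column
names = ['median_income', 0, 1]      # already one name per column
extra = ['<1H OCEAN', 'INLAND']      # category labels appended at the end

pairs = list(zip(importances, names + extra))

# zip stops at the shortest input, so the appended labels never appear.
print(len(names + extra))
print(len(pairs))
print(pairs)
```

This is consistent with the output in the question, where the integer labels 0 to 4 show up and none of the category names do.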
    

    Build vals, sorted by feature_importance:

    vals = sorted(zip(feature_importance, attributes), key=lambda x: x[0], reverse=True)
    df = pd.DataFrame(vals)
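
The key argument matters because, without it, sorted compares whole (importance, name) tuples. If two importances were ever equal, Python 3 would fall back to comparing the names and fail on a str versus int pair. A small sketch with made-up values that include a deliberate tie:

```python
# Made-up values: two equal importances, one with a string name and one with an integer name.
pairs = [(0.25, 'longitude'), (0.25, 0), (0.5, 'median_income')]

try:
    sorted(pairs, reverse=True)  # the tie falls through to comparing 'longitude' with 0
    tie_failed = False
except TypeError:
    tie_failed = True

# Sorting on the importance alone never looks at the names.
by_importance = sorted(pairs, key=lambda x: x[0], reverse=True)
print(tie_failed)
print(by_importance)
```

With the data in the question there are no ties, so both forms give the same order; the key simply makes the sort robust.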
    

    Then use .replace to swap the integer codes for the corresponding values in cat_one_hot_attribs:

    df.iloc[:, -1] = df.iloc[:, -1].replace({i : k for i, k in enumerate(cat_one_hot_attribs)})
    df
    
               0                         1
    0   0.343348             median_income
    1   0.132720                    INLAND
    2   0.111225  population_per_household
    3   0.076736                 longitude
    4   0.076542    bedrooms_per_household
    5   0.072073                  latitude
    6   0.059556       rooms_per_household
    7   0.043867        housing_median_age
    8   0.016922               total_rooms
    9   0.016695                population
    10  0.015686                households
    11  0.015182            total_bedrooms
    12  0.010268                 <1H OCEAN
    13  0.004623                NEAR OCEAN
    14  0.004452                  NEAR BAY
    15  0.000106                    ISLAND
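
An alternative sketch, not part of the answer above: substitute the category names for the integer column labels before pairing, then let pandas do the sorting. It relies on the same assumption as the .replace step, namely that integer label i corresponds to cat_one_hot_attribs[i], and the short lists are illustrative stand-ins for the real data.

```python
import pandas as pd

# Illustrative stand-ins; in the question these come from the fitted model and encoder.
feature_importance = [0.34, 0.13, 0.01, 0.08]
num_attribs = ['median_income', 1, 0, 'longitude']
cat_one_hot_attribs = ['<1H OCEAN', 'INLAND', 'ISLAND', 'NEAR BAY', 'NEAR OCEAN']

# Assumption: integer column label i corresponds to cat_one_hot_attribs[i].
labels = [cat_one_hot_attribs[a] if isinstance(a, int) else a for a in num_attribs]

# A Series indexed by feature name sorts and prints cleanly.
importances = pd.Series(feature_importance, index=labels).sort_values(ascending=False)
print(importances)
```

Indexing by name also allows direct lookups such as importances['INLAND'].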