KeyError: "['Value' 'flag'] not in index"

  • Pedro Pereira da Silva  · 7 years ago

    I took the following code and used a .csv file to process, in order to extract 2 variables: INDICATOR and SUBJECT.

    # These are the variables: LOCATION INDICATOR SUBJECT MEASURE FREQUENCY TIME Value Flag

    This is the link to the .csv file: https://ufile.io/2wo1j

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    from sklearn.cluster import KMeans
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import silhouette_score , silhouette_samples
    from sklearn.metrics import adjusted_rand_score
    from sklearn.decomposition import PCA
    
    import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
    warnings.filterwarnings("ignore")
    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import numpy as np
    sns.set(style="white", color_codes=True)
    
    # Load the dataset
    health = pd.read_csv("saude.csv")
    # Dataset information
    print("Registos Iniciais:")
    print (health.head(2))
    print("Registos Finais:")
    print(health.tail(2))
    p = health.reindex(columns=['LOCATION', 'INDICATOR', 'SUBJECT', 'MEASURE', 'FREQUENT', 'TIME', 'Value', 'flag'])
    #print(iris.species.value_counts())
    
    p.dropna(axis=1, how='all', inplace=True)
    health_matrix = pd.DataFrame.as_matrix(p[['INDICATOR','SUBJECT']])
    for n_clusters in range(2,11):
        cluster_model = KMeans(n_clusters=n_clusters, random_state=10)
        cluster_labels = cluster_model.fit_predict(health_matrix)
        silhouette_avg = silhouette_score(health_matrix, cluster_labels, metric='euclidean')
        adj_rand_score = adjusted_rand_score(health['LOCATION'],cluster_labels)
        print("Para o nr de clusters =", n_clusters,
              "A Média da silhueta é:", silhouette_avg)
        print ("Para o nr de clusters =", n_clusters, 
          "O rand_score ajustado é:", adj_rand_score)   
    

    It then gives the following error:

    KeyError: "['INDICATOR','SUBJECT'] not in index"
    
    1 answer
  • Has QUIT--Anony-Mousse  · 7 years ago

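    p[['INDICATOR','SUBJECT']] does not do what you want. It does not select two columns, but one column whose name is the array ['INDICATOR','SUBJECT']. Since that column does not exist, you get the error.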
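
    For what it's worth, indexing a pandas DataFrame with a list of names looks up each name as its own column, and a KeyError is raised for names that are not present. Below is a minimal sketch of one way the two columns could go missing in the question's code. It assumes a made-up stand-in for saude.csv (the real header is not shown in this thread, so the lower-case `indicator` column is hypothetical): `reindex(columns=...)` creates all-NaN columns for names it cannot find, and `dropna(axis=1, how='all')` then drops them again.

```python
import pandas as pd

# Hypothetical stand-in for saude.csv: this header is an assumption,
# chosen so that 'INDICATOR' and 'SUBJECT' are NOT among the real columns.
health = pd.DataFrame({
    "LOCATION": ["AUS", "AUT", "BEL"],
    "indicator": ["A", "B", "A"],  # lower-case on purpose
    "Value": [1.0, 2.0, 3.0],
})

wanted = ["LOCATION", "INDICATOR", "SUBJECT", "Value"]

# reindex() does not fail on unknown names; it adds them as all-NaN columns.
p = health.reindex(columns=wanted)

# Dropping all-NaN columns silently removes the names that were not found.
p = p.dropna(axis=1, how="all")

# Check which requested columns are really available before selecting them.
missing = [name for name in ["INDICATOR", "SUBJECT"] if name not in p.columns]
print("columns after dropna:", list(p.columns))
print("missing:", missing)

# Selecting the missing names reproduces a KeyError.
try:
    p[["INDICATOR", "SUBJECT"]]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

    If `missing` is non-empty on the real file, comparing `health.columns` with the names passed to `reindex` (spelling, case, stray spaces, delimiter) would be a reasonable first thing to check.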
