
How can I efficiently handle missing data in a pandas DataFrame and compute conditional statistics?

  armstrong3701 · Tech Community · 10 months ago

    I have a pandas DataFrame df that contains missing data in several columns. It looks like this:

    ID  Age  Gender  Score
    1   25    Male    85.0
    2   30  Female    90.0
    3   22     NaN    78.0
    4   27    Male     NaN
    5   21  Female    80.0
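Before computing any statistics, it can help to see exactly where the missing data sits. A minimal sketch, assuming the table above is rebuilt by hand with np.nan marking the empty cells (the names missing_per_column and rows_with_missing are illustrative only):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; np.nan marks the missing cells.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Age': [25, 30, 22, 27, 21],
    'Gender': ['Male', 'Female', np.nan, 'Male', 'Female'],
    'Score': [85.0, 90.0, 78.0, np.nan, 80.0],
})

# isna() gives a boolean frame; summing it counts missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows that have at least one missing value in any column.
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

Here Gender and Score each have one missing value (IDs 3 and 4), while ID and Age are complete, which matters for the grouping below: Age, the column being binned, has no gaps.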
    

    How can I compute the mean Score for each of these age ranges: ['0-20', '21-25', '26-30', '31-40', '41+']?

    yashaswi k · 10 months ago
    import pandas as pd
    
    data = {'ID': [1, 2, 3, 4, 5],
        'Age': [25, 30, 22, 27, 21],
        'Gender': ['Male', 'Female', None, 'Male', 'Female'],
        'Score': [85.0, 90.0, 78.0, None, 80.0]
    }
    
    df = pd.DataFrame(data)
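    # Bin edges are right-closed: (0, 20], (20, 25], (25, 30], (30, 40], (40, inf]
    age_bins = [0, 20, 25, 30, 40, float('inf')]
    age_labels = ['0-20', '21-25', '26-30', '31-40', '41+']
    
    # include_lowest=True also closes the first bin on the left, so an age of 0 lands in '0-20'
    df['AgeRange'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, include_lowest=True)
    
    # observed=False keeps empty age ranges in the result (their mean is NaN);
    # mean() ignores the missing Score of ID 4
    mean_score_by_age_range = df.groupby('AgeRange', observed=False)['Score'].mean()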
    
    print(mean_score_by_age_range)
    

    Output:

    AgeRange
    0-20      NaN
    21-25    81.0
    26-30    90.0
    31-40     NaN
    41+       NaN
    Name: Score, dtype: float64
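The result above relies on pandas' groupby mean() excluding NaN: the 26-30 range reports 90.0 from ID 2 alone, even though ID 4 falls in that range with no Score. A minimal sketch of one way (not the answer's own method) to make that skipped data visible is to aggregate count (non-missing values) alongside size (all rows). It also groups by Gender with dropna=False (available since pandas 1.1) so the row with a missing Gender is kept as its own group. The names stats, missing and by_gender are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Age': [25, 30, 22, 27, 21],
    'Gender': ['Male', 'Female', np.nan, 'Male', 'Female'],
    'Score': [85.0, 90.0, 78.0, np.nan, 80.0],
})

age_bins = [0, 20, 25, 30, 40, float('inf')]
age_labels = ['0-20', '21-25', '26-30', '31-40', '41+']
df['AgeRange'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels,
                        include_lowest=True)

# 'size' counts every row in a group, 'count' only the non-missing Scores,
# so size - count is how many scores mean() silently skipped.
stats = df.groupby('AgeRange', observed=False)['Score'].agg(['mean', 'count', 'size'])
stats['missing'] = stats['size'] - stats['count']
print(stats)

# Grouping by a column that itself contains NaN drops those rows by default;
# dropna=False keeps the missing Gender as a separate group.
by_gender = df.groupby('Gender', dropna=False)['Score'].mean()
print(by_gender)
```

With this data, the 26-30 row should show a count of 1 and a size of 2, flagging that its mean is based on one of two people; whether to accept that, drop such groups, or fill the missing Score (for example with fillna) depends on the analysis.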