代码之家  ›  专栏  ›  技术社区  ›  Hariom Singh

熊猫从excel创建合并报表

  •  2
  • Hariom Singh  · 技术社区  · 7 年前

    我有一个excel文件,详细如下。我试着用panda在excel中只得到前5种语言和它们的总和

    files   language    blank   comment code
    61         Java     1031     533    3959
    10         Maven    73        66    1213
    12         JSON      0         0    800
    32         XML       16       74    421
    7          HTML      14       16    161
    1      Markdown     23        0      39
    1           CSS      0        0      1
    

    下面是我的代码

    import pandas as pd
    from openpyxl import load_workbook
    
    df = pd.read_csv("myfile_cloc.csv", nrows=20)
    #df =  df.iloc[1:]
    top_five = df.head(5)
    
    print(top_five)
    print(top_five['language'])
    print(top_five['code'].sum())
    
    d = {'Languages (CLOC) (Top 5 Only)': "", 'LOC (CLOC)Only Code': 0}
    newdf = pd.DataFrame(data=d)
    
    newdf['Languages (CLOC) (Top 5 Only)'] = str(top_five['language'])
    newdf['LOC (CLOC)Only Code'] = top_five['code'].sum()
    
    #Load excel to append the consolidated info
    writer = newdf.ExcelWriter("myfile_cloc.xlsx", engine='openpyxl')
    book = load_workbook('myfile_cloc.xlsx')
    
    writer.book = book
    newdf.to_excel(writer, sheet_name='top_five', index=False)
    writer.save()
    

    在这方面需要建议

    newdf['Languages (CLOC) (Top 5 Only)'] = str(top_five['language'])
    newdf['LOC (CLOC)Only Code'] = top_five['code'].sum()
    

    所以预期的产出可以是

    Languages (CLOC) (Top 5 Only)  LOC (CLOC)Only Code
    Java,Maven,JSON,XML,HTML          6554
    

    正在获取错误

    raise ValueError('If using all scalar values, you must pass'
    ValueError: If using all scalar values, you must pass an index
    
    2 回复  |  直到 7 年前
        1
  •  1
  •   Mohamed Thasin ah    7 年前

    试试这个,

    解决这个问题的方法之一 index 属性

    a=df.head()
    df=pd.DataFrame({"Languages (CLOC) (Top 5 Only)": ','.join(a['language'].unique()),"LOC (CLOC)Only Code":a['code'].sum()},index=range(1))
    

    另一种解决办法, 使用 from_records 并在数据帧中传递dict列表。

    df=pd.DataFrame.from_records([{"Languages (CLOC) (Top 5 Only)": ','.join(a['language'].unique()),"LOC (CLOC)Only Code":a['code'].sum()}])
    

    输出:

        Languages (CLOC) (Top 5 Only)    LOC (CLOC)Only Code
    0        Java,Maven,JSON,XML,HTML                   6554
    
        2
  •  0
  •   Yog    7 年前
     import pandas as pd
    sheet1 = pd.read_csv("/home/mycomputer/Desktop/practise/sorting_practise.csv")
    sheet1.head()
    sortby_blank=sheet1.sort_values('blank',ascending=False)
    sortby_blank['blank'].head(5).sum()
    values = sortby_blank['blank'].head(5).sum()
    

    /home/nptel/Desktop/practice/sorting_practice.csv--->文件目录

    空白--->要排序的列

    如果需要底部值,请使用.tail()。

    “values”变量将得到您正在寻找的答案