
How do I create lists of records grouped by index in pandas?

  •  1
  • erip Jigar Trivedi  · 7 years ago

    I have a CSV of records:

    name,credits,email
    bob,,test1@foo.com
    bob,6.0,test@foo.com
    bill,3.0,something_else@a.com
    bill,4.0,something@a.com
    tammy,5.0,hello@gmail.org
    

    I would like to group the records by `name`, so that each name maps to a list of its remaining fields:

    {
      "bob": [
          { "credits": null, "email": "test1@foo.com"},
          { "credits": 6.0, "email": "test@foo.com" }
      ], 
      // ...
    }
    

    My current solution feels clumsy, because it seems to use pandas only as a tool for reading the CSV, but it does produce the JSON-like output I expect:

    #!/usr/bin/env python3
    
    import io
    import pandas as pd
    from pprint import pprint
    from collections import defaultdict
    
    def read_data():
        s = """name,credits,email
    bob,,test1@foo.com
    bob,6.0,test@foo.com
    bill,3.0,something_else@a.com
    bill,4.0,something@a.com
    tammy,5.0,hello@gmail.org
    """
    
        data = io.StringIO(s)
        return pd.read_csv(data)
    
    if __name__ == "__main__":
        df = read_data()
        columns = df.columns
        index_name = "name"
        print(df.head())
    
        records = defaultdict(list)
    
        name_index = list(columns.values).index(index_name)
        columns_without_index = [column for i, column in enumerate(columns) if i != name_index]
    
        for record in df.values:
            name = record[name_index]
            record_without_index = [field for i, field in enumerate(record) if i != name_index]
            remaining_record = {k: v for k, v in zip(columns_without_index, record_without_index)}
            records[name].append(remaining_record)
        pprint(dict(records))
    

    Is there a way to do the same thing natively in pandas (and numpy)?

    1 answer  |  7 years ago
        1
  •  4
  •   MaxU - stand with Ukraine    7 years ago

    Is this what you want?

    cols = df.columns.drop('name').tolist()
    

    Or, as @jezrael suggested:

    cols = df.columns.difference(['name']) 
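
    The two ways of selecting the non-key columns are not quite interchangeable: `Index.drop` keeps the original column order, while `Index.difference` returns the remaining labels sorted by default. For the CSV in the question both give the same result, because `credits` and `email` are already in alphabetical order. A minimal sketch with a hypothetical frame whose columns are ordered differently shows the distinction:

```python
import pandas as pd

# Hypothetical empty frame; only the column order matters here.
df = pd.DataFrame(columns=["name", "email", "credits"])

# drop() removes the label and preserves the order of the rest.
kept_order = df.columns.drop("name").tolist()

# difference() is a set operation and sorts its result by default.
sorted_order = df.columns.difference(["name"]).tolist()

print(kept_order)
print(sorted_order)
```

    If the field order inside each record matters, `drop` is probably the safer choice.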
    

    Then:

    s = df.groupby('name')[cols].apply(lambda x: x.to_dict('records')).to_json()
    

    Let's pretty-print it:

    In [45]: print(json.dumps(json.loads(s), indent=2))
    {
      "bill": [
        {
          "credits": 3.0,
          "email": "something_else@a.com"
        },
        {
          "credits": 4.0,
          "email": "something@a.com"
        }
      ],
      "bob": [
        {
          "credits": null,
          "email": "test1@foo.com"
        },
        {
          "credits": 6.0,
          "email": "test@foo.com"
        }
      ],
      "tammy": [
        {
          "credits": 5.0,
          "email": "hello@gmail.org"
        }
      ]
    }
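
    If a plain Python dict is wanted rather than a JSON string, one possible variation is to iterate over the groups directly. This is only a sketch, not the approach used in the answer above: it round-trips each group through `DataFrame.to_json(orient="records")` so that missing values come back as `None` instead of `NaN`. The names `CSV` and `records` are illustrative.

```python
import io
import json

import pandas as pd

CSV = """name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org
"""

df = pd.read_csv(io.StringIO(CSV))

# One entry per name; each value is the list of that name's rows
# with the grouping column removed.
records = {
    name: json.loads(group.drop(columns="name").to_json(orient="records"))
    for name, group in df.groupby("name")
}

print(json.dumps(records, indent=2))
```

    Going through `to_json` costs an extra serialization per group, but it keeps the result valid JSON, since `json.dumps` would otherwise emit a bare `NaN` for the missing `credits` value.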