代码之家  ›  专栏  ›  技术社区  ›  kits

将嵌套Json转换为多个Csv文件

  •  0
  • kits  · 技术社区  · 1 年前

    我有一个使用api的json pull,下面是文件的样子:

    [
        {
            "id": 181,
            "emp_number": "527",
            "clock_id": "124",
            "organization_trackings": [
                {
                    "title": "Division",
                    "value": "200",
                    "value_description": "Cons"
                },
                {
                    "title": "Location",
                    "value": "951",
                    "value_description": "Jasp"
                },
                {
                    "title": "Special",
                    "value": "20",
                    "value_description": "Remote"
                },
                {
                    "title": "Primary Department",
                    "value": "200",
                    "value_description": "DPT1"
                },
                {
                    "title": "Secondary Department",
                    "value": "2000",
                    "value_description": "DH"
                },
                {
                    "title": "Function",
                    "value": "0000",
                    "value_description": "Resource"
                }
            ]
        },
        {
            "id": 181837,
            "emp_number": "649",
            "timeclock_id": "528",
            "organization_trackings": [
                {
                    "title": "Division",
                    "value": "200",
                    "value_description": "Consultant"
                },
                {
                    "title": "Location",
                    "value": "001",
                    "value_description": "Atlanta"
                },
                {
                    "title": "Function",
                    "value": "0000",
                    "value_description": "Resource"
                }
            ]
        }
    ]
    

    我需要将其转换为2个csv文件

    first csv fileis:  
    id  emp_number  clock_id
    181 527 124
    181837  649 528
    

    第二个csv文件:

    emp_number,title,value,value_description
    527,Location,951,Jasp
    527,Special,20,Remote
    527,Primary Department,200,DPT1
    527,Secondary Department,2000,DH
    527,Function,0,Resource
    649,Division,200,Consultant
    649,Location,1,Atlanta
    649,Function,0,Resource
    

    我是python的新手,有人能告诉我如何在python中做到这一点吗。。?

    2 回复  |  直到 1 年前
        1
  •  0
  •   Andrej Kesely    1 年前

    我猜是关键 "timeclock_id": "528" 是一个拼写错误,应该是 "clock_id": "528"

    尝试:

    import json
    
    with open("your_data.json", "r") as f_in:
        data = json.load(f_in)
    
    df1 = pd.DataFrame(data)
    df2 = pd.DataFrame(
        [
            {"emp_number": d["emp_number"], **dd}
            for d in data
            for dd in d["organization_trackings"]
        ]
    )
    
    df1.pop("organization_trackings")
    
    print(df1)
    print()
    print(df2)
    
    # df1.to_csv('df1.csv', index=False)
    # df2.to_csv('df2.csv', index=False)
    

    打印:

           id emp_number clock_id
    0     181        527      124
    1  181837        649      528
    
      emp_number                 title value value_description
    0        527              Division   200              Cons
    1        527              Location   951              Jasp
    2        527               Special    20            Remote
    3        527    Primary Department   200              DPT1
    4        527  Secondary Department  2000                DH
    5        527              Function  0000          Resource
    6        649              Division   200        Consultant
    7        649              Location   001           Atlanta
    8        649              Function  0000          Resource
    
        2
  •  0
  •   richard    1 年前

    这个检查 clock_id 如果失败,检查 timeclock_id
    工作样品: https://onlinegdb.com/WaR32RLl2

    from io import StringIO
    import csv, json
    
    json_file = """\
    [ { "id": 181, "emp_number": "527", "clock_id": "124", "organization_trackings": [ { "title": "Division", "value": "200", "value_description": "Cons" }, { "title": "Location", "value": "951", "value_description": "Jasp" }, { "title": "Special", "value": "20", "value_description": "Remote" }, { "title": "Primary Department", "value": "200", "value_description": "DPT1" }, { "title": "Secondary Department", "value": "2000", "value_description": "DH" }, { "title": "Function", "value": "0000", "value_description": "Resource" } ] }, { "id": 181837, "emp_number": "649", "timeclock_id": "528", "organization_trackings": [ { "title": "Division", "value": "200", "value_description": "Consultant" }, { "title": "Location", "value": "001", "value_description": "Atlanta" }, { "title": "Function", "value": "0000", "value_description": "Resource" } ] } ]
    """
    
    with (
        StringIO(json_file) as file,  # replace with open(FILE_PATH, 'r')
        open('csv_file1.csv', 'w', newline='') as f1,
        open('csv_file2.csv', 'w', newline='') as f2):
        data = json.load(file)
        w1, w2 = csv.writer(f1), csv.writer(f2)
        w1.writerow(['id', 'emp_number', 'clock_id'])
        w2.writerow(['emp_number', 'title', 'value', 'value_description'])
        for i in data:
            _id, emp_number, clock_id = i.get('id'), i.get('emp_number'), i.get('clock_id', i.get('timeclock_id'))
            orgs = [[emp_number] + list(j.values()) for j in i['organization_trackings']]
            w1.writerow([_id, emp_number, clock_id])
            w2.writerows(orgs)