
Convert a JSON file to a CSV file with more columns - Python

  •  -1
  • James Black  · Tech Community  · 1 year ago

    I want to convert the following JSON file:

        [
            {
                "userid": "5275800381",
                "status": "UserStatus.RECENTLY",
                "name": "Ah",
                "bot": false,
                "username": "None"
            },
            {
                "userid": "5824657725",
                "status": "UserStatus.LAST_MONTH",
                "name": "A45",
                "bot": false,
                "username": "None"
            },
            {
                "userid": "5160075986",
                "status": "UserStatus.RECENTLY",
                "name": "CTLA",
                "bot": false,
                "username": "james888"
            }
        ]
    

    into a CSV file that has more columns and no duplicate rows, with the following header:

        username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
    

    The output file should be:

        username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
        ,5275800381,False,False,False,False,False,False,False,False,False,False
        ,5824657725,False,False,False,False,False,False,False,False,False,False
        james888,5160075986,False,False,False,False,False,False,False,False,False,False
    

    I tried the following code:

        import json
    
        with open('target_user2.json', 'r', encoding='utf-8') as fp:
            target = json.load(fp) #this file contains the json
    
        with open('members2.csv', 'w', encoding='utf-8') as nf:    # target_userid2.txt or target_userid2.json
            nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
            for item in target:
                if item['userid'] in [x['userid'] for x in target]:
                    if item['username'] == "None":
                        item['username'] == ""
                        record = item['username'] + ',' + item['userid'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                        nf.write(json.dumps(record).replace('"', '') + '\n')       # write data without ""
    

    It doesn't work: item['user id'] (the key written with a space) raises an error, whereas item['userid'] works.

    How can I fix this?

    Following Barmar's suggestion, I updated the code as follows:

        import json
    
        with open('target_user2.json', 'r', encoding='utf-8') as fp:
            target = json.load(fp) #this file contains the json
    
        with open('members2.csv', 'w', encoding='utf-8') as nf:    # target_userid2.txt or target_userid2.json
            nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
            for item in target:
                if item['userid'] in [x['userid'] for x in target]:
                    if item['username'] == "None":
                        item['username'] == ""
                    record = item['username'] + ',' + item['userid'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                    nf.write(json.dumps(record).replace('"', '') + '\n')       # write data without ""
    
                    
    

    However, another, unintended problem has come up. When username is "None" in target_user2.json, I set it to an empty string:

        item['username'] == ""           
            
    

    It doesn't work; every username is written with the same value it has in target_user2.json.

    What is going on here?

    3 Answers  |  Latest 1 year ago
        1
  •  0
  •   Mark Tolonen    1 year ago

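    Use the csv module to handle CSV formatting correctly, for example fields that contain quotes or the delimiter (a comma in this case). csv.DictWriter also helps manage the columns, and its restval parameter supplies the default value False for any missing field: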

    import csv
    import json
    
    with open('input.json') as f:
        data = json.load(f)
    
    # newline='' is an open requirement for writing csv files per documentation.
    with open('output.csv', 'w', newline='', encoding='utf8') as f:
        columns = 'username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated'.split(',')
        writer = csv.DictWriter(f, fieldnames=columns, restval=False)  # restval is the default value of fields
        writer.writeheader()
        ids = set()  # Track unique IDs
        for user in data:
            user_id = user['userid']
            if user_id not in ids:
                # Build a dict with fields to change.
                row = {'username': user['username'] if user['username'] != 'None' else '',
                       'user id': user_id,
                       #'name': user['name'],  # Always False in example, but if you want it...
                       'is_bot': user['bot']}
                writer.writerow(row)
                ids.add(user_id)
    

    Output:

    username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
    ,5275800381,False,False,False,False,False,False,False,False,False,False
    ,5824657725,False,False,False,False,False,False,False,False,False,False
    james888,5160075986,False,False,False,False,False,False,False,False,False,False
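    To illustrate why the csv module matters for fields that contain the delimiter, here is a minimal sketch. The helper names join_manually and join_with_csv and the sample row with a comma in the name are assumptions for illustration only; they do not come from the question's data.

```python
import csv
import io

def join_manually(fields):
    # Naive approach: plain concatenation, no quoting or escaping.
    return ','.join(fields)

def join_with_csv(fields):
    # csv.writer quotes any field that contains the delimiter or a quote.
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerow(fields)
    return buffer.getvalue().rstrip('\n')

# Hypothetical row: a name containing a comma (not from the question's data).
row = ['james888', '5160075986', 'Black, James']

manual_line = join_manually(row)
csv_line = join_with_csv(row)

# Reading each line back shows whether the three fields survived.
manual_fields = next(csv.reader([manual_line]))
csv_fields = next(csv.reader([csv_line]))
print(len(manual_fields))  # 4: the comma inside the name split the field
print(len(csv_fields))     # 3: the quoted field round-trips intact
```

    With the question's sample data both approaches give the same result, since none of the values contain a comma or a quote; the difference only shows up once such a value appears.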
    
        2
  •  0
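  •   Barmar    1 year ago
    1. Use the csv module to write the CSV file instead of formatting it yourself.

    2. Use a set to detect duplicate user IDs and skip them.

    3. Fix the logic that replaces a "None" username with an empty string.

    4. When accessing the JSON data, you must use userid as the key, not user id.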

    import csv
    import json
    
    # Load the list of user records from the question's JSON file.
    with open('target_user2.json', 'r', encoding='utf-8') as fp:
        target = json.load(fp)
    
    userids = set()  # user IDs that have already been written
    
    # newline='' lets the csv module control line endings itself.
    with open('members2.csv', 'w', newline='', encoding='utf-8') as nf:
        nf_csv = csv.writer(nf)
        nf_csv.writerow(['username', 'user id', 'access hash', 'name', 'group', 'group id', 'is_bot', 'is_admin', 'dc_id', 'have_photo', 'phone', 'elaborated'])
        for item in target:
            if item['userid'] not in userids: # prevent duplicate userids
                userids.add(item['userid'])
                if item['username'] == "None":
                    item['username'] = ""  # assignment (=), not comparison (==)
                record = [item['username'], item['userid'], 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False']
                nf_csv.writerow(record)
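
    The logic fix in point 3 comes down to the difference between == (comparison) and = (assignment). A minimal sketch, using a single record shaped like the question's JSON entries; the variable names after_comparison and after_assignment are only for illustration:

```python
# Sample record shaped like the question's JSON entries.
item = {"userid": "5275800381", "username": "None"}

# '==' only compares and yields a bool; the result is discarded here,
# so the dictionary is left unchanged.
item["username"] == ""
after_comparison = item["username"]

# '=' assigns, so the dictionary entry is actually replaced.
if item["username"] == "None":
    item["username"] = ""
after_assignment = item["username"]

print(repr(after_comparison))  # 'None'
print(repr(after_assignment))  # ''
```

    This is why the question's updated code kept writing "None" to the file: the statement item['username'] == "" is valid Python, but it has no effect.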
    
        3
  •  -1
  •   James Black    1 year ago

    With Barmar's generous help, I applied the corrections he suggested, and it works very well:

        import json
    
        with open('target_user2.json', 'r', encoding='utf-8') as fp:
            target = json.load(fp) #this file contains the json
    
        with open('members2.csv', 'w', encoding='utf-8') as nf:    # target_userid2.txt or target_userid2.json
            nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
            for item in target:
                if item['userid'] in [x['userid'] for x in target]:     # remove space between user and id to avoid error
                    if item['username'] == "None":
                        item['username'] = ""
                    record = item['username'] + ',' + item['userid'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                    nf.write(json.dumps(record).replace('"', '') + '\n')       # write data without ""
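
    One caveat about the code above: the check item['userid'] in [x['userid'] for x in target] is true for every item, because each item's own userid is always in that list, so it does not filter duplicates. The question's sample data has no repeated IDs, which is why the output still looks right. A minimal sketch, assuming a hypothetical input that repeats one userid:

```python
# Hypothetical input with one repeated userid (the question's sample has none).
target = [
    {"userid": "5275800381", "username": ""},
    {"userid": "5824657725", "username": ""},
    {"userid": "5275800381", "username": ""},
]

# Every item's userid is trivially in the list of all userids.
always_true = all(
    item["userid"] in [x["userid"] for x in target] for item in target
)

# Tracking seen IDs in a set keeps only the first occurrence of each.
seen = set()
unique = []
for item in target:
    if item["userid"] not in seen:
        seen.add(item["userid"])
        unique.append(item)

print(always_true)  # True
print(len(unique))  # 2
```

    If duplicate rows can occur in the real input, the set-based check from Barmar's answer is the safer choice.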