
Convert a JSON file to a CSV file with more columns - Python

csv json python


James Black · Tech Community · 1 year ago

I want to convert the following JSON file:

    [
        {
            "userid": "5275800381",
            "status": "UserStatus.RECENTLY",
            "name": "Ah",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "5824657725",
            "status": "UserStatus.LAST_MONTH",
            "name": "A45",
            "bot": false,
            "username": "None"
        },
        {
            "userid": "5160075986",
            "status": "UserStatus.RECENTLY",
            "name": "CTLA",
            "bot": false,
            "username": "james888"
        }
    ]

into a CSV file that has more columns and no duplicate rows, with this header:

    username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated

The output file should be:

    username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
    ,5275800381,False,False,False,False,False,False,False,False,False,False
    ,5824657725,False,False,False,False,False,False,False,False,False,False
    james888,5160075986,False,False,False,False,False,False,False,False,False,False

I tried the following code:

    import json

    with open('target_user2.json', 'r', encoding='utf-8') as fp:
        target = json.load(fp) #this file contains the json

    with open('members2.csv', 'w', encoding='utf-8') as nf:    # target_userid2.txt or target_userid2.json
        nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
        for item in target:
            if item['userid'] in [x['userid'] for x in target]:
                if item['username'] == "None":
                    item['username'] == ""
                    record = item['username'] + ',' + item['userid'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                    nf.write(json.dumps(record).replace('"', '') + '\n')       # write data without ""

It doesn't work: item['user id'] (the key written with a space) raises an error, while item['userid'] works.

How can I solve this?

Following Barmar's suggestions, I updated the code as follows:

    import json

    with open('target_user2.json', 'r', encoding='utf-8') as fp:
        target = json.load(fp) #this file contains the json

    with open('members2.csv', 'w', encoding='utf-8') as nf:    # target_userid2.txt or target_userid2.json
        nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
        for item in target:
            if item['userid'] in [x['userid'] for x in target]:
                if item['username'] == "None":
                    item['username'] == ""
                record = item['username'] + ',' + item['userid'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                nf.write(json.dumps(record).replace('"', '') + '\n')       # write data without ""

However, another, unintended problem has come up. When username is "None" in target_user2.json, I try to set it to a blank string:

    item['username'] == ""

It doesn't work; every username is written with the same value it has in target_user2.json, including "None".

What is going on?

3 replies | latest 1 year ago

Mark Tolonen 1 year ago

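Use the csv module to handle CSV formatting correctly, for example fields that contain quotes or the delimiter (a comma in this case). csv.DictWriter also helps manage the columns, and its restval parameter supplies a default value of False for missing fields: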

    import csv
    import json

    with open('input.json', encoding='utf-8') as f:
        data = json.load(f)

    # The csv documentation requires newline='' when opening a file for csv writing.
    with open('output.csv', 'w', newline='', encoding='utf8') as f:
        columns = 'username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated'.split(',')
        writer = csv.DictWriter(f, fieldnames=columns, restval=False)  # restval fills any field missing from a row
        writer.writeheader()
        ids = set()  # Track unique IDs
        for user in data:
            user_id = user['userid']
            if user_id not in ids:
                # Build a dict with only the fields to change.
                row = {'username': user['username'] if user['username'] != 'None' else '',
                       'user id': user_id,
                       #'name': user['name'],  # Always False in example, but if you want it...
                       'is_bot': user['bot']}
                writer.writerow(row)
                ids.add(user_id)

Output:

    username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated
    ,5275800381,False,False,False,False,False,False,False,False,False,False
    ,5824657725,False,False,False,False,False,False,False,False,False,False
    james888,5160075986,False,False,False,False,False,False,False,False,False,False
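
To illustrate the quoting point made at the top of this answer, here is a minimal sketch. The helper `rows_to_csv` and the sample names are hypothetical and not part of the question's data; they only show what csv.writer does with a field that contains a comma or a double quote, which a hand-built `','` join would write unescaped.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize rows with csv.writer into a string (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical names that would corrupt a hand-built comma join.
rows = [['Smith, John', '1'], ['Ann "AJ" Lee', '2']]
text = rows_to_csv(rows)
print(text)
# "Smith, John",1
# "Ann ""AJ"" Lee",2

# Reading the text back recovers the original fields unchanged.
round_trip = list(csv.reader(io.StringIO(text)))
print(round_trip == rows)  # True
```

Because the fields survive a write and read round trip, names such as "Ah" or "CTLA" in the question could later contain commas without shifting the columns.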

Barmar 1 year ago

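Use the csv module to write the CSV file rather than formatting it yourself.
Use a set to detect duplicate user IDs and skip them.
Fix the logic that replaces a "None" username with an empty string.
When accessing the JSON, you must use userid as the key, not user id.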

    import csv
    import json

    # Load the JSON list that the loop below iterates over.
    with open('target_user2.json', 'r', encoding='utf-8') as fp:
        target = json.load(fp)

    userids = set()  # user IDs that have already been written

    # newline='' lets the csv module control the line endings.
    with open('members2.csv', 'w', newline='', encoding='utf-8') as nf:
        nf_csv = csv.writer(nf)
        nf_csv.writerow(['username', 'user id', 'access hash', 'name', 'group', 'group id', 'is_bot', 'is_admin', 'dc_id', 'have_photo', 'phone', 'elaborated'])
        for item in target:
            if item['userid'] not in userids: # prevent duplicate userids
                userids.add(item['userid'])
                if item['username'] == "None":
                    item['username'] = ""  # assignment (=), not comparison (==)
                record = [item['username'], item['userid'], 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False']
                nf_csv.writerow(record)
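
The last two points in the list above can be checked in isolation. This is a small sketch using one record from the question; the variable names `unchanged` and `missing_key` are only for illustration. It shows that `==` compares and discards the result while `=` stores the new value, and that the JSON key is `userid`, with `user id` being only the CSV header text.

```python
item = {'userid': '5275800381', 'username': 'None'}

# Comparison: evaluates to False and the result is thrown away; item is unchanged.
item['username'] == ""
unchanged = item['username']

# Assignment: actually replaces the stored value.
if item['username'] == "None":
    item['username'] = ""

# The JSON key has no space, so looking up 'user id' raises KeyError.
try:
    item['user id']
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

print(repr(unchanged), repr(item['username']), missing_key)
# 'None' '' user id
```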


James Black 1 year ago

With Barmar's great help, I applied all of his suggestions, and it works very well:

    import json

    with open('target_user2.json', 'r', encoding='utf-8') as fp:
        target = json.load(fp) #this file contains the json

    with open('members2.csv', 'w', encoding='utf-8') as nf:    # target_userid2.txt or target_userid2.json
        nf.write('username,user id,access hash,name,group,group id,is_bot,is_admin,dc_id,have_photo,phone,elaborated' + '\n')
        for item in target:
            if item['userid'] in [x['userid'] for x in target]:     # remove space between user and id to avoid error
                if item['username'] == "None":
                    item['username'] = ""
                record = item['username'] + ',' + item['userid'] + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False' + ',' + 'False'
                nf.write(json.dumps(record).replace('"', '') + '\n')       # write data without ""
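
One caveat, offered as a hedged observation rather than a correction of the result on the sample file: the test `item['userid'] in [x['userid'] for x in target]` is true for every item, since each item is itself in `target`, so it does not appear to remove duplicates. The sketch below uses hypothetical data with a repeated userid (the question's sample has none) to compare that test with the set-based check from Barmar's answer.

```python
# Hypothetical data with a repeated userid.
target = [
    {'userid': '1', 'username': 'a'},
    {'userid': '1', 'username': 'a'},
    {'userid': '2', 'username': 'b'},
]

# Membership in the full list is true for every item, so nothing is skipped.
kept_by_list_test = [item['userid'] for item in target
                     if item['userid'] in [x['userid'] for x in target]]

# Tracking the IDs already seen in a set skips the second occurrence.
seen = set()
kept_by_set = []
for item in target:
    if item['userid'] not in seen:
        seen.add(item['userid'])
        kept_by_set.append(item['userid'])

print(kept_by_list_test)  # ['1', '1', '2']
print(kept_by_set)        # ['1', '2']
```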