
Loading valid JSON in a Python file

  •  0
  • maudulus  · 7 years ago

    I'm having a problem with my JSON:

    The first problem was SyntaxError: Non-ASCII character '\xe2' in file, so I added # -*- coding: utf-8 -*- at the top of my file.

    Then the problem moved to loading the JSON with x = json.loads(x), which raised: ValueError: Expecting , delimiter: line 3 column 52 (char 57). I referred to this stackoverflow solution and so added an r in front of my JSON:

    x = r"""[
      { my validated json... }
    ]"""
    

    But then I got the error TypeError: sequence item 3: expected string or Unicode, NoneType found. I think the r is somehow throwing this off?

    The JSON looks similar to the following:

    [
      {
        "brief": "Brief 1",
        "description": "Description 1",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
        ],
        "price": "145",
        "tags": [
          "tag1",
          "tag2",
          "tag3"
        ],
        "title": "Title 1"
      },
      {
        "brief": "Brief 2",
        "description": "Description 2",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
        ],
        "price": "150",
        "tags": [
          "tag4",
          "tag5",
          "tag6",
          "tag7",
          "tag8"
        ],
        "title": "Title 2"
      },{
        "brief": "blah blah 5'0\" to 5'4\"",
        "buyerPickup": true,
        "condition": "Good",
        "coverShipping": false,    
        "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
        "photos": [
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
          "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
        ],
        "price": "240",
        "tags": [
          "tag2",
          "5'0\"-5'4\""
        ],
        "title": "blah blah 17\" Frame",
        "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"    
      } 
    ]
    

    Current code:

    # -*- coding: utf-8 -*-
    
    import csv
    import json
    
    x = """[
          {
            "brief": "Brief 1",
            "description": "Description 1",
            "photos": [
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
            ],
            "price": "145",
            "tags": [
              "tag1",
              "tag2",
              "tag3"
            ],
            "title": "Title 1"
          },
          {
            "brief": "Brief 2",
            "description": "Description 2",
            "photos": [
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
            ],
            "price": "150",
            "tags": [
              "tag4",
              "tag5",
              "tag6",
              "tag7",
              "tag8"
            ],
            "title": "Title 2"
          },{
            "brief": "blah blah 5'0\" to 5'4\"",
            "buyerPickup": true,
            "condition": "Good",
            "coverShipping": false,    
            "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
            "photos": [
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
            ],
            "price": "240",
            "tags": [
              "tag2",
              "5'0\"-5'4\""
            ],
            "title": "blah blah 17\" Frame",
            "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"    
          } 
        ]"""
    
    x = json.loads(x)
    
    f = csv.writer(open("example.csv", "wb+"))
    
    f.writerow(["Handle","Title","Body (HTML)", "Vendor","Type","Tags","Published","Option1 Name","Option1 Value","Variant Inventory Qty","Variant Inventory Policy","Variant Fulfillment Service","Variant Price","Variant Requires Shipping","Variant Taxable","Image Src"])
    
        for x in x:
    
            allTags = "\"" + ','.join(x["tags"]) + "\""
            images = x["photos"]
            f.writerow([x["title"],
                        x["title"],
                        x["description"],
                        "Vendor Name",
                        "Widget",
                        allTags,
                        "TRUE",
                        "Title",
                        "Default Title",
                        "1",
                        "deny",
                        "manual",
                        x["price"],
                        "TRUE",
                        "TRUE",
                        images.pop(0) if images else None])
            while images:
                f.writerow([x["title"],None,None,None,None,None,None,None,None,None,None,None,None,None,None,images.pop(0)])
    

    Error message: the full traceback I'm seeing:

    Traceback (most recent call last):
      File "runnit2.py", line 976, in <module>
        allTags = "\"" + ','.join(x["tags"]) + "\""
    TypeError: sequence item 3: expected string or Unicode, NoneType found

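    Update: I have determined that the data, specifically x["title"] and x["description"], contains characters the code doesn't like: 'ascii' codec can't encode character u'\u201d' in position 9: ordinal not in range(128). I tried a quick fix with x["description"].encode('utf-8') and so on, but it wipes out almost everything in that cell. Is there a better way that doesn't delete everything after the offending character?

    4 answers  |  as of 7 years ago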
        1
  •  3
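  •   Marlon Abeykoon    7 years ago

    Going by the sample data you posted, I assume that in your real JSON the element at index 1 has a null value at index 3 of its tags list, i.e. where tag7 is: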

    "tags": [
              "tag4",
              "tag5",
              "tag6",
              "tag7",
              "tag8"
            ],
    

    To get rid of the TypeError raised because of the null values, you can simply check for them and replace them if they exist, as below.

    x["tags"] = ["" if i is None else i for i in x["tags"]]
    allTags = "\"" + ','.join(x["tags"]) + "\""
    

    Here I substitute an empty string for each null.

    Alternatively, you can drop the None values with the filter() function.

    allTags = "\"" + ','.join(filter(None, x["tags"])) + "\""
    

    Note: also add the r prefix, r"[...]", to the JSON string, and fix the indentation of the for loop.
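The two approaches above can be compared in a small sketch. The `tags` list here is a hypothetical sample (not the asker's real data) with a `None` at index 3, which is what `json.loads` produces for a JSON `null`; the helper names are made up for illustration.

```python
# Hypothetical sample: json.loads turns a JSON null into None.
tags = ["tag4", "tag5", "tag6", None, "tag8"]


def join_replacing_none(values):
    """Replace each None with an empty string, then join with commas."""
    return ",".join("" if v is None else v for v in values)


def join_dropping_falsy(values):
    """Drop None (and any other falsy value, such as "") before joining."""
    return ",".join(filter(None, values))


# Joining the list directly fails because of the None entry.
try:
    ",".join(tags)
    error = None
except TypeError as exc:
    error = str(exc)

replaced = join_replacing_none(tags)  # keeps an empty slot for the null
dropped = join_dropping_falsy(tags)   # removes the null entirely
```

The difference is whether the position of the null is preserved: replacing keeps an empty field between two commas, while `filter(None, ...)` removes it and would also remove legitimate empty strings.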

        2
  •  1
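  •   igrinis    7 years ago

    Use a raw string, and open the output file in normal (non-binary) mode with its encoding set to utf-8. On Python 3.6 that is enough.

    On Python 2.7 you should use codecs.open('example.csv', 'w', encoding='utf-8') instead of plain open() when dealing with Unicode content. Also, the csv module on Python 2.7 does not support Unicode out of the box, so I suggest switching to unicodecsv or following this answer.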
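A minimal Python 3 sketch of that advice, under stated assumptions: the JSON is cut down to one record with two fields taken from the question's sample, the CSV has only two columns, and the output goes to a temporary directory rather than the asker's example.csv. `newline=""` follows the csv module's documented recommendation for files passed to `csv.writer`.

```python
import csv
import json
import os
import tempfile

# Raw string: Python leaves \" alone, so json.loads sees it as an escaped quote.
raw = r'''[{"title": "blah blah 17\" Frame", "description": "Size L/20” 5’8-5’11"}]'''
rows = json.loads(raw)

# Hypothetical output location; the question writes to example.csv.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Text mode with an explicit encoding instead of "wb+".
with open(path, "w", encoding="utf-8", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["Title", "Body (HTML)"])
    for row in rows:
        writer.writerow([row["title"], row["description"]])

# Read the file back to check that the curly quotes survived intact.
with open(path, encoding="utf-8", newline="") as fh:
    written = list(csv.reader(fh))
```

Because the file object does the encoding, no per-cell `.encode('utf-8')` call is needed and nothing after a non-ASCII character is lost.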

        3
  •  0
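  •   huifer    7 years ago

    Change the file mode to w. If you must use wb, use the functions below. You also need to put r in front of the text so that the special characters are handled.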

    import csv
    import json
    
    x = r"""[
          {
            "brief": "Brief 1",
            "description": "Description 1",
            "photos": [
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example.jpg?0101010101010",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example2.jpg?0101010101010",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example3.jpg?0101010101010"
            ],
            "price": "145",
            "tags": [
              "tag1",
              "tag2",
              "tag3"
            ],
            "title": "Title 1"
          },
          {
            "brief": "Brief 2",
            "description": "Description 2",
            "photos": [
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example4.jpg?0101010101010",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-example5.jpg?0101010101010"
            ],
            "price": "150",
            "tags": [
              "tag4",
              "tag5",
              "tag6",
              "tag7",
              "tag8"
            ],
            "title": "Title 2"
          },{
            "brief": "blah blah 5'0\" to 5'4\"",
            "buyerPickup": true,
            "condition": "Good",
            "coverShipping": false,    
            "description": "blah blah 5'0\" to 5'4\". blah blah.Size L/20”\n 5’8-5’11\n29lbs\n3x7 speed\n\n  \r\n\r\n",
            "photos": [
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-010101.jpeg?11111",
              "https://cdn.shopify.com/s/files/1/01/01/01/files/imgs-020202?111111"
            ],
            "price": "240",
            "tags": [
              "tag2",
              "5'0\"-5'4\""
            ],
            "title": "blah blah 17\" Frame",
            "front": "https://firebasestorage.googleapis.com/v0/b/example.appspot.com/o/Images%2F0007891113.jpg?alt=media&token=111-11-11-11-111"    
          } 
        ]"""
    
    x = json.loads(x)
    
    
    def to_str(bytes_or_str):
        # Bytes are decoded to text; a str passes through unchanged.
        if isinstance(bytes_or_str, bytes):
            value = bytes_or_str.decode('utf-8')
        else:
            value = bytes_or_str
        return value
    
    
    def to_bytes(bytes_or_str):
        if isinstance(bytes_or_str, str):
            value = bytes_or_str.encode('utf-8')
        else:
            value = bytes_or_str
    
        return value
    
    
    f = csv.writer(open("example.csv", "w+"))
    writeList = ["Handle", "Title", "Body (HTML)", "Vendor", "Type", "Tags", "Published", "Option1 Name", "Option1 Value",
                 "Variant Inventory Qty", "Variant Inventory Policy", "Variant Fulfillment Service", "Variant Price",
                 "Variant Requires Shipping", "Variant Taxable", "Image Src"]
    newList = []
    for item in writeList:
        newList.append(to_bytes(item))
    
    f.writerow(newList)
    
    for x in x:
    
        # A raw r"\"" would contain a literal backslash, so use a plain quote.
        allTags = '"' + ','.join(x["tags"]) + '"'
        images = x["photos"]
        f.writerow([x["title"],
                    x["title"],
                    x["description"],
                    "Vendor Name",
                    "Widget",
                    allTags,
                    "TRUE",
                    "Title",
                    "Default Title",
                    "1",
                    "deny",
                    "manual",
                    x["price"],
                    "TRUE",
                    "TRUE",
                    images.pop(0) if images else None])
        while images:
            f.writerow([x["title"], None, None, None, None, None, None, None, None, None, None, None, None, None, None,
                        images.pop(0)])
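One caveat about the header conversion above, shown as a small sketch that assumes Python 3 and a text-mode file (mode w): there, `csv.writer` stringifies non-string cells with `str()`, so cells that were encoded to bytes end up as their `b'...'` repr in the output. The `write_row` helper and the two-column header are illustrative only.

```python
import csv
import io

header = ["Handle", "Title"]


def write_row(row):
    """Write one row to an in-memory text buffer and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


# Encoding each cell first, as a to_bytes-style helper would do on Python 3.
bytes_line = write_row([h.encode("utf-8") for h in header])

# Leaving the cells as str and letting the file object handle encoding.
text_line = write_row(header)
```

On Python 3 with mode w, keeping the cells as `str` is therefore the simpler choice; byte conversion is only relevant when the file really is opened in binary mode on Python 2.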
    
        4
  •  0
  •   Souradeep Nanda    7 years ago

    This question may be a duplicate of how to convert characters like \x22 into string

    After stripping the code down, the error boils down to:

    import json
    
    x = '''
      {
        "brief": "\""
      }'''
    
    x = json.loads(x)
    

    Consider replacing \" with \u201d:

    import json
    
    x = '{"brief": "\u201d"}'
    
    x = json.loads(x)
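To make the failure mode concrete, here is a side-by-side sketch assuming Python 3. The same literal is written once as a normal string and once as a raw string; the variable names and the sample value are illustrative, not taken from the asker's file.

```python
import json

# Normal literal: Python turns \" into a bare " before json ever sees it,
# so the JSON string value is terminated early.
plain = '''{"brief": "17\" Frame"}'''

# Raw literal: the backslash survives, and json reads \" as an escaped quote.
raw = r'''{"brief": "17\" Frame"}'''

try:
    json.loads(plain)
    plain_error = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    plain_error = exc

parsed = json.loads(raw)

# In a raw literal, \u201d also reaches json untouched and is decoded there.
curly = json.loads(r'{"brief": "\u201d"}')["brief"]
```

The raw literal is exactly one character longer than the normal one, which is the backslash that Python would otherwise consume.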