
How do I parse an email address with parentheses in an email header in Python?

  • James Mishra  · Tech Community  · 6 years ago

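    I'm having trouble using Python's email module to parse messages whose From header contains parentheses. This seems to be a problem only when using email.policy.default, as opposed to email.policy.compat32.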

    Other than switching policies, is there a way to work around this?

    For Python 3.6.5, here is a minimal working example:

    import email
    import email.policy as email_policy
    
    raw_mime_msg=b"from: James Mishra \\(says hi\\) <james@example.com>"
    
    compat32_obj = email.message_from_bytes(
        raw_mime_msg, policy=email_policy.compat32)
    
    default_obj = email.message_from_bytes(
        raw_mime_msg, policy=email_policy.default)
    
    print(compat32_obj['from'])
    print(default_obj['from'])
    

    The first print statement outputs James Mishra \(says hi\) <james@example.com>. The second print statement raises:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1908, in get_address
        token, value = get_group(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1867, in get_group
        "display name but found '{}'".format(value))
    email.errors.HeaderParseError: expected ':' at end of group display name but found '\(says hi\) <james@example.com>'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1734, in get_mailbox
        token, value = get_name_addr(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1720, in get_name_addr
        token, value = get_angle_addr(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1646, in get_angle_addr
        "expected angle-addr but found '{}'".format(value))
    email.errors.HeaderParseError: expected angle-addr but found '\(says hi\) <james@example.com>'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "test_email.py", line 12, in <module>
        print(default_obj['from'])
      File "/usr/local/lib/python3.6/email/message.py", line 391, in __getitem__
        return self.get(name)
      File "/usr/local/lib/python3.6/email/message.py", line 471, in get
        return self.policy.header_fetch_parse(k, v)
      File "/usr/local/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
        return self.header_factory(name, value)
      File "/usr/local/lib/python3.6/email/headerregistry.py", line 589, in __call__
        return self[name](name, value)
      File "/usr/local/lib/python3.6/email/headerregistry.py", line 197, in __new__
        cls.parse(value, kwds)
      File "/usr/local/lib/python3.6/email/headerregistry.py", line 340, in parse
        kwds['parse_tree'] = address_list = cls.value_parser(value)
      File "/usr/local/lib/python3.6/email/headerregistry.py", line 331, in value_parser
        address_list, value = parser.get_address_list(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1931, in get_address_list
        token, value = get_address(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1911, in get_address
        token, value = get_mailbox(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1737, in get_mailbox
        token, value = get_addr_spec(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1583, in get_addr_spec
        token, value = get_local_part(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1413, in get_local_part
        obs_local_part, value = get_obs_local_part(str(local_part) + value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1454, in get_obs_local_part
        token, value = get_word(value)
      File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1340, in get_word
        if value[0]=='"':
    IndexError: string index out of range
    
    1 answer  |  as of 6 years ago
  •   user2357112    6 years ago

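    email.policy.default is intended to conform to the email RFCs, and your message does not conform to RFC 5322. If the parenthesized part is supposed to be a comment, the message would need to look like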

    raw_mime_msg=b"from: James Mishra (says hi) <james@example.com>"
    

    to be compliant. If it is not supposed to be a comment, the parentheses should appear inside a quoted string. That might look like

    raw_mime_msg=b'from: "James Mishra (says hi)" <james@example.com>'
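
    As a quick check of the two compliant forms above, the sketch below parses each one with email.policy.default and reads the result through the header's addresses attribute. The variable names comment_form and quoted_form are illustrative and do not come from the question.

```python
import email
from email import policy

# The two RFC 5322-compliant spellings suggested in this answer.
comment_form = b"from: James Mishra (says hi) <james@example.com>"
quoted_form = b'from: "James Mishra (says hi)" <james@example.com>'

for raw in (comment_form, quoted_form):
    msg = email.message_from_bytes(raw, policy=policy.default)
    # Under policy.default, address headers expose parsed Address objects.
    for addr in msg["from"].addresses:
        print(repr(addr.display_name), addr.addr_spec)
```

    Both forms should yield the same addr_spec. They differ only in whether the parenthesized text is treated as a comment or as part of the display name.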
    

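    Since your message is non-compliant, using a policy that expects compliance is not appropriate. If you want to handle non-compliant messages, email.policy.compat32 is a better choice than email.policy.default.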
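
    One possible way to act on this advice while keeping the richer API for well-formed mail is to try email.policy.default first and fall back to email.policy.compat32 when header parsing fails. This is only a sketch: read_from_header is a hypothetical helper, and the exceptions it catches are an assumption based on the traceback in the question. Other Python versions may raise different errors, or none at all, for the same input.

```python
import email
from email import errors, policy


def read_from_header(raw: bytes) -> str:
    """Return the From header as text, tolerating non-compliant input.

    Hypothetical helper: tries the strict policy first, then falls back
    to compat32, which hands back the raw header value.
    """
    try:
        msg = email.message_from_bytes(raw, policy=policy.default)
        return str(msg["from"])
    except (errors.HeaderParseError, IndexError):
        # IndexError is what Python 3.6.5 raised in the question's traceback.
        msg = email.message_from_bytes(raw, policy=policy.compat32)
        return str(msg["from"])


# A compliant header and the non-compliant one from the question.
print(read_from_header(b'from: "James Mishra (says hi)" <james@example.com>'))
print(read_from_header(b"from: James Mishra \\(says hi\\) <james@example.com>"))
```

    The fallback gives up structured access to the address, so callers that need the parsed parts would still have to repair or reject the malformed header themselves.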