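When email.message_from_bytes() is given input with unicode/emoji in a header, the resulting output leads to unexpected TypeErrors. Is it possible to process the input (encode, decode, etc.) before passing it to message_from_bytes() to prevent these TypeErrors?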
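The overall goal is for gyb.py to successfully clean and restore backups from .eml files generated by gyb, some of which contain unicode/emoji in the email headers. The unicode/emoji should also be preserved rather than corrupted (as they are in the sample output).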
Minimal reproduction:
import email
with open('./sample.eml', 'rb') as f:
    raw = f.read()
message = email.message_from_bytes(raw)
# No unicode/emoji: works as expected:
print(message['to'])
print(len(message['to']))
# With unicode/emoji: unexpected TypeError:
print(message['from'])
print(len(message['from']))
sample.eml
To: recipient <[email protected]>
From:ð¥senderð¥ <[email protected]>
Output:
$ python check-message.py
recipient <[email protected]>
26
����sender���� <[email protected]>
Traceback (most recent call last):
  File "V:\gyb\jkm\check-message.py", line 10, in <module>
    print(len(message['from']))
          ^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'Header' has no len()
GitHub issue related to the larger gyb restore problem
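One possible direction, sketched under assumptions rather than offered as a confirmed fix for gyb: the TypeError arises because the default compat32 policy returns an email.header.Header object (which has no len()) when a header contains raw non-ASCII bytes. Parsing with policy=email.policy.default returns header values as str subclasses instead. The sample message, the addresses, the choice of emoji, and the helper name parse_eml below are all hypothetical stand-ins, since the original sample.eml is not fully recoverable here.

```python
import email
from email import policy
from email.header import Header

# Hypothetical stand-in for sample.eml: raw UTF-8 bytes in the From header.
# Addresses and emoji are placeholders, not taken from the original file.
SAMPLE = (
    "To: recipient <recipient@example.com>\r\n"
    "From: \U0001F525sender\U0001F525 <sender@example.com>\r\n"
    "\r\n"
    "body\r\n"
).encode("utf-8")


def parse_eml(raw: bytes):
    """Parse raw message bytes with the modern policy instead of compat32."""
    return email.message_from_bytes(raw, policy=policy.default)


# Under the default (compat32) policy the header comes back as a Header object.
legacy_sender = email.message_from_bytes(SAMPLE)["from"]
print(type(legacy_sender).__name__)

# Under policy.default the header is a str subclass, so len() works.
message = parse_eml(SAMPLE)
sender = message["from"]
print(sender)
print(len(sender))
```

This only addresses reading the header. Whether the emoji also survive the clean and restore round trip inside gyb.py would still need to be checked against real backups.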