
Bytes object was saved in "repr format" as b'foo' instead of being decode()-ed to a string: how to fix?

character-encoding unicode python-3.x python

shadowtalker · Tech Community · 7 years ago

An unlucky colleague saved some data to a file like this:

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))

when they should have used

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())

Now foo.txt looks like

b'The em dash: \xe2\x80\x94'

instead of

The em dash: —
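
The cause can be reproduced in a few lines. This is only a minimal sketch, and the variable names are illustrative rather than taken from the original post. Calling str() on a bytes object without an encoding argument returns its repr, while decode() (or str() with an explicit encoding) returns the text.

```python
s = b'The em dash: \xe2\x80\x94'

# Without an encoding argument, str() returns the repr of a bytes object.
as_repr = str(s)

# decode() interprets the bytes as UTF-8 by default.
as_text = s.decode()

# str() with an explicit encoding behaves like decode().
also_text = str(s, 'utf-8')

print(as_repr)
# ascii() keeps the output printable on consoles that cannot show the dash.
print(ascii(as_text))
```

The first value is what ended up in foo.txt; the second is what should have been written.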

I have read this file in as a string:

with open('foo.txt') as f:
    bad_foo = f.read()

Now how do I convert bad_foo from the incorrectly saved format to the correctly saved string?

3 replies | as of 7 years ago

Paritosh Singh 7 years ago

You can try literal_eval:

from ast import literal_eval
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)
res = literal_eval(test)
print(res.decode())
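
Applied to the file from the question, the same idea might look like the sketch below. The helper name repair_repr_file and the output file name foo_fixed.txt are hypothetical, and the sketch assumes the whole file holds a single bytes literal whose contents were originally UTF-8 text.

```python
import os
import tempfile
from ast import literal_eval


def repair_repr_file(src_path, dst_path):
    """Hypothetical helper: read a file holding one bytes repr, write the decoded text."""
    with open(src_path, encoding='utf-8') as f:
        bad = f.read().strip()
    # literal_eval only accepts Python literals, so it does not run arbitrary code.
    value = literal_eval(bad)
    if not isinstance(value, bytes):
        raise ValueError('expected a bytes literal')
    text = value.decode('utf-8')  # assumption: the original data was UTF-8
    with open(dst_path, 'w', encoding='utf-8') as f:
        f.write(text)
    return text


# Recreate the broken file in a temporary directory, then repair it.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'foo.txt')
dst = os.path.join(workdir, 'foo_fixed.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write(str(b'The em dash: \xe2\x80\x94'))

fixed = repair_repr_file(src, dst)
```

Writing to a separate output file keeps the damaged original around in case the assumptions about its contents turn out to be wrong.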

shadowtalker 7 years ago

If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.

import ast

# Create a sad broken string (a raw string, so the backslashes stay literal,
# just as they appear in the file)
s = r"b'The em-dash: \xe2\x80\x94'"

# Parse the string as a Python literal, creating a `bytes` object
s_bytes = ast.literal_eval(s)

# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()

Otherwise, you will have to manually parse the string and remove or replace the problematic repr'ed escapes.
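
One possible way to do that manual parsing without evaluating anything is sketched here. The helper name unrepr_bytes is hypothetical, and the approach assumes the text really is the repr of a bytes object, which is pure ASCII. It strips the b'...' wrapper, lets the standard unicode_escape codec expand the escapes, and then maps the resulting code points back to bytes through latin-1.

```python
def unrepr_bytes(text, encoding='utf-8'):
    """Hypothetical helper: turn the repr of a bytes object back into text without eval."""
    text = text.strip()
    # A bytes repr looks like b'...' or b"...".
    if (len(text) < 3 or text[0] != 'b'
            or text[1] not in ("'", '"') or text[-1] != text[1]):
        raise ValueError('not a bytes repr')
    body = text[2:-1]
    # unicode_escape turns \xNN, \n, \\ and \' into single code points (0-255 here).
    chars = body.encode('ascii').decode('unicode_escape')
    # latin-1 maps code points 0-255 back to the identical byte values.
    raw = chars.encode('latin-1')
    return raw.decode(encoding)


bad_foo = str(b'The em dash: \xe2\x80\x94')
fixed_foo = unrepr_bytes(bad_foo)
```

Non-ASCII input makes the encode('ascii') step raise UnicodeEncodeError, which is a useful signal that the text is not a plain bytes repr.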

-2

ozcanyarimdunya 7 years ago

This code works fine on my computer. But if you still get an error, this might help:

with open('foo.txt', 'r', encoding="utf-8") as f:
    print(f.read())
