Stream/string/bytearray transformations in Python 3

python-3.x encoding

Craig McQueen Dr. Watson · Tech Community · 16 years ago

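Python 3 cleans up Python's handling of Unicode strings. I assume that, as part of this effort, the codecs in Python 3 have become more restrictive, judging by the Python 3 documentation compared with the Python 2 documentation .

For example, codecs that conceptually convert a bytestream to a different form of bytestream have been removed: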

base64_codec
bz2_codec
hex_codec

rot_13

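I don't care about rot_13, but I would like to know the "best way" to implement a conversion of line-ending styles (Unix line endings vs. Windows line endings). This should really be a Unicode-to-Unicode transformation done before encoding to a byte stream, especially when UTF-16 is used, as discussed in this other SO question .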
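To illustrate why the line-ending conversion belongs on the text side, here is a minimal sketch (the sample string and variable names are made up for illustration). It compares converting newlines on the str before encoding to UTF-16 with patching the bytes after encoding:

```python
# Hypothetical sample text; any str containing "\n" behaves the same way.
text = "a\nb"

# Text-level conversion, then encoding: every character, including the
# inserted "\r", becomes one 2-byte UTF-16 code unit.
good = text.replace("\n", "\r\n").encode("utf-16-le")

# Byte-level conversion after encoding: a single raw 0x0D byte is inserted,
# which knocks the rest of the stream off its 2-byte alignment.
bad = text.encode("utf-16-le").replace(b"\n", b"\r\n")

print(good)  # well-formed UTF-16
print(bad)   # odd number of bytes, no longer valid UTF-16
```

The byte-level version only looks safe with ASCII-compatible encodings such as UTF-8; with UTF-16 it corrupts the stream.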

2 replies | until 9 years ago

Craig McQueen Dr. Watson · 10 years ago

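base64 encoding/decoding is now done through the base64 module
bz2 compression can now be done using the bz2 module
Hex string encoding/decoding can be done with the hexlify and unhexlify functions of the binascii module (somewhat of a hidden feature)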
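As a rough sketch of what those three modules look like in use (the payload and variable names are assumptions picked for illustration), each transformation is bytes in, bytes out:

```python
import base64
import binascii
import bz2

data = b"hello world"  # sample payload, chosen only for illustration

# base64: bytes -> ASCII-safe bytes, and back
b64 = base64.b64encode(data)
from_b64 = base64.b64decode(b64)

# hex: bytes -> hexadecimal digits as bytes, and back
hexed = binascii.hexlify(data)
from_hex = binascii.unhexlify(hexed)

# bz2: bytes -> compressed bytes, and back
packed = bz2.compress(data)
from_bz2 = bz2.decompress(packed)

print(b64, hexed, from_bz2)
```

If a str result is needed (for example to embed the base64 text in JSON), decode the resulting bytes as ASCII afterwards.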

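I guess this means there is no standard framework for creating such string/bytearray transformation modules; in Python 3 they are handled on a case-by-case basis.

Update for Python 3.2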

A comment on the blog post "Compressing text using Python's unicode support" points out that these codecs came back in Python 3.2.

Quoting the comment:

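"Since they are text-to-text or binary-to-binary transforms, though, the encode()/decode() methods in Python 3.x do not support this style of usage (it is a Python 2.x specific feature).

The codecs themselves are back in 3.2, but you need to go through the codecs module API in order to use them; they are not available via the object method shorthand."

See the Python 3 docs for codecs, section "Binary Transforms".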
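A minimal sketch of going through the codecs module API, assuming Python 3.2 or later. The canonical codec names (hex_codec, base64_codec, bz2_codec, rot_13) are used here because the shorter aliases were not available in every 3.x release; the sample data is made up:

```python
import codecs

data = b"hello"  # sample bytes, for illustration only

# Binary-to-binary transforms: bytes in, bytes out.
hexed = codecs.encode(data, "hex_codec")
b64 = codecs.encode(data, "base64_codec")   # output ends with a newline
packed = codecs.encode(data, "bz2_codec")

# Text-to-text transform: str in, str out.
rot = codecs.encode("foo", "rot_13")

# Each transform is reversed with codecs.decode() and the same codec name.
unhexed = codecs.decode(hexed, "hex_codec")
unpacked = codecs.decode(packed, "bz2_codec")

print(hexed, b64, rot)
```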

From a blog post by Barry Warsaw :

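Did you know that Python 2 provides some codecs for doing interesting conversions, such as Caesar rotation (i.e. rot13)? Thus, you can do things like:
>>> 'foo'.encode('rot-13')
'sbb'
This does not work in Python 3, however. To use such a codec in both Python 2 and Python 3, get the encoder from the codecs module and call it directly:
>>> from codecs import getencoder
>>> encoder = getencoder('rot-13')
>>> mystring = 'foo'
>>> rot13string = encoder(mystring)[0]
Because of the codecs API, you have to take the zeroth element of the encoder's return value. A bit ugly, but it works in both versions of Python.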
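The reason for the [0] can be seen in a short sketch, assuming Python 3.2 or later where the rot_13 codec is available again. The encoder returns a pair of (output, length of input consumed); the variable names are only illustrative:

```python
from codecs import getdecoder, getencoder

encoder = getencoder("rot_13")
decoder = getdecoder("rot_13")

# The stateless codec functions return (output, amount_of_input_consumed).
encoded, consumed = encoder("foo")
decoded, _ = decoder(encoded)

print(encoded, consumed, decoded)
```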

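JAB · 16 years ago

For line endings, you can specify the newline style with the newline parameter of open() , and \n will automatically be converted to that value when writing to the file. Admittedly, this only works for files opened as text rather than as data. (You can also specify the encoding to use when writing text to the file, which is sometimes useful.)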

http://docs.python.org/3.1/library/functions.html#open
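A small sketch of that approach, assuming a temporary file whose name is made up for the example. The text layer translates "\n" to "\r\n" before the UTF-16 encoder runs, so the line endings are converted as characters rather than as raw bytes:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # newline="\r\n" makes the text layer write "\r\n" for every "\n".
    with open(path, "w", encoding="utf-16-le", newline="\r\n") as f:
        f.write("one\ntwo\n")

    # Read the raw bytes back to inspect what was actually written.
    with open(path, "rb") as f:
        raw = f.read()

decoded = raw.decode("utf-16-le")
print(repr(decoded))
```

Passing newline="\n" instead keeps Unix line endings on any platform, and newline="" disables translation entirely.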

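To convert within a string, use yourstring = yourstring.replace('\n', '\r\n') to go from Linux style to Windows style, and yourstring = yourstring.replace('\r\n', '\n') for the reverse. (When writing to a text file, \n is still converted to \r\n on Windows systems if universal newlines mode is enabled, which is the default.)

Likewise, if you want to convert between the various Unicode encodings (assuming you are working with byte sequences, since the strings Python uses internally are not tied to any particular Unicode encoding), just use bytes.decode() or bytearray.decode() and then str.encode() . To convert from UTF-8 to UTF-16: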
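One caveat with a plain replace('\n', '\r\n') is that text which already contains \r\n ends up with \r\r\n. A possible workaround, sketched here with hypothetical helper names to_unix and to_windows, is to normalize to \n first:

```python
def to_unix(text):
    """Collapse Windows (\\r\\n) and lone \\r line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_windows(text):
    """Convert any mix of line endings to \\r\\n without doubling \\r."""
    return to_unix(text).replace("\n", "\r\n")


mixed = "a\r\nb\nc"                   # sample input with mixed line endings
naive = mixed.replace("\n", "\r\n")   # doubles the existing "\r"
safe = to_windows(mixed)

print(repr(naive))
print(repr(safe))
```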

newstring = yourbytes.decode('utf-8')
yourbytes = newstring.encode('utf-16')

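Done this way, newlines should not cause any problems when converting between the two Unicode encodings.

There are also str.translate() and str.maketrans() , though I am not sure whether these would be useful here: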
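A slightly fuller sketch of the same round trip with made-up sample data. One detail worth knowing is that the plain "utf-16" codec prepends a byte order mark, whereas "utf-16-le" and "utf-16-be" do not:

```python
import codecs

# Sample UTF-8 input containing non-ASCII characters and a newline.
utf8_bytes = "na\u00efve caf\u00e9\n".encode("utf-8")

text = utf8_bytes.decode("utf-8")      # bytes -> str
utf16_bytes = text.encode("utf-16")    # str -> bytes, with a BOM
utf16le_bytes = text.encode("utf-16-le")  # str -> bytes, no BOM

has_bom = utf16_bytes[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom, len(utf8_bytes), len(utf16_bytes), len(utf16le_bytes))
```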

http://docs.python.org/3.1/library/stdtypes.html#str.translate
http://docs.python.org/3.1/library/stdtypes.html#str.maketrans

On a side note, rot_13 could be implemented as:

import string
rot_13 = str.maketrans({x: chr((ord(x) - ord('A') + 13) % 26 + ord('A') if x.isupper() else ((ord(x) - ord('a') + 13) % 26 + ord('a'))) for x in string.ascii_letters})

# Using hard-coded values:

rot_13 = str.maketrans('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', 'NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm')

Either way, S.translate(rot_13) turns a normal string into its rot_13 form and a rot_13 string back into a normal string.
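A quick usage sketch of such a table, with the table built from string.ascii_uppercase and string.ascii_lowercase slices (an equivalent construction chosen for brevity, not the code from the answer). The cross-check against the built-in rot_13 codec assumes Python 3.2 or later:

```python
import codecs
import string

upper = string.ascii_uppercase
lower = string.ascii_lowercase

# Map each letter to the one 13 places further on, wrapping around.
rot_13 = str.maketrans(
    upper + lower,
    upper[13:] + upper[:13] + lower[13:] + lower[:13],
)

message = "Hello, World!"               # sample text for illustration
encoded = message.translate(rot_13)     # non-letters pass through unchanged
decoded = encoded.translate(rot_13)     # applying rot13 twice restores the text

# Optional cross-check against the codec shipped with Python.
same_as_codec = codecs.encode(message, "rot_13") == encoded

print(encoded, decoded, same_as_codec)
```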