
Python UTF-8: how to align printed output

utf-8 unicode python

Fredrik Pihl · 15 years ago

I have an array containing Japanese characters as well as "normal" characters. How do I align the printout of these?

#!/usr/bin/python
# coding=utf-8

a1=['する', 'します', 'trazan', 'した', 'しました']
a2=['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

for i,j in zip(a1,a2):
    print i.ljust(12),':',j

print '-'*8

for i,j in zip(a1,a2):
    print i,len(i)
    print j,len(j)

Output:

する       : dipsy
します    : laa-laa
trazan       : banarne
した       : po
しました : tinky winky
--------
する 6
dipsy 5
します 9
laa-laa 7
trazan 6
banarne 7
した 6
po 2
しました 12
tinky winky 11

Thanks, Fredrik

3 replies | most recent 7 years ago

Josh Lee ZZ Coder 15 years ago

Use the unicodedata.east_asian_width function to keep track of which characters are narrow and which are wide when computing the length of a string.

#!/usr/bin/python
# coding=utf-8

import sys
import codecs
import unicodedata

out = codecs.getwriter('utf-8')(sys.stdout)

def width(string):
    return sum(1+(unicodedata.east_asian_width(c) in "WF")
        for c in string)

a1=[u'する', u'します', u'trazan', u'した', u'しました']
a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']

for i,j in zip(a1,a2):
    out.write('%s %s: %s\n' % (i, ' '*(12-width(i)), j))

Output:

する          : dipsy
します        : laa-laa
trazan        : banarne
した          : po
しました      : tinky winky

In some web browser fonts it doesn't look quite right, but in a terminal window the columns line up correctly.
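For anyone on Python 3, where str is already Unicode and no codecs writer is needed, the same idea might look roughly like the sketch below. The helper names display_width and ljust_display are made up for illustration, and the sketch assumes that only characters classed as W (wide) or F (fullwidth) take two columns. It ignores combining marks and ambiguous-width characters.

```python
import unicodedata

def display_width(text):
    """Count terminal columns: wide (W) and fullwidth (F) characters take two."""
    return sum(2 if unicodedata.east_asian_width(ch) in "WF" else 1 for ch in text)

def ljust_display(text, columns):
    """Left-justify text to `columns` terminal columns (hypothetical helper)."""
    return text + " " * max(0, columns - display_width(text))

a1 = ['する', 'します', 'trazan', 'した', 'しました']
a2 = ['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

for left, right in zip(a1, a2):
    # Every left-hand cell now occupies exactly 12 columns in a monospace terminal.
    print(ljust_display(left, 12), ':', right)
```

Padding by display columns instead of by len() is what keeps the colons in one column, provided the terminal font draws wide characters at exactly twice the width of narrow ones.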

jcdyer Anand S Kumar 15 years ago

Use unicode objects instead of byte strings:

#!/usr/bin/python
# coding=utf-8

a1=[u'する', u'します', u'trazan', u'した', u'しました']
a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']

for i,j in zip(a1,a2):
    print i.ljust(12),':',j

print '-'*8

for i,j in zip(a1,a2):
    print i,len(i)
    print j,len(j)

unicode objects deal with characters directly, so len() and ljust() count characters rather than UTF-8 bytes.
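A small Python 3 sketch of the difference between counting bytes and counting characters, using one of the words from the question (the variable names are just for illustration):

```python
word = 'します'
encoded = word.encode('utf-8')

print(len(word))      # 3: number of characters (code points)
print(len(encoded))   # 9: number of UTF-8 bytes, the figure the question's output shows

padded = word.ljust(12)
print(len(padded))    # 12: ljust pads by character count
```

Note that padding by character count still treats each wide character as a single column, so in a terminal that draws Japanese characters double-width the rows may remain slightly off.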

Clemens Schwaighofer 7 years ago

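You need to build the string by hand, and you also need to build the format length by hand. There is no easy way to do this.

The three functions below do this (they require unicodedata):

shortenStringCJK: shortens a string so that it fits a given output width, measured in display columns (this is not a cut at x characters)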

import unicodedata

def shortenStringCJK(string, width, placeholder='..'):
    # get the display length, counting double-width characters as two
    string_len_cjk = stringLenCJK(str(string))
    # if the display width is too big
    if string_len_cjk > width:
        # set current length and output string
        cur_len = 0
        out_string = ''
        # loop through each character
        for char in str(string):
            # update the current length as if the character were added
            cur_len += 2 if unicodedata.east_asian_width(char) in "WF" else 1
            # add the char only while there is still room left for the placeholder
            if cur_len <= (width - len(placeholder)):
                out_string += char
        # return the shortened string followed by the placeholder
        return "{}{}".format(out_string, placeholder)
    else:
        return str(string)

stringLenCJK: gets the correct length (that is, the space taken up on a terminal)

def stringLenCJK(string):
    # return string len including double count for double width characters
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)

formatLen: adjusts the format length to account for the width of double-width characters. Without this, the columns will not line up.

def formatLen(string, length):
    # returns the format length adjusted for double-width characters:
    # take the display length (double-width characters counted twice),
    # subtract the plain len(), then subtract that difference from the requested length
    return length - (stringLenCJK(string) - len(string))
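To make the arithmetic concrete, here is a self-contained sketch that applies the same adjustment to one word. The snake_case names are stand-ins for the functions above so that the block runs on its own, and the 12-column field is an arbitrary choice for the example.

```python
import unicodedata

def string_len_cjk(text):
    # Display length: wide (W) and fullwidth (F) characters count as two columns.
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in text)

def format_len(text, length):
    # Shrink the format width by the number of extra columns the wide characters use.
    return length - (string_len_cjk(text) - len(text))

text = 'します'
field = format_len(text, 12)   # 12 - (6 - 3) = 9

# str.format pads to 9 characters: 3 wide characters plus 6 spaces,
# which comes to 6 + 6 = 12 terminal columns between the bars.
cell = "|{:<{w}}|".format(text, w=field)
print(cell)
```

For a plain ASCII string the display length equals len(), so the adjustment is zero and the format width is used unchanged.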

Then, to output some strings, first define the format string:

format_str = "|{{:<{len}}}|"
format_len = 26
string_len = 26

The output is then produced as follows (where _string is the string to print):

print("Normal : {}".format(
    format_str.format(
        len=formatLen(shortenStringCJK(_string, width=string_len), format_len))
    ).format(
        shortenStringCJK(_string, width=string_len)
    )
)