代码之家 › 专栏 › 技术社区 › Neil Trodden

如何在linux中重新添加unicode字节顺序标记?

unicode bash linux

Neil Trodden · 技术社区 · 16 年前

我有一个相当大的SQL文件,它以FFFE的字节顺序标记开始。我使用支持unicode的linux拆分工具将此文件拆分为100000行块。但当将这些信息传递回windows时,它确实如此不与第一个部件以外的任何部件一样,只有FFFE字节顺序标记处于启用状态。

如何使用echo(或任何其他bash命令)添加这两个字节的代码?

7 回复 | 直到 16 年前

Community CDub 8 年前

基于赛德的 solution of Anonymous , sed -i '1s/^/\xef\xbb\xbf/' foo foo . 有用的是,它还将ASCII文件转换为带有BOM表的UTF8

yingted Amit 8 年前

要将BOM表添加到以“foo-”开头的所有文件中,可以使用 sed . 塞德 具有进行备份的选项。

sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*

strace

sed -i '1s/^/\xff\xfe/' foo-*

确保您需要设置UTF-16,因为UTF-8是不同的。

andrewdotn 15 年前

对于设置正确字节顺序标记的通用解决方案,无论文件是UTF-8、UTF-16还是UTF-32I,我都将使用vims 'bomb' 选项:

$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a                           hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a                   ...hello.

( -e -s 表示不打印状态消息; -c (我的意思是这样做)

Martin Wang 12 年前

试试uconv

uconv --add-signature

Matthew Flaschen 16 年前

for i in $(ls *.sql)
do
  cp "$i" "$i.temp"
  printf '\xFF\xFE' > "$i"
  cat "$i.temp" >> "$i"
  rm "$i.temp"
done

Dennis Williamson 16 年前

Matthew Flaschen的答案很好,但也有一些缺陷。

最好让一切都取决于成功的拷贝,或者测试临时文件的存在,或者对拷贝进行操作。如果你是一个带吊带的人,你会做一个组合,就像我在下面说明的那样
这个 ls 这是不必要的。
我会使用比“I”-也许是“file”更好的变量名。

当然,你也可以非常 paranoid,并在开始时检查临时文件是否存在,这样您就不会意外地覆盖它和/或使用UUID或生成的文件名。mktemp、tempfile或uuidgen中的一个就可以做到这一点。

td=TMPDIR
export TMPDIR=

usertemp=~/temp            # set this to use a temp directory on the same filesystem
                           # you could use ./temp to ensure that it's one the same one
                           # you can use mktemp -d to create the dir instead of mkdir

if [[ ! -d $usertemp ]]    # if this user temp directory doesn't exist
then                       # then create it, unless you can't 
    mkdir $usertemp || export TMPDIR=$td    # if you can't create it and TMPDIR is/was
fi                                          # empty then mktemp automatically falls
                                            # back to /tmp

for file in *.sql
do
    # TMPDIR if set overrides the argument to -p
    temp=$(mktemp -p $usertemp) || { echo "$0: Unable to create temp file."; exit 1; }

    { printf '\xFF\xFE' > "$temp" &&
    cat "$file" >> "$temp"; } || { echo "$0: Write failed on $file"; exit 1; }

    { rm "$file" && 
    mv "$temp" "$file"; } || { echo "$0: Replacement failed for $file; exit 1; }
done
export TMPDIR=$td

陷阱可能比我添加的所有单独的错误处理程序都要好。

毫无疑问,对于一次性脚本来说,所有这些额外的注意都是多余的,但这些技术可以在关键时刻节省您的时间,特别是在多文件操作中。

sanmai 7 年前

$ printf '\xEF\xBB\xBF' > bom.txt

$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt