
UTF7 encoding truncates text

utf-7 utf-8 string c#


dexter · Tech Community · 15 years ago

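I have a problem with the UTF7Encoding class truncating the "+4" sequence, and I would like to know why this happens. I tried using UTF8Encoding to get the string from the byte[] array instead, and that seems to work fine. Does UTF-8 have any similar problems? Essentially, I use the output of this conversion to construct HTML from an RTF string.

Here is the snippet: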

    UTF7Encoding utf = new UTF7Encoding();
    UTF8Encoding utf8 = new UTF8Encoding();

    string test = "blah blah 9+4";

    // Each char is cast straight to a byte; no encoding step is applied here.
    char[] chars = test.ToCharArray();
    byte[] charBytes = new byte[chars.Length];

    for (int i = 0; i < chars.Length; i++)
    {
        charBytes[i] = (byte)chars[i];
    }

    string resultString = utf8.GetString(charBytes);
    string resultStringWrong = utf.GetString(charBytes);

    Console.WriteLine(resultString);       // blah blah 9+4
    Console.WriteLine(resultStringWrong);  // blah blah 9 (the "+4" is lost)

2 replies | as of 15 years ago


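Steve Townsend 15 years ago

Converting the string to a byte array by casting each char like this does not work. If you want the string as a charset-specific byte[], do this: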

    UTF7Encoding utf = new UTF7Encoding();
    UTF8Encoding utf8 = new UTF8Encoding();

    string test = "blah blah 9+4";

    byte[] utfBytes = utf.GetBytes(test);
    byte[] utf8Bytes = utf8.GetBytes(test);

    string utfString = utf.GetString(utfBytes);
    string utf8String = utf8.GetString(utf8Bytes);

    Console.WriteLine(utfString);
    Console.WriteLine(utf8String);

Output:

blah blah 9+4

blah blah 9+4
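
To see why the round trip through GetBytes() keeps the text intact, it can help to look at the intermediate UTF-7 bytes. The following is only a minimal sketch and not part of the original answer: the `Utf7Inspect` class and its `ToUtf7Ascii` method are hypothetical names introduced for illustration. The pragma is there only because newer .NET versions mark UTF7Encoding as obsolete; on older frameworks it should be unnecessary.

```csharp
using System;
using System.Text;

#pragma warning disable SYSLIB0001 // UTF7Encoding is flagged obsolete on .NET 5 and later

public static class Utf7Inspect
{
    // Hypothetical helper: encode text as UTF-7, then view the raw bytes as ASCII.
    public static string ToUtf7Ascii(string text)
    {
        byte[] utf7Bytes = new UTF7Encoding().GetBytes(text);
        return Encoding.ASCII.GetString(utf7Bytes);
    }

    public static void Main()
    {
        // The literal '+' is written as "+-" in the encoded form.
        Console.WriteLine(ToUtf7Ascii("blah blah 9+4")); // blah blah 9+-4
    }
}
```

The encoded bytes are not the same as the chars cast to bytes, which is why decoding the cast bytes as UTF-7 gives a different string.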


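user180326 15 years ago

Your string is not being converted to UTF-7 bytes correctly. You should call utf.GetBytes() instead of casting the chars to bytes.

I suspect that in UTF-7 the ASCII code for "+" is actually reserved as the marker for encoded international Unicode characters.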
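
The suspicion above matches how UTF-7 is defined: "+" opens a run of modified Base64 and "-" closes it, so a literal plus sign is written as "+-". Here is a minimal sketch of what that means for the decoder, assuming raw ASCII bytes are handed to UTF7Encoding as in the question. `Utf7ShiftDemo` and `DecodeAsciiAsUtf7` are hypothetical names used only for this illustration, and the pragma only silences the obsolete warning that newer .NET versions raise for UTF7Encoding.

```csharp
using System;
using System.Text;

#pragma warning disable SYSLIB0001 // UTF7Encoding is flagged obsolete on .NET 5 and later

public static class Utf7ShiftDemo
{
    // Hypothetical helper: treat ASCII text as if it were already UTF-7 and decode it.
    public static string DecodeAsciiAsUtf7(string asciiText)
    {
        byte[] raw = Encoding.ASCII.GetBytes(asciiText);
        return new UTF7Encoding().GetString(raw);
    }

    public static void Main()
    {
        // "+4" opens a Base64 run that never fills a 16-bit code unit, so it is dropped.
        Console.WriteLine(DecodeAsciiAsUtf7("9+4"));   // 9

        // "+-" is the escape for a literal plus sign.
        Console.WriteLine(DecodeAsciiAsUtf7("9+-4"));  // 9+4

        // "+AKM-" is the Base64 run for U+00A3 (pound sign).
        Console.WriteLine(DecodeAsciiAsUtf7("+AKM-1") == "\u00A31"); // True
    }
}
```

UTF-8 has no such shift character: every ASCII byte, including "+", decodes to itself, which is consistent with the UTF-8 result in the question.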