
Using Objective-C/Cocoa to unescape Unicode characters, i.e. \u1234

  •  34
  • corydoras  · 15 years ago

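    Some sites that I fetch data from return UTF-8 strings in which the non-ASCII characters are escaped, i.e.: \u5404\u500b\u90fd

    Is there a built-in Cocoa function that can help with this, or do I have to write my own decoding algorithm?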

    4 answers  |  up to 10 years ago
        1
  •  23
  •   Community CDub    8 years ago

    There is no built-in function to do C unescaping.

    You can cheat a little with NSPropertyListSerialization, since an "old text style" plist supports C escaping via \Uxxxx:

    NSString* input = @"ab\"cA\"BC\\u2345\\u0123";
    
    // will cause trouble if you have "abc\\\\uvw"
    NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
    NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
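    // propertyListFromData:mutabilityOption:format:errorDescription: is deprecated;
    // propertyListWithData:options:format:error: is its replacement
    NSString* unesc = [NSPropertyListSerialization propertyListWithData:data
                       options:NSPropertyListImmutable format:NULL
                       error:NULL];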
    assert([unesc isKindOfClass:[NSString class]]);
    NSLog(@"Output = %@", unesc);
    

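    Note, however, that this is not very efficient. It is better to write your own parser. (By the way, are you decoding JSON strings? If so, you can use one of the existing JSON parsers.)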
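
    As a rough illustration of the "write your own parser" route, here is a minimal sketch in plain C, which also compiles inside an Objective-C file. It is not taken from the answer above: the names parse_hex4, put_utf8 and unescape_unicode are made up for this example, and it assumes a NUL-terminated input in which only \uXXXX escapes need decoding. It converts them to UTF-8, joins surrogate pairs, and copies malformed escapes through unchanged.

```c
#include <stddef.h>

/* Parse exactly four hex digits at p; return the value, or -1 if malformed.
   Stops at the first non-hex character, so it never reads past a NUL. */
static long parse_hex4(const char *p)
{
    long value = 0;
    for (int i = 0; i < 4; i++) {
        char c = p[i];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = value * 16 + digit;
    }
    return value;
}

/* Write code point cp to out as UTF-8; return the number of bytes written. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode \uXXXX escapes from `in` into UTF-8 in `out`.
   `out` must hold at least strlen(in) + 1 bytes: a six-byte escape yields
   at most three bytes, and a twelve-byte surrogate pair yields four. */
void unescape_unicode(const char *in, char *out)
{
    while (*in) {
        long hi;
        if (in[0] == '\\' && in[1] == 'u' && (hi = parse_hex4(in + 2)) >= 0) {
            unsigned long cp = (unsigned long)hi;
            in += 6;
            /* A high surrogate followed by an escaped low surrogate is one code point. */
            if (cp >= 0xD800 && cp <= 0xDBFF && in[0] == '\\' && in[1] == 'u') {
                long lo = parse_hex4(in + 2);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + ((unsigned long)lo - 0xDC00);
                    in += 6;
                }
            }
            /* An unpaired surrogate is not valid on its own: emit U+FFFD instead. */
            if (cp >= 0xD800 && cp <= 0xDFFF)
                cp = 0xFFFD;
            out += put_utf8(cp, out);
        } else {
            *out++ = *in++;  /* ordinary byte, or a malformed escape kept verbatim */
        }
    }
    *out = '\0';
}
```

    The resulting buffer can be handed to +[NSString stringWithUTF8String:]. One caveat of this sketch: an escaped \u0000 writes a NUL byte and therefore cuts the C string short.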

        2
  •  90
  •   Community CDub    8 years ago

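    It is true that Cocoa does not offer a solution, but Core Foundation does: CFStringTransform.

    CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it is a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic, such as transliterating between Greek and Latin (or just about any known script), but it can also be used for mundane tasks like unescaping strings from a crappy server: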

    NSString *input = @"\\u5404\\u500b\\u90fd";
    NSMutableString *convertedString = [input mutableCopy];
    
    CFStringRef transform = CFSTR("Any-Hex/Java");
    CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES);
    
    NSLog(@"convertedString: %@", convertedString);
    
    // prints: 各個都, tada!
    

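    As I said, CFStringTransform is really powerful. It supports a number of predefined transformations, such as case mapping, normalization, or Unicode character name conversion. You can even design your own transformations.

    I have no idea why Apple does not make it available from Cocoa.

    Edit 2015:

    OS X 10.11 and iOS 9 add the following method to Foundation: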

    - (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;
    

    So the example above becomes:

    NSString *input = @"\\u5404\\u500b\\u90fd";
    NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java"
                                                         reverse:YES];
    
    NSLog(@"convertedString: %@", convertedString);
    

    Thanks to @nschmidt for the heads-up.

        3
  •  11
  •   Christoph    13 years ago

    This is what I ended up writing. Hopefully it will help someone.

    + (NSString*) unescapeUnicodeString:(NSString*)string
    {
    // unescape quotes and backwards slash
    NSString* unescapedString = [string stringByReplacingOccurrencesOfString:@"\\\"" withString:@"\""];
    unescapedString = [unescapedString stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
    
    // tokenize based on unicode escape char
    NSMutableString* tokenizedString = [NSMutableString string];
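    NSScanner* scanner = [NSScanner scannerWithString:unescapedString];
    // by default NSScanner skips whitespace, which would drop spaces at the start of a token
    scanner.charactersToBeSkipped = nil;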
    while ([scanner isAtEnd] == NO)
    {
        // read up to the first unicode marker
        // if a string has been scanned, it's a token
        // and should be appended to the tokenized string
        NSString* token = @"";
        [scanner scanUpToString:@"\\u" intoString:&token];
        if (token != nil && token.length > 0)
        {
            [tokenizedString appendString:token];
            continue;
        }
    
        // skip two characters to get past the marker
        // check if the range of unicode characters is
        // beyond the end of the string (could be malformed)
        // and if it is, move the scanner to the end
        // and skip this token
        NSUInteger location = [scanner scanLocation];
        NSUInteger remaining = scanner.string.length - location;
        if (remaining < 4 + 2)
        {
            // fewer than six characters left: copy the rest verbatim and finish
            [tokenizedString appendString:[scanner.string substringFromIndex:location]];
            [scanner setScanLocation:scanner.string.length];
            continue;
        }
    
        // move the location past the unicode marker
        // then read in the next 4 characters
        location += 2;
        NSRange range = {location, 4};
        token = [scanner.string substringWithRange:range];
        unichar codeValue = (unichar) strtol([token UTF8String], NULL, 16);
        [tokenizedString appendString:[NSString stringWithFormat:@"%C", codeValue]];
    
        // move the scanner past the 4 characters
        // then keep scanning
        location += 4;
        [scanner setScanLocation:location];
    }
    
    // done
    return tokenizedString;
    }
    
    + (NSString*) escapeUnicodeString:(NSString*)string
    {
    // first escape backslashes and quotes
    // note that the backslash has to be escaped before the quote
    // otherwise it will end up with an extra backslash
    NSString* escapedString = [string stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"];
    escapedString = [escapedString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    
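    // convert to encoded unicode
    // do this by getting the data for the string
    // in UTF16 little endian (this matches the byte order of a little-endian
    // host, which the uint16_t reads below rely on; it is not network byte order)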
    NSData* data = [escapedString dataUsingEncoding:NSUTF16LittleEndianStringEncoding allowLossyConversion:YES];
    size_t bytesRead = 0;
    const char* bytes = data.bytes;
    NSMutableString* encodedString = [NSMutableString string];
    
    // loop through the byte array
    // read two bytes at a time, if the bytes
    // are above a certain value they are unicode
    // otherwise the bytes are ASCII characters
    // the %C format will write the character value of bytes
    while (bytesRead < data.length)
    {
        uint16_t code = *((uint16_t*) &bytes[bytesRead]);
        if (code > 0x007E)
        {
            [encodedString appendFormat:@"\\u%04X", code];
        }
        else
        {
            [encodedString appendFormat:@"%C", code];
        }
        bytesRead += sizeof(uint16_t);
    }
    
    // done
    return encodedString;
    }
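
    The escaping loop above reads UTF-16 code units in host byte order. For comparison, here is a sketch of the same idea in plain C that starts from UTF-8 instead, so no byte-order assumption is needed. The function name escape_unicode and its buffer-size contract are assumptions of this example. Unlike the method above, it does not escape quotes or backslashes, it leaves U+007F as-is, and it writes code points above U+FFFF as a surrogate pair.

```c
#include <stdio.h>

/* Escape every non-ASCII code point of the UTF-8 string `in` as \uXXXX.
   `out` must hold at least 6 * strlen(in) + 1 bytes; that bound also covers
   malformed input, where a single stray byte still becomes one escape.
   The decoding itself is only meaningful for well-formed UTF-8. */
void escape_unicode(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    while (*p) {
        unsigned long cp;
        int extra;                      /* expected continuation bytes */
        if (*p < 0x80) {                /* ASCII passes through unchanged */
            *out++ = (char)*p++;
            continue;
        }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        p++;
        /* Fold in the continuation bytes (10xxxxxx), six bits each. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (*p++ & 0x3F);
        if (cp >= 0x10000) {
            /* Outside the BMP: split into a UTF-16 surrogate pair. */
            cp -= 0x10000;
            out += sprintf(out, "\\u%04lX\\u%04lX",
                           0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        } else {
            out += sprintf(out, "\\u%04lX", cp);
        }
    }
    *out = '\0';
}
```

    The input could come from -[NSString UTF8String]; the output is plain ASCII.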
    
        4
  •  2
  •   Community CDub    8 years ago

    Simple code:

    const char *cString = [unicodeStr cStringUsingEncoding:NSUTF8StringEncoding];
    NSString *resultStr = [NSString stringWithCString:cString encoding:NSNonLossyASCIIStringEncoding];
    

    Source: https://stackoverflow.com/a/7861345