
How do I unescape a Java string literal in Java?

  •  63
  • ziggystar  · 15 years ago

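    I'm processing some Java source code using Java. I'm extracting the string literals and feeding them to a function that takes a String. The problem is that I need to pass the unescaped version of the string to the function (that is, converting \n to a newline, \\ to a single \, and so on).

    In case anyone is wondering, I'm trying to deobfuscate string literals in decompiled, obfuscated Java files.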

    11 answers  |  last activity 7 years ago
        1
  •  104
  •   Michael    7 years ago

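    The Problem

    The org.apache.commons.lang.StringEscapeUtils.unescapeJava() given here as another answer is really very little help at all.

    • It does not treat \0 as the null character.
    • It can't handle the sorts of escapes admitted by java.util.regex.Pattern.compile() and everything that uses it, including \a, \e, and especially \cX.
    • This looks like UCS-2 code, not UTF-16 code: it uses the older charAt interface instead of the codePoint interface, which promotes the illusion that a Java char is guaranteed to hold a full Unicode character.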
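
    To make the last point concrete, here is a minimal sketch (the class name CodePointDemo and its helper are hypothetical, chosen only for illustration) showing how charAt and codePointAt disagree on a character outside the Basic Multilingual Plane:

```java
final class CodePointDemo {
    // Counts logical Unicode code points rather than UTF-16 code units.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                            // 2 (UTF-16 code units)
        System.out.println(countCodePoints(s));                    // 1 (code point)
        System.out.println(Integer.toHexString(s.charAt(0)));      // d83d (high surrogate only)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600 (the real character)
    }
}
```

    Code that walks a string with charAt sees two half-characters here, which is why an unescaper that wants to support code points above U+FFFF has to work with codePointAt and Character.toChars.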

    The Solution

    I wrote a string unescaper that solves the OP's problem without all the irritations of the Apache code.

    /*
     *
     * unescape_perl_string()
     *
     *      Tom Christiansen <tchrist@perl.com>
     *      Sun Nov 28 12:55:24 MST 2010
     *
     * It's completely ridiculous that there's no standard
     * unescape_java_string function.  Since I have to do the
     * damn thing myself, I might as well make it halfway useful
     * by supporting things Java was too stupid to consider in
     * strings:
     * 
     *   => "?" items  are additions to Java string escapes
     *                 but normal in Java regexes
     *
     *   => "!" items  are also additions to Java regex escapes
     *   
     * Standard singletons: ?\a ?\e \f \n \r \t
     * 
     *      NB: \b is unsupported as backspace so it can pass-through
     *          to the regex translator untouched; I refuse to make anyone
     *          doublebackslash it as doublebackslashing is a Java idiocy
     *          I desperately wish would die out.  There are plenty of
     *          other ways to write it:
     *
     *              \cH, \12, \012, \x08 \x{8}, \u0008, \U00000008
     *
     * Octal escapes: \0 \0N \0NN \N \NN \NNN
     *    Can range up to !\777 not \377
     *    
     *      TODO: add !\o{NNNNN}
     *          last Unicode is 4177777
     *          maxint is 37777777777
     *
     * Control chars: ?\cX
     *      Means: ord(X) ^ ord('@')
     *
     * Old hex escapes: \xXX
     *      unbraced must be 2 xdigits
     *
     * Perl hex escapes: !\x{XXX} braced may be 1-8 xdigits
     *       NB: proper Unicode never needs more than 6, as highest
     *           valid codepoint is 0x10FFFF, not maxint 0xFFFFFFFF
     *
     * Lame Java escape: \[IDIOT JAVA PREPROCESSOR]uXXXX must be
     *                   exactly 4 xdigits;
     *
     *       I can't write XXXX in this comment where it belongs
     *       because the damned Java Preprocessor can't mind its
     *       own business.  Idiots!
     *
     * Lame Python escape: !\UXXXXXXXX must be exactly 8 xdigits
     * 
     * TODO: Perl translation escapes: \Q \U \L \E \[IDIOT JAVA PREPROCESSOR]u \l
     *       These are not so important to cover if you're passing the
     *       result to Pattern.compile(), since it handles them for you
     *       further downstream.  Hm, what about \[IDIOT JAVA PREPROCESSOR]u?
     *
     */
    
    public final static
    String unescape_perl_string(String oldstr) {
    
        /*
         * In contrast to fixing Java's broken regex charclasses,
         * this one need be no bigger, as unescaping shrinks the string
         * here, where in the other one, it grows it.
         */
    
        StringBuilder newstr = new StringBuilder(oldstr.length());
    
        boolean saw_backslash = false;
    
        for (int i = 0; i < oldstr.length(); i++) {
            int cp = oldstr.codePointAt(i);
            if (oldstr.codePointAt(i) > Character.MAX_VALUE) {
                i++; /****WE HATES UTF-16! WE HATES IT FOREVERSES!!!****/
            }
    
            if (!saw_backslash) {
                if (cp == '\\') {
                    saw_backslash = true;
                } else {
                    newstr.append(Character.toChars(cp));
                }
                continue; /* switch */
            }
    
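            if (cp == '\\') {
                saw_backslash = false;
                /*
                 * An escaped backslash is passed through still doubled,
                 * presumably so the result can go on to Pattern.compile().
                 * For plain Java-literal semantics, append only one.
                 */
                newstr.append('\\');
                newstr.append('\\');
                continue; /* switch */
            }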
    
            switch (cp) {
    
                case 'r':  newstr.append('\r');
                           break; /* switch */
    
                case 'n':  newstr.append('\n');
                           break; /* switch */
    
                case 'f':  newstr.append('\f');
                           break; /* switch */
    
                /* PASS a \b THROUGH!! */
                case 'b':  newstr.append("\\b");
                           break; /* switch */
    
                case 't':  newstr.append('\t');
                           break; /* switch */
    
                case 'a':  newstr.append('\007');
                           break; /* switch */
    
                case 'e':  newstr.append('\033');
                           break; /* switch */
    
                /*
                 * A "control" character is what you get when you xor its
                 * codepoint with '@'==64.  This only makes sense for ASCII,
                 * and may not yield a "control" character after all.
                 *
                 * Strange but true: "\c{" is ";", "\c}" is "=", etc.
                 */
                case 'c':   {
                    if (++i == oldstr.length()) { die("trailing \\c"); }
                    cp = oldstr.codePointAt(i);
                    /*
                     * don't need to grok surrogates, as next line blows them up
                     */
                    if (cp > 0x7f) { die("expected ASCII after \\c"); }
                    newstr.append(Character.toChars(cp ^ 64));
                    break; /* switch */
                }
    
                case '8':
                case '9': die("illegal octal digit");
                          /* NOTREACHED */
    
        /*
         * may be 0 to 2 octal digits following this one
         * so back up one for fallthrough to next case;
         * unread this digit and fall through to next case.
         */
                case '1':
                case '2':
                case '3':
                case '4':
                case '5':
                case '6':
                case '7': --i;
                          /* FALLTHROUGH */
    
                /*
                 * Can have 0, 1, or 2 octal digits following a 0
                 * this permits larger values than octal 377, up to
                 * octal 777.
                 */
                case '0': {
                    if (i+1 == oldstr.length()) {
                        /* found \0 at end of string */
                        newstr.append(Character.toChars(0));
                        break; /* switch */
                    }
                    i++;
                    int digits = 0;
                    int j;
                    for (j = 0; j <= 2; j++) {
                        if (i+j == oldstr.length()) {
                            break; /* for */
                        }
                        /* safe because will unread surrogate */
                        int ch = oldstr.charAt(i+j);
                        if (ch < '0' || ch > '7') {
                            break; /* for */
                        }
                        digits++;
                    }
                    if (digits == 0) {
                        --i;
                        newstr.append('\0');
                        break; /* switch */
                    }
                    int value = 0;
                    try {
                        value = Integer.parseInt(
                                    oldstr.substring(i, i+digits), 8);
                    } catch (NumberFormatException nfe) {
                        die("invalid octal value for \\0 escape");
                    }
                    newstr.append(Character.toChars(value));
                    i += digits-1;
                    break; /* switch */
                } /* end case '0' */
    
                case 'x':  {
                    if (i+2 > oldstr.length()) {
                        die("string too short for \\x escape");
                    }
                    i++;
                    boolean saw_brace = false;
                    if (oldstr.charAt(i) == '{') {
                            /* ^^^^^^ ok to ignore surrogates here */
                        i++;
                        saw_brace = true;
                    }
                    int j;
                    for (j = 0; j < 8; j++) {
    
                        if (!saw_brace && j == 2) {
                            break;  /* for */
                        }
    
                        /*
                         * ASCII test also catches surrogates
                         */
                        int ch = oldstr.charAt(i+j);
                        if (ch > 127) {
                            die("illegal non-ASCII hex digit in \\x escape");
                        }
    
                        if (saw_brace && ch == '}') { break; /* for */ }
    
                        if (! ( (ch >= '0' && ch <= '9')
                                    ||
                                (ch >= 'a' && ch <= 'f')
                                    ||
                                (ch >= 'A' && ch <= 'F')
                              )
                           )
                        {
                            die(String.format(
                                "illegal hex digit #%d '%c' in \\x", ch, ch));
                        }
    
                    }
                    if (j == 0) { die("empty braces in \\x{} escape"); }
                    int value = 0;
                    try {
                        value = Integer.parseInt(oldstr.substring(i, i+j), 16);
                    } catch (NumberFormatException nfe) {
                        die("invalid hex value for \\x escape");
                    }
                    newstr.append(Character.toChars(value));
                    if (saw_brace) { j++; }
                    i += j-1;
                    break; /* switch */
                }
    
                case 'u': {
                    /* need 4 xdigits at i+1..i+4, so i+4 must be a valid index */
                    if (i+4 >= oldstr.length()) {
                        die("string too short for \\u escape");
                    }
                    i++;
                    int j;
                    for (j = 0; j < 4; j++) {
                        /* this also handles the surrogate issue */
                        if (oldstr.charAt(i+j) > 127) {
                            die("illegal non-ASCII hex digit in \\u escape");
                        }
                    }
                    int value = 0;
                    try {
                        value = Integer.parseInt( oldstr.substring(i, i+j), 16);
                    } catch (NumberFormatException nfe) {
                        die("invalid hex value for \\u escape");
                    }
                    newstr.append(Character.toChars(value));
                    i += j-1;
                    break; /* switch */
                }
    
                case 'U': {
                    /* need 8 xdigits at i+1..i+8, so i+8 must be a valid index */
                    if (i+8 >= oldstr.length()) {
                        die("string too short for \\U escape");
                    }
                    i++;
                    int j;
                    for (j = 0; j < 8; j++) {
                        /* this also handles the surrogate issue */
                        if (oldstr.charAt(i+j) > 127) {
                            die("illegal non-ASCII hex digit in \\U escape");
                        }
                    }
                    int value = 0;
                    try {
                        value = Integer.parseInt(oldstr.substring(i, i+j), 16);
                    } catch (NumberFormatException nfe) {
                        die("invalid hex value for \\U escape");
                    }
                    newstr.append(Character.toChars(value));
                    i += j-1;
                    break; /* switch */
                }
    
                default:   newstr.append('\\');
                           newstr.append(Character.toChars(cp));
               /*
                * say(String.format(
                *       "DEFAULT unrecognized escape %c passed through",
                *       cp));
                */
                           break; /* switch */
    
            }
            saw_backslash = false;
        }
    
        /* weird to leave one at the end */
        if (saw_backslash) {
            newstr.append('\\');
        }
    
        return newstr.toString();
    }
    
    /*
     * Return a string "U+XX.XXX.XXXX" etc, where each XX set is the
     * xdigits of the logical Unicode code point. No bloody brain-damaged
     * UTF-16 surrogate crap, just true logical characters.
     */
     public final static
     String uniplus(String s) {
         if (s.length() == 0) {
             return "";
         }
         /* This is just the minimum; sb will grow as needed. */
         StringBuilder sb = new StringBuilder(2 + 3 * s.length());
         sb.append("U+");
         for (int i = 0; i < s.length(); i++) {
             sb.append(String.format("%X", s.codePointAt(i)));
             if (s.codePointAt(i) > Character.MAX_VALUE) {
                 i++; /****WE HATES UTF-16! WE HATES IT FOREVERSES!!!****/
             }
             if (i+1 < s.length()) {
                 sb.append(".");
             }
         }
         return sb.toString();
     }
    
    private static final
    void die(String foa) {
        throw new IllegalArgumentException(foa);
    }
    
    private static final
    void say(String what) {
        System.out.println(what);
    }
    

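    If it helps others, you're welcome to it, no strings attached. If you improve it, I'd love for you to mail me your enhancements, but you certainly don't have to.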

        2
  •  49
  •   Community Mohan Dere    6 years ago

    You can use the String unescapeJava(String) method of StringEscapeUtils from Apache Commons Lang.

    Here's an example snippet:

        String in = "a\\tb\\n\\\"c\\\"";
    
        System.out.println(in);
        // a\tb\n\"c\"
    
        String out = StringEscapeUtils.unescapeJava(in);
    
        System.out.println(out);
        // a    b
        // "c"
    

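    The utility class also has overloads that write directly to a java.io.Writer.

    Caveats: StringEscapeUtils appears to handle Unicode escapes with a single u, but not octal escapes or Unicode escapes with extraneous u's.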

        /* Unicode escape test #1: PASS */
        
        System.out.println(
            "\u0030"
        ); // 0
        System.out.println(
            StringEscapeUtils.unescapeJava("\\u0030")
        ); // 0
        System.out.println(
            "\u0030".equals(StringEscapeUtils.unescapeJava("\\u0030"))
        ); // true
        
        /* Octal escape test: FAIL */
        
        System.out.println(
            "\45"
        ); // %
        System.out.println(
            StringEscapeUtils.unescapeJava("\\45")
        ); // 45
        System.out.println(
            "\45".equals(StringEscapeUtils.unescapeJava("\\45"))
        ); // false
    
        /* Unicode escape test #2: FAIL */
        
        System.out.println(
            "\uu0030"
        ); // 0
        System.out.println(
            StringEscapeUtils.unescapeJava("\\uu0030")
        ); // throws NestableRuntimeException:
           //   Unable to parse unicode value: u003
    

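    Quoting the JLS:

    Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred.

    If your string can contain octal escapes, you may want to convert them to Unicode escapes first, or use another approach.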
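
    One way to do that conversion is sketched below. It is only an illustration under stated assumptions: OctalToUnicode and convert are hypothetical names, not part of Commons Lang, and the regex assumes the input is the body of a Java string literal in which a backslash preceded by an odd number of backslashes is itself escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OctalToUnicode {
    // Group 1: any run of already-escaped backslash pairs (kept as is).
    // Group 2: the octal digits, following the JLS shape \N, \NN, or \[0-3]NN.
    private static final Pattern OCTAL =
        Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)\\\\([0-3][0-7]{2}|[0-7]{1,2})");

    // Rewrites each octal escape as the equivalent backslash-u escape with 4 hex digits.
    static String convert(String literalBody) {
        Matcher m = OCTAL.matcher(literalBody);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(2), 8);
            String replacement = m.group(1) + String.format("\\u%04x", value);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("\\45"));   // \u0025
        System.out.println(convert("\\\\45")); // \\45 (escaped backslash, left alone)
    }
}
```

    The output can then be handed to an unescaper that understands only Unicode escapes.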

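    The extraneous u is also documented as follows:

    The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.

    This transformed version is equally acceptable to a compiler for the Java programming language and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.

    If your string can contain Unicode escapes with extraneous u's, you may also need to preprocess it before use.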
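
    A possible preprocessing step is sketched here. ExtraneousU and collapse are hypothetical names, and the sketch simply reduces every run of u's after an unescaped backslash to a single u, which is enough when the goal is to decode the escape rather than to reverse the JLS transformation exactly.

```java
import java.util.regex.Pattern;

final class ExtraneousU {
    // Group 1 keeps any pairs of escaped backslashes; the lookbehind makes sure
    // the backslash that starts the Unicode escape is not itself escaped.
    private static final Pattern MULTI_U =
        Pattern.compile("(?<!\\\\)((?:\\\\\\\\)*)\\\\u+");

    // Collapses backslash + "uu...u" into backslash + "u".
    static String collapse(String literalBody) {
        return MULTI_U.matcher(literalBody).replaceAll("$1\\\\u");
    }

    public static void main(String[] args) {
        System.out.println(collapse("\\uu0030"));   // \u0030
        System.out.println(collapse("\\\\uu0030")); // \\uu0030 (not an escape, unchanged)
    }
}
```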

        3
  •  17
  •   Udo Klimaschewski    12 years ago

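    I ran into a similar problem, wasn't satisfied with the proposed solutions either, and implemented this one myself.

    Also available on Github.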

    /**
     * Unescapes a string that contains standard Java escape sequences.
     * <ul>
     * <li><strong>&#92;b &#92;f &#92;n &#92;r &#92;t &#92;" &#92;'</strong> :
     * BS, FF, NL, CR, TAB, double and single quote.</li>
     * <li><strong>&#92;X &#92;XX &#92;XXX</strong> : Octal character
     * specification (0 - 377, 0x00 - 0xFF).</li>
     * <li><strong>&#92;uXXXX</strong> : Hexadecimal based Unicode character.</li>
     * </ul>
     * 
     * @param st
     *            A string optionally containing standard java escape sequences.
     * @return The translated string.
     */
    public String unescapeJavaString(String st) {
    
        StringBuilder sb = new StringBuilder(st.length());
    
        for (int i = 0; i < st.length(); i++) {
            char ch = st.charAt(i);
            if (ch == '\\') {
                char nextChar = (i == st.length() - 1) ? '\\' : st
                        .charAt(i + 1);
                // Octal escape?
                if (nextChar >= '0' && nextChar <= '7') {
                    String code = "" + nextChar;
                    i++;
                    if ((i < st.length() - 1) && st.charAt(i + 1) >= '0'
                            && st.charAt(i + 1) <= '7') {
                        code += st.charAt(i + 1);
                        i++;
                        if ((i < st.length() - 1) && st.charAt(i + 1) >= '0'
                                && st.charAt(i + 1) <= '7') {
                            code += st.charAt(i + 1);
                            i++;
                        }
                    }
                    sb.append((char) Integer.parseInt(code, 8));
                    continue;
                }
                switch (nextChar) {
                case '\\':
                    ch = '\\';
                    break;
                case 'b':
                    ch = '\b';
                    break;
                case 'f':
                    ch = '\f';
                    break;
                case 'n':
                    ch = '\n';
                    break;
                case 'r':
                    ch = '\r';
                    break;
                case 't':
                    ch = '\t';
                    break;
                case '\"':
                    ch = '\"';
                    break;
                case '\'':
                    ch = '\'';
                    break;
                // Hex Unicode: u????
                case 'u':
                    if (i >= st.length() - 5) {
                        ch = 'u';
                        break;
                    }
                    int code = Integer.parseInt(
                            "" + st.charAt(i + 2) + st.charAt(i + 3)
                                    + st.charAt(i + 4) + st.charAt(i + 5), 16);
                    sb.append(Character.toChars(code));
                    i += 5;
                    continue;
                }
                i++;
            }
            sb.append(ch);
        }
        return sb.toString();
    }
    
        4
  •  10
  •   jstricker    11 years ago

    See this from http://commons.apache.org/lang/ :

    StringEscapeUtils

    StringEscapeUtils.unescapeJava(String str)

        5
  •  9
  •   DaoWen    13 years ago

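    I know this question is old, but I wanted a solution that doesn't involve libraries outside those included in JRE6 (i.e. Apache Commons is not acceptable), so I used the built-in java.io.StreamTokenizer: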

    import java.io.*;
    
    // ...
    
    String literal = "\"Has \\\"\\\\\\\t\\\" & isn\\\'t \\\r\\\n on 1 line.\"";
    StreamTokenizer parser = new StreamTokenizer(new StringReader(literal));
    String result;
    try {
      parser.nextToken();
      if (parser.ttype == '"') {
        result = parser.sval;
      }
      else {
        result = "ERROR!";
      }
    }
    catch (IOException e) {
      result = e.toString();
    }
    System.out.println(result);
    

    Has "\  " & isn't
     on 1 line.
    
        6
  •  6
  •   Chad Retz    14 years ago

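    I'm a little late on this, but I thought I'd provide my solution since I needed the same functionality. I decided to use the Java Compiler API, which makes it slower but makes the results accurate. Basically, I create a class and then return the results. Here is the method: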

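    // Imports this method relies on. StringUtils (used further down) is the
    // author's own enclosing class, not a library type.
    import java.io.*;
    import java.lang.reflect.*;
    import java.net.URI;
    import java.util.Collections;
    import javax.tools.*;
    import javax.tools.JavaCompiler.CompilationTask;
    import javax.tools.JavaFileObject.Kind;

    // ...

    public static String[] unescapeJavaStrings(String... escaped) {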
        //class name
        final String className = "Temp" + System.currentTimeMillis();
        //build the source
        final StringBuilder source = new StringBuilder(100 + escaped.length * 20).
                append("public class ").append(className).append("{\n").
                append("\tpublic static String[] getStrings() {\n").
                append("\t\treturn new String[] {\n");
        for (String string : escaped) {
            source.append("\t\t\t\"");
            //we escape non-escaped quotes here to be safe 
            //  (but something like \\" will fail, oh well for now)
            for (int i = 0; i < string.length(); i++) {
                char chr = string.charAt(i);
                if (chr == '"' && i > 0 && string.charAt(i - 1) != '\\') {
                    source.append('\\');
                }
                source.append(chr);
            }
            source.append("\",\n");
        }
        source.append("\t\t};\n\t}\n}\n");
        //obtain compiler
        final JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        //local stream for output
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        //local stream for error
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        //source file
        JavaFileObject sourceFile = new SimpleJavaFileObject(
                URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) throws IOException {
                return source;
            }
        };
        //target file
        final JavaFileObject targetFile = new SimpleJavaFileObject(
                URI.create("string:///" + className + Kind.CLASS.extension), Kind.CLASS) {
            @Override
            public OutputStream openOutputStream() throws IOException {
                return out;
            }
        };
        //file manager proxy, with most parts delegated to the standard one 
        JavaFileManager fileManagerProxy = (JavaFileManager) Proxy.newProxyInstance(
                StringUtils.class.getClassLoader(), new Class[] { JavaFileManager.class },
                new InvocationHandler() {
                    //standard file manager to delegate to
                    private final JavaFileManager standard = 
                        compiler.getStandardFileManager(null, null, null); 
                    @Override
                    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                        if ("getJavaFileForOutput".equals(method.getName())) {
                            //return the target file when it's asking for output
                            return targetFile;
                        } else {
                            return method.invoke(standard, args);
                        }
                    }
                });
        //create the task
        CompilationTask task = compiler.getTask(new OutputStreamWriter(err), 
                fileManagerProxy, null, null, null, Collections.singleton(sourceFile));
        //call it
        if (!task.call()) {
            throw new RuntimeException("Compilation failed, output:\n" + 
                    new String(err.toByteArray()));
        }
        //get the result
        final byte[] bytes = out.toByteArray();
        //load class
        Class<?> clazz;
        try {
            //custom class loader for garbage collection
            clazz = new ClassLoader() { 
                protected Class<?> findClass(String name) throws ClassNotFoundException {
                    if (name.equals(className)) {
                        return defineClass(className, bytes, 0, bytes.length);
                    } else {
                        return super.findClass(name);
                    }
                }
            }.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
        //reflectively call method
        try {
            return (String[]) clazz.getDeclaredMethod("getStrings").invoke(null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    

    It takes an array so you can unescape in batches. So the following simple test succeeds:

    public static void main(String[] meh) {
        if ("1\02\03\n".equals(unescapeJavaStrings("1\\02\\03\\n")[0])) {
            System.out.println("Success");
        } else {
            System.out.println("Failure");
        }
    }
    
        7
  •  5
  •   Tvaroh    8 years ago

    For the record, if you use Scala, you can do:

    StringContext.treatEscapes(escaped)
    
        8
  •  4
  •   Alex - GlassEditor.com    5 years ago

    Java 13 added a method that does this: String#translateEscapes.

    It was a preview feature in Java 13 and Java 14, but was promoted to a full feature in Java 15.
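
    A minimal usage sketch, assuming a Java 15 or later runtime (TranslateEscapesDemo and unescape are hypothetical names used only here). Note that, as far as I know, translateEscapes covers the character and octal escapes but does not translate \uXXXX Unicode escapes, so those need separate handling.

```java
final class TranslateEscapesDemo {
    // Thin wrapper so the call site reads like the other answers' helpers.
    static String unescape(String literalBody) {
        return literalBody.translateEscapes();
    }

    public static void main(String[] args) {
        // The argument holds the characters a\tb\n\"c\" exactly as they appear in source text.
        System.out.println(unescape("a\\tb\\n\\\"c\\\""));
        // Octal escapes are handled too: \45 is '%'.
        System.out.println(unescape("\\45")); // %
    }
}
```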

        9
  •  3
  •   Nathan Ryan    13 years ago

    import java.util.Arrays;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Decoder {
    
        // The encoded character of each character escape.
        // This array functions as the keys of a sorted map, from encoded characters to decoded characters.
        static final char[] ENCODED_ESCAPES = { '\"', '\'', '\\',  'b',  'f',  'n',  'r',  't' };
    
        // The decoded character of each character escape.
        // This array functions as the values of a sorted map, from encoded characters to decoded characters.
        static final char[] DECODED_ESCAPES = { '\"', '\'', '\\', '\b', '\f', '\n', '\r', '\t' };
    
        // A pattern that matches an escape.
        // What follows the escape indicator is captured by group 1=character 2=octal 3=Unicode.
        static final Pattern PATTERN = Pattern.compile("\\\\(?:(b|t|n|f|r|\\\"|\\\'|\\\\)|((?:[0-3]?[0-7])?[0-7])|u+(\\p{XDigit}{4}))");
    
        public static CharSequence decodeString(CharSequence encodedString) {
            Matcher matcher = PATTERN.matcher(encodedString);
            StringBuffer decodedString = new StringBuffer();
            // Find each escape of the encoded string in succession.
            while (matcher.find()) {
                char ch;
                if (matcher.start(1) >= 0) {
                    // Decode a character escape.
                    ch = DECODED_ESCAPES[Arrays.binarySearch(ENCODED_ESCAPES, matcher.group(1).charAt(0))];
                } else if (matcher.start(2) >= 0) {
                    // Decode an octal escape.
                    ch = (char)(Integer.parseInt(matcher.group(2), 8));
                } else /* if (matcher.start(3) >= 0) */ {
                    // Decode a Unicode escape.
                    ch = (char)(Integer.parseInt(matcher.group(3), 16));
                }
                // Replace the escape with the decoded character.
                matcher.appendReplacement(decodedString, Matcher.quoteReplacement(String.valueOf(ch)));
            }
            // Append the remainder of the encoded string to the decoded string.
            // The remainder is the longest suffix of the encoded string such that the suffix contains no escapes.
            matcher.appendTail(decodedString);
            return decodedString;
        }
    
        public static void main(String... args) {
            System.out.println(decodeString(args[0]));
        }
    }
    

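    I should note that Apache Commons Lang3 doesn't seem to suffer from the weaknesses pointed out in the accepted solution. That is, StringEscapeUtils seems to handle octal escapes and multiple u characters in Unicode escapes. That means that unless you have some compelling reason to avoid Apache Commons, you should probably use it rather than my solution (or any other solution here).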

        10
  •  3
  •   Jens Piegsa    8 years ago

    org.apache.commons.lang3.StringEscapeUtils is now deprecated; use org.apache.commons.text.StringEscapeUtils#unescapeJava(String) instead. It requires an additional Maven dependency:

            <dependency>
                <groupId>org.apache.commons</groupId>
                <artifactId>commons-text</artifactId>
                <version>1.4</version>
            </dependency>
    

    It seems to handle some more special cases; for example, it unescapes:

    • escaped octal and Unicode values
    • \\b, \\n, \\t, \\f, \\r
        11
  •  0
  •   Ashwin Jayaprakash    15 years ago

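    If you are reading Unicode-escaped characters from a file, you will have a tough time, because the string is read literally, with the backslash itself escaped: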

    Blah blah...
    Column delimiter=;
    Word delimiter=\u0020 #This is just unicode for whitespace
    
    .. more stuff
    

    Here, when you read line 3 from the file, the string/line will contain:

    "Word delimiter=\u0020 #This is just unicode for whitespace"
    

    and the char[] in the string will show:

    {...., '=', '\\', 'u', '0', '0', '2', '0', ' ', '#', 't', 'h', ...}
    

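    Commons StringUnescape will not unescape this for you (I tried unescapeXml()). You have to do it manually, as described here.

    So, the substring "\u0020" should become the single character '\u0020'.

    But if you use this "\u0020" to do String.split("... ..... ..", columnDelimiterReadFromFile), which really uses a regex internally, it will work directly, because the string read from the file is still escaped and is therefore perfect for use in a regex pattern! (Confused?)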
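
    The effect can be checked with a small sketch (DelimiterDemo and splitOn are hypothetical names; the delimiter string stands in for the value read from the file). The six-character text \u0020 is never unescaped by our code; java.util.regex interprets the \uhhhh form itself.

```java
import java.util.Arrays;

final class DelimiterDemo {
    // Splits text on a delimiter exactly as it was read from a file, escapes and all.
    static String[] splitOn(String text, String delimiterFromFile) {
        return text.split(delimiterFromFile); // String.split treats its argument as a regex
    }

    public static void main(String[] args) {
        // Six characters: backslash, 'u', '0', '0', '2', '0' (what a line reader returns).
        String delimiter = "\\u0020";
        System.out.println(delimiter.length());                                   // 6
        System.out.println(Arrays.toString(splitOn("one two three", delimiter))); // [one, two, three]
    }
}
```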