
Java equivalent to JavaScript's encodeURIComponent that produces identical output?

  •  112
  • John Topley  · 17 years ago

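    I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output identical to JavaScript's encodeURIComponent function.

    My torture test string is: "A" B ± "

    If I enter the following JavaScript statement in Firebug:

    encodeURIComponent('"A" B ± "');
    

    I get:

    "%22A%22%20B%20%C2%B1%20%22"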
    

    Here is my small Java test program:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    
    public class EncodingTest
    {
      public static void main(String[] args) throws UnsupportedEncodingException
      {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns "
          + URLEncoder.encode(s, "UTF-8"));
    
        System.out.println("getBytes returns "
          + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
      }
    }
    

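    This program outputs:

    URLEncoder.encode returns %22A%22+B+%C2%B1+%22
    getBytes returns "A" B Â± "

    What is the best way to encode a UTF-8 string in Java so that it produces the same output as JavaScript's encodeURIComponent?

    EDIT: I'm using Java 1.4, moving to Java 5 shortly.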

    13 answers
        1
  •  126
  •   ripper234 Jonathan    14 years ago

    This is the class I came up with in the end:

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;
    
    /**
     * Utility class for JavaScript compatible UTF-8 encoding and decoding.
     * 
     * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
     * @author John Topley 
     */
    public class EncodingUtil
    {
      /**
       * Decodes the passed UTF-8 String using an algorithm that's compatible with
       * JavaScript's <code>decodeURIComponent</code> function. Returns
       * <code>null</code> if the String is <code>null</code>.
       *
       * @param s The UTF-8 encoded String to be decoded
       * @return the decoded String
       */
      public static String decodeURIComponent(String s)
      {
        if (s == null)
        {
          return null;
        }
    
        String result = null;
    
        try
        {
          result = URLDecoder.decode(s, "UTF-8");
        }
    
        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
          result = s;  
        }
    
        return result;
      }
    
      /**
       * Encodes the passed String as UTF-8 using an algorithm that's compatible
       * with JavaScript's <code>encodeURIComponent</code> function. Returns
       * <code>null</code> if the String is <code>null</code>.
       * 
       * @param s The String to be encoded
       * @return the encoded String
       */
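      public static String encodeURIComponent(String s)
      {
        // Honour the documented contract: URLEncoder.encode(null, ...) would throw.
        if (s == null)
        {
          return null;
        }

        String result = null;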
    
        try
        {
          result = URLEncoder.encode(s, "UTF-8")
                             .replaceAll("\\+", "%20")
                             .replaceAll("\\%21", "!")
                             .replaceAll("\\%27", "'")
                             .replaceAll("\\%28", "(")
                             .replaceAll("\\%29", ")")
                             .replaceAll("\\%7E", "~");
        }
    
        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
          result = s;
        }
    
        return result;
      }  
    
      /**
       * Private constructor to prevent this class from being instantiated.
       */
      private EncodingUtil()
      {
        super();
      }
    }
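
    One decoding difference is worth keeping in mind with the class above: URLDecoder.decode() turns a literal "+" into a space, whereas JavaScript's decodeURIComponent leaves "+" untouched. The following is only a minimal sketch of one way to compensate, assuming the input was produced by encodeURIComponent; the class name PlusSafeDecoder is hypothetical and not part of the original answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: decodes like URLDecoder but keeps a literal '+' as '+'.
final class PlusSafeDecoder {
    static String decode(String s) {
        if (s == null) {
            return null;
        }
        try {
            // Escape '+' first so URLDecoder does not translate it into a space.
            return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("a+b%20c"));
    }
}
```

    Escaping the plus sign before decoding keeps the rest of the behaviour delegated to URLDecoder, so malformed escapes still raise IllegalArgumentException.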
    
        2
  •  69
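  •   Tomalak    12 years ago

    MDC on encodeURIComponent():

    • literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

    Java 1.5.0 documentation on URLEncoder:

    • literal characters (regex representation): [-a-zA-Z0-9._*]
    • the space character " " is converted into a plus sign "+".

    So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

    • replace all occurrences of "+" with "%20"
    • replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts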
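
    The two character sets above can be cross-checked mechanically. The sketch below is an illustration only: it walks the printable ASCII range and collects every character for which URLEncoder disagrees with the encodeURIComponent rule as listed above. The class and method names (EncoderDiff, jsStyle, differingChars) are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

final class EncoderDiff {
    // Punctuation that encodeURIComponent leaves unescaped, per the list above.
    private static final String JS_LITERAL_MARKS = "-_.!~*'()";

    /** What the encodeURIComponent rule yields for one printable ASCII character. */
    static String jsStyle(char c) {
        boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
        if (alnum || JS_LITERAL_MARKS.indexOf(c) >= 0) {
            return String.valueOf(c);
        }
        return String.format("%%%02X", (int) c);
    }

    /** Printable ASCII characters where URLEncoder and the rule above disagree. */
    static List<Character> differingChars() {
        List<Character> diff = new ArrayList<Character>();
        try {
            for (char c = 0x20; c <= 0x7E; c++) {
                if (!URLEncoder.encode(String.valueOf(c), "UTF-8").equals(jsStyle(c))) {
                    diff.add(c);
                }
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(differingChars());
    }
}
```

    The characters it reports are the space plus ! ' ( ) and ~, which is the set the post-processing steps have to handle.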
        3
  •  16
  •   Ravi Wallau    17 years ago

    Use the JavaScript engine that ships with Java 6:

    
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    
    public class Wow
    {
        public static void main(String[] args) throws Exception
        {
            ScriptEngineManager factory = new ScriptEngineManager();
            ScriptEngine engine = factory.getEngineByName("JavaScript");
            engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
        }
    }
    
    

    Output: %22A%22%20B%20%c2%b1%20%22

    The letter case of the hex digits is different, but it's closer to what you want.

        4
  •  7
  •   Chris Nitchie    10 years ago

    I do it with java.net.URI#getRawPath(), e.g.

    String s = "a+b c.html";
    String fixed = new URI(null, null, s, null).getRawPath();
    

    The value of fixed will be a+b%20c.html, which is what you want.

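    Post-processing the output of URLEncoder.encode() does not give the same result for plus signs that are already in the URI. For example

    URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");
    

    gives a%2Bb%20c.html rather than a+b%20c.html, because URLEncoder escapes the literal + as %2B before the replacement runs.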
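
    To make the behaviour of getRawPath() concrete, here is a small sketch (the class name RawPathDemo is hypothetical). It suggests that the four-argument URI constructor performs path-style quoting: characters that are legal in a path, such as + and &, stay literal, so the result is not always what encodeURIComponent would return.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RawPathDemo {
    /** Path-style encoding via new URI(scheme, host, path, fragment). */
    static String rawPath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces are escaped; '+' and '&' are legal path characters and are kept.
        System.out.println(rawPath("a+b c.html"));
        System.out.println(rawPath("Tom & Jerry Manuscript.pdf"));
    }
}
```

    For the second input, encodeURIComponent would escape the ampersand as %26, so this approach fits path segments better than query-string values.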

        5
  •  5
  •   Joe Mill    14 years ago

    import java.io.UnsupportedEncodingException;
    import java.util.BitSet;
    
    public final class EscapeUtils
    {
        /** used for the encodeURIComponent function */
        private static final BitSet dontNeedEncoding;
    
        static
        {
            dontNeedEncoding = new BitSet(256);
    
            // a-z
            for (int i = 97; i <= 122; ++i)
            {
                dontNeedEncoding.set(i);
            }
            // A-Z
            for (int i = 65; i <= 90; ++i)
            {
                dontNeedEncoding.set(i);
            }
            // 0-9
            for (int i = 48; i <= 57; ++i)
            {
                dontNeedEncoding.set(i);
            }
    
            // '()*
            for (int i = 39; i <= 42; ++i)
            {
                dontNeedEncoding.set(i);
            }
            dontNeedEncoding.set(33); // !
            dontNeedEncoding.set(45); // -
            dontNeedEncoding.set(46); // .
            dontNeedEncoding.set(95); // _
            dontNeedEncoding.set(126); // ~
        }
    
        /**
         * A Utility class should not be instantiated.
         */
        private EscapeUtils()
        {
    
        }
    
        /**
         * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
         * 
         * @param input
         *            A component of a URI
         * @return the escaped URI component
         */
        public static String encodeURIComponent(String input)
        {
            if (input == null)
            {
                return input;
            }
    
            StringBuilder filtered = new StringBuilder(input.length());
            for (int i = 0; i < input.length();)
            {
                // Read whole code points so a surrogate pair becomes one UTF-8 sequence.
                final int cp = input.codePointAt(i);
                i += Character.charCount(cp);
                if (dontNeedEncoding.get(cp))
                {
                    filtered.append((char) cp);
                }
                else
                {
                    final byte[] b = codePointToBytesUTF(cp);
    
                    for (int j = 0; j < b.length; ++j)
                    {
                        filtered.append('%');
                        filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                        filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                    }
                }
            }
            return filtered.toString();
        }
    
        private static byte[] codePointToBytesUTF(int cp)
        {
            try
            {
                return new String(Character.toChars(cp)).getBytes("UTF-8");
            }
            catch (UnsupportedEncodingException e)
            {
                // UTF-8 is required on every JVM, so this should never happen.
                throw new IllegalStateException(e);
            }
        }
    }
    
        6
  •  3
  •   sangupta    15 years ago
        7
  •  2
  •   balazs    8 years ago

    This is what I use:

    // Requires: import java.nio.charset.StandardCharsets;
    private static final String HEX = "0123456789ABCDEF";
    
    public static String encodeURIComponent(String str) {
        if (str == null) return null;
    
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder builder = new StringBuilder(bytes.length);
    
        for (byte c : bytes) {
            if (c >= 'a' ? c <= 'z' || c == '~' :
                c >= 'A' ? c <= 'Z' || c == '_' :
                c >= '0' ? c <= '9' :  c == '-' || c == '.')
                builder.append((char)c);
            else
                builder.append('%')
                       .append(HEX.charAt(c >> 4 & 0xf))
                       .append(HEX.charAt(c & 0xf));
        }
    
        return builder.toString();
    }
    

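    This goes further than JavaScript's function by percent-encoding every character that is not an unreserved character according to RFC 3986.

    This is the reverse conversion:
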
    public static String decodeURIComponent(String str) {
        if (str == null) return null;
    
        int length = str.length();
        byte[] bytes = new byte[length / 3];
        StringBuilder builder = new StringBuilder(length);
    
        for (int i = 0; i < length; ) {
            char c = str.charAt(i);
            if (c != '%') {
                builder.append(c);
                i += 1;
            } else {
                int j = 0;
                do {
                    char h = str.charAt(i + 1);
                    char l = str.charAt(i + 2);
                    i += 3;
    
                    h -= '0';
                    if (h >= 10) {
                        h |= ' ';
                        h -= 'a' - '0';
                        if (h >= 6) throw new IllegalArgumentException();
                        h += 10;
                    }
    
                    l -= '0';
                    if (l >= 10) {
                        l |= ' ';
                        l -= 'a' - '0';
                        if (l >= 6) throw new IllegalArgumentException();
                        l += 10;
                    }
    
                    bytes[j++] = (byte)(h << 4 | l);
                    if (i >= length) break;
                    c = str.charAt(i);
                } while (c == '%');
                builder.append(new String(bytes, 0, j, StandardCharsets.UTF_8));
            }
        }
    
        return builder.toString();
    }
    
        8
  •  1
  •   Mike Bryant    11 years ago

    Here is a straightforward example of Ravi Wallau's solution:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;
    
    public class EncodeURIComponentDemo {
        public String buildSafeURL(String partialURL, String documentName)
                throws ScriptException {
            ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
            ScriptEngine scriptEngine = scriptEngineManager
                    .getEngineByName("JavaScript");
    
            // The name is spliced into the script source, so a single quote
            // or backslash in documentName would break the expression.
            String urlSafeDocumentName = String.valueOf(scriptEngine
                    .eval("encodeURIComponent('" + documentName + "')"));
            String safeURL = partialURL + urlSafeDocumentName;
    
            return safeURL;
        }
    
        public static void main(String[] args) {
            EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
            String partialURL = "https://www.website.com/document/";
            String documentName = "Tom & Jerry Manuscript.pdf";
    
            try {
                System.out.println(demo.buildSafeURL(partialURL, documentName));
            } catch (ScriptException se) {
                se.printStackTrace();
            }
        }
    }
    

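    Output:

    https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf

    It also answers the question in Loren Shqipognja's comment about how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns an Object, so it can be converted to a String with String.valueOf(), among other ways.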
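
    A variation on the same idea, offered only as a sketch: instead of concatenating the value into the script text, it can be bound as a script variable with ScriptEngine.put(), which avoids quoting problems when the value contains a single quote. The class name ScriptEncoder and the variable name "value" are assumptions for this example. Note that getEngineByName("JavaScript") returns null when no JavaScript engine is installed, which is the case on JDK 15 and later, where the bundled Nashorn engine was removed.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class ScriptEncoder {
    /** Returns the encoded value, or null when no JavaScript engine is available. */
    static String encode(String value) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null; // no engine installed, e.g. on JDK 15 and later
        }
        try {
            // Bind the Java string as a script variable rather than splicing it into source.
            engine.put("value", value);
            return String.valueOf(engine.eval("encodeURIComponent(value)"));
        } catch (ScriptException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("Tom's \"draft\".pdf"));
    }
}
```

    Callers should treat a null result as "engine unavailable" and fall back to one of the pure-Java approaches in the other answers.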

        10
  •  1
  •   Community Mohan Dere    4 years ago

    public static String uriEncode(String string) {
        String result = string;
        if (null != string) {
            try {
                String scheme = null;
                String ssp = string;
                int es = string.indexOf(':');
                if (es > 0) {
                    scheme = string.substring(0, es);
                    ssp = string.substring(es + 1);
                }
                result = (new URI(scheme, ssp, null)).toString();
            } catch (URISyntaxException usex) {
                // ignore and use string that has syntax error
            }
        }
        return result;
    }
    
        11
  •  1
  •   AlexN    6 years ago

    For me this worked:

    import org.apache.http.client.utils.URIBuilder;
    
    String encodedString = new URIBuilder()
      .setParameter("i", stringToEncode)
      .build()
      .getRawQuery() // output: i=encodedString
      .substring(2);
    

    Or with a different UriBuilder:

    import javax.ws.rs.core.UriBuilder;
    
    String encodedString = UriBuilder.fromPath("")
      .queryParam("i", stringToEncode)
      .toString()   // output: ?i=encodedString
      .substring(3);
    

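    In my opinion, using a standard library is better than post-processing by hand. @Chris's answer also looks good, but it does not work for URLs such as “ http://a+b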

        12
  •  0
  •   honzajde    14 years ago

    Escaper percentEscaper = new PercentEscaper("-_.*", false);

    "-_.*" are the safe characters, and false tells PercentEscaper to escape a space as "%20" rather than "+".

        13
  •  0
  •   aristotll    8 years ago

    String encodedUrl = new URI(null, url, null).toASCIIString();

    For url I use UriComponentsBuilder.
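
    As a rough illustration of what toASCIIString() does with the three-argument URI constructor, the sketch below (class name AsciiUriDemo is hypothetical) encodes the torture string from the question and a string containing reserved characters. It appears to escape illegal and non-ASCII characters while leaving reserved ones such as & and = alone, so it matches encodeURIComponent only for inputs without reserved characters.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AsciiUriDemo {
    /** Quotes illegal characters, then escapes non-ASCII characters as UTF-8 octets. */
    static String toAscii(String s) {
        try {
            return new URI(null, s, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\"A\" B \u00B1 \""));
        // Reserved characters are legal in a URI and are therefore not escaped.
        System.out.println(toAscii("a&b=c"));
    }
}
```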