Java equivalent to JavaScript's encodeURIComponent that produces identical output?


nasanasas 2020. 9. 18. 08:16


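I've been experimenting with various bits of Java code, trying to come up with something that encodes a string containing quotes, spaces and "exotic" Unicode characters and produces output identical to JavaScript's encodeURIComponent function.

My torture test string is: "A" B ± "

If I enter the following JavaScript statement in Firebug: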

encodeURIComponent('"A" B ± "');

Then I get:

"%22A%22%20B%20%C2%B1%20%22"

Here's my little test Java program:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
  public static void main(String[] args) throws UnsupportedEncodingException
  {
    String s = "\"A\" B ± \"";
    System.out.println("URLEncoder.encode returns "
      + URLEncoder.encode(s, "UTF-8"));

    System.out.println("getBytes returns "
      + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
  }
}

This program outputs:

URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B Â± "

Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?

EDIT: I'm using Java 1.4, moving to Java 5 shortly.

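Looking at the implementation differences, I see that:

MDC on encodeURIComponent():

literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

replace all occurrences of "+" with "%20"
replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts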
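The difference between the two character sets can be checked with a few lines of Java. This is only a sketch: the class name EncoderDifferenceDemo and the helper formEncode are made up for illustration, and it calls URLEncoder with no post-processing at all.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of any answer in this thread.
final class EncoderDifferenceDemo {

    // Plain URLEncoder output, with no post-processing applied.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The five characters only encodeURIComponent keeps literal, then a space.
        System.out.println(formEncode("~'()! "));   // %7E%27%28%29%21+
        // Characters that both functions leave untouched.
        System.out.println(formEncode("-._*aZ9"));  // -._*aZ9
    }
}
```

The first line of output shows exactly the escapes that the post-processing step has to turn back into literals, plus the "+" that has to become "%20".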

This is the class I came up with in the end:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 * 
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley 
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLDecoder.decode(s, "UTF-8");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;  
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   * 
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }  

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}
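One caveat about decodeURIComponent in the class above: URLDecoder.decode follows form-encoding rules and turns a literal "+" into a space, whereas JavaScript's decodeURIComponent leaves "+" untouched. A minimal sketch of one possible workaround follows; the class name PlusSafeDecoder is hypothetical, and the approach (escaping "+" before decoding) is an assumption rather than something the original answer does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, shown only to illustrate the "+" caveat.
final class PlusSafeDecoder {

    // Protects literal '+' characters, then delegates to URLDecoder.
    static String decodeURIComponent(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // URLDecoder alone treats '+' as an encoded space.
        System.out.println(URLDecoder.decode("a+b%20c", "UTF-8")); // a b c
        // With the '+' protected first, it survives decoding.
        System.out.println(decodeURIComponent("a+b%20c"));         // a+b c
    }
}
```

Note that String.replace(CharSequence, CharSequence) needs Java 5 or later; on Java 1.4 the same effect can be had with replaceAll("\\+", "%2B").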

Using the JavaScript engine that is shipped with Java 6:


import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}

Output: %22A%22%20B%20%c2%b1%20%22

The case is different, but it's closer to what you want.
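If the lower-case hex digits are the only remaining difference, they can be normalized afterwards. The following is a small sketch under that assumption; PercentCase and upperCaseEscapes are hypothetical names, and only the two hex digits after each "%" are changed.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that upper-cases the hex digits of percent escapes.
final class PercentCase {

    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    static String upperCaseEscapes(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer out = new StringBuffer(encoded.length());
        while (m.find()) {
            // The replacement never contains '$' or '\', so it is taken literally.
            m.appendReplacement(out, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints %22A%22%20B%20%C2%B1%20%22
        System.out.println(upperCaseEscapes("%22A%22%20B%20%c2%b1%20%22"));
    }
}
```

Letters outside an escape sequence are left as they are, so ordinary text in the encoded string keeps its case.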

I use java.net.URI#getRawPath(), e.g.

String s = "a+b c.html";
String fixed = new URI(null, null, s, null).getRawPath();

The value of fixed will be a+b%20c.html, which is what you want.

Post-processing the output of URLEncoder.encode() will obliterate any pluses that are supposed to be in the URI. For example,

URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");

will give you a%20b%20c.html, which will be interpreted as a b c.html.
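The claim above is worth checking against actual output. URLEncoder.encode escapes a literal "+" as "%2B" before it turns spaces into "+", so the replaceAll step only touches pluses that stood for spaces. The sketch below prints what each approach returns for the same input; PlusHandlingDemo and its two helper methods are hypothetical names used only for this comparison.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo comparing the two approaches on the same input.
final class PlusHandlingDemo {

    // The java.net.URI approach: quotes only characters illegal in a path.
    static String viaUriPath(String s) {
        try {
            return new URI(null, null, s, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The URLEncoder approach with "+" post-processed into "%20".
    static String viaUrlEncoder(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replaceAll("\\+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaUriPath("a+b c.html"));    // a+b%20c.html
        System.out.println(viaUrlEncoder("a+b c.html")); // a%2Bb%20c.html
    }
}
```

JavaScript's encodeURIComponent('a+b c.html') also returns a%2Bb%20c.html, so the URI-based result is appropriate for building a path but is not identical to encodeURIComponent output.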

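I came up with my own version of encodeURIComponent, because the posted solution has one problem: if there is a + in the string to be encoded, it will be converted to a space.

So here is my class: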

import java.io.UnsupportedEncodingException;
import java.util.BitSet;

public final class EscapeUtils
{
    /** used for the encodeURIComponent function */
    private static final BitSet dontNeedEncoding;

    static
    {
        dontNeedEncoding = new BitSet(256);

        // a-z
        for (int i = 97; i <= 122; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // A-Z
        for (int i = 65; i <= 90; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // 0-9
        for (int i = 48; i <= 57; ++i)
        {
            dontNeedEncoding.set(i);
        }

        // '()*
        for (int i = 39; i <= 42; ++i)
        {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(33); // !
        dontNeedEncoding.set(45); // -
        dontNeedEncoding.set(46); // .
        dontNeedEncoding.set(95); // _
        dontNeedEncoding.set(126); // ~
    }

    /**
     * A Utility class should not be instantiated.
     */
    private EscapeUtils()
    {

    }

    /**
     * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
     * 
     * @param input
     *            A component of a URI
     * @return the escaped URI component
     */
    public static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return input;
        }

        StringBuilder filtered = new StringBuilder(input.length());
        char c;
        for (int i = 0; i < input.length(); ++i)
        {
            c = input.charAt(i);
            if (dontNeedEncoding.get(c))
            {
                filtered.append(c);
            }
            else
            {
                final byte[] b = charToBytesUTF(c);

                for (int j = 0; j < b.length; ++j)
                {
                    filtered.append('%');
                    filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                    filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                }
            }
        }
        return filtered.toString();
    }

    private static byte[] charToBytesUTF(char c)
    {
        try
        {
            return new String(new char[] { c }).getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            return new byte[] { (byte) c };
        }
    }
}

I came up with another implementation, documented at http://blog.sangupta.com/2010/05/encodeuricomponent-and.html. The implementation can also handle Unicode bytes.

Here is a straightforward example of Ravi Wallau's solution:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class EncodeURIComponentDemo {

    // Note: documentName is spliced into the script text as-is, so this
    // only works for names without a single quote or backslash.
    public String buildSafeURL(String partialURL, String documentName)
            throws ScriptException {
        ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
        ScriptEngine scriptEngine = scriptEngineManager
                .getEngineByName("JavaScript");

        // eval() returns an Object, so convert it to a String.
        String urlSafeDocumentName = String.valueOf(scriptEngine
                .eval("encodeURIComponent('" + documentName + "')"));
        String safeURL = partialURL + urlSafeDocumentName;

        return safeURL;
    }

    public static void main(String[] args) {
        EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
        String partialURL = "https://www.website.com/document/";
        String documentName = "Tom & Jerry Manuscript.pdf";

        try {
            System.out.println(demo.buildSafeURL(partialURL, documentName));
        } catch (ScriptException se) {
            se.printStackTrace();
        }
    }
}

Output: https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf

It also answers the hanging question in the comments by Loren Shqipognja on how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns an Object, so it can be converted to a String via String.valueOf(), among other methods.

This is what I'm using:

import java.nio.charset.StandardCharsets;

private static final String HEX = "0123456789ABCDEF";

public static String encodeURIComponent(String str) {
    if (str == null) return null;

    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    StringBuilder builder = new StringBuilder(bytes.length);

    for (byte c : bytes) {
        if (c >= 'a' ? c <= 'z' || c == '~' :
            c >= 'A' ? c <= 'Z' || c == '_' :
            c >= '0' ? c <= '9' :  c == '-' || c == '.')
            builder.append((char)c);
        else
            builder.append('%')
                   .append(HEX.charAt(c >> 4 & 0xf))
                   .append(HEX.charAt(c & 0xf));
    }

    return builder.toString();
}

It goes beyond JavaScript's function by percent-encoding every character that is not an unreserved character according to RFC 3986.
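For output that matches encodeURIComponent exactly, the same byte-oriented loop can be given the literal set quoted earlier in this thread ([-a-zA-Z0-9._*~'()!]). The following is a sketch under that assumption; the class name JsCompatibleEncoder and the MARKS constant are hypothetical. One known difference: JavaScript throws a URIError for a lone surrogate, while getBytes silently substitutes a replacement byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical variant of the encoder above, using encodeURIComponent's literal set.
final class JsCompatibleEncoder {

    private static final String HEX = "0123456789ABCDEF";
    // Punctuation that encodeURIComponent leaves unescaped.
    private static final String MARKS = "-_.!~*'()";

    static String encodeURIComponent(String str) {
        if (str == null) {
            return null;
        }
        // Encoding the whole string keeps surrogate pairs together.
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder builder = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            boolean literal = (b >= 'a' && b <= 'z')
                    || (b >= 'A' && b <= 'Z')
                    || (b >= '0' && b <= '9')
                    || (b > 0 && MARKS.indexOf(b) >= 0);
            if (literal) {
                builder.append((char) b);
            } else {
                builder.append('%')
                       .append(HEX.charAt(b >> 4 & 0xF))
                       .append(HEX.charAt(b & 0xF));
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // The torture string from the question; \u00B1 is the plus-minus sign.
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
        // Prints %22A%22%20B%20%C2%B1%20%22
    }
}
```

Because it works on the UTF-8 bytes of the whole string, characters outside the Basic Multilingual Plane come out as four escaped bytes, as they do in JavaScript.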

This is the opposite conversion:

public static String decodeURIComponent(String str) {
    if (str == null) return null;

    int length = str.length();
    byte[] bytes = new byte[length / 3];
    StringBuilder builder = new StringBuilder(length);

    for (int i = 0; i < length; ) {
        char c = str.charAt(i);
        if (c != '%') {
            builder.append(c);
            i += 1;
        } else {
            int j = 0;
            do {
                char h = str.charAt(i + 1);
                char l = str.charAt(i + 2);
                i += 3;

                h -= '0';
                if (h >= 10) {
                    h |= ' ';
                    h -= 'a' - '0';
                    if (h >= 6) throw new IllegalArgumentException();
                    h += 10;
                }

                l -= '0';
                if (l >= 10) {
                    l |= ' ';
                    l -= 'a' - '0';
                    if (l >= 6) throw new IllegalArgumentException();
                    l += 10;
                }

                bytes[j++] = (byte)(h << 4 | l);
                if (i >= length) break;
                c = str.charAt(i);
            } while (c == '%');
            builder.append(new String(bytes, 0, j, StandardCharsets.UTF_8));
        }
    }

    return builder.toString();
}

I found the PercentEscaper class in the google-http-java-client library (see its javadoc), which can be used to implement encodeURIComponent quite easily.

I have successfully used the java.net.URI class like so:

public static String uriEncode(String string) {
    String result = string;
    if (null != string) {
        try {
            String scheme = null;
            String ssp = string;
            int es = string.indexOf(':');
            if (es > 0) {
                scheme = string.substring(0, es);
                ssp = string.substring(es + 1);
            }
            result = (new URI(scheme, ssp, null)).toString();
        } catch (URISyntaxException usex) {
            // ignore and use string that has syntax error
        }
    }
    return result;
}

For me this worked:

import org.apache.http.client.utils.URIBuilder;

String encodedString = new URIBuilder()
  .setParameter("i", stringToEncode)
  .build()
  .getRawQuery() // output: i=encodedString
  .substring(2);

Or with a different UriBuilder:

import javax.ws.rs.core.UriBuilder;

String encodedString = UriBuilder.fromPath("")
  .queryParam("i", stringToEncode)
  .toString()   // output: ?i=encodedString
  .substring(3);

In my opinion, using a standard library is a better idea than post-processing manually. Also, @Chris's answer looked good, but it doesn't work for URLs like "http://a+b c.html".

The Guava library has PercentEscaper:

Escaper percentEscaper = new PercentEscaper("-_.*", false);

"-_.*" are the safe characters, which are left unescaped in addition to letters and digits.

false tells PercentEscaper to escape a space as '%20', not '+'.

I used String encodedUrl = new URI(null, url, null).toASCIIString(); to encode URLs. To add parameters after the existing ones in the URL, I use UriComponentsBuilder.
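A short sketch of what the toASCIIString() call does, wrapped in a hypothetical helper (AsciiUriDemo and encodeUrl are names made up here). It escapes spaces and non-ASCII characters but leaves reserved characters such as "?", "&" and "=" alone, so it suits whole URLs rather than single components.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical wrapper around the one-liner quoted above.
final class AsciiUriDemo {

    static String encodeUrl(String url) {
        try {
            // Arguments are (scheme, scheme-specific part, fragment).
            return new URI(null, url, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u00B1 is the plus-minus sign; it becomes its UTF-8 bytes, escaped.
        System.out.println(encodeUrl("a b\u00B1.html")); // a%20b%C2%B1.html
        // Reserved characters in a full URL are left as they are.
        System.out.println(encodeUrl("http://example.com/a b?x=1&y=2"));
    }
}
```

The host example.com is a placeholder; the second call is there to show that query delimiters are not escaped.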

Reference URL: https://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-outpu
