<p><sup><em>A disclaimer first: the posted code snippets are all basic examples. You'll need to handle trivial <code>IOException</code>s and <code>RuntimeException</code>s like <code>NullPointerException</code>, <code>ArrayIndexOutOfBoundsException</code> and the like yourself.</em></sup></p>
<hr>
<h3>Preparing</h3>
<p>We first need to know at least the URL and the charset. The parameters are optional and depend on the functional requirements.</p>
<pre><code>String url = "http://example.com";
String charset = "UTF-8"; // Or in Java 7 and later, use the constant: java.nio.charset.StandardCharsets.UTF_8.name()
String param1 = "value1";
String param2 = "value2";
// ...

String query = String.format("param1=%s&amp;param2=%s",
    URLEncoder.encode(param1, charset),
    URLEncoder.encode(param2, charset));
</code></pre>
<p>The query parameters must be in <code>name=value</code> format and be concatenated by <code>&amp;</code>. You would normally also <a href="http://en.wikipedia.org/wiki/Percent-encoding" rel="noreferrer">URL-encode</a> the query parameters with the specified charset using <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URLEncoder.html#encode-java.lang.String-java.lang.String-" rel="noreferrer"><code>URLEncoder#encode()</code></a>.</p>
<p><sup>The <code>String#format()</code> is just for convenience. I prefer it when I would need the String concatenation operator <code>+</code> more than twice.</sup></p>
<hr>
<h3>Firing an <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.3" rel="noreferrer">HTTP GET</a> request with (optionally) query parameters</h3>
<p>It's a trivial task. It's the default request method.</p>
<pre><code>URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
InputStream response = connection.getInputStream();
// ...
</code></pre>
<p>Any query string should be concatenated to the URL using <code>?</code>.
The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2" rel="noreferrer"><code>Accept-Charset</code></a> header may hint to the server which encoding the parameters are in. If you don't send any query string, then you can leave the <code>Accept-Charset</code> header out. If you don't need to set any headers, then you can even use the <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URL.html#openStream%28%29" rel="noreferrer"><code>URL#openStream()</code></a> shortcut method.</p>
<pre><code>InputStream response = new URL(url).openStream();
// ...
</code></pre>
<p>Either way, if the other side is a <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServlet.html" rel="noreferrer"><code>HttpServlet</code></a>, then its <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServlet.html#doGet%28javax.servlet.http.HttpServletRequest,%20javax.servlet.http.HttpServletResponse%29" rel="noreferrer"><code>doGet()</code></a> method will be called and the parameters will be available by <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/ServletRequest.html#getParameter%28java.lang.String%29" rel="noreferrer"><code>HttpServletRequest#getParameter()</code></a>.</p>
<p>For testing purposes, you can print the response body to stdout as below:</p>
<pre><code>try (Scanner scanner = new Scanner(response)) {
    String responseBody = scanner.useDelimiter("\\A").next();
    System.out.println(responseBody);
}
</code></pre>
<hr>
<h3>Firing an <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.5" rel="noreferrer">HTTP POST</a> request with query parameters</h3>
<p>Setting the <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html#setDoOutput%28boolean%29" rel="noreferrer"><code>URLConnection#setDoOutput()</code></a> to <code>true</code> implicitly sets the request method to POST.
The standard HTTP POST, as sent by web forms, is of type <code>application/x-www-form-urlencoded</code>, wherein the query string is written to the request body.</p>
<pre><code>URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true); // Triggers POST.
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=" + charset);

try (OutputStream output = connection.getOutputStream()) {
    output.write(query.getBytes(charset));
}

InputStream response = connection.getInputStream();
// ...
</code></pre>
<p>Note: whenever you'd like to submit an HTML form programmatically, don't forget to take the <code>name=value</code> pairs of any <code>&lt;input type="hidden"&gt;</code> elements into the query string and of course also the <code>name=value</code> pair of the <code>&lt;input type="submit"&gt;</code> element which you'd like to "press" programmatically (because that's usually used on the server side to determine whether a button was pressed and, if so, which one).</p>
<p>You can also cast the obtained <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html" rel="noreferrer"><code>URLConnection</code></a> to <a href="http://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html" rel="noreferrer"><code>HttpURLConnection</code></a> and use its <a href="http://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html#setRequestMethod%28java.lang.String%29" rel="noreferrer"><code>HttpURLConnection#setRequestMethod()</code></a> instead.
But if you're trying to use the connection for output, you still need to set <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html#setDoOutput%28boolean%29" rel="noreferrer"><code>URLConnection#setDoOutput()</code></a> to <code>true</code>.</p>
<pre><code>HttpURLConnection httpConnection = (HttpURLConnection) new URL(url).openConnection();
httpConnection.setRequestMethod("POST");
// ...
</code></pre>
<p>Either way, if the other side is a <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServlet.html" rel="noreferrer"><code>HttpServlet</code></a>, then its <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServlet.html#doPost%28javax.servlet.http.HttpServletRequest,%20javax.servlet.http.HttpServletResponse%29" rel="noreferrer"><code>doPost()</code></a> method will be called and the parameters will be available by <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/ServletRequest.html#getParameter%28java.lang.String%29" rel="noreferrer"><code>HttpServletRequest#getParameter()</code></a>.</p>
<hr>
<h3>Actually firing the HTTP request</h3>
<p>You can fire the HTTP request explicitly with <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html#connect%28%29" rel="noreferrer"><code>URLConnection#connect()</code></a>, but the request will automatically be fired on demand when you want to get any information about the HTTP response, such as the response body using <a href="http://docs.oracle.com/javase/8/docs/api/java/net/URLConnection.html#getInputStream%28%29" rel="noreferrer"><code>URLConnection#getInputStream()</code></a> and so on.
The above examples do exactly that, so the <code>connect()</code> call is in fact superfluous.</p>
<hr>
<h3>Gathering HTTP response information</h3>
<ol>
<li><p><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html" rel="noreferrer">HTTP response status</a>:</p>
<p>You need a <a href="http://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html" rel="noreferrer"><code>HttpURLConnection</code></a> here. Cast it first if necessary.</p>
<pre><code>int status = httpConnection.getResponseCode();
</code></pre></li>
<li><p><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html" rel="noreferrer">HTTP response headers</a>:</p>
<pre><code>for (Entry&lt;String, List&lt;String&gt;&gt; header : connection.getHeaderFields().entrySet()) {
    System.out.println(header.getKey() + "=" + header.getValue());
}
</code></pre></li>
<li><p><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17" rel="noreferrer">HTTP response encoding</a>:</p>
<p>When the <code>Content-Type</code> contains a <code>charset</code> parameter, the response body is likely text based and we'd like to process it with the character encoding specified by the server.</p>
<pre><code>String contentType = connection.getHeaderField("Content-Type");
String charset = null;

for (String param : contentType.replace(" ", "").split(";")) {
    if (param.startsWith("charset=")) {
        charset = param.split("=", 2)[1];
        break;
    }
}

if (charset != null) {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(response, charset))) {
        for (String line; (line = reader.readLine()) != null;) {
            // ... System.out.println(line) ?
        }
    }
} else {
    // It's likely binary content, use InputStream/OutputStream.
}
</code></pre></li>
</ol>
<hr>
<h3>Maintaining the session</h3>
<p>The server side session is usually backed by a cookie. Some web forms require that you're logged in and/or are tracked by a session.
You can use the <a href="http://docs.oracle.com/javase/8/docs/api/java/net/CookieHandler.html" rel="noreferrer"><code>CookieHandler</code></a> API to maintain cookies. You need to prepare a <a href="http://docs.oracle.com/javase/8/docs/api/java/net/CookieManager.html" rel="noreferrer"><code>CookieManager</code></a> with a <a href="http://docs.oracle.com/javase/8/docs/api/java/net/CookiePolicy.html" rel="noreferrer"><code>CookiePolicy</code></a> of <a href="http://docs.oracle.com/javase/8/docs/api/java/net/CookiePolicy.html#ACCEPT_ALL" rel="noreferrer"><code>ACCEPT_ALL</code></a> before sending any HTTP request.</p>
<pre><code>// First set the default cookie manager.
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

// All the following subsequent URLConnections will use the same cookie manager.
URLConnection connection = new URL(url).openConnection();
// ...

connection = new URL(url).openConnection();
// ...

connection = new URL(url).openConnection();
// ...
</code></pre>
<p>Note that this is known to not always work properly in all circumstances. If it fails for you, then it's best to gather and set the cookie headers manually. You basically need to grab all <code>Set-Cookie</code> headers from the response of the login or the first <code>GET</code> request and then pass them along on the subsequent requests.</p>
<pre><code>// Gather all cookies on the first request.
URLConnection connection = new URL(url).openConnection();
List&lt;String&gt; cookies = connection.getHeaderFields().get("Set-Cookie");
// ...

// Then use the same cookies on all subsequent requests.
connection = new URL(url).openConnection();
for (String cookie : cookies) {
    connection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
}
// ...
</code></pre>
<p>The <code>split(";", 2)[0]</code> is there to get rid of cookie attributes which are irrelevant for the server side like <code>expires</code>, <code>path</code>, etc.
Alternatively, you could also use <code>cookie.substring(0, cookie.indexOf(';'))</code> instead of <code>split()</code>, provided the header actually contains a <code>;</code> (otherwise <code>indexOf()</code> returns -1 and <code>substring()</code> throws).</p>
<hr>
<h3>Streaming mode</h3>
<p>The <a href="http://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html" rel="noreferrer"><code>HttpURLConnection</code></a> will by default buffer the <em>entire</em> request body before actually sending it, regardless of whether you've set a fixed content length yourself using <code>connection.setRequestProperty("Content-Length", contentLength);</code>. This may cause <code>OutOfMemoryError</code>s whenever you concurrently send large POST requests (e.g. uploading files). To avoid this, set the <a href="http://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode%28int%29" rel="noreferrer"><code>HttpURLConnection#setFixedLengthStreamingMode()</code></a>.</p>
<pre><code>httpConnection.setFixedLengthStreamingMode(contentLength);
</code></pre>
<p>But if the content length is really not known beforehand, then you can make use of chunked streaming mode by setting the <a href="http://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html#setChunkedStreamingMode%28int%29" rel="noreferrer"><code>HttpURLConnection#setChunkedStreamingMode()</code></a> accordingly. This will set the HTTP <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.41" rel="noreferrer"><code>Transfer-Encoding</code></a> header to <code>chunked</code>, which will force the request body to be sent in chunks. The below example will send the body in chunks of 1KB.</p>
<pre><code>httpConnection.setChunkedStreamingMode(1024);
</code></pre>
<hr>
<h3>User-Agent</h3>
<p>It can happen that <a href="https://stackoverflow.com/questions/13670692/403-forbidden-with-java-but-not-web-browser">a request returns an unexpected response, while it works fine with a real web browser</a>.
The server side is probably blocking requests based on the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.43" rel="noreferrer"><code>User-Agent</code></a> request header. The <code>URLConnection</code> will by default set it to <code>Java/1.6.0_19</code> where the last part is obviously the JRE version. You can override this as follows:</p>
<pre><code>connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"); // Pretend to be Chrome 41 on Windows 7.
</code></pre>
<p>Use the User-Agent string from a <a href="http://www.useragentstring.com/pages/useragentstring.php" rel="noreferrer">recent browser</a>.</p>
<hr>
<h3>Error handling</h3>
<p>If the HTTP response code is <code>4nn</code> (Client Error) or <code>5nn</code> (Server Error), then you may want to read the <code>HttpURLConnection#getErrorStream()</code> to see if the server has sent any useful error information.</p>
<pre><code>InputStream error = ((HttpURLConnection) connection).getErrorStream();
</code></pre>
<p>If the HTTP response code is -1, then something went wrong with the connection and response handling. The <code>HttpURLConnection</code> implementation in older JREs is somewhat buggy when it comes to keeping connections alive. You may want to turn keep-alive off by setting the <code>http.keepAlive</code> system property to <code>false</code>. You can do this programmatically at the beginning of your application by:</p>
<pre><code>System.setProperty("http.keepAlive", "false");
</code></pre>
<hr>
<h3>Uploading files</h3>
<p>You'd normally use <a href="http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2" rel="noreferrer"><code>multipart/form-data</code></a> encoding for mixed POST content (binary and character data).
The encoding is described in more detail in <a href="http://www.faqs.org/rfcs/rfc2388.html" rel="noreferrer">RFC2388</a>.</p>
<pre><code>String param = "value";
File textFile = new File("/path/to/file.txt");
File binaryFile = new File("/path/to/file.bin");
String boundary = Long.toHexString(System.currentTimeMillis()); // Just generate some unique random value.
String CRLF = "\r\n"; // Line separator required by multipart/form-data.
URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);

try (
    OutputStream output = connection.getOutputStream();
    PrintWriter writer = new PrintWriter(new OutputStreamWriter(output, charset), true);
) {
    // Send normal param.
    writer.append("--" + boundary).append(CRLF);
    writer.append("Content-Disposition: form-data; name=\"param\"").append(CRLF);
    writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF);
    writer.append(CRLF).append(param).append(CRLF).flush();

    // Send text file.
    writer.append("--" + boundary).append(CRLF);
    writer.append("Content-Disposition: form-data; name=\"textFile\"; filename=\"" + textFile.getName() + "\"").append(CRLF);
    writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF); // Text file itself must be saved in this charset!
    writer.append(CRLF).flush();
    Files.copy(textFile.toPath(), output);
    output.flush(); // Important before continuing with writer!
    writer.append(CRLF).flush(); // CRLF is important! It indicates end of boundary.

    // Send binary file.
    writer.append("--" + boundary).append(CRLF);
    writer.append("Content-Disposition: form-data; name=\"binaryFile\"; filename=\"" + binaryFile.getName() + "\"").append(CRLF);
    writer.append("Content-Type: " + URLConnection.guessContentTypeFromName(binaryFile.getName())).append(CRLF);
    writer.append("Content-Transfer-Encoding: binary").append(CRLF);
    writer.append(CRLF).flush();
    Files.copy(binaryFile.toPath(), output);
    output.flush(); // Important before continuing with writer!
    writer.append(CRLF).flush(); // CRLF is important! It indicates end of boundary.

    // End of multipart/form-data.
    writer.append("--" + boundary + "--").append(CRLF).flush();
}
</code></pre>
<p>If the other side is a <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServlet.html" rel="noreferrer"><code>HttpServlet</code></a>, then its <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServlet.html#doPost%28javax.servlet.http.HttpServletRequest,%20javax.servlet.http.HttpServletResponse%29" rel="noreferrer"><code>doPost()</code></a> method will be called and the parts will be available by <a href="http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpServletRequest.html#getPart%28java.lang.String%29" rel="noreferrer"><code>HttpServletRequest#getPart()</code></a> (note, thus <strong>not</strong> <code>getParameter()</code> and so on!). The <code>getPart()</code> method is however relatively new; it was introduced in Servlet 3.0 (Glassfish 3, Tomcat 7, etc). Prior to Servlet 3.0, your best choice is using <a href="http://commons.apache.org/fileupload" rel="noreferrer">Apache Commons FileUpload</a> to parse a <code>multipart/form-data</code> request.
Also see <a href="https://stackoverflow.com/questions/2422468/upload-big-file-to-servlet/2424824#2424824">this answer</a> for examples of both the FileUpload and the Servlet 3.0 approaches.</p>
<hr>
<h3>Dealing with untrusted or misconfigured HTTPS sites</h3>
<p>Sometimes you need to connect to an HTTPS URL, perhaps because you're writing a web scraper. In that case, you may face a <code>javax.net.ssl.SSLException: Not trusted server certificate</code> on some HTTPS sites that don't keep their SSL certificates up to date, or a <code>java.security.cert.CertificateException: No subject alternative DNS name matching [hostname] found</code> or <code>javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name</code> on some misconfigured HTTPS sites.</p>
<p>The following one-time-run <code>static</code> initializer in your web scraper class should make <code>HttpsURLConnection</code> more lenient towards those HTTPS sites and thus not throw those exceptions anymore.</p>
<pre><code>static {
    TrustManager[] trustAllCertificates = new TrustManager[] {
        new X509TrustManager() {
            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return null; // Not relevant.
            }
            @Override
            public void checkClientTrusted(X509Certificate[] certs, String authType) {
                // Do nothing. Just allow them all.
            }
            @Override
            public void checkServerTrusted(X509Certificate[] certs, String authType) {
                // Do nothing. Just allow them all.
            }
        }
    };

    HostnameVerifier trustAllHostnames = new HostnameVerifier() {
        @Override
        public boolean verify(String hostname, SSLSession session) {
            return true; // Just allow them all.
        }
    };

    try {
        System.setProperty("jsse.enableSNIExtension", "false");
        SSLContext sc = SSLContext.getInstance("SSL");
        sc.init(null, trustAllCertificates, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
        HttpsURLConnection.setDefaultHostnameVerifier(trustAllHostnames);
    }
    catch (GeneralSecurityException e) {
        throw new ExceptionInInitializerError(e);
    }
}
</code></pre>
<hr>
<h3>Last words</h3>
<p>The <a href="http://hc.apache.org/httpcomponents-client-ga/" rel="noreferrer">Apache HttpComponents HttpClient</a> is <em>much</em> more convenient in all of this :)</p>
<ul>
<li><a href="http://hc.apache.org/httpcomponents-client-ga/tutorial/html/" rel="noreferrer">HttpClient Tutorial</a></li>
<li><a href="http://hc.apache.org/httpcomponents-client-ga/examples.html" rel="noreferrer">HttpClient Examples</a></li>
</ul>
<hr>
<h3>Parsing and extracting HTML</h3>
<p>If all you want is to parse and extract data from HTML, then you're better off using an HTML parser like <a href="http://jsoup.org" rel="noreferrer">Jsoup</a>.</p>
<ul>
<li><a href="https://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers/3154281#3154281">What are the pros/cons of leading HTML parsers in Java</a></li>
<li><a href="https://stackoverflow.com/questions/2835505/how-to-scan-a-website-or-page-for-info-and-bring-it-into-my-program/2835555#2835555">How to scan and extract a webpage in Java</a></li>
</ul>
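<p>As a rough sketch of the string handling used in the sections above (building the query string, extracting the <code>charset</code> from a <code>Content-Type</code> header, and reducing <code>Set-Cookie</code> headers to <code>name=value</code> pairs), the helpers below can be tried without any network access. The class name <code>HttpHelpers</code> and its method names are hypothetical and not part of <code>java.net</code>. The null checks, the case-insensitive match and the quote stripping in <code>charsetOf()</code> are assumptions that go beyond the original snippets.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class for illustration only; not part of the JDK.
final class HttpHelpers {

    private HttpHelpers() {}

    /** Builds "name=value&name=value", URL-encoding names and values with the given charset. */
    static String buildQuery(Map<String, String> params, String charset) {
        StringBuilder query = new StringBuilder();
        try {
            for (Map.Entry<String, String> param : params.entrySet()) {
                if (query.length() > 0) {
                    query.append('&');
                }
                query.append(URLEncoder.encode(param.getKey(), charset))
                     .append('=')
                     .append(URLEncoder.encode(param.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            // Rethrown unchecked so callers don't need a throws clause (a design assumption).
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
        return query.toString();
    }

    /** Returns the charset parameter of a Content-Type header value, or null if there is none. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null; // No Content-Type header at all.
        }
        for (String param : contentType.replace(" ", "").split(";")) {
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Keep the value's original case, but drop optional surrounding quotes.
                String value = param.split("=", 2)[1].replace("\"", "");
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Reduces Set-Cookie header values to the "name=value" pairs to send back as Cookie headers. */
    static List<String> cookiePairs(List<String> setCookieHeaders) {
        List<String> pairs = new ArrayList<>();
        if (setCookieHeaders == null) {
            return pairs; // The response did not set any cookies.
        }
        for (String header : setCookieHeaders) {
            // split() also works when the header carries no attributes (no ';' present).
            pairs.add(header.split(";", 2)[0]);
        }
        return pairs;
    }
}
```

<p>For instance, <code>HttpHelpers.buildQuery(params, charset)</code> could stand in for the <code>String.format()</code> call when the number of parameters varies, and each entry of <code>HttpHelpers.cookiePairs(connection.getHeaderFields().get("Set-Cookie"))</code> could be passed to <code>addRequestProperty("Cookie", ...)</code> on subsequent requests.</p>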
 
