<p>No. <code>writeHead</code> writes HTTP headers to the underlying TCP stream. It has absolutely nothing to do with HTML.</p>
<p>You're running into an issue because your server returns the wholesale HTML content of the requested URL. You then pass this string into jQuery, which apparently injects the CSS styles it contains into <em>your</em> document.</p>
<p>Generally, it is a terrible idea to take random code from a user-supplied URL and run it in the context of your page. It opens you to gaping security holes &ndash; the CSS artifacts you're seeing are one example.</p>
<p>To be blunt, your code has numerous problems, so bear with me as I point out some issues.</p>
<pre><code>app.get('/htmlTest', function (req, res) {
    res.writeHead(200, { 'content-type': 'text/html' });
</code></pre>
<p>Here, you respond to the browser with a success status (<code>200</code>) <em>before</em> your server actually does anything. This is incorrect: you should only respond with a success or error code once you know whether the request succeeded or failed.</p>
<pre><code>    request(req.query.html, function (error, response, body) {
        if (error) {
            res.write(error.toString());
            res.end('\n');
        }
</code></pre>
<p>Here would be a good place to respond with an error code, since we know that the request did actually fail. <code>res.status(500).send(error.toString())</code> would do the trick.</p>
<pre><code>        else if (response.statusCode == 200) {
            res.write(body);
            res.end('\n');
        }
</code></pre>
<p>And here's where we could respond with a success code. Rather than use <code>writeHead</code>, use Express's <code>set</code> and <code>send</code> methods &ndash; things like <code>Content-Length</code> will be correctly set:</p>
<pre><code>res.set('Content-Type', 'text/html');
res.send(body);
</code></pre>
<p>Now what happens if <code>response.statusCode != 200</code>? You don't handle that case. <code>error</code> is only set in the case of network errors (such as inability to connect to the target server).
The target server can still respond with a non-200 status, and in that case your Node server would never respond to the browser. In fact, the connection would hang open until the user kills it. This could be fixed with a simple <code>else res.end()</code>.</p>
<hr>
<p>Even with these issues resolved, we still haven't addressed the fact that it's not a good idea to try to parse arbitrary HTML in the browser.</p>
<p>If I were you, I'd use something that parses HTML into a DOM on the server, and then I'd return only the necessary information back to the browser as JSON. <a href="https://github.com/MatthewMueller/cheerio" rel="nofollow">cheerio</a> is the module you probably want to use &ndash; it looks just like jQuery, only it runs on the server.</p>
<p>I'd do this:</p>
<pre><code>var cheerio = require('cheerio'),
    url = require('url'),
    request = require('request');

app.get('/htmlTest', function(req, res) {
    request(req.query.url, function(err, response, body) {
        if (err) res.status(500).send({ error: err.message }); // network error, send a 500
        else if (response.statusCode != 200) res.status(500).send({ httpStatus: response.statusCode }); // server returned a non-200, send a 500
        else {
            // WARNING! We should probably check that the response content-type is html
            var $ = cheerio.load(body); // load the returned HTML into cheerio
            var images = [];
            $('img').each(function() {
                // Image srcs can be relative.
                // You probably need the absolute URL of the image, so we should resolve the src.
                var src = $(this).attr('src'); // cheerio elements have no .src property
                if (src) images.push(url.resolve(req.query.url, src));
            });
            res.send({ title: $('title').text(), images: images }); // send back JSON with the image URLs
        }
    });
});
</code></pre>
<p>Then from the browser:</p>
<pre><code>$.ajax({
    url: '/htmlTest',
    data: { url: url },
    dataType: 'json',
    success: function(data) {
        // data.images has your image URLs
    },
    error: function() {
        // something went wrong
    }
});
</code></pre>
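<p>The server-side route leaves two details as comments: checking that the response really is HTML, and resolving relative image srcs. Here is a minimal sketch of both, assuming a Node version with the global <code>URL</code> class. <code>isHtmlContentType</code> and <code>resolveImageSrcs</code> are hypothetical helper names of my own, not part of Express, request, or cheerio.</p>

```javascript
// Hypothetical helpers (illustrative names, not from any library).

// Returns true when a Content-Type header value describes an HTML document.
// The header may carry parameters ("text/html; charset=utf-8") or be missing.
function isHtmlContentType(headerValue) {
    if (typeof headerValue !== 'string') return false;
    var mediaType = headerValue.split(';')[0].trim().toLowerCase();
    return mediaType === 'text/html' || mediaType === 'application/xhtml+xml';
}

// Resolves each image src against the URL of the page it came from.
// Missing srcs and values that cannot be parsed as URLs are skipped.
function resolveImageSrcs(pageUrl, srcs) {
    var resolved = [];
    srcs.forEach(function (src) {
        if (!src) return; // <img> without a src attribute
        try {
            resolved.push(new URL(src, pageUrl).href);
        } catch (e) {
            // new URL throws a TypeError on unparseable input; drop that entry
        }
    });
    return resolved;
}
```

<p>In the route, the first helper would likely be called with <code>response.headers['content-type']</code> before loading the body into cheerio, and the second with <code>req.query.url</code> plus the <code>src</code> values collected from the <code>img</code> elements.</p>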
 
