Is it possible to retrieve the *full* HTML page source of an iframe with JavaScript?
    <p>I am trying to figure out how to retrieve the <strong>full</strong> (that means <strong>all data</strong>) HTML page source from an <code>&lt;iframe&gt;</code> whose <code>src</code> is on the same domain as the page that embeds it. I want the exact source code at any given time, which could be dynamic because JavaScript or PHP generates the <code>&lt;iframe&gt;</code>'s HTML output. This means Ajax calls like <a href="http://api.jquery.com/jQuery.get/" rel="nofollow noreferrer"><code>$.get()</code></a> will not work for me, as the page could have been modified via JavaScript or generated uniquely based on the request time or <a href="http://php.net/manual/en/function.mt-rand.php" rel="nofollow noreferrer"><code>mt_rand()</code></a> in PHP. I have not been able to retrieve the exact <code>&lt;!DOCTYPE&gt;</code> declaration from my <code>&lt;iframe&gt;</code>.</p>
    <p>I have been experimenting and searching through Stack Overflow and have not found a solution that retrieves <strong>all</strong> of the page source, including the <code>&lt;!DOCTYPE&gt;</code> declaration.</p>
    <p>One of the answers in <a href="https://stackoverflow.com/questions/982717/how-do-i-get-the-entire-pages-html-with-jquery#12523515">How do I get the entire page&#39;s HTML with jQuery?</a> suggests that in order to retrieve the <code>&lt;!DOCTYPE&gt;</code> information, you need to construct this declaration manually, by reading the <code>document.doctype</code> property of the <code>&lt;iframe&gt;</code>'s document and then adding all of its attributes to the <code>&lt;!DOCTYPE&gt;</code> declaration yourself. Is this really the only way to retrieve this information from the <code>&lt;iframe&gt;</code>'s HTML page source?</p>
    <p>Here are some notable Stack Overflow posts that I have looked through and that this is not a duplicate of:</p>
    <ul>
    <li><a href="https://stackoverflow.com/questions/13913824/javascript-get-current-page-current-source">Javascript: Get current page CURRENT source</a></li>
    <li><a href="https://stackoverflow.com/questions/2419749/get-selected-elements-outer-html">Get selected element&#39;s outer HTML</a></li>
    <li><a href="https://stackoverflow.com/questions/4612143/how-to-get-page-source-using-jquery">https://stackoverflow.com/questions/4612143/how-to-get-page-source-using-jquery</a></li>
    <li><a href="https://stackoverflow.com/questions/982717/how-do-i-get-the-entire-pages-html-with-jquery">How do I get the entire page&#39;s HTML with jQuery?</a></li>
    <li><a href="https://stackoverflow.com/questions/9366382/jquery-get-all-html-source-of-a-page-but-excluding-some-ids">Jquery: get all html source of a page but excluding some #ids</a></li>
    <li><a href="https://stackoverflow.com/questions/5587844/jquery-get-html-including-the-selector">jQuery: Get HTML including the selector?</a></li>
    </ul>
    <p>Here is some of my local test code that illustrates my best attempt so far, which only retrieves the data within and including the <code>&lt;iframe&gt;</code>'s <code>&lt;html&gt;</code> tag:</p>
    <p><strong>main.html</strong></p>
<pre><code>&lt;html&gt;
&lt;head&gt;
    &lt;title&gt;Testing with iframe&lt;/title&gt;
    &lt;script src="http://code.jquery.com/jquery-1.9.1.min.js"&gt;&lt;/script&gt;
    &lt;script type="text/javascript"&gt;
        function test() {
            // The iframe is same-domain, so its document is reachable from the parent page.
            var doc = document.getElementById('iframe-source').contentWindow.document;
            // Serializes the html element and its descendants only; the doctype is not included.
            var html = $('html', doc).clone().wrap('&lt;p&gt;').parent().html();
            $('#output').val(html);
        }
    &lt;/script&gt;
&lt;/head&gt;
&lt;body&gt;
    &lt;textarea id="output"&gt;&lt;/textarea&gt;
    &lt;iframe id="iframe-source" src="iframe.html" onload="javascript:test()"&gt;&lt;/iframe&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
    <p><br /> <strong>iframe.html</strong></p>
<pre><code>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"&gt;
&lt;html class="html-tag-class"&gt;
&lt;head class="head-tag-class"&gt;
    &lt;title&gt;iframe Testing&lt;/title&gt;
&lt;/head&gt;
&lt;body class="body-tag-class"&gt;
    &lt;h2&gt;Testing header tag&lt;/h2&gt;
    &lt;p&gt;This is &lt;strong&gt;very&lt;/strong&gt; exciting&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
    <p><br /> And here is a <strong>screenshot</strong> of these files run together in Google Chrome version 27.0.1453.110 m: <img src="https://i.stack.imgur.com/y1LEk.png" alt="iframe testing"></p>
    <h3>Summary</h3>
    <p>As you can see, Google Chrome's <code>Inspect element</code> shows that the <code>&lt;!DOCTYPE&gt;</code> declaration is present within the <code>&lt;iframe&gt;</code>, so how can I retrieve this data along with the page source? This question also applies to any other declarations or tags that are not contained within the <code>&lt;html&gt;</code> tags.</p>
    <p><br /> Any help or advice on retrieving this full page source code via JavaScript would be greatly appreciated.</p>
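For reference, the manual construction described in the linked answer could look roughly like the sketch below. It reads only the standard `name`, `publicId` and `systemId` properties of `document.doctype`. The function names `serializeDoctype` and `serializeDocument` are hypothetical, chosen for illustration, and the result is a rebuilt declaration, so it may not match the served source byte for byte (casing and whitespace are not preserved).

```javascript
// Rebuild a <!DOCTYPE ...> string from a DocumentType-like object.
// Only the standard DOM properties name, publicId and systemId are read.
function serializeDoctype(doctype) {
  if (!doctype) {
    return ''; // the document was served without a doctype
  }
  let result = '<!DOCTYPE ' + doctype.name;
  if (doctype.publicId) {
    result += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    result += ' SYSTEM';
  }
  if (doctype.systemId) {
    result += ' "' + doctype.systemId + '"';
  }
  return result + '>';
}

// Hypothetical helper: rebuilt doctype followed by the serialized <html> element.
// Anything else outside <html> (for example comments) is still not covered.
function serializeDocument(doc) {
  const doctype = serializeDoctype(doc.doctype);
  return (doctype ? doctype + '\n' : '') + doc.documentElement.outerHTML;
}
```

In the test page above, this would presumably be called as `serializeDocument(document.getElementById('iframe-source').contentWindow.document)` from the `onload` handler.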
    1. "I want the exact source code at any given time" - it seems like you have some misconceptions. "HTML source" is unchangeable: it is the HTML string served from the server (e.g. by PHP). What is dynamic is the DOM (the parsed HTML), which JS acts upon. `innerHTML`/`outerHTML` is nothing more than a serialization of the DOM. So, to summarize, you can either send an Ajax request to the page and obtain the HTML source (the actual source before any JS executes) or, to get a serialization of the DOM, use the answer you linked.
    2. @FabrícioMatté - Thanks for your response. The serialization of the `DOM` may not match the page source exactly, but I suppose manually constructing the `doctype` would be required in that case.
    3. How likely is the source to change between requests? If you want the exact doctype string, you could use Ajax to get the source, extract the doctype string, and then keep using the DOM for everything that changes. Depending on how the web server serves the HTML and how it is requested, this might end up as a single request that is always answered from cache afterwards (probably not optimal in your situation, though), or as a `200 OK` followed by a `304 Not Modified` (or something similar; I'm pretty sure I have the HTTP codes right at least).
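The Ajax approach from the last comment could be sketched as below: request the iframe's URL again and pull the declaration out of the raw response text. The function names `extractDoctype` and `fetchDoctype` are hypothetical. The sketch assumes a same-domain URL, an environment that provides `fetch()`, and a declaration with no internal subset and no `>` inside its quoted identifiers. A comment that mentions `<!DOCTYPE` before the real declaration would also confuse the pattern.

```javascript
// Return the first <!DOCTYPE ...> declaration found in a raw HTML string, or null.
// Assumption: no internal subset ([...]) and no '>' inside the quoted identifiers.
function extractDoctype(source) {
  const match = /<!DOCTYPE[^>]*>/i.exec(source);
  return match ? match[0] : null;
}

// Hypothetical helper: re-request the page and read the doctype from the served text.
// The response may differ from what the iframe originally received if the page
// is generated per request; only the doctype is taken from it here.
async function fetchDoctype(url) {
  const response = await fetch(url);
  return extractDoctype(await response.text());
}
```

Unlike a string rebuilt from `document.doctype`, this keeps the declaration exactly as the server sent it, at the cost of one extra request (or a cache hit).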