Extracting HTML between tags using Perl
<p>I want to extract all the HTML between the <code>&lt;body&gt;</code> tags of a string or file. I've been looking at doing this in Perl with the HTML::Parser module. I thought this would be a simple task, but it's turning out to be quite tricky. I found some code that works, but I don't know how to save the results to a string. Any help is appreciated, or if you can show me some code on how this can be achieved using HTML::TokeParser or similar.</p>
<p>Thanks</p>
<pre><code>use HTML::Parser;

my $content=&lt;&lt;EOF;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
&lt;head&gt;
&lt;title&gt;Some title goes here&lt;/title&gt;
&lt;/head&gt;
&lt;body bgcolor="#FFFFFF"&gt;
&lt;div id="leftcol"&gt; menu column &lt;/div&gt;
&lt;div id="body"&gt;
&lt;p&gt;some text goes here some text goes here&lt;br /&gt; some text goes here some text goes here&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;some header&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;some text goes here some text goes here&lt;br /&gt; some text goes here some text goes here&lt;/p&gt;
&lt;p&gt;&lt;img src="img.gif" /&gt; image here&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;some header&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;some text goes here some text goes here&lt;br /&gt; some text goes here some text goes here&lt;/p&gt;
&lt;/div&gt;
&lt;div id="rightcol"&gt; news column &lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;
EOF

my $p = HTML::Parser-&gt;new( api_version =&gt; 3 );
$p-&gt;handler( start =&gt; \&amp;start_handler, "self,tagname,attr" );
$p-&gt;parse($content);
exit;

sub start_handler {
    my $self    = shift;
    my $tagname = shift;
    my $attr    = shift;
    my $text    = shift;
    # Ignore everything until the opening body tag is seen
    return unless ( $tagname eq 'body' );
    # From here on, print every start tag, text chunk, and end tag
    $self-&gt;handler( start =&gt; sub { print shift }, "text" );
    $self-&gt;handler( text  =&gt; sub { print shift }, "text" );
    $self-&gt;handler( end   =&gt; sub {
        my ($endtagname, $self, $text) = @_;
        if ($endtagname eq $tagname) {
            $self-&gt;eof;    # closing body tag reached: stop parsing
        } else {
            print $text;
        }
    }, "tagname,self,text");
}
</code></pre>
<hr>
<p>If I modify the start, text, and end handlers in the subroutine above as shown below:</p>
<h2>Why doesn't the text from those variables get saved in mine?</h2>
<pre><code>    $self-&gt;handler( start =&gt; sub {
        my ($text) = @_;
        $inner_body = $inner_body . $text;
    }, "text" );
    $self-&gt;handler( text =&gt; sub {
        my ($text) = @_;
        $inner_body = $inner_body . $text;
    }, "text" );
    $self-&gt;handler( end =&gt; sub {
        my ($endtagname, $self, $text) = @_;
        if ($endtagname eq $tagname) {
            $self-&gt;eof;
        } else {
            $inner_body = $inner_body . $text;
        }
    }, "tagname,self,text");
}

print $inner_body; # &lt;-- prints blank ???
</code></pre>
<p>Desired output to be saved in a variable:</p>
<hr>
<pre><code>&lt;div id="leftcol"&gt; menu column &lt;/div&gt;
&lt;div id="body"&gt;
&lt;p&gt;some text goes here some text goes here&lt;br /&gt; some text goes here some text goes here&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;some header&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;some text goes here some text goes here&lt;br /&gt; some text goes here some text goes here&lt;/p&gt;
&lt;p&gt;&lt;img src="img.gif" /&gt; image here&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;some header&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;some text goes here some text goes here&lt;br /&gt; some text goes here some text goes here&lt;/p&gt;
&lt;/div&gt;
&lt;div id="rightcol"&gt; news column &lt;/div&gt;
</code></pre>
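For illustration only, here is a minimal sketch of collecting the body content into a string rather than printing it. It rests on two assumptions that the question does not confirm: that the blank output comes from `print $inner_body;` sitting after `exit;` (so it never runs), and that `$inner_body` was not declared before the handlers were installed. The shortened sample document is hypothetical; the `HTML::Parser` calls are the same ones used in the question.

```perl
use strict;
use warnings;
use HTML::Parser;

# Shortened, hypothetical sample document
my $content = <<'EOF';
<html>
<head><title>Some title goes here</title></head>
<body bgcolor="#FFFFFF">
<div id="leftcol"> menu column </div>
<p>some text goes here<br /> some text goes here</p>
</body>
</html>
EOF

# Declared before the handlers so the closures below capture this variable
my $inner_body = '';

my $p = HTML::Parser->new( api_version => 3 );
$p->handler( start => \&start_handler, "self,tagname" );
$p->parse($content);
$p->eof;

# Use the result before any exit; it now holds the markup inside the body tags
print $inner_body;

sub start_handler {
    my ( $self, $tagname ) = @_;
    return unless $tagname eq 'body';

    # After the opening body tag, append every event's raw text to the string
    $self->handler( start => sub { $inner_body .= shift }, "text" );
    $self->handler( text  => sub { $inner_body .= shift }, "text" );
    $self->handler( end   => sub {
        my ( $endtagname, $self, $text ) = @_;
        if ( $endtagname eq $tagname ) {
            $self->eof;             # closing body tag: stop parsing
        }
        else {
            $inner_body .= $text;
        }
    }, "tagname,self,text" );
}
```

The only differences from the printing version are the lexical `$inner_body` declared up front and the placement of the final `print` before the script ends.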
 
