How can I modify a complex XML document in Perl to add additional markup to text nodes?
<p>I have an XML document like this:</p>
<pre><code>&lt;article&gt;
  &lt;author&gt;Smith&lt;/author&gt;
  &lt;date&gt;2011-10-10&lt;/date&gt;
  &lt;description&gt;Article about &lt;b&gt;frobnitz&lt;/b&gt;, crulps and furtikurty's. Mainly frobnitz&lt;/description&gt;
&lt;/article&gt;
</code></pre>
<p>I need to parse this in Perl and then add new tags around some words or phrases (e.g. to link to definitions). I want to tag only the first instance of each target word, and to restrict the search to the contents of a given tag (e.g. the description tag only).</p>
<p>I can parse with <a href="http://search.cpan.org/~mirod/XML-Twig-3.38/Twig.pm" rel="nofollow">XML::Twig</a> and set a "twig_handlers" entry for the description tag. But when I call <em>$node-&gt;text</em> I get the text with the intervening tags removed. What I really want is to traverse down the (very small) tree so that existing tags are preserved and not broken. The final XML output should therefore look like this:</p>
<pre><code>&lt;article&gt;
  &lt;author&gt;Smith&lt;/author&gt;
  &lt;date&gt;2011-10-10&lt;/date&gt;
  &lt;description&gt;Article about &lt;b&gt;&lt;a href="dictionary.html#frobnitz"&gt;frobnitz&lt;/a&gt;&lt;/b&gt;, &lt;a href="dictionary.html#crulps"&gt;crulps&lt;/a&gt; and &lt;a href="dictionary.html#furtikurty"&gt;furtikurty&lt;/a&gt;'s. Mainly frobnitz&lt;/description&gt;
&lt;/article&gt;
</code></pre>
<p>I also have <a href="http://search.cpan.org/~pajas/XML-LibXML-1.70/LibXML.pod" rel="nofollow">XML::LibXML</a> available on the target environment, but I'm not sure how to start there...</p>
<p>Here's my minimal test case so far. Appreciate any help!</p>
<pre><code>#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Words to link, mapped to their definition URLs
my %dictionary = (
    frobnitz   =&gt; 'dictionary.html#frobnitz',
    crulps     =&gt; 'dictionary.html#crulps',
    furtikurty =&gt; 'dictionary.html#furtikurty',
);

# Wrap the first occurrence of each dictionary word in a link (plain text only)
sub markup_plain_text {
    my ( $text ) = @_;
    foreach my $k ( keys %dictionary ) {
        $text =~ s/(^|\W)($k)(\W|$)/$1&lt;a href="$dictionary{$k}"&gt;$2&lt;\/a&gt;$3/si;
    }
    return $text;
}

# Twig handler for &lt;description&gt;: this is where the existing child tags get lost
sub convert {
    my( $t, $node ) = @_;
    warn "convert: TEXT=[" . $node-&gt;text . "]\n";
    $node-&gt;set_text( markup_plain_text($node-&gt;text) );
    return 1;
}

sub markup {
    my ( $text ) = @_;
    my $t = XML::Twig-&gt;new(
        twig_handlers =&gt; { description =&gt; \&amp;convert },
        pretty_print  =&gt; 'indented',
    );
    $t-&gt;parse( $text );
    return $t-&gt;flush;
}

my $orig = &lt;&lt;END_XML;
&lt;article&gt;
  &lt;author&gt;Smith&lt;/author&gt;
  &lt;date&gt;2011-10-10&lt;/date&gt;
  &lt;description&gt;Article about &lt;b&gt;frobnitz&lt;/b&gt;, crulps and furtikurty's. Mainly frobnitz's&lt;/description&gt;
&lt;/article&gt;
END_XML
;

markup($orig);
</code></pre>
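One way to traverse the description subtree while keeping existing tags intact is to work on its individual text nodes rather than on the flattened `$node->text`. The following is only a sketch, not a definitive solution: it assumes XML::Twig's element methods `descendants`, `split_at`, `wrap_in`, `parent` and `set_att` behave as described in the module's documentation, and the names `link_first_occurrences` and `link_terms` are hypothetical helpers introduced for illustration. It uses the sample document and dictionary from the question, with the key spelled `furtikurty` to match the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my %dictionary = (
    frobnitz   => 'dictionary.html#frobnitz',
    crulps     => 'dictionary.html#crulps',
    furtikurty => 'dictionary.html#furtikurty',
);

# Hypothetical twig handler: link the first whole-word occurrence of each
# dictionary term found in the text nodes below $node.
sub link_first_occurrences {
    my ( $twig, $node ) = @_;
    foreach my $term ( sort keys %dictionary ) {
        # Re-read the text nodes for every term, since splitting adds new ones.
        foreach my $pcdata ( $node->descendants('#PCDATA') ) {
            next if $pcdata->parent->tag eq 'a';      # already inside a link
            next unless $pcdata->text =~ /\b\Q$term\E\b/i;
            my ( $start, $length ) = ( $-[0], $+[0] - $-[0] );

            # Isolate the matched word in a text node of its own.
            my $word = $start ? $pcdata->split_at($start) : $pcdata;
            $word->split_at($length) if length( $word->text ) > $length;

            # Wrap that node in <a> and set href on the new parent.
            $word->wrap_in('a');
            $word->parent->set_att( href => $dictionary{$term} );
            last;                                     # first occurrence only
        }
    }
    return 1;
}

# Hypothetical wrapper: parse a string and return the modified XML as a string.
sub link_terms {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => { description => \&link_first_occurrences },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

my $orig = <<'END_XML';
<article>
  <author>Smith</author>
  <date>2011-10-10</date>
  <description>Article about <b>frobnitz</b>, crulps and furtikurty's. Mainly frobnitz</description>
</article>
END_XML

print link_terms($orig), "\n";
```

Because the word is split out of its text node and wrapped where it sits, the surrounding `<b>` element is left untouched, and the second "frobnitz" is never reached because the inner loop stops after the first match for each term.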
 
