Preserve formatting when updating XML file with Groovy
<p>I have a large number of XML files that contain URLs. I'm writing a Groovy utility to find each URL and replace it with an updated version.</p>

<p>Given example.xml:</p>

<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;page&gt;
    &lt;content&gt;
        &lt;section&gt;
            &lt;link&gt;
                &lt;url&gt;/some/old/url&lt;/url&gt;
            &lt;/link&gt;
            &lt;link&gt;
                &lt;url&gt;/some/old/url&lt;/url&gt;
            &lt;/link&gt;
        &lt;/section&gt;
        &lt;section&gt;
            &lt;link&gt;
                &lt;url&gt;
                    /a/different/old/url?with=specialChars&amp;amp;escaped=true
                &lt;/url&gt;
            &lt;/link&gt;
        &lt;/section&gt;
    &lt;/content&gt;
&lt;/page&gt;
</code></pre>

<p>Once the script has run, example.xml should contain:</p>

<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;page&gt;
    &lt;content&gt;
        &lt;section&gt;
            &lt;link&gt;
                &lt;url&gt;/a/new/and/improved/url&lt;/url&gt;
            &lt;/link&gt;
            &lt;link&gt;
                &lt;url&gt;/a/new/and/improved/url&lt;/url&gt;
            &lt;/link&gt;
        &lt;/section&gt;
        &lt;section&gt;
            &lt;link&gt;
                &lt;url&gt;
                    /a/different/new/and/improved/url?with=specialChars&amp;amp;stillEscaped=true
                &lt;/url&gt;
            &lt;/link&gt;
        &lt;/section&gt;
    &lt;/content&gt;
&lt;/page&gt;
</code></pre>

<p>This is easy to do using Groovy's excellent XML support, except that I want to <strong>change the URLs and nothing else</strong> about the file.</p>

<p>By that I mean:</p>

<ul>
<li>whitespace must not change (files might contain spaces, tabs, or both)</li>
<li>comments must be preserved</li>
<li>Windows vs. Unix-style line separators must be preserved</li>
<li>the XML declaration at the top must not be added or removed</li>
<li>attributes in tags must retain their order</li>
</ul>

<p>So far, after trying many combinations of XmlParser, DOMBuilder, XmlNodePrinter, XmlUtil.serialize(), and so on, I've landed on reading each file line by line and applying an ugly hybrid of the XML utilities and regular expressions.</p>

<p>Reading and writing each file:</p>

<pre><code>files.each { File file -&gt;
    def lineEnding = file.text.contains('\r\n') ? '\r\n' : '\n'
    def newLineAtEof = file.text.endsWith(lineEnding)
    def lines = file.readLines()
    file.withWriter { w -&gt;
        lines.eachWithIndex { line, index -&gt;
            line = update(line)
            w.write(line)
            if (index &lt; lines.size() - 1) w.write(lineEnding)
            else if (newLineAtEof) w.write(lineEnding)
        }
    }
}
</code></pre>

<p>Searching for and updating URLs within a line:</p>

<pre><code>def matcher = (line =~ urlTagRegexp) // matches a &lt;url&gt; element and its contents
matcher.each { groups -&gt;
    def urlNode = new XmlParser().parseText(line)
    def url = urlNode.text()
    def newUrl = translate(url)
    if (newUrl) {
        urlNode.value = newUrl
        def replacement = nodeToString(urlNode)
        line = matcher.replaceAll(replacement)
    }
}

def nodeToString(node) {
    def writer = new StringWriter()
    writer.withPrintWriter { printWriter -&gt;
        def printer = new XmlNodePrinter(printWriter)
        printer.preserveWhitespace = true
        printer.print(node)
    }
    writer.toString().replaceAll(/[\r\n]/, '')
}
</code></pre>

<p>This mostly works, except that it can't handle a tag split over multiple lines, and juggling newlines when writing the files back out is cumbersome.</p>

<p>I'm new to Groovy, but I feel like there must be a groovier way of doing this.</p>
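<p>One possible direction, offered only as a sketch: treat each file as a single string and rewrite just the text between <code>&lt;url&gt;</code> and <code>&lt;/url&gt;</code> with a regular expression that is allowed to span line breaks. Everything outside the match is copied through untouched, so whitespace, comments, line endings, the XML declaration, and attribute order are not affected. The names <code>updateUrls</code>, <code>updateFile</code>, <code>escape</code>, and <code>unescape</code> are illustrative, the <code>translate</code> mapping is a hypothetical stand-in for the real lookup, and the sketch assumes UTF-8 files and <code>&lt;url&gt;</code> tags without attributes.</p>

```groovy
// Hypothetical stand-in for the question's translate():
// returns the new URL, or null when the URL should be left alone.
def translate = { String url ->
    def mapping = [
        '/some/old/url': '/a/new/and/improved/url',
        '/a/different/old/url?with=specialChars&escaped=true':
            '/a/different/new/and/improved/url?with=specialChars&stillEscaped=true',
    ]
    mapping[url]
}

// Minimal handling of XML character escapes in element text (only & < > are covered).
def unescape = { String s -> s.replace('&lt;', '<').replace('&gt;', '>').replace('&amp;', '&') }
def escape   = { String s -> s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;') }

// (?s) lets the lazy body match across line breaks.
// Groups: opening tag, leading whitespace, URL text, trailing whitespace, closing tag.
def urlElement = ~/(?s)(<url>)(\s*)(.*?)(\s*)(<\/url>)/

// Replace only the URL text; the surrounding whitespace and tags are re-emitted exactly as matched.
def updateUrls = { String xml ->
    xml.replaceAll(urlElement) { all, open, lead, url, trail, close ->
        def newUrl = translate(unescape(url))
        newUrl ? open + lead + escape(newUrl) + trail + close : all
    }
}

// Rewrite one file in place, touching it only when something actually changed.
def updateFile = { File file ->
    def original = file.getText('UTF-8')
    def updated = updateUrls(original)
    if (updated != original) file.setText(updated, 'UTF-8')
}
```

<p>With this in place, <code>files.each { updateFile(it) }</code> would replace the line-by-line loop. A known limitation of the sketch: because it does not parse the XML, it would also rewrite a <code>&lt;url&gt;</code> element that appears inside a comment or a CDATA section.</p>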