Groovy Regex illegal Characters
    <p>I have a Groovy script that converts some very poorly formatted data into XML. This part works fine, but it's also happily passing some characters along that aren't legal in XML. So I'm adding some code to strip these out, and this is where the problem is coming from.</p> <p>The code that isn't compiling is this:</p> <p><code>def illegalChars = ~/[\u0000-\u0008]|[\u000B-\u000C]|[\u000E-\u001F]|[\u007F-\u009F]/</code></p> <p>What I'm wondering is, why? What am I doing wrong here? I tested this regex in <a href="http://regexpal.com/" rel="nofollow noreferrer">http://regexpal.com/</a> and it works as expected, but I'm getting an error compiling it in Groovy:</p> <blockquote> <p>[ERROR] BUILD ERROR [INFO] ------------------------------------------------------------------------ [INFO] line 23:26: unexpected char: 0x0</p> </blockquote> <p>The line above is <code>line 23</code>. The surrounding lines are just variable declarations that I haven't changed while working on the regex.</p> <p>Thanks!</p> <p>Update: The code compiles, but it's not filtering as I'd expected it to. 
In regexpal I put the regex:</p> <blockquote> <p>[\u0000-\u0008\u000B-\u000C\u000E-\u001F\u007F-\u009F]</p> </blockquote> <p>and the test data:</p> <pre><code>name='lang'&gt;E&lt;/field&gt;&lt;field name='title'&gt;CHEMICAL IMMUNOLOGY AND ALLERGY&lt;/field&gt;&lt;/doc&gt; &lt;doc&gt;&lt;field name='page'&gt;72-88&lt;/field&gt;&lt;field name='shm'&gt;3146.757500&lt;/field&gt;&lt;field name='pubc'&gt;47&lt;/field&gt;&lt;field name='cs'&gt;1&lt;/field&gt;&lt;field name='issue'&gt;NUMBER&lt;/field&gt; &lt;field name='auth'&gt;Dvorak, A.&lt;/field&gt;&lt;field name='pub'&gt;KARGER&lt;/field&gt;&lt;field name='rr'&gt;GBP013.51&lt;/field&gt;&lt;field name='issn'&gt;1660-2242&lt;/field&gt;&lt;field name='class1'&gt;TS&lt;/field&gt;&lt;field name='freq'&gt;S&lt;/field&gt;&lt;field name='class2'&gt;616.079&lt;/field&gt;&lt;field name='text'&gt;Subcellular Localization of the Cytokines, Basic Fibroblast Growth Factor and Tumor Necrosis Factor- in Mast Cells&lt;/field&gt;&lt;field name='id'&gt;RN170369808&lt;/field&gt;&lt;field name='volume'&gt;VOL 85&lt;/field&gt; &lt;field name='year'&gt;2005&lt;/field&gt;&lt;field name='lang'&gt;E&lt;/field&gt;&lt;field name='title'&gt;CHEMICAL IMMUNOLOGY AND ALLERGY&lt;/field&gt;&lt;/doc&gt;&lt;doc&gt;&lt;field name='page'&gt;89-97&lt;/field&gt;&lt;field name='shm'&gt;3146.757500&lt;/field&gt;&lt;field name='pubc'&gt;47&lt;/field&gt;&lt;field name='cs'&gt;1&lt;/field&gt;&lt;field </code></pre> <p>It's a grab from a file with one of the illegal characters, so it's a little random. 
Regexpal highlights only the illegal character, but Groovy replaces even the '&lt;' and '&gt;' characters with empty strings, so it basically annihilates the entire document.</p>
<p>The code snippet:</p>
<pre><code>def List parseFile(File file){
    println "reading File name: ${file.name}"
    def lineCount = 0
    List data = new ArrayList()
    file.eachLine { String input -&gt;
        lineCount ++
        String line = input
        if(input =~ illegalChars){
            line = input.replaceAll(illegalChars, " ")
        }
        Map document = new HashMap()
        elementNames.each(){ token -&gt;
            def val = getValue(line, token)
            if(val != null){
                if(token.equals("ISSUE")){
                    List entries = val.split(";")
                    document.putAt("year",entries.getAt(0).trim())
                    if(entries.size() &gt; 1){
                        document.putAt("volume", entries.getAt(1).trim())
                    }
                    if(entries.size() &gt; 2){
                        document.putAt("issue", entries.getAt(2).trim())
                    }
                } else {
                    document.putAt(token, val)
                }
            }
        }
        data.add(document)
    }
    println "done"
    return data
}
</code></pre>
<p>I don't see any reason that the two should behave differently; am I missing something?</p>
<p>Again, thanks!</p>
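One possible reading of the compile error, offered as an assumption rather than a confirmed diagnosis: Groovy, like Java, translates unicode escapes while it reads the source file, so the escape for U+0000 would reach the lexer as a raw NUL character, which matches the message `unexpected char: 0x0`. The sketch below is in Java rather than Groovy (both use `java.util.regex`). It doubles the backslashes so that the regex engine, not the compiler, resolves the escapes. The class name `IllegalXmlChars` and the method `strip` are hypothetical, and the sketch does not explain the over-matching described in the update.

```java
import java.util.regex.Pattern;

public class IllegalXmlChars {
    // Doubled backslashes stop the compiler from turning each escape into a raw
    // control character; the regex engine interprets them at match time instead.
    static final Pattern ILLEGAL_CHARS = Pattern.compile(
        "[\\u0000-\\u0008\\u000B\\u000C\\u000E-\\u001F\\u007F-\\u009F]");

    // Replaces every matched control character with a single space,
    // mirroring the replaceAll call in the question's parseFile method.
    static String strip(String input) {
        return ILLEGAL_CHARS.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Hypothetical sample: a unit-separator character (0x1F) inside a field value.
        String dirty = "<field name='text'>Tumor Necrosis Factor-"
            + (char) 0x1F + " in Mast Cells</field>";
        System.out.println(strip(dirty));
    }
}
```

With this pattern the angle brackets, tabs, and ordinary text pass through untouched, and only the listed control ranges are replaced. In a Groovy slashy string, hex escapes such as `\x00` may be a way to sidestep the source-level translation, though that variant is untested here.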