<p>EDIT:</p>
<p>OK, now that I understand the match can span tags, I think I understand the difficulty here.</p>
<p>The only algorithm I can think of is to walk the XML tree, reading the text portions and searching for your match. You'll need to do this matching yourself, character by character, across multiple nodes. The difficulty, of course, is to not munge the tree in the process...</p>
<p>Here's how I would do it:</p>
<p>Create a walker to walk the XML tree. Whenever you think you've found the start of the string match, save the current parent node. When (and if) you find the end of your string match, check whether the saved node is the same as the end node's parent. If they are the same, then it's safe to modify the tree.</p>
<p>Example doc:</p>
<pre><code>&lt;doc&gt;This is a an &lt;b&gt;example text I made up&lt;/b&gt; on the spot! Nutty.&lt;/doc&gt;
</code></pre>
<p>Test 1: Match: example text</p>
<p>The walker would walk along until it finds the "e" in example, save the parent node (the <code>&lt;b&gt;</code> node), and keep walking until it found the end of <code>text</code>, where it would check whether it was still in the same reference node <code>&lt;b&gt;</code>. It is, so it is a match and you can tag it with whatever markup you need.</p>
<p>Test 2: Match: an example</p>
<p>The walker would first hit <code>a</code> and quickly reject it, then hit <code>an</code> and save the <code>&lt;doc&gt;</code> node. It would continue matching into the <code>example</code> text until it realizes that example's parent node is <code>&lt;b&gt;</code> and not <code>&lt;doc&gt;</code>, at which point the match fails and no node is inserted.</p>
<p>Implementation 1:</p>
<p>If you are only matching straight text, then a simple matcher built on a Java XML parser (SAX or something) seems like the way to go here.</p>
<p>Implementation 2:</p>
<p>If the matching input is itself a regex, then you'll need something very special. I know of no engine which could work here for sure; what you <em>might</em> be able to do is write a bit of ugly something to do it... Maybe some sort of recursive walker which would break the XML tree down into smaller and smaller node-sets, searching the complete text at each level...</p>
<p>Very rough (non-working) code:</p>
<pre><code># parseXml, getText, getFlatText, getTextNodes, getXMLNodes and match
# are placeholders, not real library functions.
def search(raw, regex):
    tree = parseXml(raw)
    text = getText(tree)
    if match(text, regex):
        return searchXML(tree, regex)
    return None

def searchXML(tree, regex):
    text = getFlatText(tree)
    if match(text, regex):  # check if this subtree might match
        textNodes = getTextNodes(tree)
        for tn in textNodes:  # check if it's contained in a single text node
            if match(tn, regex):
                return tn
        xmlnodes = getXMLNodes(tree)
        for xn in xmlnodes:  # check if any of the children contain the text
            found = searchXML(xn, regex)
            if found:
                return found
        return tree  # matches some combination of text/nodes at this level
                     # but not at a sublevel
    else:
        return None  # no match in this subtree
</code></pre>
<p>Once you know which node should contain your match, I'm not sure what you can do, though, because it isn't clear how to figure out from the regex the index inside the text where the tag is needed... Maybe someone has a regex engine out there you can modify...</p>
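<p>For the straight-text case, the same-parent check from Test 1 and Test 2 can be sketched in Python. This is only a rough illustration, not a definitive implementation: it assumes the standard library's <code>xml.etree.ElementTree</code> in place of a SAX walker, and the names <code>text_runs</code>, <code>find_in_single_parent</code> and <code>DOC</code> are made up for this example. It only locates a safe match and does not modify the tree.</p>

```python
import xml.etree.ElementTree as ET

# Example document from the tests above (the "a an" is intentional).
DOC = "<doc>This is a an <b>example text I made up</b> on the spot! Nutty.</doc>"


def text_runs(elem):
    """Yield (parent, text) for every text node under elem, in document order."""
    if elem.text:
        yield elem, elem.text          # text directly inside elem
    for child in elem:
        yield from text_runs(child)
        if child.tail:
            yield elem, child.tail     # text after child still belongs to elem


def find_in_single_parent(root, needle):
    """Return (parent, offset) for the first safe match of needle, else None.

    A match counts as safe when its first and last characters sit in text
    nodes that share the same parent element. The offset is an index into
    the flattened text of the whole document.
    """
    if not needle:
        return None
    runs = list(text_runs(root))
    flat = "".join(text for _, text in runs)
    # parent_at[i] is the parent element of the i-th character of flat
    parent_at = [parent for parent, text in runs for _ in text]
    start = flat.find(needle)
    while start != -1:
        end = start + len(needle) - 1
        if parent_at[start] is parent_at[end]:
            return parent_at[start], start
        start = flat.find(needle, start + 1)   # try the next occurrence
    return None


root = ET.fromstring(DOC)
hit = find_in_single_parent(root, "example text")
print(hit[0].tag, hit[1])                          # b 13
print(find_in_single_parent(root, "an example"))   # None
```

<p>Because the check runs on offsets into the flattened text, the same offset-to-parent map could probably be reused for the regex case as well: Python's <code>re</code> match objects expose <code>start()</code> and <code>end()</code>, which would give the indices to look up. Note that a match that starts and ends in <code>&lt;doc&gt;</code> text but swallows the whole <code>&lt;b&gt;</code> element in between passes the check, which follows the rule as stated above.</p>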
 
