  1. POXSLT: Preceding element, possibly not a sibling, but do not cross specific tag
    <p>I am attempting to perform some text canonicalization to replace some contractions. Here is some example input:</p>
<pre><code>&lt;?xml version="1.0"?&gt;
&lt;transcript&gt;
  &lt;p id="p1"&gt;
    &lt;s id="s1"&gt;&lt;w&gt;Here&lt;/w&gt;&lt;w&gt;'s&lt;/w&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;w&gt;let&lt;/w&gt;&lt;w&gt;'s&lt;/w&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s2"&gt;&lt;w&gt;Here&lt;/w&gt; &lt;w&gt;'s&lt;/w&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;w&gt;let&lt;/w&gt;&lt;w&gt;'s&lt;/w&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s3"&gt;&lt;foo&gt;&lt;w&gt;Here&lt;/w&gt;&lt;/foo&gt;&lt;bar&gt;&lt;w&gt;'s&lt;/w&gt;&lt;/bar&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;foo&gt;&lt;w&gt;let&lt;/w&gt;&lt;/foo&gt;&lt;w&gt;'s&lt;/w&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s4"&gt;&lt;w&gt;Here&lt;/w&gt;&lt;bar&gt;&lt;baz&gt;&lt;w&gt;'s&lt;/w&gt;&lt;/baz&gt;&lt;/bar&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;baz&gt;&lt;bar&gt;&lt;w&gt;let&lt;/w&gt;&lt;/bar&gt;&lt;w&gt;'s&lt;/w&gt;&lt;/baz&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s5"&gt;&lt;w&gt;Look&lt;/w&gt; &lt;w&gt;here&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s6"&gt;&lt;w&gt;'s&lt;/w&gt; &lt;w&gt;another&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;&lt;/s&gt;
  &lt;/p&gt;
&lt;/transcript&gt;
</code></pre>
<p>In this example, I want to replace "here's" with "here is" and "let's" with "let us". Thus, my desired output is:</p>
<pre><code>&lt;?xml version="1.0"?&gt;
&lt;transcript&gt;
  &lt;p id="p1"&gt;
    &lt;s id="s1"&gt;&lt;w&gt;Here&lt;/w&gt; &lt;w&gt;is&lt;/w&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;w&gt;let&lt;/w&gt; &lt;w&gt;us&lt;/w&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s2"&gt;&lt;w&gt;Here&lt;/w&gt; &lt;w&gt;is&lt;/w&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;w&gt;let&lt;/w&gt; &lt;w&gt;us&lt;/w&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s3"&gt;&lt;foo&gt;&lt;w&gt;Here&lt;/w&gt;&lt;/foo&gt; &lt;bar&gt;&lt;w&gt;is&lt;/w&gt;&lt;/bar&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;foo&gt;&lt;w&gt;let&lt;/w&gt;&lt;/foo&gt; &lt;w&gt;us&lt;/w&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s4"&gt;&lt;w&gt;Here&lt;/w&gt; &lt;bar&gt;&lt;baz&gt;&lt;w&gt;is&lt;/w&gt;&lt;/baz&gt;&lt;/bar&gt; &lt;w&gt;an&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;, &lt;baz&gt;&lt;bar&gt;&lt;w&gt;let&lt;/w&gt;&lt;/bar&gt; &lt;w&gt;us&lt;/w&gt;&lt;/baz&gt; &lt;w&gt;consider&lt;/w&gt; &lt;w&gt;it&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s5"&gt;&lt;w&gt;Look&lt;/w&gt; &lt;w&gt;here&lt;/w&gt;&lt;/s&gt;
    &lt;s id="s6"&gt;&lt;w&gt;'s&lt;/w&gt; &lt;w&gt;another&lt;/w&gt; &lt;w&gt;example&lt;/w&gt;&lt;/s&gt;
  &lt;/p&gt;
&lt;/transcript&gt;
</code></pre>
<p>I was able to put together some (probably nowhere near elegant or optimal) code that can handle <code>s1</code> and <code>s2</code>, but I do not see how to generalize it into something useful.</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"&gt;
  &lt;xsl:output method="xml"/&gt;
  &lt;xsl:template match="@*|node()"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*|node()"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;
  &lt;xsl:template match="w[translate(text(),'S','s')=&amp;quot;'s&amp;quot;][preceding-sibling::*[1]/self::w[translate(text(),'HERE','here')='here']]"&gt;
    &lt;xsl:text&gt; &lt;/xsl:text&gt;
    &lt;xsl:copy&gt;&lt;xsl:copy-of select="@*"/&gt;is&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;
  &lt;xsl:template match="w[translate(text(),'S','s')=&amp;quot;'s&amp;quot;][preceding-sibling::*[1]/self::w[translate(text(),'LET','let')='let']]"&gt;
    &lt;xsl:text&gt; &lt;/xsl:text&gt;
    &lt;xsl:copy&gt;&lt;xsl:copy-of select="@*"/&gt;us&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</code></pre>
<p>Some details:</p>
<ul>
<li><p>Assume words are all wrapped in <code>&lt;w&gt;</code> tags and that the "words" of interest are consecutive (though not necessarily siblings).</p></li>
<li><p>Arbitrary tags may wrap one or the other or both of the word and the 's.</p></li>
<li><p>The substitution should not cross sentence <code>&lt;s&gt;</code> boundaries (as shown in s5 and s6), though if this is impossible, I will not cry.</p></li>
<li><p>If a space already exists between the word and the 's, I still want to replace the 's. The exact spacing of the result (one space or two) does not matter.</p></li>
<li><p>Ideally, the space will be added to the nearest common ancestor of the two <code>&lt;w&gt;</code> tags containing the word and the 's.</p></li>
</ul>
<p>Thanks for any guidance you can give!</p>
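<p>One possible direction, as a minimal and untested sketch in XSLT 1.0: match every <code>&lt;w&gt;</code> holding <code>'s</code>, look up the nearest preceding <code>&lt;w&gt;</code> with the <code>preceding</code> axis (which does not care about wrapper tags such as <code>&lt;foo&gt;</code>), and substitute only when both words share the same <code>&lt;s&gt;</code> ancestor. The variable names <code>prev</code>, <code>same-s</code>, and <code>word</code> are illustrative. The sketch assumes <code>&lt;w&gt;</code> elements are never nested, and it emits the space immediately before the replaced <code>&lt;w&gt;</code> rather than at the nearest common ancestor.</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"&gt;
  &lt;xsl:output method="xml"/&gt;

  &lt;!-- Identity template: copy everything that is not matched below. --&gt;
  &lt;xsl:template match="@*|node()"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*|node()"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Any w holding 's or 'S, wherever it sits in the tree. --&gt;
  &lt;xsl:template match="w[translate(., 'S', 's') = &amp;quot;'s&amp;quot;]"&gt;
    &lt;!-- Nearest preceding w in document order; it need not be a sibling. --&gt;
    &lt;xsl:variable name="prev" select="preceding::w[1]"/&gt;
    &lt;!-- The union holds exactly one node only when both words share one s. --&gt;
    &lt;xsl:variable name="same-s"
                  select="count(ancestor::s[1] | $prev/ancestor::s[1]) = 1"/&gt;
    &lt;!-- Lower-cased text of the preceding word (ASCII letters only). --&gt;
    &lt;xsl:variable name="word"
                  select="translate($prev, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')"/&gt;
    &lt;xsl:choose&gt;
      &lt;xsl:when test="$prev and $same-s and $word = 'here'"&gt;
        &lt;xsl:text&gt; &lt;/xsl:text&gt;
        &lt;xsl:copy&gt;&lt;xsl:copy-of select="@*"/&gt;is&lt;/xsl:copy&gt;
      &lt;/xsl:when&gt;
      &lt;xsl:when test="$prev and $same-s and $word = 'let'"&gt;
        &lt;xsl:text&gt; &lt;/xsl:text&gt;
        &lt;xsl:copy&gt;&lt;xsl:copy-of select="@*"/&gt;us&lt;/xsl:copy&gt;
      &lt;/xsl:when&gt;
      &lt;!-- No qualifying word before it in this sentence: copy unchanged. --&gt;
      &lt;xsl:otherwise&gt;
        &lt;xsl:copy&gt;
          &lt;xsl:apply-templates select="@*|node()"/&gt;
        &lt;/xsl:copy&gt;
      &lt;/xsl:otherwise&gt;
    &lt;/xsl:choose&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</code></pre>
<p>The sentence check lives in the template body rather than in the <code>match</code> pattern because XSLT 1.0 patterns may not contain variable references. On the sample input this should leave <code>s6</code> untouched, since the nearest preceding <code>&lt;w&gt;</code> (the "here" in <code>s5</code>) sits in a different <code>&lt;s&gt;</code>.</p>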