    <p><strong>OOXML</strong> is a defined standard with <strong><a href="http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-376,%20Second%20Edition,%20Part%201%20-%20Fundamentals%20And%20Markup%20Language%20Reference.zip" rel="nofollow">its own specification</a></strong>. Writing a general transform from <strong>OOXML</strong> to <strong>HTML</strong> is an interesting task (even if I think implementations should already exist around the web), but you will need to study at least a bit of the standard, and a bit of XSLT too.</p> <p>Very generally, the content of a WordML document consists mainly of <code>w:p</code> (paragraph) elements containing <code>w:r</code> <em>runs</em> (regions of text that share the same properties). Inside each run you normally find the text properties of the region (<code>w:rPr</code>) and the text itself (<code>w:t</code>).</p> <p>The real model is much more intricate, but you can start working from this general structure.</p> <p>For instance, you can start with the following fairly general transform. 
Note that it handles only paragraphs with bold, italic and underlined text, and that the <code>w</code> prefix must be bound to the namespace your document actually uses (a <code>.docx</code> file uses <code>http://schemas.openxmlformats.org/wordprocessingml/2006/main</code>).</p> <hr> <p><strong>XSLT 2.0</strong> tested under <strong>Saxon-HE 9.2.1.1J</strong></p> <pre><code>&lt;xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w"&gt;

    &lt;xsl:output method="html"/&gt;
    &lt;xsl:strip-space elements="*"/&gt;

    &lt;xsl:template match="w:document/w:body"&gt;
        &lt;html&gt;
            &lt;body&gt;
                &lt;xsl:apply-templates select="w:p"/&gt;
            &lt;/body&gt;
        &lt;/html&gt;
    &lt;/xsl:template&gt;

    &lt;!-- match paragraph --&gt;
    &lt;xsl:template match="w:p"&gt;
        &lt;p&gt;
            &lt;xsl:apply-templates select="w:r"/&gt;
        &lt;/p&gt;
    &lt;/xsl:template&gt;

    &lt;!-- match run with at least one b, i or u property;
         start from the first of them, skipping any other property --&gt;
    &lt;xsl:template match="w:r[w:rPr/w:b or w:rPr/w:i or w:rPr/w:u]"&gt;
        &lt;xsl:apply-templates select="(w:rPr/w:b | w:rPr/w:i | w:rPr/w:u)[1]"/&gt;
    &lt;/xsl:template&gt;

    &lt;!-- Recursive template for bold, italic and underline
         properties applied to the same run. Escape to paragraph text --&gt;
    &lt;xsl:template match="w:b | w:i | w:u"&gt;
        &lt;xsl:element name="{local-name(.)}"&gt;
            &lt;xsl:choose&gt;
                &lt;!-- recurse to the nearest following sibling property b, i or u only,
                     so the text is not emitted once per remaining property --&gt;
                &lt;xsl:when test="following-sibling::*[self::w:b or self::w:i or self::w:u]"&gt;
                    &lt;xsl:apply-templates select="following-sibling::*[self::w:b or self::w:i or self::w:u][1]"/&gt;
                &lt;/xsl:when&gt;
                &lt;xsl:otherwise&gt;
                    &lt;!-- escape to text --&gt;
                    &lt;xsl:apply-templates select="parent::w:rPr/following-sibling::w:t"/&gt;
                &lt;/xsl:otherwise&gt;
            &lt;/xsl:choose&gt;
        &lt;/xsl:element&gt;
    &lt;/xsl:template&gt;

    &lt;!-- match any other run (no b, i or u property); lower default priority --&gt;
    &lt;xsl:template match="w:r"&gt;
        &lt;xsl:apply-templates select="w:t"/&gt;
    &lt;/xsl:template&gt;

    &lt;!-- match text --&gt;
    &lt;xsl:template match="w:t"&gt;
        &lt;xsl:value-of select="."/&gt;
    &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;
</code></pre> <p>Applied on:</p> <pre><code>&lt;w:document 
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"&gt;
    &lt;w:body&gt;
        &lt;w:p&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t xml:space="preserve"&gt;This is a &lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t xml:space="preserve"&gt;bold &lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                    &lt;w:i/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t&gt;with a bit of italic&lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t xml:space="preserve"&gt; &lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t&gt;paragr&lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t&gt;a&lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:rPr&gt;
                    &lt;w:b/&gt;
                &lt;/w:rPr&gt;
                &lt;w:t&gt;ph&lt;/w:t&gt;
            &lt;/w:r&gt;
            &lt;w:r&gt;
                &lt;w:t xml:space="preserve"&gt; with some non-bold in it too.&lt;/w:t&gt;
            &lt;/w:r&gt;
        &lt;/w:p&gt;
    &lt;/w:body&gt;
&lt;/w:document&gt;
</code></pre> <p>produces:</p> <pre><code>&lt;html&gt;
   &lt;body&gt;
      &lt;p&gt;&lt;b&gt;This is a &lt;/b&gt;&lt;b&gt;bold &lt;/b&gt;&lt;b&gt;&lt;i&gt;with a bit of italic&lt;/i&gt;&lt;/b&gt;&lt;b&gt; &lt;/b&gt;&lt;b&gt;paragr&lt;/b&gt;&lt;b&gt;a&lt;/b&gt;&lt;b&gt;ph&lt;/b&gt; with some non-bold in it too.
      &lt;/p&gt;
   &lt;/body&gt;
&lt;/html&gt;
</code></pre> <hr> <p>The ugly HTML is a side effect of the underlying WordML schema, which splits text into many small runs, and a run-by-run transform like this one cannot avoid it. Making the final HTML more legible could perhaps be deferred to a user-friendly (and powerful) utility such as <a href="http://tidy.sourceforge.net/" rel="nofollow">HTML tidy</a>.</p>
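<p>If you would rather reduce the noise inside the transform itself, one possible approach is to merge adjacent runs that carry the same formatting before emitting any HTML. The following is only a rough, untested sketch of that idea using XSLT 2.0 grouping (<code>xsl:for-each-group</code> with <code>group-adjacent</code>). It assumes, like the transform above, that the mere presence of <code>w:b</code>, <code>w:i</code> or <code>w:u</code> switches the property on, and it ignores every other run property. The named template <code>wrap</code> and its parameters <code>tags</code> and <code>runs</code> are illustrative names, not part of any standard.</p> <pre><code>&lt;xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w"&gt;

    &lt;xsl:output method="html"/&gt;
    &lt;xsl:strip-space elements="*"/&gt;

    &lt;xsl:template match="w:document/w:body"&gt;
        &lt;html&gt;
            &lt;body&gt;
                &lt;xsl:apply-templates select="w:p"/&gt;
            &lt;/body&gt;
        &lt;/html&gt;
    &lt;/xsl:template&gt;

    &lt;!-- Group adjacent runs by a formatting key built from their
         b/i/u flags: "b", "bi", "biu", ... or "" for plain text --&gt;
    &lt;xsl:template match="w:p"&gt;
        &lt;p&gt;
            &lt;xsl:for-each-group select="w:r"
                group-adjacent="concat(if (w:rPr/w:b) then 'b' else '',
                                       if (w:rPr/w:i) then 'i' else '',
                                       if (w:rPr/w:u) then 'u' else '')"&gt;
                &lt;xsl:call-template name="wrap"&gt;
                    &lt;xsl:with-param name="tags" select="current-grouping-key()"/&gt;
                    &lt;xsl:with-param name="runs" select="current-group()"/&gt;
                &lt;/xsl:call-template&gt;
            &lt;/xsl:for-each-group&gt;
        &lt;/p&gt;
    &lt;/xsl:template&gt;

    &lt;!-- Hypothetical helper: open one element per character of $tags,
         outermost first, then emit the text of all runs in the group --&gt;
    &lt;xsl:template name="wrap"&gt;
        &lt;xsl:param name="tags"/&gt;
        &lt;xsl:param name="runs"/&gt;
        &lt;xsl:choose&gt;
            &lt;xsl:when test="$tags = ''"&gt;
                &lt;!-- separator="" avoids the default single space between items --&gt;
                &lt;xsl:value-of select="$runs/w:t" separator=""/&gt;
            &lt;/xsl:when&gt;
            &lt;xsl:otherwise&gt;
                &lt;xsl:element name="{substring($tags, 1, 1)}"&gt;
                    &lt;xsl:call-template name="wrap"&gt;
                        &lt;xsl:with-param name="tags" select="substring($tags, 2)"/&gt;
                        &lt;xsl:with-param name="runs" select="$runs"/&gt;
                    &lt;/xsl:call-template&gt;
                &lt;/xsl:element&gt;
            &lt;/xsl:otherwise&gt;
        &lt;/xsl:choose&gt;
    &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;
</code></pre> <p>Applied to the sample document above, the eight runs should collapse into four groups (bold, bold italic, bold, plain), so the paragraph would come out along these lines, modulo indentation:</p> <pre><code>&lt;p&gt;&lt;b&gt;This is a bold &lt;/b&gt;&lt;b&gt;&lt;i&gt;with a bit of italic&lt;/i&gt;&lt;/b&gt;&lt;b&gt; paragraph&lt;/b&gt; with some non-bold in it too.&lt;/p&gt;
</code></pre> <p>Grouping only merges runs whose keys are identical, so the bold italic run still gets its own <code>b</code> element instead of being nested inside a single bold span. A post-processing tool may still be worthwhile for that kind of cleanup.</p>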