    <p>The issue is that you're trying to parse SGML with an XML tool, and SGML and XML are not the same. If you want to use an XML tool/language to access the data, you will probably need to convert the SGML to XML before trying to parse it.</p> <p>Ideally you'd either use a language/tool that supports SGML (like OmniMark) or something that can handle "XML like" data (like nokogiri from the first answer?).</p> <p>The conversion can be pretty straightforward, but it can get tricky in places, especially if you're dealing with multiple doctypes (DTDs). (Also, there's no such thing as "well-formed" SGML. Yes, the elements/etc. have to be nested correctly, but SGML <em>has</em> to have a DTD.)</p> <p>Here are some differences between SGML and XML that you'd need to handle (you may not want to go this route, but the list may still be useful as background):</p> <ol> <li><p><strong>DOCTYPE declaration</strong></p> <p>The DOCTYPE declaration in your example is a perfectly valid SGML doctype. The <code>[]</code> (internal subset) doesn't have to have anything in it. If you do have declarations in the internal subset (usually entity declarations), you're more than likely going to have to keep a doctype declaration in the XML.</p> <p>The issue the XML parser is having is that you don't have a system identifier in the declaration. In an XML doctype declaration, the system identifier is required if there is a public identifier. In an SGML doctype declaration, it's not required.</p> <p><em>Bottom line</em>: unless you need the XML to validate against a DTD/schema or you have declarations in the internal subset, strip the doctype declaration. If the XML does have to be valid, you'll at least need to add a system identifier. Don't forget to add the XML declaration (<code>&lt;?xml ...?&gt;</code>) at the top.</p></li> <li><p><strong>Elements without end tags</strong></p> <p>The <code>&lt;hardhyphen&gt;</code> and <code>&lt;hyphen&gt;</code> elements are valid SGML. 
SGML DTDs allow you to specify tag minimization. What this means is that you can specify whether or not an end tag is required. (You can also make the start tag optional, but that's crazy talk.) In XML you have to close these elements (like <code>&lt;hardhyphen/&gt;</code> or <code>&lt;hardhyphen&gt;&lt;/hardhyphen&gt;</code>).</p> <p>The best thing to do is to look at your SGML DTD and see which elements have optional end tags. The tag minimization is specified right after the element name in the element declaration. A '-' means the tag is required. An 'o' (letter 'oh') means that the tag is optional. For example, if you see <code>&lt;!ELEMENT hyphen - o (#PCDATA)&gt;</code>, this means that the start tag is required (<code>-</code>) and the end tag is optional (<code>o</code>). If you see <code>&lt;!ELEMENT hyphen - - (#PCDATA)&gt;</code>, both the start and the end tags are required.</p> <p><em>Bottom line</em>: properly close all of the elements that don't have end tags.</p></li> <li><p><strong>Processing instructions</strong></p> <p>Processing instructions (PIs) in SGML are closed with a plain <code>&gt;</code>; they don't have the <code>?</code> before the closing <code>&gt;</code> that XML PIs have. You'll need to add that <code>?</code>.</p> <p>Example SGML PI: <code>&lt;?asdf jkl&gt;</code></p> <p>Example XML PI: <code>&lt;?asdf jkl?&gt;</code></p></li> <li><p><strong>Inclusions/Exclusions</strong></p> <p>You probably won't have to worry about this, but in an SGML DTD you can specify in an element declaration that another element is allowed anywhere inside of that element (or not allowed). This can be a pain if your target XML needs to validate against a DTD; XML DTDs do not allow inclusions/exclusions.</p> <p>This is what an inclusion might look like:</p> <p><code>&lt;!ELEMENT chapter - - (section)+ +(revst|revend)&gt;</code></p> <p>This is saying that <code>revst</code> or <code>revend</code> are allowed anywhere inside of <code>chapter</code>. 
If the element declaration had <code>-(revst|revend)</code>, it would mean that <code>revst</code> or <code>revend</code> is <em>not</em> allowed anywhere inside of <code>chapter</code>.</p></li> </ol> <p>Hope this helps.</p>
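<p>For illustration, the first three differences can be roughed out in Python with regular expressions. This is only a minimal sketch under some loud assumptions: the function names <code>optional_end_tags</code> and <code>sgml_to_xml</code> are made up for this example, the internal subset is assumed to be empty (a doctype with declarations in it is left untouched), and the elements with omitted end tags are assumed to be empty elements such as <code>&lt;hyphen&gt;</code>. Elements that have content but no end tag can't be fixed reliably this way; that really needs an SGML-aware parser.</p>

```python
import re
import xml.etree.ElementTree as ET

# DOCTYPE with no internal subset, or an empty one: [] (assumption of this sketch).
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^\[>]*(\[\s*\])?\s*>", re.IGNORECASE)
# A PI closed SGML-style with ">" or XML-style with "?>".
PI_RE = re.compile(r"<\?([^>]*?)\??>")
# <!ELEMENT name start-flag end-flag ...> with a single element name (no name groups).
ELEMENT_DECL_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+([-oO])\s+([-oO])\s", re.IGNORECASE)


def optional_end_tags(dtd_text):
    """Return the names of elements declared with an optional ('o') end tag."""
    return {
        name
        for name, _start_flag, end_flag in ELEMENT_DECL_RE.findall(dtd_text)
        if end_flag.lower() == "o"
    }


def sgml_to_xml(sgml_text, empty_elements):
    """Roughly convert simple SGML text to XML.

    empty_elements: names of elements that never have content or an end tag
    in the SGML (hypothetical input; take it from your own DTD).
    """
    text = DOCTYPE_RE.sub("", sgml_text)   # 1. strip the doctype declaration
    text = PI_RE.sub(r"<?\1?>", text)      # 3. close PIs with "?>"
    if empty_elements:                     # 2. self-close the empty elements
        names = "|".join(re.escape(n) for n in sorted(empty_elements))
        # Skip tags that are already self-closed (ending in "/>").
        open_tag_re = re.compile(r"<(%s)(?=[\s>])([^>]*)(?<!/)>" % names)
        text = open_tag_re.sub(r"<\1\2/>", text)
    # Add the XML declaration last so the PI step does not touch it.
    return '<?xml version="1.0"?>\n' + text.lstrip()


if __name__ == "__main__":
    sample = (
        '<!DOCTYPE doc PUBLIC "-//EXAMPLE//DTD Doc//EN" []>\n'
        "<doc><?asdf jkl>well<hyphen>formed<hardhyphen>ish</doc>"
    )
    converted = sgml_to_xml(sample, {"hyphen", "hardhyphen"})
    print(converted)
    ET.fromstring(converted)  # raises ParseError if the result is not well-formed
```

<p>The public identifier in the sample is invented. In practice you would feed the result of <code>optional_end_tags</code> (run over your DTD) into <code>sgml_to_xml</code> only after checking that those elements really are empty.</p>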