<p>An OpenDocument text file (.odt) is essentially a ZIP package containing multiple XML files. content.xml holds the actual textual content of the document. We are interested in headings, which are found inside text:h elements. Read more about <a href="http://www.langintro.com/odfdom_tutorials/odf_internals.html" rel="nofollow">ODT</a>.</p>
<p>I found an implementation for extracting headings from .odt files with <a href="http://technosophos.com/content/reading-odt-files-querypath" rel="nofollow">QueryPath</a>.</p>
<p>Since the original question was about Java, here is a Java version. First we get access to content.xml using ZipFile. Then we use SAX to parse the XML in content.xml. The sample code simply prints the archive name, the entry name and its size, and then every heading preceded by its outline level:</p>
<pre><code>Test3.odt
content.xml
3764
1 My New Great Paper
2 Abstract
2 Introduction
2 Content
3 More content
3 Even more
2 Conclusions
</code></pre>
<p>Sample code:</p>
<pre><code>import java.io.File;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public void printHeadingsOfOdtFile(File odtFile) {
    // try-with-resources closes the archive even if parsing fails
    try (ZipFile zFile = new ZipFile(odtFile)) {
        System.out.println(zFile.getName());

        // content.xml holds the document body, including the headings
        ZipEntry contentFile = zFile.getEntry("content.xml");
        System.out.println(contentFile.getName());
        System.out.println(contentFile.getSize());

        // Stream the entry through SAX; the handler prints the headings
        XMLReader xr = XMLReaderFactory.createXMLReader();
        OdtDocumentContentHandler handler = new OdtDocumentContentHandler();
        xr.setContentHandler(handler);
        xr.parse(new InputSource(zFile.getInputStream(contentFile)));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    new OdtDocumentStructureExtractor().printHeadingsOfOdtFile(new File("Test3.odt"));
}
</code></pre>
<p>Relevant parts of the ContentHandler used look like this:</p>
<pre><code>// Fields used by the handler methods below
private final StringBuilder fullText = new StringBuilder();
private final StringBuilder headingText = new StringBuilder();
private boolean inHeading = false;

@Override
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
    if ("text:h".equals(qName)) {
        // Start collecting the text of a new heading
        inHeading = true;
        headingText.setLength(0);
        String headingLevel = atts.getValue("text:outline-level");
        if (headingLevel != null) {
            System.out.print(headingLevel + " ");
        }
    }
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    // SAX may deliver the text of one element in several chunks,
    // so append each chunk instead of overwriting the previous one
    fullText.append(ch, start, length);
    if (inHeading) {
        headingText.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    if ("text:h".equals(qName)) {
        System.out.println(headingText);
        inHeading = false;
    }
}
</code></pre>
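To try the ZipFile-plus-SAX approach without a real document at hand, here is a minimal self-contained sketch. It is not the original answer's code: the class name OdtHeadingSketch and the helpers extractHeadings and writeSampleOdt are hypothetical, and the sample archive holds only a hand-written content.xml, so it is a stand-in for an .odt file rather than a valid one. The sketch also assumes a namespace-aware parser and matches headings by the ODF text namespace URI instead of the literal text: prefix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class OdtHeadingSketch {
    // Namespace URI bound to the "text" prefix in ODF documents
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns one "level title" string per text:h element found in content.xml. */
    public static List<String> extractHeadings(Path odt) throws Exception {
        List<String> headings = new ArrayList<>();
        try (ZipFile zip = new ZipFile(odt.toFile())) {
            ZipEntry content = zip.getEntry("content.xml");
            if (content == null) {
                throw new IOException("content.xml not found in " + odt);
            }
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(content)) {
                factory.newSAXParser().parse(in, new DefaultHandler() {
                    private StringBuilder title; // non-null only while inside text:h
                    private String level;

                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes atts) {
                        if (TEXT_NS.equals(uri) && "h".equals(localName)) {
                            level = atts.getValue(TEXT_NS, "outline-level");
                            title = new StringBuilder();
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Text can arrive in several chunks (for example around text:span)
                        if (title != null) {
                            title.append(ch, start, length);
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if (TEXT_NS.equals(uri) && "h".equals(localName)) {
                            headings.add((level == null ? "?" : level) + " " + title);
                            title = null;
                        }
                    }
                });
            }
        }
        return headings;
    }

    /** Writes a tiny stand-in archive that contains only a hand-written content.xml. */
    public static Path writeSampleOdt() throws IOException {
        String xml = "<office:document-content"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"" + TEXT_NS + "\"><office:body><office:text>"
                + "<text:h text:outline-level=\"1\">My New <text:span>Great</text:span> Paper</text:h>"
                + "<text:p>Body text.</text:p>"
                + "<text:h text:outline-level=\"2\">Abstract</text:h>"
                + "</office:text></office:body></office:document-content>";
        Path odt = Files.createTempFile("sample", ".odt");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(odt))) {
            zos.putNextEntry(new ZipEntry("content.xml"));
            zos.write(xml.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return odt;
    }

    public static void main(String[] args) throws Exception {
        // Pass a path to a real .odt file, or run without arguments to use the sample
        Path odt = args.length > 0 ? Paths.get(args[0]) : writeSampleOdt();
        extractHeadings(odt).forEach(System.out::println);
    }
}
```

Run without arguments, this prints "1 My New Great Paper" and "2 Abstract" for the generated sample. Returning a list instead of printing inside the handler keeps the parsing step reusable, and collecting text in a StringBuilder keeps headings intact when they contain inline markup.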