**UPDATE:**

Yes, you can try to process the 3 files in "parallel" using SAX parsers, if your callbacks implement a "sleep / wake up / check whether the other SAX parsers said to proceed" mechanism - basically a poor approximation of threads and messaging.

It would only work if the elements in each XML file came in the same exact order, ideally alphabetical - that way you can move linearly through each file via its SAX parser and guarantee that you hit identical elements at the same time, and thus only ever hold 3-6 elements in memory at once. It is essentially merging 3 sorted arrays into 1 sorted array.

I **seriously** doubt this approach would be even remotely superior to the original algorithm listed below, but if that's what you want to try to implement, go for it.

**ORIGINAL:**

Basically, the best (if not the only) way of doing what you want is to build a database of all the elements in need of merging.

That database would probably map an element name-or-id to N true/false fields, one per XML file; or even to a single yes/no "already merged" flag - I will use the latter option in my example logic below.

Whether that database is implemented as an in-memory hash, as a tied hash stored in a file to avoid memory issues, or as a proper database (XML, SQLite, DBM, or a real database backend) is less important, except that the first option obviously sucks memory-consumption-wise.

Please note the XML option, since you MIGHT manage to use the resulting XML file itself as the database. That might actually be your easiest route, I'm not sure - personally I would recommend a tied hash, or a real database back-end if you have one.

Having done that, the algorithm is obvious (a hedged code sketch follows at the end of this answer):

- Loop over each file using a SAX parser.

- For each element found, look it up in the database. If it is already marked as processed, skip it; if not, add it to the database as processed.

- Find that same element in all the **subsequent** files, using XPath. E.g. when processing file2.xml, only search file3.xml, since file1.xml cannot still contain the element (or else it would have been processed out of file1.xml and would already be in the database).

- Merge all the elements found via XPath together with the element from the current file, insert the result into the resulting XML file, and save it.

- End both loops.

Please note that this answer does not directly address which modules to use for each step: presumably XML::Parser (or any other SAX parser) for parsing, and XML::XPath for searching the other files. Something like XML::SAX::Writer could write the resulting file, but since I have never had to write a file in a non-DOM model I don't want to make the latter an official recommendation; if you want to know which module is best for that, you may want to ask it as a separate question, or hope someone else answers this one with more precise module recommendations.
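For concreteness, here is a minimal, untested sketch of the ORIGINAL algorithm in Perl. The file names, the `<record id="...">` layout, and the `<group>` output wrapper are illustrative assumptions, not taken from the question. For brevity it uses XML::LibXML (DOM plus XPath) for every file instead of a SAX parser for the outer pass, a plain in-memory hash as the "already merged" database, and it merely groups matching records in the output rather than doing a real field-level merge.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative file names and record layout (<record id="...">); adjust both
# to your real documents.
my @files   = ('file1.xml', 'file2.xml', 'file3.xml');
my $outfile = 'merged.xml';

# The "already merged" database. A plain hash for brevity; tie it to DB_File
# (or point it at SQLite/DBM) if memory consumption becomes a problem.
my %seen;

# Parse each file once so the *subsequent* files can be searched via XPath.
my @docs = map { XML::LibXML->load_xml(location => $_) } @files;

open my $out, '>', $outfile or die "Cannot write $outfile: $!";
print {$out} qq{<?xml version="1.0" encoding="UTF-8"?>\n<merged>\n};

for my $i (0 .. $#docs) {
    # Outer pass over the current file. The answer calls for a SAX parser
    # here; DOM iteration is used only to keep the sketch short.
    for my $elem ($docs[$i]->findnodes('/*/record')) {
        my $id = $elem->getAttribute('id');
        next unless defined $id;
        next if $seen{$id};        # already merged from an earlier file
        $seen{$id} = 1;

        # Collect the same record from every later file via XPath. Earlier
        # files cannot contain it, or it would already be marked in %seen.
        my @matches = ($elem);
        for my $j ($i + 1 .. $#docs) {
            push @matches, $docs[$j]->findnodes(qq{/*/record[\@id="$id"]});
        }

        # "Merge" by grouping the matched records under one wrapper element;
        # replace this block with whatever real merge rules you need.
        print {$out} qq{  <group id="$id">\n};
        print {$out} '    ', $_->toString, "\n" for @matches;
        print {$out} qq{  </group>\n};
    }
}

print {$out} "</merged>\n";
close $out or die "Cannot close $outfile: $!";
```

Swapping the outer DOM loop for XML::Parser callbacks, and tying `%seen` to an on-disk store such as DB_File, gets you back to the exact shape described in the list above without changing the overall flow.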