  How best to use Regular Expressions to convert a Hierarchical Text File into XML?
    <p>Good morning - </p>
    <p>I'm interested in seeing an efficient way of parsing the values of a <em>hierarchical</em> text file (i.e., one that has a Title => Multiple Headings => Multiple Subheadings => Multiple Keys => Multiple Values) into a simple XML document. For the sake of simplicity, the answer should be written using:</p>
    <ul>
    <li>Regex (preferably in PHP)</li>
    <li>or, PHP code (e.g., if looping were more efficient)</li>
    </ul>
    <p>Here's an example of an Inventory file I'm working with. Note that Header = <strong>FOODS</strong>, Sub-Header = <strong>Type (A, B...)</strong>, Keys = <strong>PRODUCT (or CODE, etc.)</strong>, and Values may span one or more lines.</p>
    <pre><code>**FOODS - TYPE A**
___________________________________
**PRODUCT**
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
**CODE**
Sell by date going back to February 1, 2009
**MANUFACTURER**
Quesos Mi Pueblito, LLC, Passaic, NJ.
**VOLUME OF UNITS**
11,000 boxes
**DISTRIBUTION**
NJ, NY, DE, MD, CT, VA
___________________________________
**PRODUCT**
1) Peanut Brittle No Sugar Added;
2) Peanut Brittle Small Grind;
3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating
**CODE**
1) Lots 7109 - 8350 inclusive;
2) Lots 8198 - 8330 inclusive;
3) Lots 7075 - 9012 inclusive;
4) Lots 7100 - 8057 inclusive;
5) Lots 7152 - 8364 inclusive
**MANUFACTURER**
Star Kay White, Inc., Congers, NY.
**VOLUME OF UNITS**
5,749 units
**DISTRIBUTION**
NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN

**FOODS - TYPE B**
___________________________________
**PRODUCT**
Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;
**CODE**
990-10/2 10/5
**MANUFACTURER**
San Mar Manufacturing Corp., Catano, PR.
**VOLUME OF UNITS**
384
**DISTRIBUTION**
PR
</code></pre>
    <p>And here's the desired output (please excuse any XML syntactical errors):</p>
    <pre><code>&lt;foods&gt;
  &lt;food type="A"&gt;
    &lt;product&gt;Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese&lt;/product&gt;
    &lt;product&gt;La Fe String Cheese&lt;/product&gt;
    &lt;code&gt;Sell by date going back to February 1, 2009&lt;/code&gt;
    &lt;manufacturer&gt;Quesos Mi Pueblito, LLC, Passaic, NJ.&lt;/manufacturer&gt;
    &lt;volume&gt;11,000 boxes&lt;/volume&gt;
    &lt;distribution&gt;NJ, NY, DE, MD, CT, VA&lt;/distribution&gt;
  &lt;/food&gt;
  &lt;food type="A"&gt;
    &lt;product&gt;Peanut Brittle No Sugar Added&lt;/product&gt;
    &lt;product&gt;Peanut Brittle Small Grind&lt;/product&gt;
    &lt;product&gt;Homestyle Peanut Brittle Nuggets/Coconut Oil Coating&lt;/product&gt;
    &lt;code&gt;Lots 7109 - 8350 inclusive&lt;/code&gt;
    &lt;code&gt;Lots 8198 - 8330 inclusive&lt;/code&gt;
    &lt;code&gt;Lots 7075 - 9012 inclusive&lt;/code&gt;
    &lt;code&gt;Lots 7100 - 8057 inclusive&lt;/code&gt;
    &lt;code&gt;Lots 7152 - 8364 inclusive&lt;/code&gt;
    &lt;manufacturer&gt;Star Kay White, Inc., Congers, NY.&lt;/manufacturer&gt;
    &lt;volume&gt;5,749 units&lt;/volume&gt;
    &lt;distribution&gt;NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN&lt;/distribution&gt;
  &lt;/food&gt;
  &lt;food type="B"&gt;
    &lt;product&gt;Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice&lt;/product&gt;
    &lt;code&gt;990-10/2 10/5&lt;/code&gt;
    &lt;manufacturer&gt;San Mar Manufacturing Corp., Catano, PR&lt;/manufacturer&gt;
    &lt;volume&gt;384&lt;/volume&gt;
    &lt;distribution&gt;PR&lt;/distribution&gt;
  &lt;/food&gt;
&lt;/foods&gt;
&lt;!-- and so forth --&gt;
</code></pre>
    <p>So far, my approach (which might be quite inefficient with a huge text file) would be one of the following:</p>
    <ol>
    <li><p><strong>Loops and multiple Select/Case statements</strong>, where the file is loaded into a string buffer and, while looping through each line, I check whether it matches one of the header/subheader/key lines, append the appropriate XML tag to an XML string variable, and then add the child nodes to the XML based on IF statements regarding which key name is most recent (which seems time-consuming and error-prone, especially if the text changes even slightly) -- OR</p></li>
    <li><p><strong>Use REGEX (Regular Expressions)</strong> to find and replace key fields with the appropriate XML tags, clean the result up with an XML library, and export the XML file. The problem is that I barely use regular expressions, so I'd need some <em>example-based</em> help.</p></li>
    </ol>
    <p>Any help or advice would be appreciated.</p>
    <p>Thanks.</p>
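    <p>The two approaches listed above can be combined: a single pass over the lines, with one small regular expression per line type and a variable that remembers the most recent key. The following is a minimal sketch in Python rather than PHP, and it rests on assumptions not stated in the question: each header, separator, key, and value sits on its own line (or several numbered values share a line, separated by <code>; </code>), and the names <code>inventory_to_xml</code> and <code>TAGS</code> are hypothetical. The patterns are plain enough that they should carry over to PHP's <code>preg_match</code> with little change.</p>

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from key names in the file to tag names in the desired output.
TAGS = {
    "PRODUCT": "product",
    "CODE": "code",
    "MANUFACTURER": "manufacturer",
    "VOLUME OF UNITS": "volume",
    "DISTRIBUTION": "distribution",
}

HEADER = re.compile(r"^\*\*FOODS - TYPE (\w+)\*\*$")  # e.g. **FOODS - TYPE A**
RULE = re.compile(r"^_+$")                            # separator that starts a record
KEY = re.compile(r"^\*\*([A-Z ]+)\*\*$")              # e.g. **VOLUME OF UNITS**
ITEM_PREFIX = re.compile(r"^\d+\)\s*")                # leading "1) ", "2) ", ...
ITEM_SPLIT = re.compile(r";\s*(?=\d+\))")             # "; " followed by the next "N)"


def inventory_to_xml(text):
    """Convert the inventory text into a <foods> element tree in one pass."""
    root = ET.Element("foods")
    food_type = None  # type taken from the most recent header line
    food = None       # <food> element currently being filled
    tag = None        # tag name for the most recent key line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            food_type, food, tag = header.group(1), None, None
            continue
        if RULE.match(line):
            # Every separator line opens a new record under the current type.
            food = ET.SubElement(root, "food", type=food_type or "")
            tag = None
            continue
        key = KEY.match(line)
        if key:
            tag = TAGS.get(key.group(1))  # unknown keys are skipped
            continue
        if food is not None and tag is not None:
            # A value line: one child element per (possibly numbered) item.
            for part in ITEM_SPLIT.split(line):
                value = ITEM_PREFIX.sub("", part).rstrip(";").strip()
                if value:
                    ET.SubElement(food, tag).text = value
    return root
```

    <p>Calling <code>ET.tostring(inventory_to_xml(text), encoding="unicode")</code> serializes the result. Building the tree with an XML library instead of concatenating strings means characters such as <code>&amp;</code> in a product name are escaped automatically, and supporting a new key only requires another entry in the mapping.</p>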