Best way to parse HTML in JavaScript
    <p>I am having a lot of trouble learning RegExp and coming up with a good algorithm to do this. I have this string of HTML that I need to parse. Note that when I am parsing it, it is still a string object and not yet HTML on the browser as I need to parse it before it gets there. The HTML looks like this: </p> <pre><code>&lt;html&gt; &lt;head&gt; &lt;title&gt;Geoserver GetFeatureInfo output&lt;/title&gt; &lt;/head&gt; &lt;style type="text/css"&gt; table.featureInfo, table.featureInfo td, table.featureInfo th { border:1px solid #ddd; border-collapse:collapse; margin:0; padding:0; font-size: 90%; padding:.2em .1em; } table.featureInfo th { padding:.2em .2em; font-weight:bold; background:#eee; } table.featureInfo td{ background:#fff; } table.featureInfo tr.odd td{ background:#eee; } table.featureInfo caption{ text-align:left; font-size:100%; font-weight:bold; text-transform:uppercase; padding:.2em .2em; } &lt;/style&gt; &lt;body&gt; &lt;table class="featureInfo2"&gt; &lt;tr&gt; &lt;th class="dataLayer" colspan="5"&gt;Tibetan Villages&lt;/th&gt; &lt;/tr&gt; &lt;!-- EOF Data Layer --&gt; &lt;tr class="dataHeaders"&gt; &lt;th&gt;ID&lt;/th&gt; &lt;th&gt;Latitude&lt;/th&gt; &lt;th&gt;Longitude&lt;/th&gt; &lt;th&gt;Place Name&lt;/th&gt; &lt;th&gt;English Translation&lt;/th&gt; &lt;/tr&gt; &lt;!-- EOF Data Headers --&gt; &lt;!-- Data --&gt; &lt;tr&gt; &lt;!-- Feature Info Data --&gt; &lt;td&gt;3394&lt;/td&gt; &lt;td&gt;29.1&lt;/td&gt; &lt;td&gt;93.15&lt;/td&gt; &lt;td&gt;བསྡམས་གྲོང་ཚོ།&lt;/td&gt; &lt;td&gt;Dam Drongtso &lt;/td&gt; &lt;/tr&gt; &lt;!-- EOF Feature Info Data --&gt; &lt;!-- End Data --&gt; &lt;/table&gt; &lt;br/&gt; &lt;/body&gt; &lt;/html&gt; </code></pre> <p>and I need to get it like this:</p> <pre><code>3394, 29.1, 93.15, བསྡམས་གྲོང་ཚོ།, Dam Drongtso </code></pre> <p>Basically an array...even better if it matches according to its field headers and from which table they are somehow, which look like this:</p> <pre><code>Tibetan Villages ID Latitude 
Longitude Place Name English Translation </code></pre> <p>Finding out JavaScript does not support wonderful mapping was a bummer and I have what I want working already. However it is VERY VERY hard coded and I'm thinking I should probably use RegExp to handle this better. Unfortunately I am having a real tough time :(. Here is my function to parse my string (very ugly IMO):</p> <pre><code> function parseHTML(html){ //Getting the layer name alert(html); //Lousy attempt at RegExp var somestring = html.replace('/m//\&lt;html\&gt;+\&lt;body\&gt;//m/',' '); alert(somestring); var startPos = html.indexOf('&lt;th class="dataLayer" colspan="5"&gt;'); var length = ('&lt;th class="dataLayer" colspan="5"&gt;').length; var endPos = html.indexOf('&lt;/th&gt;&lt;/tr&gt;&lt;!-- EOF Data Layer --&gt;'); var dataLayer = html.substring(startPos + length, endPos); //Getting the data headers startPos = html.indexOf('&lt;tr class="dataHeaders"&gt;'); length = ('&lt;tr class="dataHeaders"&gt;').length; endPos = html.indexOf('&lt;/tr&gt;&lt;!-- EOF Data Headers --&gt;'); var newString = html.substring(startPos + length, endPos); newString = newString.replace(/&lt;th&gt;/g, ''); newString = newString.substring(0, newString.lastIndexOf('&lt;/th&gt;')); var featureInfoHeaders = new Array(); featureInfoHeaders = newString.split('&lt;/th&gt;'); //Getting the data startPos = html.indexOf('&lt;!-- Data --&gt;'); length = ('&lt;!-- Data --&gt;').length; endPos = html.indexOf('&lt;!-- End Data --&gt;'); newString = html.substring(startPos + length, endPos); newString = newString.substring(0, newString.lastIndexOf('&lt;/tr&gt;&lt;!-- EOF Feature Info Data --&gt;')); var featureInfoData = new Array(); featureInfoData = newString.split('&lt;/tr&gt;&lt;!-- EOF Feature Info Data --&gt;'); for(var s = 0; s &lt; featureInfoData.length; s++){ startPos = featureInfoData[s].indexOf('&lt;!-- Feature Info Data --&gt;'); length = ('&lt;!-- Feature Info Data --&gt;').length; endPos = 
featureInfoData[s].lastIndexOf('&lt;/td&gt;'); featureInfoData[s] = featureInfoData[s].substring(startPos + length, endPos); featureInfoData[s] = featureInfoData[s].replace(/&lt;td&gt;/g, ''); featureInfoData[s] = featureInfoData[s].split('&lt;/td&gt;'); }//end for alert(featureInfoData); //Put all the feature info in one array var featureInfo = new Array(); var len = featureInfoData.length; for(var j = 0; j &lt; len; j++){ featureInfo[j] = new Object(); featureInfo[j].id = featureInfoData[j][0]; featureInfo[j].latitude = featureInfoData[j][1]; featureInfo[j].longitude = featureInfoData[j][2]; featureInfo[j].placeName = featureInfoData[j][3]; featureInfo[j].translation = featureInfoData[j][4]; }//end for //This can be ignored for now... var string = redesignHTML(featureInfoHeaders, featureInfo); return string; }//end parseHTML </code></pre> <p>So as you can see if the content in that string ever changes, my code will be horribly broken. I want to avoid that as much as possible and try to write better code. I appreciate all the help and advice you can give me. </p>
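The hard-coded `indexOf` offsets above depend on exact tag text and comment markers. As a rough sketch of a less brittle version, the code below finds the layer name, the header row, and each data row by pattern, then keys every value by its header. It assumes the GeoServer output keeps the table shape shown in the question (one `dataLayer` cell, one `dataHeaders` row, data rows made of plain `<td>` cells). The names `parseFeatureInfo`, `cells`, and `cellText` are made up for illustration, and regular expressions remain a fragile tool for HTML in general.

```javascript
// Sketch only: assumes the table layout shown in the question.
// parseFeatureInfo, cells and cellText are illustrative names, not library APIs.

// Remove comments and tags from a fragment, then trim surrounding whitespace.
function cellText(fragment) {
  return fragment.replace(/<!--[\s\S]*?-->/g, '').replace(/<[^>]*>/g, '').trim();
}

// Collect the text of every <tag>...</tag> cell found in one row's inner HTML.
function cells(rowHtml, tag) {
  var re = new RegExp('<' + tag + '\\b[^>]*>([\\s\\S]*?)</' + tag + '>', 'gi');
  var out = [];
  var m;
  while ((m = re.exec(rowHtml)) !== null) {
    out.push(cellText(m[1]));
  }
  return out;
}

function parseFeatureInfo(html) {
  // Layer name: the single <th class="dataLayer"> cell.
  var layerMatch = /<th\b[^>]*class="dataLayer"[^>]*>([\s\S]*?)<\/th>/i.exec(html);
  var layer = layerMatch ? cellText(layerMatch[1]) : null;

  // Column headers: the <th> cells inside <tr class="dataHeaders">.
  var headerMatch = /<tr\b[^>]*class="dataHeaders"[^>]*>([\s\S]*?)<\/tr>/i.exec(html);
  var headers = headerMatch ? cells(headerMatch[1], 'th') : [];

  // Data rows: every <tr> that contains at least one <td>.
  var rows = [];
  var rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  var m;
  while ((m = rowRe.exec(html)) !== null) {
    var values = cells(m[1], 'td');
    if (values.length === 0) continue; // rows with only <th> cells are skipped
    var record = {};
    for (var i = 0; i < values.length; i++) {
      // Fall back to a positional key if a row has more cells than headers.
      var key = i < headers.length ? headers[i] : 'column' + i;
      record[key] = values[i];
    }
    rows.push(record);
  }
  return { layer: layer, headers: headers, rows: rows };
}
```

Calling `parseFeatureInfo(html)` on the string from the question should give an object whose `layer` is the table title, whose `headers` is the list of column names, and whose `rows` holds one object per data row keyed by those column names. If the string can be parsed in a browser, `new DOMParser().parseFromString(html, 'text/html')` followed by `querySelectorAll` would probably hold up better than these patterns.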