
  Parse broken HTML code using Node.js & Cheerio
    <p>I am trying to scrape a purely static HTML page containing tabular data using Node.js &amp; Cheerio. The problem is that the page I am trying to scrape doesn't have a proper HTML DOM: many opening tags are never closed, and there are closing tags (<code>&lt;/table&gt;</code>) that have no matching opening tag.</p>
    <p>Sample code (note: this is close to the real page, and the HTML is broken):</p>
    <pre><code> &lt;body topmargin="0" leftmargin="0" marginheight="0" marginwidth="0" bgcolor="#FFFFFF" text="#000000" link="#003399" vlink="#003399" alink="#FF8000"&gt; &lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tr&gt;&lt;td bgcolor="#445BC6"&gt;hii&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td align="right" bgcolor="#D9D9E8" width="100%"&gt; &lt;p class="menu"&gt;&lt;b&gt;&lt;font color="#000000"&gt;&lt;a href="details.php?type=contact&amp;npo_id=18430"&gt;Individuals&lt;/a&gt;&lt;/font&gt;&lt;/b&gt;&amp;nbsp;&amp;nbsp; &lt;/td&gt; &lt;/tr&gt; &lt;/table&gt; &lt;P&gt; &lt;TABLE CELLPADDING=8&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt; &lt;TABLE CELLPADDING=8 STYLE="border-collapse: collapse" BORDER=1 WIDTH=80% ALIGN=cemter&gt; &lt;TR&gt;&lt;TD BGCOLOR="D8D8C4" VALIGN=top ALIGN=right&gt;&lt;P&gt;&lt;B&gt;Data 1&lt;/B&gt;&lt;/TD&gt; &lt;TD&gt;&lt;P&gt;&lt;B&gt;Data 2&lt;/B&gt;&lt;/TD&gt; &lt;/TR&gt; &lt;TR&gt;&lt;TD BGCOLOR="D8D8C4" VALIGN=top ALIGN=right&gt;&lt;P&gt;&lt;B&gt;Data 3&lt;/B&gt;&lt;/TD&gt; &lt;TD&gt;&lt;P&gt;Data 4&lt;/TD&gt; &lt;/TR&gt; &lt;/TABLE&gt; &lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt; &lt;tr&gt; &lt;td width="100%" valign="bottom" colspan="2" align="center"&gt; &lt;p&gt; &lt;a href="#top"&gt;another dirty content&lt;/a&gt;&lt;br&gt; &lt;a href="#top"&gt;&lt;font color="#000000"&gt;table is wrong&lt;/font&gt;&lt;/a&gt;&lt;/p&gt; &lt;/td&gt; &lt;/tr&gt;&lt;/table&gt;&lt;/div&gt; </code></pre>
    <p>As one can see, there are <code>&lt;P&gt;</code> tags that are never closed. At the bottom there are <code>&lt;/table&gt;</code> &amp; <code>&lt;/div&gt;</code> tags that were never opened. How do I fetch Data 1, Data 2, Data 3 and Data 4 using Cheerio &amp; Node.js? Any other library that parses such data efficiently will also do.</p>
    <p>EDIT (SOLUTION): The problem is solved. All I did was convert the HTML tags to lower case, and it worked fine. I am not sure why lower case matters, but it worked for Cheerio.</p>
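<p>The workaround described in the edit can be sketched roughly as follows. This is only an illustration under assumptions: <code>lowercaseTagNames</code> is a hypothetical helper (it is not part of Cheerio), the input is a trimmed fragment of the sample above, and the regex is deliberately naive, so it would also rewrite anything that merely looks like a tag inside comments, scripts, or attribute values.</p>

```javascript
// Hypothetical helper: lower-case every tag name in an HTML string.
// Attribute names, attribute values, and text content are left untouched.
function lowercaseTagNames(html) {
  // Match "<" or "</" immediately followed by a tag name.
  return html.replace(/<(\/?)([A-Za-z][A-Za-z0-9]*)/g, (match, slash, name) =>
    '<' + slash + name.toLowerCase()
  );
}

// Trimmed fragment of the broken sample from the question.
const sample =
  '<TR><TD BGCOLOR="D8D8C4"><P><B>Data 1</B></TD><TD><P>Data 4</TD></TR>';

const normalized = lowercaseTagNames(sample);
console.log(normalized);
```

<p>The normalized string could then be passed to <code>cheerio.load</code> in place of the raw page source, which is the step the edit reports as having worked.</p>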