How to exclude table rows that contain specific strings between the start and end tags from matching?
<p><em>Context</em> The case is screen scraping web content with QuotaXML SDK 1.6 in order to display the data on the dashboard and on the iPhone. QuotaXML offers regular expressions only for extracting table data, and it parses HTML tables in three steps:</p>
<ol>
<li>First, it identifies the table, for example using <code>(?si)&lt;table.*?&gt;(.*?)&lt;/table&gt;</code>.</li>
<li>Second, within the matched table it identifies rows, for example using <code>(?si)&lt;tr.*?&gt;(.*?)&lt;/tr&gt;</code>.</li>
<li>Third, within each row it identifies individual cells, for example using <code>(?si)&lt;td.*?&gt;(.*?)&lt;/td&gt;</code>.</li>
</ol>
<p><em>The problem</em> The source HTML contains some rows that are not relevant data, such as lines or images that span the full table width using a <code>colspan</code>. Tables can also contain data rows that are not relevant to the lines needed. For example, call detail records also list calls to freephone numbers (in this case 0800 and 00800 numbers), which are not subtracted from the minutes in your plan. In other words, the row content matched by <code>(.*?)</code> must not contain <code> colspan="</code>, <code>&gt;0800</code>, or <code>&gt;00800</code>.
</p>
<p>In code:</p>
<pre><code>exclude:&lt;tr&gt;&lt;td colspan="2"&gt;&lt;/td&gt;&lt;/tr&gt;
include:&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Date&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
exclude:&lt;tr&gt;&lt;td&gt;05-01-2011&lt;/td&gt;&lt;td&gt;08004913&lt;/td&gt;&lt;/tr&gt;
include:&lt;tr&gt;&lt;td&gt;05-01-2011&lt;/td&gt;&lt;td&gt;0123456789&lt;/td&gt;&lt;/tr&gt;
</code></pre>
<p><em>Homework done</em> Even my first, deliberately simple attempts, which only try to exclude <code>colspan</code>, all fail:</p>
<ol>
<li><code>(?si)&lt;tr.*?&gt;(?!colspan)(.*?)&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr.*?&gt;(.*?)(?!colspan)&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr.*?&gt;.*?[^colspan].*?&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr(\s[^&gt;]*)?&gt;.*?(?!colspan).*?&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr(\s[^&gt;]*)?&gt;.*?(!colspan).*?&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr(\s[^&gt;]*)?&gt;(.*?)(?!colspan)&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr.*?&gt;^(?!.*?colspan=").*?&lt;/tr&gt;</code> The question <a href="https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex">How to negate specific word in regex?</a> seems related, but its suggestions don't produce a match at all.</li>
<li><code>(?si)&lt;tr.*?&gt;(.(?&lt;!colspan))*?&lt;/tr&gt;</code></li>
<li><code>(?si)&lt;tr.*?&gt;(?!.*colspan).*&lt;/tr&gt;</code> The positive and negative lookarounds described at <a href="http://www.regular-expressions.info/lookaround.html" rel="nofollow noreferrer">http://www.regular-expressions.info/lookaround.html</a> did not give me the clue either.</li>
</ol>
<p>How should I correctly write this regex?</p>
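<p>A minimal sketch of one possible approach to step 2, using Python's <code>re</code> module as a stand-in because QuotaXML's exact regex flavor is not stated here; it is an assumption that QuotaXML supports negative lookahead, as .NET, Java and PCRE do. The row pattern uses a "tempered greedy token", <code>(?:(?!X).)*</code>, which consumes the row one character at a time and only where neither <code>&lt;/tr&gt;</code> nor one of the excluded strings starts. A row containing <code>colspan="</code>, <code>&gt;0800</code> or <code>&gt;00800</code> therefore cannot reach its <code>&lt;/tr&gt;</code> and is not matched at all. The sketch also runs attempt 1 from the list above to show why it keeps every row: its lookahead is tested only once, right after <code>&lt;tr&gt;</code>, where the text starts with <code>&lt;td</code>. The names <code>EXCLUDED</code>, <code>SAMPLE</code> and <code>extract_rows</code> are hypothetical.</p>

```python
import re

# Strings that disqualify a row (taken from the question). Hypothetical name.
EXCLUDED = ['colspan="', ">0800", ">00800"]

# Step 1 (pattern from the question): isolate the table contents.
TABLE_RE = re.compile(r"(?si)<table.*?>(.*?)</table>")

# Step 2: tempered greedy token. (?:(?!STOP).)* consumes one character at a
# time, but only where neither </tr> nor an excluded string starts. If an
# excluded string occurs, the token stops in front of it, </tr> cannot match
# there, and the whole row fails instead of being captured.
STOP = "|".join(["</tr>"] + [re.escape(m) for m in EXCLUDED])
ROW_RE = re.compile(r"(?si)<tr[^>]*>((?:(?!" + STOP + r").)*)</tr>")

# Step 3: split a surviving row into its cells.
CELL_RE = re.compile(r"(?si)<td[^>]*>(.*?)</td>")

# Attempt 1 from the question, for comparison: (?!colspan) is tested only at
# the position right after "<tr>", where the text is "<td", so it never fails.
ATTEMPT_1 = re.compile(r"(?si)<tr.*?>(?!colspan)(.*?)</tr>")

# The example rows from the question, wrapped in a table.
SAMPLE = (
    "<table>"
    '<tr><td colspan="2"></td></tr>'
    "<tr><td><strong>Date</strong></td><td><strong>Time</strong></td></tr>"
    "<tr><td>05-01-2011</td><td>08004913</td></tr>"
    "<tr><td>05-01-2011</td><td>0123456789</td></tr>"
    "</table>"
)


def extract_rows(html: str) -> list[list[str]]:
    """Apply the three steps and return the raw cell contents of kept rows."""
    rows = []
    for table in TABLE_RE.finditer(html):
        for row in ROW_RE.finditer(table.group(1)):
            rows.append(CELL_RE.findall(row.group(1)))
    return rows


if __name__ == "__main__":
    print(len(ATTEMPT_1.findall(SAMPLE)))  # 4: attempt 1 keeps every row
    for cells in extract_rows(SAMPLE):
        print(cells)
    # ['<strong>Date</strong>', '<strong>Time</strong>']
    # ['05-01-2011', '0123456789']
```

<p>Written out, the step-2 row expression is <code>(?si)&lt;tr[^&gt;]*&gt;((?:(?!&lt;/tr&gt;|colspan="|&gt;0800|&gt;00800).)*)&lt;/tr&gt;</code>. It uses <code>[^&gt;]*</code> instead of <code>.*?</code> in the opening tag so that the tag itself cannot stretch across rows. Note that <code>&gt;0800</code> drops any row in which some cell starts with 0800, not only the number column, and that matching HTML with regular expressions stays fragile compared with a real HTML parser.</p>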