How to extract data from an HTML table in a shell script?
    <p>I am trying to create a Bash script that would extract the data from an HTML table. Below is an example of the table I need to extract data from:</p>
<pre><code>&lt;table border=1&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Component&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Status&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Time / Error&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;SAVE_DOCUMENT&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.406 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;GET_DOCUMENT&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.332 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;DVK_SEND&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.001 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;DVK_RECEIVE&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.001 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;GET_USER_INFO&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.143 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;NOTIFICATIONS&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.001 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ERROR_LOG&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.001 s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;SUMMARY_STATUS&lt;/td&gt;&lt;td&gt;OK&lt;/td&gt;&lt;td&gt;0.888 s&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
</code></pre>
    <p>And I want the Bash script to output it like so:</p>
<pre><code>SAVE_DOCUMENT OK 0.475 s
GET_DOCUMENT OK 0.345 s
DVK_SEND OK 0.002 s
DVK_RECEIVE OK 0.001 s
GET_USER_INFO OK 4.465 s
NOTIFICATIONS OK 0.001 s
ERROR_LOG OK 0.002 s
SUMMARY_STATUS OK 5.294 s
</code></pre>
    <p>How can I do this?</p>
    <p>So far I have tried using sed, but I don't know how to use it very well. I excluded the table header (Component, Status, Time / Error) with <code>grep "&lt;tr&gt;&lt;td&gt;"</code>, so only lines starting with <code>&lt;tr&gt;&lt;td&gt;</code> are passed on to the next parsing step (sed). This is what I used: <code>sed 's@&lt;\([^&lt;&gt;][^&lt;&gt;]*\)&gt;\([^&lt;&gt;]*\)&lt;/\1&gt;@\2@g'</code> But the <code>&lt;tr&gt;</code> tags still remain, and it won't separate the strings either. In other words, the result of this script is:</p>
<pre><code>&lt;tr&gt;SAVE_DOCUMENTOK0.406 s&lt;/tr&gt;
</code></pre>
    <p>The full command of the script I'm working on is:</p>
<pre><code>cat $FILENAME | grep "&lt;tr&gt;&lt;td&gt;" | sed 's@&lt;\([^&lt;&gt;][^&lt;&gt;]*\)&gt;\([^&lt;&gt;]*\)&lt;/\1&gt;@\2@g' </code></pre>
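The sed expression in the question matches only elements that contain no nested tags, so it strips each td pair, leaves the enclosing tr, and inserts nothing between the cell values. One possible way around both problems is sketched below. It is not a definitive answer and not a general HTML parser: it assumes every data row sits on its own line and starts with the tr and td tags, as in the sample table. The function name `extract_rows` is hypothetical and introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print one line per data row of a simple HTML table.
# Assumption: each data row is on a single line beginning with <tr><td>,
# which also skips the multi-line header row of the sample table.
# extract_rows is a hypothetical helper name, not part of the question.
extract_rows() {
  grep '^<tr><td>' "$1" |
    sed -e 's@</td><td>@ @g' \
        -e 's@<[^<>]*>@@g'
  # First expression: turn each cell boundary into a single space.
  # Second expression: delete every remaining tag.
}
```

It could be called as `extract_rows "$FILENAME"`, with `$FILENAME` being the variable from the question. Replacing the cell boundaries before stripping the remaining tags is what keeps the values separated. For HTML that is not laid out one row per line, a real HTML parser would be the safer choice.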