Help extracting text from HTML tag with Java and Regex
    <p>I would like to extract some text from an HTML file using a regex. I am still learning regex and have trouble understanding it all. I have code that extracts all the text between <code>&lt;body&gt;</code> and <code>&lt;/body&gt;</code>; here it is:</p>
<pre><code>import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Harn2 {

    public static void main(String[] args) throws IOException{

        String toMatch=readFile();
        //Pattern pattern=Pattern.compile(".*?&lt;body.*?&gt;(.*?)&lt;/body&gt;.*?"); this one works fine
        Pattern pattern=Pattern.compile(".*?&lt;table class=\"claroTable\".*?&gt;(.*?)&lt;/table&gt;.*?"); //I want this one to work
        Matcher matcher=pattern.matcher(toMatch);

        if(matcher.matches()) {
            System.out.println(matcher.group(1));
        }
    }

    private static String readFile() {
        try{
            // Open the file that is the first
            // command line parameter
            FileInputStream fstream = new FileInputStream("user.html");
            // Get the object of DataInputStream
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine = null;
            //Read File Line By Line
            while (br.readLine() != null) {
                // Print the content on the console
                //System.out.println (strLine);
                strLine+=br.readLine();
            }
            //Close the input stream
            in.close();
            return strLine;
        }catch (Exception e){//Catch exception if any
            System.err.println("Error: " + e.getMessage());
            return "";
        }
    }
}
</code></pre>
    <p>This works fine with the <code>&lt;body&gt;</code> pattern, but now I would like to extract the text between the tags <code>&lt;table class="claroTable"&gt;</code> and <code>&lt;/table&gt;</code>.</p>
    <p>So I replaced my regex string with <code>".*?&lt;table class=\"claroTable\".*?&gt;(.*?)&lt;/table&gt;.*?"</code>. I have also tried <code>".*?&lt;table class=\"claroTable\"&gt;(.*?)&lt;/table&gt;.*?"</code>, but neither works and I don't understand why. There is only one table in the HTML file, but the word "table" also occurs in some JavaScript code ("...dataTables.js..."). Could that be the reason for the failure?</p>
    <p>Thank you in advance for helping me.</p>
    <p>EDIT: the HTML text to extract from looks something like this:</p>
<pre><code>&lt;body&gt;
.....
&lt;table class="claroTable"&gt;
&lt;td&gt;&lt;th&gt;some data and many, many tags &lt;/td&gt;
.....
&lt;/table&gt;
</code></pre>
    <p>What I would like to extract is everything between <code>&lt;table class="claroTable"&gt;</code> and <code>&lt;/table&gt;</code>.</p>
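    <p>One possible explanation, offered as a sketch and not a confirmed diagnosis: the loop in <code>readFile()</code> calls <code>readLine()</code> twice per iteration, so every other line of the file is discarded, and because <code>strLine</code> starts as <code>null</code> the result begins with the literal text "null". If the line holding <code>&lt;table class="claroTable"&gt;</code> is one of the discarded lines, no pattern can match it. The version below reads the whole file, uses <code>Matcher.find()</code> so the pattern needs no leading or trailing <code>.*?</code>, and sets <code>Pattern.DOTALL</code> so that <code>.</code> also matches line breaks. The class name <code>TableExtractor</code>, the method names, the UTF-8 charset and the inline sample string are assumptions made for illustration.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch, not the asker's original code.
public class TableExtractor {

    // DOTALL lets '.' match line terminators, so the capture can span several lines.
    // [^>]* tolerates extra attributes after class="claroTable".
    private static final Pattern TABLE = Pattern.compile(
            "<table class=\"claroTable\"[^>]*>(.*?)</table>", Pattern.DOTALL);

    /** Returns the text between the opening claroTable tag and the first closing table tag. */
    public static Optional<String> extractTable(String html) {
        Matcher matcher = TABLE.matcher(html);
        // find() looks for the pattern anywhere in the input;
        // matches() would require the pattern to cover the entire input.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** Reads the whole file, keeping every line and its line breaks (UTF-8 is an assumption). */
    public static String readFile(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // With a path argument (for example user.html) the file is read;
        // otherwise a small made-up sample is used.
        String html = args.length > 0
                ? readFile(args[0])
                : "<body>\n<script src=\"dataTables.js\"></script>\n"
                  + "<table class=\"claroTable\">\n<tr><td>some data</td></tr>\n</table>\n</body>";
        System.out.println(extractTable(html).map(String::trim).orElse("no claroTable found"));
    }
}
```

    <p>Because the capture group is lazy, it stops at the first <code>&lt;/table&gt;</code>, so a table nested inside the target table would be cut short. For anything beyond a single flat table, a real HTML parser is usually the more robust choice.</p>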