  1. Find the text region that contains the article content in HTML
<p>I want to extract information from HTML source using Java. The basic requirement is to get the main content area of the page. For example, take the following HTML source:</p>
<pre><code>&lt;html&gt;
&lt;head&gt;
  &lt;title&gt; Chinese characters --中文 &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div&gt;
    this is some area including Chinese characters, like a menu I don't need,
  &lt;/div&gt;
  &lt;div&gt;
    this is some area including Chinese characters, like ads I don't need,
  &lt;/div&gt;
  &lt;div&gt;
    this is the main content, including the content I need. Almost all of the
    content consists of Chinese characters, like:
    好好学习,天天向上。
    我爱stackoverflow.谢谢你的帮助,非常感谢!
  &lt;/div&gt;
  &lt;div&gt;
    this is the footer area, also including Chinese characters, but I don't need it.
  &lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>This HTML source is a simple one; there are many different and more complex sources. I want to use Java to find the div (or other element) that contains the main content. The result I want is:</p>
<pre><code>&lt;div&gt;
  this is the main content, including the content I need. Almost all of the
  content consists of Chinese characters, like:
  好好学习,天天向上。
  我爱stackoverflow.谢谢你的帮助,非常感谢!
&lt;/div&gt;
</code></pre>
<p>There are tens of thousands of divs with different content, and the div ids are unknown or vary from page to page. The divs also differ in structure; some contain p tags, for example. Is there a way to use the occurrence or distribution of Chinese characters to identify the main content?</p>
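One way to make the "distribution of Chinese characters" idea concrete is to count Han characters per div and keep the div with the highest count. The following is only a minimal sketch under stated assumptions, not a definitive solution: the class name `MainContentFinder` and its methods are hypothetical, the regex only matches innermost divs (those containing no nested div), and regex-based HTML handling is fragile on real pages. A proper HTML parser such as jsoup would likely be sturdier, and a density measure (Han characters per character of markup) may work better than a raw count on complex layouts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks the innermost <div> with the most Han characters.
final class MainContentFinder {

    // Assumption: a naive pattern that matches only divs with no nested div inside.
    private static final Pattern DIV = Pattern.compile(
            "<div\\b[^>]*>(?:(?!<div\\b).)*?</div>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Strips any remaining tags (such as <p>) before counting characters.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private MainContentFinder() {
    }

    /** Counts the code points in the text that belong to the Han script. */
    static int countHan(String text) {
        return (int) text.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
    }

    /** Returns the div with the most Han characters, or null if no div has any. */
    static String findMainDiv(String html) {
        Matcher matcher = DIV.matcher(html);
        String best = null;
        int bestCount = 0;
        while (matcher.find()) {
            String div = matcher.group();
            int count = countHan(TAG.matcher(div).replaceAll(""));
            if (count > bestCount) {
                bestCount = count;
                best = div;
            }
        }
        return best;
    }
}
```

Calling `MainContentFinder.findMainDiv(html)` on the unescaped sample page above should return the third div, because it is the only div that actually contains Han characters. Chinese punctuation such as `,` and `。` is not counted, since Unicode assigns it to the Common script rather than Han.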
 
