
  1. POC# Web Crawler/Parser/Spider
    <p>I'm new to C# and WinForms. I want to create a web crawler (parser) that can parse web pages and show their links hierarchically. I also don't know how to make the bot crawl to a specific hyperlink depth.</p>
    <p>So I think I have 2 questions:</p>
    <ol>
    <li>How do I make the bot crawl to a specified link depth?</li>
    <li>How do I show all hyperlinks hierarchically?</li>
    </ol>
    <p>P.S. It would be great to get code samples.</p>
    <p>P.P.S. I have 1 button = button1; and 1 RichTextBox = richTextBox1;</p>
    <p>Here is my code. I know it's very ugly... (all the code is in one button handler):</p>
    <pre><code>public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        //Declaration
        Match m;
        string anotherTest = @"(((ht){1}tp[s]?://)[-a-zA-Z0-9@:%_\+.~#?&amp;\\]+)";
        List&lt;string&gt; savedUrls = new List&lt;string&gt;();
        List&lt;string&gt; titles = new List&lt;string&gt;();

        //Go to this URL (declared before the request, which needs it):
        string url = UrlTextBox.Text = "http://www.yahoo.com";
        if (!(url.StartsWith("http://") || url.StartsWith("https://")))
            url = "http://" + url;

        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
        HttpWebResponse response = (HttpWebResponse) request.GetResponse();
        StreamReader sr = new StreamReader(response.GetResponseStream());

        //Scrape Whole Html code:
        string s = sr.ReadToEnd();
        try
        {
            // Get Urls:
            m = Regex.Match(s, anotherTest, RegexOptions.IgnoreCase | RegexOptions.Compiled, TimeSpan.FromSeconds(1));
            while (m.Success)
            {
                savedUrls.Add(m.Groups[1].ToString());
                m = m.NextMatch();
            }

            // Get TITLES:
            Match m2 = Regex.Match(s, @"&lt;title&gt;\s*(.+?)\s*&lt;/title&gt;");
            if (m2.Success)
            {
                titles.Add(m2.Groups[1].Value);
            }

            //Show Title (only if one was found):
            if (titles.Count &gt; 0)
                richTextBox1.Text += titles[0] + "\n";

            //Show Urls:
            TrimUrls(ref savedUrls);
        }
        catch (RegexMatchTimeoutException)
        {
            Console.WriteLine("The matching operation timed out.");
        }
        sr.Close();
    }

    private void TrimUrls(ref List&lt;string&gt; urls)
    {
        // Drop duplicates, then print every URL that contains a dot,
        // skipping the W3C namespace URL.
        List&lt;string&gt; d = urls.Distinct().ToList();
        foreach (var v in d)
        {
            if (v.IndexOf('.') != -1 &amp;&amp; v != "http://www.w3.org")
            {
                richTextBox1.Text += v + "\n";
            }
        }
    }
}
</code></pre>
    <p>And one more question: does anybody know how to save it in XML as a tree?</p>
 
