Scraping data from all ASP.NET pages with AJAX pagination implemented
<p>I want to scrape a web page containing a list of users with addresses, emails, etc. The list is paginated: each page shows 10 users, and clicking the page 2 link loads the second page of users via AJAX and updates the list, and so on for every pagination link.</p>
<p>The website is built with ASP.NET (the pages have the .aspx extension). I don't know anything about ASP.NET or how it manages pagination and AJAX.</p>
<p>I am using Simple HTML DOM <a href="http://sourceforge.net/projects/simplehtmldom/">http://sourceforge.net/projects/simplehtmldom/</a> to scrape the content.</p>
<p>For pages with <code>&lt;=10</code> users I don't have to simulate the AJAX request that is sent when a user clicks a pagination link,</p>
<p>but for pages with pagination I simulate the AJAX POST request to get the data from the other pages:</p>
<pre><code>require 'simple_html_dom.php';

$html = file_get_html('www.example.com/user_list.aspx');

$viewstate = $html-&gt;find("#__VIEWSTATE");
$viewstate = $viewstate[0]-&gt;attr['value'];
$eventvalidation = $html-&gt;find("#__EVENTVALIDATION");
$eventvalidation = $eventvalidation[0]-&gt;attr['value'];

$number_of_pageinations = 3;
$pageNumberCodes = array(
    'ctl00$cphMainContent$rdpMembers$ctl01$ctl01',
    'ctl00$cphMainContent$rdpMembers$ctl01$ctl02',
    'ctl00$cphMainContent$rdpMembers$ctl01$ctl03'
); // this code is added for each page in POST as __EVENTTARGET

for ($i = 0; $i &lt; $number_of_pageinations; $i++) {
    $options = array(
        CURLOPT_RETURNTRANSFER =&gt; true, // return web page
        CURLOPT_HEADER =&gt; false, // don't return headers
        CURLOPT_ENCODING =&gt; "", // handle all encodings
        CURLOPT_USERAGENT =&gt; "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7'", // who am i
        CURLOPT_AUTOREFERER =&gt; true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT =&gt; 120, // timeout on connect
        CURLOPT_TIMEOUT =&gt; 1120, // timeout on response
        CURLOPT_MAXREDIRS =&gt; 10, // stop after 10 redirects
        CURLOPT_POST =&gt; true,
        CURLOPT_VERBOSE =&gt; true,
        CURLOPT_POSTFIELDS =&gt; urlencode('ctl00%24scriptManager=ctl00%24cphMainContent%24ctl00%24cphMainContent%24rdpMembersPanel%7C' . $pageNumberCodes[0]
            . '&amp;__EVENTTARGET=' . $pageNumberCodes[0]
            . '&amp;__EVENTARGUMENT='
            . '&amp;__VIEWSTATE=' . $viewstate
            . '&amp;__EVENTVALIDATION=' . $eventvalidation
            . "&amp;google="
            . '&amp;ctl00%24cphMainContent%24txtZip='
            . '&amp;ctl00%24cphMainContent%24cboRadius=Exact'
            . '&amp;ctl00%24cphMainContent%24txtMemberName='
            . '&amp;ctl00%24cphMainContent%24txtCity=Honolulu'
            . '&amp;ctl00%24cphMainContent%24cboState=HI'
            . '&amp;ctl00%24cphMainContent%24txtAddress='
            . '&amp;ctl00_cphMainContent_rdpMembers_ClientState='
            . '&amp;ctl00%24cphMainContent%24ddList=-Select%20field%20to%20sort-'
            . '&amp;ctl00_cphMainContent_ddList_ClientState='
            . '&amp;ctl00_cphMainContent_rdlMembers_ClientState='
            . '&amp;ctl00_cphMainContent_ddList_ClientState='
            . '&amp;ctl00_cphMainContent_rdlMembers_ClientState='
            . '&amp;ctl00_cphMainContent_rdpMembers1_ClientState='
            . '&amp;__ASYNCPOST=true'
            . 'RadAJAXControlID=ctl00_cphMainContent_RadAjaxManager1')
    );

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $return = curl_exec($ch);
    curl_close($ch);
    echo $return;

    $newHtml = str_get_html($return);
    $viewstate = $newHtml-&gt;find("#__VIEWSTATE");
    $viewstate = $viewstate[0]-&gt;attr['value'];
    $eventvalidation = $newHtml-&gt;find("#__EVENTVALIDATION");
    $eventvalidation = $eventvalidation[0]-&gt;attr['value'];
}
</code></pre>
<p>This should echo data from different pages, but it always prints the data of the first page. Can anybody point out where I am wrong and what is missing? I don't know how ASP.NET manages pagination and AJAX requests, or what <code>__EVENTARGUMENT</code>, <code>__VIEWSTATE</code> and <code>__EVENTVALIDATION</code> are.</p>
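<p>A few things in the code above stand out and may explain the symptom: the loop always sends <code>$pageNumberCodes[0]</code> rather than <code>$pageNumberCodes[$i]</code>, the already percent-encoded body is passed through <code>urlencode()</code> a second time, <code>$url</code> is never defined, and the last field is appended without a separating ampersand. In addition, as far as I know, a request sent with <code>__ASYNCPOST=true</code> is answered with a pipe-delimited "delta" of <code>length|type|id|content|</code> entries rather than a full HTML page, so looking up <code>#__VIEWSTATE</code> in it with Simple HTML DOM would not find the refreshed tokens. The following is a minimal sketch under those assumptions, not a verified fix for this particular site. The helper names <code>buildPagerPostBody</code> and <code>parseAsyncPostbackDelta</code> are hypothetical, and only a subset of the form fields from the question is included.</p>

```php
<?php
// Sketch only. Field names are taken from the question; both helper
// functions are hypothetical and not part of any library.

/**
 * Build the POST body for one pager link.
 * http_build_query() URL-encodes every key and value exactly once,
 * so raw control names (with "$" and "|") are passed in unencoded.
 */
function buildPagerPostBody(string $eventTarget, string $viewState, string $eventValidation): string
{
    $fields = [
        'ctl00$scriptManager' =>
            'ctl00$cphMainContent$ctl00$cphMainContent$rdpMembersPanel|' . $eventTarget,
        '__EVENTTARGET'     => $eventTarget,
        '__EVENTARGUMENT'   => '',
        '__VIEWSTATE'       => $viewState,
        '__EVENTVALIDATION' => $eventValidation,
        // Search filters from the question; the remaining (empty) fields
        // are omitted here for brevity and may be required by the site.
        'ctl00$cphMainContent$cboRadius' => 'Exact',
        'ctl00$cphMainContent$txtCity'   => 'Honolulu',
        'ctl00$cphMainContent$cboState'  => 'HI',
        '__ASYNCPOST'       => 'true',
    ];

    return http_build_query($fields, '', '&');
}

/**
 * Extract hidden fields (such as __VIEWSTATE) from an async postback
 * response, assumed to be a sequence of "length|type|id|content|" entries.
 * Assumption: content is single-byte; the length counts characters, so
 * multi-byte content would need the mb_* string functions instead.
 */
function parseAsyncPostbackDelta(string $delta): array
{
    $hidden = [];
    $pos = 0;
    $end = strlen($delta);

    while ($pos < $end) {
        $p1 = strpos($delta, '|', $pos);      // end of length
        if ($p1 === false) {
            break;
        }
        $p2 = strpos($delta, '|', $p1 + 1);   // end of type
        if ($p2 === false) {
            break;
        }
        $p3 = strpos($delta, '|', $p2 + 1);   // end of id
        if ($p3 === false) {
            break;
        }

        $length  = (int) substr($delta, $pos, $p1 - $pos);
        $type    = substr($delta, $p1 + 1, $p2 - $p1 - 1);
        $id      = substr($delta, $p2 + 1, $p3 - $p2 - 1);
        // Content is read by length because it may itself contain "|".
        $content = substr($delta, $p3 + 1, $length);

        if ($type === 'hiddenField') {
            $hidden[$id] = $content;
        }

        $pos = $p3 + 1 + $length + 1;         // skip content and trailing "|"
    }

    return $hidden;
}
```

<p>Inside the loop, the body would then be built with <code>buildPagerPostBody($pageNumberCodes[$i], $viewstate, $eventvalidation)</code> and posted to the full page URL (including the <code>http://</code> scheme), and <code>$viewstate</code> and <code>$eventvalidation</code> would be refreshed from the array returned by <code>parseAsyncPostbackDelta($return)</code> before the next iteration. Browsers also send an <code>X-MicrosoftAjax: Delta=true</code> header with such requests; whether this site requires it is something to confirm by comparing against the request shown in the browser's network tab.</p>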
 
