
Web crawler in Java: downloading web page issue
<p>I am trying to develop a small web crawler that downloads web pages and searches for links in a specific section. But when I run this code, the links in the "href" attributes come back shortened. For example:</p> <p>original link : "/kids-toys-action-figures-accessories/b/ref=toys_hp_catblock_actnfigs?ie=UTF8&amp;node=165993011&amp;pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_s=merchandised-search-4&amp;pf_rd_r=267646F4BB25430BAD0D&amp;pf_rd_t=101&amp;pf_rd_p=1582921042&amp;pf_rd_i=165793011"</p> <p>turned into : "/kids-toys-action-figures-accessories/b?ie=UTF8&amp;node=165993011"</p> <p>Can anybody help me, please? Below is my code:</p> <pre><code>package test;

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.*;

public class myFirstWebCrawler {

    public static void main(String[] args) {
        String strTemp = "";
        String dir = "d:/files/";
        String filename = "hello.txt";
        String fullname = dir + filename;
        try {
            URL my_url = new URL("http://www.amazon.com/s/ref=lp_165993011_ex_n_1?rh=n%3A165793011&amp;bbn=165793011&amp;ie=UTF8&amp;qid=1376550433");
            BufferedReader br = new BufferedReader(new InputStreamReader(my_url.openStream(), "utf-8"));
            createdir(dir);
            while (null != (strTemp = br.readLine())) {
                writetofile(fullname, strTemp);
                System.out.println(strTemp);
            }
            System.out.println("index of feature category : " + readfromfile(fullname, "Featured Categories"));
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void createdir(String dirname) {
        File d = new File(dirname);
        d.mkdirs();
    }

    public static void writetofile(String path, String bbyte) {
        try {
            FileWriter filewriter = new FileWriter(path, true);
            BufferedWriter bufferedWriter = new BufferedWriter(filewriter);
            bufferedWriter.write(bbyte);
            bufferedWriter.newLine();
            bufferedWriter.close();
        } catch (IOException e) {
            System.out.println("Error");
        }
    }

    public static int readfromfile(String path, String key) {
        String dir = "d:/files/";
        String filename = "hello1.txt";
        String fullname = dir + filename;
        BufferedReader bf = null;
        try {
            bf = new BufferedReader(new FileReader(path));
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
        String currentLine;
        int index = -1;
        try {
            Runtime.getRuntime().exec("cls");
            while ((currentLine = bf.readLine()) != null) {
                index = currentLine.indexOf(key);
                if (index &gt; 0) {
                    writetofile(fullname, currentLine);
                    int count = 0;
                    int lastIndex = 0;
                    while (lastIndex != -1) {
                        lastIndex = currentLine.indexOf("href=\"", lastIndex);
                        if (lastIndex != -1) {
                            lastIndex += "href=\"".length();
                            StringBuilder sb = new StringBuilder();
                            while (currentLine.charAt(lastIndex) != '\"') {
                                sb.append(currentLine.charAt(lastIndex));
                                lastIndex++;
                            }
                            count++;
                            System.out.println(sb);
                        }
                    }
                    System.out.println("\n count : " + count);
                    return index;
                }
            }
        } catch (FileNotFoundException f) {
            f.printStackTrace();
            System.out.println("Error");
        } catch (IOException e) {
            try {
                bf.close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        }
        return index;
    }
}
</code></pre>
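As a side note, scanning character by character with `indexOf` is fragile (it also throws `StringIndexOutOfBoundsException` if a `href="` is never closed on the same line). A regular expression is a more robust way to pull out the attribute values, though a real HTML parser is better still. Here is a minimal, self-contained sketch using only the standard library; the class and method names (`HrefExtractor`, `extractHrefs`) are my own, not from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // Matches href="..." and captures everything up to the closing quote.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    public static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the attribute value only
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "<a href=\"/toys?ie=UTF8\">Toys</a> <a href=\"/b\">B</a>";
        System.out.println(extractHrefs(line)); // prints [/toys?ie=UTF8, /b]
    }
}
```

Unlike the manual scan, this never reads past the end of the line and finds every occurrence on the line in one pass.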
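On the shortened links themselves: a common cause is that the server sends different markup depending on the client. The long URLs you see in a browser are often produced or rewritten by JavaScript after the page loads, and many sites also vary the HTML by User-Agent; a plain `openStream()` fetch runs no JavaScript and sends no browser-like headers. As a sketch (not guaranteed to fix this specific Amazon page, and the header value is an arbitrary example), you can at least send a browser-like User-Agent with the request:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchWithUserAgent {
    // Open a connection with a browser-like User-Agent header set.
    // openConnection() does not contact the server yet, so this only
    // configures the request.
    public static HttpURLConnection configure(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1)");
        return conn;
    }

    // Actually fetch the page body using the configured connection.
    public static BufferedReader open(String address) throws IOException {
        return new BufferedReader(
                new InputStreamReader(configure(address).getInputStream(), "utf-8"));
    }
}
```

You could then replace `my_url.openStream()` in the question's code with `FetchWithUserAgent.open(...)` and compare the downloaded HTML against what the browser shows.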
 
