
Web crawler in Java: downloading web page issue
<p>I am trying to develop a small web crawler that downloads web pages and searches for links in a specific section. But when I run this code, the links in the "href" attributes come back shortened. For example:</p> <p>original link : "/kids-toys-action-figures-accessories/b/ref=toys_hp_catblock_actnfigs?ie=UTF8&amp;node=165993011&amp;pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_s=merchandised-search-4&amp;pf_rd_r=267646F4BB25430BAD0D&amp;pf_rd_t=101&amp;pf_rd_p=1582921042&amp;pf_rd_i=165793011"</p> <p>turned into : "/kids-toys-action-figures-accessories/b?ie=UTF8&amp;node=165993011"</p> <p>Can anybody help me, please? Below is my code:</p> <pre><code>package test;

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.*;

public class myFirstWebCrawler {

    public static void main(String[] args) {
        String strTemp = "";
        String dir = "d:/files/";
        String filename = "hello.txt";
        String fullname = dir + filename;
        try {
            URL my_url = new URL("http://www.amazon.com/s/ref=lp_165993011_ex_n_1?rh=n%3A165793011&amp;bbn=165793011&amp;ie=UTF8&amp;qid=1376550433");
            BufferedReader br = new BufferedReader(new InputStreamReader(my_url.openStream(), "utf-8"));
            createdir(dir);
            while (null != (strTemp = br.readLine())) {
                writetofile(fullname, strTemp);
                System.out.println(strTemp);
            }
            System.out.println("index of feature category : " + readfromfile(fullname, "Featured Categories"));
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void createdir(String dirname) {
        File d = new File(dirname);
        d.mkdirs();
    }

    public static void writetofile(String path, String bbyte) {
        try {
            FileWriter filewriter = new FileWriter(path, true);
            BufferedWriter bufferedWriter = new BufferedWriter(filewriter);
            bufferedWriter.write(bbyte);
            bufferedWriter.newLine();
            bufferedWriter.close();
        } catch (IOException e) {
            System.out.println("Error");
        }
    }

    public static int readfromfile(String path, String key) {
        String dir = "d:/files/";
        String filename = "hello1.txt";
        String fullname = dir + filename;
        BufferedReader bf = null;
        try {
            bf = new BufferedReader(new FileReader(path));
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
        String currentLine;
        int index = -1;
        try {
            Runtime.getRuntime().exec("cls");
            while ((currentLine = bf.readLine()) != null) {
                index = currentLine.indexOf(key);
                if (index &gt; 0) {
                    writetofile(fullname, currentLine);
                    int count = 0;
                    int lastIndex = 0;
                    while (lastIndex != -1) {
                        lastIndex = currentLine.indexOf("href=\"", lastIndex);
                        if (lastIndex != -1) {
                            lastIndex += "href=\"".length();
                            StringBuilder sb = new StringBuilder();
                            while (currentLine.charAt(lastIndex) != '\"') {
                                sb.append(currentLine.charAt(lastIndex));
                                lastIndex++;
                            }
                            count++;
                            System.out.println(sb);
                        }
                    }
                    System.out.println("\n count : " + count);
                    return index;
                }
            }
        } catch (FileNotFoundException f) {
            f.printStackTrace();
            System.out.println("Error");
        } catch (IOException e) {
            try {
                bf.close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        }
        return index;
    }
}
</code></pre>
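As a side note, scanning character by character with `indexOf` is fragile (it also throws `StringIndexOutOfBoundsException` if a `href="` is never closed on the same line). A regular expression is a more robust way to pull out the attribute values, though a real HTML parser is better still. Here is a minimal, self-contained sketch using only the standard library; the class and method names (`HrefExtractor`, `extractHrefs`) are my own, not from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // Matches href="..." and captures everything up to the closing quote.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    public static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the attribute value only
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "<a href=\"/toys?ie=UTF8\">Toys</a> <a href=\"/b\">B</a>";
        System.out.println(extractHrefs(line)); // prints [/toys?ie=UTF8, /b]
    }
}
```

Unlike the manual scan, this never reads past the end of the line and finds every occurrence on the line in one pass.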
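On the shortened links themselves: a common cause is that the server sends different markup depending on the client. The long URLs you see in a browser are often produced or rewritten by JavaScript after the page loads, and many sites also vary the HTML by User-Agent; a plain `openStream()` fetch runs no JavaScript and sends no browser-like headers. As a sketch (not guaranteed to fix this specific Amazon page, and the header value is an arbitrary example), you can at least send a browser-like User-Agent with the request:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchWithUserAgent {
    // Open a connection with a browser-like User-Agent header set.
    // openConnection() does not contact the server yet, so this only
    // configures the request.
    public static HttpURLConnection configure(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1)");
        return conn;
    }

    // Actually fetch the page body using the configured connection.
    public static BufferedReader open(String address) throws IOException {
        return new BufferedReader(
                new InputStreamReader(configure(address).getInputStream(), "utf-8"));
    }
}
```

You could then replace `my_url.openStream()` in the question's code with `FetchWithUserAgent.open(...)` and compare the downloaded HTML against what the browser shows.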
 
