How to make a web crawler in Java

Here I am going to share code for a simple web crawler in Java. You will need the jsoup library, which fetches and parses HTML pages.

Steps:
  1. Create a Java project in any editor or IDE. I used Eclipse.
  2. Download the jsoup JAR from http://jsoup.org/download.
  3. Add the JAR to the project's build path (right click the project --> select "Build Path" --> "Configure Build Path" --> click the "Libraries" tab --> click "Add External JARs").
  4. Copy the following code into a .java class file.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Save this file as Main.java so the file name matches the public class.
public class Main {

    // Fetches the page at url and returns the absolute URL of every link on it.
    public static List<String> extractLinks(String url) throws IOException {
        List<String> result = new ArrayList<String>();
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]"); // every <a> element that has an href attribute
        for (Element link : links) {
            result.add(link.attr("abs:href")); // "abs:" resolves relative links against the page URL
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String site = "http://www.ebay.com/s/phone/";
        List<String> links = extractLinks(site);
        for (String link : links) {
            System.out.println(link);
        }
    }
}

Run the code. It prints every link found on the page.
Note: you can change the URL in the site variable to crawl a different website.

Output (the exact links will vary as the site changes):
http://www.ebay.com/s/phone#mainContent
http://www.ebay.com/
http://www.ebay.com/s/phone#legalHdr
http://www.ebay.com/s/phone/Apple
http://www.ebay.com/s/phone/Samsung
http://www.ebay.com/s/phone/LG
http://www.ebay.com/s/phone/Motorola
http://www.ebay.com/s/phone/HTC
http://www.ebay.com/s/phone/Nokia
http://www.ebay.com/s/phone/AT-T
http://www.ebay.com/s/phone/T-Mobile
http://www.ebay.com/s/phone/BlackBerry
http://www.ebay.com/s/phone/Huawei
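
The code above lists the links on a single page. A crawler would normally go on to visit those links too. The following is only a rough sketch of one way to do that, a breadth-first crawl with a page limit, and it is not part of the original code. The class name SimpleCrawler, the method names, and the maxPages parameter are all made up for illustration. The function that turns a URL into a list of links is passed in as a parameter, so the sketch can be tried with an in-memory map instead of a live site. It also strips "#fragment" suffixes, because links such as .../s/phone#mainContent in the output above point back to the same page. It assumes Java 9 or later for List.of and Map.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper class for illustration; not part of the original post.
final class SimpleCrawler {

    // Drops a trailing "#fragment" so the same page is not visited twice.
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Breadth-first crawl from startUrl, visiting at most maxPages pages.
    // linkExtractor maps a URL to the links found on that page.
    // Returns the visited URLs in the order they were visited.
    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, List<String>> linkExtractor) {
        Set<String> visited = new LinkedHashSet<>(); // keeps visit order, rejects duplicates
        Queue<String> frontier = new ArrayDeque<>(); // pages waiting to be visited
        frontier.add(stripFragment(startUrl));

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            if (!visited.add(url)) {
                continue; // this page was already visited
            }
            for (String link : linkExtractor.apply(url)) {
                String next = stripFragment(link);
                if (!visited.contains(next)) {
                    frontier.add(next);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // In-memory stand-in for real pages, so the demo needs no network access.
        Map<String, List<String>> pages = Map.of(
                "a", List.of("b", "a#top", "c"),
                "b", List.of("d"),
                "c", List.of("d"),
                "d", List.of());
        // prints [a, b, c, d]
        System.out.println(crawl("a", 10, url -> pages.getOrDefault(url, List.of())));
    }
}
```

To crawl real pages, the third argument could be a lambda that calls the extractLinks method above and returns an empty list when it throws an IOException. Keep maxPages small while testing, since an unbounded crawl of a large site generates a lot of requests.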



Comments

  1. Hey thanks for this useful info. Is there a way we can modify the above piece of code for sites requiring authentication i.e. username and password?
