Friday, 9 June 2017

How to make a web crawler in Java

Here I am going to share the code for a simple web crawler in Java. You will need the jsoup library.

  1. Create a Java project in any editor (I have created mine in Eclipse).
  2. Download jsoup from
  3. Add the jsoup library to the project's build path (right-click the project --> select "Build Path" --> "Configure Build Path" --> click the "Libraries" tab --> click "Add External JARs").
  4. Copy the following code into the .java class.
import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class main {

    // Fetches the page at the given URL and returns every absolute link URL on it.
    public static List<String> extractLinks(String url) throws Exception {
        final List<String> result = new ArrayList<String>();
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            result.add(link.attr("abs:href"));
        }
        return result;
    }


    public static void main(String[] args) throws Exception {
        String site = ""; // set this to the URL of the site you want to crawl
        List<String> links = main.extractLinks(site);
        for (String link : links) {
            System.out.println(link);
        }
    }
}
Run the code.
Note: you can change the website link accordingly.
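To go from extracting the links of one page to an actual crawl, you would feed the extracted links back into a queue and keep a set of already-visited URLs so no page is fetched twice. Here is a minimal sketch of that loop; the in-memory link graph and the `.example` URLs are made up so it runs without jsoup or a network connection, and in real use you would call the extractLinks method above instead:

```java
import java.util.*;

public class CrawlSketch {

    // Hypothetical in-memory link graph standing in for real pages.
    static final Map<String, List<String>> GRAPH = Map.of(
        "http://a.example", List.of("http://b.example", "http://c.example"),
        "http://b.example", List.of("http://a.example", "http://c.example"),
        "http://c.example", List.of()
    );

    // Stub for the jsoup-based extractLinks(url) from the tutorial.
    static List<String> extractLinks(String url) {
        return GRAPH.getOrDefault(url, List.of());
    }

    // Breadth-first crawl: poll a URL from the queue, record it, and enqueue
    // any links not seen before, until the queue empties or maxPages is hit.
    static List<String> crawl(String seed, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);
            for (String link : extractLinks(url)) {
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Crawls the stub graph starting from a.example.
        System.out.println(crawl("http://a.example", 10));
    }
}
```

The `seen` set is what keeps the crawler from looping forever on pages that link back to each other, and the `maxPages` cap is a simple safeguard for real sites.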



  1. Hey thanks for this useful info. Is there a way we can modify the above piece of code for sites requiring authentication i.e. username and password?

