Learn Web Scraping in one Post | 2020

Web scraping is the process of automatically retrieving information from web pages using a program (a bot). This tutorial uses Python with the requests and Beautiful Soup libraries.

IDE used: Visual Editor.

Programming Language: Python.

Steps to perform:

Step 1: 

Open the Visual Editor or any other editor, and install the following dependencies from a terminal:

  • pip install requests
  • pip install bs4
  • pip install html5lib

Step 2:

Get the HTML
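As a minimal sketch of this step, the helper below wraps requests.get so that slow or failed requests surface as errors instead of silently returning an error page. The function name fetch_html and the 10-second timeout are assumptions chosen for illustration; they are not part of the original code.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw bytes of the page at `url`.

    `fetch_html` and the default timeout are illustrative choices,
    not something prescribed by the requests library.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.content
```

Calling fetch_html("https://phdtalks.org") would return the same bytes as r.content in the complete code, provided the site responds successfully.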

Step 3:

Parse the HTML
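To see what parsing produces without making a network request, the sketch below feeds a small hand-written HTML snippet to BeautifulSoup. The snippet and the variable name sample_html are made up for illustration. It uses Python's built-in 'html.parser'; the html5lib package installed in Step 1 could be selected instead by passing 'html5lib' as the second argument.

```python
from bs4 import BeautifulSoup

# A made-up page standing in for the bytes returned by requests.
sample_html = (
    b"<html><head><title>Demo page</title></head>"
    b"<body><p>Hello</p></body></html>"
)

# Build the parse tree with the parser that ships with Python.
soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title.string)  # text inside the <title> tag
print(soup.prettify())    # the whole tree, indented for reading
```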


Step 4:

HTML Tree Traversal
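The sketch below shows the most common traversal calls on a small made-up document, so the results are easy to check by eye. The HTML in SAMPLE_HTML and all variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# A made-up page used only for this example.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <!-- a comment node -->
    <p class="intro">First paragraph.</p>
    <p>Second paragraph with a <a href="/about">link</a>.</p>
    <a href="https://example.com">External</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# soup.title is a Tag; its .string is a NavigableString.
title_tag = soup.title
title_text = soup.title.string

# find_all returns every matching tag; get_text() strips the markup.
paragraph_texts = [p.get_text() for p in soup.find_all("p")]

# Read an attribute of each anchor with .get().
links = [a.get("href") for a in soup.find_all("a")]

# Comment nodes are strings of type Comment inside the tree.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

print(type(title_tag).__name__, type(title_text).__name__)
print(paragraph_texts)
print(links)
print([c.strip() for c in comments])
```

Working with get_text() and .get("href") rather than printing whole tags is usually what a scraper needs, since it yields plain strings that can be stored or processed further.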


Complete Code

# Step 1: Import the installed libraries and set the target URL
import requests
from bs4 import BeautifulSoup

url = "https://phdtalks.org"

# Step 2: Get the HTML
r = requests.get(url)
htmlcontent = r.content  # Raw bytes of the response body.
# print(htmlcontent)

# Step 3: Parse the HTML
soup = BeautifulSoup(htmlcontent, 'html.parser')
# print(soup)  # Prints the full HTML of the fetched page.

# Step 4: HTML tree traversal
# Commonly used objects here: Tag, NavigableString, BeautifulSoup, Comment
title = soup.title  # The page's <title> tag (a Tag object), not just its text.
print(title)

# Get all the paragraphs from the page
paras = soup.find_all('p')
# print(paras)

# Get all the anchors from the page
anchors = soup.find_all('a')
# print(anchors)

print(soup.find('p'))  # Finds the first paragraph on the page.

