
Scraping Multiple Select Options Using Selenium

I need to scrape PDFs from the website https://secc.gov.in/lgdStateList. There are three drop-down menus: one for a state, one for a district and one for a block. There are several states, under ea

Solution 1:

The state codes are the value attributes of the options in the state drop-down, which you can see by inspecting the page source. The district and block drop-downs expose their codes the same way.

Use those values in the payload of a POST request to fetch the table you want to scrape:
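As a rough illustration of reading codes out of a drop-down, the sketch below collects the value and label of each option from a select element. The markup, the `stateCode` id and the state names are made up for the example and are not copied from the site, so check the real page source before relying on them. It uses only the standard library; the same lookup could be done with BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical markup: the id and the option labels are assumptions for this sketch.
SAMPLE_HTML = """
<select id="stateCode">
  <option value="">Select State</option>
  <option value="10">EXAMPLE STATE A</option>
  <option value="20">EXAMPLE STATE B</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect {value: label} for the options of one select element."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None
        self.options = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.current_value = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False
        elif tag == "option":
            self.current_value = None

    def handle_data(self, data):
        # Skip the placeholder option, whose value is empty.
        if self.in_select and self.current_value:
            label = data.strip()
            if label:
                self.options[self.current_value] = label


collector = OptionCollector("stateCode")
collector.feed(SAMPLE_HTML)
states = collector.options
print(states)
```

The keys of `states` are the strings you would place in the payload, for example as `stateCode`.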

import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"

payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}

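# verify=False skips TLS certificate verification (hence the disabled warning above)
r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")

# Print every row of the results table as a list of whitespace-normalized cell texts
for items in soup.select("table#example tr"):
    data = [' '.join(item.text.split()) for item in items.select("th,td")]
    print(data)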

Output the script produces:

['Select State', 'Select District', 'Select Block']
['', 'Select District', 'Select Block']
['ARARIA BASTI (93638)', 'BANGAMA (93639)', 'BANSBARI (93640)']
['BASANTPUR (93641)', 'BATURBARI (93642)', 'BELWA (93643)']
['BOCHI (93644)', 'CHANDRADEI (93645)', 'CHATAR (93646)']
['CHIKANI (93647)', 'DIYARI (93648)', 'GAINRHA (93649)']
['GAIYARI (93650)', 'HARIA (93651)', 'HAYATPUR (93652)']
['JAMUA (93653)', 'JHAMTA (93654)', 'KAMALDAHA (93655)']
['KISMAT KHAWASPUR (93656)', 'KUSIYAR GAWON (93657)', 'MADANPUR EAST (93658)']
['MADANPUR WEST (93659)', 'PAIKTOLA (93660)', 'POKHARIA (93661)']
['RAMPUR KODARKATTI (93662)', 'RAMPUR MOHANPUR EAST (93663)', 'RAMPUR MOHANPUR WEST (93664)']
['SAHASMAL (93665)', 'SHARANPUR (93666)', 'TARAUNA BHOJPUR (93667)']
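Each data cell in the output above pairs a name with a numeric code in brackets, while the header cells have no code. As a minimal sketch, assuming every data cell ends with a code in that form, a regular expression can separate the two and skip the header cells. The helper name `split_cell` is made up for this example.

```python
import re

# Name, optional whitespace, then digits in parentheses at the end of the cell.
CELL = re.compile(r"^(?P<name>.+?)\s*\((?P<code>\d+)\)$")


def split_cell(cell):
    """Return (name, code) for a cell like 'BANGAMA (93639)', or None if no code is present."""
    match = CELL.match(cell.strip())
    if match is None:
        return None
    return match.group("name"), match.group("code")


# A few rows copied from the output above.
rows = [
    ['Select State', 'Select District', 'Select Block'],
    ['', 'Select District', 'Select Block'],
    ['ARARIA BASTI (93638)', 'BANGAMA (93639)', 'BANSBARI (93640)'],
]

pairs = [pair for row in rows for pair in map(split_cell, row) if pair]
print(pairs)
# [('ARARIA BASTI', '93638'), ('BANGAMA', '93639'), ('BANSBARI', '93640')]
```

Returning None for cells without a code avoids the IndexError that a plain split on "(" would raise for header text.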

To download the PDFs, extract the number in brackets next to each result above, add it to the payload, and send another POST request to the download endpoint. Run the script from a dedicated folder so that all the downloaded files end up in one place.

import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"
download_link = "https://secc.gov.in/downloadLgdwisePdfFile"

payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}
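# Fetch the table for the chosen state, district and block
r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")

# The text of each download link ends with the code in brackets, e.g. "ARARIA BASTI (93638)"
for item in soup.select("table#example td > a[onclick^='downloadLgdFile']"):
    gp_code = item.text.strip().split("(")[1].split(")")[0]
    payload['gpCode'] = gp_code
    # Save each PDF in the current working directory as <gp_code>.pdf
    with open(f'{gp_code}.pdf', 'wb') as f:
        f.write(requests.post(download_link, data=payload, verify=False).content)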
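The script above writes whatever the server returns, so an error page would be saved with a .pdf extension. One possible safeguard, sketched here under the assumption that a valid download begins with the standard PDF signature `%PDF-`, is to check the first bytes before writing. The helper `save_if_pdf` is hypothetical, and the byte strings below are stand-ins rather than real server responses.

```python
import os
import tempfile


def save_if_pdf(content, path):
    """Write content to path only if it starts with the PDF signature; return True if saved."""
    if not content.startswith(b"%PDF-"):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True


# Demonstration with stand-in bytes in a temporary folder (no network access needed).
with tempfile.TemporaryDirectory() as folder:
    good = save_if_pdf(b"%PDF-1.4\n% stand-in bytes", os.path.join(folder, "93638.pdf"))
    bad = save_if_pdf(b"<html>error</html>", os.path.join(folder, "93639.pdf"))
    saved = sorted(os.listdir(folder))

print(good, bad, saved)  # True False ['93638.pdf']
```

In the download loop, this would mean passing the response's `.content` and the target filename to `save_if_pdf` and logging the codes for which it returns False.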
