Scraping Multiple Select Options Using Selenium

October 22, 2023

I need to scrape PDFs from the website https://secc.gov.in/lgdStateList. There are three drop-down menus: one for a state, one for a district and one for a block. There are several states, under ea

Solution 1:

The values for the different states can be found in the options of the state dropdown in the page source. The district and block dropdowns expose their values in the same way.
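As a minimal sketch of reading those values, the snippet below collects the value attribute of every option inside one select element. The sample markup, the element id "state" and the option labels are assumptions made for illustration and are not copied from the site, so check the real page source for the actual ids. It uses only the standard library; BeautifulSoup would do the same job with a CSS selector.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, label) pairs from the <option> tags of one <select>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None  # None means "not inside an <option>"
        self.label_parts = []
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.current_value = attrs.get("value") or ""
            self.label_parts = []

    def handle_data(self, data):
        if self.current_value is not None:
            self.label_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self.current_value is not None:
            label = " ".join("".join(self.label_parts).split())
            if self.current_value:  # skip the empty "Select ..." placeholder
                self.options.append((self.current_value, label))
            self.current_value = None
        elif tag == "select":
            self.in_select = False


def extract_options(html, select_id):
    """Return the (value, label) pairs found in the select with this id."""
    parser = OptionCollector(select_id)
    parser.feed(html)
    return parser.options


# Hypothetical markup: the id and the labels are placeholders.
sample_html = """
<select id="state">
  <option value="">Select State</option>
  <option value="10">EXAMPLE STATE A</option>
  <option value="11">EXAMPLE STATE B</option>
</select>
"""

for value, label in extract_options(sample_html, "state"):
    print(value, label)
```

With a real page, the HTML string would come from the response text of a request to the state list URL.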

Use those values in the payload of a POST request to get the table you want to scrape:

import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"

payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}

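# verify=False skips TLS certificate verification, hence the silenced warning above
r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")
# Print every header and data row of the results table with whitespace normalised
for items in soup.select("table#example tr"):
    data = [' '.join(item.text.split()) for item in items.select("th,td")]
    print(data)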

Output the script produces:

['Select State', 'Select District', 'Select Block']
['', 'Select District', 'Select Block']
['ARARIA BASTI (93638)', 'BANGAMA (93639)', 'BANSBARI (93640)']
['BASANTPUR (93641)', 'BATURBARI (93642)', 'BELWA (93643)']
['BOCHI (93644)', 'CHANDRADEI (93645)', 'CHATAR (93646)']
['CHIKANI (93647)', 'DIYARI (93648)', 'GAINRHA (93649)']
['GAIYARI (93650)', 'HARIA (93651)', 'HAYATPUR (93652)']
['JAMUA (93653)', 'JHAMTA (93654)', 'KAMALDAHA (93655)']
['KISMAT KHAWASPUR (93656)', 'KUSIYAR GAWON (93657)', 'MADANPUR EAST (93658)']
['MADANPUR WEST (93659)', 'PAIKTOLA (93660)', 'POKHARIA (93661)']
['RAMPUR KODARKATTI (93662)', 'RAMPUR MOHANPUR EAST (93663)', 'RAMPUR MOHANPUR WEST (93664)']
['SAHASMAL (93665)', 'SHARANPUR (93666)', 'TARAUNA BHOJPUR (93667)']

You need to scrape the number in brackets next to each result above, add it to the payload, and send another POST request to download each PDF file. Run the script from a dedicated folder, since the PDFs are saved to the current working directory.
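As a small illustration of that extraction step, the bracketed number can be pulled out of a cell's text with a regular expression. This is only a sketch: the helper name extract_gp_code is hypothetical, and the sample strings are taken from the output shown earlier.

```python
import re


def extract_gp_code(cell_text):
    """Return the digits inside a trailing pair of parentheses, or None."""
    match = re.search(r"\((\d+)\)\s*$", cell_text)
    return match.group(1) if match else None


cells = ["ARARIA BASTI (93638)", "BANGAMA (93639)", "Select Block"]
# Header cells have no bracketed number, so they are filtered out.
codes = [code for code in map(extract_gp_code, cells) if code is not None]
print(codes)  # ['93638', '93639']
```

The complete script that follows uses plain string splitting for the same purpose, which works equally well for this format.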

import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"
download_link = "https://secc.gov.in/downloadLgdwisePdfFile"

payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}
r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")
# Each download link's text ends with the code in brackets, e.g. "ARARIA BASTI (93638)"
for item in soup.select("table#example td > a[onclick^='downloadLgdFile']"):
    gp_code = item.text.strip().split("(")[1].split(")")[0]
    payload['gpCode'] = gp_code
    # Save the PDF in the current working directory, named after its code
    with open(f'{gp_code}.pdf', 'wb') as f:
        f.write(requests.post(download_link, data=payload, verify=False).content)
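The loop above writes whatever the server returns, so an error page would also be saved under a .pdf name. As a hedged sketch, a small helper could check that the response body starts with the PDF signature before writing it. The helper name save_if_pdf is hypothetical and not part of the original answer.

```python
from pathlib import Path


def save_if_pdf(content, path):
    """Write content to path only if it starts with the PDF signature.

    Returns True when the file was written, False otherwise.
    """
    if not content.startswith(b"%PDF-"):
        return False
    Path(path).write_bytes(content)
    return True


# Possible use inside the download loop (response is a requests response):
# save_if_pdf(response.content, f"{gp_code}.pdf")
```

Anything that fails the check could be logged or retried rather than saved.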
