Scraping Multiple Select Options Using Selenium
I am required to scrape PDFs from the website https://secc.gov.in/lgdStateList. There are 3 drop-down menus: one for the state, one for the district and one for the block. There are several states, under ea
Solution 1:
The value attribute of each option in the state drop-down is the state code. You can find the district and block codes the same way in their drop-downs.
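If you would rather collect these codes programmatically than read them off the page source, here is a minimal sketch. It assumes the drop-downs are ordinary `<select>` elements whose `<option>` value attributes hold the codes; the select name `stateCode`, the sample markup and the helper names are hypothetical, so check the real page before relying on them. It uses only the standard library's `html.parser`, so it runs without BeautifulSoup.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect [value, label] pairs from the <option> tags of one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_select = False   # inside the <select> we care about?
        self.in_option = False   # inside an <option> of that select?
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.in_select = attrs.get("name") == self.select_name
        elif tag == "option" and self.in_select:
            self.in_option = True
            self.options.append([attrs.get("value") or "", ""])

    def handle_data(self, data):
        # Accumulate the visible label text of the current option
        if self.in_option:
            self.options[-1][1] += data

    def handle_endtag(self, tag):
        if tag == "option":
            self.in_option = False
        elif tag == "select":
            self.in_select = False


def option_values(html, select_name):
    """Map option value -> label, skipping placeholders with an empty value."""
    parser = OptionCollector(select_name)
    parser.feed(html)
    parser.close()
    return {value: label.strip() for value, label in parser.options if value}


# Hypothetical markup for illustration; the real page may differ
SAMPLE_HTML = """
<select name="stateCode">
  <option value="">Select State</option>
  <option value="10">EXAMPLE STATE</option>
</select>
<select name="districtCode">
  <option value="">Select District</option>
</select>
"""

print(option_values(SAMPLE_HTML, "stateCode"))  # {'10': 'EXAMPLE STATE'}
```

Placeholder entries such as "Select State" carry an empty value, which is why they are filtered out.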
Then send those codes in the POST payload to get the table you want to grab data from:
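import urllib3
import requests
from bs4 import BeautifulSoup

# The requests below use verify=False (no TLS certificate check),
# so silence the InsecureRequestWarning urllib3 emits for each one
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"
# Codes taken from the value attributes of the state, district and block drop-downs
payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}

r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")

# Print each row of the results table as a list of whitespace-normalized cell texts
for items in soup.select("table#example tr"):
    data = [' '.join(item.text.split()) for item in items.select("th,td")]
    print(data)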
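`requests.post(link, data=payload)` sends the dictionary as an `application/x-www-form-urlencoded` body. To see exactly what goes over the wire, the same request can be built, without sending it, using only the standard library. This is an illustrative equivalent rather than part of the answer, and the helper name `build_gp_list_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_gp_list_request(state_code, district_code, block_code):
    """Build, but do not send, the form POST used to fetch the GP table."""
    payload = {
        "stateCode": state_code,
        "districtCode": district_code,
        "blockCode": block_code,
    }
    # Form-encode the fields, as requests does for data=<dict>
    body = urlencode(payload).encode("ascii")
    return Request(
        "https://secc.gov.in/lgdGpList",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_gp_list_request("10", "188", "1624")
print(req.get_method(), req.full_url)  # POST https://secc.gov.in/lgdGpList
print(req.data.decode("ascii"))        # stateCode=10&districtCode=188&blockCode=1624
```

Passing `req` to `urllib.request.urlopen` would actually send it.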
Output the script produces:
['Select State', 'Select District', 'Select Block']
['', 'Select District', 'Select Block']
['ARARIA BASTI (93638)', 'BANGAMA (93639)', 'BANSBARI (93640)']
['BASANTPUR (93641)', 'BATURBARI (93642)', 'BELWA (93643)']
['BOCHI (93644)', 'CHANDRADEI (93645)', 'CHATAR (93646)']
['CHIKANI (93647)', 'DIYARI (93648)', 'GAINRHA (93649)']
['GAIYARI (93650)', 'HARIA (93651)', 'HAYATPUR (93652)']
['JAMUA (93653)', 'JHAMTA (93654)', 'KAMALDAHA (93655)']
['KISMAT KHAWASPUR (93656)', 'KUSIYAR GAWON (93657)', 'MADANPUR EAST (93658)']
['MADANPUR WEST (93659)', 'PAIKTOLA (93660)', 'POKHARIA (93661)']
['RAMPUR KODARKATTI (93662)', 'RAMPUR MOHANPUR EAST (93663)', 'RAMPUR MOHANPUR WEST (93664)']
['SAHASMAL (93665)', 'SHARANPUR (93666)', 'TARAUNA BHOJPUR (93667)']
Next, scrape the number in brackets next to each result above, add it to the payload as gpCode, and send another POST request for each one to download its PDF file. Put the script in its own folder and run it from there, because each PDF is written to the current working directory.
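Splitting the text on "(" works for the names shown above, but it breaks if a name itself contains brackets. A slightly more defensive sketch anchors a regular expression to the trailing "(digits)" group. It works on cell lists in the shape printed above; the helper name `parse_gp_cells` is hypothetical.

```python
import re

# Name, optional whitespace, then a trailing "(digits)" group,
# e.g. "RAMPUR MOHANPUR EAST (93663)"
GP_CELL = re.compile(r"^(?P<name>.*?)\s*\((?P<code>\d+)\)\s*$")


def parse_gp_cells(rows):
    """Return (name, gp_code) pairs for every cell that ends in "(digits)"."""
    gps = []
    for row in rows:
        for cell in row:
            m = GP_CELL.match(cell)
            if m:  # header cells like 'Select State' do not match
                gps.append((m.group("name"), m.group("code")))
    return gps


rows = [
    ['Select State', 'Select District', 'Select Block'],
    ['ARARIA BASTI (93638)', 'BANGAMA (93639)', 'BANSBARI (93640)'],
]
print(parse_gp_cells(rows))
# [('ARARIA BASTI', '93638'), ('BANGAMA', '93639'), ('BANSBARI', '93640')]
```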
import urllib3
import requests
from bs4 import BeautifulSoup

# Requests use verify=False, so silence urllib3's InsecureRequestWarning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"
download_link = "https://secc.gov.in/downloadLgdwisePdfFile"
payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}

r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")

# Each GP link's text looks like "ARARIA BASTI (93638)"; the number in brackets is the GP code
for item in soup.select("table#example td > a[onclick^='downloadLgdFile']"):
    # Take the last bracketed group so names containing "(" are handled too
    gp_code = item.text.strip().rsplit("(", 1)[1].split(")")[0]
    payload['gpCode'] = gp_code
    # Request this GP's PDF and save it as <gp_code>.pdf in the current directory
    with open(f'{gp_code}.pdf', 'wb') as f:
        f.write(requests.post(download_link, data=payload, verify=False).content)
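Rather than depending on the directory the script is launched from, the write step can target an explicit output folder and reject responses that are not PDFs (for example an HTML error page). Here is a minimal sketch, assuming a valid file begins with the standard `%PDF` signature; `save_pdf` and the folder name in the usage note are hypothetical.

```python
from pathlib import Path


def save_pdf(content: bytes, out_dir, gp_code: str) -> Path:
    """Save one downloaded file as <out_dir>/<gp_code>.pdf and return its path.

    Raises ValueError when the bytes lack the "%PDF" signature, e.g. if the
    server answered with an HTML error page instead of a file.
    """
    if not content.startswith(b"%PDF"):
        raise ValueError(f"response for GP {gp_code} is not a PDF")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path = out / f"{gp_code}.pdf"
    path.write_bytes(content)
    return path
```

Inside the loop, `save_pdf(requests.post(download_link, data=payload, verify=False).content, "secc_pdfs", gp_code)` would replace the `with open(...)` block and collect every file under `secc_pdfs/`.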