Not Able To Extract Nested Table Body With Pandas From Webpage
Solution 1:
The table is populated by JavaScript, so it is not in the HTML that pandas fetches. You can confirm this by viewing the page source in your browser and searching for a value that appears in the table, such as "PRADESH".
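The same check can be scripted instead of done by eye. Below is a minimal sketch, assuming you already have the raw page source as a string (for example from `requests.get(url).text`). The helper name `appears_in_static_html` and the sample markup are made up for illustration and are not taken from the actual page.

```python
def appears_in_static_html(html: str, needle: str) -> bool:
    """Return True if `needle` occurs in the raw HTML text (case-insensitive)."""
    return needle.lower() in html.lower()


# Hypothetical stand-in for the unrendered page source: the table shell
# exists, but its body stays empty until a script fills it in.
sample_html = """
<table id="report">
  <thead><tr><th>StateName</th><th>Saturation</th></tr></thead>
  <tbody></tbody>
</table>
<script src="report.js"></script>
"""

found = appears_in_static_html(sample_html, "PRADESH")
print(found)  # False
```

If the check comes back False for a value you can see in the browser, the rows are almost certainly injected after page load, and pandas alone will not see them.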
The solution is to use a library such as requests-html or selenium to render the JavaScript-driven page. Then you can parse the rendered HTML with pandas.
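import io
import pandas as pd
from requests_html import HTMLSession
s = HTMLSession()
r = s.get(url)  # url is the address of the page from the question
r.html.render()  # runs the page's JavaScript in a headless Chromium
# read_html expects markup text or a file-like object, not the HTML object itself
table = pd.read_html(io.StringIO(r.html.html))[3]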
Solution 2:
As Eric pointed out, the table is populated by JavaScript.
However, it is quite easy to intercept the API call the page makes internally by using Chrome's developer tools.
Go to the Network tab and filter by XHR, and you will find the endpoint the page calls, which is
http://gsa.nic.in/gsaservice/services/service.svc/gsastatereport?schemecode=PMJDY
Then a simple script like this will get you the data nicely formatted:
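import json
import pandas as pd
import requests

r = requests.get('http://gsa.nic.in/gsaservice/services/service.svc/gsastatereport?schemecode=PMJDY')
# The 'd' field of the response holds another JSON document as a string,
# so it has to be decoded a second time.
data = json.loads(r.json()['d'])
pd.DataFrame(data[0]['data'])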
   LGDStateCode          StateName  totalSaturatedVillage  villageSaturatedTillDate  TotalBeneficiaries  TotalBeneficiariesRegisteredTillDate  Saturation
0            28     ANDHRA PRADESH                    305                       305               27238                                 27238      100.00
1            12  ARUNACHAL PRADESH                    299                       283               42331                                 39999       94.49
2            18              ASSAM                   3042                      2375              648815                                621878       95.85
3            10              BIHAR                    635                       544               92356                                 90131        97.5
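The `json.loads(r.json()['d'])` step is easy to trip over, so here is a small offline sketch of the double decoding. It assumes the payload shape implied by the script above: a `d` key holding a JSON string, which decodes to a list whose first item has a `data` list of rows. The helper name `extract_records` and the sample payload are hand-made for illustration; the real response has more fields and rows, and the value types here are a guess.

```python
import json


def extract_records(payload: dict) -> list:
    """Decode the JSON string stored under 'd' and return the first report's rows."""
    reports = json.loads(payload["d"])  # second decoding step
    return reports[0]["data"]


# Hand-made sample in the assumed shape; one row copied from the output above.
inner = [
    {
        "data": [
            {"LGDStateCode": "28", "StateName": "ANDHRA PRADESH", "Saturation": "100.00"}
        ]
    }
]
sample_payload = {"d": json.dumps(inner)}

records = extract_records(sample_payload)
print(records[0]["StateName"])  # ANDHRA PRADESH
```

The returned list of dicts can be passed straight to `pd.DataFrame(records)`, which is what the script above does with the live response.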