Crawling Dynamic Content With Scrapy
I am trying to get the latest reviews from the Google Play store. I'm following the approach from this question for getting the latest reviews. The method specified in that question's answer works fine with
Solution 1:
It seems you aren't changing the id in the form data.
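import json
import re

from scrapy import FormRequest, Selector

def parseApp(self, response):
    # Collect the unique app detail links from the listing page.
    apps = set(response.xpath('//a[@class="card-click-target"]/@href').extract())
    url = "https://play.google.com/store/getreviews"
    for app in apps:
        # str.strip() removes a set of characters, not a prefix, so split on "id=" instead.
        _id = app.split('id=', 1)[-1]
        form_data = {"id": _id, "reviewType": '0', "reviewSortOrder": '0', "pageNum": '0'}
        # Throttle with the DOWNLOAD_DELAY setting instead of sleep(5), which blocks Scrapy's event loop.
        yield FormRequest(url=url, formdata=form_data, callback=self.parse_data)

# Named parse_data so that it matches the callback passed to FormRequest above.
def parse_data(self, response):
    # response.text is the decoded body; on old Scrapy/Python 2 setups use response.body.
    response_data = re.findall(r"\[\[.*", response.text)
    if response_data:
        try:
            text = json.loads(response_data[0] + ']')
            sell = Selector(text=text[0][2])
        except (ValueError, LookupError, TypeError):
            return
        # do whatever you want to extract using sell.xpath('YOUR_XPATH_HERE')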
After cleaning the data, a sample review will look something like this:
<div class="single-review">
<a href="/store/people/details?id=106726831005267540508">
<img class="author-image" alt="Lorence Gerona avatar image" src="https://lh3.googleusercontent.com/uFp_tsTJboUY7kue5XAsGA=w48-c-h48">
</a>
<div class="review-header" data-expand-target="" data-reviewid="gp:AOqpTOHnsExa_P6JFRJD6HF5h71fpY91tNaEODjtfiTu-zPFki9ZnYsNp1HEcGFpGEfu9xqwJL_j-03Tx0e9lw">
<div class="review-info">
<span class="author-name">
<a href="/store/people/details?id=106726831005267540508">Lorence Gerona</a>
</span>
<span class="review-date">3 June 2015</span>
<a class="reviews-permalink" href="/store/apps/details?id=com.supercell.boombeach&reviewId=Z3A6QU9xcFRPSG5zRXhhX1A2SkZSSkQ2SEY1aDcxZnBZOTF0TmFFT0RqdGZpVHUtelBGa2k5Wm5Zc05wMUhFY0dGcEdFZnU5eHF3Skxfai0wM1R4MGU5bHc" title="Link to this review"></a>
<div class="review-source" style="display:none">
</div>
<div class="review-info-star-rating">
<div class="tiny-star star-rating-non-editable-container" aria-label="Rated 5 stars out of five stars">
<div class="current-rating" style="width: 100%;">
</div>
</div>
</div>
</div>
<div class="rate-review-wrapper">
<div class="play-button icon-button small rate-review" title="Spam" data-rating="SPAM">
<div class="icon spam-flag"></div>
</div>
<div class="play-button icon-button small rate-review" title="Helpful" data-rating="HELPFUL">
<div class="icon thumbs-up"></div>
</div>
<div class="play-button icon-button small rate-review" title="Unhelpful" data-rating="UNHELPFUL">
<div class="icon thumbs-down"></div>
</div>
</div>
</div>
<div class="review-body">
<span class="review-title">Team BOOM BEACH</span>
Amazing game I can defeat hammerman
<div class="review-link" style="display:none">
<a class="id-no-nav play-button tiny" href="#" target="_blank">Full Review</a>
</div>
</div>
</div>
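As a rough illustration of pulling fields out of a fragment like the one above, here is a minimal sketch that uses only the standard library's `html.parser` as a stand-in for XPath on the Scrapy `Selector`. The class name `ReviewParser`, the output keys, and the trimmed sample (including the shortened review id `gp:EXAMPLE`) are assumptions for the example; the CSS classes and attributes are taken from the sample review. The free-text review body is not collected here.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect a few fields from one review fragment."""

    # Map the class attribute of a <span> to the output key it fills.
    TEXT_FIELDS = {
        "author-name": "author",
        "review-date": "date",
        "review-title": "title",
    }

    def __init__(self):
        super().__init__()
        self.review = {}
        self._field = None  # key currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        if tag == "span" and css_class in self.TEXT_FIELDS:
            self._field = self.TEXT_FIELDS[css_class]
        elif "data-reviewid" in attrs:
            self.review["review_id"] = attrs["data-reviewid"]
        elif "star-rating-non-editable-container" in css_class.split():
            self.review["rating_label"] = attrs.get("aria-label")

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self.review[self._field] = text


# Trimmed, hypothetical version of the sample review.
sample = """
<div class="single-review">
  <div class="review-header" data-reviewid="gp:EXAMPLE">
    <span class="author-name"><a href="/store/people/details?id=1">Lorence Gerona</a></span>
    <span class="review-date">3 June 2015</span>
    <div class="tiny-star star-rating-non-editable-container" aria-label="Rated 5 stars out of five stars"></div>
  </div>
  <div class="review-body">
    <span class="review-title">Team BOOM BEACH</span>
    Amazing game I can defeat hammerman
  </div>
</div>
"""

parser = ReviewParser()
parser.feed(sample)
print(parser.review)
```

Inside the spider callback, the same fields would more naturally be selected with XPath on the `Selector`, for example `sell.xpath('//span[@class="review-date"]/text()').extract_first()`.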