
Crawling Dynamic Content With Scrapy

I am trying to get the latest reviews from the Google Play Store. I'm following this question for getting the latest reviews here. The method specified in the above link's answer works fine with

Solution 1:

It seems you aren't changing the id in the form data for each app. Build the form data from each app's own id:

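import json
import re
from urllib.parse import parse_qs, urlparse

from scrapy import FormRequest, Selector


def parseApp(self, response):
    apps = set(response.xpath('//a[@class="card-click-target"]/@href').extract())
    url = "https://play.google.com/store/getreviews"
    for app in apps:
        # str.strip() removes a set of characters, not a prefix, so read the
        # "id" query parameter instead
        _id = parse_qs(urlparse(app).query).get("id", [None])[0]
        if not _id:
            continue
        form_data = {"id": _id, "reviewType": "0", "reviewSortOrder": "0", "pageNum": "0"}
        # Don't call time.sleep() here: it blocks Scrapy's event loop.
        # Set DOWNLOAD_DELAY in the spider settings to throttle requests instead.
        yield FormRequest(url=url, formdata=form_data, callback=self.parse_data)


def parse_data(self, response):
    # Callback for the FormRequest above. response.text is a str;
    # response.body is bytes and would not match a str pattern.
    response_data = re.findall(r"\[\[.*", response.text)
    if not response_data:
        return
    try:
        text = json.loads(response_data[0] + "]")
        sell = Selector(text=text[0][2])
    except (ValueError, IndexError, TypeError):
        return
    # Extract whatever you need with sell.xpath('YOUR_XPATH_HERE')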
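Pulling the app id out of each href is an easy place to go wrong: str.strip() treats its argument as a set of characters to trim from both ends, not as a prefix, so an id such as com.example.app (a hypothetical example) comes back as 'com.example.'. The sketch below uses only the standard library and parses the query string instead. extract_app_id and build_form_data are illustrative helper names, not part of Scrapy; the form fields are copied from the answer above.

```python
from urllib.parse import parse_qs, urlparse


def extract_app_id(href):
    """Return the value of the "id" query parameter in a Play Store href.

    Returns None when the href has no "id" parameter.
    """
    ids = parse_qs(urlparse(href).query).get("id")
    return ids[0] if ids else None


def build_form_data(app_id, page_num=0):
    """Build the per-app form payload used in the answer (all values are strings)."""
    return {
        "id": app_id,
        "reviewType": "0",
        "reviewSortOrder": "0",
        "pageNum": str(page_num),
    }


# str.strip() trims characters from both ends, so the trailing "app" is lost
print("/store/apps/details?id=com.example.app".strip("/store/apps/details?id="))

hrefs = [
    "/store/apps/details?id=com.supercell.boombeach",
    "/store/apps/details?id=com.example.app",  # hypothetical app id
]
for href in hrefs:
    app_id = extract_app_id(href)
    print(app_id, build_form_data(app_id))
```

Each href yields its own id, so every FormRequest carries a different form payload instead of repeating the first app's reviews.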

After cleaning the data, a sample review will look something like this:

<div class="single-review">
    <a href="/store/people/details?id=106726831005267540508">
        <img class="author-image" alt="Lorence Gerona avatar image" src="https://lh3.googleusercontent.com/uFp_tsTJboUY7kue5XAsGA=w48-c-h48">
    </a>
    <div class="review-header" data-expand-target="" data-reviewid="gp:AOqpTOHnsExa_P6JFRJD6HF5h71fpY91tNaEODjtfiTu-zPFki9ZnYsNp1HEcGFpGEfu9xqwJL_j-03Tx0e9lw">
        <div class="review-info">
            <span class="author-name">
                <a href="/store/people/details?id=106726831005267540508">Lorence Gerona</a>
            </span>
            <span class="review-date">3 June 2015</span>
            <a class="reviews-permalink" href="/store/apps/details?id=com.supercell.boombeach&amp;reviewId=Z3A6QU9xcFRPSG5zRXhhX1A2SkZSSkQ2SEY1aDcxZnBZOTF0TmFFT0RqdGZpVHUtelBGa2k5Wm5Zc05wMUhFY0dGcEdFZnU5eHF3Skxfai0wM1R4MGU5bHc" title="Link to this review"></a>
            <div class="review-source" style="display:none">
            </div>
            <div class="review-info-star-rating">
                <div class="tiny-star star-rating-non-editable-container" aria-label="Rated 5 stars out of five stars">
                    <div class="current-rating" style="width: 100%;">
                    </div>
                </div>
            </div>
        </div>
        <div class="rate-review-wrapper">
            <div class="play-button icon-button small rate-review" title="Spam" data-rating="SPAM">
                <div class="icon spam-flag"></div>
            </div>
            <div class="play-button icon-button small rate-review" title="Helpful" data-rating="HELPFUL">
                <div class="icon thumbs-up"></div>
            </div>
            <div class="play-button icon-button small rate-review" title="Unhelpful" data-rating="UNHELPFUL">
                <div class="icon thumbs-down"></div>
            </div>
        </div>
    </div>
    <div class="review-body">
        <span class="review-title">Team BOOM BEACH</span>
        Amazing game I can defeat hammerman
        <div class="review-link" style="display:none">
            <a class="id-no-nav play-button tiny" href="#" target="_blank">Full Review</a>
        </div>
    </div>
</div>
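Once you have the review HTML, the individual fields can be picked out by class name. Inside the spider you would normally do this with sell.xpath(...); to keep the sketch runnable without Scrapy, it uses the standard-library html.parser instead and is fed a trimmed copy of the sample review above. ReviewParser and parse_review are hypothetical names, and the class names it relies on (author-name, review-date, review-title, review-body, star-rating-non-editable-container) are taken from this one sample only, so they are an assumption and may not hold for other pages or later versions of the Play Store markup.

```python
import re
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect a few fields from one "single-review" block.

    Assumes the class names seen in the sample review; they may change.
    """

    # span class -> field name that receives the span's text
    SPAN_FIELDS = {"author-name": "author", "review-date": "date", "review-title": "title"}

    def __init__(self):
        super().__init__()
        self.review = {"author": "", "date": "", "rating_label": "", "stars": None,
                       "title": "", "body": ""}
        self._field = None     # field currently receiving text from a span
        self._in_body = False  # inside <div class="review-body">
        self._body_depth = 0   # nested <div> depth inside review-body

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._in_body:
                self._body_depth += 1  # e.g. the hidden "Full Review" link
            elif "review-body" in classes:
                self._in_body = True
            if "star-rating-non-editable-container" in classes:
                label = attrs.get("aria-label") or ""
                self.review["rating_label"] = label
                match = re.search(r"Rated (\d+) star", label)
                if match:
                    self.review["stars"] = int(match.group(1))
        elif tag == "span":
            for cls, field in self.SPAN_FIELDS.items():
                if cls in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._in_body:
            if self._body_depth:
                self._body_depth -= 1
            else:
                self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._field:
            self.review[self._field] = (self.review[self._field] + " " + text).strip()
        elif self._in_body and self._body_depth == 0:
            # Text directly inside review-body (not the title, not nested divs)
            self.review["body"] = (self.review["body"] + " " + text).strip()


SAMPLE = """
<div class="single-review">
  <div class="review-header">
    <div class="review-info">
      <span class="author-name"><a href="/store/people/details?id=106726831005267540508">Lorence Gerona</a></span>
      <span class="review-date">3 June 2015</span>
      <div class="review-info-star-rating">
        <div class="tiny-star star-rating-non-editable-container" aria-label="Rated 5 stars out of five stars">
          <div class="current-rating" style="width: 100%;"></div>
        </div>
      </div>
    </div>
  </div>
  <div class="review-body">
    <span class="review-title">Team BOOM BEACH</span>
    Amazing game I can defeat hammerman
    <div class="review-link" style="display:none">
      <a class="id-no-nav play-button tiny" href="#" target="_blank">Full Review</a>
    </div>
  </div>
</div>
"""


def parse_review(html):
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.review


print(parse_review(SAMPLE))
```

For this sample it returns the author "Lorence Gerona", the date "3 June 2015", 5 stars, the title "Team BOOM BEACH" and the body text "Amazing game I can defeat hammerman"; the hidden "Full Review" link text is skipped because it sits in a nested div.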
