
Is It Ok For Scrapy's Request_fingerprint Method To Return None?

I'd like to override Scrapy's default RFPDupeFilter class as follows: from scrapy.dupefilters import RFPDupeFilter class URLDupefilter(RFPDupeFilter): def request_fingerprint

Solution 1:

If you look at the request_seen() method of the RFPDupeFilter class, you can see how Scrapy compares fingerprints:

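def request_seen(self, request):
    fp = self.request_fingerprint(request)
    # A fingerprint that is already in the set marks the request as a duplicate.
    if fp in self.fingerprints:
        return True
    self.fingerprints.add(fp)
    # When a file is open, the fingerprint is also written to it as a string.
    if self.file:
        self.file.write(fp + os.linesep)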

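The relevant check is fp in self.fingerprints. In your case it becomes None in {None}, because your fingerprint is None and self.fingerprints is a set. That is valid Python and evaluates as expected, so yes, you can return None. One caveat: if the filter is writing fingerprints to a file, the fp + os.linesep line raises a TypeError when fp is None.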
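As a quick check of the set behaviour described above, here is a minimal sketch in plain Python. It does not use Scrapy; the `fingerprints` variable is only a stand-in for `self.fingerprints`, and the `try` block imitates the `fp + os.linesep` expression from the quoted method.

```python
import os

# Stand-in for self.fingerprints in the dupefilter.
fingerprints = set()

# None is hashable, so it can be stored in a set and looked up again.
before = None in fingerprints   # the set is still empty
fingerprints.add(None)
after = None in fingerprints    # None is now a member

# The file-writing branch concatenates the fingerprint with a string,
# which does not work for None.
try:
    None + os.linesep
    concat_ok = True
except TypeError:
    concat_ok = False

print(before, after, concat_ok)  # False True False
```

So the membership test itself is safe with None, while the string concatenation is the part that would fail.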

Edit: However, this will only let the first XML request through, since the fingerprints set does not contain None yet. Every later request whose fingerprint is None is then treated as a duplicate and dropped. Ideally, you should also override the request_seen method in your dupefilter so that it simply returns False when the fingerprint is None.
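The suggested fix could look roughly like the sketch below. To keep it runnable without Scrapy installed, `SimpleDupeFilter` is a hypothetical stand-in that copies only the set logic from the quoted request_seen(); in a real project you would subclass RFPDupeFilter instead. Requests are represented by plain URL strings, and the rule that URLs ending in ".xml" get no fingerprint is an assumption made for illustration, since the original question is cut off.

```python
class SimpleDupeFilter:
    """Hypothetical stand-in that mirrors only the set logic of request_seen()."""

    def __init__(self):
        self.fingerprints = set()

    def request_fingerprint(self, request):
        # Stand-in fingerprint: the request is just its URL string here.
        return request

    def request_seen(self, request):
        fp = self.request_fingerprint(request)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False


class URLDupefilter(SimpleDupeFilter):
    def request_fingerprint(self, request):
        # Assumed rule for illustration: ".xml" URLs get no fingerprint.
        if request.endswith(".xml"):
            return None
        return request

    def request_seen(self, request):
        fp = self.request_fingerprint(request)
        if fp is None:
            # Never report these requests as duplicates, and never store None.
            return False
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False


dupefilter = URLDupefilter()
results = [
    dupefilter.request_seen("http://example.com/feed.xml"),
    dupefilter.request_seen("http://example.com/feed.xml"),
    dupefilter.request_seen("http://example.com/page.html"),
    dupefilter.request_seen("http://example.com/page.html"),
]
print(results)  # [False, False, False, True]
```

Returning early when the fingerprint is None also keeps None out of the set and away from any code that writes fingerprints to a file.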
