
Running More Than One Spider In A For Loop

I am trying to instantiate multiple spiders. The first one works fine, but the second one gives me an error: ReactorNotRestartable.

    feeds = { 'nasa': { 'name': 'nasa',

Solution 1:

It looks like you have to instantiate a process per spider; try:

    from scrapy.crawler import CrawlerProcess

    def start_crawler():
        for feed_name in feeds.keys():
            process = CrawlerProcess({
                'USER_AGENT': CONFIG['USER_AGENT'],
                'DOWNLOAD_HANDLERS': {'s3': None}  # disable s3 handler (boto issues)
            })
            MySpider.name = feed_name
            process.crawl(MySpider)
            process.start()
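Note that, as written, the second `process.start()` in the loop can still raise ReactorNotRestartable: Twisted's reactor is global to the interpreter and cannot be restarted once it has stopped. A common way to truly get one process per spider is to run each crawl in its own OS process. Here is a minimal sketch using `subprocess`; the child payload is a stand-in `print` so the sketch runs without Scrapy installed, and the commented-out lines show what a real child would run instead:

```python
import subprocess
import sys

def crawl_in_subprocess(feed_name):
    # Each child interpreter gets a fresh Twisted reactor, so starting
    # one crawl per child never hits ReactorNotRestartable.
    # Stand-in payload; a real child would instead run something like:
    #   from scrapy.crawler import CrawlerProcess
    #   process = CrawlerProcess({...})
    #   process.crawl(MySpider, feed_name)
    #   process.start()
    child_code = f"print('crawled {feed_name}')"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Each call pays interpreter start-up cost, but the crawls are fully isolated from one another.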

Solution 2:

The solution is to queue all the spiders in the loop and start the process just once at the end. My guess is that it has something to do with reactor allocation/deallocation.

    from scrapy.crawler import CrawlerProcess

    def start_crawler():
        process = CrawlerProcess({
            'USER_AGENT': CONFIG['USER_AGENT'],
            'DOWNLOAD_HANDLERS': {'s3': None}  # disable for issues with boto
        })

        for feed_name in CONFIG['Feeds'].keys():
            MySpider.name = feed_name
            process.crawl(MySpider)

        process.start()

Thanks @eLRuLL for your answer; it inspired me to find this solution.

Solution 3:

You can pass parameters to `crawl()` and use them in the parsing process.

    class MySpider(XMLFeedSpider):
        def __init__(self, name, **kwargs):
            super(MySpider, self).__init__(**kwargs)
            self.name = name


def start_crawler():
    process = CrawlerProcess({
        'USER_AGENT': CONFIG['USER_AGENT'],
        'DOWNLOAD_HANDLERS': {'s3': None} # boto issues
    })

    for feed_name in feeds.keys():
        process.crawl(MySpider, feed_name)

    process.start()
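A further reason to prefer this approach: Solutions 1 and 2 mutate the class attribute `MySpider.name` inside the loop, so every instance that reads the class attribute sees whatever value was assigned last (and, depending on when Scrapy actually instantiates the queued spiders, all of them may end up with the final feed name). Passing the name to the constructor keeps each instance independent. A plain-Python sketch of the difference (no Scrapy; `Spider` here is a stand-in base class):

```python
# Stand-in classes to illustrate class-attribute vs. instance-attribute name.
class Spider:
    name = "default"

class MySpider(Spider):
    def __init__(self, name):
        self.name = name  # per-instance attribute shadows the class attribute

# Mutating the class attribute (Solutions 1-2): every instance without its
# own `name` reads whatever value was assigned last.
Spider.name = "nasa"
a = Spider()
Spider.name = "esa"
b = Spider()
print(a.name, b.name)  # both read the class attribute: esa esa

# Passing the name to the constructor (Solution 3): each instance keeps its own.
c = MySpider("nasa")
d = MySpider("esa")
print(c.name, d.name)  # nasa esa
```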
