Running More Than One Spider In A For Loop
I am trying to instantiate multiple spiders in a loop. The first one works fine, but the second one gives me an error: ReactorNotRestartable. The feed configuration begins:

feeds = {
    'nasa': {
        'name': 'nasa',
Solution 1:
It looks like you have to instantiate one process per spider; try:

def start_crawler():
    for feed_name in feeds.keys():
        process = CrawlerProcess({
            'USER_AGENT': CONFIG['USER_AGENT'],
            'DOWNLOAD_HANDLERS': {'s3': None},  # disable to avoid boto issues
        })
        MySpider.name = feed_name
        process.crawl(MySpider)
        process.start()
Solution 2:
The solution is to collect the spiders in the loop and start the process just once, at the end. My guess is that it has something to do with reactor allocation/deallocation.

def start_crawler():
    process = CrawlerProcess({
        'USER_AGENT': CONFIG['USER_AGENT'],
        'DOWNLOAD_HANDLERS': {'s3': None},  # disable to avoid boto issues
    })
    for feed_name in CONFIG['Feeds'].keys():
        MySpider.name = feed_name
        process.crawl(MySpider)
    process.start()
Thanks @eLRuLL for your answer; it inspired me to find this solution.
Solution 3:
You can pass parameters to crawl() and use them in the parsing process.

class MySpider(XMLFeedSpider):
    def __init__(self, name, **kwargs):
        super(MySpider, self).__init__(**kwargs)
        self.name = name

def start_crawler():
    process = CrawlerProcess({
        'USER_AGENT': CONFIG['USER_AGENT'],
        'DOWNLOAD_HANDLERS': {'s3': None},  # disable to avoid boto issues
    })
    for feed_name in feeds.keys():
        process.crawl(MySpider, feed_name)
    process.start()