I'm running Scrapy spiders from a script every N hours, and the spiders folder may be refreshed while the CrawlerRunner is still working. My problem is: how can I load new spiders from the spiders folder into a running CrawlerRunner?
from twisted.internet import reactor
from twisted.internet.task import LoopingCall
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

def startProcess():
    configure_logging()
    runner = CrawlerRunner(get_project_settings())
    task = LoopingCall(def_process, runner)
    task.start(60 * 60)  # run once per hour
    reactor.run()

def def_process(runner: CrawlerRunner):
    if new_spider():
        runner.spider_loader.from_settings(get_project_settings())  # not working
    process()  # in a loop that yields runner.crawl('spider.name', sett=sett)
I tried runner.spider_loader.from_settings(get_project_settings()), but it's not working. I also tried runner.create_crawler(SpiderClass), but how can I add the crawler returned by this method to the CrawlerRunner so I can execute it as yield runner.crawl("new_spider_name", _config=config)?
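For reference, runner.crawl() accepts a Crawler instance or a Spider subclass directly, not only a name string, so the object returned by create_crawler() can be handed straight back to it. A minimal sketch, assuming NewSpider is a hypothetical, freshly imported spider class and config is the keyword argument from above:

    # crawl() takes a spider name, a Spider subclass, or a Crawler instance;
    # passing the Crawler (or the class) bypasses the name lookup entirely.
    crawler = runner.create_crawler(NewSpider)   # NewSpider: hypothetical, freshly imported
    yield runner.crawl(crawler, _config=config)

Passing NewSpider itself to runner.crawl() should behave the same way, since crawl() wraps a bare spider class in a Crawler internally.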
Right now, yield runner.crawl("new_spider_name", _config=config) raises a "Spider not found" exception.
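One likely reason: SpiderLoader.from_settings() is a classmethod that builds and returns a new loader rather than refreshing the one the runner already holds, so calling it on runner.spider_loader discards the result. A sketch of reassigning it instead, assuming the new spider lives in a module that has not been imported yet (Python caches already-imported modules, which would need importlib.reload to pick up edits):

    from scrapy.spiderloader import SpiderLoader
    from scrapy.utils.project import get_project_settings

    def reload_spiders(runner):
        # Build a fresh loader (this re-walks SPIDER_MODULES, importing any
        # new spider files) and swap it in so name-based lookups can see them.
        runner.spider_loader = SpiderLoader.from_settings(get_project_settings())

After this, yield runner.crawl("new_spider_name", _config=config) should resolve the name through the fresh loader.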
Source: https://stackoverflow.com/questions/75748924/scrapy-crawlerrunner-load-new-spiders-in-runtime