I'm running Scrapy spiders from a script every N hours, and the spiders folder may be refreshed while the CrawlerRunner is still working. My problem: how can I load the new spiders from the spiders folder into the running CrawlerRunner?
from twisted.internet import reactor
from twisted.internet.task import LoopingCall
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

def startProcess():
    configure_logging()
    runner = CrawlerRunner(get_project_settings())
    # Re-run def_process every hour on the Twisted reactor.
    task = LoopingCall(def_process, runner)
    task.start(60 * 60)
    reactor.run()

def def_process(runner: CrawlerRunner):
    if new_spider():
        runner.spider_loader.from_settings(get_project_settings())  # not working
    process()  # in a loop that yields runner.crawl('spider.name', sett=sett)
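One detail that may explain why the call above has no effect: SpiderLoader.from_settings is a classmethod that builds and returns a brand-new loader instead of refreshing the one the runner already holds, so calling it and discarding the return value changes nothing. A minimal sketch of assigning the result back onto the runner (whether a fresh loader actually discovers spider files added at runtime is my assumption, not something I have verified):

    from scrapy.spiderloader import SpiderLoader
    from scrapy.utils.project import get_project_settings

    def reload_spiders(runner):
        # from_settings returns a new SpiderLoader; assign it back so the
        # runner's name-based lookups use a re-scanned spiders package.
        runner.spider_loader = SpiderLoader.from_settings(get_project_settings())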
I tried runner.spider_loader.from_settings(get_project_settings()), but it's not working. I also tried runner.create_crawler(SpiderClass), but how can I add the crawler returned by this method to the CrawlerRunner so that I can execute it with yield runner.crawl("new_spider_name", _config=config)? Right now, yield runner.crawl("new_spider_name", _config=config) raises a "Spider not found" exception.
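For what it's worth, CrawlerRunner.crawl() also accepts a Crawler instance or a spider class rather than only a name, so the object returned by create_crawler can be scheduled directly, bypassing the spider_loader name lookup that raises the exception. A hedged sketch, where the dotted module path and the NewSpider class are placeholders for the dynamically added spider:

    import importlib

    def crawl_new_spider(runner, config):
        # Hypothetical dotted path to the freshly added spider module.
        module = importlib.import_module("myproject.spiders.new_spider")
        crawler = runner.create_crawler(module.NewSpider)  # NewSpider is a placeholder
        # crawl() accepts the Crawler instance itself, so no name lookup occurs.
        return runner.crawl(crawler, _config=config)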
Source: https://stackoverflow.com/questions/75748924/scrapy-crawlerrunner-load-new-spiders-in-runtime