Open
Description
Because the higher quotas are set via launch events, they are lost when the crawler restarts, so right now the crawler is dropping lots of URLs with -5003. These would likely have been downloaded eventually otherwise.
This is an example of why crawl config should be handled differently, e.g. the tocrawl topic should be compacted against a per-seed key, and the whole topic re-read each time, so that the configuration is always up to date.
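
To illustrate the idea, here is a minimal sketch in plain Python of what re-reading a compacted, per-seed-keyed topic would amount to. It does not use a real Kafka client; the record shape, the `quota_bytes` field, and the `rebuild_seed_config` helper are all hypothetical names chosen for the example, not anything from the crawler's code.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record shape: (seed key, config). A config of None is
# treated as a tombstone, i.e. the seed's configuration was deleted.
Record = Tuple[str, Optional[dict]]


def rebuild_seed_config(records: Iterable[Record]) -> Dict[str, dict]:
    """Replay the whole topic in order, keeping only the latest config per seed."""
    latest: Dict[str, dict] = {}
    for seed, config in records:
        if config is None:
            # Tombstone: forget this seed entirely.
            latest.pop(seed, None)
        else:
            # A later record for the same key overrides the earlier one.
            latest[seed] = config
    return latest


# Example log: the second record for example.org raises its quota.
log = [
    ("https://example.org/", {"quota_bytes": 1_000_000}),
    ("https://example.com/", {"quota_bytes": 500_000}),
    ("https://example.org/", {"quota_bytes": 50_000_000}),
]

# On every restart the crawler would replay the topic from the start,
# so the raised quota is recovered rather than lost.
config = rebuild_seed_config(log)
```

Because the state is rebuilt from the topic on each start, a restart would not depend on having seen the original launch event while running.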