Description
When using browsertrix-crawler to crawl a specific website, the crawler hangs after reaching a certain number of pages (around 7K-8K in my case). The program continues to print crawl statistics and `Direct fetch of page URL timed out` messages, but no new pages are crawled and the "crawled" count in the statistics stops increasing. The entire crawler process is stuck at that point.
This issue seems similar to #780, but I have encountered it in both version 1.5.7 and the latest version 1.6.1.
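To confirm the stall rather than eyeballing the output, here is a minimal watcher sketch. The file name `stall_watch.py` and the `details.crawled` field layout are my assumptions, based on the newline-delimited JSON stats lines that recent releases print; adjust the field names if your log output differs.

```python
#!/usr/bin/env python3
"""Flag a stalled crawl: read browsertrix-crawler log lines from stdin
and warn when the reported 'crawled' count has not increased for
STALL_SECS. Assumes stats lines are newline-delimited JSON carrying a
details.crawled field; adjust to match your actual log output."""
import json
import sys
import time

STALL_SECS = 300  # how long the crawled count may sit still before warning

last_count = -1
last_change = time.monotonic()

for line in sys.stdin:
    try:
        rec = json.loads(line)
    except ValueError:
        continue  # skip non-JSON lines (e.g. docker log prefixes)
    details = rec.get("details")
    if not isinstance(details, dict) or "crawled" not in details:
        continue
    count = details["crawled"]
    now = time.monotonic()
    if count != last_count:
        last_count, last_change = count, now
    elif now - last_change > STALL_SECS:
        print(f"possible stall: crawled={count} unchanged for "
              f"{int(now - last_change)}s", file=sys.stderr)
        last_change = now  # re-arm so we warn at most once per interval
```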
Docker Compose:
```yaml
services:
  browsertrix-crawler:
    environment:
      - HTTP_PROXY=http://100.100.2.2:19999
      - HTTPS_PROXY=http://100.100.2.2:19999
    command:
      - crawl
      - --seeds=https://scp-wiki-cn.wikidot.com
      - --generateWACZ
      - --workers=32
      - --blockAds
      #- --waitUntil=networkidle2
      #- --proxyServer=http://100.100.2.2:19999
      - --scopeType=prefix
    image: webrecorder/browsertrix-crawler:1.6.1
    volumes:
      - ./crawls:/crawls/
```
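With this compose file running (`docker compose up`), the live log stream can be piped into the sketch above, e.g. `docker compose logs -f browsertrix-crawler | python3 stall_watch.py`, to get a timestamped warning as soon as the crawled count flatlines.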