
Crawler Stuck with "Direct fetch of page URL timed out" Errors #832

@MCSeekeri

Description


When using browsertrix-crawler to crawl a specific website, the crawler appears to hang after reaching a certain number of pages (around 7K-8K in my case).

The process keeps printing crawl statistics and repeated Direct fetch of page URL timed out messages, but no new pages are crawled and the "crawled" count in the statistics stops increasing, so the entire crawl is effectively stuck at that point.

This issue seems similar to #780, but I have encountered it in both version 1.5.7 and the latest version 1.6.1.
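Since the crawler keeps logging Direct fetch of page URL timed out, the forward proxy may be a factor. As a basic sanity check (assuming the proxy at 100.100.2.2:19999 from the compose file below is reachable from the host running the crawler), the seed can be fetched through it directly:

curl -x http://100.100.2.2:19999 -sS -o /dev/null --max-time 30 -w '%{http_code}\n' https://scp-wiki-cn.wikidot.com/

A 200 here only confirms the proxy still answers a single request; it does not rule out the proxy stalling under the load of 32 concurrent workers.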

Docker Compose:

services:
  browsertrix-crawler:
    environment:
      - HTTP_PROXY=http://100.100.2.2:19999
      - HTTPS_PROXY=http://100.100.2.2:19999
    command:
      - crawl
      - --seeds=https://scp-wiki-cn.wikidot.com
      - --generateWACZ
      - --workers=32
      - --blockAds
      #- --waitUntil=networkidle2
      #- --proxyServer=http://100.100.2.2:19999
      - --scopeType=prefix
    image: webrecorder/browsertrix-crawler:1.6.1
    volumes:
      - ./crawls:/crawls/
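For reference, the crawl is started with the standard compose workflow (assuming the file above is saved as docker-compose.yml in the working directory):

docker compose up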

tail.log
