Timsort comparison error for specific robots.txt URL #86

@anjackson

From DC

Nov 28, 2022 9:48:29 AM org.archive.modules.CrawlURI getPolitenessDelay
WARNING: politessDelay unset, returning default 5000 for https://www.english.op.org/robots.txt (in thread 'ToeThread #47: https://www.english.op.org/robots.txt')
Nov 28, 2022 9:48:35 AM org.archive.crawler.framework.ToeThread recoverableProblem
SEVERE: Problem java.lang.IllegalArgumentException: Comparison method violates its general contract! occurred when trying to process 'https://www.english.op.org/robots.txt' at step ABOUT_TO_BEGIN_PROCESSOR in 
 (in thread 'ToeThread #498: https://www.english.op.org/robots.txt')
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.util.TimSort.mergeHi(TimSort.java:899)
	at java.util.TimSort.mergeAt(TimSort.java:516)
	at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
	at java.util.TimSort.sort(TimSort.java:254)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1464)
	at java.util.Collections.sort(Collections.java:177)
	at org.apache.http.impl.cookie.RFC6265CookieSpec.formatCookies(RFC6265CookieSpec.java:217)
	at org.apache.http.client.protocol.RequestAddCookies.process(RequestAddCookies.java:187)
	at org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:133)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at org.archive.modules.fetcher.FetchHTTPRequest.execute(FetchHTTPRequest.java:823)
	at org.archive.modules.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:679)
	at org.archive.modules.Processor.innerProcessResult(Processor.java:175)
	at org.archive.modules.Processor.process(Processor.java:142)
	at org.archive.modules.ProcessorChain.process(ProcessorChain.java:131)
	at org.archive.crawler.framework.ToeThread.run(ToeThread.java:147)

The content of that robots.txt (as seen in my web browser) appears to be:

# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow:

Sitemap: https://www.english.op.org/sitemap_index.xml
# ---------------------------
# END YOAST BLOCK
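
For context on the exception itself: the trace shows it being raised inside `Collections.sort`, called from `RFC6265CookieSpec.formatCookies`, so it concerns the cookies being attached to the request rather than the robots.txt body. TimSort throws it when the comparator gives inconsistent answers, or when the values being compared change while the sort is running. The cause in this crawl is not established here. As a minimal sketch of what the contract means, the hypothetical helper below (`findViolation`, not part of Heritrix or HttpClient) brute-force checks a comparator for antisymmetry and transitivity over a small sample:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ComparatorContractCheck {

    /**
     * Hypothetical helper: brute-force check of two comparator contract rules
     * (antisymmetry and transitivity) over every pair/triple in the sample.
     * Returns a description of the first violation found, if any.
     */
    public static <T> Optional<String> findViolation(List<T> items, Comparator<? super T> cmp) {
        for (T a : items) {
            for (T b : items) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                // Antisymmetry: sgn(compare(a, b)) must equal -sgn(compare(b, a)).
                if (ab != -ba) {
                    return Optional.of("antisymmetry: " + a + " vs " + b);
                }
                for (T c : items) {
                    // Transitivity: a > b and b > c must imply a > c.
                    if (ab > 0 && cmp.compare(b, c) > 0 && cmp.compare(a, c) <= 0) {
                        return Optional.of("transitivity: " + a + " > " + b + " > " + c);
                    }
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(3, 1, 2);
        Comparator<Integer> sound = Integer::compare;
        // Broken on purpose: never returns a negative value,
        // so compare(a, b) and compare(b, a) can both be 1.
        Comparator<Integer> broken = (a, b) -> a.equals(b) ? 0 : 1;

        System.out.println(findViolation(sample, sound).isPresent());  // false
        System.out.println(findViolation(sample, broken).isPresent()); // true
    }
}
```

A check like this only exercises a comparator in isolation. If the comparator is sound but the sorted objects are modified by another thread during the sort, the same exception can appear, and this sketch would not detect that.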
