Commit 0af627f

docs: additional tweaks to docs for 'list of pages'
- link to 'list-of-pages' anchor to explain the direct entry and seed list upload option
- tweak the explanation under list of pages to cover the two options
- fix link to docs to include trailing slash before anchor to avoid redirect
- follow-up to #2792
1 parent ebfe36a commit 0af627f

2 files changed, +17 -7 lines changed


frontend/docs/docs/user-guide/workflow-setup.md

Lines changed: 14 additions & 4 deletions
@@ -38,12 +38,20 @@ _Site Crawl_
 `Single Page`
 : Crawls a single URL and does not include any linked pages.
 
-`List of Pages`
-: Crawls only specified URLs and does not include any linked pages (unless [_Include Any Linked Page_](#include-any-linked-page) is enabled). Each URL must be entered on its own line. URLs can be entered directly into the designated text area or uploaded as a text file. These options cannot be combined in a single workflow.
+`List of Pages` <a name="list-of-pages"></a>
+: Crawls a list of specified URLs.
 
-Up to 100 URLs can be entered into the text area. If you paste a list of over 100 URLs, Browsertrix will automatically convert the list into a text file and attach it to the workflow. Text files can be viewed and deleted from within the workflow, but cannot be edited in place.
+Select one of two options to provide a list of URLs:
+
+*Enter URLs* - If the list is small enough, 100 URLs or less, the URLs can be entered directly into the text area. If a large list is pasted into the textbox, it will be converted into an uploaded URL list and attached to the workflow.
+
+*Upload URL List* - A longer list of URLs can be provided as a text file, containing one URL per line. The text file may not exceed 25MB, but there is no limit to the number of URLs in the file. Once a file is added, a link will be provided to view the file (but not edit it). To change the file, a new file can be uploaded in its place.
 
-Ensure each URL is on its own line so the crawler can queue all provided URLs for crawling. It will continue queuing until it reaches either the organization's pages per crawl limit or the crawl workflow's page limit. Once one of these limits is hit, it will stop queuing additional URLs. Duplicate URLs will be queued only once, while invalid URLs will be skipped and not queued at all. The crawl will fail if the list contains no valid URLs or if there is a file formatting error.
+For both options, each line should contain a valid URL (starting with https:// or http://). Invalid or duplicate URLs will be skipped. The crawl will fail if the list contains no valid URLs or if the file is not a list of URLs.
+
+While the uploaded text file can contain an unlimited number of URLs, the crawl will still be limited by the [page limit](#max-pages) for the workflow or organization - URLs beyond the limit will not be crawled.
+
+If both a list of entered list and an uploaded file are provided, the currently selected option will be used.
 
 `In-Page Links`
 : Crawls only the specified URL and treats linked sections of the page as distinct pages.
@@ -70,6 +78,8 @@ _Site Crawl_
 
 One or more URLs of the page to crawl. URLs must follow [valid URL syntax](https://www.w3.org/Addressing/URL/url-spec.html). For example, if you're crawling a page that can be accessed on the public internet, your URL should start with `http://` or `https://`.
 
+See [List Of Pages](#list-of-pages) for additional info when providing a list of URLs.
+
 ??? example "Crawling with HTTP basic auth"
 
 All crawl scopes support [HTTP Basic Auth](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication) which can be provided as part of the URL, for example: `https://username:password@example.com`.
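
The URL list rules added to the docs above (one URL per line, `http://` or `https://` only, invalid and duplicate entries skipped, failure when nothing valid remains, and a page limit cap) can be illustrated with a small sketch. This is not Browsertrix's implementation: `parseSeedList`, `SeedListResult`, and the `pageLimit` parameter are hypothetical names, and both ignoring blank lines and treating two URLs as duplicates when their normalized `href` values match are assumptions made for the example.

```typescript
// Hypothetical helper illustrating the documented rules; not taken from Browsertrix.
interface SeedListResult {
  urls: string[]; // valid, de-duplicated URLs, capped at pageLimit
  skipped: string[]; // lines dropped as invalid or duplicate
}

function parseSeedList(
  text: string,
  pageLimit: number = Infinity,
): SeedListResult {
  const seen = new Set<string>();
  const urls: string[] = [];
  const skipped: string[] = [];

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // assumption: blank lines are ignored

    let parsed: URL;
    try {
      parsed = new URL(line); // throws on input that is not an absolute URL
    } catch {
      skipped.push(line);
      continue;
    }

    // Only http:// and https:// URLs count as valid.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      skipped.push(line);
      continue;
    }

    // Assumption: duplicates are detected on the normalized href.
    if (seen.has(parsed.href)) {
      skipped.push(line);
      continue;
    }

    seen.add(parsed.href);
    urls.push(parsed.href);
  }

  if (urls.length === 0) {
    throw new Error("URL list contains no valid URLs");
  }

  // URLs beyond the page limit are not crawled.
  return { urls: urls.slice(0, pageLimit), skipped };
}
```

Collecting skipped lines separately, rather than failing on the first bad line, mirrors the documented behavior that only an entirely invalid list causes a failure.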

frontend/src/features/crawl-workflows/workflow-editor.ts

Lines changed: 3 additions & 3 deletions
@@ -1086,14 +1086,14 @@ export class WorkflowEditor extends BtrixElement {
 this.formState.seedListFormat === SeedListFormat.File
 ? html`${fileAdditionalInfo}
 ${this.renderUserGuideLink({
-hash: "page-urls",
+hash: "list-of-pages",
 content: msg("Read more about URL list files"),
 })}.`
 : html`${infoTextFor["urlList"]}
 <br />
 ${jsonAdditionalInfo},
 ${this.renderUserGuideLink({
-hash: "page-urls",
+hash: "list-of-pages",
 content: msg("upload a URL list file"),
 })}.`,
 )}
@@ -2249,7 +2249,7 @@ https://archiveweb.page/images/${"logo.svg"}`}
 hash: string;
 content: string;
 }) {
-const path = `workflow-setup#${hash}`;
+const path = `workflow-setup/#${hash}`;
 
 return html`<a
 href="${this.docsUrl}user-guide/${path}"
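
The trailing-slash fix in the last hunk can be shown in isolation. The sketch below is a standalone illustration, not the component's actual method: `userGuideHref` is a hypothetical function, and the docs base URL in the usage note is a placeholder. It assumes, as the commit message suggests, that the docs site serves each page at a path ending in `/`, so a link without the slash costs an extra redirect before the anchor is resolved.

```typescript
// Hypothetical standalone version of the link-building logic; names are illustrative.
function userGuideHref(docsUrl: string, page: string, hash?: string): string {
  // Normalize the base so it always ends with exactly one slash.
  const base = docsUrl.endsWith("/") ? docsUrl : `${docsUrl}/`;
  // Pages are assumed to be served as directories, so end the path with "/"
  // before appending the anchor to avoid a redirect.
  const pagePath = page.endsWith("/") ? page : `${page}/`;
  const anchor = hash ? `#${hash}` : "";
  return `${base}user-guide/${pagePath}${anchor}`;
}
```

For example, `userGuideHref("https://docs.example.org/", "workflow-setup", "list-of-pages")` returns `https://docs.example.org/user-guide/workflow-setup/#list-of-pages`.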
