diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e8492dd1..28a6f73f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -24,7 +24,7 @@ Git can be counterintuitive, and [GitHub Desktop](https://desktop.github.com/) o
 
 VSCode has [a guide to source control](https://code.visualstudio.com/docs/sourcecontrol/overview), and it has [an extension for working with GitHub](https://marketplace.visualstudio.com/items?itemName=GitHub.vscode-pull-request-github) which you may also find convenient.
 
-In addition to the tools listed in the basic installation instructions in the main [README](./README.md), you can install [`pre-commit`](https://pre-commit.com/) in order to check and verify your work before submitting it.
+In addition to the tools listed in the basic [installation instructions](./INSTALLATION.md), you can install [`pre-commit`](https://pre-commit.com/) in order to check and verify your work before submitting it.
 
 ## Contributing Code
 
diff --git a/PUBLISHING.md b/PUBLISHING.md
index 72bd9a26..af4b8f61 100644
--- a/PUBLISHING.md
+++ b/PUBLISHING.md
@@ -10,27 +10,23 @@ Instructions on using the scripts `launcher` and `uploader` are in the file [Usa
 
 Just use `uploader` (especially if you have multiple wikis): the script takes the filename of a list of wikis as argument and uploads their dumps to archive.org. You only need to:
 
-- Check the 7z compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
-- [Retrieve your S3 keys](http://www.archive.org/account/s3.php), save them one per line (in the order provided) in a keys.txt file in same directory as `uploader`.
-- Run the script `uploader listfile`.
+* Check that the 7z-compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
+* [Retrieve your S3 keys](http://www.archive.org/account/s3.php) and save them one per line (in the order provided) in a `keys.txt` file in the same directory as `uploader`.
+* Run the script `uploader listfile`.
 
 ## Manual publishing
 
-- After running dumpgenerator, in each dump folder, select all files, right-click on the selection, click 7-Zip, click `Add to archive...` and click OK.
-- At Archive.org, for each wiki [create a new item](http://archive.org/create/).
-- Click `Upload files`. Then either drag and drop the 7-Zip archive onto the box or click `Choose files` and select the 7-Zip archive.
-- `Page Title` and `Page URL` will be filled in by the uploader.
-- Add a short `Description`, such as a descriptive name fopr the wiki.
-- Add `Subject Tags`, separated by commas, these are the keywords that will help the archive to show up in a Internet Archive search, e.g. wikiteam,wiki,subjects of the wiki, and so on.
-- `Creator`, can be left blank.
-- `Date`, can be left blank.
-- `Collection`, select `Community texts`.
-- `Language`, select the language of the wiki.
-- `License`, click to expand and select Creative Commons, Allow Remixing, Require Share-Alike for a CC-BY-SA licence.
-- Click `Upload and Create Your Item`.
+* After running dumpgenerator, in each dump folder, select all files, right-click on the selection, click 7-Zip, click `Add to archive...` and click OK.
+* At Archive.org, for each wiki [create a new item](http://archive.org/create/).
+* Click `Upload files`. Then either drag and drop the 7-Zip archive onto the box or click `Choose files` and select the 7-Zip archive.
+* `Page Title` and `Page URL` will be filled in by the uploader.
+* Add a short `Description`, such as a descriptive name for the wiki.
+* Add `Subject Tags`, separated by commas; these are the keywords that will help the archive show up in an Internet Archive search, e.g. wikiteam,wiki,subjects of the wiki, and so on.
+* `Creator` can be left blank.
+* `Date` can be left blank.
+* `Collection`, select `Community texts`.
+* `Language`, select the language of the wiki.
+* `License`, click to expand and select Creative Commons, Allow Remixing, Require Share-Alike for a CC-BY-SA licence.
+* Click `Upload and Create Your Item`.
 
 With the subject tag of wikiteam and collection of community texts, your uploads should appear in a search for [subject:"wikiteam" AND collection:opensource](https://archive.org/search?query=subject%3A%22wikiteam%22+AND+collection%3Aopensource).
-
-## Info for developers
-
-- [Internet Archive’s S3 like server API](https://archive.org/developers/ias3.html).
diff --git a/README.md b/README.md
index 9cb6bae7..91d47f80 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,14 @@ For prerequisites and installation see [Installation](./INSTALLATION.md)
 
 For usage see [Usage](./USAGE.md)
 
+## Types of dump
+
+There are two types of backup that can be made: XML dumps (current and history) and image dumps. Both can be produced by a single dump.
+
+An XML dump contains the metadata of the edits (author, date, comment) and the wikitext. An XML dump may be "current" or "history". A "history" dump contains the complete history of every page, which is better for CC-BY-SA licensing and is the default. A "current" dump contains only the last edit for every page.
+
+An image dump contains all the images available in a wiki, plus their descriptions.
+
 ## Publishing the dump
 
 Please consider publishing your wiki dump(s). You can do it yourself as explained in [Publishing](./PUBLISHING.md).
@@ -31,7 +39,12 @@ Please consider publishing your wiki dump(s). You can do it yourself as explaine
 
 ## Contributing
 
-For information on reporting bugs and proposing changes, please see the [Contributing](./Contributing.md) guide.
+For information on reporting bugs and proposing changes, please see the [Contributing](./CONTRIBUTING.md) guide.
+
+### Info for developers
+
+* [MediaWiki Action API](https://www.mediawiki.org/wiki/API:Main_page)
+* [The Internet Archive Python Library](https://archive.org/developers/internetarchive/)
 
 ## Code of Conduct
 
diff --git a/USAGE.md b/USAGE.md
index 50ab0ff7..80faf150 100644
--- a/USAGE.md
+++ b/USAGE.md
@@ -105,13 +105,17 @@ Each wiki will be stored into files contiaining a stripped version of the url an
 By default, a `7z` executable is found on `PATH`. The `--7z-path` argument can be used to use a specific executable instead.
 
 The `--generator-arg` or `-g` argument can be used on the command line to pass through arguments to the `generator` instances that are spawned. For example:
-- `--generator-arg=--xmlrevisions` to use the modern MediaWiki API for retrieving revisions
-- `--generator-arg=--delay=2` to use a delay of 2 seconds between requests
-- `-g=--user -g=USER -g=--pass -g=PASSWORD` to dump a wiki that only logged in users can read
+* `--generator-arg=--xmlrevisions` to use the modern MediaWiki API for retrieving revisions
+* `--generator-arg=--delay=2` to use a delay of 2 seconds between requests
+* `-g=--user -g=USER -g=--pass -g=PASSWORD` to dump a wiki that only logged-in users can read
 
 ## `Uploader`
 
-The script `uploader` is a way to upload a set of already-generated wiki dumps to the Internet Archive with a single invocation.
+The script `uploader` is a way to upload a set of already-generated wiki dumps to the Internet Archive with a single invocation. The script takes the filename of a list of wikis as an argument and uploads their dumps to archive.org. You only need to:
+
+* Check that the 7z-compressed dumps are in the same directory as `listfile`. The file `listfile` contains a list of the api.php URLs of the wikis to upload, one per line.
+* [Retrieve your S3 keys](http://www.archive.org/account/s3.php) and save them one per line (in the order provided) in a `keys.txt` file in the same directory as `uploader`.
+* Run the script `uploader listfile`.
 
 Usage:
 
@@ -119,10 +123,6 @@ Usage:
 uploader [-pd] [-pw] [-a] [-c COLLECTION] [-wd WIKIDUMP_DIR] [-u] [-kf KEYSFILE] [-lf LOGFILE] listfile
 ```
 
-For the positional parameter `listfile`, `uploader` expects a path to a file that contains a list of URLs to `api.php`s of wikis, one on each line (exactly the same as `launcher`).
-
-`uploader` will search a configurable directory for files with the names generated by `launcher` and upload any that it finds to an Internet Archive item. The item will be created if it does not already exist.
-
 Named arguments (short and long versions):
 
 * `-pd`, `--prune_directories`: After uploading, remove the raw directory generated by `launcher`
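Both the `uploader` instructions and the manual publishing steps in this change come down to three inputs: a `listfile` of api.php URLs (one per line), a `keys.txt` file holding the S3 keys one per line, and a few item metadata fields (description, subject tags, collection, language, licence). The following is a minimal Python sketch of those pieces, not how `uploader` is implemented. It assumes the `internetarchive` package (the Internet Archive Python Library) is installed for the actual upload, and that "the order provided" on the S3 keys page means the access key first and the secret key second. The function names, the item identifier and the metadata mapping are hypothetical; `opensource` is the collection identifier used in the wikiteam search link for Community texts.

```python
from pathlib import Path


def read_keys(keys_path):
    """Return (access_key, secret_key) from a keys file.

    Assumption: the access key is on the first non-blank line and the
    secret key on the second, matching the order shown on the S3 keys page.
    """
    lines = [ln.strip() for ln in Path(keys_path).read_text().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError(f"{keys_path}: expected access and secret keys on separate lines")
    return lines[0], lines[1]


def read_listfile(list_path):
    """Return the api.php URLs listed one per line, skipping blank lines."""
    return [ln.strip() for ln in Path(list_path).read_text().splitlines() if ln.strip()]


def build_metadata(description, subjects, language, licenseurl=None):
    """Map the manual-publishing form fields onto Internet Archive metadata.

    'wikiteam' is always placed first among the subject tags so the item
    shows up in the subject:"wikiteam" search; 'opensource' is the
    Community texts collection. The licence URL is optional here.
    """
    tags = ["wikiteam"] + [s for s in subjects if s != "wikiteam"]
    metadata = {
        "description": description,
        "subject": tags,
        "collection": "opensource",
        "language": language,
    }
    if licenseurl:
        metadata["licenseurl"] = licenseurl
    return metadata


def upload_dump(identifier, archive_path, metadata, keys_path):
    """Upload one 7z archive to the (hypothetical) item `identifier`.

    Needs network access and `pip install internetarchive`.
    """
    import internetarchive  # imported lazily so the helpers above work without it

    access_key, secret_key = read_keys(keys_path)
    return internetarchive.upload(
        identifier,
        files=[str(archive_path)],
        metadata=metadata,
        access_key=access_key,
        secret_key=secret_key,
    )
```

For example, `build_metadata("Example Wiki dump", ["wiki"], "English")` yields subject tags `["wikiteam", "wiki"]` in the `opensource` collection; that dictionary can then be passed to `upload_dump` together with an item identifier of your choosing. Keeping parsing and metadata separate from the network call makes the first three helpers easy to check locally before anything is uploaded.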