Add backup and restore-backup scripts #382
base: development
Conversation
What: Adds scripts to back up the data from your Hubs instance to your local hard drive and to restore a backup to your instance.
Why: This will allow you to keep one or more local copies of your data, restore your data to your instance if needed, and migrate all your data from one instance to another, e.g. when moving from one hosting company to another.
Note: This will be needed to migrate the data from the persistent volumes on your node to persistent volumes that are completely separate from the node (PR Hubs-Foundation#363).
If no other app is using the load balancer used by Hubs, changing the DNS address is fine (though it doesn't take effect immediately, nor can it be reversed immediately, due to DNS caching). A more focused approach is to add this annotation to the Ingress: haproxy.org/allow-list: 11.22.33.44, where 11.22.33.44 is the external IP address of your development machine. It takes effect immediately, can be reversed immediately, and allows you access to your Hubs instance while denying all others. Editing the ingress can be done with the command kubectl edit ingress reticulum -n hcce, and similarly for dialog and nearspark. To restore access, it's probably best to re-apply the template.
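For reference, the same annotation can be applied non-interactively with `kubectl annotate` instead of opening an editor. This is a minimal sketch: the IP address is a placeholder, and the `hcce` namespace and the reticulum/dialog/nearspark ingress names are taken from this thread.

```shell
#!/bin/bash
# Sketch: restrict the Hubs ingresses to a single client IP while testing.
# The address below is a placeholder; use your development machine's IP.
ALLOW_IP="11.22.33.44"

# Basic sanity check on the dotted-quad format before touching the cluster.
valid_ip() {
  echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

if ! valid_ip "$ALLOW_IP"; then
  echo "not an IPv4 address: $ALLOW_IP" >&2
elif command -v kubectl >/dev/null 2>&1; then
  # --overwrite lets you change the address later without an error.
  for ing in reticulum dialog nearspark; do
    kubectl annotate ingress "$ing" -n hcce \
      "haproxy.org/allow-list=$ALLOW_IP" --overwrite
  done
fi
```

Removing the annotation later (`kubectl annotate ingress reticulum -n hcce haproxy.org/allow-list-`) restores access for everyone.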
True, I didn't think about other apps using the load balancer. So yes, disabling the DNS is not a great solution :(
I tried adding it. These are the commands I used to edit the ingresses:
Also, reapplying hcce.yaml didn't seem to remove the edits (running those three commands again showed the additions still there). Am I misunderstanding the procedure you're suggesting, or is there something more I need to do to apply the edits? (I closed the editor and saw this in the terminal:
You're correct that reapplying the template after manually editing the ingress doesn't reset it. The correct procedure is to add the annotation to the template file and apply it. To revert, one then comments out the annotation and re-applies the template. It's possible that the annotation wasn't supported by the beta version of the HAProxy ingress controller that Hubs normally uses (I'm using the latest). You might try haproxy.org/request-redirect: example.com
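For concreteness, the annotation in the template might look like the following. This is a hypothetical excerpt, assuming the standard `networking.k8s.io/v1` Ingress schema and the ingress names used elsewhere in this thread.

```yaml
# Hypothetical excerpt from hcce.yaml. Comment the annotation out and
# re-apply the template to restore normal access.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reticulum
  namespace: hcce
  annotations:
    haproxy.org/allow-list: "11.22.33.44"  # your development machine's IP
```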
As I'm using an external database, I can't fully test these. I have to comment out the pgsql bits, and I'm hesitant to restore only the reticulum files. That said, the code looks fine, and the backup script did create a copy of the reticulum files for me.
What: Adds a primitive maintenance mode to the restore-backup script. This is applied at the beginning: it sets HAProxy to redirect traffic to a non-existent maintenance-mode subdomain, then restarts the instance to disconnect anyone present on the instance and prevent anyone new from joining. The redirects are removed and the instance is returned to normal once the restore is finished.
Why: So that people can't interrupt/corrupt the restore by modifying data on the instance while the restore is happening.
Note: At present the maintenance mode isn't a real page, so it's not all that pretty, and you won't be redirected back to your previous page once the restore is finished (even if you reload), but it gets the job done. Ideally, these faults should be addressed at some point in the future.
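The enable/disable steps could be sketched like this. It is a non-authoritative sketch: the ingress names and `hcce` namespace come from this thread, the maintenance hostname is a placeholder, and the actual script may differ.

```shell
#!/bin/bash
# Crude maintenance mode: redirect all ingress traffic to a (non-existent)
# maintenance subdomain, then remove the redirect after the restore.
MAINT_HOST="maintenance.example.com"  # placeholder subdomain

enable_maintenance() {
  for ing in reticulum dialog nearspark; do
    kubectl annotate ingress "$ing" -n hcce \
      "haproxy.org/request-redirect=$MAINT_HOST" --overwrite
  done
}

disable_maintenance() {
  # A trailing "-" on the annotation key tells kubectl to remove it.
  for ing in reticulum dialog nearspark; do
    kubectl annotate ingress "$ing" -n hcce "haproxy.org/request-redirect-"
  done
}
```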
What: Prints headings for the general steps of the restore script and prints the command output to the terminal.
Why: This is a very involved and potentially long-running script, and the additional output should help reduce confusion as to whether the script is running normally or has gotten stuck.
Note: This implements behavior similar to the apply script, but that, at present, will only work with the main configuration file and not a secondary, temporary one. In the future, the code that applies a Kubernetes configuration and monitors the deployment status should potentially be further abstracted so it can work with any configuration file and only one version of the code is needed.
…g a backup
What: Explicitly specifies the Reticulum container in the Reticulum pod as the container to copy the data to, instead of relying on it being automatically selected by default.
Why: Reduces ambiguity and prevents bugs from cropping up in the future if anything changes and the Reticulum container is no longer the first container.
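As a sketch, the explicit container selection uses `kubectl cp`'s `-c`/`--container` flag. The pod name and paths here are illustrative placeholders; only the `-c reticulum` selection is the point being described.

```shell
# Copy backup data into the reticulum container specifically, rather than
# whichever container kubectl would pick by default.
restore_reticulum_files() {
  local pod="$1"  # e.g. from `kubectl get pods -n hcce`
  kubectl cp ./backup/reticulum_storage_data_folder \
    "hcce/${pod}:/storage" -c reticulum
}
```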
Thanks. That looks like it'll work. I've updated the PR.
Good point. If you think it would be an easy change to support backing up/restoring an external database as well, then it would be good to add that in. If not, then we should probably wait until we add official support for creating external database setups and add backup/restore support then.
Let's leave external DB support for another PR - it won't be trivial to add.
Okay. Sounds good.
@hobbs-Hobbler tested the restore-backup script on Windows with an old backup from a previous version of the scripts (but using the latest version of the restore-backup script) and everything appeared to work well (it was a non-standard setup, so the results should probably be interpreted with some reservation, but I'm optimistic that at least the restore-backup script should work correctly on Windows).
The restore-backup script should probably not copy back folders created by the OS, like these.
Why: An instance using an external database (https://hominidsoftware.com/tech-personal-growth/Hubs-Managed-Databse/Hubs-Managed-Database/) will not have a pgsql pod. Also, a damaged instance might not be running the pgsql pod. There is still value in backing up and/or restoring just the reticulum files. Also handles empty blocks in `hcce.yaml`, and extracts the IP addresses of all load balancers, as a modern ingress controller might not be in the `hcce` namespace.
Open question: backing up and restoring an external PostgreSQL database might or might not fit in these scripts.
I'm not familiar with these folders. Are they a Mac thing? The reason I'm looping over them is that we can't know exactly which folders will be present for a backup/restore, but if those will never be needed and can be reliably detected by the leading period and underscore, then I could add a guard to filter them out.
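If the leading-period/underscore heuristic turns out to be reliable, the guard might look like this. It is only a sketch: the name patterns are assumptions (macOS dotfiles like .DS_Store, AppleDouble "._*" companions, and the __MACOSX folder that zip extraction creates), not a vetted list.

```shell
#!/bin/bash
# Skip folders whose names mark them as OS helper artifacts
# rather than Hubs data.
is_os_artifact() {
  case "$1" in
    .*|__MACOSX) return 0 ;;  # dotfiles, AppleDouble files, zip residue
    *) return 1 ;;
  esac
}

# Demonstration against some sample folder names:
for dir in .DS_Store ._chunk owned-files hubs-files; do
  if is_os_artifact "$dir"; then
    echo "skip $dir"
  else
    echo "back up $dir"
  fi
done
```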
Why: If there is more than one load balancer in the cluster, the user needs to select the appropriate one.
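Listing the candidates could be done with a JSONPath query over all namespaces. This is a sketch, not necessarily what the script does; it assumes the load balancers expose an IP (rather than a hostname) in their service status.

```shell
# List every LoadBalancer service in the cluster with its external IP,
# so the user can pick the right one when more than one exists.
list_load_balancers() {
  kubectl get services --all-namespaces -o \
    jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.loadBalancer.ingress[0].ip}{"\n"}{end}'
}
```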
Backup and restore scripts now continue if the pgsql pod is missing.
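The guard presumably looks something like the following sketch. The `app=pgsql` label selector and the `hcce` namespace are assumptions based on this thread, not confirmed details of the script.

```shell
#!/bin/bash
# Continue with a files-only backup when no pgsql pod is present
# (external database, or a damaged instance).
pgsql_pod_present() {
  kubectl get pods -n hcce -l app=pgsql --no-headers 2>/dev/null | grep -q .
}

if pgsql_pod_present; then
  echo "pgsql pod found; including the database in the backup"
  # ... pg_dump / restore steps would go here ...
else
  echo "no pgsql pod; backing up reticulum files only"
fi
```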
What: Uses the "junk" package to remove any OS helper files/folders that were created in the Reticulum storage data before restoring the backup.
Why: Various user actions can result in the user's OS generating helper files/folders that aren't needed by Hubs, which increases the upload size and clutters up the restored reticulum data back on the Kubernetes storage.
Note: This encloses the entire restore-backup script in an async function in order to allow loading the "junk" package, which doesn't support require/CommonJS modules.
What: Passes an environment variable to the kubectl cp command to disable using websockets.
Why: Websockets are enabled by default in kubectl 1.30+, and this can cause transfers to fail and not retry. Disabling websockets avoids the issue.
References: GitHub issue with the documented workaround: kubernetes/kubernetes#60140 (comment). GitHub PR which introduced websockets as the default, with the note that it affects kubectl cp: kubernetes/kubernetes#123281 (comment).
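Per the linked issue, the opt-out is the `KUBECTL_REMOTE_COMMAND_WEBSOCKETS` environment variable. A sketch of its use, with the pod name and paths as illustrative placeholders:

```shell
# Per-invocation opt-out of the websocket transport that became the
# default for kubectl exec/cp in 1.30 (kubernetes/kubernetes#123281).
backup_reticulum_storage() {
  local pod="$1"  # placeholder: the reticulum pod name
  KUBECTL_REMOTE_COMMAND_WEBSOCKETS=false \
    kubectl cp "hcce/${pod}:/storage" \
    ./backup/reticulum_storage_data_folder -c reticulum
}
```

Setting the variable only for the one command keeps the rest of the script on kubectl's defaults, so the workaround disappears naturally once the upstream bug is fixed.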
What: Uses the "find" command in the Reticulum pod to remove the contents of the Reticulum storage directory on the Kubernetes cluster before restoring the contents of the local backup.
Why: To ensure a full restoration. kubectl cp merges the source directory into the destination directory, so depending on what's in the Reticulum storage on the Kubernetes cluster, there may be stuff left over from before the backup was applied that will remain if the Reticulum storage isn't cleared first, which would cause the final result to be different from the backup.
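The deletion itself can be demonstrated locally. In the script it would run inside the pod via `kubectl exec` (resource names below are illustrative, e.g. `kubectl exec -n hcce "$POD" -c reticulum -- find /storage -mindepth 1 -delete`):

```shell
#!/bin/bash
# Local demonstration of clearing a directory's contents with find.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub/dir"
touch "$tmp/file" "$tmp/sub/dir/nested"

# -mindepth 1 deletes the directory's contents (files and subdirectories)
# while leaving the storage directory itself in place.
find "$tmp" -mindepth 1 -delete

ls -A "$tmp"   # prints nothing: the directory is empty but still exists
rmdir "$tmp"
```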
Updates: I have updated this to integrate the "junk" npm package and remove all the OS files from the backup before uploading. I realized that the OS files could potentially be present in any of the subfolders as well as the main folder. Since I found that disabling the websockets worked reliably for me, I think this is a much better solution than auto-retrying, and it should hopefully just get phased out automatically when kubectl gets fixed (@DougReeder you were right at the dev meetup last week that there was a better way). I still think we want to keep it on infinite retries for
I have also updated the restore script to clear the Reticulum pod storage before uploading the backup in order to ensure a return to the exact state of the backup. I think these updates should make this about ready to merge (assuming no one finds any issues when reviewing/testing and I didn't miss any review comments), but it would be good to get as many people to test as we can (I'll see if I can get this tested at the documentation meetup this week).
Oh, and documentation for the Hubs docs for the backup/restore scripts has been written, but hasn't been put up as a PR yet. |
Also, sorry about the completely mangled diff for the
working fine for me
What?
Adds scripts to back up the data from your Hubs instance to your local hard drive and to restore a backup to your instance.
Why?
This will allow you to keep one or more local copies of your data, restore your data to your instance if needed, and migrate all your data from one instance to another, e.g. when moving from one hosting company to another.
Examples
Backup folder structure
Note: additional folders may be present in the reticulum_storage_data_folder and/or some may be omitted. This depends on the individual instance.
How to test
Documentation of functionality
Instructions have been added to the readme. A PR for further documentation is planned for the Hubs docs repository.
Limitations
This requires the pods to be running, so in order to prevent people from using the instance while backing up/restoring you'd need to remove the load balancer IP from your DNS A records (unless there's some other way to bar people from the instance while keeping the pods running that I'm unaware of).
UPDATE: This has been addressed for the restore script by introducing a crude maintenance mode. Thanks to the review comments for pointing me toward this solution.
Alternatives considered
Open questions
What happens if someone is saving something on the instance while a backup is being restored? Will the instance get corrupted?
UPDATE: No longer applicable after updates from the review - a crude maintenance mode has been introduced to prevent people being connected to the instance during the restore.
Is there a way to bar people from the instance, while keeping the pods running, that doesn't require you to edit your DNS A records?
UPDATE: Yes. See the update for the previous question.
Additional details or related context
This will be needed to migrate the data from the persistent volumes on your node to persistent volumes that are completely separate from the node (PR #363).