Backup and restore scripts now continue if pgsql pod is missing. #17
Why: An instance using an external database (https://hominidsoftware.com/tech-personal-growth/Hubs-Managed-Databse/Hubs-Managed-Database/) will not have a pgsql pod. Also, a damaged instance might not be running the pgsql pod. There is still value in backing up and/or restoring just the reticulum files. Also handles empty blocks in `hcce.yaml`. Also extracts the IP addresses of all load balancers, as a modern ingress controller might not be in the `hcce` namespace. Open question: backing up and restoring an external PostgreSQL database might or might not fit in these scripts.
With the fixes we talked about at the dev meeting, this is looking pretty good. Thanks.
There's one thing I left a comment on, and I still need to run some QA tests.
I'll comment again once I've done the QA tests.
I've done my QA tests. I found one bug, which is noted in the inline comment, but aside from that everything looks good. Thanks.
Note for anyone looking back at this PR (this is just information and isn't part of the review):
To test the restore script without the pgsql pod, I had to modify the script: after maintenance mode is applied, I added a couple of lines that scale the pgsql pod down again and wait 10 seconds, so the pod has time to finish scaling down. Applying maintenance mode automatically brings the pgsql pod back up, so without this change the script looks for the database dump and fails when it can't find it.
Why: If there is more than one load balancer in the cluster, the user needs to select the appropriate one.
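A minimal sketch of how load balancers from every namespace could be listed for the user to choose from. It assumes the script has already parsed the JSON output of `kubectl get services --all-namespaces -o json`. The function name `loadBalancerAddresses`, the service name, and the sample data are hypothetical and not taken from the actual scripts.

```javascript
// Hypothetical helper, not the scripts' actual code.
// `serviceList` is the parsed output of
// `kubectl get services --all-namespaces -o json`.
function loadBalancerAddresses(serviceList) {
  const results = [];
  for (const svc of (serviceList && serviceList.items) || []) {
    // Only services of type LoadBalancer carry an external address.
    if (!svc.spec || svc.spec.type !== "LoadBalancer") continue;
    const ingress =
      (svc.status && svc.status.loadBalancer && svc.status.loadBalancer.ingress) || [];
    for (const entry of ingress) {
      // Some providers report a hostname instead of an IP; those are skipped here.
      if (entry.ip) {
        results.push({
          namespace: svc.metadata.namespace,
          name: svc.metadata.name,
          ip: entry.ip,
        });
      }
    }
  }
  return results;
}

// Example input: one LoadBalancer outside the `hcce` namespace, one ClusterIP service.
const sampleServices = {
  items: [
    {
      metadata: { namespace: "haproxy-controller", name: "example-ingress" },
      spec: { type: "LoadBalancer" },
      status: { loadBalancer: { ingress: [{ ip: "203.0.113.10" }] } },
    },
    {
      metadata: { namespace: "hcce", name: "ret" },
      spec: { type: "ClusterIP" },
      status: { loadBalancer: {} },
    },
  ],
};

for (const lb of loadBalancerAddresses(sampleServices)) {
  console.log(`${lb.namespace}/${lb.name}: ${lb.ip}`);
}
```

Returning the namespace and service name alongside each IP lets the user tell the load balancers apart when more than one is present.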
LGTM. Thanks. Merging.
What?
The backup and restore scripts will, when the pgsql pod does not exist, log a message to stderr and continue backing up or restoring the reticulum files.
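The behavior described above can be sketched roughly as follows. This is an illustration, not the scripts' actual code: the helper names `findPgsqlPod` and `planBackupSteps` are hypothetical, and it assumes the pod name starts with `pgsql` and that the pod list comes from parsing `kubectl get pods -n hcce -o json`.

```javascript
// Hypothetical helpers, not the scripts' actual code.
// `podList` is the parsed output of `kubectl get pods -n hcce -o json`.
function findPgsqlPod(podList) {
  const items = (podList && podList.items) || [];
  // Assumption: the pod created by the `pgsql` deployment has a name starting with "pgsql".
  const pod = items.find(
    (p) => p.metadata && typeof p.metadata.name === "string" && p.metadata.name.startsWith("pgsql")
  );
  return pod ? pod.metadata.name : null;
}

// Decide which steps to run. A missing pod is reported through `log`
// (stderr by default) instead of aborting the whole run.
function planBackupSteps(podList, log = console.error) {
  const steps = ["reticulum"];
  const pgsqlPod = findPgsqlPod(podList);
  if (pgsqlPod === null) {
    log("pgsql pod not found");
  } else {
    steps.push("pgsql");
  }
  return { pgsqlPod, steps };
}

// Example: an instance with an external database has no pgsql pod.
const plan = planBackupSteps({
  items: [{ metadata: { name: "reticulum-c7c6f67bd-6jhs8" } }],
});
console.log(plan.steps.join(", "));
```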
Also handles empty resource blocks in `hcce.yaml`.
Also extracts the IP addresses of all load balancers.
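As an illustration of the empty-block handling, here is one way a multi-document YAML file could be split so that blocks containing only blank lines or comments are skipped. This is a sketch under assumptions: the function name `nonEmptyYamlDocuments` is hypothetical, the actual scripts may rely on a YAML parser instead, and separator lines carrying trailing content (such as `--- # comment`) are not handled.

```javascript
// Hypothetical helper, not the scripts' actual code.
// Splits multi-document YAML text on `---` separator lines and drops
// documents that contain nothing but blank lines or comments.
function nonEmptyYamlDocuments(yamlText) {
  return yamlText
    .split(/^---[ \t\r]*$/m)
    .map((doc) => doc.trim())
    .filter((doc) =>
      doc.split("\n").some((line) => {
        const t = line.trim();
        return t !== "" && !t.startsWith("#");
      })
    );
}

// Example: the second block is fully commented out and the third is empty.
const sampleYaml = [
  "kind: Namespace",
  "---",
  "# kind: Secret",
  "# name: configs",
  "---",
  "",
  "---",
  "kind: Service",
  "",
].join("\n");

console.log(nonEmptyYamlDocuments(sampleYaml).length);
```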
Why?
An instance using an external database (https://hominidsoftware.com/tech-personal-growth/Hubs-Managed-Databse/Hubs-Managed-Database/) will not have a pgsql pod.
Also, a damaged instance might not be running the pgsql pod.
There is still value in backing up and/or restoring just the reticulum files.
A modern ingress controller might not be in the `hcce` namespace.
Examples
On hubs.hominidsoftware.com, which has an external database (and so lacks the pgsql pod), uses the current version of the HAProxy ingress controller in the namespace `haproxy-controller` (so the LoadBalancer service is also in that namespace), and has a section of `hcce.yaml` with everything between two sets of `---` commented out:
maintenance-mode-hcce.yaml file generated successfully.
applying maintenance mode
deployment.apps "coturn" deleted
deployment.apps "dialog" deleted
deployment.apps "hubs" deleted
deployment.apps "nearspark" deleted
deployment.apps "photomnemonic" deleted
deployment.apps "reticulum" deleted
deployment.apps "spoke" deleted
pod "coturn-74d6cdb5b4-c9x9z" deleted
pod "dialog-6f56f69c55-ztzv2" deleted
pod "hubs-85487ddc9c-ln8m4" deleted
pod "nearspark-795986bd6b-grc76" deleted
pod "photomnemonic-85ff5bf8d5-js8fv" deleted
pod "reticulum-c7c6f67bd-6jhs8" deleted
pod "spoke-7659644cc5-w97dq" deleted
namespace/hcce unchanged
secret/configs configured
persistentvolumeclaim/ret-pvc unchanged
ingress.networking.k8s.io/ret-modern configured
ingress.networking.k8s.io/dialog-modern configured
ingress.networking.k8s.io/nearspark-modern configured
configmap/ret-config unchanged
deployment.apps/reticulum created
service/ret unchanged
deployment.apps/hubs created
service/hubs unchanged
deployment.apps/spoke created
service/spoke unchanged
deployment.apps/nearspark created
service/nearspark unchanged
deployment.apps/photomnemonic created
service/photomnemonic unchanged
deployment.apps/dialog created
service/dialog unchanged
deployment.apps/coturn created
service/coturn unchanged
waiting on coturn, dialog, hubs, nearspark, photomnemonic, reticulum, spoke
waiting on coturn, dialog, photomnemonic, reticulum
waiting on reticulum
maintenance mode applied
pgsql pod not found
restoring backup
restoring Reticulum '._cached' folder
restoring Reticulum '._expiring' folder
restoring Reticulum '._owned' folder
restoring Reticulum '._storage' folder
restoring Reticulum 'cached' folder
restoring Reticulum 'expiring' folder
restoring Reticulum 'lost+found' folder
restoring Reticulum 'owned' folder
not restoring pgsql
restarting instance
deployment.apps "coturn" deleted
deployment.apps "dialog" deleted
deployment.apps "hubs" deleted
deployment.apps "nearspark" deleted
deployment.apps "photomnemonic" deleted
deployment.apps "reticulum" deleted
deployment.apps "spoke" deleted
pod "coturn-74d6cdb5b4-n4cqx" deleted
pod "dialog-6f56f69c55-xclvq" deleted
pod "hubs-85487ddc9c-6ldlw" deleted
pod "nearspark-795986bd6b-t49q7" deleted
pod "photomnemonic-85ff5bf8d5-56hsq" deleted
pod "reticulum-c7c6f67bd-flbq2" deleted
pod "spoke-7659644cc5-27z9m" deleted
namespace/hcce unchanged
secret/configs configured
persistentvolumeclaim/ret-pvc unchanged
ingress.networking.k8s.io/ret-modern configured
ingress.networking.k8s.io/dialog-modern configured
ingress.networking.k8s.io/nearspark-modern configured
configmap/ret-config unchanged
deployment.apps/reticulum created
service/ret unchanged
deployment.apps/hubs created
service/hubs unchanged
deployment.apps/spoke created
service/spoke unchanged
deployment.apps/nearspark created
service/nearspark unchanged
deployment.apps/photomnemonic created
service/photomnemonic unchanged
deployment.apps/dialog created
service/dialog unchanged
deployment.apps/coturn created
service/coturn unchanged
waiting on coturn, dialog, hubs, nearspark, photomnemonic, reticulum, spoke
waiting on coturn, reticulum
waiting on reticulum
all deployments ready
load balancer external IP address: 146.190.190.57
How to test
1. `kubectl scale deployment pgsql -n hcce --replicas=0`
2. `npm run backup`, observe that it creates files in community-edition/data_backups/data_backup_999999/reticulum_storage_data, but not pgsql files
3. `npm run restore-backup`, observe that it runs to completion
4. `kubectl scale deployment pgsql -n hcce --replicas=1`
Documentation of functionality
A paragraph has been added to the section of the readme on backup and restore. People running an external database presumably know that, and the script output should be clear.
Limitations
Backing up an external database must be done separately.
Open questions
Backing up and restoring an external PostgreSQL database might or might not fit in these scripts.