Implement RDS snapshot sanitizer #1
Merged
Changes from 3 commits

Commits (11):
- ebedec4 Implement RDS snapshot sanitizer (FurqanHabibi)
- c695812 Delete readme copy (FurqanHabibi)
- d289af5 Fix readme (FurqanHabibi)
- b57880c Add config example (FurqanHabibi)
- 026ccd2 Add local run to readme (FurqanHabibi)
- 11d4379 Try publishing (FurqanHabibi)
- f1328a1 Build multi-platform (FurqanHabibi)
- 4e02884 Build multi platform with multi runner (FurqanHabibi)
- daccd3c Fix GHA (FurqanHabibi)
- 2962ddc Fix IMAGE_NAME (FurqanHabibi)
- 97ecb38 Add buildx (FurqanHabibi)
@@ -0,0 +1,51 @@
name: Create and publish a Docker image

on:
  release:
    types: [published]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      attestations: write
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        id: push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v2
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME}}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true

@@ -0,0 +1,5 @@
.venv
.envrc

__pycache__/
hardcode.py

@@ -0,0 +1,7 @@
repos:
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.11.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

@@ -0,0 +1,22 @@
FROM python:3.13.2-alpine

WORKDIR /app

COPY poetry.lock pyproject.toml /app/
COPY src/ /app/src

RUN apk update \
    && apk upgrade --no-cache \
    && apk add --no-cache --virtual build-dependencies build-base curl \
    && pip install --no-cache-dir --upgrade pip setuptools wheel \
    && curl -sSL https://install.python-poetry.org | POETRY_HOME=/etc/poetry python - \
    && ln -s /etc/poetry/bin/poetry /usr/local/bin/poetry \
    && poetry run pip install --upgrade pip setuptools wheel \
    && MAKEFLAGS="-j" poetry install \
    && poetry run python -m compileall -j 0 src \
    && rm -rf /root/.cache/pip \
    && rm -rf /root/.cache/pypoetry/artifacts /root/.cache/pypoetry/cache \
    && rm -rf /etc/poetry/lib/poetry/_vendor/py3.13 \
    && apk del --no-cache build-dependencies

ENTRYPOINT ["poetry", "run", "sanitizer"]

@@ -1 +1,33 @@
# rds-snapshot-sanitizer

Create sanitized copies of RDS snapshots and share them with selected accounts.

It works by restoring an unsanitized snapshot to a temporary cluster and executing sanitizing SQL queries against it, after which a sanitized snapshot is created and optionally shared with other accounts.
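
The order of operations described above can be sketched as a small planning function. This is only an illustration of the sequence, not the tool's actual code: the function name, the step names, and the parameters are hypothetical, and the final cleanup step is an assumption drawn from the word "temporary".

```python
from typing import Iterable


def plan_sanitization(
    source_snapshot_id: str,
    temp_cluster_id: str,
    sanitized_snapshot_id: str,
    share_account_ids: Iterable[str] = (),
) -> list[tuple[str, dict]]:
    """Return the ordered steps as (step name, parameters) pairs without calling AWS."""
    account_ids = list(share_account_ids)
    steps = [
        # 1. Restore the unsanitized snapshot into a throwaway cluster.
        ("restore_snapshot_to_temporary_cluster",
         {"snapshot": source_snapshot_id, "cluster": temp_cluster_id}),
        # 2. Run the sanitizing SQL queries against that cluster.
        ("run_sanitizing_sql", {"cluster": temp_cluster_id}),
        # 3. Snapshot the now-sanitized cluster.
        ("create_sanitized_snapshot",
         {"cluster": temp_cluster_id, "snapshot": sanitized_snapshot_id}),
    ]
    if account_ids:
        # 4. Sharing is optional and only planned when accounts are given.
        steps.append(("share_snapshot",
                      {"snapshot": sanitized_snapshot_id, "accounts": account_ids}))
    # 5. Assumption: the temporary cluster is removed once the snapshot exists.
    steps.append(("delete_temporary_cluster", {"cluster": temp_cluster_id}))
    return steps


if __name__ == "__main__":
    for name, params in plan_sanitization("snap-1", "tmp-cluster", "snap-1-sanitized"):
        print(name, params)
```

Returning a plan instead of calling AWS directly keeps the sketch runnable without credentials; a real run would map each step to the corresponding RDS or SQL operation.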

# Environment variables
- `SANITIZER_RDS_CLUSTER_ID`: Identifier of the RDS cluster whose snapshots will be sanitized.
- `SANITIZER_CONFIG`: rds-snapshot-sanitizer configuration in JSON. See [Configuration](#configuration).
- `SANTITIZER_RDS_INSTANCE_ACU`: (Optional) ACU to be allocated for the temporary RDS instance. Defaults to 2 ACU.
- `SANITIZER_SQL_MAX_CONNECTIONS`: (Optional) Maximum number of connections to be created for executing the SQL queries. Defaults to 20.
- `SANITIZER_SHARE_KMS_KEY_ID`: (Optional) KMS key identifier to be used for the sanitized snapshot.
- `SANITIZER_SHARE_ACCOUNT_IDS`: (Optional) List of AWS account IDs to share the sanitized snapshot with.
- `SANITIZER_AWS_REGION`: (Optional) AWS region where the RDS cluster is hosted. Defaults to the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variable.
- `SANITIZER_DELETE_OLD_SNAPSHOTS`: (Optional) Whether to delete old snapshots. Defaults to False.
- `SANITIZER_OLD_SNAPSHOT_DAYS`: (Optional) Number of days after which a snapshot is considered old. Defaults to 30.
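
As a rough illustration of how the documented defaults and the region fallback could be resolved, here is a minimal sketch that reads a subset of the variables from an environment-like mapping. The `Settings` class and `load_settings` function are hypothetical and not taken from the tool's source. The comma-separated format for `SANITIZER_SHARE_ACCOUNT_IDS` and the accepted spellings of "true" are assumptions, since the README does not specify them.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    rds_cluster_id: str
    sql_max_connections: int
    share_account_ids: list[str]
    aws_region: Optional[str]
    delete_old_snapshots: bool
    old_snapshot_days: int


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping such as os.environ."""
    # Assumption: account IDs are comma-separated; the README only says "list".
    raw_ids = env.get("SANITIZER_SHARE_ACCOUNT_IDS", "")
    account_ids = [part.strip() for part in raw_ids.split(",") if part.strip()]

    # Assumption: these spellings mean true; anything else means false.
    raw_delete = env.get("SANITIZER_DELETE_OLD_SNAPSHOTS", "false")
    delete_old = raw_delete.strip().lower() in {"1", "true", "yes"}

    # Documented fallback order: SANITIZER_AWS_REGION, AWS_REGION, AWS_DEFAULT_REGION.
    region = (
        env.get("SANITIZER_AWS_REGION")
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
    )

    return Settings(
        rds_cluster_id=env["SANITIZER_RDS_CLUSTER_ID"],  # required: KeyError if missing
        sql_max_connections=int(env.get("SANITIZER_SQL_MAX_CONNECTIONS", "20")),
        share_account_ids=account_ids,
        aws_region=region,
        delete_old_snapshots=delete_old,
        old_snapshot_days=int(env.get("SANITIZER_OLD_SNAPSHOT_DAYS", "30")),
    )


if __name__ == "__main__":
    import os

    print(load_settings(os.environ))
```

Taking a mapping rather than reading `os.environ` directly makes the defaults easy to check in isolation.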

# Configuration

Review comment: Could you add a sample of the configuration?
Reply: Will add 🙏

The configuration is a JSON file with the following schema:
- `"tables"`: list of table configurations
  - `"name"`: name of the table
  - `"columns"`: list of column configurations
    - `"name"`: name of the column
    - `"sanitizer"`: type of sanitizer to be used. Two types are provided, static and random.
      - `"type"`: `"static"`
      - `"value"`: a static string value to be used for replacement.

      OR

      - `"type"`: `"random"`
      - `"kind"`: `"name"`, `"first_name"`, `"last_name"`, `"user_name"`, `"email"`, `"phone_number"`, etc. See the full list of [randomizers](https://faker.readthedocs.io/en/master/providers.html).
  - `"drop_constraints"`: list of table constraints to be dropped
  - `"drop_indexes"`: list of indexes to be dropped
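
A sample in the spirit of the schema above might look like the following sketch. The table, column, constraint, and index names are made up, and placing `"drop_constraints"` and `"drop_indexes"` inside each table entry is an assumption about the nesting. The `validate_config` helper is hypothetical; it only checks the shape described in the schema and is not part of the tool.

```python
import json

# Hypothetical sample: all names are invented, and the per-table placement of
# "drop_constraints" / "drop_indexes" is an assumption about the schema.
SAMPLE_CONFIG = {
    "tables": [
        {
            "name": "users",
            "columns": [
                {"name": "email", "sanitizer": {"type": "random", "kind": "email"}},
                {"name": "address", "sanitizer": {"type": "static", "value": "REDACTED"}},
            ],
            "drop_constraints": ["users_email_key"],
            "drop_indexes": ["users_email_idx"],
        }
    ]
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for table in config.get("tables", []):
        table_name = table.get("name", "<unnamed>")
        for column in table.get("columns", []):
            column_name = column.get("name", "<unnamed>")
            where = f"{table_name}.{column_name}"
            sanitizer = column.get("sanitizer", {})
            sanitizer_type = sanitizer.get("type")
            if sanitizer_type == "static":
                # Static sanitizers replace the column with a fixed string.
                if "value" not in sanitizer:
                    problems.append(f'{where}: static sanitizer needs "value"')
            elif sanitizer_type == "random":
                # Random sanitizers name a kind such as "email" or "name".
                if "kind" not in sanitizer:
                    problems.append(f'{where}: random sanitizer needs "kind"')
            else:
                problems.append(f"{where}: unknown sanitizer type {sanitizer_type!r}")
    return problems


if __name__ == "__main__":
    # Compact JSON, e.g. for use as the value of SANITIZER_CONFIG.
    print(json.dumps(SAMPLE_CONFIG))
```

Checking the shape before any cluster is restored would surface a misspelled sanitizer type early, before any AWS resources are created.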
I think it would be better to provide a guide on how to run this locally (either with Docker or with Poetry).
Also, I think it would be useful to describe which IAM permissions are required for this.
Yes, I'm thinking of creating a pod-identity terraform module for it in the style of https://github.com/terraform-aws-modules/terraform-aws-eks-pod-identity
The thing is, the tool needs to be connected to the RDS subnet to run the SQL queries. I'll probably add a flag to set the postgres host to `localhost`, with the assumption that the user can connect their localhost to the RDS (via bastion port-forwarding, for example).
I see. If it's expected to run within AWS, that's fine too; just add a note saying so.