Implement RDS snapshot sanitizer #1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
name: Create and publish a Docker image | ||
|
||
on: | ||
release: | ||
types: [published] | ||
|
||
env: | ||
REGISTRY: ghcr.io | ||
IMAGE_NAME: ${{ github.repository }} | ||
|
||
jobs: | ||
build-and-push-image: | ||
runs-on: ubuntu-latest | ||
permissions: | ||
contents: read | ||
packages: write | ||
attestations: write | ||
id-token: write | ||
steps: | ||
- name: Checkout repository | ||
uses: actions/checkout@v4 | ||
|
||
- name: Log in to the Container registry | ||
uses: docker/login-action@v3 | ||
with: | ||
registry: ${{ env.REGISTRY }} | ||
username: ${{ github.actor }} | ||
password: ${{ secrets.GITHUB_TOKEN }} | ||
|
||
- name: Extract metadata (tags, labels) for Docker | ||
id: meta | ||
uses: docker/metadata-action@v5 | ||
with: | ||
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} | ||
|
||
- name: Build and push Docker image | ||
id: push | ||
uses: docker/build-push-action@v6 | ||
with: | ||
context: . | ||
push: true | ||
tags: ${{ steps.meta.outputs.tags }} | ||
labels: ${{ steps.meta.outputs.labels }} | ||
|
||
- name: Generate artifact attestation | ||
uses: actions/attest-build-provenance@v2 | ||
with: | ||
subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME}} | ||
subject-digest: ${{ steps.push.outputs.digest }} | ||
push-to-registry: true | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.venv | ||
.envrc | ||
|
||
__pycache__/ | ||
hardcode.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
repos: | ||
- repo: https://github.com/charliermarsh/ruff-pre-commit | ||
rev: v0.11.0 | ||
hooks: | ||
- id: ruff | ||
args: [--fix] | ||
- id: ruff-format |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
FROM python:3.13.2-alpine | ||
|
||
WORKDIR /app | ||
|
||
COPY poetry.lock pyproject.toml /app/ | ||
COPY src/ /app/src | ||
|
||
RUN apk update \ | ||
&& apk upgrade --no-cache \ | ||
&& apk add --no-cache --virtual build-dependencies build-base curl \ | ||
&& pip install --no-cache-dir --upgrade pip setuptools wheel \ | ||
&& curl -sSL https://install.python-poetry.org | POETRY_HOME=/etc/poetry python - \ | ||
&& ln -s /etc/poetry/bin/poetry /usr/local/bin/poetry \ | ||
&& poetry run pip install --upgrade pip setuptools wheel \ | ||
&& MAKEFLAGS="-j" poetry install \ | ||
&& poetry run python -m compileall -j 0 src \ | ||
&& rm -rf /root/.cache/pip \ | ||
&& rm -rf /root/.cache/pypoetry/artifacts /root/.cache/pypoetry/cache \ | ||
&& rm -rf /etc/poetry/lib/poetry/_vendor/py3.13 \ | ||
&& apk del --no-cache build-dependencies | ||
|
||
ENTRYPOINT ["poetry", "run", "sanitizer"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,34 @@ | ||
# rds-snapshot-sanitizer | ||
# rds-snapshot-sanitizer | ||
|
||
Create sanitized copies of RDS snapshots and share them with selected accounts. | ||
|
||
It works by restoring an unsanitized snapshot to a temporary cluster and executing sanitizing SQL queries against it. A sanitized snapshot is then created from that cluster and optionally shared with other accounts. | ||
I think it's better if you provide a guide on how to run this locally (either with Docker or with Poetry).
Also, I think it's useful to describe what IAM permissions are required for this.
Yes, I'm thinking of creating a pod-identity terraform module for it in the style of https://github.com/terraform-aws-modules/terraform-aws-eks-pod-identity
The thing is the tool needs to be connected to the RDS subnet for running the SQL query. I'll probably add a flag to set the postgres host to
I see, if it's expected to run within AWS it's fine too, just add a note describing as such. |
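The restore, sanitize, snapshot, share sequence described in the README can be sketched as below. This is a minimal illustration, not the code in this PR: `sanitize_snapshot`, `run_sanitizing_sql`, and all identifiers passed in are hypothetical, `rds` is assumed to behave like a boto3 RDS client, and the `aurora-postgresql` engine default is an assumption. Creating and deleting the DB instance inside the temporary cluster, KMS re-encryption, and IAM setup are left out, and the waiter names should be checked against the installed boto3 version.

```python
def sanitize_snapshot(
    rds,
    run_sanitizing_sql,
    source_snapshot_id,
    temp_cluster_id,
    sanitized_snapshot_id,
    share_account_ids=(),
    engine="aurora-postgresql",  # ASSUMPTION: the README does not name the engine
):
    """Sketch of the flow: restore -> sanitize -> snapshot -> share -> clean up."""
    # 1. Restore the unsanitized snapshot into a temporary cluster.
    rds.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=temp_cluster_id,
        SnapshotIdentifier=source_snapshot_id,
        Engine=engine,
    )
    try:
        rds.get_waiter("db_cluster_available").wait(
            DBClusterIdentifier=temp_cluster_id
        )

        # 2. Run the sanitizing SQL. Hypothetical callable: a real run needs a
        #    DB instance in the cluster and network access to the RDS subnet.
        run_sanitizing_sql(temp_cluster_id)

        # 3. Snapshot the now-sanitized cluster.
        rds.create_db_cluster_snapshot(
            DBClusterSnapshotIdentifier=sanitized_snapshot_id,
            DBClusterIdentifier=temp_cluster_id,
        )
        rds.get_waiter("db_cluster_snapshot_available").wait(
            DBClusterSnapshotIdentifier=sanitized_snapshot_id
        )

        # 4. Optionally share the sanitized snapshot with other accounts.
        if share_account_ids:
            rds.modify_db_cluster_snapshot_attribute(
                DBClusterSnapshotIdentifier=sanitized_snapshot_id,
                AttributeName="restore",
                ValuesToAdd=list(share_account_ids),
            )
    finally:
        # Always remove the temporary cluster, even if sanitizing failed.
        rds.delete_db_cluster(
            DBClusterIdentifier=temp_cluster_id,
            SkipFinalSnapshot=True,
        )
```

Passing the client and the SQL step in as arguments keeps the sequence testable without AWS access; the `finally` block ensures the unsanitized temporary cluster does not outlive a failed run.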
||
|
||
# Environment variables | ||
- `SANITIZER_RDS_CLUSTER_ID`: RDS cluster identifier whose snapshots will be sanitized. | ||
- `SANITIZER_CONFIG`: rds-snapshot-sanitizer configuration in JSON. See [Configuration](#configuration). | ||
- `SANTITIZER_RDS_INSTANCE_ACU`: (Optional) ACUs to allocate to the temporary RDS instance. Defaults to 2 ACUs. | ||
- `SANITIZER_SQL_MAX_CONNECTIONS`: (Optional) Maximum number of connections to open for executing the SQL queries. Defaults to 20. | ||
- `SANITIZER_SHARE_KMS_KEY_ID`: (Optional) KMS key identifier to be used for the sanitized snapshot. | ||
- `SANITIZER_SHARE_ACCOUNT_IDS`: (Optional) List of AWS account IDs to share the sanitized snapshot with. | ||
- `SANITIZER_AWS_REGION`: (Optional) AWS region where the RDS cluster is hosted. Defaults to `AWS_REGION` or `AWS_DEFAULT_REGION` environment variable. | ||
- `SANITIZER_DELETE_OLD_SNAPSHOTS`: (Optional) Whether to delete old snapshots. Defaults to False. | ||
- `SANITIZER_OLD_SNAPSHOT_DAYS`: (Optional) Number of days for a snapshot to be considered old. Defaults to 30. | ||
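As an illustration of how the variables and defaults listed above fit together, here is a minimal sketch of a settings loader. It is not the loader in this PR: the `Settings` class and `load_settings` function are hypothetical, and the comma-separated format for account IDs and the accepted boolean strings are assumptions, since the README does not specify them. Variable names are spelled exactly as they appear in the list.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    rds_cluster_id: str
    config_json: str
    instance_acu: float = 2.0
    sql_max_connections: int = 20
    share_kms_key_id: Optional[str] = None
    share_account_ids: Tuple[str, ...] = ()
    aws_region: Optional[str] = None
    delete_old_snapshots: bool = False
    old_snapshot_days: int = 30


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Region falls back to AWS_REGION, then AWS_DEFAULT_REGION.
    region = (
        env.get("SANITIZER_AWS_REGION")
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
    )
    # ASSUMPTION: account IDs are given as a comma-separated string.
    raw_ids = env.get("SANITIZER_SHARE_ACCOUNT_IDS", "")
    account_ids = tuple(a.strip() for a in raw_ids.split(",") if a.strip())
    # ASSUMPTION: "1", "true", and "yes" (any case) mean True.
    raw_delete = env.get("SANITIZER_DELETE_OLD_SNAPSHOTS", "false")
    delete_old = raw_delete.strip().lower() in {"1", "true", "yes"}

    return Settings(
        # The two required variables raise KeyError when missing.
        rds_cluster_id=env["SANITIZER_RDS_CLUSTER_ID"],
        config_json=env["SANITIZER_CONFIG"],
        # Name spelled as it appears in the list above.
        instance_acu=float(env.get("SANTITIZER_RDS_INSTANCE_ACU", "2")),
        sql_max_connections=int(env.get("SANITIZER_SQL_MAX_CONNECTIONS", "20")),
        share_kms_key_id=env.get("SANITIZER_SHARE_KMS_KEY_ID"),
        share_account_ids=account_ids,
        aws_region=region,
        delete_old_snapshots=delete_old,
        old_snapshot_days=int(env.get("SANITIZER_OLD_SNAPSHOT_DAYS", "30")),
    )
```

Taking the environment as a mapping argument lets the loader be exercised with a plain dictionary instead of the real process environment.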
|
||
# Configuration | ||
Could you add a sample of the configuration?
Will add 🙏 |
||
The configuration is a JSON file with the following schema: | ||
- `"tables"`: list of table configurations | ||
- `"name"`: name of the table | ||
- `"columns"`: list of column configurations | ||
- `"name"`: name of the column | ||
- `"sanitizer"`: type of sanitizer to be used. Two types are provided: static and random. | ||
- `"type"`: `"static"` | ||
- `"value"`: a static string value to be used for replacement. | ||
|
||
OR | ||
|
||
- `"type"`: `"random"` | ||
- `"kind"`: `"name"`, `"first_name"`, `"last_name"`, `"user_name"`, `"email"`, `"phone_number"`, etc. See the full list of [Faker providers](https://faker.readthedocs.io/en/master/providers.html). | ||
- `"drop_constraints"`: list of table constraints to be dropped | ||
- `"drop_indexes"`: list of indexes to be dropped |
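A sample configuration following the schema above might look like the sketch below, together with a small shape check. The table, column, constraint, and index names are made up, `validate_config` is a hypothetical helper rather than part of this PR, and placing `"drop_constraints"` and `"drop_indexes"` inside each table entry is an assumption, because the flattened list does not show the nesting.

```python
import json

# Illustrative only: all names below are invented for the example.
SAMPLE_CONFIG = """
{
  "tables": [
    {
      "name": "users",
      "columns": [
        {"name": "full_name", "sanitizer": {"type": "random", "kind": "name"}},
        {"name": "email", "sanitizer": {"type": "random", "kind": "email"}},
        {"name": "notes", "sanitizer": {"type": "static", "value": "REDACTED"}}
      ],
      "drop_constraints": ["users_email_key"],
      "drop_indexes": ["users_email_idx"]
    }
  ]
}
"""


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    if not isinstance(config.get("tables"), list):
        return ["'tables' must be a list"]
    for t_idx, table in enumerate(config["tables"]):
        where = f"tables[{t_idx}]"
        if not isinstance(table.get("name"), str):
            problems.append(f"{where}: missing string 'name'")
        for c_idx, column in enumerate(table.get("columns", [])):
            cwhere = f"{where}.columns[{c_idx}]"
            if not isinstance(column.get("name"), str):
                problems.append(f"{cwhere}: missing string 'name'")
            sanitizer = column.get("sanitizer")
            if not isinstance(sanitizer, dict):
                problems.append(f"{cwhere}: missing 'sanitizer' object")
                continue
            # A static sanitizer needs "value"; a random one needs "kind".
            if sanitizer.get("type") == "static":
                if not isinstance(sanitizer.get("value"), str):
                    problems.append(f"{cwhere}: static sanitizer needs 'value'")
            elif sanitizer.get("type") == "random":
                if not isinstance(sanitizer.get("kind"), str):
                    problems.append(f"{cwhere}: random sanitizer needs 'kind'")
            else:
                problems.append(f"{cwhere}: type must be 'static' or 'random'")
    return problems


config = json.loads(SAMPLE_CONFIG)
problems = validate_config(config)
```

The JSON string is what would be passed through `SANITIZER_CONFIG`; checking its shape up front surfaces mistakes before any snapshot is restored.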