Add lgalloc support #36

Merged: 7 commits, Apr 11, 2025
91 changes: 90 additions & 1 deletion README.md
@@ -22,6 +22,93 @@ The module has been tested with:
- PostgreSQL 15
- terraform-helm-materialize v0.1.12 (Materialize Operator v25.1.7)

## Disk Support for Materialize on GCP

This module can configure disk support for Materialize on GCP using local NVMe SSDs together with OpenEBS and lgalloc.

### Machine Types with Local SSDs in GCP

When using disk support for Materialize on GCP, you need to use machine types that support local SSD attachment. Here are some recommended machine types:

* [N2 series](https://cloud.google.com/compute/docs/general-purpose-machines#n2_machine_types) with local NVMe SSDs:
  * For memory-optimized workloads, consider `n2-highmem-16` or `n2-highmem-32`
  * Example: `n2-highmem-32` with 2 or more local SSDs

* [N2D series](https://cloud.google.com/compute/docs/general-purpose-machines#n2d_machine_types) with local NVMe SSDs:
  * For memory-optimized workloads, consider `n2d-highmem-16` or `n2d-highmem-32`
  * Example: `n2d-highmem-32` with 2 or more local SSDs

### Enabling Disk Support

To enable disk support with default settings in your Terraform configuration:

```hcl
enable_disk_support = true

gke_config = {
  node_count = 3

  # This machine has 256GB of RAM
  machine_type = "n2-highmem-32"

  # This is the OS disk, not Materialize data storage
  disk_size_gb = 100
  min_nodes    = 3
  max_nodes    = 5
}

disk_support_config = {
  # Two local SSDs provide 2 x 375GB = 750GB of storage per node,
  # exceeding the recommended 2:1 disk-to-RAM ratio (256GB RAM : 750GB disk)
  local_ssd_count = 2
}
```

This configuration:
1. Attaches two local SSDs to each node, providing 750GB of storage per node
2. Ensures the disk-to-RAM ratio is greater than 2:1 for the n2-highmem-32 instance (which has 256GB RAM)
3. Installs OpenEBS via Helm to manage these local SSDs
4. Configures local NVMe SSD devices using the [bootstrap](./modules/gke/bootstrap.sh) script
5. Creates appropriate storage classes for Materialize
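
The blocks above are module input variables. As a rough, illustrative sketch, they sit inside the module call like this (the `source` value and `var.database_password` are placeholders, and other inputs are omitted for brevity):

```hcl
module "materialize" {
  # Placeholder source; point this at the version of the module you actually use
  source = "github.com/MaterializeInc/terraform-google-materialize"

  enable_disk_support = true

  gke_config = {
    node_count   = 3
    machine_type = "n2-highmem-32"
    disk_size_gb = 100
    min_nodes    = 3
    max_nodes    = 5
  }

  disk_support_config = {
    local_ssd_count = 2
  }

  database_config = {
    password = var.database_password # the only required field; the rest have defaults
  }
}
```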

### Advanced Configuration Example

For a different machine type with appropriate disk sizing:

```hcl
enable_disk_support = true

gke_config = {
  node_count = 3

  # This machine has 128GB of RAM
  machine_type = "n2-highmem-16"
  disk_size_gb = 100
  min_nodes    = 3
  max_nodes    = 5
}

disk_support_config = {
  # One local SSD provides 375GB of storage per node,
  # exceeding the recommended 2:1 disk-to-RAM ratio (128GB RAM : 375GB disk)
  local_ssd_count    = 1
  openebs_version    = "4.2.0"
  storage_class_name = "custom-storage-class"
}
```
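
For reference, the remaining `disk_support_config` fields and their defaults, taken from the inputs table below, can be overridden the same way:

```hcl
disk_support_config = {
  install_openebs       = true                              # install OpenEBS via Helm
  run_disk_setup_script = true                              # run the disk setup script on each node
  create_storage_class  = true                              # create the OpenEBS storage class
  local_ssd_count       = 1                                 # number of 375GB local SSDs per node
  openebs_version       = "4.2.0"
  openebs_namespace     = "openebs"
  storage_class_name    = "openebs-lvm-instance-store-ext4"
}
```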

### Calculating the Right Number of Local SSDs

The following table helps you determine the appropriate number of local SSDs based on your chosen machine type to maintain the recommended 2:1 disk-to-RAM ratio:

| Machine Type | RAM | Required Disk | Recommended Local SSD Count | Total SSD Storage |
|-----------------|---------|---------------|-----------------------------|-------------------|
| `n2-highmem-8` | `64GB` | `128GB` | 1 | `375GB` |
| `n2-highmem-16` | `128GB` | `256GB` | 1 | `375GB` |
| `n2-highmem-32` | `256GB` | `512GB` | 2 | `750GB` |
| `n2-highmem-64` | `512GB` | `1024GB` | 3 | `1125GB` |
| `n2-highmem-80` | `640GB` | `1280GB` | 4 | `1500GB` |

Each local NVMe SSD in GCP provides a fixed 375GB of storage.
Choose `local_ssd_count` so that the total local SSD capacity is at least twice the RAM of your machine type for optimal Materialize performance.
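
If you would rather derive the count than read it from the table, the arithmetic is simple; a minimal Terraform sketch, assuming only the machine's RAM as input:

```hcl
locals {
  # RAM of the chosen machine type in GB, e.g. 256 for n2-highmem-32
  machine_ram_gb = 256

  # Recommended minimum disk is twice the RAM; each GCP local SSD is a fixed 375GB
  required_disk_gb  = 2 * local.machine_ram_gb
  local_ssd_size_gb = 375

  # ceil(512 / 375) = 2 local SSDs for a 256GB machine
  recommended_local_ssd_count = ceil(local.required_disk_gb / local.local_ssd_size_gb)
}
```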

## Requirements

| Name | Version |
@@ -59,7 +146,9 @@ No resources.
| <a name="input_cert_manager_install_timeout"></a> [cert\_manager\_install\_timeout](#input\_cert\_manager\_install\_timeout) | Timeout for installing the cert-manager helm chart, in seconds. | `number` | `300` | no |
| <a name="input_cert_manager_namespace"></a> [cert\_manager\_namespace](#input\_cert\_manager\_namespace) | The name of the namespace in which cert-manager is or will be installed. | `string` | `"cert-manager"` | no |
| <a name="input_database_config"></a> [database\_config](#input\_database\_config) | Cloud SQL configuration | <pre>object({<br/> tier = optional(string, "db-custom-2-4096")<br/> version = optional(string, "POSTGRES_15")<br/> password = string<br/> username = optional(string, "materialize")<br/> db_name = optional(string, "materialize")<br/> })</pre> | n/a | yes |
| <a name="input_gke_config"></a> [gke\_config](#input\_gke\_config) | GKE cluster configuration. Make sure to use large enough machine types for your Materialize instances. | <pre>object({<br/> node_count = number<br/> machine_type = string<br/> disk_size_gb = number<br/> min_nodes = number<br/> max_nodes = number<br/> })</pre> | <pre>{<br/> "disk_size_gb": 50,<br/> "machine_type": "e2-standard-4",<br/> "max_nodes": 2,<br/> "min_nodes": 1,<br/> "node_count": 1<br/>}</pre> | no |
| <a name="input_disk_support_config"></a> [disk\_support\_config](#input\_disk\_support\_config) | Advanced configuration for disk support (only used when enable\_disk\_support = true) | <pre>object({<br/> install_openebs = optional(bool, true)<br/> run_disk_setup_script = optional(bool, true)<br/> local_ssd_count = optional(number, 1)<br/> create_storage_class = optional(bool, true)<br/> openebs_version = optional(string, "4.2.0")<br/> openebs_namespace = optional(string, "openebs")<br/> storage_class_name = optional(string, "openebs-lvm-instance-store-ext4")<br/> })</pre> | `{}` | no |
| <a name="input_enable_disk_support"></a> [enable\_disk\_support](#input\_enable\_disk\_support) | Enable disk support for Materialize using OpenEBS and local SSDs. When enabled, this configures OpenEBS, runs the disk setup script, and creates appropriate storage classes. | `bool` | `true` | no |
| <a name="input_gke_config"></a> [gke\_config](#input\_gke\_config) | GKE cluster configuration. Make sure to use large enough machine types for your Materialize instances. | <pre>object({<br/> node_count = number<br/> machine_type = string<br/> disk_size_gb = number<br/> min_nodes = number<br/> max_nodes = number<br/> })</pre> | <pre>{<br/> "disk_size_gb": 100,<br/> "machine_type": "n2-highmem-8",<br/> "max_nodes": 2,<br/> "min_nodes": 1,<br/> "node_count": 1<br/>}</pre> | no |
| <a name="input_helm_chart"></a> [helm\_chart](#input\_helm\_chart) | Chart name from repository or local path to chart. For local charts, set the path to the chart directory. | `string` | `"materialize-operator"` | no |
| <a name="input_helm_values"></a> [helm\_values](#input\_helm\_values) | Values to pass to the Helm chart | `any` | `{}` | no |
| <a name="input_install_cert_manager"></a> [install\_cert\_manager](#input\_install\_cert\_manager) | Whether to install cert-manager. | `bool` | `true` | no |
88 changes: 88 additions & 0 deletions docs/header.md
@@ -20,3 +20,91 @@ The module has been tested with:
- GKE version 1.28
- PostgreSQL 15
- terraform-helm-materialize v0.1.12 (Materialize Operator v25.1.7)


## Disk Support for Materialize on GCP

This module can configure disk support for Materialize on GCP using local NVMe SSDs together with OpenEBS and lgalloc.

### Machine Types with Local SSDs in GCP

When using disk support for Materialize on GCP, you need to use machine types that support local SSD attachment. Here are some recommended machine types:

* [N2 series](https://cloud.google.com/compute/docs/general-purpose-machines#n2_machine_types) with local NVMe SSDs:
  * For memory-optimized workloads, consider `n2-highmem-16` or `n2-highmem-32`
  * Example: `n2-highmem-32` with 2 or more local SSDs

* [N2D series](https://cloud.google.com/compute/docs/general-purpose-machines#n2d_machine_types) with local NVMe SSDs:
  * For memory-optimized workloads, consider `n2d-highmem-16` or `n2d-highmem-32`
  * Example: `n2d-highmem-32` with 2 or more local SSDs

### Enabling Disk Support

To enable disk support with default settings in your Terraform configuration:

```hcl
enable_disk_support = true

gke_config = {
  node_count = 3

  # This machine has 256GB of RAM
  machine_type = "n2-highmem-32"

  # This is the OS disk, not Materialize data storage
  disk_size_gb = 100
  min_nodes    = 3
  max_nodes    = 5
}

disk_support_config = {
  # Two local SSDs provide 2 x 375GB = 750GB of storage per node,
  # exceeding the recommended 2:1 disk-to-RAM ratio (256GB RAM : 750GB disk)
  local_ssd_count = 2
}
```

This configuration:
1. Attaches two local SSDs to each node, providing 750GB of storage per node
2. Ensures the disk-to-RAM ratio is greater than 2:1 for the n2-highmem-32 instance (which has 256GB RAM)
3. Installs OpenEBS via Helm to manage these local SSDs
4. Configures local NVMe SSD devices using the [bootstrap](./modules/gke/bootstrap.sh) script
5. Creates appropriate storage classes for Materialize

### Advanced Configuration Example

For a different machine type with appropriate disk sizing:

```hcl
enable_disk_support = true

gke_config = {
  node_count = 3

  # This machine has 128GB of RAM
  machine_type = "n2-highmem-16"
  disk_size_gb = 100
  min_nodes    = 3
  max_nodes    = 5
}

disk_support_config = {
  # One local SSD provides 375GB of storage per node,
  # exceeding the recommended 2:1 disk-to-RAM ratio (128GB RAM : 375GB disk)
  local_ssd_count    = 1
  openebs_version    = "4.2.0"
  storage_class_name = "custom-storage-class"
}
```

### Calculating the Right Number of Local SSDs

The following table helps you determine the appropriate number of local SSDs based on your chosen machine type to maintain the recommended 2:1 disk-to-RAM ratio:

| Machine Type | RAM | Required Disk | Recommended Local SSD Count | Total SSD Storage |
|-----------------|---------|---------------|-----------------------------|-------------------|
| `n2-highmem-8` | `64GB` | `128GB` | 1 | `375GB` |
| `n2-highmem-16` | `128GB` | `256GB` | 1 | `375GB` |
| `n2-highmem-32` | `256GB` | `512GB` | 2 | `750GB` |
| `n2-highmem-64` | `512GB` | `1024GB` | 3 | `1125GB` |
| `n2-highmem-80` | `640GB` | `1280GB` | 4 | `1500GB` |

Each local NVMe SSD in GCP provides a fixed 375GB of storage.
Choose `local_ssd_count` so that the total local SSD capacity is at least twice the RAM of your machine type for optimal Materialize performance.
32 changes: 32 additions & 0 deletions main.tf
@@ -3,6 +3,23 @@ locals {
managed_by = "terraform"
module = "materialize"
})

# Disk support configuration
disk_config = {
install_openebs = var.enable_disk_support ? lookup(var.disk_support_config, "install_openebs", true) : false
run_disk_setup_script = var.enable_disk_support ? lookup(var.disk_support_config, "run_disk_setup_script", true) : false
**Contributor:** Could we just base this one entirely on `var.enable_disk_support` and not have this extra var?

**Collaborator (author):** Yes, the idea was that users only set `enable_disk_support`, and the rest use defaults. I added `disk_support_config` just for extra flexibility if we ever need to override things. Happy to simplify if we want to keep it more opinionated though, up to you!

**Collaborator (author):** This part of the PR is more or less a copy and paste from the AWS implementation.

**Contributor:** ok.... let's just keep this

local_ssd_count = lookup(var.disk_support_config, "local_ssd_count", 1)
create_storage_class = var.enable_disk_support ? lookup(var.disk_support_config, "create_storage_class", true) : false
openebs_version = lookup(var.disk_support_config, "openebs_version", "4.2.0")
openebs_namespace = lookup(var.disk_support_config, "openebs_namespace", "openebs")
storage_class_name = lookup(var.disk_support_config, "storage_class_name", "openebs-lvm-instance-store-ext4")
storage_class_provisioner = "local.csi.openebs.io"
storage_class_parameters = {
storage = "lvm"
fsType = "ext4"
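# "instance-store-vg" is the volume group created by the bootstrap script in modules/gke/bootstrap.sh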
volgroup = "instance-store-vg"
}
}
}

module "networking" {
@@ -33,6 +50,13 @@ module "gke" {
min_nodes = var.gke_config.min_nodes
max_nodes = var.gke_config.max_nodes

# Disk support configuration
enable_disk_setup = local.disk_config.run_disk_setup_script
local_ssd_count = local.disk_config.local_ssd_count
install_openebs = local.disk_config.install_openebs
openebs_namespace = local.disk_config.openebs_namespace
openebs_version = local.disk_config.openebs_version

namespace = var.namespace
labels = local.common_labels
}
@@ -158,6 +182,14 @@ locals {
}
}
}
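# Only configure a storage class when disk support is enabled; otherwise pass an empty map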
storage = var.enable_disk_support ? {
storageClass = {
create = local.disk_config.create_storage_class
name = local.disk_config.storage_class_name
provisioner = local.disk_config.storage_class_provisioner
parameters = local.disk_config.storage_class_parameters
}
} : {}
tls = (var.use_self_signed_cluster_issuer && length(var.materialize_instances) > 0) ? {
defaultCertificateSpecs = {
balancerdExternal = {
54 changes: 54 additions & 0 deletions modules/gke/bootstrap.sh
@@ -0,0 +1,54 @@
#!/bin/bash
set -xeuo pipefail

echo "Starting GCP NVMe SSD setup"

# Install required tools
if command -v apt-get >/dev/null 2>&1; then
apt-get update
apt-get install -y lvm2
elif command -v yum >/dev/null 2>&1; then
yum install -y lvm2
else
echo "No package manager found. Please install required tools manually."
exit 1
fi
**Comment on lines +6 to +15**

**Contributor:** We should definitely version lock this, or if there aren't dependencies, install from a binary.

**Collaborator (author):** Just pinned this to a specific version. Regarding the dependencies:

```
Reading package lists...
+ apt-get install -y lvm2=2.03.11-2.1
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  dmeventd dmsetup libaio1 libbsd0 libdevmapper-event1.02.1 libdevmapper1.02.1
  libedit2 libexpat1 liblvm2cmd2.03 libmd0 thin-provisioning-tools
```

**Contributor:** This is extremely unlikely to work long-term. If you want to pin it, you need to bake it into an image. The upstream repos will likely not contain that specific version for long.

**Collaborator (author):** Sure, but I thought that we did not want to maintain our own image because of security concerns?

**Contributor:** Well, we can't pin the version this way, and the security concerns don't go away just because you're using someone else's image. We can either:

1. Unpin both the lvm2 package version and use a moving-target Debian image tag.
2. Maintain our own image and use Dependabot (or similar) to keep the image up to date.

**Collaborator (author):** Just removed the version pin. When I was originally working on that custom container, I got pushback because of concerns that a lot of vulnerability scanner noise could come from bootstrap containers. But if we are all fine with this, I am happy to work on that bootstrap Docker image.


# Find NVMe devices
SSD_DEVICE_LIST=()

devices=$(find /dev/disk/by-id/ -name "google-local-ssd-*" 2>/dev/null || true)
if [ -n "$devices" ]; then
while read -r device; do
SSD_DEVICE_LIST+=("$device")
done <<<"$devices"
else
echo "ERROR: No Local SSD devices found at standard path /dev/disk/by-id/google-local-ssd-*"
echo "Please verify that local SSDs were properly attached to this instance"
echo "See: https://cloud.google.com/compute/docs/disks/local-ssd"
exit 1
fi

# Check if any of the devices are already in use by LVM
for device in "${SSD_DEVICE_LIST[@]}"; do
if pvdisplay "$device" &>/dev/null; then
echo "$device is already part of LVM, skipping setup"
exit 0
fi
done

# Create physical volumes
for device in "${SSD_DEVICE_LIST[@]}"; do
echo "Creating physical volume on $device"
pvcreate -f "$device"
done

# Create volume group
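# The group name is referenced by the "volgroup" storage class parameter in main.tf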
echo "Creating volume group instance-store-vg"
vgcreate instance-store-vg "${SSD_DEVICE_LIST[@]}"

# Display results
pvs
vgs

echo "Disk setup completed"