From 8cf5b28084186c5ae9ed33966ef923b1cbeef71f Mon Sep 17 00:00:00 2001 From: Aswin A Date: Tue, 25 Mar 2025 14:20:48 +0530 Subject: [PATCH 1/6] feat: Added details of EntityOperator in Stretch clusters Added details of EntityOperator in Stretch clusters Signed-off-by: Aswin A --- docs/.pages | 3 +- docs/entityoperator.md | 234 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 236 insertions(+), 1 deletion(-) create mode 100644 docs/entityoperator.md diff --git a/docs/.pages b/docs/.pages index 8fa72ac..4e17be9 100644 --- a/docs/.pages +++ b/docs/.pages @@ -13,4 +13,5 @@ nav: - Testing-cluster-failover.md - Testing-failover-and-resiliency.md - Testing-performance.md - - Setting-up-Rack-Awareness-In-Stretch-Cluster.md \ No newline at end of file + - Setting-up-Rack-Awareness-In-Stretch-Cluster.md + - EntityOperator.md \ No newline at end of file diff --git a/docs/entityoperator.md b/docs/entityoperator.md new file mode 100644 index 0000000..bc6f0af --- /dev/null +++ b/docs/entityoperator.md @@ -0,0 +1,234 @@ +# Entity Operator + +The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. + +## Key Components of Entity Operator + +The Entity Operator consists of two main sub-components: + +### Topic Operator + +- Watches for KafkaTopic CRs in Kubernetes. +- Automatically creates, updates, and deletes topics in Kafka based on KafkaTopic CR definitions. +- Keeps Kubernetes and Kafka topic configurations in sync. +- Ensures desired state consistency between Kubernetes and Kafka. + +### User Operator + +- Watches for KafkaUser CRs in Kubernetes. +- Manages security credentials (TLS certificates, SASL credentials). +- Ensures user permissions and authentication are correctly configured. + +## Why is the Entity Operator Useful? + +- Eliminates the need for manual topic and user management. +- Ensures Kafka users have appropriate authentication and authorization settings. +- Enables declarative management using Kubernetes CRs. +- Keeps configurations between Kubernetes and Kafka in sync. + +## How Client Applications Use KafkaTopic and KafkaUser CRs in Strimzi + +The client applications interact with Kafka topics and users in Strimzi using Kubernetes native resources + +- KafkaTopic CRs define and manage Kafka topics. +- KafkaUser CRs define users and security credentials for authentication & authorization. + +## How Applications Use KafkaTopic CRs + +### Creating a Topic + +Developers define a topic declaratively using a KafkaTopic CR. The Topic Operator ensures this topic is created in Kafka. + +**Example KafkaTopic CR** + +```yaml +apiVersion: kafka.strimzi.io/v1beta2 +kind: KafkaTopic +metadata: + name: my-topic + labels: + strimzi.io/cluster: my-cluster # Must match the Kafka cluster name +spec: + partitions: 3 + replicas: 2 + config: + retention.ms: 86400000 # Data retention for 1 day + segment.bytes: 1073741824 # 1GB segment size +``` + +**How clients use it** + +Once the topic is created, client applications (producers & consumers) can publish and read messages from `my-topic` like any regular Kafka topic. + +## How Applications Use KafkaUser CRs + +### Creating a User for Authentication & Authorization + +Client applications need a Kafka user to authenticate and communicate securely. A KafkaUser CR defines the user, authentication method (TLS/SCRAM-SHA), and permissions. 
+ +```yaml +apiVersion: kafka.strimzi.io/v1beta2 +kind: KafkaUser +metadata: + name: my-app-user + labels: + strimzi.io/cluster: my-cluster # Must match the Kafka cluster name +spec: + authentication: + type: tls # TLS-based auth + authorization: + type: simple + acls: + - resource: + type: topic + name: my-topic + patternType: literal + operations: + - Read + - Write +``` + +**How clients use it** + +### Authentication + +- If `TLS` authentication is enabled, Strimzi will generate a secret containing the user's TLS certificates. +- If `SCRAM-SHA` authentication is enabled, Strimzi will generate a username and password in a Kubernetes secret. + +### Authorization (ACLs) + +- In the above example, the user my-app-user has Read & Write access to my-topic. + +Clients will only be able to perform allowed operations. + +## How Clients Retrieve and Use Credentials + +After creating a KafkaUser, Strimzi automatically generates a Kubernetes Secret with the credentials. + +**Example** + +```bash +kubectl get secret my-app-user -o yaml +``` + +It will contain + +#### For TLS authentication + +- ca.crt (CA certificate) +- user.crt (Client certificate) +- user.key (Client private key) + +#### For SCRAM-SHA authentication + +- password (Base64-encoded password) + +### Using These Credentials in a Kafka Client + +**Example** + +##### Java Producer Example (TLS Authentication) + + +```java +Properties props = new Properties(); +props.put("bootstrap.servers", "my-cluster-kafka-bootstrap:9093"); +props.put("security.protocol", "SSL"); +props.put("ssl.truststore.location", "/etc/secrets/ca.p12"); +props.put("ssl.truststore.password", "password"); +props.put("ssl.keystore.location", "/etc/secrets/user.p12"); +props.put("ssl.keystore.password", "password"); + +KafkaProducer producer = new KafkaProducer<>(props); +``` + +#### Java Consumer Example (SCRAM-SHA Authentication) + +```java +Properties props = new Properties(); +props.put("bootstrap.servers", "my-cluster-kafka-bootstrap:9093"); +props.put("security.protocol", "SASL_SSL"); +props.put("sasl.mechanism", "SCRAM-SHA-512"); +props.put("sasl.jaas.config", "org.apache.kafka.common.security.scram.ScramLoginModule required username='my-app-user' password='my-secret-password';"); + +KafkaConsumer consumer = new KafkaConsumer<>(props); + +``` + +### Summary of How Applications Use KafkaTopic & KafkaUser CRs + +| Action | Operator Responsible | +| -------- | ------- | +| Developer creates a `KafkaTopic` CR | Topic Operator creates & syncs the topic in Kafka | +| Developer creates a KafkaUser CR | User Operator creates the user & credentials | +| Application retrieves credentials from Kubernetes Secrets | Application mounts the secrets for authentication | +| Application connects to Kafka using these credentials | Producer/Consumer communicates with Kafka | + + +## Impact of Central Cluster Failure on Kafka Clients in a Stretch Cluster + +In stretch Kafka deployment, where + +- ✅ Kafka brokers and controllers are spread across multiple Kubernetes clusters. +- ✅ The central cluster hosts all Kafka CRs, including Kafka, KafkaNodePool, KafkaUser, and KafkaTopic. +- ✅ The Entity Operator (managing users & topics) runs in the central cluster. + +## Does the Kafka Cluster Still Function? + +Yes, but with conditions. + +- Brokers in other Kubernetes clusters are still running. +- Kafka controllers may still be running (if a quorum is maintained). + +Replication between brokers (spread across clusters) continues as long as there’s a majority quorum of controllers. 
+ +- ✅ If a majority of controllers are still available in the surviving clusters, Kafka continues running. +- 🚨 If the majority of controllers were in the central cluster and lost, Kafka will experience a complete outage. + +💡 Mitigation -> user need to ensure the `controller.quorum.voters` are spread across clusters to avoid losing the majority. + +## Can Kafka Clients Still Communicate with Brokers? + +Yes, if at least one broker remains reachable. + +Clients connect to brokers, not the Entity Operator or the central cluster. + +- If clients' bootstrap servers list includes brokers running in the surviving clusters, they can still connect. +- If a leader partition for a topic is hosted on a broker in a surviving cluster, the topic remains available. +- If all leader partitions for a topic were on brokers in the central cluster, that topic becomes unavailable. + +💡 Mitigation -> Use Kafka’s rack awareness (`broker.rack`) and ISR (In-Sync Replica) balancing to distribute leader partitions across clusters. + +## Can Clients Still Produce & Consume Messages? + +Yes, if leader partitions are available. +No, if all leader partitions were in the lost central cluster. + +**Producing Messages** + +- A producer can send messages only if the leader partition for a topic is still available in a surviving cluster. +- If the leader was in the lost central cluster, Kafka automatically elects a new leader (if ISR exists). +- If no ISR exists, production fails. + +**Consuming Messages** + +- Consumers can still fetch messages from available partitions. +- If consumer group metadata was stored in a lost broker, the group might experience issues. + +💡 Mitigation -> Enable `min.insync.replicas` and leader election across clusters to ensure partition availability. + + +## What About Entity Operator Functions? +The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly because + +- Kafka clients do not interact with the Entity Operator at runtime. +- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters. +- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers. + +Kafka clients (producers and consumers) can still authenticate and connect to Kafka brokers in the surviving clusters as long as the necessary authentication credentials (secrets) are available in those clusters. + +If the KafkaUser secrets only exist in the central cluster, then when it goes down, clients in other clusters cannot authenticate to Kafka brokers. However, if these secrets were already copied to all clusters where brokers are running, authentication will still work even if the central cluster is down. + +To avoid authentication failures when the central cluster goes down, you must: + +- Replicate KafkaUser secrets across all clusters where Kafka brokers exist. 
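
One rough sketch of such a one-off copy, assuming `kubectl` contexts named `central` and `member-1`, a `kafka` namespace, and `jq` on the path (all names are illustrative; a GitOps pipeline or a dedicated secret-replication tool is the more durable option):

```bash
# Illustrative one-off copy of the KafkaUser secret to a member cluster.
# Cluster-specific metadata (UID, resourceVersion, owner references) is
# stripped so the object can be created cleanly in the target cluster.
kubectl --context central -n kafka get secret my-app-user -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.ownerReferences, .metadata.managedFields)' \
  | kubectl --context member-1 -n kafka apply -f -
```
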
From e5cf3d0b5980817686017e05a5fcd9507deebf14 Mon Sep 17 00:00:00 2001 From: Aswin A Date: Tue, 25 Mar 2025 16:17:19 +0530 Subject: [PATCH 2/6] fix: addressed review comments Addressed review comments Signed-off-by: Aswin A --- docs/entityoperator.md | 78 ++++++++++++++++-------------------------- 1 file changed, 29 insertions(+), 49 deletions(-) diff --git a/docs/entityoperator.md b/docs/entityoperator.md index bc6f0af..7eefed3 100644 --- a/docs/entityoperator.md +++ b/docs/entityoperator.md @@ -1,6 +1,6 @@ -# Entity Operator +# Impact of Entity Operator Availability in a Stretch Kafka Cluster -The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. +The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. This document explains how its availability affects topic and user management when deployed in a multi-cluster Kafka setup. ## Key Components of Entity Operator @@ -169,66 +169,46 @@ KafkaConsumer consumer = new KafkaConsumer<>(props); In stretch Kafka deployment, where -- ✅ Kafka brokers and controllers are spread across multiple Kubernetes clusters. -- ✅ The central cluster hosts all Kafka CRs, including Kafka, KafkaNodePool, KafkaUser, and KafkaTopic. -- ✅ The Entity Operator (managing users & topics) runs in the central cluster. +✅ Kafka brokers and controllers are spread across multiple Kubernetes clusters.
+✅ The central cluster hosts all Kafka CRs, including Kafka, KafkaNodePool, KafkaUser, and KafkaTopic.
+✅ The Entity Operator (managing users & topics) runs in the central cluster. -## Does the Kafka Cluster Still Function? -Yes, but with conditions. - -- Brokers in other Kubernetes clusters are still running. -- Kafka controllers may still be running (if a quorum is maintained). - -Replication between brokers (spread across clusters) continues as long as there’s a majority quorum of controllers. - -- ✅ If a majority of controllers are still available in the surviving clusters, Kafka continues running. -- 🚨 If the majority of controllers were in the central cluster and lost, Kafka will experience a complete outage. - -💡 Mitigation -> user need to ensure the `controller.quorum.voters` are spread across clusters to avoid losing the majority. - -## Can Kafka Clients Still Communicate with Brokers? - -Yes, if at least one broker remains reachable. - -Clients connect to brokers, not the Entity Operator or the central cluster. - -- If clients' bootstrap servers list includes brokers running in the surviving clusters, they can still connect. -- If a leader partition for a topic is hosted on a broker in a surviving cluster, the topic remains available. -- If all leader partitions for a topic were on brokers in the central cluster, that topic becomes unavailable. - -💡 Mitigation -> Use Kafka’s rack awareness (`broker.rack`) and ISR (In-Sync Replica) balancing to distribute leader partitions across clusters. +## What About Entity Operator Functions? +The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly because -## Can Clients Still Produce & Consume Messages? +- Kafka clients do not interact with the Entity Operator at runtime. +- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters. +- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers. -Yes, if leader partitions are available. -No, if all leader partitions were in the lost central cluster. +## What Happens If No Cluster Has KafkaUser and KafkaTopic CRs? -**Producing Messages** +If the central cluster is the only one hosting KafkaUser and KafkaTopic CRs, then when it goes down: -- A producer can send messages only if the leader partition for a topic is still available in a surviving cluster. -- If the leader was in the lost central cluster, Kafka automatically elects a new leader (if ISR exists). -- If no ISR exists, production fails. +1. User Authentication Risks -**Consuming Messages** + - Kafka brokers in surviving clusters rely on existing secrets for authentication. + - If KafkaUser secrets were only stored in the central cluster and not replicated, brokers in other clusters will be unable to authenticate client requests. + - New client connections will fail since brokers cannot verify credentials. + - Existing client connections may remain active if they were authenticated before the central cluster failure, but they will eventually be disconnected when session timeouts occur. -- Consumers can still fetch messages from available partitions. -- If consumer group metadata was stored in a lost broker, the group might experience issues. +2. Topic Management Limitations -💡 Mitigation -> Enable `min.insync.replicas` and leader election across clusters to ensure partition availability. + - Topics that were already created will continue to exist and function normally. 
+ - Clients can still produce and consume messages only if they are already authenticated before the central cluster failure. + - No new topics can be created or updated since the KafkaTopic CRs and Entity Operator are unavailable. +### Mitigation Strategies -## What About Entity Operator Functions? -The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly because +To ensure Kafka clients remain functional even when the central cluster goes down, we should implement the following best practices -- Kafka clients do not interact with the Entity Operator at runtime. -- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters. -- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers. +✅ Replicate KafkaUser secrets across all clusters where Kafka brokers exist. -Kafka clients (producers and consumers) can still authenticate and connect to Kafka brokers in the surviving clusters as long as the necessary authentication credentials (secrets) are available in those clusters. +- This ensures authentication remains functional even if the central cluster is unavailable. -If the KafkaUser secrets only exist in the central cluster, then when it goes down, clients in other clusters cannot authenticate to Kafka brokers. However, if these secrets were already copied to all clusters where brokers are running, authentication will still work even if the central cluster is down. +✅ Ensure Kafka brokers cache authentication data where possible(This needs verification). -To avoid authentication failures when the central cluster goes down, you must: +- Some authentication mechanisms (like SCRAM) allow brokers to cache credentials temporarily. +- This can help avoid immediate authentication failures if the central cluster is temporarily down. -- Replicate KafkaUser secrets across all clusters where Kafka brokers exist. +✅ Alternatively we can Explore options like KafkaAccess Operator. This reduces dependency on a single cluster for authentication. From 2bf7791d8b7500c6e6e17cba48c3fc4a71af0142 Mon Sep 17 00:00:00 2001 From: Aswin A Date: Wed, 26 Mar 2025 13:00:23 +0530 Subject: [PATCH 3/6] feat: Addressed Review comments Addressed Review comments Signed-off-by: Aswin A --- docs/entityoperator.md | 118 ++++++++++++++++++++++------------------- 1 file changed, 63 insertions(+), 55 deletions(-) diff --git a/docs/entityoperator.md b/docs/entityoperator.md index 7eefed3..996313c 100644 --- a/docs/entityoperator.md +++ b/docs/entityoperator.md @@ -1,45 +1,46 @@ -# Impact of Entity Operator Availability in a Stretch Kafka Cluster +# Impact of Entity Operator Availability in a Stretched Kafka Cluster -The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. This document explains how its availability affects topic and user management when deployed in a multi-cluster Kafka setup. +This document outlines the role of the Strimzi Entity Operator in managing Kafka users and topics and explains how its availability affects operations within a multi-cluster Kafka deployment where the Entity Operator resides in a central Kubernetes cluster. 
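
For context, the Entity Operator is enabled through the `entityOperator` section of the `Kafka` custom resource; an abbreviated sketch (broker, listener, and storage configuration omitted for brevity):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ... kafka broker, listener, and storage configuration omitted ...
  entityOperator:
    topicOperator: {}   # deploys the Topic Operator alongside the cluster
    userOperator: {}    # deploys the User Operator alongside the cluster
```
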
-## Key Components of Entity Operator +## Key Components of the Entity Operator -The Entity Operator consists of two main sub-components: +The Entity Operator in Strimzi comprises two primary sub-components: ### Topic Operator -- Watches for KafkaTopic CRs in Kubernetes. -- Automatically creates, updates, and deletes topics in Kafka based on KafkaTopic CR definitions. -- Keeps Kubernetes and Kafka topic configurations in sync. -- Ensures desired state consistency between Kubernetes and Kafka. +- Monitors Kubernetes for `KafkaTopic` CRs. +- Automatically creates, updates, and deletes Kafka topics within the Kafka cluster based on the definitions in the `KafkaTopic` CRs. +- Ensures synchronization between the desired topic configurations in Kubernetes and the actual topic configurations in Kafka. ### User Operator -- Watches for KafkaUser CRs in Kubernetes. -- Manages security credentials (TLS certificates, SASL credentials). -- Ensures user permissions and authentication are correctly configured. +- Monitors Kubernetes for `KafkaUser` CRs. +- Manages security credentials (e.g., TLS certificates, SASL credentials) and configures user permissions and authentication within the Kafka cluster. +- Automates the provisioning and synchronization of Kafka user authentication and authorization settings. -## Why is the Entity Operator Useful? +## Why is the Entity Operator Essential? -- Eliminates the need for manual topic and user management. -- Ensures Kafka users have appropriate authentication and authorization settings. -- Enables declarative management using Kubernetes CRs. -- Keeps configurations between Kubernetes and Kafka in sync. +The Entity Operator provides several key benefits: -## How Client Applications Use KafkaTopic and KafkaUser CRs in Strimzi +- Eliminates the need for manual topic and user management through Kafka's administrative tools. +- Ensures Kafka users are configured with appropriate authentication and authorization settings as defined in Kubernetes. +- Enables the management of Kafka resources using Kubernetes-native Custom Resources, promoting a declarative approach. +- Maintains consistency between the desired state in Kubernetes and the actual state of topics and users in Kafka. -The client applications interact with Kafka topics and users in Strimzi using Kubernetes native resources +## How Client Applications Utilize KafkaTopic and KafkaUser CRs in Strimzi -- KafkaTopic CRs define and manage Kafka topics. -- KafkaUser CRs define users and security credentials for authentication & authorization. +Client applications interact with Kafka topics and users in Strimzi using Kubernetes Custom Resources: -## How Applications Use KafkaTopic CRs +- **`KafkaTopic` CRs:** Define and manage Kafka topics, specifying parameters like partitions, replication factor, and configuration. +- **`KafkaUser` CRs:** Define users and their security configurations for authentication and authorization. + +## How Applications Utilize `KafkaTopic` CRs ### Creating a Topic -Developers define a topic declaratively using a KafkaTopic CR. The Topic Operator ensures this topic is created in Kafka. +Developers define Kafka topics declaratively using `KafkaTopic` CRs. The Topic Operator ensures the creation of these topics within the Kafka cluster. 
-**Example KafkaTopic CR** +**Example `KafkaTopic` CR:** ```yaml apiVersion: kafka.strimzi.io/v1beta2 @@ -52,19 +53,19 @@ spec: partitions: 3 replicas: 2 config: - retention.ms: 86400000 # Data retention for 1 day - segment.bytes: 1073741824 # 1GB segment size + retention.ms: 86400000 # Data retention for 1 day + segment.bytes: 1073741824 # 1GB segment size ``` -**How clients use it** +**How Clients Use It** -Once the topic is created, client applications (producers & consumers) can publish and read messages from `my-topic` like any regular Kafka topic. +Once the `my-topic` is created by the Topic Operator, client applications (producers and consumers) can publish and read messages from it as they would with any regular Kafka topic, provided they have the necessary permissions. -## How Applications Use KafkaUser CRs +## How Applications Utilize `KafkaUser` CRs ### Creating a User for Authentication & Authorization -Client applications need a Kafka user to authenticate and communicate securely. A KafkaUser CR defines the user, authentication method (TLS/SCRAM-SHA), and permissions. +Client applications require a Kafka user to authenticate and communicate securely. A KafkaUser CR defines the user, the authentication method (e.g., TLS or SCRAM-SHA), and the permissions they should have. ```yaml apiVersion: kafka.strimzi.io/v1beta2 @@ -155,6 +156,18 @@ KafkaConsumer consumer = new KafkaConsumer<>(props); ``` + +**Impact on Clients:** + +The fact that the User Operator manages credentials and ACLs through Kafka's standard mechanisms means that the availability of the User Operator is crucial for: + +- Creating and managing new user identities within Kafka. +- Ensuring that the correct authentication credentials are in place and accessible. +- Defining and enforcing authorization rules for user access to topics. + +When the Central cluster (and thus the User Operator) is unavailable, the ability to perform these management tasks is lost, directly impacting the ability of clients to authenticate and operate with the expected level of access. While the underlying Kafka authentication and authorization capabilities exist within the brokers, the management and provisioning through the Kubernetes control plane are disrupted. This means that administrators will not be able to create, update, or delete Kafka users and topics, including performing credential rotations and ACL updates. + + ### Summary of How Applications Use KafkaTopic & KafkaUser CRs | Action | Operator Responsible | @@ -174,41 +187,36 @@ In stretch Kafka deployment, where ✅ The Entity Operator (managing users & topics) runs in the central cluster. -## What About Entity Operator Functions? -The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly because - -- Kafka clients do not interact with the Entity Operator at runtime. -- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters. -- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers. - -## What Happens If No Cluster Has KafkaUser and KafkaTopic CRs? +The failure of the central Kubernetes cluster will render the Entity Operator unavailable. This has the following implications for Kafka clients -If the central cluster is the only one hosting KafkaUser and KafkaTopic CRs, then when it goes down: +#### Authentication -1. 
User Authentication Risks +- Kafka brokers in the surviving member clusters rely on the configured authentication mechanisms and the presence of valid credentials for client authentication. +- If secrets containing authentication credentials (TLS certificates or SCRAM passwords) are not replicated across all clusters, new client deployments and credential updates will fail. However, existing clients with valid credentials will continue functioning until their credentials expire or require rotation. +- Existing client connections that were authenticated before the central cluster failure might remain active for a period, but they will eventually be disconnected due to session timeouts or other factors, and they will fail to re-establish connections without valid authentication. +- Crucially, the management of credentials (e.g., rotation) through the User Operator will be unavailable. - - Kafka brokers in surviving clusters rely on existing secrets for authentication. - - If KafkaUser secrets were only stored in the central cluster and not replicated, brokers in other clusters will be unable to authenticate client requests. - - New client connections will fail since brokers cannot verify credentials. - - Existing client connections may remain active if they were authenticated before the central cluster failure, but they will eventually be disconnected when session timeouts occur. +#### Authorization -2. Topic Management Limitations +- The ACLs defined in KafkaUser CRs are configured on the Kafka brokers. These ACLs will generally remain in place. +- However, any new authorization rules or modifications to existing ones defined in KafkaUser CRs cannot be applied because the User Operator is down. +- TLS certificates used for authentication expire and rotate periodically. Without the User Operator, expired certificates cannot be renewed, leading to eventual authentication failures. - - Topics that were already created will continue to exist and function normally. - - Clients can still produce and consume messages only if they are already authenticated before the central cluster failure. - - No new topics can be created or updated since the KafkaTopic CRs and Entity Operator are unavailable. +#### Topic Management -### Mitigation Strategies +- Topics that were already created will continue to exist and function normally. +- Clients can continue to produce and consume messages on existing topics if they remain authenticated and authorized. +- Existing topics will continue to function, but administrators cannot modify topic configurations or delete topics through Kubernetes. +- No new topics can be created or updated through the Kubernetes-managed KafkaTopic CRs since the Topic Operator is unavailable. -To ensure Kafka clients remain functional even when the central cluster goes down, we should implement the following best practices +### Why the Entity Operator's Absence Impacts Clients -✅ Replicate KafkaUser secrets across all clusters where Kafka brokers exist. +As outlined in the 'Impact of Central Cluster Failure' section, the unavailability of the Entity Operator disrupts the declarative management of critical aspects like user authentication and topic lifecycle within your Kubernetes environment. This loss of control directly affects the ability of clients to authenticate, access new resources, and manage their connections effectively. -- This ensures authentication remains functional even if the central cluster is unavailable. 
+### Mitigation Strategies to Enhance Client Functionality During Central Cluster Failure: -✅ Ensure Kafka brokers cache authentication data where possible(This needs verification). +To enhance the resilience of Kafka clients in the event of a central cluster failure, the following best practices are recommended: -- Some authentication mechanisms (like SCRAM) allow brokers to cache credentials temporarily. -- This can help avoid immediate authentication failures if the central cluster is temporarily down. +✅ Replicate KafkaUser Secrets: Ensure that the Kubernetes Secrets containing authentication credentials (TLS certificates or SCRAM passwords) are replicated across all Kubernetes clusters where Kafka brokers are running. This allows brokers in surviving clusters to authenticate clients using the known credentials. -✅ Alternatively we can Explore options like KafkaAccess Operator. This reduces dependency on a single cluster for authentication. +✅ Explore Alternative Authentication and Authorization Solutions: Consider solutions like the Kafka Access Operator, which might offer more distributed control over authentication and authorization, reducing the dependency on a single central cluster for these critical functions. \ No newline at end of file From 7467b051037d5591de7fdab7b3507c9759d82d1b Mon Sep 17 00:00:00 2001 From: Aswin A Date: Fri, 28 Mar 2025 10:19:41 +0530 Subject: [PATCH 4/6] fix: Added more clarity Added more clarity Signed-off-by: Aswin A --- docs/entityoperator.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/docs/entityoperator.md b/docs/entityoperator.md index 996313c..291b558 100644 --- a/docs/entityoperator.md +++ b/docs/entityoperator.md @@ -124,6 +124,8 @@ It will contain - password (Base64-encoded password) +*Note:* The Kubernetes Secret is primarily used to distribute the initial SCRAM username and password to clients. However, Kafka brokers internally store hashed verifiers of these credentials. When clients authenticate, they send their credentials to Kafka, which verifies them against the stored hashed verifiers rather than using the plaintext password from the Secret. + ### Using These Credentials in a Kafka Client **Example** @@ -192,8 +194,9 @@ The failure of the central Kubernetes cluster will render the Entity Operator un #### Authentication - Kafka brokers in the surviving member clusters rely on the configured authentication mechanisms and the presence of valid credentials for client authentication. -- If secrets containing authentication credentials (TLS certificates or SCRAM passwords) are not replicated across all clusters, new client deployments and credential updates will fail. However, existing clients with valid credentials will continue functioning until their credentials expire or require rotation. -- Existing client connections that were authenticated before the central cluster failure might remain active for a period, but they will eventually be disconnected due to session timeouts or other factors, and they will fail to re-establish connections without valid authentication. +- If TLS certificates secrets are not replicated across all clusters, new client deployments and credential updates will fail. However, existing clients with valid TLS certificates will continue functioning until their certificates expire or require rotation. The duration for which they can operate depends entirely on the expiration date set when the certificates were issued. 
+- If SCRAM credentials have been successfully replicated across the Kafka brokers, existing clients should be able to continue authenticating, even if the Central cluster is down. The issue is with new client deployments or credential updates. +- Existing client connections that were authenticated before the central cluster failure might remain active for a period, but their continued operation depends on multiple factors. If a Kafka broker restarts, clients may need to re-authenticate, which could fail if they rely on new credentials from an unavailable Entity Operator. Additionally, the configured `session.timeout.ms` and Kafka’s reauthentication behavior may determine how long clients remain connected before being disconnected. - Crucially, the management of credentials (e.g., rotation) through the User Operator will be unavailable. #### Authorization @@ -211,12 +214,12 @@ The failure of the central Kubernetes cluster will render the Entity Operator un ### Why the Entity Operator's Absence Impacts Clients -As outlined in the 'Impact of Central Cluster Failure' section, the unavailability of the Entity Operator disrupts the declarative management of critical aspects like user authentication and topic lifecycle within your Kubernetes environment. This loss of control directly affects the ability of clients to authenticate, access new resources, and manage their connections effectively. +As outlined in the 'Impact of Central Cluster Failure' section, the unavailability of the Entity Operator disrupts the declarative management of critical administrative functions like user authentication and topic lifecycle within your Kubernetes environment. This loss of control directly impacts the ability of administrators to create, update, or delete users and topics, and to manage their credentials and access. -### Mitigation Strategies to Enhance Client Functionality During Central Cluster Failure: +### Best Practices for Stretched Kafka Deployments -To enhance the resilience of Kafka clients in the event of a central cluster failure, the following best practices are recommended: +To enhance the resilience of Kafka clients in a stretched deployment, especially in the event of a central cluster failure, the following best practices are recommended: -✅ Replicate KafkaUser Secrets: Ensure that the Kubernetes Secrets containing authentication credentials (TLS certificates or SCRAM passwords) are replicated across all Kubernetes clusters where Kafka brokers are running. This allows brokers in surviving clusters to authenticate clients using the known credentials. +✅ Ensure that authentication credentials (TLS certificates and SCRAM hashed verifiers) are replicated across all Kafka brokers in the stretched cluster. This ensures that clients can continue to authenticate even if the central cluster is unavailable. -✅ Explore Alternative Authentication and Authorization Solutions: Consider solutions like the Kafka Access Operator, which might offer more distributed control over authentication and authorization, reducing the dependency on a single central cluster for these critical functions. \ No newline at end of file +✅ Consider exploring alternative authentication and authorization solutions, such as the Kafka Access Operator, which might offer more distributed control and reduce the dependency on a single central cluster for these critical functions. This can improve the overall resilience of the deployment. 
\ No newline at end of file From 3159cfc5be594fc508b98655fb79e3bb8dcae623 Mon Sep 17 00:00:00 2001 From: Aswin A Date: Fri, 28 Mar 2025 12:51:12 +0530 Subject: [PATCH 5/6] feat: Added details of internal, external access Added details of internal, external access Signed-off-by: Aswin A --- docs/entityoperator.md | 90 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 90 insertions(+) diff --git a/docs/entityoperator.md b/docs/entityoperator.md index 291b558..c62bde3 100644 --- a/docs/entityoperator.md +++ b/docs/entityoperator.md @@ -212,6 +212,96 @@ The failure of the central Kubernetes cluster will render the Entity Operator un - Existing topics will continue to function, but administrators cannot modify topic configurations or delete topics through Kubernetes. - No new topics can be created or updated through the Kubernetes-managed KafkaTopic CRs since the Topic Operator is unavailable. + +#### Client Authentication & Secret Management in a Stretched Kafka Deployment + +In a multi-cluster Kafka deployment, clients (producers/consumers) may run in different environments, each with its own authentication challenges. Below, we define what it means for clients running in different locations and how they manage Kafka authentication. + +##### Clients Running in the Central Kubernetes Cluster + +These are Kafka clients (applications) deployed in the same cluster where the Entity Operator runs. + +**How They Authenticate & Connect** + +- Credentials (e.g., username/password for SCRAM, TLS certificates) are automatically generated by the Entity Operator. +- Credentials are stored in Kubernetes Secrets within the same namespace. +- Kafka clients retrieve authentication details from these Kubernetes Secrets. +- They connect via internal Kubernetes service names (e.g., my-cluster-kafka-bootstrap:9092). + +**Example Configuration (SCRAM Authentication)*** + +```yaml +env: + - name: KAFKA_BOOTSTRAP_SERVERS + value: "my-cluster-kafka-bootstrap:9092" + - name: KAFKA_SASL_USERNAME + valueFrom: + secretKeyRef: + name: my-user + key: sasl.jaas.config + - name: KAFKA_SSL_TRUSTSTORE_PASSWORD + valueFrom: + secretKeyRef: + name: my-user + key: truststore-password +``` +**Note**: The values are not hardcoded, but are being pulled from the secret named 'my-user'. + +✅ Key Takeaways + +✔ Credentials are managed via Kubernetes Secrets, requiring no manual configuration.
+✔ Clients access Kafka via internal networking, making communication seamless.
+✔ Works out of the box without extra setup.
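
For TLS-based clients in the central cluster, the same pattern applies with the secret mounted as files rather than environment variables; a hypothetical Deployment fragment (the secret name `my-user`, image, and mount path are illustrative):

```yaml
# Illustrative Pod template fragment mounting the KafkaUser TLS secret as files
spec:
  containers:
    - name: kafka-client
      image: registry.example.com/my-kafka-client:latest  # placeholder image
      volumeMounts:
        - name: kafka-user-tls
          mountPath: /etc/kafka-certs
          readOnly: true
  volumes:
    - name: kafka-user-tls
      secret:
        secretName: my-user   # Secret created by the User Operator
        # the cluster CA certificate Secret would typically be mounted as well
```
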
+ +##### Clients Running in a Remote Kubernetes Cluster + +These are Kafka clients running in Kubernetes clusters that are **Not** the central cluster (where the Entity Operator runs). + +**Challenges They Face** + +🚨 The KafkaUser Secret does not exist in the remote cluster because the Entity Operator only runs in the central cluster.
+🚨 Kubernetes-based applications expect credentials to be stored as Secrets within their own cluster.
+ +**Potential Solutions** + +✅ Manually Sync Secrets → Copy KafkaUser secrets from the central cluster to the remote cluster.
+✅ Use Kafka Access Operator (Future Feature) → Strimzi maintainers are working on cross-cluster user management.
+✅ Treat Clients as External Clients → Fetch credentials manually and connect via external listeners. + + +##### Clients Running Outside Kubernetes (External Clients) + +These are Kafka clients running outside of any Kubernetes environment (e.g., VMs, bare-metal, cloud services). + +**How They Authenticate & Connect** + +- Credentials are manually retrieved from Kubernetes Secrets and stored externally—no ongoing dependency on Kubernetes Secrets (e.g., CI/CD pipelines, or environment variables, or manually using `kubectl get secret -o yaml`). +- Clients use external listeners (e.g., LoadBalancer, NodePort, Ingress, public DNS). +- Secrets are required for initial credential retrieval, but clients do not rely on them during runtime + +**Example Configuration (SCRAM Authentication)** + +```yaml +bootstrap.servers=my-kafka-bootstrap.central-cluster.com:9094 +security.protocol=SASL_SSL +sasl.mechanism=SCRAM-SHA-512 +sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \ + username="my-user" \ + password="my-password"; +ssl.truststore.location=/path/to/truststore.jks +ssl.truststore.password=mypassword +``` + +##### Comparison Table: How Clients Handle Authentication + +| Feature | External Clients | Internal Clients (Central) | Internal Clients (Remote) | +|--------------------------|----------------------------------------------|--------------------------------------------|-------------------------------------------------| +| **How They Get Credentials** | Manually fetched (Vault, CI/CD) | Kubernetes Secret (Entity Operator) | ❌ No automatic Secret (must be copied) | +| **How They Connect to Kafka** | External listener (LoadBalancer, DNS) | Internal service (`bootstrap:9092`) | ✅ Works if Secret is manually synced | +| **Requires Secret Sync?** | ❌ No | ❌ No | ✅ Yes (or use external access) | +| **Common Failure Points** | Misconfigured authentication, TLS issues | Works by default | ❌ Missing Secret → Clients fail to connect | + + ### Why the Entity Operator's Absence Impacts Clients As outlined in the 'Impact of Central Cluster Failure' section, the unavailability of the Entity Operator disrupts the declarative management of critical administrative functions like user authentication and topic lifecycle within your Kubernetes environment. This loss of control directly impacts the ability of administrators to create, update, or delete users and topics, and to manage their credentials and access. From 10bcf3952189d1bdbf71d262e1e12ef12dc507f4 Mon Sep 17 00:00:00 2001 From: Aswin A <55191821+aswinayyolath@users.noreply.github.com> Date: Fri, 28 Mar 2025 16:50:00 +0530 Subject: [PATCH 6/6] Update docs/entityoperator.md Co-authored-by: Neeraj Laad --- docs/entityoperator.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/entityoperator.md b/docs/entityoperator.md index c62bde3..a123e6b 100644 --- a/docs/entityoperator.md +++ b/docs/entityoperator.md @@ -163,11 +163,12 @@ KafkaConsumer consumer = new KafkaConsumer<>(props); The fact that the User Operator manages credentials and ACLs through Kafka's standard mechanisms means that the availability of the User Operator is crucial for: -- Creating and managing new user identities within Kafka. -- Ensuring that the correct authentication credentials are in place and accessible. -- Defining and enforcing authorization rules for user access to topics. +- Creating, viewing, updating and deleting user identities and their permissions (ACL) within Kafka. 
+- Creating, viewing, updating and deleting topics and their configuration within Kafka. -When the Central cluster (and thus the User Operator) is unavailable, the ability to perform these management tasks is lost, directly impacting the ability of clients to authenticate and operate with the expected level of access. While the underlying Kafka authentication and authorization capabilities exist within the brokers, the management and provisioning through the Kubernetes control plane are disrupted. This means that administrators will not be able to create, update, or delete Kafka users and topics, including performing credential rotations and ACL updates. +When the Central cluster (and thus the Entity Operator) is unavailable: +- administrators will not be able to view create, update, or delete Kafka users and topics via Kubernetes custom resources. +- existing users, topics and client applications using those credentials will continue to work as usual with no disruption. ### Summary of How Applications Use KafkaTopic & KafkaUser CRs