feat: Added details of EntityOperator in Stretch clusters #22
# Impact of Entity Operator Availability in a Stretch Kafka Cluster

The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. This document explains how its availability affects topic and user management when deployed in a multi-cluster Kafka setup.

## Key Components of the Entity Operator

The Entity Operator consists of two main sub-components:

### Topic Operator

- Watches for KafkaTopic CRs in Kubernetes.
- Automatically creates, updates, and deletes topics in Kafka based on KafkaTopic CR definitions.
- Keeps Kubernetes and Kafka topic configurations in sync, ensuring desired-state consistency between the two.

### User Operator

- Watches for KafkaUser CRs in Kubernetes.
- Manages security credentials (TLS certificates, SASL credentials).
- Ensures user permissions and authentication are correctly configured.

## Why Is the Entity Operator Useful?

- Eliminates the need for manual topic and user management.
- Ensures Kafka users have appropriate authentication and authorization settings.
- Enables declarative management using Kubernetes CRs.
- Keeps configurations between Kubernetes and Kafka in sync.

## How Client Applications Use KafkaTopic and KafkaUser CRs in Strimzi

Client applications interact with Kafka topics and users in Strimzi through Kubernetes-native resources:

- KafkaTopic CRs define and manage Kafka topics.
- KafkaUser CRs define users and security credentials for authentication and authorization.

## How Applications Use KafkaTopic CRs

### Creating a Topic

Developers define a topic declaratively using a KafkaTopic CR. The Topic Operator ensures this topic is created in Kafka.

**Example KafkaTopic CR**

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster # Must match the Kafka cluster name
spec:
  partitions: 3
  replicas: 2
  config:
    retention.ms: 86400000 # Data retention for 1 day
    segment.bytes: 1073741824 # 1GB segment size
```

**How clients use it**

Once the topic is created, client applications (producers and consumers) can publish and read messages from `my-topic` like any regular Kafka topic.

## How Applications Use KafkaUser CRs

### Creating a User for Authentication & Authorization

Client applications need a Kafka user to authenticate and communicate securely. A KafkaUser CR defines the user, the authentication method (TLS/SCRAM-SHA), and permissions.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-app-user
  labels:
    strimzi.io/cluster: my-cluster # Must match the Kafka cluster name
spec:
  authentication:
    type: tls # TLS-based auth
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operations:
          - Read
          - Write
```

**How clients use it**

### Authentication

- If `TLS` authentication is enabled, Strimzi generates a Secret containing the user's TLS certificates.
- If `SCRAM-SHA` authentication is enabled, Strimzi generates a username and password in a Kubernetes Secret.

### Authorization (ACLs)

- In the above example, the user `my-app-user` has Read and Write access to `my-topic`.

Clients will only be able to perform the allowed operations.

## How Clients Retrieve and Use Credentials

After creating a KafkaUser, Strimzi automatically generates a Kubernetes Secret with the credentials.

**Example**

```bash
kubectl get secret my-app-user -o yaml
```

It will contain:

#### For TLS authentication

- `ca.crt` (CA certificate)
- `user.crt` (client certificate)
- `user.key` (client private key)
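
The Java SSL example further below expects PKCS12 stores (`ca.p12`, `user.p12`). If only the PEM entries above are at hand, they can be converted with `openssl`. This is a sketch: the file paths and the `password` value are illustrative, and recent Strimzi versions may already ship ready-made `user.p12`/`ca.p12` entries in the Secrets, making the conversion unnecessary.

```shell
# Extract the PEM material from the KafkaUser Secret (names from the example above)
kubectl get secret my-app-user -o jsonpath='{.data.user\.crt}' | base64 -d > user.crt
kubectl get secret my-app-user -o jsonpath='{.data.user\.key}' | base64 -d > user.key
kubectl get secret my-app-user -o jsonpath='{.data.ca\.crt}'   | base64 -d > ca.crt

# Build a PKCS12 keystore (client cert + key) and a cert-only truststore
openssl pkcs12 -export -in user.crt -inkey user.key -out user.p12 -password pass:password
openssl pkcs12 -export -nokeys -in ca.crt -out ca.p12 -password pass:password
```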

#### For SCRAM-SHA authentication

- `password` (Base64-encoded password)
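
Like all Kubernetes Secret data, the value is Base64-encoded and must be decoded before use, for example:

```shell
# Read the password field from the Secret and decode it
kubectl get secret my-app-user -o jsonpath='{.data.password}' | base64 -d
```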

### Using These Credentials in a Kafka Client

**Example**

#### Java Producer Example (TLS Authentication)

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "my-cluster-kafka-bootstrap:9093");
props.put("security.protocol", "SSL");
props.put("ssl.truststore.location", "/etc/secrets/ca.p12");
props.put("ssl.truststore.password", "password");
props.put("ssl.keystore.location", "/etc/secrets/user.p12");
props.put("ssl.keystore.password", "password");
// Serializers are mandatory for every producer
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```

#### Java Consumer Example (SCRAM-SHA Authentication)

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "my-cluster-kafka-bootstrap:9093");
props.put("security.protocol", "SASL_SSL");
props.put("sasl.mechanism", "SCRAM-SHA-512");
props.put("sasl.jaas.config",
    "org.apache.kafka.common.security.scram.ScramLoginModule required "
    + "username=\"my-app-user\" password=\"my-secret-password\";");
// group.id and deserializers are mandatory for every consumer
props.put("group.id", "my-app-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
```

### Summary of How Applications Use KafkaTopic & KafkaUser CRs

| Action | Result |
| -------- | ------- |
| Developer creates a `KafkaTopic` CR | Topic Operator creates & syncs the topic in Kafka |
| Developer creates a `KafkaUser` CR | User Operator creates the user & credentials |
| Application retrieves credentials from Kubernetes Secrets | Application mounts the Secrets for authentication |
| Application connects to Kafka using these credentials | Producer/consumer communicates with Kafka |

## Impact of Central Cluster Failure on Kafka Clients in a Stretch Cluster

Consider a stretch Kafka deployment where:

✅ Kafka brokers and controllers are spread across multiple Kubernetes clusters.<br>
✅ The central cluster hosts all Kafka CRs, including Kafka, KafkaNodePool, KafkaUser, and KafkaTopic.<br>
✅ The Entity Operator (managing users & topics) runs in the central cluster.
## What About Entity Operator Functions?

The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly, because:

- Kafka clients do not interact with the Entity Operator at runtime.
- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters.
- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers.

> **Review comment:** Again, topic and user management using custom resources is blocked, but users can create topics/users in Kafka as normal.
>
> **Reply:** I'm not sure what the confusion here is, but I will try to share my understanding. I'm not claiming I'll be 100% accurate, and I'm happy to be corrected, but we need a handle on this before we revise or add anything in the proposal related to the EO. Kafka clients don't directly interact with the Entity Operator's components (the Topic and User Operators) during their normal message operations. The EO is not even mandatory as far as Strimzi is concerned, though it is highly recommended. Existing topics and ACLs will remain operational on the Kafka brokers in the member clusters. However, it's crucial to understand the limitations and the broader impact of a central cluster failure in this scenario, especially when KafkaUser and KafkaTopic custom resources are used for management. While clients might continue to send and receive messages for a period, the absence of the Entity Operator in the central cluster severely restricts the managed control plane for the Kafka environment. Therefore, while the immediate impact might not be a complete outage for all connected clients, the failure of the central cluster, and consequently the Entity Operator, significantly degrades the manageability, security, and adaptability of the stretched Kafka cluster.

## What Happens If No Cluster Has KafkaUser and KafkaTopic CRs?

If the central cluster is the only one hosting KafkaUser and KafkaTopic CRs, then when it goes down:

1. User Authentication Risks

- Kafka brokers in surviving clusters rely on existing secrets for authentication.
- If KafkaUser secrets were only stored in the central cluster and not replicated, brokers in other clusters will be unable to authenticate client requests.
- New client connections will fail since brokers cannot verify credentials.
- Existing client connections may remain active if they were authenticated before the central cluster failure, but they will eventually be disconnected when session timeouts occur.

> **Review comment:** Are you sure of this behaviour? Managing user credentials and ACLs is a Kafka capability and should not be specific to use of the Entity Operator, so I was not expecting these restrictions.
>
> **Reply:** Kafka itself has its own authentication and authorization capabilities. My intention in this section was to highlight the impact on clients within the context of a Strimzi deployment that utilizes the Entity Operator for managing users and their credentials through Kubernetes Secrets. The restriction isn't on Kafka's core functionality, but on the availability of the managed credentials (e.g., TLS certificates stored in Kubernetes Secrets) that the User Operator in the central cluster typically manages and that might not be automatically replicated across all clusters in a stretched setup. Ensuring proper secret replication across clusters is crucial for maintaining authentication in such scenarios.
>
> **Review comment:** Kafka itself enforces authentication if credentials are stored inside Kafka (like SCRAM-SHA passwords). But if authentication relies on Kubernetes Secrets (like TLS certs), and those secrets were not replicated, authentication will fail AFAIK.
>
> **Reply:** You're correct that Kafka itself is responsible for enforcing authentication and ACLs, regardless of the Entity Operator's availability. However, whether authentication continues to work after a central cluster failure depends on how credentials were managed and distributed. SCRAM credentials are stored within Kafka's metadata log (KRaft); since these credentials are part of Kafka's internal state, authentication will continue to work as long as the metadata remains accessible in the surviving clusters. TLS certificates are typically managed via Kubernetes Secrets when using the KafkaUser CR. If these Secrets were only stored in the central cluster and not replicated to the member clusters, brokers in surviving clusters will be unable to verify new client connections. In this case, authentication failures will occur for new client connections, even though existing sessions may persist until session timeouts. Kafka itself enforces ACLs stored in its metadata: existing ACLs will still be applied, but new ACLs cannot be created or updated until the central cluster and the Entity Operator are restored.

2. Topic Management Limitations

- Topics that were already created will continue to exist and function normally.
- Clients can still produce and consume messages, provided they were authenticated before the central cluster failure.
- No new topics can be created or updated, since the KafkaTopic CRs and the Entity Operator are unavailable.
### Mitigation Strategies

To ensure Kafka clients remain functional even when the central cluster goes down, we should implement the following best practices:

✅ Replicate KafkaUser secrets across all clusters where Kafka brokers exist.

- This ensures authentication remains functional even if the central cluster is unavailable.
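
One lightweight way to replicate a user Secret is a `kubectl` pipeline; a sketch assuming kubeconfig contexts named `central` and `member-1` (both hypothetical) and `jq` to strip cluster-specific metadata before re-applying. A dedicated secret-replication tool is usually preferable for production.

```shell
# Copy the KafkaUser secret from the central cluster to a member cluster,
# dropping fields that are unique to the source cluster's API server
kubectl --context central get secret my-app-user -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.ownerReferences)' \
  | kubectl --context member-1 apply -f -
```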

✅ Ensure Kafka brokers cache authentication data where possible (this needs verification).

- Some authentication mechanisms (like SCRAM) allow brokers to cache credentials temporarily.
- This can help avoid immediate authentication failures if the central cluster is temporarily down.

✅ Alternatively, explore options like the KafkaAccess operator, which reduces dependency on a single cluster for authentication.

> **Review comment:** Can you explain why we need to add this:
>
> > as long as secrets (TLS/SCRAM) were distributed to all clusters.
>
> I would expect Kafka to store user credentials in ZooKeeper / controllers and not rely on the Topic Operator implementation.