docs/entityoperator.md
62 additions & 54 deletions
@@ -1,45 +1,46 @@
1
-
# Impact of Entity Operator Availability in a Stretch Kafka Cluster
1
+
# Impact of Entity Operator Availability in a Stretched Kafka Cluster
2
2
3
-
The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. This document explains how its availability affects topic and user management when deployed in a multi-cluster Kafka setup.
3
+
This document outlines the role of the Strimzi Entity Operator in managing Kafka users and topics, and explains how its availability affects operations within a multi-cluster Kafka deployment where the Entity Operator resides in a central Kubernetes cluster.
4
4
5
-
## Key Components of Entity Operator
5
+
## Key Components of the Entity Operator
6
6
7
-
The Entity Operator consists of two main sub-components:
7
+
The Entity Operator in Strimzi comprises two primary sub-components:
8
8
9
9
### Topic Operator
10
10
11
-
- Watches for KafkaTopic CRs in Kubernetes.
12
-
- Automatically creates, updates, and deletes topics in Kafka based on KafkaTopic CR definitions.
13
-
- Keeps Kubernetes and Kafka topic configurations in sync.
14
-
- Ensures desired state consistency between Kubernetes and Kafka.
11
+
- Monitors Kubernetes for `KafkaTopic` Custom Resources (CRs).
12
+
- Automatically creates, updates, and deletes Kafka topics within the Kafka cluster based on the definitions in the `KafkaTopic` CRs.
13
+
- Ensures synchronization between the desired topic configurations in Kubernetes and the actual topic configurations in Kafka.
- Ensures user permissions and authentication are correctly configured.
17
+
- Monitors Kubernetes for `KafkaUser` Custom Resources (CRs).
18
+
- Manages security credentials (e.g., TLS certificates, SASL credentials) and configures user permissions and authentication within the Kafka cluster.
19
+
- Simplifies and automates the management of user access and security settings.
21
20
22
-
## Why is the Entity Operator Useful?
21
+
## Why is the Entity Operator Essential?
23
22
24
-
- Eliminates the need for manual topic and user management.
25
-
- Ensures Kafka users have appropriate authentication and authorization settings.
26
-
- Enables declarative management using Kubernetes CRs.
27
-
- Keeps configurations between Kubernetes and Kafka in sync.
23
+
The Entity Operator provides several key benefits:
28
24
29
-
## How Client Applications Use KafkaTopic and KafkaUser CRs in Strimzi
25
+
- Eliminates the need for manual topic and user management through Kafka's administrative tools.
26
+
- Ensures Kafka users are configured with appropriate authentication and authorization settings as defined in Kubernetes.
27
+
- Enables the management of Kafka resources using Kubernetes-native Custom Resources, promoting a declarative approach.
28
+
- Maintains consistency between the desired state in Kubernetes and the actual state of topics and users in Kafka.
30
29
31
-
The client applications interact with Kafka topics and users in Strimzi using Kubernetes native resources
30
+
## How Client Applications Utilize KafkaTopic and KafkaUser CRs in Strimzi
32
31
33
-
- KafkaTopic CRs define and manage Kafka topics.
34
-
- KafkaUser CRs define users and security credentials for authentication & authorization.
32
+
Client applications interact with Kafka topics and users in Strimzi using Kubernetes Custom Resources:
35
33
36
-
## How Applications Use KafkaTopic CRs
34
+
- **`KafkaTopic` CRs:** Define and manage Kafka topics, specifying parameters like partitions, replication factor, and configuration.
35
+
- **`KafkaUser` CRs:** Define users and their security configurations for authentication and authorization.
36
+
37
+
## How Applications Utilize `KafkaTopic` CRs
37
38
38
39
### Creating a Topic
39
40
40
-
Developers define a topic declaratively using a KafkaTopic CR. The Topic Operator ensures this topic is created in Kafka.
41
+
Developers define Kafka topics declaratively using `KafkaTopic` CRs. The Topic Operator ensures the creation of these topics within the Kafka cluster.
41
42
42
-
**Example KafkaTopic CR**
43
+
**Example `KafkaTopic` CR:**
43
44
44
45
```yaml
45
46
apiVersion: kafka.strimzi.io/v1beta2
@@ -52,19 +53,19 @@ spec:
52
53
partitions: 3
53
54
replicas: 2
54
55
config:
55
-
retention.ms: 86400000 # Data retention for 1 day
56
-
segment.bytes: 1073741824 # 1GB segment size
56
+
retention.ms: 86400000 # Data retention for 1 day
57
+
segment.bytes: 1073741824 # 1GB segment size
57
58
```
58
59
59
-
**How clients use it**
60
+
**How Clients Use It**
60
61
61
-
Once the topic is created, client applications (producers & consumers) can publish and read messages from `my-topic` like any regular Kafka topic.
62
+
Once `my-topic` is created by the Topic Operator, client applications (producers and consumers) can publish and read messages from it as they would with any regular Kafka topic, provided they have the necessary permissions.
62
63
63
-
## How Applications Use KafkaUser CRs
64
+
## How Applications Utilize `KafkaUser` CRs
64
65
65
66
### Creating a User for Authentication & Authorization
66
67
67
-
Client applications need a Kafka user to authenticate and communicate securely. A KafkaUser CR defines the user, authentication method (TLS/SCRAM-SHA), and permissions.
68
+
Client applications require a Kafka user to authenticate and communicate securely. A `KafkaUser` CR defines the user, the authentication method (e.g., TLS or SCRAM-SHA), and the user's permissions.
68
69
69
70
```yaml
70
71
apiVersion: kafka.strimzi.io/v1beta2
@@ -155,6 +156,18 @@ KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
155
156
156
157
```
157
158
159
+
160
+
**Impact on Clients:**
161
+
162
+
Because the User Operator manages credentials and ACLs through Kafka's standard mechanisms, its availability is crucial for:
163
+
164
+
- **Creating and managing new user identities within Kafka.**
165
+
- **Ensuring that the correct authentication credentials are in place and accessible.**
166
+
- **Defining and enforcing authorization rules for user access to topics.**
167
+
168
+
When the central cluster (and thus the User Operator) is unavailable, these management tasks can no longer be performed, which directly affects the ability of clients to authenticate and operate with the expected level of access. The underlying Kafka authentication and authorization capabilities still exist within the brokers, but management and provisioning through the Kubernetes control plane are disrupted.
169
+
170
+
158
171
### Summary of How Applications Use KafkaTopic & KafkaUser CRs
159
172
160
173
| Action | Operator Responsible |
@@ -174,41 +187,36 @@ In stretch Kafka deployment, where
174
187
✅ The Entity Operator (managing users & topics) runs in the central cluster.
175
188
176
189
177
-
## What About Entity Operator Functions?
178
-
The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly because
179
-
180
-
- Kafka clients do not interact with the Entity Operator at runtime.
181
-
- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters.
182
-
- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers.
190
+
The failure of the central Kubernetes cluster will render the Entity Operator unavailable. This has the following implications for Kafka clients:
183
191
184
-
## What Happens If No Cluster Has KafkaUser and KafkaTopic CRs?
192
+
#### Authentication
185
193
186
-
If the central cluster is the only one hosting KafkaUser and KafkaTopic CRs, then when it goes down:
194
+
- Kafka brokers in the surviving member clusters rely on the configured authentication mechanisms and the presence of valid credentials for client authentication.
195
+
- If the secrets containing authentication credentials (TLS certificates or SCRAM passwords) are not replicated to the member clusters, brokers in those clusters will be unable to authenticate new client connections.
196
+
- Existing client connections that were authenticated before the central cluster failure might remain active for a period, but they will eventually be disconnected due to session timeouts or other factors, and they will fail to re-establish connections without valid authentication.
197
+
- Crucially, the management of credentials (e.g., rotation) through the User Operator will be unavailable.
187
198
188
-
1. User Authentication Risks
199
+
#### Authorization
189
200
190
-
- Kafka brokers in surviving clusters rely on existing secrets for authentication.
191
-
- If KafkaUser secrets were only stored in the central cluster and not replicated, brokers in other clusters will be unable to authenticate client requests.
192
-
- New client connections will fail since brokers cannot verify credentials.
193
-
- Existing client connections may remain active if they were authenticated before the central cluster failure, but they will eventually be disconnected when session timeouts occur.
201
+
- The ACLs defined in KafkaUser CRs are configured on the Kafka brokers. These ACLs will generally remain in place.
202
+
- However, any new authorization rules or modifications to existing ones defined in KafkaUser CRs cannot be applied because the User Operator is down.
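
As a rough illustration of the rules in question, the fragment below sketches a `KafkaUser` CR with simple authorization. The names `my-user` and `my-cluster` are assumptions made for this example; `my-topic` follows the earlier topic example. ACLs that were already reconciled to the brokers stay in effect, whereas an edit to this `acls` list made while the central cluster is down would not be applied until the User Operator is running again.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-user                      # hypothetical user name
  labels:
    strimzi.io/cluster: my-cluster   # hypothetical Kafka cluster name
spec:
  authentication:
    type: tls                        # could also be scram-sha-512
  authorization:
    type: simple
    acls:
      # Allow the user to read from and describe my-topic
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operations:
          - Read
          - Describe
```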
194
203
195
-
2. Topic Management Limitations
204
+
#### Topic Management
196
205
197
-
- Topics that were already created will continue to exist and function normally.
198
-
- Clients can still produce and consume messages only if they are already authenticated before the central cluster failure.
199
-
- No new topics can be created or updated since the KafkaTopic CRs and Entity Operator are unavailable.
206
+
- Topics that were already created will continue to exist and function normally.
207
+
- Clients can continue to produce and consume messages on existing topics if they remain authenticated and authorized.
208
+
- No new topics can be created or updated through the Kubernetes-managed KafkaTopic CRs since the Topic Operator is unavailable.
200
209
201
-
### Mitigation Strategies
210
+
### Why the Entity Operator's Absence Impacts Clients
202
211
203
-
To ensure Kafka clients remain functional even when the central cluster goes down, we should implement the following best practices
212
+
While Kafka brokers have their own internal mechanisms for authentication and authorization, the Entity Operator is the component that automates the configuration and management of these features within a Kubernetes environment using KafkaUser and KafkaTopic CRs. When the Entity Operator is unavailable, the declarative management of these critical aspects is lost.
204
213
205
-
✅ Replicate KafkaUser secrets across all clusters where Kafka brokers exist.
214
+
### Mitigation Strategies to Ensure Client Functionality During Central Cluster Failure
206
215
207
-
- This ensures authentication remains functional even if the central cluster is unavailable.
216
+
To enhance the resilience of Kafka clients in the event of a central cluster failure, the following best practices are recommended:
208
217
209
-
✅ Ensure Kafka brokers cache authentication data where possible(This needs verification).
218
+
✅ Replicate KafkaUser Secrets: Ensure that the Kubernetes Secrets containing authentication credentials (TLS certificates or SCRAM passwords) are replicated across all Kubernetes clusters where Kafka brokers are running. This allows brokers in surviving clusters to authenticate clients using the known credentials.
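
For orientation, the fragment below sketches the approximate shape of the Secret that the User Operator generates for a SCRAM-SHA-512 user, which is the kind of object that would need to be replicated. The names `my-user`, `my-cluster`, and the `kafka` namespace are assumptions for this example, the values are placeholders, and the exact labels and data keys may differ between Strimzi versions, so this should be checked against the Secret actually present in the central cluster.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-user                      # hypothetical; matches the KafkaUser name
  namespace: kafka                   # hypothetical namespace
  labels:
    strimzi.io/kind: KafkaUser
    strimzi.io/cluster: my-cluster   # hypothetical Kafka cluster name
type: Opaque
data:
  # Placeholders only; real values are base64-encoded by Kubernetes
  password: <base64-encoded password>
  sasl.jaas.config: <base64-encoded JAAS configuration>
```

For a TLS user, the Secret would instead be expected to carry the client certificate and key material rather than a password.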
210
219
211
-
- Some authentication mechanisms (like SCRAM) allow brokers to cache credentials temporarily.
212
-
- This can help avoid immediate authentication failures if the central cluster is temporarily down.
220
+
✅ Consider Caching Mechanisms (verification needed): Although this is not a Strimzi feature, Kafka brokers may cache authentication data (e.g., for SCRAM), which could mitigate immediate authentication failures during a short central cluster outage. Whether and how this works requires further investigation and verification of Kafka's internal behavior.
213
221
214
-
✅ Alternatively we can Explore options like KafkaAccess Operator. This reduces dependency on a single cluster for authentication.
222
+
✅ Explore Alternative Authentication and Authorization Solutions: Consider solutions like the Kafka Access Operator, which might offer more distributed control over authentication and authorization, reducing the dependency on a single central cluster for these critical functions.