Commit e5cf3d0

fix: addressed review comments
Addressed review comments

Signed-off-by: Aswin A <aswin6303@gmail.com>
1 parent 8cf5b28 commit e5cf3d0

File tree

1 file changed

+29
-49
lines changed

docs/entityoperator.md

Lines changed: 29 additions & 49 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
1-
# Entity Operator
1+
# Impact of Entity Operator Availability in a Stretch Kafka Cluster
22

3-
The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi.
3+
The Entity Operator in Strimzi is responsible for managing Kafka users and topics. It automates the creation, configuration, and security settings of these entities, ensuring smooth integration with Kafka clusters deployed via Strimzi. This document explains how its availability affects topic and user management when deployed in a multi-cluster Kafka setup.
44

55
## Key Components of Entity Operator
66

@@ -169,66 +169,46 @@ KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
169169

170170
In a stretch Kafka deployment, where
171171

172-
- ✅ Kafka brokers and controllers are spread across multiple Kubernetes clusters.
173-
- ✅ The central cluster hosts all Kafka CRs, including Kafka, KafkaNodePool, KafkaUser, and KafkaTopic.
174-
- ✅ The Entity Operator (managing users & topics) runs in the central cluster.
172+
✅ Kafka brokers and controllers are spread across multiple Kubernetes clusters.<br>
173+
✅ The central cluster hosts all Kafka CRs, including Kafka, KafkaNodePool, KafkaUser, and KafkaTopic.<br>
174+
✅ The Entity Operator (managing users & topics) runs in the central cluster.
175175

176-
## Does the Kafka Cluster Still Function?
177176

178-
Yes, but with conditions.
179-
180-
- Brokers in other Kubernetes clusters are still running.
181-
- Kafka controllers may still be running (if a quorum is maintained).
182-
183-
Replication between brokers (spread across clusters) continues as long as there’s a majority quorum of controllers.
184-
185-
- ✅ If a majority of controllers are still available in the surviving clusters, Kafka continues running.
186-
- 🚨 If the majority of controllers were in the central cluster and lost, Kafka will experience a complete outage.
187-
188-
💡 Mitigation -> users need to ensure the `controller.quorum.voters` are spread across clusters to avoid losing the majority.
189-
190-
## Can Kafka Clients Still Communicate with Brokers?
191-
192-
Yes, if at least one broker remains reachable.
193-
194-
Clients connect to brokers, not the Entity Operator or the central cluster.
195-
196-
- If clients' bootstrap servers list includes brokers running in the surviving clusters, they can still connect.
197-
- If a leader partition for a topic is hosted on a broker in a surviving cluster, the topic remains available.
198-
- If all leader partitions for a topic were on brokers in the central cluster, that topic becomes unavailable.
199-
200-
💡 Mitigation -> Use Kafka’s rack awareness (`broker.rack`) and ISR (In-Sync Replica) balancing to distribute leader partitions across clusters.
177+
## What About Entity Operator Functions?
178+
The Entity Operator becomes unavailable when the central cluster goes down. However, this does not directly impact existing Kafka clients because:
201179

202-
## Can Clients Still Produce & Consume Messages?
180+
- Kafka clients do not interact with the Entity Operator at runtime.
181+
- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters.
182+
- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers.
203183

204-
Yes, if leader partitions are available.
205-
No, if all leader partitions were in the lost central cluster.
184+
## What Happens If No Cluster Has KafkaUser and KafkaTopic CRs?
206185

207-
**Producing Messages**
186+
If the central cluster is the only one hosting KafkaUser and KafkaTopic CRs, then when it goes down:
208187

209-
- A producer can send messages only if the leader partition for a topic is still available in a surviving cluster.
210-
- If the leader was in the lost central cluster, Kafka automatically elects a new leader (if ISR exists).
211-
- If no ISR exists, production fails.
188+
1. User Authentication Risks
212189

213-
**Consuming Messages**
190+
- Kafka brokers in surviving clusters rely on existing secrets for authentication.
191+
- If KafkaUser secrets were only stored in the central cluster and not replicated, brokers in other clusters will be unable to authenticate client requests.
192+
- New client connections will fail since brokers cannot verify credentials.
193+
- Existing client connections may remain active if they were authenticated before the central cluster failure, but they will eventually be disconnected when session timeouts occur.
214194

215-
- Consumers can still fetch messages from available partitions.
216-
- If consumer group metadata was stored in a lost broker, the group might experience issues.
195+
2. Topic Management Limitations
217196

218-
💡 Mitigation -> Enable `min.insync.replicas` and leader election across clusters to ensure partition availability.
197+
- Topics that were already created will continue to exist and function normally.
198+
- Clients can still produce and consume messages only if they were already authenticated before the central cluster failure.
199+
- No new topics can be created or updated since the KafkaTopic CRs and Entity Operator are unavailable.
219200

201+
### Mitigation Strategies
220202

221-
## What About Entity Operator Functions?
222-
The Entity Operator becomes unavailable when the central cluster goes down. However, this does not impact existing Kafka clients directly because
203+
To ensure Kafka clients remain functional even when the central cluster goes down, we should implement the following best practices:
223204

224-
- Kafka clients do not interact with the Entity Operator at runtime.
225-
- User authentication still works as long as secrets (TLS/SCRAM) were distributed to all clusters.
226-
- Topics and ACLs remain intact but cannot be updated or created until the central cluster recovers.
205+
✅ Replicate KafkaUser secrets across all clusters where Kafka brokers exist.
227206

228-
Kafka clients (producers and consumers) can still authenticate and connect to Kafka brokers in the surviving clusters as long as the necessary authentication credentials (secrets) are available in those clusters.
207+
- This ensures authentication remains functional even if the central cluster is unavailable.
229208

230-
If the KafkaUser secrets only exist in the central cluster, then when it goes down, clients in other clusters cannot authenticate to Kafka brokers. However, if these secrets were already copied to all clusters where brokers are running, authentication will still work even if the central cluster is down.
209+
✅ Ensure Kafka brokers cache authentication data where possible (this needs verification).
231210

232-
To avoid authentication failures when the central cluster goes down, you must:
211+
- Some authentication mechanisms (like SCRAM) allow brokers to cache credentials temporarily.
212+
- This can help avoid immediate authentication failures if the central cluster is temporarily down.
233213

234-
- Replicate KafkaUser secrets across all clusters where Kafka brokers exist.
214+
✅ Alternatively, we can explore options like the KafkaAccess Operator. This reduces dependency on a single cluster for authentication.
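
The first mitigation (replicating KafkaUser secrets to every cluster that runs brokers) can be checked with a simple inventory comparison. The following is only a minimal sketch, not part of the commit. It assumes the secret names per cluster have already been collected by some other means, and the cluster names, secret names, and the `missing_user_secrets` helper are all hypothetical.

```python
from typing import Dict, Set


def missing_user_secrets(
    central_secrets: Set[str],
    cluster_secrets: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per cluster, the secret names that exist in the central
    cluster but are absent from that cluster. Clusters with no gap are
    omitted from the result."""
    gaps: Dict[str, Set[str]] = {}
    for cluster, secrets in cluster_secrets.items():
        missing = central_secrets - secrets
        if missing:
            gaps[cluster] = missing
    return gaps


# Hypothetical inventory: in practice these names would come from listing
# the Secrets in each Kubernetes cluster.
central = {"orders-producer", "billing-consumer"}
remote = {
    "cluster-a": {"orders-producer", "billing-consumer"},
    "cluster-b": {"orders-producer"},
}

gaps = missing_user_secrets(central, remote)
print(gaps)
```

An empty result would suggest that every listed cluster holds a copy of each KafkaUser secret by name. It does not verify that the secret contents are current, so a real check would also need to compare versions or checksums.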
