feat: Add pod affinity/anti-affinity metrics for deployments #2733
base: main
Conversation
- Add kube_deployment_spec_pod_affinity_required_rules metric
- Add kube_deployment_spec_pod_affinity_preferred_rules metric
- Add kube_deployment_spec_pod_anti_affinity_required_rules metric
- Add kube_deployment_spec_pod_anti_affinity_preferred_rules metric
- Update deployment metrics documentation
- Add comprehensive test coverage for all scenarios
How would you use this metric for alerting and/or showing information about the deployment?
These metrics enable critical alerting on scheduling constraint violations. For example:

`(kube_deployment_spec_pod_anti_affinity_preferred_rules > 0) and (kube_deployment_spec_pod_anti_affinity_required_rules == 0)`

alerts when deployments rely only on soft anti-affinity rules that can be ignored during node pressure, creating single points of failure. They also help monitor missing protection:

`(kube_deployment_spec_pod_anti_affinity_required_rules == 0) and (kube_deployment_spec_pod_anti_affinity_preferred_rules == 0)`

identifies deployments without any anti-affinity rules.

For dashboards, you can visualize cluster-wide scheduling health with `count(kube_deployment_spec_pod_anti_affinity_required_rules > 0)` to show how many deployments have proper distribution protection. During incidents, these metrics help correlate why workloads ended up co-located or why pods failed to schedule due to overly complex constraints.

This addresses #2701's core need: visibility into "preferred vs required" scheduling logic to maintain reliable workload distribution during cluster events. Thanks @mrueg for the question - these use cases demonstrate the operational value of these scheduling constraint metrics!
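For concreteness, the first expression above could be wired into a Prometheus alerting rule roughly like this (the group name, alert name, `for` duration, and severity label are placeholders, not part of this PR):

```yaml
groups:
  - name: scheduling-constraints
    rules:
      - alert: DeploymentSoftAntiAffinityOnly
        expr: |
          (kube_deployment_spec_pod_anti_affinity_preferred_rules > 0)
          and
          (kube_deployment_spec_pod_anti_affinity_required_rules == 0)
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.deployment }} relies only on preferred (soft) anti-affinity"
```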
/triage accepted
i think the metric should be explicit, something like:

then you can count over these and get the desired result, as well as gather exactly that information about the specific affinity setting. I'm not sure about the labelSelector at this point, if this should be split into subtypes as well or just calling https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/helpers.go#L171 is enough.
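As a hypothetical illustration only (the label names are taken from the final `kube_deployment_spec_affinity` metric described in the PR description; the exact series shown here are invented), an explicit per-rule metric in Prometheus exposition format might look like:

```text
kube_deployment_spec_affinity{namespace="default",deployment="web",affinity="podantiaffinity",type="requiredDuringSchedulingIgnoredDuringExecution",topology_key="kubernetes.io/hostname",label_selector="app=web"} 1
kube_deployment_spec_affinity{namespace="default",deployment="web",affinity="podaffinity",type="preferredDuringSchedulingIgnoredDuringExecution",topology_key="topology.kubernetes.io/zone",label_selector="tier=cache"} 1
```

Counting over such series (e.g. `count by (affinity, type) (kube_deployment_spec_affinity)`) recovers the aggregate numbers while preserving per-rule detail.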
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: SoumyaRaikwar. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Thanks @mrueg for the feedback! I understand you're looking for more explicit metrics that expose individual affinity rule details rather than just counts. You're absolutely right that explicit metrics would provide much more granular visibility. Instead of simple count metrics, the implementation now exposes a single explicit metric with per-rule labels.
…etrics
- Replace 4 count-based metrics with single kube_deployment_spec_affinity metric
- Add granular labels: affinity, type, topology_key, label_selector
- Enable individual rule visibility and flexible querying
- Update tests and documentation for new metric structure
b99f07e to d1612ee
Hi @mrueg, I've refactored the implementation to use explicit rule-based metrics as you requested. Key changes:
The new approach provides significantly more operational value while maintaining low cardinality and following the individual object-level data principle from the best practices document.
What this PR does / why we need it
Adds explicit rule-based pod affinity and anti-affinity metrics for deployments to provide granular visibility into Kubernetes scheduling constraints, addressing issue #2701.
Refactored from count-based to explicit rule-based approach following maintainer feedback for enhanced operational value.
Which issue(s) this PR fixes
Fixes #2701
Metrics Added
kube_deployment_spec_affinity - Pod affinity and anti-affinity rules with granular labels

Labels provided:
- affinity: `podaffinity` | `podantiaffinity`
- type: `requiredDuringSchedulingIgnoredDuringExecution` | `preferredDuringSchedulingIgnoredDuringExecution`
- topology_key: the topology key for the rule
- label_selector: the formatted label selector string
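The per-rule expansion can be sketched in Go. This is a simplified, self-contained illustration, not the PR's actual code: the `AffinityTerm`/`PodAffinityRules` types stand in for the real `k8s.io/api/core/v1` affinity types, and `formatSelector` only mimics `metav1.FormatLabelSelector` for the plain matchLabels case.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// AffinityTerm is a simplified stand-in for a pod (anti-)affinity term:
// a topology key plus a matchLabels-style selector.
type AffinityTerm struct {
	TopologyKey   string
	LabelSelector map[string]string
}

// PodAffinityRules groups the required and preferred terms of one
// affinity or anti-affinity block.
type PodAffinityRules struct {
	Required  []AffinityTerm
	Preferred []AffinityTerm
}

// formatSelector renders a matchLabels map as "k1=v1,k2=v2" with sorted
// keys, roughly mirroring metav1.FormatLabelSelector for the simple case.
func formatSelector(sel map[string]string) string {
	keys := make([]string, 0, len(sel))
	for k := range sel {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+sel[k])
	}
	return strings.Join(parts, ",")
}

// Sample is one kube_deployment_spec_affinity series (value is always 1,
// in the info-metric style): just the four documented label values.
type Sample struct {
	Affinity, Type, TopologyKey, LabelSelector string
}

// affinitySamples emits one sample per rule, using the label values
// documented above ("podaffinity"/"podantiaffinity" and the two
// scheduling types).
func affinitySamples(affinity string, rules PodAffinityRules) []Sample {
	var out []Sample
	for _, t := range rules.Required {
		out = append(out, Sample{affinity, "requiredDuringSchedulingIgnoredDuringExecution",
			t.TopologyKey, formatSelector(t.LabelSelector)})
	}
	for _, t := range rules.Preferred {
		out = append(out, Sample{affinity, "preferredDuringSchedulingIgnoredDuringExecution",
			t.TopologyKey, formatSelector(t.LabelSelector)})
	}
	return out
}

func main() {
	anti := PodAffinityRules{
		Preferred: []AffinityTerm{{
			TopologyKey:   "kubernetes.io/hostname",
			LabelSelector: map[string]string{"app": "web"},
		}},
	}
	for _, s := range affinitySamples("podantiaffinity", anti) {
		fmt.Printf("kube_deployment_spec_affinity{affinity=%q,type=%q,topology_key=%q,label_selector=%q} 1\n",
			s.Affinity, s.Type, s.TopologyKey, s.LabelSelector)
	}
}
```

One series per rule keeps cardinality bounded by the number of affinity terms actually configured, while still letting `count by (...)` queries recover the old count-style numbers.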