feat(helm): add support for vpa #4210
base: main
Conversation
I'm curious: Did you see an actual need for VPA-ing the ASO pod? Do you just VPA everything in your clusters?
I'm most interested in whether this is useful to you because you want VPA to scale the pod down (the default requests are too high) or to scale it up (the default requests are too low, or the limits are too low).
...ice-operator/templates/autoscaling.k8s.io_v1_verticalpodautoscaler_azureserviceoperator.yaml
```yaml
controlledResources:
  {{- toYaml . | nindent 8 }}
{{- end }}
{{- if .Values.verticalPodAutoscaler.controlledValues }}
```
minor: Can .Values.verticalPodAutoscaler.controlledValues ever be unspecified in the values.yaml? It seems like probably not? Is the if-guard here really needed then?
Yeah, it could be. The default would be the RequestsAndLimits value.
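For illustration, a minimal values.yaml entry might look like the sketch below; the key layout is assumed from the template snippet above, and RequestsAndLimits is the default mentioned here:

```yaml
verticalPodAutoscaler:
  # Assumed default: VPA manages both requests and limits.
  # The other valid VPA option is RequestsOnly.
  controlledValues: RequestsAndLimits
```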
Yes. We have multiple clusters with different operator load: some teams use ASO heavily, while others create just a few resources to try it out. In the end we need a cost-effective solution for this "middleware" without the toil of picking correct resource values per cluster/team.
We're trying to use it for most of our components, wherever the charts support it.
It depends on the application. For ASO, VPA would be beneficial in both directions, as it's not that critical and can tolerate a restart.
Sorry this has been lingering - will review it shortly!
Synced with @theunrepentantgeek and @super-harsh on this topic, and we have concerns that running VPA on ASO as it stands today may cause more harm than good. The reasoning is:
With the above said, our current stance on this PR is to hold off on merging it, at least for now. We'll re-evaluate in a few weeks when I get #4215 done and possibly have more time to play around with VPA configurations. In the meantime, if you want to run VPA with ASO, do so with caution: you should be able to do so even without Helm chart integration, because all you need to do is create the VerticalPodAutoscaler resource yourself.
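For anyone who wants to try that route, a minimal sketch of such a resource is below. The target name azureserviceoperator-controller-manager matches the deployment template linked later in this thread; the namespace is an assumption based on the chart's defaults, so verify both against your install:

```yaml
# Minimal sketch of a standalone VPA for ASO, created outside the Helm chart.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: azureserviceoperator
  namespace: azureserviceoperator-system  # assumed default namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: azureserviceoperator-controller-manager
  updatePolicy:
    updateMode: "Auto"  # VPA evicts and recreates the pod with updated requests
```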
Signed-off-by: t3mi <t3mi@users.noreply.github.com>
@t3mi, have you done any benchmarking that would help model CPU/memory consumption for a given number of CRs controlled by ASO?
@antonioquintavalle1A Sorry, I missed the notification. We haven't done any benchmarking. We faced etcd availability issues, but that was before the introduction of the selective crdPattern feature. After we limited CRDs to a subset of the approved and used resources, we had no issues with ASO, except for some edge cases not related to performance.
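For context, that limiting is done through the chart's crdPattern value; a sketch (the groups listed are illustrative, not our actual list):

```yaml
# values.yaml sketch: install only the CRD groups actually in use,
# as semicolon-separated glob patterns.
crdPattern: "resources.azure.com/*;containerservice.azure.com/*"
```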
Hello, we are using ASO with VPA and we encountered problems with GOMEMLIMIT (https://github.com/Azure/azure-service-operator/blob/main/v2/charts/azure-service-operator/templates/apps_v1_deployment_azureserviceoperator-controller-manager.yaml#L73). The symptoms we saw when upgrading to an ASO version with this setting were higher memory usage (GOMEMLIMIT was set to 400MiB but the pod was consuming 1.2 GiB) and a huge increase in CPU (x600) from the runtime trying to garbage collect all the time. We also have optimization problems with ASO: we see huge memory consumption (~2GB) that is not due to the ASO resources themselves (around 10 resources) but rather to the watches done by the operator. I suspect the Secret caching is generating this crazy amount of memory and should be optimized/reviewed.
Do you have a lot of secrets in the cluster? Can you file an issue for this? (Or maybe two issues: one for high memory usage possibly due to secrets/configmap caching, and another for allowing GOMEMLIMIT to work with VPA?)
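One possible way to keep GOMEMLIMIT in sync with VPA-applied limits is to derive it from the container's memory limit via the Kubernetes Downward API. This is a sketch of a generic pattern, not a committed ASO change:

```yaml
# Container spec sketch: GOMEMLIMIT tracks the (possibly VPA-resized)
# memory limit instead of a hardcoded value. The env var is re-resolved
# each time VPA evicts and recreates the pod.
env:
  - name: GOMEMLIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.memory
        # divisor defaults to "1", so the value is rendered in bytes,
        # which GOMEMLIMIT accepts as a plain integer.
```

Note that this sets GOMEMLIMIT to exactly the limit; common practice is to leave some headroom below the limit, which the Downward API alone cannot express.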
What this PR does / why we need it:
This PR adds support for the Vertical Pod Autoscaler, so users do not have to worry about the resource usage of the controller and can let the Vertical Pod Autoscaler do its job.
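A sketch of how this might be configured through values.yaml: controlledResources and controlledValues appear in the template diff above, while enabled and updateMode are assumed names used here for illustration.

```yaml
verticalPodAutoscaler:
  enabled: true          # assumed toggle for rendering the VPA resource
  updateMode: Auto       # assumed; how VPA applies its recommendations
  controlledResources:   # from the template diff above
    - cpu
    - memory
  controlledValues: RequestsAndLimits  # from the template diff above
```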
How does this PR make you feel:

If applicable: