-
Notifications
You must be signed in to change notification settings - Fork 858
Changed the sidecar requests rate limiter from exponential to a constant one #4186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
NIce one.
Any reason not to use this for the sidecar as well - keep everything consistent? /gcbrun |
Build Failed 😭 Build Id: e6f929b1-b836-4e2f-b797-6410cfd494c8 Status: FAILURE To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Aah, need to regenerate the index.yaml install script |
I think it is more important to be consistent with the allocator. My view of the problem is that the allocator updates the crd with new allocations, the gs detects those allocations (eventually changes lists and counters as acknowledgement) and updates again the CRD when sessions are finished. If the sidecar starts retries fast, it looks like it has priority thus generating update failures on the allocator side. In one server one session case there should not be problems as there should not be reasons for retries. If you want, I can expose the fast/slow rate limiter settings but it seems overkill to have 3 settings exposed :P. |
That makes sense. SGTM. |
/gcbrun |
Build Failed 😭 Build Id: adb6713e-afb0-4d80-9288-233c448d2fff Status: FAILURE To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
/gcbrun |
8bcac9b
to
87f4878
Compare
I rebased onto latest, do I need to do something else to get the change merged? |
/gcbrun |
Assuming all passes (and someone should step through the code, I haven't done that yet, I usually wait for CI to pass), we'll also need an update to the Helm docs please. https://agones.dev/site/docs/installation/install-agones/helm/#configuration As a note - the new feature will need to be wrapped in a |
Build Failed 😭 Build Id: f2d581e5-14e5-45ff-9c66-9f93df959c8d Status: FAILURE To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Flake: Step 28: submit-e2e-test-cloud-build |
/gcbrun |
Build Failed 😭 Build Id: c9d178cb-6fa8-4c75-b1db-65fe47c0de91 Status: FAILURE To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Fun. |
flaaaakinesss.... |
There we go - CI passed 👍🏻 time for a code review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM! Just need those docs updates, and this should be good to merge!
/gcbrun |
Build Failed 😭 Build Id: 9e78dba1-5f6a-4494-8883-a66afe02e38b Status: FAILURE To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😭 Build Id: 018e2fd7-d946-489e-aa7e-6cb5a5303bd5 Status: FAILURE To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
/gcbrun |
Build Succeeded 🥳 Build Id: d24c5eb8-cb62-4e0b-854c-06801751d8bf The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Sorry for the late reply, I was in vacation... I did the change in the helm.md |
Looks like 1.50 is underway, so will come back around once the release is done 👍🏻 |
/gcbrun |
Build Succeeded 🥳 Build Id: e856e041-81ce-41b4-81d0-0903417f202f The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
If you fix the merge conflicts please, and we are good to merge!
# Conflicts: # pkg/gameservers/controller.go
Kind reminder to merge this until it gets conflicted again :P |
/gcbrun |
Build Succeeded 🥳 Build Id: 9436ba9d-cc17-411d-8433-076bc039e6f6 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
What type of PR is this?
/kind bugfix
What this PR does / Why we need it:
During high allocation rate tests, I noticed that the sessions list is updated very very slow (at 15 minutes interval sometimes). The lists were updated by the game server logic following the start or termination of allocated sessions. A high failure rate of PATCH requests made the retry interval to hit the 1000 seconds limit which is very high (there is a nice article here regarding how rate limiters work: https://danielmangum.com/posts/controller-runtime-client-go-rate-limiting/#the-default-controller-rate-limiter). The Agones controllers use a fast/slow rate limiter but for some reason that I am missing the sidecar is using the default one.
Using a constant rate limiter gave better results: the sessions are updated and there is no storm of PATCH failures. 500ms are chosen to match the allocator batch wait time, the logic being that the allocator pushes allocations, the game server acknowledges somehow. This value can be configured from the controller helm values as I think it is related to the system performance. For example, my tests were better with 1 second.
Note: the one per 500 ms request is actually for each type of update that the sidecar implements: state, labels, annotations, player capacity, connected players, counter and lists. So in theory there could be up to 7 requests per 500 ms coming from the sidecar.
Which issue(s) this PR fixes:
The issue is not open yet, it is described above.
Special notes for your reviewer: