Skip to content

Conversation

miai10
Copy link
Contributor

@miai10 miai10 commented May 23, 2025

What type of PR is this?
/kind bugfix

What this PR does / Why we need it:

During high allocation rate tests, I noticed that the sessions list is updated very very slow (at 15 minutes interval sometimes). The lists were updated by the game server logic following the start or termination of allocated sessions. A high failure rate of PATCH requests made the retry interval to hit the 1000 seconds limit which is very high (there is a nice article here regarding how rate limiters work: https://danielmangum.com/posts/controller-runtime-client-go-rate-limiting/#the-default-controller-rate-limiter). The Agones controllers use a fast/slow rate limiter but for some reason that I am missing the sidecar is using the default one.

Using a constant rate limiter gave better results: the sessions are updated and there is no storm of PATCH failures. 500ms are chosen to match the allocator batch wait time, the logic being that the allocator pushes allocations, the game server acknowledges somehow. This value can be configured from the controller helm values as I think it is related to the system performance. For example, my tests were better with 1 second.

Note: the one per 500 ms request is actually for each type of update that the sidecar implements: state, labels, annotations, player capacity, connected players, counter and lists. So in theory there could be up to 7 requests per 500 ms coming from the sidecar.

Which issue(s) this PR fixes:

The issue is not open yet, it is described above.

Special notes for your reviewer:

@github-actions github-actions bot added the kind/bug These are bugs. label May 23, 2025
@markmandel
Copy link
Collaborator

NIce one.

The Agones controllers use a fast/slow rate limiter but for some reason that I am missing the sidecar is using the default one.

Any reason not to use this for the sidecar as well - keep everything consistent?

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: e6f929b1-b836-4e2f-b797-6410cfd494c8

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Collaborator

make[1]: Leaving directory '/workspace/build'
sort /workspace/install/yaml/install.yaml > /tmp/agones-install/install.current.yaml.sorted
diff /tmp/agones-install/install.yaml.sorted /tmp/agones-install/install.current.yaml.sorted
17367a17368
>           value: "1s"
17520a17522
>         - name: SIDECAR_REQUESTS_RATE_LIMIT
make: *** [Makefile:420: test-install-yaml] Error 1

Aah, need to regenerate the index.yaml install script make gen-install from memory.

@miai10
Copy link
Contributor Author

miai10 commented May 26, 2025

NIce one.

The Agones controllers use a fast/slow rate limiter but for some reason that I am missing the sidecar is using the default one.

Any reason not to use this for the sidecar as well - keep everything consistent?

/gcbrun

I think it is more important to be consistent with the allocator. My view of the problem is that the allocator updates the crd with new allocations, the gs detects those allocations (eventually changes lists and counters as acknowledgement) and updates again the CRD when sessions are finished. If the sidecar starts retries fast, it looks like it has priority thus generating update failures on the allocator side.

In one server one session case there should not be problems as there should not be reasons for retries.

If you want, I can expose the fast/slow rate limiter settings but it seems overkill to have 3 settings exposed :P.

@markmandel
Copy link
Collaborator

That makes sense. SGTM.

@markmandel
Copy link
Collaborator

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: adb6713e-afb0-4d80-9288-233c448d2fff

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@0xaravindh
Copy link
Member

/gcbrun

@miai10 miai10 force-pushed the perf/sidecar-patches branch from 8bcac9b to 87f4878 Compare May 30, 2025 09:54
@miai10
Copy link
Contributor Author

miai10 commented May 30, 2025

I rebased onto latest, do I need to do something else to get the change merged?

@markmandel
Copy link
Collaborator

/gcbrun

@markmandel
Copy link
Collaborator

Assuming all passes (and someone should step through the code, I haven't done that yet, I usually wait for CI to pass), we'll also need an update to the Helm docs please.

https://agones.dev/site/docs/installation/install-agones/helm/#configuration

As a note - the new feature will need to be wrapped in a feature shortcode so it only publishes with the new version. You can find details here: https://agones.dev/site/docs/contribute/documentation-editing-contribution/#at-the-page-level

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: f2d581e5-14e5-45ff-9c66-9f93df959c8d

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Collaborator

Flake: Step 28: submit-e2e-test-cloud-build

@markmandel
Copy link
Collaborator

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: c9d178cb-6fa8-4c75-b1db-65fe47c0de91

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Collaborator

ERROR: failed to solve: failed to compute cache key: failed to copy: httpReadSeeker: failed open: unexpected status code https://mcr.microsoft.com/v2/windows/servercore/blobs/sha256:cb524f6f22159378ea820d234d80ca09b79c2f0cc91315eeef11904e3ff36a21: 503 Service Unavailable
make: *** [Makefile:625: build-agones-sdk-image-windows-ltsc2019] Error 1
make: *** Waiting for unfinished jobs....
#5 extracting sha256:8edf7e005a035a07159bee8c941c629808c8de251a7a9e5e424a2648cf918299 56.4s done

Fun.

@markmandel
Copy link
Collaborator

flaaaakinesss....

@markmandel
Copy link
Collaborator

There we go - CI passed 👍🏻 time for a code review.

Copy link
Collaborator

@markmandel markmandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM! Just need those docs updates, and this should be good to merge!

@0xaravindh
Copy link
Member

/gcbrun

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 9e78dba1-5f6a-4494-8883-a66afe02e38b

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😭

Build Id: 018e2fd7-d946-489e-aa7e-6cb5a5303bd5

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@0xaravindh
Copy link
Member

/gcbrun

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: d24c5eb8-cb62-4e0b-854c-06801751d8bf

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4186/head:pr_4186 && git checkout pr_4186
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.50.0-dev-990f698

@miai10
Copy link
Contributor Author

miai10 commented Jun 17, 2025

Assuming all passes (and someone should step through the code, I haven't done that yet, I usually wait for CI to pass), we'll also need an update to the Helm docs please.

https://agones.dev/site/docs/installation/install-agones/helm/#configuration

As a note - the new feature will need to be wrapped in a feature shortcode so it only publishes with the new version. You can find details here: https://agones.dev/site/docs/contribute/documentation-editing-contribution/#at-the-page-level

Sorry for the late reply, I was in vacation...

I did the change in the helm.md

@markmandel
Copy link
Collaborator

Looks like 1.50 is underway, so will come back around once the release is done 👍🏻

@igooch
Copy link
Collaborator

igooch commented Jun 27, 2025

/gcbrun

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: e856e041-81ce-41b4-81d0-0903417f202f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4186/head:pr_4186 && git checkout pr_4186
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.51.0-dev-bef214c

Copy link
Collaborator

@markmandel markmandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

If you fix the merge conflicts please, and we are good to merge!

@miai10
Copy link
Contributor Author

miai10 commented Jul 2, 2025

Kind reminder to merge this until it gets conflicted again :P

@0xaravindh
Copy link
Member

/gcbrun

@agones-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: 9436ba9d-cc17-411d-8433-076bc039e6f6

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4186/head:pr_4186 && git checkout pr_4186
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.51.0-dev-c747b0d

@markmandel markmandel merged commit 1865ad2 into googleforgames:main Jul 2, 2025
4 checks passed
@miai10 miai10 deleted the perf/sidecar-patches branch July 4, 2025 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants