Only check latencies once every 10 seconds with `routeByLatency` #2795

justinmir · 2023-11-10T18:54:05Z

routeByLatency currently checks latencies any time a server returns a MOVED or READONLY reply. When a shard is down, the ClusterClient issues the request to a random server, which returns a MOVED reply. This triggers a cluster state refresh and a latency update against every server. With a large number of clients, this can put significant ping load on the cluster.

This PR limits pinging to once every 10 seconds: during the GC function, a node's latency is updated only if it was last measured more than 10 seconds ago.

Fixes #2782

Figure: Ping behavior of a client running 21bd40a versus a client running this PR. When shards fail, the current cluster client spams pings, while the fixed cluster client pings each server only once every 10 seconds.

This shows the impact on a large running production cluster, which is handling ~4M pings per second due to this behavior.


ofekshenawa · 2024-02-18T12:56:18Z

LGTM!
WDYT about changing `Unix()` to `UnixNano()`? It would be more precise and avoid unnecessary loops.

justinmir · 2024-02-29T17:58:44Z

Sure I'll push that change shortly.

justinmir · 2024-04-11T22:18:22Z

@ofekshenawa PTAL when you get a chance!

justinmir · 2024-10-18T17:42:46Z

@vladvildanov hoping to get some eyes here, this will help us no longer have to maintain our own fork

ofekshenawa · 2024-11-20T12:36:44Z

Hey @justinmir, sorry for the delay!
Approved and merged!

* Only check latencies once every 10 seconds with `routeByLatency`

  `routeByLatency` currently checks latencies any time a server returns a MOVED or READONLY reply. When a shard is down, the ClusterClient issues the request to a random server, which returns a MOVED reply. This triggers a cluster state refresh and a latency update against every server. With a large number of clients, this can put significant ping load on the cluster. This change limits pinging to once every 10 seconds: during the `GC` function, a node's latency is updated only if it was last measured more than 10 seconds ago. Fixes #2782

* use UnixNano instead of Unix for better precision

---------

Co-authored-by: ofekshenawa <104765379+ofekshenawa@users.noreply.github.com>

justinmir marked this pull request as ready for review November 10, 2023 19:47

chayim requested a review from ofekshenawa February 18, 2024 07:06

justinmir and others added 2 commits February 29, 2024 12:26

* use UnixNano instead of Unix for better precision · 97cee82
* Merge branch 'master' into only-update-latency-in-gc-if-stale · 5dcad41
* Merge branch 'master' into only-update-latency-in-gc-if-stale · 2b4bdfc
* Merge branch 'master' into only-update-latency-in-gc-if-stale · 0cb00f7

ofekshenawa approved these changes Nov 20, 2024


ofekshenawa merged commit f1ffb55 into redis:master Nov 20, 2024
10 checks passed
