misc: Thread-safe cache rewrite #2683

Open · wants to merge 6 commits into main

Conversation

Contributor

@enwask enwask commented Jul 24, 2025

Rewrites memoized_meth and memoized_generator for concurrency, and drops the legacy memoized_func in favor of functools.cache. Also adds a global lock for the symbol cache to allow thread-safe Symbol construction and cache manipulation.

The memoized_meth decorator now stores one cache per thread, whereas memoized_generator stores a single cache for a given method (though there may still be misses if the cache is initialized concurrently). This means neither decorator provides a call-once guarantee.

Memoized generators block for the initial call of the generator function and construct a thread-safe version of itertools.tee, which allows concurrent iteration (but blocks when iterating elements that are not yet in the buffer). After the source generator is consumed, there is no blocking and threads can subsequently iterate the buffer in parallel (see SafeTee in tools.memoization).
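
For illustration, a minimal sketch of the shared-buffer tee idea (hypothetical names; the actual SafeTee in tools.memoization may differ in detail):

import threading
from itertools import count


class _SharedTee:
    # Sketch only: many consumers share one buffer over a single source
    # generator; buffered elements are read without locking

    def __init__(self, source):
        self._source = iter(source)
        self._buffer = []
        self._lock = threading.Lock()
        self._exhausted = False

    def tee(self):
        # Each consumer gets its own cursor over the shared buffer
        for i in count():
            if i < len(self._buffer):
                # Already buffered: no locking on this path
                yield self._buffer[i]
                continue
            with self._lock:
                # Re-check under the lock; another thread may have advanced
                while i >= len(self._buffer) and not self._exhausted:
                    try:
                        self._buffer.append(next(self._source))
                    except StopIteration:
                        self._exhausted = True
                if i >= len(self._buffer):
                    return
                item = self._buffer[i]
            yield item


# Usage: each caller of the memoized generator iterates its own tee()
shared = _SharedTee(x * x for x in range(5))
assert list(shared.tee()) == list(shared.tee()) == [0, 1, 4, 9, 16]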

@enwask enwask marked this pull request as ready for review July 24, 2025 15:05
@ggorman ggorman requested a review from Copilot July 25, 2025 11:30
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR rewrites the memoization system to be thread-safe, replacing the legacy memoized_func with Python's built-in functools.cache and redesigning memoized_meth and memoized_generator for concurrent access. Additionally, it introduces a global lock for the symbol cache to enable thread-safe symbol construction.

  • Replaces memoized_func with functools.cache throughout the codebase
  • Implements thread-local caching for memoized_meth with one cache per thread
  • Creates a thread-safe SafeTee implementation for memoized_generator that allows concurrent iteration
  • Adds global locking mechanism for symbol cache operations

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File: Description
tests/test_tools.py: Adds comprehensive tests for concurrent behavior of memoized methods and generators
devito/types/caching.py: Introduces global lock for thread-safe symbol cache operations
devito/types/basic.py: Updates symbol construction to use the global cache lock
devito/tools/memoization.py: Complete rewrite of memoization decorators with thread-safety and SafeTee implementation
devito/arch/compiler.py: Replaces memoized_func with functools.cache
devito/arch/archinfo.py: Replaces memoized_func with functools.cache

Comments suppressed due to low confidence (3)

tests/test_tools.py:273

  • The test uses time.sleep() but doesn't import the time module. This will cause a NameError when the test runs.
                time.sleep(0.2)

tests/test_tools.py:330

  • The test uses time.sleep() but doesn't import the time module. This will cause a NameError when the test runs.
                time.sleep(0.25)

tests/test_tools.py:333

  • The test uses time.sleep() but doesn't import the time module. This will cause a NameError when the test runs.
                time.sleep(0.25)

Comment on lines 72 to 75
try:
# Try to retrieve the cached value
res = cache[key]
except KeyError:
Copilot AI Jul 25, 2025

[nitpick] Using try-except for cache lookup may be slower than if key in cache for the common case where keys are present. Consider using cache.get(key) with a sentinel value to avoid the exception overhead.

Suggested change
- try:
-     # Try to retrieve the cached value
-     res = cache[key]
- except KeyError:
+ # Use a sentinel to avoid exception overhead
+ sentinel = object()
+ res = cache.get(key, sentinel)
+ if res is sentinel:


Contributor

After re-reading the AI, I might agree. Backtraces that encounter caches which use exception handling for flow control are painful to read

Comment on lines +201 to +204
try:
# Try to retrieve the cached value
res = cache[key]
except KeyError:
Copilot AI Jul 25, 2025

[nitpick] Using try-except for cache lookup may be slower than if key in cache for the common case where keys are present. Consider using cache.get(key) with a sentinel value to avoid the exception overhead.

Suggested change
- try:
-     # Try to retrieve the cached value
-     res = cache[key]
- except KeyError:
+ # Use a sentinel to check for cache misses
+ _MISSING = object()
+ res = cache.get(key, _MISSING)
+ if res is _MISSING:


Contributor

@FabioLuporini FabioLuporini left a comment

at a bare minimum we need a new TestMultithreading batch in test_caching
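
Something along these lines, perhaps (a hypothetical sketch; the identity assertion assumes identically-constructed Symbols resolve to the same cached instance):

from concurrent.futures import ThreadPoolExecutor

from devito.types import Symbol  # assumed import path


class TestMultithreading:

    def test_concurrent_symbol_construction(self):
        with ThreadPoolExecutor(max_workers=8) as executor:
            symbols = list(executor.map(lambda i: Symbol(name='s'), range(64)))

        # With the symbol cache in effect, identically-constructed Symbols
        # should all resolve to the same cached instance, even when built
        # concurrently
        assert all(s is symbols[0] for s in symbols)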

it if necessary.
"""
# Try-catch is theoretically faster on the happy path
_local: local
Contributor

this is improvable... :)

self.func = func
# If the cache doesn't exist, initialize it
except AttributeError:
with self._lock:
Contributor

if it's thread-local why would you need a lock?

Contributor Author

The attribute itself contains thread-local data, but it still needs to exist on the instance; I think it still needs to be initialized safely, or one thread might erase existing thread-local storage for another thread.

Although on second thought, maybe the cost of that occasional clash is worth getting rid of the lock

Contributor

I see now.

Potentially easy workaround (brutal pseudocode just to give u the gist):

class ThreadSafeCache(dict):

    # {thread_id -> {key -> value}}

    def __init__(.......):
        # <this has to lead to a dict of dicts somehow...
        #  perhaps subclass defaultdict... not sure>

    def __getitem__(self, k, v=None):
        return self[threadid][k] ...

This way you need no lock

I had to do something similar in the dark age here https://github.com/devitocodes/devito/blob/main/devito/tools/timing.py#L17 for slightly different reasons but still multi-threading related
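
For illustration, a minimal runnable sketch of that idea (hypothetical names; threading.get_ident keys the outer mapping):

import threading
from collections import defaultdict


class ThreadKeyedCache:
    # Hypothetical sketch: an outer {thread_id -> {key -> value}} mapping,
    # so each thread only ever reads and writes its own sub-dict

    def __init__(self):
        self._data = defaultdict(dict)

    def __getitem__(self, key):
        return self._data[threading.get_ident()][key]

    def __setitem__(self, key, value):
        # Relies on the outer defaultdict insertion being safe under
        # concurrent modification, which is exactly the concern raised below
        self._data[threading.get_ident()][key] = value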

Contributor Author

Well then you need a lock for all accesses, no? Or at least for placing dictionaries in the top-level dict, since AFAIK that is not safe for concurrent modification



class SafeTee(Iterator[YieldType]):
Contributor

we could call this just tee and return python's tee (overriding __new__) if mono thread

Contributor Author

Feels a little gross to be honest. What's the purpose of doing that if SafeTee isn't needed outside of memoized generators?

Contributor

I mean, ultimately all I want is that single-threaded programs do not suffer from the inherent overheads of thread-safe data structures.

So, re-reading my comment, I'm guilty of mixing up syntactic sugar and suggestions; the suggestion would be falling back to plain tee if mono-thread (does it matter? again, the mono-thread overhead of switching to a thread-safe data structure is still unclear to me -- and hopefully none!)

Contributor Author

Yeah, regarding falling back: that's something I was going to go through and update for all of this code, to the extent that it's possible. Avoid any locking overhead if the GIL is enabled (since presumably we won't be trying to multithread anything with the GIL, as everything is basically interpreter-bound)

Contributor Author

The overhead is arguably negligible but definitely nonzero. I can try to do some profiling to see if it's worth eliminating when the GIL is enabled
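
For reference, a conditional no-op lock might look roughly like this (hypothetical helper; sys._is_gil_enabled only exists on Python 3.13+ builds, hence the getattr fallback):

import sys
import threading
from contextlib import nullcontext


def make_cache_lock():
    # On GIL builds the lock is pure overhead, so hand back a no-op context
    # manager; on free-threaded builds return a real re-entrant lock
    gil_enabled = getattr(sys, '_is_gil_enabled', lambda: True)()
    return nullcontext() if gil_enabled else threading.RLock()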

name = kwargs.pop('name', None) or args.pop(0)
newobj = cls.__xnew__(cls, name, **assumptions)
# Lock against the symbol cache and double-check the cache
with CacheManager.lock():
Contributor

I'm not comfortable with introducing thread-safety logic into the types hierarchy, because __new__ gets overridden often, so this might create unexpected issues and subsequently a burden on the developer. Why don't we move all this complexity into types/caching?

Secondly, if mono-thread, it should somehow fall back to the previous "unlocked cache" data structure, unless you tell me the overhead of acquiring and releasing a lock in a mono-thread run is practically zero

Contributor

Seconded on both of these points. Wrt the first, it is also worth considering that the integration of cache locks into __new__ effectively acts as a barrier to modifying and working on subclasses of Symbol (possibly preventing contributions), increases the chance for bugs to creep in through mistakes, and will generally uglify and complicate the codebase, since this change will need propagating down into subclasses
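
To sketch what moving the complexity into types/caching might look like (hypothetical helper with a plain-dict stand-in; the real _SymbolCache holds weakrefs and is more involved):

import threading

_cache_lock = threading.RLock()
_symbol_cache = {}  # stand-in for the real symbol cache


def get_or_create(key, factory):
    # Callers (Symbol.__new__ and its subclasses) never touch the lock;
    # they just hand over a factory for the object to build on a miss
    obj = _symbol_cache.get(key)
    if obj is not None:
        return obj
    with _cache_lock:
        # Double-check under the lock before constructing
        obj = _symbol_cache.get(key)
        if obj is None:
            obj = _symbol_cache[key] = factory()
        return obj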

@@ -76,8 +78,10 @@ def _cache_get(cls, key):
obj = obj_cached()
if obj is None:
# Cleanup _SymbolCache (though practically unnecessary)
# does not fail if it's already gone
_SymbolCache.pop(key, None)
with _cache_lock:
Contributor

this code you're changing is monumentally critical, and at this stage I cannot claim these changes are safe yet

Contributor Author

Yeah, was going to try to figure out some reasonable tests for this

if obj() is None:
# (key could be removed in another thread since get() above)
_SymbolCache.pop(key, None)
for key, obj_cached in cache_copied.items():
Contributor

this code you're changing is monumentally critical, and at this stage I cannot claim these changes are safe yet

return self + arg
Obj.add_to(1) # not enough arguments
Obj.add_to(1, 2) # returns 3, result is not cached
def __init__(self, meth: Callable[Concatenate[InstanceType, ParamsType],
Contributor

Tidy the typing up by putting Callable[Concatenate[InstanceType, ParamsType], ReturnType] on its own line as methodtype or similar

_local: local
try:
# Attempt to access the thread-local data
_local = obj._memoized_meth__local
Contributor

Maybe call it on_local? The double underscore in obj._memoized_meth__local is upsetting

Contributor Author

The double underscore is emulating name mangling (like, when a class Something declares an attribute __starting_with_double_underscores it's mangled to _Something__starting_with_double_underscores to avoid clashing). Can rename it, but I did want it to be clear that this attribute is assigned by memoized_meth if someone happens to inspect a live object later and wonder what the heck _on_local comes from
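
For reference, the standard mangling being emulated:

class Something:
    def __init__(self):
        self.__cache = {}  # mangled by the compiler to _Something__cache


obj = Something()
assert hasattr(obj, '_Something__cache')
# An attribute set from outside the class body (as a decorator does) gets no
# automatic mangling, hence spelling obj._memoized_meth__local out by hand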

return result
with self._lock:
# Check again in case another thread initialized outside the lock
if not hasattr(obj, '_memoized_generator__cache'):
Contributor

Nitpick: I think the double underscore can be reduced to a single one in this method name

Contributor Author

See reasoning above; this emulates name mangling but I guess just removing one of the underscores is not unreasonable

source_iter = self._meth(obj, *args, **kwargs)
res = cache[key] = SafeTee(source_iter)

return res.tee()
Contributor

Isn't this essentially copying the SafeTee you have just created? Why do you need to do that?

Contributor Author

Fair point, yeah we can just return the root SafeTee if we created it here


return newobj
# Store new instance in symbol cache
Cached.__init__(newobj, key)
Contributor

This is a fairly fundamental class in Devito and will need substantial testing to be sure that these changes are not breaking any edge cases

@@ -209,3 +211,156 @@ def __init__(self, value: int):
# Cache should be cleared after Operator construction
cache_size = Object._instance_cache.cache_info()[-1]
assert cache_size == 0


class TestMemoizedMethods:
Contributor

Do these tests get specifically run in the GIL-less CI?

Contributor

You should also probably have some tests to make sure this doesn't do anything weird when using _rebuild


codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 89.13043% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.52%. Comparing base (cd9058b) to head (8bb95f2).
⚠️ Report is 101 commits behind head on main.

Files with missing lines Patch % Lines
tests/test_tools.py 83.33% 17 Missing ⚠️
devito/tools/memoization.py 95.18% 1 Missing and 3 partials ⚠️
devito/types/basic.py 84.61% 1 Missing and 1 partial ⚠️
devito/types/caching.py 84.61% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2683      +/-   ##
==========================================
+ Coverage   78.59%   87.52%   +8.92%     
==========================================
  Files         245      245              
  Lines       49089    49230     +141     
  Branches     4322     4322              
==========================================
+ Hits        38582    43087    +4505     
+ Misses       9714     5408    -4306     
+ Partials      793      735      -58     


Contributor

@JDBetteridge JDBetteridge left a comment

I would like to see some more tests and maybe a bit more exposition. Maybe not a notebook, but a wiki page. Some reference for how all this is designed and supposed to work.

When I had to previously fix major bugs in a caching framework, most of what I was doing was fixing how people were incorrectly using the existing cache. There needs to be a readily available reference with examples, pitfalls and anti-patterns.

Most contributors (I expect) will have never had to deal with concurrent code execution or the principles that the whole paradigm is based on.

Comment on lines +22 to +23
def __init__(self, meth: Callable[Concatenate[InstanceType, ParamsType],
ReturnType]) -> None:
Contributor

Suggested change
- def __init__(self, meth: Callable[Concatenate[InstanceType, ParamsType],
-              ReturnType]) -> None:
+ def __init__(
+     self,
+     meth: Callable[Concatenate[InstanceType, ParamsType], ReturnType]
+ ) -> None:

Contributor

This would be my preferred style for all function signatures that are too long due to type hinting

Contributor Author

Black style! Yeah I have been sorta torn between these two styles but if there's a preference I'll go with this

Contributor

@EdCaunt EdCaunt Aug 7, 2025

I would personally just stick Callable[Concatenate[InstanceType, ParamsType], ReturnType] on its own line, like MethodType = Callable[Concatenate[InstanceType, ParamsType], ReturnType]
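
i.e. roughly (a sketch of the alias style; the decorator body is unchanged):

from typing import Callable, Concatenate, ParamSpec, TypeVar

InstanceType = TypeVar('InstanceType')
ParamsType = ParamSpec('ParamsType')
ReturnType = TypeVar('ReturnType')

# One alias keeps every annotated signature on a single readable line
MethodType = Callable[Concatenate[InstanceType, ParamsType], ReturnType]


class memoized_meth:

    def __init__(self, meth: MethodType) -> None:
        # ... rest of the decorator as in the PR
        self._meth = meth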

Retrieves the thread-local cache for the given object instance, initializing
it if necessary.
"""
# Try-catch is theoretically faster on the happy path
Contributor

Faster than what?

Contributor Author

Faster than two dictionary lookups

Contributor

add it maybe
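
For the record, the two lookup styles being weighed (illustrative only):

_MISSING = object()


def lookup_eafp(cache, key, compute):
    try:
        # Single dict lookup on a hit (the common case), but an exception,
        # and a noisier traceback, on every miss
        return cache[key]
    except KeyError:
        res = cache[key] = compute()
        return res


def lookup_sentinel(cache, key, compute):
    # Single lookup either way, no exception machinery involved
    res = cache.get(key, _MISSING)
    if res is _MISSING:
        res = cache[key] = compute()
    return res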

@memoized_meth
def compute(self, x):
self.misses += 1
return x * 2
Contributor

Can these deeply nested objects be fixtures?

Contributor Author

Yeah I was thinking of factoring them out to fixtures, will do that when I revisit more cache testing
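
e.g. something along these lines (hypothetical fixture, assuming memoized_meth is importable from devito.tools):

import pytest

from devito.tools import memoized_meth  # assumed import path


@pytest.fixture
def counting_obj():
    class Counting:
        def __init__(self):
            self.misses = 0

        @memoized_meth
        def compute(self, x):
            self.misses += 1
            return x * 2

    return Counting()


def test_memoized_meth_caches(counting_obj):
    # Second call with the same argument should hit the cache
    assert counting_obj.compute(3) == 6
    assert counting_obj.compute(3) == 6
    assert counting_obj.misses == 1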

A thread-safe version of `itertools.tee` that allows multiple iterators to safely
share the same buffer.

In theory, this comes at a cost to performance of iterating elements that haven't
Contributor

I started typing a long reply to this, but then I deleted it... I'll raise a question instead:

does this make data dependence analysis (Scope) run faster when executing with multiple threads, and what's the cutoff point? I imagine mono-thread will suffer from overhead

depending on the answer to this question, my next question would be: is this really the way we want to exploit multi-threading to parallelise data dependence analysis?

Generators are inherently sequential, and it feels like here we're forcing unnatural behavior... won't all threads clash by attempting to yield contiguous elements back to back? do you see what I mean here?

Contributor

I mean, part of the issue here is that we have to (want to) maintain the structure of the existing implementation based on generators -- an implementation that over the years has been extremely optimized for single-threaded applications. If multi-threading is all we had to worry about, we might as well forget about generators and use the thread pool to collectively construct the entire set of dependencies once and for all...

Contributor

another thought spurred by ☝️ :

we could have the thread pool construct a subset of Dependencies (e.g. each thread constructs K Dependencies at a time, possibly K=1) and then yield them one at a time

another orthogonal idea: we use N-1 threads to eagerly populate the Scope at instantiation time, so that when at the caller site we access any of the Scope attributes we find Dependencies already pre-built (rather than constructing them on the fly).

Contributor

in this latter idea, we would be creating a producer-consumer pattern. Whoever instantiates the Scope is the consumer, and instantiation fires up the producer thread(s) that are now in charge of populating the Scope, while the consumer runs its logic one element at a time
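
A bare-bones illustration of that producer-consumer shape (hypothetical names; not tied to the actual Scope API):

import queue
import threading

_DONE = object()


def populate_eagerly(build_dependency, items, nworkers=3):
    # Worker threads build "dependencies" in the background while the
    # consumer drains them one at a time, as it would when iterating a
    # Scope attribute
    q = queue.Queue()
    pending = list(items)
    pending_lock = threading.Lock()

    def producer():
        while True:
            with pending_lock:
                if not pending:
                    break
                item = pending.pop()
            q.put(build_dependency(item))
        q.put(_DONE)

    workers = [threading.Thread(target=producer) for _ in range(nworkers)]
    for w in workers:
        w.start()

    finished = 0
    while finished < nworkers:
        dep = q.get()
        if dep is _DONE:
            finished += 1
            continue
        yield dep

    for w in workers:
        w.join()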

Contributor Author

@enwask enwask Aug 6, 2025

Yeah, this is something I was thinking about a lot when I was rewriting the generator cache. I wanted to see if eagerly evaluating them once might make more sense for multi-threaded use.

My concern was that lots of the higher-level opportunities for parallelism are (or contain) methods that heavily utilize Scopes, so there might be potential for a deadlock if a bunch of threads are waiting for a Scope to be populated but there aren't any workers available to act as producers. So this might need some more complex machinery like a work queue I think? And I know how you feel about complexity
