-
-
Notifications
You must be signed in to change notification settings - Fork 54
Add prefer_async param to TAPService constructor to control what mode uses #686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #686 +/- ##
=======================================
Coverage 84.00% 84.01%
=======================================
Files 80 80
Lines 8522 8526 +4
=======================================
+ Hits 7159 7163 +4
Misses 1363 1363 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consider the functionality reasonable and can't find flaw with the execution. I particularly like the explanation of sync vs. async in the docs. So... I'm all for merging, but since it's touching the API and people might want to point out that we should drop the force_sync argument to query and just tell people to use run_sync, I'd like to leave the actual merge to someone else.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me, and the docs is excellent, but I feel this is the type of thing that may lead to overcomplicated API.
I mean encouraging the explicit use of run_sync
or run_async
could be enough?
That being said, this is not a strong preference and is not a blocker for merging, but I pass on the ball to one more maintainer to chime in.
On Mon, Jun 30, 2025 at 03:47:04PM -0700, Brigitta Sipőcz wrote:
That being said, this is not a strong preference and is not a
blocker for merging, but I pass on the ball to one more maintainer
to chime in.
I remain mildly in favour, so... @andamian or @tomdonaldson, if
you're mildly in favour, too, would you merge?
|
I've followed the interesting discussion in the original issue and I, myself, had to grapple with the sync vs async issue in the past so I appreciate any effort that is being made to make it clearer to the users. However, I do think that this solution is not heading in the right direction for the following reasons:
|
Thanks for the feedback @andamian! I do see the point that many users of "search" shouldn't need to think about sync/async implementation details and generally agree that query characteristics matter for sync/async choice: However I'd argue that sync/async is not just a matter of the query. In our current implementation we hide async execution behind the sync endpoint.
As a results we decided to implement a low sync timeout (1 minute) with much longer async limits despite wanting to hide this complexity from users. In terms of the proposed How does a user know what values to set? (for a query they haven't run before):
Users probably will not know how long their query will actually take on this service, what the service's sync timeout limits are and whether their query will hit large tables or use slow indexes. If the service has different async capabilities, how do these attributes determine execution method? Does long_query=True automatically mean async? Also different services may have very different characteristics. While our service has 1-minute sync limit, (very-long) async, other services: may have 10-second sync limits, or 5-minute limits and others may have limited temp storage. So I don't see how a fixed timeout threshold or long_query attribute would help here, unless I've misunderstood the proposal. I generally lean towards an optional parameter that they don't have to set (explicitly choosing async or sync) which gives explicit control but doesn't confuse users who don't care about the sync/async implementations, but maybe I'm missing the bigger picture of how your proposal would work in practice. |
I agree that the I incline to think that Another alternative is to hide all these under the hood without any changes to the API. Just assume that the user doesn't know (or doesn't want to know) the implementation details. Try the |
The way I'm looking at this I see the explicit async/sync choice better because it directly maps to reality and is more honest to what each option will do. When a user explicitly sets the run method, i.e: If a user has a query that would normally timeout for a given sync but would run within the limits of the async endpoint, I don't know how they would choose the right values for long_query/timeout without trial & error. Then for users who don't care or want to know about TAP internals, they can just run search without the optional parameter. I also feel that since pyvo already exposes an API for running sync & async ( I'm not against the idea of using sync and falling back to async, and had contemplating proposing this approach as well.
|
If we're going this route, then I would advocate for adding a warning after the timeout, something along the line to "rerunning query with async, you may consider changing your code to use |
On Tue, Jul 08, 2025 at 02:48:29PM -0700, Brigitta Sipőcz wrote:
If we're going this route, then I would advocate for adding a
warning after the timeout, something along the line to "rerunning
query with async, you may consider changing your code to use
`run_async` instead".
The trouble with this is that a TAP client cannot yet reliably tell
that a query timed out (rather than fail in some different way).
You point makes me think we should change this. As it happens, DALI
is right before RFC now, and I think we should advocate some language
on making timeouts machine-readable in there.
I've considered using QUERY_STATUS for that, in analogy to OVERFLOW.
But extending the QUERY_STATUS "vocabulary" seems tricky, and as it
happens, HTTP itself has 408 Request Timeout. I believe just
amending DALI's 5.2 with something saying "services are urged to use
a 408 HTTP status code if a request took more time to process than
the server was willing to allot it; clients could use this
information to recommend authentication, switching to async mode, or
whatever" would prepare us for what Brigitta is proposing here (which
I think does make sense).
Are there any volunteers for writing a bug against DALI in this
matter (or perhaps even a PR)?
|
I've started with an issue here and happy to work on the text if there is consensus: |
Shall I close this PR in favour of addressing this at the level of better timeout messages once the plumbing is in place in DALI, or do is this still an approach we are considering? |
On Tue, Jul 15, 2025 at 03:44:27PM -0700, stvoutsin wrote:
stvoutsin left a comment (astropy/pyvo#686)
Shall I close this PR in favour of addressing this at the level of
better timeout messages once the plumbing is in place in DALI, or
do is this still an approach we are considering?
I'm happy to have it closed given that there was some amount of
pushback and there may really be not much advantage in telling people
to run search(,prefer_async) against pointing them to run_async() in
the first place.
|
Description
Adds an optional
prefer_async
parameter toTAPService
that allows users to specify that they want search to default to async execution.This change should be backwards-compatbile.
Problem: Currently,
TAPService.search()
always uses sync regardless. Some TAP service providers prefer asynchronous execution for better stability and resource management.Solution: Implements approach 2 from issue #685 - an opt-in
prefer_async
parameter that:False
(no behavior change for existing code)True
, makessearch()
userun_async()
automaticallyforce_sync=True
parameter to override async preference when neededUsage:
Docs
Slightly modified/extended the dal/index.rst documentation with additional content describing the new parameter and how it can be used.
I've also included a small change/fix to UWS to make the pattern check more generic for the job list output.
Tests
Added a few unit tests to validate behaviour
Closes
#685
Is this a reasonable enhancement? Any thing I should change in the implementation or documentation?