
Disable allow_stream_result to force materialized DuckDB execution results #877

Open · wants to merge 2 commits into main

YuweiXiao (Contributor)

Discussion: #866

// This is required for cases like CTAS from a Postgres table, where allowing streaming results
// could lead to race conditions on Postgres resources.
// See the discussion: https://github.com/duckdb/pg_duckdb/discussions/866
auto pending = prepared.PendingQuery(named_values, false);
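
A toy model of why the second argument to PendingQuery matters. This is a hypothetical sketch, not DuckDB or pg_duckdb code; RunQuery and Event are made-up names. With streaming, each row is handed to the consumer while the source is still being scanned, so reads and writes interleave. With a materialized result, the whole scan finishes before the first row is consumed. For a CTAS from a Postgres table, the interleaved case is the one where the source scan and the insert can end up touching the same Postgres resources at the same time.

```cpp
#include <vector>

// Hypothetical toy model (not pg_duckdb code): records the order in which
// source rows are read and destination rows are written.
struct Event {
    char kind;  // 'R' = row read from the source, 'W' = row written to the target
    int row;
};

// Runs a "query" over num_rows rows and hands each row to a consumer.
// allow_stream_result == true:  each row is consumed right after it is read,
//                               so the source scan is still in progress.
// allow_stream_result == false: all rows are buffered first, then consumed.
std::vector<Event> RunQuery(int num_rows, bool allow_stream_result) {
    std::vector<Event> events;
    auto consume = [&events](int row) { events.push_back({'W', row}); };

    if (allow_stream_result) {
        for (int row = 0; row < num_rows; ++row) {
            events.push_back({'R', row});
            consume(row);  // interleaved with the scan
        }
    } else {
        std::vector<int> materialized;
        for (int row = 0; row < num_rows; ++row) {
            events.push_back({'R', row});
            materialized.push_back(row);
        }
        // The scan has fully finished before the first row is consumed.
        for (int row : materialized) {
            consume(row);
        }
    }
    return events;
}
```

RunQuery(3, true) records R0 W0 R1 W1 R2 W2, while RunQuery(3, false) records R0 R1 R2 W0 W1 W2. The price of the second form is that every row is buffered at once, which is the memory trade-off raised in review.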

Collaborator

I think it would be good to do this only if Postgres tables are actually involved in the query that we send to DuckDB. Otherwise, data-loading steps (e.g. loading a Parquet file of a few GB) will use much more memory for no real reason, because the result can no longer be streamed. The only reliable way to know whether a Postgres table is involved in the query (given that duckdb.query exists, a Postgres table can be referenced from inside a raw DuckDB query string) is to do this detection while preparing the statement, i.e. if the plan involves a Postgres scan, then we should not allow streaming results.
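
The plan-based approach suggested here can be sketched as a recursive walk over the prepared plan that turns streaming off only when a Postgres scan appears anywhere in it. PlanNode and OpType are hypothetical stand-ins. Real DuckDB plans use their own operator classes, and the way pg_duckdb actually identifies its Postgres scan is not modeled here.

```cpp
#include <vector>

// Hypothetical, simplified plan node; not DuckDB's real operator classes.
enum class OpType { ParquetScan, PostgresScan, Filter, Projection, Insert };

struct PlanNode {
    OpType type;
    std::vector<PlanNode> children;
};

// Returns true if any node in the subtree scans a Postgres table.
bool PlanScansPostgres(const PlanNode &node) {
    if (node.type == OpType::PostgresScan) {
        return true;
    }
    for (const PlanNode &child : node.children) {
        if (PlanScansPostgres(child)) {
            return true;
        }
    }
    return false;
}

// Value to pass as allow_stream_result: streaming stays on for pure DuckDB
// work (e.g. loading Parquet) and is turned off only when Postgres is scanned.
bool AllowStreamResult(const PlanNode &root) {
    return !PlanScansPostgres(root);
}
```

With this rule, a plan that only reads Parquet keeps allow_stream_result = true, while a CTAS whose plan contains a Postgres scan anywhere (for example under a filter) gets false.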

Contributor Author

I added a check on the range table (rtable) of the Query: as long as a Postgres table is referenced, streaming is disabled.
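
A simplified version of the range-table check described above, assuming a hypothetical is_duckdb_table flag in place of whatever catalog lookup the PR performs. The names mirror PostgreSQL (Query, RangeTblEntry, rtable, rtekind, RTE_RELATION), but the structs are stand-ins and std::vector replaces Postgres's List. Like the check described, it only looks at the top-level query.

```cpp
#include <vector>

// Simplified stand-ins for PostgreSQL's Query / RangeTblEntry; not the real structs.
enum RTEKind { RTE_RELATION, RTE_SUBQUERY, RTE_FUNCTION };

struct RangeTblEntry {
    RTEKind rtekind;
    bool is_duckdb_table;  // hypothetical: stands in for a catalog lookup on the relation
};

struct Query {
    std::vector<RangeTblEntry> rtable;
};

// Shallow check: only the outermost query's range table is inspected.
bool ContainsPostgresTable(const Query &query) {
    for (const RangeTblEntry &rte : query.rtable) {
        if (rte.rtekind == RTE_RELATION && !rte.is_duckdb_table) {
            return true;
        }
    }
    return false;
}

// Value that would be passed as allow_stream_result.
bool AllowStreamResult(const Query &query) {
    return !ContainsPostgresTable(query);
}
```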

Comment on lines 425 to 426
PostgresScopedStackReset scoped_stack_reset;
std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock());

Collaborator

Why are these changes suddenly needed (as well as the one below in PostgresScanFunction)? These additions could very well be the cause of the test failures in CI.

Contributor Author

I've removed the code. I hit a stack limit error while testing, but I can no longer reproduce it.

@@ -149,6 +149,20 @@ namespace pgduckdb {

int64_t executor_nest_level = 0;

bool
ContainsPostgresTable(const Query *query) {
List *rtable = query->rtable;

JelteF (Collaborator), Aug 11, 2025

This should do a full traversal of the query tree to find nested queries (e.g. subqueries and CTEs), not just look at the outermost query, similar to ContainsDuckdbItems.
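
A sketch of the full traversal requested here, again with simplified stand-ins for Postgres parse nodes: subquery mirrors RangeTblEntry.subquery, cte_queries stands in for the ctequery of each entry in Query.cteList, and is_duckdb_table is hypothetical. It follows only range-table subqueries and CTEs. In real PostgreSQL code, query_tree_walker and expression_tree_walker are the usual tools for this kind of walk and can also reach sublinks such as WHERE x IN (SELECT ...), which this sketch does not model.

```cpp
#include <vector>

// Simplified stand-ins for PostgreSQL parse-tree nodes; not the real structs.
enum RTEKind { RTE_RELATION, RTE_SUBQUERY, RTE_FUNCTION };

struct Query;

struct RangeTblEntry {
    RTEKind rtekind;
    bool is_duckdb_table;   // hypothetical stand-in for a catalog lookup
    const Query *subquery;  // set when rtekind == RTE_SUBQUERY
};

struct Query {
    std::vector<RangeTblEntry> rtable;
    std::vector<const Query *> cte_queries;  // stand-in for cteList[i]->ctequery
};

// Recursively visits range-table subqueries and CTE bodies, so a Postgres
// table referenced at any nesting level is found.
bool ContainsPostgresTable(const Query &query) {
    for (const RangeTblEntry &rte : query.rtable) {
        if (rte.rtekind == RTE_RELATION && !rte.is_duckdb_table) {
            return true;
        }
        if (rte.rtekind == RTE_SUBQUERY && rte.subquery != nullptr &&
            ContainsPostgresTable(*rte.subquery)) {
            return true;
        }
    }
    for (const Query *cte : query.cte_queries) {
        if (cte != nullptr && ContainsPostgresTable(*cte)) {
            return true;
        }
    }
    return false;
}
```

A shallow check over the outermost rtable would miss a query like SELECT * FROM (SELECT * FROM pg_table) sub, because the top level only contains an RTE_SUBQUERY entry; the recursive version finds the relation inside it.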

2 participants