-
Notifications
You must be signed in to change notification settings - Fork 125
Disable allow_stream_result
to force a materialized DuckDB execution results
#877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
src/pgduckdb_node.cpp
Outdated
// This is required for cases like CTAS from a Postgres table, where allowing streaming results | ||
// could lead to race conditions on Postgres resources. | ||
// Checkout discussion: https://github.com/duckdb/pg_duckdb/discussions/866 | ||
auto pending = prepared.PendingQuery(named_values, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to only do this if postgres tables are actually involved in the query that we send to duckdb. Otherwise dataloading steps (e.g. loading a few GB parquet file) will use much more memory for no real reason, because now the result cannot be streamed anymore. The only way to really know whether a postgres query is involved in the query (given that duckdb.query
exists) is to do this detection while preparing the statement, i.e. if it the plan involves a postgres scan, then we should not allow streaming results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a check on the rtables of the Query instance. And as long as a Postgres table is referenced, the stream is disabled.
src/scan/postgres_scan.cpp
Outdated
PostgresScopedStackReset scoped_stack_reset; | ||
std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these changes suddenly needed? (as well as the one below in PostgresScanFunction
) It seems like these additions could very well be the cause for the test failures in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code's been removed. I hit a stack limit error when I was testing, but now cannot reproduce it again
@@ -149,6 +149,20 @@ namespace pgduckdb { | |||
|
|||
int64_t executor_nest_level = 0; | |||
|
|||
bool | |||
ContainsPostgresTable(const Query *query) { | |||
List *rtable = query->rtable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should do a full traversal of the tree to look for more queries, not just look in the outermost query. Similarly to ContainsDuckdbItems
.
Discussion: #866