Releases: pathwaycom/pathway
v0.26.0
Added
- `path_filter` parameter in the `pw.io.s3.read` and `pw.io.minio.read` functions. It enables post-filtering of object paths using a wildcard pattern (`*`, `?`), allowing exclusion of paths that pass the main `path` filter but do not match `path_filter`.
- Input connectors now support backpressure control via `max_backlog_size`, which limits the number of read events in processing per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
- `pw.reducers.count_distinct` and `pw.reducers.count_distinct_approximate` to count the number of distinct elements in a table. `pw.reducers.count_distinct_approximate` lets you save memory by decreasing the accuracy; this tradeoff can be controlled with the `precision` parameter.
- `pw.Table.join` (and its variants) now has two additional parameters: `left_exactly_once` and `right_exactly_once`. If the elements from one side of a join should be joined exactly once, that side's `*_exactly_once` parameter can be set to `True`. After a match, the entry is removed from the join state, reducing memory consumption.
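The `*`/`?` wildcard semantics of `path_filter` resemble shell-style globbing, which can be sketched with Python's standard `fnmatch` module. The helper below is a hypothetical illustration of the post-filtering idea, not part of the Pathway API:

```python
from fnmatch import fnmatch

def post_filter(paths, path_filter):
    # Keep only object paths matching the wildcard pattern:
    # `*` matches any run of characters, `?` a single character.
    return [p for p in paths if fnmatch(p, path_filter)]

paths = ["logs/2024/a.json", "logs/2024/a.json.tmp", "logs/2023/b.json"]
print(post_filter(paths, "logs/2024/*.json"))  # → ['logs/2024/a.json']
```

Here the `.tmp` file passes a prefix-based `path` filter such as `logs/2024/` but is excluded by `path_filter`.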
Changed
- Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
- Improved initialization speed of `pw.io.s3.read` and `pw.io.minio.read`.
- `pw.io.s3.read` and `pw.io.minio.read` now limit the number and the total size of objects to be predownloaded.
- BREAKING: Optimized the implementation of the `pw.reducers.min`, `pw.reducers.max`, `pw.reducers.argmin`, `pw.reducers.argmax`, and `pw.reducers.any` reducers for append-only tables. This is a breaking change for programs using operator persistence; the persisted state will have to be recomputed.
- BREAKING: Optimized the implementation of the `pw.reducers.sum` reducer on `float` and `np.ndarray` columns. This is a breaking change for programs using operator persistence; the persisted state will have to be recomputed.
- BREAKING: The implementation of data persistence has been optimized for the case of many small objects in the filesystem and S3 connectors. This is a breaking change for programs using data persistence; the persisted state will have to be recomputed.
- BREAKING: The data snapshot logic in persistence has been optimized for the case of big input snapshots. This is a breaking change for programs using data persistence; the persisted state will have to be recomputed.
- Improved precision of `pw.reducers.sum` on `float` columns by introducing Neumaier summation.
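Neumaier summation (a refinement of Kahan summation) carries a running compensation term so that low-order bits lost to floating-point rounding are recovered at the end. A minimal pure-Python sketch of the algorithm (an illustration of the technique, not Pathway's actual implementation):

```python
def neumaier_sum(values):
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            comp += (total - t) + v  # low-order bits of v were lost
        else:
            comp += (v - t) + total  # low-order bits of total were lost
        total = t
    return total + comp

# Naive left-to-right summation loses the 1.0 entirely here,
# because 1e16 + 1.0 rounds back to 1e16.
print(neumaier_sum([1e16, 1.0, -1e16]))  # → 1.0
print(sum([1e16, 1.0, -1e16]))           # → 0.0
```

Unlike plain Kahan summation, the Neumaier variant also handles the case where the incoming addend is larger in magnitude than the running total.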
v0.25.1
Added
- `pw.xpacks.llm.mcp_server.PathwayMcp` that allows serving `pw.xpacks.llm.document_store.DocumentStore` and `pw.xpacks.llm.question_answering` endpoints as MCP (Model Context Protocol) tools.
- `pw.io.dynamodb.write` method for writing to DynamoDB.
v0.25.0
Added
- `pw.io.questdb.write` method for writing to QuestDB.
- `pw.io.fs.read` now supports the `"only_metadata"` format. When this format is used, the table contains only metadata updates for the tracked directory, without reading file contents.
Changed
- BREAKING The Elasticsearch and BigQuery connectors have been moved to the Scale license tier. You can obtain the Scale tier license for free at https://pathway.com/get-license.
- BREAKING: `pw.io.fs.read` no longer accepts `format="raw"`. Use `format="binary"` to read binary objects, `format="plaintext_by_file"` to read plaintext objects per file, or `format="plaintext"` to read plaintext objects split into lines.
- BREAKING: The `pw.io.s3_csv.read` connector has been removed. Please use `pw.io.s3.read` with `format="csv"` instead.
Fixed
- `pw.io.s3.read` and `pw.io.s3.write` now also check the `AWS_PROFILE` environment variable for AWS credentials if none are explicitly provided.
v0.24.1
Added
- Confluent Schema Registry support in Kafka and Redpanda input and output connectors.
Changed
- `pw.io.airbyte.read` will now retry the `pip install` command if it fails during the installation of a connector. This only applies when using the PyPI version of the connector, not the Docker one.
v0.24.0
Added
- `pw.io.mqtt.read` and `pw.io.mqtt.write` methods for reading from and writing to MQTT.
Changed
- `pw.xpacks.llm.embedders.SentenceTransformerEmbedder` and `pw.xpacks.llm.llms.HFPipelineChat` are now computed in batches. The maximum size of a single batch can be set in the constructor with the `max_batch_size` argument.
- BREAKING: The `api_key` and `base_url` arguments of `pw.xpacks.llm.llms.OpenAIChat` can no longer be set in the `__call__` method; if needed, they should instead be set in the constructor.
- BREAKING: The `api_key` argument of `pw.xpacks.llm.llms.OpenAIEmbedder` can no longer be set in the `__call__` method; if needed, it should instead be set in the constructor.
- `pw.io.postgres.write` now accepts arbitrary types for the values of the `postgres_settings` dict. If a value is not a string, Python's `str()` method is used to convert it.
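The `str()` coercion of settings values presumably behaves along these lines; a minimal sketch with a hypothetical settings dict (illustrative only, not the connector's actual code):

```python
def normalize_settings(settings):
    # Coerce every non-string value to its string form, as
    # Python's built-in str() renders it.
    return {k: v if isinstance(v, str) else str(v) for k, v in settings.items()}

settings = {"connect_timeout": 10, "sslmode": "require", "keepalives": True}
print(normalize_settings(settings))
# → {'connect_timeout': '10', 'sslmode': 'require', 'keepalives': 'True'}
```

Note that booleans become `'True'`/`'False'`, so values whose string form matters should still be passed as strings explicitly.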
Removed
- `pw.io.kafka.read_from_upstash` has been removed, as the managed Kafka service in Upstash has been deprecated.
v0.23.0
Changed
- BREAKING: To use `pw.sql` you now have to install `pathway[sql]`.
Fixed
- `pw.io.deltalake.read` now correctly reads data from partitioned tables in all cases.
- Added retries for all cloud-based persistence backend operations to improve reliability.
v0.22.0
Added
- Data persistence can now be configured to use Azure Blob Storage as a backend. An Azure backend instance can be created using `pw.persistence.Backend.azure` and included in the persistence config.
- Added batching to UDFs. It is now possible to make UDFs operate on batches of data instead of single rows. To do so, the `max_batch_size` argument has to be set.
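Conceptually, a batched UDF receives a list of values instead of a single scalar, and the runtime splits the pending calls into chunks of at most `max_batch_size`. A pure-Python sketch of that batching idea (a hypothetical helper, not the actual `pw.udf` machinery):

```python
def batched(func, max_batch_size):
    # Group incoming scalar calls into batches of at most
    # max_batch_size elements and invoke func once per batch.
    def apply(values):
        out = []
        for i in range(0, len(values), max_batch_size):
            out.extend(func(values[i : i + max_batch_size]))
        return out
    return apply

def square_batch(xs):
    # Operates on a whole batch at once, e.g. to amortize
    # per-call overhead (model inference, network round-trips).
    return [x * x for x in xs]

apply = batched(square_batch, max_batch_size=2)
print(apply([1, 2, 3, 4, 5]))  # → [1, 4, 9, 16, 25]
```

Batching pays off when the per-call overhead dominates, as with ML model inference or remote API calls.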
Changed
- BREAKING: When creating `pw.DateTimeUtc`, it is now obligatory to pass the time zone information.
- BREAKING: When creating `pw.DateTimeNaive`, passing time zone information is not allowed.
- BREAKING: Expressions are now evaluated in batches. Generally this speeds up the computations, but it might increase memory usage if the intermediate state in the expressions is large.
Fixed
- Synchronization groups now correctly handle cases where the source file-like object is updated during the reading process.
v0.21.6
Added
- `sort_by` method to `pw.BaseCustomAccumulator` that allows sorting rows within a single batch. When `sort_by` is defined, the rows are reduced in the order specified by the `sort_by` method. It can, for example, be used to process entries in the order of event time.
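The effect of reducing rows in `sort_by` order can be illustrated in plain Python: within a batch, rows are sorted by a key (here a hypothetical event-time field) before being folded into the accumulator. This is an illustration of the concept, not the `pw.BaseCustomAccumulator` API itself:

```python
rows = [
    {"event_time": 3, "value": "c"},
    {"event_time": 1, "value": "a"},
    {"event_time": 2, "value": "b"},
]

def reduce_in_order(rows, sort_by):
    # Fold rows in the order given by the sort_by key; string
    # concatenation makes the effect of ordering visible.
    acc = ""
    for row in sorted(rows, key=sort_by):
        acc += row["value"]
    return acc

print(reduce_in_order(rows, sort_by=lambda r: r["event_time"]))  # → 'abc'
```

Without an ordering, an order-sensitive reduction like this would depend on the arrival order of rows within the batch.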
Changed
- `pw.Table.debug` now prints a whole row on a single line instead of printing each cell separately.
- Calling functions without arguments in YAML configuration files is now deprecated in `pw.load_yaml`. To call the function, a mapping should be passed, e.g. an empty mapping `{}`. In the future, the `!` syntax without any mapping will be used to pass function objects without calling them.
- The license check error message now provides a more detailed explanation of the failure.
- When code is run using `pathway spawn` with multiple processes, if one process terminates with an error, all other processes will also be terminated.
- `pw.xpacks.llm.vector_store.VectorStoreServer` is being deprecated and is now a subclass of `pw.xpacks.llm.document_store.DocumentStore`. The public API is kept the same; however, users are encouraged to switch to `DocumentStore` from now on.
- `pw.xpacks.llm.vector_store.VectorStoreClient` is being deprecated in favor of `pw.xpacks.llm.document_store.DocumentStoreClient`.
- `pw.io.deltalake.write` can now maintain the target table's snapshot on the output.
v0.21.5
Changed
- `pw.io.deltalake.read` now processes Delta table version updates atomically, applying all changes together in a single minibatch.
- The panel widget for table visualization now has a horizontal scroll bar for large tables.
- Added the possibility to return the value of any column from `pw.reducers.argmax` and `pw.reducers.argmin`, not only `id`.
Fixed
- `pw.reducers.argmax` and `pw.reducers.argmin` now work correctly with the result of `pw.Table.windowby`.
v0.21.4
Added
- `pw.io.kafka.read` and `pw.io.redpanda.read` now support static mode.
Changed
- The `inactivity_detection` function is now a method on append-only tables. It no longer relies on an event timestamp column but uses table processing times to detect inactivity periods.