Releases: pathwaycom/pathway

v0.26.0

14 Aug 08:20

Added

  • path_filter parameter in pw.io.s3.read and pw.io.minio.read functions. It enables post-filtering of object paths using a wildcard pattern (*, ?), allowing exclusion of paths that pass the main path filter but do not match path_filter.
  • Input connectors now support backpressure control via max_backlog_size, which limits the number of read events being processed per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
  • pw.reducers.count_distinct and pw.reducers.count_distinct_approximate to count the number of distinct elements in a table. pw.reducers.count_distinct_approximate saves memory at the cost of accuracy; this tradeoff can be controlled with the precision parameter (see the sketch after this list).
  • pw.Table.join (and its variants) now has two additional parameters: left_exactly_once and right_exactly_once. If the elements from one side of a join should be joined exactly once, set that side's *_exactly_once parameter to True. Once an entry gets a match, it is removed from the join state, reducing memory consumption.
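
A minimal sketch of the new reducers and join parameters in use; the input data and the precision value are illustrative:

```python
import pathway as pw

visits = pw.debug.table_from_markdown(
    """
    user | page
    u1   | home
    u1   | docs
    u2   | home
    u2   | home
    """
)

signups = pw.debug.table_from_markdown(
    """
    user | plan
    u1   | free
    u2   | pro
    """
)

# Exact and approximate distinct counts per user; precision controls
# the memory/accuracy tradeoff of the approximate reducer.
per_user = visits.groupby(pw.this.user).reduce(
    pw.this.user,
    distinct_pages=pw.reducers.count_distinct(pw.this.page),
    distinct_pages_approx=pw.reducers.count_distinct_approximate(
        pw.this.page, precision=12
    ),
)

# Every signup row should match exactly once, so the engine may drop
# matched entries from the join state, reducing memory usage.
result = per_user.join(
    signups, pw.left.user == pw.right.user, right_exactly_once=True
).select(pw.left.user, pw.left.distinct_pages, pw.right.plan)
```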

Changed

  • Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
  • Improved initialization speed of pw.io.s3.read and pw.io.minio.read.
  • pw.io.s3.read and pw.io.minio.read now limit the number and the total size of objects to be predownloaded.
  • BREAKING: Optimized the implementation of the pw.reducers.min, pw.reducers.max, pw.reducers.argmin, pw.reducers.argmax, and pw.reducers.any reducers for append-only tables. This is a breaking change for programs using operator persistence; the persisted state will have to be recomputed.
  • BREAKING: Optimized the implementation of the pw.reducers.sum reducer on float and np.ndarray columns. This is a breaking change for programs using operator persistence; the persisted state will have to be recomputed.
  • BREAKING: Optimized the implementation of data persistence for the case of many small objects in the filesystem and S3 connectors. This is a breaking change for programs using data persistence; the persisted state will have to be recomputed.
  • BREAKING: Optimized the data snapshot logic in persistence for the case of big input snapshots. This is a breaking change for programs using data persistence; the persisted state will have to be recomputed.
  • Improved the precision of pw.reducers.sum on float columns by introducing Neumaier summation (see the sketch below).
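
For reference, Neumaier summation is a compensated summation scheme that carries a running correction term for the low-order bits lost in each addition. The sketch below illustrates the technique itself, not Pathway's internal implementation:

```python
def neumaier_sum(values):
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x  # low-order digits of x were lost
        else:
            comp += (x - t) + total  # low-order digits of total were lost
        total = t
    return total + comp

print(neumaier_sum([1.0, 1e100, 1.0, -1e100]))  # 2.0
print(sum([1.0, 1e100, 1.0, -1e100]))           # 0.0 -- naive sum loses the 2.0
```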

v0.25.1

24 Jul 12:09

Added

  • pw.xpacks.llm.mcp_server.PathwayMcp, which allows serving pw.xpacks.llm.document_store.DocumentStore and pw.xpacks.llm.question_answering endpoints as MCP (Model Context Protocol) tools.
  • pw.io.dynamodb.write method for writing to DynamoDB.
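
A minimal sketch of writing a stream to DynamoDB; the table_name and partition_key parameter names below are assumptions, so check the pw.io.dynamodb.write API reference for the exact signature:

```python
import pathway as pw

class InputSchema(pw.Schema):
    key: str = pw.column_definition(primary_key=True)
    value: int

table = pw.io.jsonlines.read("./input/", schema=InputSchema)

# table_name and partition_key are assumed parameter names.
pw.io.dynamodb.write(table, table_name="my-table", partition_key="key")

pw.run()
```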

v0.25.0

17 Jul 17:44

Added

  • pw.io.questdb.write method for writing to QuestDB.
  • pw.io.fs.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading file contents.
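
A minimal sketch of metadata-only tracking; whether this format needs a schema argument is not stated above, so none is passed here:

```python
import pathway as pw

# One row of metadata per change in the tracked directory;
# file contents are never read.
meta = pw.io.fs.read("./data", format="only_metadata")

pw.io.jsonlines.write(meta, "./metadata_updates.jsonl")
pw.run()
```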

Changed

  • BREAKING: The Elasticsearch and BigQuery connectors have been moved to the Scale license tier. You can obtain the Scale tier license for free at https://pathway.com/get-license.
  • BREAKING: pw.io.fs.read no longer accepts format="raw". Use format="binary" to read binary objects, format="plaintext_by_file" to read plaintext objects per file, or format="plaintext" to read plaintext objects split into lines.
  • BREAKING: The pw.io.s3_csv.read connector has been removed. Please use pw.io.s3.read with format="csv" instead (see the migration sketch after this list).
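
A migration sketch for the two removals above; the AwsS3Settings usage is an assumption about the generic S3 connector's configuration, so check the pw.io.s3.read reference:

```python
import pathway as pw

# Before (no longer accepted): pw.io.fs.read("./data", format="raw")
# After -- pick the variant matching the old behavior:
as_binary = pw.io.fs.read("./data", format="binary")            # one binary object per file
as_files = pw.io.fs.read("./data", format="plaintext_by_file")  # one text row per file
as_lines = pw.io.fs.read("./data", format="plaintext")          # one row per line

# Before (removed): pw.io.s3_csv.read(...)
# After: the generic S3 connector with format="csv".
class CsvSchema(pw.Schema):
    a: int
    b: str

table = pw.io.s3.read(
    "path/in/bucket/",
    format="csv",
    schema=CsvSchema,
    aws_s3_settings=pw.io.s3.AwsS3Settings(bucket_name="my-bucket"),
)
```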

Fixed

  • pw.io.s3.read and pw.io.s3.write now also check the AWS_PROFILE environment variable for AWS credentials if none are explicitly provided.

v0.24.1

17 Jul 17:44

Added

  • Confluent Schema Registry support in the Kafka and Redpanda input and output connectors.

Changed

  • pw.io.airbyte.read will now retry the pip install command if it fails during the installation of a connector. This applies only when using the PyPI version of the connector, not the Docker one.

v0.24.0

17 Jul 17:44

Added

  • pw.io.mqtt.read and pw.io.mqtt.write methods for reading from and writing to MQTT.
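
A minimal sketch of the new connectors; the connection-string and topic parameter names are assumptions, so check the pw.io.mqtt reference for the exact signature:

```python
import pathway as pw

class InputSchema(pw.Schema):
    value: float

# The uri and topic parameters are assumed names.
readings = pw.io.mqtt.read(
    "mqtt://localhost:1883",
    topic="sensors/in",
    format="json",
    schema=InputSchema,
)

pw.io.mqtt.write(readings, "mqtt://localhost:1883", topic="sensors/out")
pw.run()
```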

Changed

  • pw.xpacks.llm.embedders.SentenceTransformerEmbedder and pw.xpacks.llm.llms.HFPipelineChat are now computed in batches. The maximum size of a single batch can be set in the constructor with the argument max_batch_size.
  • BREAKING: The api_key and base_url arguments of pw.xpacks.llm.llms.OpenAIChat can no longer be set in the __call__ method; if needed, set them in the constructor instead (see the migration sketch after this list).
  • BREAKING: The api_key argument of pw.xpacks.llm.llms.OpenAIEmbedder can no longer be set in the __call__ method; if needed, set it in the constructor instead.
  • pw.io.postgres.write now accepts arbitrary types for the values of the postgres_settings dict. If a value is not a string, it is converted with Python's str().
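
A migration sketch for the OpenAIChat change; the model name is illustrative, and the use of the prompt_chat_single_qa helper is an assumption about the xpack's prompt utilities:

```python
import pathway as pw
from pathway.xpacks.llm import llms

# Before: api_key could be passed per call -- no longer accepted.
# After: pass it once, in the constructor.
chat = llms.OpenAIChat(model="gpt-4o-mini", api_key="sk-...")

queries = pw.debug.table_from_markdown(
    """
    question
    Hello
    """
)
answers = queries.select(
    answer=chat(llms.prompt_chat_single_qa(pw.this.question))
)
```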

Removed

  • pw.io.kafka.read_from_upstash has been removed, as the managed Kafka service in Upstash has been deprecated.

v0.23.0

12 Jun 08:22

Changed

  • BREAKING: To use pw.sql you now have to install pathway[sql].

Fixed

  • pw.io.deltalake.read now correctly reads data from partitioned tables in all cases.
  • Added retries for all cloud-based persistence backend operations to improve reliability.

v0.22.0

05 Jun 10:48

Added

  • Data persistence can now be configured to use Azure Blob Storage as a backend. An Azure backend instance can be created using pw.persistence.Backend.azure and included in the persistence config.
  • Added batching to UDFs. It is now possible to make UDFs operate on batches of data instead of single rows; to do so, set the max_batch_size argument.
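
A minimal sketch of a batched UDF; the convention that the function receives and returns lists once max_batch_size is set is an assumption, so check the UDF reference:

```python
import pathway as pw

@pw.udf(max_batch_size=64)
def double_all(values: list[int]) -> list[int]:
    # Called with up to 64 values at once; must return one result per input.
    return [v * 2 for v in values]

t = pw.debug.table_from_markdown(
    """
    a
    1
    2
    3
    """
)
result = t.select(doubled=double_all(pw.this.a))
```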

Changed

  • BREAKING: When creating pw.DateTimeUtc, passing time zone information is now mandatory.
  • BREAKING: When creating pw.DateTimeNaive, passing time zone information is not allowed.
  • BREAKING: Expressions are now evaluated in batches. Generally, this speeds up computations but might increase memory usage if the intermediate state in the expressions is large.

Fixed

  • Synchronization groups now correctly handle cases where the source file-like object is updated during the reading process.

v0.21.6

29 May 07:49

Added

  • sort_by method to pw.BaseCustomAccumulator that allows sorting rows within a single batch. When sort_by is defined, the rows are reduced in the order it specifies. This can, for example, be used to process entries in the order of event time.
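
A sketch of a custom accumulator that reduces rows in event-time order. The exact contract of sort_by (shown here as a classmethod mapping a row of reducer arguments to a sort key) is an assumption, so check the API reference:

```python
import pathway as pw

class ConcatByTime(pw.BaseCustomAccumulator):
    def __init__(self, value: str):
        self.value = value

    @classmethod
    def from_row(cls, row):
        _event_time, value = row
        return cls(value)

    @classmethod
    def sort_by(cls, row):
        # Assumed contract: rows within a batch are reduced in
        # ascending order of this key.
        event_time, _value = row
        return event_time

    def update(self, other):
        self.value = self.value + "," + other.value

    def compute_result(self) -> str:
        return self.value

concat_by_time = pw.reducers.udf_reducer(ConcatByTime)
# usage sketch:
# table.groupby(pw.this.user).reduce(
#     ordered=concat_by_time(pw.this.event_time, pw.this.value))
```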

Changed

  • pw.Table.debug now prints a whole row in a single line instead of printing each cell separately.
  • Calling functions without arguments in YAML configuration files is now deprecated in pw.load_yaml. To call a function, a mapping should be passed, e.g. an empty mapping {}. In the future, the ! syntax without any mapping will be used to pass function objects without calling them (see the sketch after this list).
  • The license check error message now provides a more detailed explanation of the failure.
  • When code is run using pathway spawn with multiple processes, if one process terminates with an error, all other processes will also be terminated.
  • pw.xpacks.llm.vector_store.VectorStoreServer is being deprecated and is now a subclass of pw.xpacks.llm.document_store.DocumentStore. The public API stays the same, but users are encouraged to switch to DocumentStore from now on.
  • pw.xpacks.llm.vector_store.VectorStoreClient is being deprecated in favor of pw.xpacks.llm.document_store.DocumentStoreClient.
  • pw.io.deltalake.write can now maintain the target table's snapshot on the output.
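
A sketch of the YAML calling convention described above; the $parser variable name and the parser class are illustrative:

```python
import pathway as pw

config = pw.load_yaml(
    """
# Deprecated: a bare tag used to call the function with no arguments.
# $parser: !pw.xpacks.llm.parsers.ParseUnstructured
# Preferred: pass an explicit mapping, here an empty one.
$parser: !pw.xpacks.llm.parsers.ParseUnstructured {}
"""
)
```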

v0.21.5

09 May 07:50

Changed

  • pw.io.deltalake.read now processes Delta table version updates atomically, applying all changes together in a single minibatch.
  • The panel widget for table visualization now has a horizontal scroll bar for large tables.
  • Added the possibility to return the value of any column from pw.reducers.argmax and pw.reducers.argmin, not only id.

Fixed

  • pw.reducers.argmax and pw.reducers.argmin now work correctly with the result of pw.Table.windowby.

v0.21.4

24 Apr 14:37

Added

  • pw.io.kafka.read and pw.io.redpanda.read now support static mode.
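
A sketch of reading a Kafka topic in static mode; the mode parameter value follows the convention of Pathway's other connectors and is an assumption here:

```python
import pathway as pw

class EventSchema(pw.Schema):
    value: str

rdkafka_settings = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-group",
    "auto.offset.reset": "earliest",
}

# mode="static" reads the data currently in the topic and finishes,
# instead of waiting for new messages (assumed parameter name).
events = pw.io.kafka.read(
    rdkafka_settings,
    topic="events",
    format="json",
    schema=EventSchema,
    mode="static",
)
```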

Changed

  • The inactivity_detection function is now a method of append-only tables. It no longer relies on an event timestamp column; instead, it uses table processing times to detect inactivity periods.