Releases: pathwaycom/pathway

v0.21.3

24 Apr 14:37

Fixed

  • The performance of input connectors is optimized in certain cases.
  • The panel widget for table visualization now formats timestamps and missing values better. The pagination was also updated to better fit the widget, and the default sorters in snapshot mode have been fixed.

v0.21.2

10 Apr 07:28

Added

  • Added a synchronization group mechanism to align multiple data sources based on selected columns. It can be accessed with pw.io.register_input_synchronization_group.
  • pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.DateTimeDuration, and int.
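
The idea of aligning sources on a column can be illustrated with a small plain-Python sketch. It does not call Pathway, and the release note does not spell out the alignment rule: the function name release_in_sync, the max_difference bound, and the "hold back any source that runs too far ahead of the slowest one" policy are all assumptions made for illustration. Values are plain int, one of the supported column types listed above.

```python
from typing import Dict, List, Tuple


def release_in_sync(
    sources: Dict[str, List[int]], max_difference: int
) -> List[Tuple[str, int]]:
    """Hypothetical illustration, not Pathway's implementation.

    Each source is a non-decreasing list of values from its synchronized
    column. A value is released only if it is at most `max_difference`
    ahead of the smallest value still pending in any source.
    """
    positions = {name: 0 for name in sources}
    released: List[Tuple[str, int]] = []
    while True:
        # Next unread value of every source that still has data.
        pending = {
            name: values[positions[name]]
            for name, values in sources.items()
            if positions[name] < len(values)
        }
        if not pending:
            break
        slowest = min(pending.values())
        for name, value in sorted(pending.items()):
            if value - slowest <= max_difference:
                released.append((name, value))
                positions[name] += 1
            # Otherwise the source is held back until the others catch up.
    return released


order = release_in_sync({"a": [1, 2, 10], "b": [3, 4, 5]}, max_difference=2)
print(order)
```

In this run, source "a" is paused at the value 10 until source "b" has delivered 5, which is the kind of alignment the entry describes.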

Changed

  • Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.

Fixed

  • Fixed a problem with list_documents() when no documents are present in the store.
  • The append-only property of tables created by pw.io.kafka.read is now set correctly.

v0.21.1

28 Mar 11:39

Changed

  • Input connectors now throttle parsing error messages if their share is more than 10% of the parsing attempts.
  • New flag return_status for inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the status of indexing for each file.
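
The throttling rule for parsing errors can be pictured with the following plain-Python sketch. It is not Pathway's implementation: the class ParseErrorThrottle and its methods are hypothetical, and the only detail taken from the note above is the 10% threshold on the share of failed parsing attempts.

```python
class ParseErrorThrottle:
    """Hypothetical helper: stop logging parse errors once their share
    of all parsing attempts exceeds `max_error_share`."""

    def __init__(self, max_error_share: float = 0.10) -> None:
        self.max_error_share = max_error_share
        self.attempts = 0
        self.errors = 0
        self.suppressed = 0

    def record_success(self) -> None:
        self.attempts += 1

    def record_error(self, message: str) -> bool:
        """Count a failed parse; return True if the message was logged."""
        self.attempts += 1
        self.errors += 1
        if self.errors / self.attempts > self.max_error_share:
            # Too many failures relative to attempts: count it silently.
            self.suppressed += 1
            return False
        print(f"parse error: {message}")
        return True


throttle = ParseErrorThrottle()
for _ in range(19):
    throttle.record_success()
logged = [throttle.record_error(f"bad row {i}") for i in range(3)]
```

With 19 successful parses followed by three failures, the first two errors are logged (shares of 5% and about 9.5%) and the third is suppressed (about 13.6%).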

v0.21.0

19 Mar 13:46

Added

  • All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
  • pw.io.csv.read now parses null-values in data when it can be done unambiguously.

Changed

  • BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
    • Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
    • New: /v2/list_documents, /v2/answer
  • RAG methods under pw.xpacks.llm.question_answering.RAGClient are renamed, and they now use the new endpoints. Old methods are deprecated and will be removed in the future.
    • pw_ai_summary -> summarize
    • pw_ai_answer -> answer
    • pw_list_documents -> list_documents
  • When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
  • The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and the schema was not specified by the user, it is read from the table metadata.
  • pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
  • BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
  • BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
  • BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
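
The three CSV conventions above (base64 for Bytes, a number of nanoseconds for Duration, JSON for tuples) can be mimicked with the Python standard library. This is only a sketch of the encodings as described in the notes, not Pathway's own code: the helper names are hypothetical, and datetime.timedelta stands in for the Duration type even though it only has microsecond resolution.

```python
import base64
import csv
import io
import json
from datetime import timedelta


def encode_bytes(value: bytes) -> str:
    return base64.b64encode(value).decode("ascii")


def decode_bytes(field: str) -> bytes:
    return base64.b64decode(field)


def encode_duration(value: timedelta) -> str:
    # Integer arithmetic avoids float rounding for large durations.
    micros = (value.days * 86_400 + value.seconds) * 1_000_000 + value.microseconds
    return str(micros * 1_000)  # nanoseconds


def decode_duration(field: str) -> timedelta:
    # timedelta cannot hold sub-microsecond precision, so it is dropped here.
    return timedelta(microseconds=int(field) // 1_000)


def encode_tuple(value: tuple) -> str:
    return json.dumps(list(value))


def decode_tuple(field: str) -> tuple:
    return tuple(json.loads(field))


row = (b"\x00\xffraw", timedelta(seconds=1, microseconds=500), (1, "a", 2.5))

# Write one CSV line; the csv module quotes the JSON field for us.
buffer = io.StringIO()
csv.writer(buffer).writerow(
    [encode_bytes(row[0]), encode_duration(row[1]), encode_tuple(row[2])]
)

# Read it back and undo each encoding.
fields = next(csv.reader(io.StringIO(buffer.getvalue())))
decoded = (decode_bytes(fields[0]), decode_duration(fields[1]), decode_tuple(fields[2]))
```

Consumers that previously parsed these columns by hand would need an equivalent decoding step after upgrading, which is why the entries are marked BREAKING.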

Fixed

  • pw.io.csv.write now correctly escapes quote characters.

v0.20.1

07 Mar 08:18

Added

  • Added RecursiveSplitter
  • pw.io.deltalake.write now checks that the schema of the target Delta table corresponds to the schema of the Pathway table that is sent for the output. If the schemas differ, a human-readable error message is produced.

v0.20.0

25 Feb 08:10

[0.20.0] - 2025-02-25

Added

  • Added structure-aware chunking for DoclingParser.
  • Added table_parsing_strategy for DoclingParser.
  • Column expressions as_int(), as_float(), as_str(), and as_bool() now accept additional arguments, unwrap and default, to simplify null handling.
  • Support for Python tuples in expressions.
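
As a rough plain-Python analogy for the unwrap and default arguments, the sketch below shows one way such null handling can behave. The function as_int_sketch is hypothetical and is not the Pathway column expression; the assumed semantics (default replaces a missing value, unwrap rejects a value that is still missing) are an interpretation of the note above, not a statement of the library's exact behavior.

```python
from typing import Any, Optional


def as_int_sketch(
    value: Any, *, default: Optional[int] = None, unwrap: bool = False
) -> Optional[int]:
    """Hypothetical analogy for null handling in a typed conversion."""
    if value is None:
        # Assumption: a default, when given, replaces the missing value.
        value = default
    if value is None:
        if unwrap:
            # Assumption: unwrap promises a non-optional result.
            raise ValueError("cannot unwrap a missing value")
        return None
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    return value


print(as_int_sketch(None))             # stays missing
print(as_int_sketch(None, default=0))  # falls back to the default
print(as_int_sketch(7, unwrap=True))   # present value passes through
```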

Changed

  • BREAKING: Changed the argument in DoclingParser from parse_images (bool) into image_parsing_strategy (Literal["llm"] | None).
  • BREAKING: doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore no longer accepts pw.UDF.
  • Better error messages when using pathway spawn with multiple workers. Now error messages are printed only from the worker experiencing the error directly.

Fixed

  • doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore had no effect. This is now fixed.

v0.19.0

20 Feb 13:12

Added

  • LLMReranker now supports custom prompts as well as custom response parsers allowing for other ranking scales apart from default 1-5.
  • pw.io.kafka.write and pw.io.nats.write now support ColumnReference as a topic name. When a ColumnReference is provided, each message's topic is determined by the corresponding column value.
  • pw.io.python.write accepting ConnectorObserver as an alternative to pw.io.subscribe.
  • pw.io.iceberg.read and pw.io.iceberg.write now support S3 as data backend and AWS Glue catalog implementations.
  • All output connectors now support the sort_by field for ordering output within a single minibatch.
  • A new UDF executor pw.udfs.fully_async_executor. It allows creating non-blocking asynchronous UDFs whose results can be returned at a later processing time.
  • A Future data type to represent results of fully asynchronous UDFs.
  • pw.Table.await_futures method to wait for results of fully asynchronous UDFs.
  • pw.io.deltalake.write now supports partition columns specification.

Changed

  • BREAKING: Changed the interface of LLMReranker, the use_logit_bias, cache_strategy, retry_strategy and kwargs arguments are no longer supported.
  • BREAKING: LLMReranker no longer inherits from pw.UDF
  • BREAKING: pw.stdlib.utils.AsyncTransformer.output_table now returns a table with columns with Future data type.
  • pw.io.deltalake.read can now read append-only tables without requiring explicit specification of primary key fields.

v0.18.0

07 Feb 16:10

Added

  • pw.io.postgres.write and pw.io.postgres.write_snapshot now handle serialization of PyObjectWrapper and Timedelta properly.
  • New chunking options in pathway.xpacks.llm.parsers.UnstructuredParser
  • Now all Pathway types can be serialized into JSON and consistently deserialized back.
  • table.col.dt.to_duration converting an integer into a pw.Duration.
  • pw.Json now supports storing datetime and duration type values in ISO format.

Changed

  • BREAKING: Changed the interface of UnstructuredParser
  • BREAKING: The Pointer type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
  • BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
  • BREAKING: The Array type is now serialized and deserialized as an object with two fields: shape denoting the shape of the stored multi-dimensional array and elements denoting the elements of the flattened array.
  • BREAKING: Marked package as py.typed to indicate support for type hints.
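
The JSON layouts described above for Array (an object with shape and elements) and Bytes (a base64 string) can be sketched with the standard library alone. The helper names are hypothetical, nested Python lists stand in for multi-dimensional arrays, and row-major flattening order is an assumption, since the note only says the array is flattened.

```python
import base64
import json
from math import prod


def array_to_json_obj(nested: list) -> dict:
    """Encode a rectangular, non-empty nested list as shape + flat elements."""
    shape = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    elements = nested
    for _ in shape[1:]:
        # Peel off one nesting level per extra dimension (row-major order).
        elements = [item for sub in elements for item in sub]
    return {"shape": shape, "elements": list(elements)}


def json_obj_to_array(obj: dict) -> list:
    """Rebuild the nested list from shape + flat elements."""
    shape, elements = obj["shape"], obj["elements"]
    if prod(shape) != len(elements):
        raise ValueError("shape does not match the number of elements")
    for size in reversed(shape[1:]):
        elements = [elements[i:i + size] for i in range(0, len(elements), size)]
    return elements


matrix = [[1, 2, 3], [4, 5, 6]]
payload = json.dumps(
    {
        "array": array_to_json_obj(matrix),
        # Bytes travel as a base64 string field.
        "blob": base64.b64encode(b"\x00\x01binary").decode("ascii"),
    }
)

restored = json.loads(payload)
matrix_back = json_obj_to_array(restored["array"])
blob_back = base64.b64decode(restored["blob"])
```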

Removed

  • BREAKING: Removed undocumented license_key argument from pw.run and pw.run_all methods. Instead, pw.set_license_key should be used.

v0.17.0

31 Jan 12:07

Added

  • pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
  • pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional argument, init_mode, which allows initializing the table before writing.
  • pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
  • New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of PDFs with tables and images.
  • Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
  • Automatic naming for input and output connectors has been enhanced.

Changed

  • BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
  • BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from pw_ai_answer endpoint.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows optionally returning context documents from pw_ai_answer endpoint.
  • BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
  • BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
  • pw.io.kafka.write now allows specifying key and headers for JSON and CSV data formats.
  • persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
  • Changed names of parsers to be more consistent: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.

Fixed

  • generate_class method in Schema now correctly renders columns of UnionType and None types.
  • Fixed a bug in delay in temporal behavior that made it possible to emit a single entry twice in a specific situation.
  • pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.

Removed

  • BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
  • BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
  • BREAKING: Removed deprecated column _pw_shard in the result of windowby.
  • BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
  • BREAKING: Removed deprecated module pw.asynchronous.
  • BREAKING: Removed deprecated access to functions from pw.io in pw.
  • BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
  • BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced with pw.xpacks.llm.parsers.DoclingParser.
  • BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.

v0.16.4

09 Jan 15:14

Fixed

  • Google Drive connector in static mode now correctly displays in Jupyter visualizations.