Releases: pathwaycom/pathway
v0.21.3
Fixed
- The performance of input connectors is optimized in certain cases.
- The panel widget for table visualization now formats timestamps and missing values better. The pagination was also updated to better fit the widget, and the default sorters in snapshot mode have been fixed.
v0.21.2
Added
- Added synchronization group mechanism to align multiple data sources based on selected columns. It can be accessed with pw.io.register_input_synchronization_group.
- pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.DateTimeDuration, and int.
Changed
- Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.
Fixed
- Fixed a problem with list_documents() when no documents are present in the store.
- The append-only property of tables created by pw.io.kafka.read is now set correctly.
v0.21.1
Changed
- Input connectors now throttle parsing error messages if their share is more than 10% of the parsing attempts.
- New flag return_status for the inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the status of indexing for each file.
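The throttling rule for parsing errors in this release can be pictured with a small plain-Python sketch. It is not Pathway's implementation: the class name, the counters, and the decision to drop every message while the error share is above 10% are assumptions made for illustration.

```python
class ThrottledParseErrorLog:
    """Hypothetical helper, not part of Pathway: suppress parse-error messages
    while errors exceed a given share of all parsing attempts."""

    def __init__(self, max_error_share=0.10):
        self.max_error_share = max_error_share
        self.attempts = 0
        self.errors = 0
        self.emitted = []    # messages that would reach the log
        self.suppressed = 0  # messages dropped by throttling

    def record_success(self):
        self.attempts += 1

    def record_error(self, message):
        self.attempts += 1
        self.errors += 1
        # Throttle only while the error share is above the threshold.
        if self.errors / self.attempts > self.max_error_share:
            self.suppressed += 1
        else:
            self.emitted.append(message)


def simulate(total_rows, error_every):
    """Feed total_rows rows where every error_every-th row fails to parse."""
    log = ThrottledParseErrorLog()
    for i in range(1, total_rows + 1):
        if i % error_every == 0:
            log.record_error(f"row {i}: cannot parse")
        else:
            log.record_success()
    return log
```

In this sketch a stream with one bad row in five has every error message suppressed, while a stream with one bad row in twenty logs each of them. A real connector may still emit a sample of messages while throttling.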
v0.21.0
Added
- All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
- pw.io.csv.read now parses null-values in data when it can be done unambiguously.
Changed
- BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
  - Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
  - New: /v2/list_documents, /v2/answer
- RAG methods under the pw.xpacks.llm.question_answering.RAGClient are renamed, and they now use the new endpoints. Old methods are deprecated and will be removed in the future.
  - pw_ai_summary -> summarize
  - pw_ai_answer -> answer
  - pw_list_documents -> list_documents
- When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
- The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and the schema was not specified by the user, it is read from the table metadata.
- pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
- BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
- BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
- BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
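For callers that build request URLs or method names by hand, the endpoint and method renames in this release can be captured in a lookup table. The sketch below uses only the names listed in these notes; the helper migrate_url and the example host are hypothetical, and pairing each deprecated endpoint with the new one of the matching name is an assumption.

```python
from urllib.parse import urlsplit

# Renames taken from the release notes; the pairing by name is assumed.
ENDPOINT_RENAMES = {
    "/v1/pw_list_documents": "/v2/list_documents",
    "/v1/pw_ai_answer": "/v2/answer",
}

# RAGClient method renames, old name -> new name.
METHOD_RENAMES = {
    "pw_ai_summary": "summarize",
    "pw_ai_answer": "answer",
    "pw_list_documents": "list_documents",
}


def migrate_url(url):
    """Hypothetical helper: swap a deprecated endpoint path for its replacement."""
    parts = urlsplit(url)
    new_path = ENDPOINT_RENAMES.get(parts.path, parts.path)
    return parts._replace(path=new_path).geturl()
```

URLs whose path is not in the table are returned unchanged, so the helper can be applied to every outgoing request during a migration.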
Fixed
- pw.io.csv.write now correctly escapes quote characters.
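The CSV conventions described in this release (base64 for Bytes, nanoseconds for Duration, JSON for tuples, escaped quotes) can be illustrated with the Python standard library alone. This is a sketch of the encoding rules, not Pathway's code: encode_cell, write_row, and read_row are hypothetical helpers, and using timedelta to stand in for Duration is an assumption.

```python
import base64
import csv
import io
import json
from datetime import timedelta


def encode_cell(value):
    """Encode one value following the conventions listed above (illustrative only)."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, timedelta):
        # Integer arithmetic avoids float rounding; timedelta resolution is 1 microsecond.
        micros = (value.days * 86_400 + value.seconds) * 1_000_000 + value.microseconds
        return str(micros * 1_000)
    if isinstance(value, tuple):
        return json.dumps(list(value))
    return str(value)


def write_row(values):
    """Serialize one row to a CSV line; the csv module doubles embedded quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([encode_cell(v) for v in values])
    return buffer.getvalue()


def read_row(line):
    """Parse one CSV line back into its string fields."""
    return next(csv.reader(io.StringIO(line)))
```

Reading a row back yields strings, so the consumer still needs the column types (for example from a schema) to decode base64 or JSON fields into the original values.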
v0.20.1
Added
- Added RecursiveSplitter.
- pw.io.deltalake.write now checks that the schema of the target Delta table corresponds to the schema of the Pathway table that is sent for the output. If the schemas differ, a human-readable error message is produced.
v0.20.0
[0.20.0] - 2025-02-25
Added
- Added structure-aware chunking for DoclingParser.
- Added table_parsing_strategy for DoclingParser.
- Column expressions as_int(), as_float(), as_str(), and as_bool() now accept additional arguments, unwrap and default, to simplify null handling.
- Support for Python tuples in expressions.
Changed
- BREAKING: Changed the argument in DoclingParser from parse_images (bool) into image_parsing_strategy (Literal["llm"] | None).
- BREAKING: doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore no longer accepts pw.UDF.
- Better error messages when using pathway spawn with multiple workers. Now error messages are printed only from the worker experiencing the error directly.
Fixed
- doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore had no effect. This is now fixed.
v0.19.0
Added
- LLMReranker now supports custom prompts as well as custom response parsers, allowing for other ranking scales apart from the default 1-5.
- pw.io.kafka.write and pw.io.nats.write now support ColumnReference as a topic name. When a ColumnReference is provided, each message's topic is determined by the corresponding column value.
- pw.io.python.write accepting ConnectorObserver as an alternative to pw.io.subscribe.
- pw.io.iceberg.read and pw.io.iceberg.write now support S3 as data backend and AWS Glue catalog implementations.
- All output connectors now support the sort_by field for ordering output within a single minibatch.
- A new UDF executor pw.udfs.fully_async_executor. It allows for creation of non-blocking asynchronous UDFs whose results can be returned in the future processing time.
- A Future data type to represent results of fully asynchronous UDFs.
- pw.Table.await_futures method to wait for results of fully asynchronous UDFs.
- pw.io.deltalake.write now supports partition columns specification.
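Two of the additions above, per-message topics taken from a column and sort_by ordering within a minibatch, can be pictured with a plain-Python sketch. It involves no broker and no Pathway API; emit_minibatch and the row layout are hypothetical, and keeping the topic column inside each message is an assumption.

```python
from collections import defaultdict


def emit_minibatch(rows, topic_column, sort_by=None):
    """Group one minibatch of rows by the topic named in topic_column.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    if sort_by is not None:
        # Order the output within this minibatch only.
        rows = sorted(rows, key=lambda row: row[sort_by])
    by_topic = defaultdict(list)
    for row in rows:
        # The topic is read from the row itself rather than fixed per connector.
        by_topic[row[topic_column]].append(row)
    return dict(by_topic)
```

Each key of the returned dictionary stands for one topic, and the list under it holds that topic's messages in the order they would be sent.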
Changed
- BREAKING: Changed the interface of LLMReranker: the use_logit_bias, cache_strategy, retry_strategy and kwargs arguments are no longer supported.
- BREAKING: LLMReranker no longer inherits from pw.UDF.
- BREAKING: pw.stdlib.utils.AsyncTransformer.output_table now returns a table with columns with Future data type.
- pw.io.deltalake.read can now read append-only tables without requiring explicit specification of primary key fields.
v0.18.0
Added
- pw.io.postgres.write and pw.io.postgres.write_snapshot now handle serialization of PyObjectWrapper and Timedelta properly.
- New chunking options in pathway.xpacks.llm.parsers.UnstructuredParser.
- Now all Pathway types can be serialized into JSON and consistently deserialized back.
- table.col.dt.to_duration converting an integer into a pw.Duration.
- pw.Json now supports storing datetime and duration type values in ISO format.
Changed
- BREAKING: Changed the interface of UnstructuredParser.
- BREAKING: The Pointer type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
- BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
- BREAKING: The Array type is now serialized and deserialized as an object with two fields: shape denoting the shape of the stored multi-dimensional array and elements denoting the elements of the flattened array.
- BREAKING: Marked package as py.typed to indicate support for type hints.
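The shape/elements layout for the Array type can be illustrated with nested Python lists. This is a sketch of the representation only, not Pathway's serializer: the helper names are hypothetical, and row-major flattening is an assumption, since the notes do not state the element order.

```python
import json


def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. [2, 3]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def flatten(nested):
    """Flatten a nested list in row-major order (assumed)."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


def reshape(elements, shape):
    """Rebuild the nested list from flat elements and a shape."""
    if len(shape) == 1:
        return list(elements)
    if shape[0] == 0:
        return []
    step = len(elements) // shape[0]
    return [reshape(elements[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def array_to_json(nested):
    """Serialize a nested list as an object with shape and elements fields."""
    return json.dumps({"shape": shape_of(nested), "elements": flatten(nested)})


def array_from_json(text):
    """Inverse of array_to_json."""
    obj = json.loads(text)
    return reshape(obj["elements"], obj["shape"])
```

Storing the shape separately keeps the elements field a flat JSON array regardless of the number of dimensions.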
Removed
- BREAKING: Removed undocumented license_key argument from pw.run and pw.run_all methods. Instead, pw.set_license_key should be used.
v0.17.0
Added
- pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
- Methods pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional argument init_mode, which allows initializing the table before writing.
- pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
- New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of PDFs with tables and images.
- Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
- Automatic naming for input and output connectors has been enhanced.
Changed
- BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
- BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from the pw_ai_answer endpoint.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows optionally returning context documents from the pw_ai_answer endpoint.
- BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
- BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
- pw.io.kafka.write now allows specifying key and headers for JSON and CSV data formats.
- persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
- Changed names of parsers to be more consistent: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.
Fixed
- generate_class method in Schema now correctly renders columns of UnionType and None types.
- A bug in delay in temporal behavior. It was possible to emit a single entry twice in a specific situation.
- pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.
Removed
- BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
- BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
- BREAKING: Removed deprecated column _pw_shard in the result of windowby.
- BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
- BREAKING: Removed deprecated module pw.asynchronous.
- BREAKING: Removed deprecated access to functions from pw.io in pw.
- BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
- BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced with pw.xpacks.llm.parsers.DoclingParser.
- BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.
v0.16.4
Fixed
- Google Drive connector in static mode is now displayed correctly in Jupyter visualizations.