This repository was archived by the owner on Jun 22, 2025. It is now read-only.

Refactor Bulk Insert to Use Memory<object> for Enhanced Flexibility and Efficiency #628

Open
wants to merge 6 commits into base: main

Conversation

@xwwwx xwwwx commented Apr 22, 2025

🔧 Summary
This PR refactors the ClickHouse bulk insert mechanism to leverage Memory<object> and IMemoryOwner<Memory<object>>, replacing the previous object[][]-based implementation.

🧾 Key Changes

  • Batch.Rows

    • Changed from object[][] to IMemoryOwner<Memory<object>>

    • Updated Dispose() to properly release owned memory

  • WriteToServerAsync

    • Added support for IEnumerable<Memory<object>>

    • Existing overloads (object[], DataTable, etc.) now internally convert to Memory<object>

  • IntoBatches

    • Refactored to use MemoryPool<T> instead of ArrayPool<T>

    • Returns IMemoryOwner<T> for improved memory safety

  • Row Serializers (IRowSerializer, RowBinarySerializer, RowBinaryWithDefaultsSerializer)

    • Refactored to use Span<object> instead of object[]

    • Enables zero-allocation serialization where possible
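As a rough sketch, the refactored surface described above might look like the following. The signatures and type names here are illustrative only (`SerializeRow`, `BulkCopySketch`, and the `BinaryWriter` parameter are assumptions), not the PR's exact code:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: serializers consume rows as Span<object>,
// so a slice of a pooled buffer can be written without copying.
public interface IRowSerializer
{
    void SerializeRow(Span<object> row, BinaryWriter writer);
}

public class BulkCopySketch
{
    // Hypothetical overload accepting Memory<object> rows directly;
    // object[] converts implicitly to Memory<object>, so the existing
    // object[]-based overloads can simply forward to this one.
    public Task WriteToServerAsync(
        IEnumerable<Memory<object>> rows,
        CancellationToken token = default)
    {
        foreach (Memory<object> row in rows)
        {
            Span<object> span = row.Span; // a view, not a copy
            // ... serialize span via an IRowSerializer ...
        }
        return Task.CompletedTask;
    }
}
```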

Benefits

  • Improved Flexibility
    Allows upstream systems to pass reusable or pooled Memory<object> buffers directly.

  • Better Memory Management
    Controlled lifetime via IMemoryOwner<T>, reduced GC pressure, and support for memory pooling.

  • Zero Allocation Serialization
    Using Span<object> avoids intermediate heap allocations when writing rows.

  • Backward Compatibility Maintained
    Existing interfaces remain usable; new overloads provide better performance paths.

📌 Motivation
This refactor lays the foundation for more scalable, high-performance bulk insert scenarios. It will enable cleaner integration with upstream systems that use buffer pooling or stream processing architectures.

@xwwwx xwwwx requested a review from DarkWanderer as a code owner April 22, 2025 05:23

codecov bot commented Apr 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.56%. Comparing base (0d4b060) to head (722904a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #628      +/-   ##
==========================================
+ Coverage   82.18%   82.56%   +0.38%     
==========================================
  Files         103      103              
  Lines        2245     2243       -2     
  Branches      340      339       -1     
==========================================
+ Hits         1845     1852       +7     
+ Misses        275      264      -11     
- Partials      125      127       +2     

☔ View full report in Codecov by Sentry.
@DarkWanderer
Owner

Hi,

Thank you for your PR, I appreciate the intent to help with optimization. However, according to benchmarks it doesn't improve performance and in fact makes it slightly worse:

With ArrayPool:

| Method | Count | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| BulkInsertInt32 | 100000 | 32.02 ms | 0.503 ms | 0.618 ms | 812.5000 | 812.5000 | 812.5000 | 8.14 MB |

With MemoryPool:

| Method | Count | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| BulkInsertInt32 | 100000 | 33.02 ms | 0.636 ms | 1.285 ms | 906.2500 | 906.2500 | 906.2500 | 8.27 MB |

The primary cause for that seems to be that MemoryPool is actually backed by ArrayPool internally: https://stackoverflow.com/a/61859719/1732138
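A small illustration of that point, assuming standard `System.Buffers` behavior: `MemoryPool<T>.Shared` rents its memory from `ArrayPool<T>.Shared` and additionally hands back an `IMemoryOwner<T>` wrapper per rent, so it cannot out-allocate the array pool it wraps:

```csharp
using System;
using System.Buffers;

// Renting straight from ArrayPool: no per-rent wrapper object.
object[] array = ArrayPool<object>.Shared.Rent(1024);
try
{
    // ... fill and use the array ...
}
finally
{
    ArrayPool<object>.Shared.Return(array);
}

// Renting from MemoryPool.Shared: the returned IMemoryOwner<object>
// is itself a small heap allocation wrapping a pooled array, and
// disposing it returns the array to the underlying ArrayPool.
using (IMemoryOwner<object> owner = MemoryPool<object>.Shared.Rent(1024))
{
    Memory<object> memory = owner.Memory; // view over the pooled array
}
```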

But a good attempt nevertheless

@xwwwx
Author

xwwwx commented Apr 23, 2025

Thank you for the feedback!

The main motivation behind this change is not to improve performance per se, but to improve flexibility for upstream callers.

In our use case, we use a pooled object[] array and slice it into variable-sized segments for bulk insert. Since the segment size isn't fixed, using Memory allows us to safely represent those slices without allocating new arrays. This helps us reuse memory buffers efficiently across multiple insert operations.
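For illustration, that slicing pattern looks roughly like this (buffer and slice sizes are made up):

```csharp
using System;
using System.Buffers;

// One pooled buffer, sliced into variable-sized batches; no new
// arrays are allocated for the slices themselves.
object[] buffer = ArrayPool<object>.Shared.Rent(4096);
try
{
    Memory<object> whole = buffer;                 // implicit conversion
    Memory<object> batchA = whole.Slice(0, 100);   // first 100 cells
    Memory<object> batchB = whole.Slice(100, 250); // next 250 cells
    // Both slices are views over the same pooled array and can be
    // handed to a Memory<object>-accepting bulk insert overload.
}
finally
{
    ArrayPool<object>.Shared.Return(buffer);
}
```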

I agree that Memory may introduce some overhead compared to object[][], and it's definitely not zero-allocation. However, for us the tradeoff in allocation vs. reusability and composability is worthwhile.

That said, I’m open to suggestions on how we can better support this kind of usage while minimizing the impact on performance and the library’s core design.

@xwwwx
Author

xwwwx commented Apr 23, 2025

I've reverted to ArrayPool as suggested.
Please check whether this restores the original performance. 🙇‍♂️
