Refactor Bulk Insert to Use Memory<object> for Enhanced Flexibility and Efficiency #628
base: main
Conversation
**Codecov Report:** All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##              main     #628     +/-  ##
==========================================
+ Coverage    82.18%   82.56%   +0.38%
==========================================
  Files          103      103
  Lines         2245     2243       -2
  Branches       340      339       -1
==========================================
+ Hits          1845     1852       +7
+ Misses         275      264      -11
- Partials       125      127       +2
```
Hi, thank you for your PR. I appreciate the intent to help with optimization. However, according to the benchmark it doesn't quite improve performance and in fact makes it slightly worse:

With …

With …

The primary cause for that seems to be that … But a good attempt nevertheless.
Thank you for the feedback! The main motivation behind this change is not to improve performance per se, but to improve flexibility for upstream callers.

In our use case, we use a pooled `object[]` array and slice it into variable-sized segments for bulk insert. Since the segment size isn't fixed, using `Memory<object>` allows us to safely represent those slices without allocating new arrays. This helps us reuse memory buffers efficiently across multiple insert operations.

I agree that `Memory<object>` may introduce some overhead compared to `object[][]`, and it's definitely not zero-allocation. However, for us the tradeoff of allocation vs. reusability and composability is worthwhile. That said, I'm open to suggestions on how we can better support this kind of usage while minimizing the impact on performance and the library's core design.
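A minimal sketch of the pattern described above, assuming a caller-side pooled buffer; `FillRows` and the commented-out `WriteToServerAsync` call are hypothetical placeholders, not part of the library:

```csharp
using System;
using System.Buffers;

// Rent one large buffer and carve it into variable-sized slices;
// each Memory<object> is just a (array, offset, length) view,
// so no per-segment array is allocated.
object[] buffer = ArrayPool<object>.Shared.Rent(1024);
try
{
    int rowsWritten = FillRows(buffer); // hypothetical: populates the buffer

    int offset = 0;
    while (offset < rowsWritten)
    {
        int segmentLength = Math.Min(128, rowsWritten - offset);
        Memory<object> segment = buffer.AsMemory(offset, segmentLength);
        // await client.WriteToServerAsync(...) with the segment (illustrative)
        offset += segmentLength;
    }
}
finally
{
    ArrayPool<object>.Shared.Return(buffer, clearArray: true);
}
```

The key point is that `AsMemory(offset, length)` produces a safe, bounds-checked view over the rented array, so segments of any size can be handed downstream without copying.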
I've reverted back to `ArrayPool` as suggested.
🔧 Summary

This PR refactors the ClickHouse bulk insert mechanism to leverage `Memory<object>` and `IMemoryOwner<Memory<object>>`, replacing the previous `object[][]`-based implementation.

🧾 Key Changes

- **`Batch.Rows`**
  - Changed from `object[][]` to `IMemoryOwner<Memory<object>>`
  - Updated `Dispose()` to properly release owned memory
- **`WriteToServerAsync`**
  - Added support for `IEnumerable<Memory<object>>`
  - Existing overloads (`object[]`, `DataTable`, etc.) now internally convert to `Memory<object>`
- **`IntoBatches`**
  - Refactored to use `MemoryPool<T>` instead of `ArrayPool<T>`
  - Returns `IMemoryOwner<T>` for improved memory safety
- **Row serializers (`IRowSerializer`, `RowBinarySerializer`, `RowBinaryWithDefaultsSerializer`)**
  - Refactored to use `Span<object>` instead of `object[]`
  - Enables zero-allocation serialization where possible
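The conversion mentioned under `WriteToServerAsync` can be sketched as follows; `AsMemoryRows` is an illustrative helper, not the library's actual internal name. It relies on the built-in implicit conversion from `T[]` to `Memory<T>`, so wrapping an existing row costs no copy and no new array:

```csharp
using System;
using System.Collections.Generic;

static class RowConversion
{
    // Illustrative: wrap existing object[] rows as Memory<object> views.
    public static IEnumerable<Memory<object>> AsMemoryRows(IEnumerable<object[]> rows)
    {
        foreach (object[] row in rows)
            yield return row; // implicit object[] -> Memory<object>, no allocation
    }
}
```

Because the conversion is a view rather than a copy, the old `object[]`-based overloads can funnel into the new `Memory<object>` path without a per-row cost.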
✅ Benefits

- **Improved flexibility**: allows upstream systems to pass reusable or pooled `Memory<object>` buffers directly.
- **Better memory management**: controlled lifetime via `IMemoryOwner<T>`, reduced GC pressure, and support for memory pooling.
- **Zero-allocation serialization**: using `Span<object>` avoids intermediate heap allocations when writing rows.
- **Backward compatibility maintained**: existing interfaces remain usable; new overloads provide better performance paths.
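The "controlled lifetime" point can be illustrated with a minimal sketch of the rent/dispose pattern `IMemoryOwner<T>` enables; this is a simplified stand-in, not the PR's actual `Batch` implementation:

```csharp
using System;
using System.Buffers;

// Simplified illustration of a batch that owns pooled row storage.
sealed class PooledBatch : IDisposable
{
    private readonly IMemoryOwner<Memory<object>> _owner;

    public PooledBatch(int capacity)
    {
        // Rent may return a larger buffer than requested.
        _owner = MemoryPool<Memory<object>>.Shared.Rent(capacity);
    }

    public Memory<Memory<object>> Rows => _owner.Memory;

    // Disposing the owner returns the buffer to the pool,
    // which is what ties batch lifetime to memory lifetime.
    public void Dispose() => _owner.Dispose();
}
```

Tying the buffer's lifetime to `Dispose()` is what lets callers pool memory across inserts instead of leaving each batch's arrays to the GC.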
📌 Motivation
This refactor lays the foundation for more scalable, high-performance bulk insert scenarios. It will enable cleaner integration with upstream systems that use buffer pooling or stream processing architectures.