-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Reduce wasted memory in ChunkedSliceOutput #23707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce wasted memory in ChunkedSliceOutput #23707
Conversation
530f9ee
to
2c1426b
Compare
Please update the PR Release Notes. |
2c1426b
to
b4985dc
Compare
Thanks @sdruzkin So for those cases, since the typical bytes per output stream is small (but not zero), the writer performance impact is minimal but the memory saving is big and set the initial size to 0 won't reduce memory. I have a few other changes for setting the config in Spark and propagate it to presto-orc, not sure if I should share it here. I plan to do more Spark shadow testing to measure the performance impact with different chunk size. |
buffer = new byte[currentSize]; | ||
currentSize = min(multiplyExact(currentSize, 2), maxChunkSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this unnecessary change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous behaviour is double the currentSize before allocation which mean even through the initial size was configured at 4kb, the actual initial allocation is at 8kb.
I think this is a bug, and this PR double the currentSize after allocation so the initial allocation is based on minOutputBufferChunkSize
@@ -53,10 +53,9 @@ public class OrcOutputBuffer | |||
private static final int INSTANCE_SIZE = ClassLayout.parseClass(OrcOutputBuffer.class).instanceSize(); | |||
private static final int PAGE_HEADER_SIZE = 3; // ORC spec 3 byte header | |||
private static final int INITIAL_BUFFER_SIZE = 256; | |||
private static final int MINIMUM_OUTPUT_BUFFER_CHUNK_SIZE = 4 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. You are claiming that the old min chunk size was 8kb, but here in the Orc compression output slice it's 4kb. Can you please clarify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more context in Summary and the comment on ChunkedSliceOutput (line 374).
The short answer is previously the actual initial allocation (8kB) is 2x of the initial size(4kB) due to a bug.
LGTM. |
@chenyangfb @sdruzkin Can you please rewrite this release note so that the users can know how toggle this? Also can you please add which operator can befit from this?
|
Hi @jaystarshot, |
You can just rewrite and comment here itself and I will pick that for release notes. |
Description
Currently the minimal buffer size of ChunkedSliceOutput is 8kb, this leads to lots of wasted memory with large number of output buffer, each with a few bytes. For example, with 10k output stream with 10 bytes, the total size of ChunkedSliceOutput is 80MB while the logical size is 100kb.
This PR reduce wasted memory in ChunkedSliceOutput by allowing smaller buffer size in ChunkedSliceOutput. The default value is still 8kb.
This PR also fixed a bug in ChunkedSliceOutput (line 374) which leads to actual initial allocation (8kB) been 2x of the initial size(4kB).
Impact
Improve memory usage and reduce wasted memory
Test Plan
Tested with Spark workload writing flat map column.
Control use 8kB chunk buffer size while test use 1kB chunk buffer.
Total memory usage from ChunkedSliceOutput reduced from 2GB to 700MB and memory usage per object reduced from ~8kB to ~2kB.
Before


After
Contributor checklist
Release Notes
General change
Allow setting chunk size in ChunkedSliceOutput, it can reduce memory in the cases of large number of small output buffer