Skip to content

Conversation

chenyangfb
Copy link
Contributor

@chenyangfb chenyangfb commented Sep 24, 2024

Description

Currently the minimal buffer size of ChunkedSliceOutput is 8kb, this leads to lots of wasted memory with large number of output buffer, each with a few bytes. For example, with 10k output stream with 10 bytes, the total size of ChunkedSliceOutput is 80MB while the logical size is 100kb.

This PR reduce wasted memory in ChunkedSliceOutput by allowing smaller buffer size in ChunkedSliceOutput. The default value is still 8kb.
This PR also fixed a bug in ChunkedSliceOutput (line 374) which leads to actual initial allocation (8kB) been 2x of the initial size(4kB).

Impact

Improve memory usage and reduce wasted memory

Test Plan

Tested with Spark workload writing flat map column.

Control use 8kB chunk buffer size while test use 1kB chunk buffer.
Total memory usage from ChunkedSliceOutput reduced from 2GB to 700MB and memory usage per object reduced from ~8kB to ~2kB.

Before
Screenshot 2024-09-24 at 2 40 09 PM
After
Screenshot 2024-09-24 at 2 40 24 PM

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

General change
Allow setting chunk size in ChunkedSliceOutput, it can reduce memory in the cases of large number of small output buffer

@chenyangfb chenyangfb force-pushed the orc_output_buffer_chunk_size branch 2 times, most recently from 530f9ee to 2c1426b Compare September 24, 2024 00:54
@chenyangfb chenyangfb marked this pull request as ready for review September 24, 2024 21:47
@chenyangfb chenyangfb requested review from sdruzkin and a team as code owners September 24, 2024 21:47
@sdruzkin
Copy link
Collaborator

sdruzkin commented Sep 25, 2024

  1. How does it impact writer performance? Smaller initial chunks will result in additional 2-3 mem copy operations.
  2. The change you made will result in smaller initial size, but what would happen after a few batches are fed into the writer? In my understanding there will be no impact on reducing the memory consumption, except the perf regression because of additional mem copy.
  3. Won't it make more sense to set the initial size to 0, but then grow to 4k at the first write?

Please update the PR Release Notes.

@chenyangfb chenyangfb force-pushed the orc_output_buffer_chunk_size branch from 2c1426b to b4985dc Compare September 25, 2024 06:11
@chenyangfb
Copy link
Contributor Author

chenyangfb commented Sep 25, 2024

Thanks @sdruzkin
This diff added support for setting chunk size, but the default size is the same as before (8kB).
Smaller initial chunks such as 1kB is aimed for cases with large number of output stream such as flat map with 10k output stream and typical bytes per output stream range from 10 bytes to a few kB.

So for those cases, since the typical bytes per output stream is small (but not zero), the writer performance impact is minimal but the memory saving is big and set the initial size to 0 won't reduce memory.

I have a few other changes for setting the config in Spark and propagate it to presto-orc, not sure if I should share it here.

I plan to do more Spark shadow testing to measure the performance impact with different chunk size.

buffer = new byte[currentSize];
currentSize = min(multiplyExact(currentSize, 2), maxChunkSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this unnecessary change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous behaviour is double the currentSize before allocation which mean even through the initial size was configured at 4kb, the actual initial allocation is at 8kb.
I think this is a bug, and this PR double the currentSize after allocation so the initial allocation is based on minOutputBufferChunkSize

@@ -53,10 +53,9 @@ public class OrcOutputBuffer
private static final int INSTANCE_SIZE = ClassLayout.parseClass(OrcOutputBuffer.class).instanceSize();
private static final int PAGE_HEADER_SIZE = 3; // ORC spec 3 byte header
private static final int INITIAL_BUFFER_SIZE = 256;
private static final int MINIMUM_OUTPUT_BUFFER_CHUNK_SIZE = 4 * 1024;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. You are claiming that the old min chunk size was 8kb, but here in the Orc compression output slice it's 4kb. Can you please clarify this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more context in Summary and the comment on ChunkedSliceOutput (line 374).
The short answer is previously the actual initial allocation (8kB) is 2x of the initial size(4kB) due to a bug.

@sdruzkin
Copy link
Collaborator

LGTM.
Can you do a perf testing on a 200-400mb file with flat maps before and after?
You can get some real data like this P1614619553

@sdruzkin sdruzkin merged commit ffb7643 into prestodb:master Sep 27, 2024
56 checks passed
@jaystarshot jaystarshot mentioned this pull request Nov 1, 2024
25 tasks
@jaystarshot
Copy link
Member

jaystarshot commented Nov 8, 2024

@chenyangfb @sdruzkin Can you please rewrite this release note so that the users can know how toggle this? Also can you please add which operator can befit from this?

* Add configurable chunk size in ChunkedSliceOutput, reducing total memory usage by nearly 3x and per-object memory by 4x with a 1kB chunk size (default 8kB).

@chenyangfb
Copy link
Contributor Author

Hi @jaystarshot,
sure. I can rewrite it. How do I do that? Any instructions I can take look?

@jaystarshot
Copy link
Member

You can just rewrite and comment here itself and I will pick that for release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants