Commit a869f6d

Merge branch 'develop' into 2652-gpu-data-compaction
2 parents 7677b34 + fde0811 commit a869f6d

336 files changed: +5403 -2770 lines changed

.github/workflows/chunk.yaml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -24,9 +24,9 @@ jobs:
2424
with:
2525
java-version: '17'
2626
distribution: 'corretto'
27-
- uses: dtolnay/rust-toolchain@1.79.0
27+
- uses: dtolnay/rust-toolchain@stable
2828
if: ${{ ! inputs.skipRust }}
29-
- uses: mozilla-actions/sccache-action@v0.0.5
29+
- uses: mozilla-actions/sccache-action@v0.0.7
3030
if: ${{ ! inputs.skipRust }}
3131
- name: Install cargo cross
3232
run: cargo install cross

.github/workflows/java-status.yaml

Lines changed: 8 additions & 0 deletions
@@ -37,6 +37,14 @@ jobs:
3737
mvn compile exec:java -q -e -Dexec.mainClass=sleeper.build.chunks.ValidateProjectChunks \
3838
-Dmaven.repo.local=${{ runner.temp }}/.m2/repository \
3939
-Dexec.args="$CHUNKS_YAML $MAVEN_PROJECT"
40+
- name: Check notices file
41+
working-directory: ./java/build
42+
run: |
43+
NOTICES_FILE=${{ github.workspace }}/NOTICES
44+
MAVEN_PROJECT=${{ github.workspace }}/java
45+
mvn compile exec:java -q -e -Dexec.mainClass=sleeper.build.notices.CheckNotices \
46+
-Dmaven.repo.local=${{ runner.temp }}/.m2/repository \
47+
-Dexec.args="$NOTICES_FILE $MAVEN_PROJECT"
4048
- name: Validate properties templates are up to date
4149
working-directory: ./java
4250
run: |
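
The new "Check notices file" step passes the NOTICES file and the Maven project to `sleeper.build.notices.CheckNotices`, whose source is not part of this excerpt. As a rough illustration only, the sketch below shows one way such a check could work: extract the `group:artifact:version` wildcard patterns from NOTICES-style lines and report dependencies that no pattern covers. The function names, the parsing rules, and the example coordinates are all assumptions, not Sleeper's implementation.

```python
import re
from fnmatch import fnmatchcase

# Matches the last parenthesised group on a line, e.g. "(com.amazonaws:aws-java-sdk-*:1.*)".
LAST_PARENS = re.compile(r"\(([^()]*)\)\s*$")


def parse_patterns(notices_text):
    """Collect group:artifact:version patterns from NOTICES-style text (hypothetical parser)."""
    patterns = []
    for line in notices_text.splitlines():
        match = LAST_PARENS.search(line.strip())
        if not match:
            continue
        for part in match.group(1).split(","):
            # Keep only the first token, dropping remarks such as "- Trino needs the vintage engine".
            candidate = part.strip().split(" ")[0]
            # A Maven coordinate pattern here has exactly two colons; licence URLs have one.
            if candidate.count(":") == 2:
                patterns.append(candidate)
    return patterns


def missing_dependencies(dependencies, patterns):
    """Return the dependencies that are not covered by any wildcard pattern."""
    return [dep for dep in dependencies
            if not any(fnmatchcase(dep, pattern) for pattern in patterns)]
```

With the patterns from the NOTICES diff below, a coordinate such as `org.apache.spark:spark-sql_2.12:3.5.1` (an illustrative version) would be covered by `org.apache.spark:spark-*_2.*:3.*`, while an unlisted coordinate would be reported.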

.github/workflows/rust-audit.yaml

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@ jobs:
1919

2020
steps:
2121
- uses: actions/checkout@v3
22-
- uses: dtolnay/rust-toolchain@1.79.0
23-
- uses: mozilla-actions/sccache-action@v0.0.5
22+
- uses: dtolnay/rust-toolchain@stable
23+
- uses: mozilla-actions/sccache-action@v0.0.7
2424
- name: Install cargo audit
2525
run: cargo install cargo-audit
2626
- name: Audit

.github/workflows/rust-cache.yaml

Lines changed: 2 additions & 2 deletions
@@ -18,8 +18,8 @@ jobs:
1818

1919
steps:
2020
- uses: actions/checkout@v3
21-
- uses: dtolnay/rust-toolchain@1.79.0
22-
- uses: mozilla-actions/sccache-action@v0.0.5
21+
- uses: dtolnay/rust-toolchain@stable
22+
- uses: mozilla-actions/sccache-action@v0.0.7
2323
- name: Install cargo cross
2424
run: cargo install cross
2525
- name: Build x86_64

.github/workflows/rust-lint.yaml

Lines changed: 2 additions & 2 deletions
@@ -15,10 +15,10 @@ jobs:
1515

1616
steps:
1717
- uses: actions/checkout@v3
18-
- uses: dtolnay/rust-toolchain@1.79.0
18+
- uses: dtolnay/rust-toolchain@stable
1919
with:
2020
components: clippy,rustfmt
21-
- uses: mozilla-actions/sccache-action@v0.0.5
21+
- uses: mozilla-actions/sccache-action@v0.0.7
2222
- name: Check formatting
2323
working-directory: ./rust
2424
run: cargo fmt --all -- --check

.github/workflows/rust-tests.yaml

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@ jobs:
1515

1616
steps:
1717
- uses: actions/checkout@v3
18-
- uses: dtolnay/rust-toolchain@1.79.0
19-
- uses: mozilla-actions/sccache-action@v0.0.5
18+
- uses: dtolnay/rust-toolchain@stable
19+
- uses: mozilla-actions/sccache-action@v0.0.7
2020
- name: Rust tests
2121
run: cargo test
2222
working-directory: ./rust

CHANGELOG.md

Lines changed: 48 additions & 0 deletions
@@ -4,6 +4,54 @@ Releases
44
This page documents the releases of Sleeper. Performance figures for each release
55
are available [here](docs/13-system-tests.md#performance-benchmarks)
66

7+
## Version 0.27.0
8+
9+
*Note: this release contains breaking changes. It is not possible to upgrade from a previous version of Sleeper
10+
to version 0.27.0*
11+
12+
This includes batching to allow for much larger numbers of compaction jobs.
13+
14+
Upgrades:
15+
- Upgraded Apache DataFusion to 43.0.0
16+
17+
Compaction:
18+
- Parallelised sending compaction jobs in batches across instances of a new lambda
19+
- Creation of batches is now separate from creation of individual jobs to be run on tasks
20+
- Can now create over 100,000 compaction jobs per scheduled invocation
21+
- Reduced duplication of unnecessary compaction runs by pre-validating jobs before sending
22+
- Added configuration of a limit per Sleeper table for the number of compaction jobs created per scheduled invocation
23+
- Improved estimates when scaling EC2 instances to run compaction tasks
24+
- Determines available CPU and memory based on EC2 instance type
25+
- Added a configurable overhead to avoid slowdown due to overprovisioning
26+
27+
Deployment:
28+
- Improved handling of cases where AWS account concurrency limit may be approached
29+
- Defaulted most lambdas to a maximum concurrency of 10 instances
30+
- Defaulted reserved concurrency for state store committer lambda to 10 instances
31+
- Increased default memory requirement for lambdas that work with Sleeper table state
32+
33+
Reporting:
34+
- Compaction jobs only show as created once they've been sent from a batch
35+
36+
Documentation:
37+
- Reorganised documentation of scripts & clients
38+
- Documented issue with compaction on LocalStack
39+
40+
Build:
41+
- Automated checking Maven dependencies are included in the NOTICES file
42+
43+
System tests:
44+
- Removed DynamoDB state store implementation from nightly system test suite
45+
- Some improvements to test isolation and preparation for concurrent execution
46+
- Enabled some scheduled rules in system tests that are normally enabled in a real system
47+
48+
Bugfixes:
49+
- Fixed connecting to an existing deployment with `sleeper environment add`, removed use of old CDK output format
50+
- Fixed deployment with reporting status stores disabled
51+
- Fixed compaction with DataFusion generating invalid sketches for integer and long type row keys
52+
- Fixed bulk import on EKS incorrectly using Netty Arrow implementation
53+
54+
755
## Version 0.26.0
856

957
This includes upgrades to Java, EMR, Spark and the AWS SDK, and improvements to the deployment process.
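
The 0.27.0 entry above describes creating batches separately from individual compaction jobs, and a configurable per-table limit on the number of jobs created per scheduled invocation. The sketch below illustrates that general idea only: cap the candidate jobs at a limit, then split what remains into fixed-size batches that could be sent in parallel. `limit_and_batch`, `job_limit` and `batch_size` are hypothetical names, and how Sleeper actually chooses which jobs to keep is not shown in this diff.

```python
from itertools import islice


def limit_and_batch(jobs, job_limit, batch_size):
    """Keep at most job_limit jobs, then split them into batches of batch_size.

    Hypothetical helper: it simply keeps the first job_limit jobs in iteration order.
    """
    if job_limit < 0 or batch_size < 1:
        raise ValueError("job_limit must be >= 0 and batch_size must be >= 1")
    limited = list(islice(jobs, job_limit))
    return [limited[start:start + batch_size]
            for start in range(0, len(limited), batch_size)]
```

Separating the limit from the batching keeps each step easy to test on its own; each resulting batch could then be handed to a separate worker.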

NOTICES

Lines changed: 45 additions & 31 deletions
@@ -1,10 +1,6 @@
11
Sleeper is mainly written in Java and this is built using Maven. This process will automatically
22
pull in dependencies. We list Sleeper's dependencies, and their licenses below.
33

4-
AWS Athena SDK (com.amazonaws:aws-athena-federation-sdk:*)
5-
6-
- Apache License, Version 2.0
7-
84
AWS Java SDK (com.amazonaws:aws-java-sdk-*:1.*)
95

106
- Apache License, Version 2.0
@@ -25,6 +21,14 @@ AWS Lambda Java Events SDK (com.amazonaws:aws-lambda-java-events:3.*)
2521

2622
- Apache License, Version 2.0
2723

24+
AWS Athena SDK (com.amazonaws:aws-athena-federation-sdk:2023.*)
25+
26+
- Apache License, Version 2.0
27+
28+
Trino (io.trino:trino-*:390)
29+
30+
- Apache License, Version 2.0
31+
2832
WildFly OpenSSL (org.wildfly.openssl:wildfly-openssl:2.*)
2933

3034
- Apache License, Version 2.0
@@ -41,7 +45,7 @@ Snappy Java (org.xerial.snappy:snappy-java:1.*)
4145

4246
- Apache License, Version 2.0
4347

44-
Aircompressor (io.airlift:aircompressor:0.*)
48+
Aircompressor (io.airlift:aircompressor:2.*)
4549

4650
- Apache License, Version 2.0
4751

@@ -73,7 +77,7 @@ XZ (org.tukaani:xz:1.*)
7377

7478
- BSD Zero Clause License
7579

76-
OkHttp (com.squareup.okhttp3:okhttp:4.*)
80+
OkHttp (com.squareup.okhttp3:*:4.*)
7781

7882
- Apache License, Version 2.0
7983

@@ -85,21 +89,21 @@ Kotlin Stdlib (org.jetbrains.kotlin:*:2.*)
8589

8690
- Apache License, Version 2.0
8791

88-
Logback (ch.qos.logback:logback-core:1.*)
92+
Logback (ch.qos.logback:logback-*:1.*)
8993

9094
- Dual license (<https://logback.qos.ch/license.html>)
9195
- Eclipse Public License 1.0
9296
- GNU Lesser General Public License, version 2.1
9397

94-
Facebook Jcommon collections (com.facebook.jcommon:collections::0.*)
98+
Facebook Jcommon collections (com.facebook.jcommon:collections:0.*)
9599

96100
- Apache License, Version 2.0
97101

98102
Guava: Google Core Libraries For Java (com.google.guava:guava:33.*)
99103

100104
- Apache License, Version 2.0
101105

102-
Fasterxml Jackson (com.fasterxml.jackson.core:*:2.*, com.fasterxml.jackson.module:*:2.*)
106+
Fasterxml Jackson (com.fasterxml.jackson.*:jackson-*:2.*)
103107

104108
- Apache License, Version 2.0
105109

@@ -135,7 +139,11 @@ Apache Commons Net (commons-net:commons-net:3.*)
135139

136140
- Apache License, Version 2.0
137141

138-
Apache Commons Lang (org.apache.commons:commons-lang3:3.*)
142+
Apache Commons Lang (org.apache.commons:commons-lang3:3.*, commons-lang:commons-lang:2.*)
143+
144+
- Apache License, Version 2.0
145+
146+
Apache Commons FileUpload (commons-fileupload:commons-fileupload:1.*)
139147

140148
- Apache License, Version 2.0
141149

@@ -147,15 +155,15 @@ Apache Datasketches (org.apache.datasketches:datasketches-java:3.*)
147155

148156
- Apache License, Version 2.0
149157

150-
Apache Hadoop (org.apache.hadoop:hadoop-aws:3.*, org.apache.hadoop:hadoop-client:3.*)
158+
Apache Hadoop (org.apache.hadoop:hadoop-*:3.*)
151159

152160
- Apache License, Version 2.0
153161

154-
Apache Parquet (org.apache.parquet:parquet-hadoop:1.*)
162+
Apache Parquet (org.apache.parquet:parquet-*:1.*)
155163

156164
- Apache License, Version 2.0
157165

158-
Apache Spark (org.apache.spark:spark-sql_2.12:3.*)
166+
Apache Spark (org.apache.spark:spark-*_2.*:3.*)
159167

160168
- Apache License, Version 2.0
161169

@@ -175,7 +183,7 @@ Java Websocket (org.java-websocket:Java-WebSocket:1.*)
175183

176184
- MIT License
177185

178-
JUnit (org.junit.jupiter:junit-jupiter-*:5.*, org.junit.platform:junit-platform-suite:1.*)
186+
JUnit (org.junit.jupiter:junit-jupiter-*:5.*, org.junit.platform:junit-platform-suite:1.*, org.junit.vintage:junit-vintage-engine:5.* - Trino needs the vintage engine)
179187

180188
- Eclipse Public License 1.0
181189

@@ -191,7 +199,7 @@ AssertJ (org.assertj:assertj-core:3.*)
191199

192200
- Apache License, Version 2.0
193201

194-
JsonUnit (net.javacrumbs.json-unit:json-unit-assertj:3.*)
202+
JsonUnit (net.javacrumbs.json-unit:json-unit-assertj:4.*)
195203

196204
- Apache License, Version 2.0
197205

@@ -227,7 +235,7 @@ AWS Lambda Powertools (software.amazon.lambda:powertools-metrics:1.*)
227235

228236
- MIT License (<https://github.com/awslabs/aws-lambda-powertools-java/blob/master/LICENSE>)
229237

230-
Jersey Client (org.glassfish.jersey.core:jersey-client:2.*)
238+
Jersey Client (org.glassfish.jersey.core:jersey-client:2.*, org.glassfish.jersey.inject:jersey-hk2:2.*, org.glassfish.jersey.media:jersey-media-json-jackson:2.*)
231239

232240
- Eclipse Public License 2.0 (<https://projects.eclipse.org/projects/ee4j.jersey>)
233241

@@ -240,16 +248,12 @@ Netty (io.netty:*:4.*)
240248

241249
- Apache License, Version 2.0
242250

243-
Eclipse Jetty (org.eclipse.jetty:*:9.*)
251+
Eclipse Jetty (org.eclipse.jetty:*:9.*, org.eclipse.jetty.http2:*:9.*, org.eclipse.jetty.websocket:*:9.*)
244252

245253
- Dual license (<https://eclipse.dev/jetty/licenses.php>)
246254
- Apache License, Version 2.0
247255
- Eclipse Public License 1.0
248256

249-
Jettison (org.codehaus.jettison:jettison:1.*)
250-
251-
- Apache License, Version 2.0
252-
253257
JSON Path (com.jayway.jsonpath:json-path:2.*)
254258

255259
- Apache License, Version 2.0
@@ -270,11 +274,11 @@ JJWT (io.jsonwebtoken:*:0.*)
270274

271275
- Apache License, Version 2.0
272276

273-
Jungrapht Visualization (jungrapht) (com.github.tomnelson:jungrapht-visualization:1.*)
277+
Jungrapht Visualization (jungrapht) (com.github.tomnelson:jungrapht-*:1.*)
274278

275279
- The 3-Clause BSD License
276280

277-
JGraphT (org.jgrapht:*:1.*)
281+
JGraphT (org.jgrapht:*:1.*, org.jgrapht:jgrapht-core:0.* - Trino uses an older version)
278282

279283
- May be used under either of two licenses (<https://github.com/jgrapht/jgrapht>)
280284
- GNU Lesser General Public License (LGPL) 2.1
@@ -304,6 +308,18 @@ Dependency Check Maven Plugin (org.owasp:dependency-check-maven:11.*)
304308

305309
- Apache License, Version 2.0
306310

311+
SpotBugs Maven Plugin (com.github.spotbugs:spotbugs-*:4.*)
312+
313+
- GNU Lesser General Public License, version 3
314+
315+
Checkstyle (com.puppycrawl.tools:checkstyle:10.*)
316+
317+
- GNU Lesser General Public License, version 2.1
318+
319+
Maven Plugins (org.apache.maven.plugins:maven-*-plugin:3.*)
320+
321+
- Apache License, Version 2.0
322+
307323
The file java/sketches/src/main/java/sleeper/sketches/SketchSerialiser.java contains code that is heavily based
308324
on ArrayOfStringsSerDe from the Apache DataSketches library, licensed under the Apache License, Version 2.0.
309325

@@ -321,8 +337,6 @@ s3fs:
321337

322338
- The 3-Clause BSD License
323339

324-
325-
326340
Sleeper contains Rust code. This has the following dependencies.
327341

328342
Rust Object Store (<https://github.com/apache/arrow-rs/tree/master/object_store>)
@@ -457,26 +471,26 @@ jwgmeligmeyling/spotbugs-github-action (<https://github.com/marketplace/actions/
457471

458472
- MIT License
459473

460-
dtolnay/rust-toolchain (https://github.com/marketplace/actions/rustup-toolchain-install)
474+
dtolnay/rust-toolchain (<https://github.com/marketplace/actions/rustup-toolchain-install>)
461475

462476
- MIT License
463477

464-
mozilla-actions/sccache-action (https://github.com/marketplace/actions/sccache-action)
478+
mozilla-actions/sccache-action (<https://github.com/marketplace/actions/sccache-action>)
465479

466480
- Apache License, Version 2.0
467481

468-
docker/setup-qemu-action (https://github.com/marketplace/actions/docker-setup-qemu)
482+
docker/setup-qemu-action (<https://github.com/marketplace/actions/docker-setup-qemu>)
469483

470484
- Apache License, Version 2.0
471485

472-
docker/setup-buildx-action (https://github.com/marketplace/actions/docker-setup-buildx)
486+
docker/setup-buildx-action (<https://github.com/marketplace/actions/docker-setup-buildx>)
473487

474488
- Apache License, Version 2.0
475489

476-
docker/login-action (https://github.com/marketplace/actions/docker-login)
490+
docker/login-action (<https://github.com/marketplace/actions/docker-login>)
477491

478492
- Apache License, Version 2.0
479493

480-
docker/build-push-action (https://github.com/marketplace/actions/build-and-push-docker-images)
494+
docker/build-push-action (<https://github.com/marketplace/actions/build-and-push-docker-images>)
481495

482496
- Apache License, Version 2.0

docs/15-system-tests.md

Lines changed: 19 additions & 18 deletions
@@ -203,21 +203,22 @@ means that it is hard to produce reproducible figures. In future work we hope to
203203
more accurate results. Nevertheless, these tests have caught several significant performance regressions that would
204204
otherwise not have been noticed.
205205

206-
| Version number | Test date | Compaction rate (records/s) | Ingest S3 write rate (records/s) |
207-
|----------------|------------|-----------------------------|----------------------------------|
208-
| 0.11.0 | 13/06/2022 | 366000 | 160000 |
209-
| 0.12.0 | 18/10/2022 | 378000 | 146600 |
210-
| 0.13.0 | 06/01/2023 | 326000 | 144000 |
211-
| 0.14.0 | 20/01/2023 | 349000 | 153000 |
212-
| 0.15.0 | 30/03/2023 | 336000 | 136000 |
213-
| 0.16.0 | 28/04/2023 | 325000 | 137000 |
214-
| 0.17.0 | 09/06/2023 | 308000 | 163000 |
215-
| 0.18.0 | 09/08/2023 | 326000 | 147000 |
216-
| 0.19.0 | 19/09/2023 | 326700 | 143500 |
217-
| 0.20.0 | 20/11/2023 | 318902 | 137402 |
218-
| 0.21.0 | 08/02/2024 | 330460 | 145683 |
219-
| 0.22.0 | 22/03/2024 | 350177 | 155302 |
220-
| 0.23.0 | 22/05/2024 | 273585 | 154574 |
221-
| 0.24.0 | 17/07/2024 | 242175 | 151578 |
222-
| 0.25.0 | 18/09/2024 | 257317 | 144229 |
223-
| 0.26.0 | 11/11/2024 | 311350 | 160701 |
206+
| Version number | Test date  | Java compaction rate (records/s)  | DataFusion compaction rate (records/s) | Ingest S3 write rate (records/s) |
207+
|----------------|------------|-----------------------------------|----------------------------------------|----------------------------------|
208+
| 0.11.0 | 13/06/2022 | 366,000 | | 160,000 |
209+
| 0.12.0 | 18/10/2022 | 378,000 | | 146,600 |
210+
| 0.13.0 | 06/01/2023 | 326,000 | | 144,000 |
211+
| 0.14.0 | 20/01/2023 | 349,000 | | 153,000 |
212+
| 0.15.0 | 30/03/2023 | 336,000 | | 136,000 |
213+
| 0.16.0 | 28/04/2023 | 325,000 | | 137,000 |
214+
| 0.17.0 | 09/06/2023 | 308,000 | | 163,000 |
215+
| 0.18.0 | 09/08/2023 | 326,000 | | 147,000 |
216+
| 0.19.0 | 19/09/2023 | 326,700 | | 143,500 |
217+
| 0.20.0 | 20/11/2023 | 318,902 | | 137,402 |
218+
| 0.21.0 | 08/02/2024 | 330,460 | | 145,683 |
219+
| 0.22.0 | 22/03/2024 | 350,177 | | 155,302 |
220+
| 0.23.0 | 22/05/2024 | 273,585 | | 154,574 |
221+
| 0.24.0 | 17/07/2024 | 242,175 | | 151,578 |
222+
| 0.25.0 | 18/09/2024 | 257,317 | | 144,229 |
223+
| 0.26.0 | 11/11/2024 | 311,350 | | 160,701 |
224+
| 0.27.0 | 12/12/2024 | 248,531 | 1,304,938 | 163,032 |
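
The 0.27.0 row is the first with a DataFusion figure. To compare the two compaction columns, a small helper can parse the comma-formatted cells and treat blank cells as missing. This is a sketch for reading the table only; the function names are hypothetical and it assumes the five-column layout introduced in this diff.

```python
def parse_rate(cell):
    """Convert a cell such as ' 248,531 ' to an int, or None when the cell is blank."""
    text = cell.strip()
    return int(text.replace(",", "")) if text else None


def compaction_rates(row):
    """Return (java_rate, datafusion_rate) from one Markdown row of the five-column table."""
    cells = row.strip().strip("|").split("|")
    return parse_rate(cells[2]), parse_rate(cells[3])


java_rate, datafusion_rate = compaction_rates(
    "| 0.27.0 | 12/12/2024 | 248,531 | 1,304,938 | 163,032 |")
# Ratio of DataFusion to Java compaction throughput for this release.
ratio = datafusion_rate / java_rate
```

For the 0.27.0 figures the ratio comes out at roughly 5.25, though the surrounding text cautions that these benchmark figures are hard to reproduce exactly.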
