[FLINK-37155] [Runtime/Coordination] Implementing FLIP-505 for Flink History Server scalability improvements to decouple local and remote storage #26878

Open: achang52 wants to merge 1 commit into base: master

@achang52 achang52 commented Aug 6, 2025

What is the purpose of the change

Implements FLIP-505 to improve Flink History Server scalability by decoupling the local job archive cache from remote archive storage.

Brief change log

  • Adds two new Flink History Server configuration options: historyserver.archive.cached-retained-jobs and historyserver.archive.num-cached-most-recently-viewed-jobs
  • Decouples the number of stored job archives from the size of the local cache by enabling remote storage
  • Enables fetching a job archive by job ID
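
To illustrate the idea behind the change log above, here is a minimal sketch of a bounded local cache that falls back to a remote store when a job archive is requested by job ID. It is not the code in this PR: the class JobArchiveCacheSketch, its methods, and the use of a plain Function as the "remote store" are all hypothetical, and the sketch uses a single size limit rather than modeling the two new configuration options separately.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the PR's implementation): a bounded local cache of
 * job archives that falls back to a remote store on a cache miss.
 */
final class JobArchiveCacheSketch {

    // Local cache keyed by job ID; the value stands in for the archive contents.
    private final Map<String, String> localCache;

    // Assumed stand-in for the remote archive store: job ID -> archive, if present.
    private final Function<String, Optional<String>> remoteStore;

    JobArchiveCacheSketch(int maxCachedJobs, Function<String, Optional<String>> remoteStore) {
        this.remoteStore = remoteStore;
        // accessOrder = true keeps entries ordered from least to most recently accessed,
        // so removeEldestEntry evicts the least recently viewed job.
        this.localCache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxCachedJobs;
            }
        };
    }

    /** Returns the archive for a job ID, fetching it from the remote store on a miss. */
    Optional<String> getArchive(String jobId) {
        String cached = localCache.get(jobId);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<String> fetched = remoteStore.apply(jobId);
        // Cache the archive locally; this may evict the least recently viewed entry.
        fetched.ifPresent(archive -> localCache.put(jobId, archive));
        return fetched;
    }

    /** Reports whether a job is in the local cache, without changing the access order. */
    boolean isCachedLocally(String jobId) {
        return localCache.containsKey(jobId);
    }
}
```

With a limit of two cached jobs, viewing jobs a, b, a, c in that order would evict b from the local cache, while a later request for b would still succeed by going back to the remote store.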

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change added tests and can be verified as follows:

  • Added new tests to HistoryServerArchiveFetcherTest.java that validate how cached jobs are evicted and how the local and remote caches interact
  • Added tests to HistoryServerTest.java and WebFrontendBootstrapTest to cover local and remote caching behavior
  • Manually verified by deploying the Flink History Server locally with test job archives

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

flinkbot commented Aug 6, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

achang52 marked this pull request as ready for review on August 12, 2025 22:36
achang52 changed the title from "[FLINK-37155] [Runtime/Coordination] Implementing FLIP-505 for Flink History Server scalability improvements" to "[FLINK-37155] [Runtime/Coordination] Implementing FLIP-505 for Flink History Server scalability improvements to decouple local and remote storage" on Aug 14, 2025