KAFKA-19893: Reduce tiered storage redundancy with delayed upload (KIP-1241) #20913

JIRA: KAFKA-19893
KIP: KIP-1241
Currently, Kafka uploads every non-active local log segment to remote
storage, even while that segment is still within the local retention
period. As a result, the same data is stored in both tiers. This wastes
storage capacity (and therefore cost) without any immediate benefit,
because reads within the local retention window are served from local
data first.
However, some users and topics rely on remote storage for real-time
analytics and need the latest data to be available there as soon as
possible. (In practice, remote storage can only stay as up to date as
possible. It never includes the very latest data, because the active
segment has not been uploaded yet.) Therefore, this optimization is
offered as an optional topic-level configuration rather than as the
default behavior.
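The difference between the two upload policies can be sketched as a per-segment eligibility check. This is a minimal, non-authoritative sketch. The `Segment` record, the `delayedUpload` flag, and the rule that a segment under delayed upload is copied only once it has aged past local retention are all hypothetical simplifications. They are not the KIP-1241 configuration name or the logic in Kafka's `RemoteLogManager`.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DelayedUploadSketch {

    // Hypothetical, simplified view of a local log segment (not Kafka's LogSegment API).
    record Segment(long baseOffset, long largestTimestampMs, boolean active) {}

    /**
     * Returns true if the segment should be copied to remote storage now.
     * `delayedUpload` is a hypothetical per-topic flag standing in for the KIP-1241 option.
     */
    static boolean isEligibleForUpload(Segment s, boolean delayedUpload,
                                       long localRetentionMs, long nowMs) {
        if (s.active()) {
            return false; // the active segment is never uploaded, in either mode
        }
        if (!delayedUpload) {
            return true; // default behavior: upload every rolled (non-active) segment
        }
        // Delayed mode (assumed rule): wait until the segment has aged past local
        // retention, i.e. until it would otherwise become eligible for local deletion.
        return nowMs - s.largestTimestampMs() >= localRetentionMs;
    }

    // Base offsets of the segments that would be uploaded under the given policy.
    static List<Long> segmentsToUpload(List<Segment> segments, boolean delayedUpload,
                                       long localRetentionMs, long nowMs) {
        return segments.stream()
                .filter(s -> isEligibleForUpload(s, delayedUpload, localRetentionMs, nowMs))
                .map(Segment::baseOffset)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 48 * hour;
        List<Segment> segments = List.of(
                new Segment(0, now - 30 * hour, false),   // rolled, older than 1 day
                new Segment(1000, now - 2 * hour, false), // rolled, recent
                new Segment(2000, now, true));            // active segment
        long localRetentionMs = 24 * hour;
        System.out.println(segmentsToUpload(segments, false, localRetentionMs, now));
        System.out.println(segmentsToUpload(segments, true, localRetentionMs, now));
    }
}
```

With a 1-day local retention, the default policy uploads both rolled segments. The delayed policy uploads only the segment older than a day. The active segment is skipped in both cases.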
Here are some additional thoughts/considerations.
remote storage, so this change is very safe: you don't need to worry
about files being cleaned up before they have been uploaded to remote storage.
won’t be set too short. For example, in our production environment, we
keep 1 day of local data alongside 3-7 days in remote storage, so
there’s still 1 day of redundancy.
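To show the scale for a 1-day local retention like the one above, the data duplicated across tiers under the default upload policy can be approximated. Per partition, it is roughly the ingest rate multiplied by the local retention, minus the active segment, which has not been uploaded yet. This is a back-of-the-envelope sketch only. The ingest rate is a hypothetical input, and `redundantBytes` is an illustrative helper, not a Kafka API.

```java
public class RedundancyEstimate {

    /**
     * Approximate bytes stored in both tiers for one partition under the default
     * (eager) upload policy: everything inside the local retention window except
     * the not-yet-uploaded active segment.
     */
    static long redundantBytes(long ingestBytesPerSec, long localRetentionSec,
                               long activeSegmentBytes) {
        long localBytes = ingestBytesPerSec * localRetentionSec;
        return Math.max(0L, localBytes - activeSegmentBytes);
    }

    public static void main(String[] args) {
        long ingest = 10L * 1024 * 1024; // hypothetical 10 MiB/s into one partition
        long oneDaySec = 86_400L;        // 1 day of local retention
        long segmentBytes = 1L << 30;    // 1 GiB, Kafka's default segment.bytes
        long dup = redundantBytes(ingest, oneDaySec, segmentBytes);
        System.out.printf("redundant: %.1f GiB%n", dup / (double) (1L << 30));
    }
}
```

Under delayed upload, most of this duplicated window would no longer be copied to remote storage while it is still held locally.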
Example of the goal: