
@RasmusRendal

In the current state, a partition consists of partitions, which is quite confusing. This PR changes the wording so that a partition consists of buckets.

Deterministic token partitioning allows you to use subject-based addressing to deterministically divide (partition) a flow of messages where one or more of the subject tokens is mapped into a partition key. Deterministically means, the same tokens are always mapped into the same key. The mapping will appear random and may not be `fair` for a small number of subjects.
-For example: new customer orders are published on `neworders.<customer id>`, you can partition those messages over 3 partition numbers (buckets), using the `partition(number of partitions, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
+For example: new customer orders are published on `neworders.<customer id>`, you can partition those messages over 3 partition numbers (buckets), using the `partition(number of buckets, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
Member

Hi @RasmusRendal, thanks for the contribution. Now that you have pointed out this discrepancy, I think it would make more sense to drop the term "bucket" altogether and stick with "partition", since "bucket" is redundant.

For example:

Suggested change
-For example: new customer orders are published on `neworders.<customer id>`, you can partition those messages over 3 partition numbers (buckets), using the `partition(number of buckets, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
+For example: new customer orders are published on `neworders.<customer id>`, you can spread those messages over 3 partitions, using the `partition(number of partitions, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.

Does this make sense?

Author

I still think it is important to call the things a partition consists of something other than "partition", if not to make the documentation understandable, then at least to make it easier to talk about NATS. I want to be able to tell a colleague: "We create a partition of our subject, and each bucket/part is handled by a separate consumer".

This is also how people talk about partitions in other contexts: https://en.wikipedia.org/wiki/Partition_of_a_set#Definition_and_notation

The sets in $P$ are called the blocks, parts, or cells, of the partition.
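To make the set-theoretic reading concrete, here is a small Go sketch that groups customer ids into buckets and checks the two properties from the definition: the buckets are pairwise disjoint, and together they cover every id. It assumes the same illustrative FNV-1a hash (not necessarily what nats-server uses). `bucketOf`, `buckets`, and `isPartition` are hypothetical helpers, and the `neworders.*.<n>` filter subjects only illustrate how each bucket could be handed to its own consumer.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// bucketOf is a hypothetical helper that assigns an id to one of n buckets
// using 32-bit FNV-1a (an assumed hash, not necessarily nats-server's).
func bucketOf(id string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(id))
	return h.Sum32() % n
}

// buckets groups ids by bucket number. The non-empty slices are the
// parts (blocks) of a partition of ids in the set-theoretic sense.
func buckets(ids []string, n uint32) [][]string {
	parts := make([][]string, n)
	for _, id := range ids {
		b := bucketOf(id, n)
		parts[b] = append(parts[b], id)
	}
	return parts
}

// isPartition reports whether parts are pairwise disjoint and their union
// is exactly ids (ids are assumed distinct). Empty parts contribute
// nothing, matching the definition's requirement that blocks be non-empty.
func isPartition(ids []string, parts [][]string) bool {
	seen := make(map[string]int)
	for _, part := range parts {
		for _, id := range part {
			seen[id]++
		}
	}
	if len(seen) != len(ids) {
		return false // some element is missing or an unknown one was added
	}
	for _, id := range ids {
		if seen[id] != 1 {
			return false // missing, or present in more than one part
		}
	}
	return true
}

func main() {
	ids := []string{"alice", "bob", "carol", "dave", "erin"}
	parts := buckets(ids, 3)
	for b, part := range parts {
		sort.Strings(part)
		// Illustrative filter subject a per-bucket consumer could use.
		fmt.Printf("bucket %d (filter neworders.*.%d): %v\n", b, b, part)
	}
	fmt.Println("is a partition:", isPartition(ids, parts))
}
```

With only a handful of ids, a bucket can end up empty. In set-theoretic terms it is then simply not one of the parts.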
