This has a number of nice properties:
1. You don’t need to store keys in any special way. Just make them a unique column in your db and the db will detect duplicates for you (and you can handle them however you need, e.g. ignoring the duplicate if the other input fields are the same, or raising an error if a message has the same idempotency key but different fields).
2. You can reliably generate new downstream keys from an incoming key without the need for coordination between consumers, getting an identical output key for a given input key regardless of consumer.
3. In the event of a replayed message it’s fine to republish downstream events, because the system is now deterministic for a given input: you’ll get identical output (including generated messages) for identical input. Duplicate outputs are not an issue either, because downstream consumers will detect and ignore them.
4. This parallelises well because consumers are deterministic and require no coordination beyond the db transaction.
Edit: Just looked it up... looks like this is basically what a UUIDv5 is, just hash(namespace + string) with the namespace acting as a salt.
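For the curious, a minimal sketch of that derivation with Python’s stdlib uuid module (the pipeline namespace and step names here are invented for illustration):

    import uuid

    # Any fixed UUID works as the namespace ("salt"); derive one for this pipeline.
    PIPELINE_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")

    def derive_key(incoming_key: str, step: str) -> uuid.UUID:
        # Deterministically derive a downstream idempotency key from the
        # incoming key plus the processing step. Every consumer running this
        # gets the same output for the same input, with no coordination.
        return uuid.uuid5(PIPELINE_NS, f"{incoming_key}:{step}")

    # Two consumers replaying the same message derive the same key:
    assert derive_key("evt-123", "charge") == derive_key("evt-123", "charge")
    # Different steps (or different inputs) get distinct keys:
    assert derive_key("evt-123", "charge") != derive_key("evt-123", "refund")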
If you try to disambiguate those messages using, say, a timestamp or a unique transaction ID, you’re back where you started: how do you avoid collisions in those fields? Better to have used a random UUIDv4 in the first place.
Customer A can buy N units of product X as many times as they want.
Each unique purchase you process will have its own globally unique id.
Each duplicated source event you process (due to “at least once” guarantees) will generate the same id as the other duplicates, without any need to coordinate between consumers.
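A sketch of how that plays out against a unique column, using sqlite3 for brevity (the table and event names are invented):

    import sqlite3
    import uuid

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE purchases (
            idempotency_key TEXT PRIMARY KEY,  -- unique column: db detects duplicates
            customer TEXT, product TEXT, units INTEGER
        )""")

    NS = uuid.uuid5(uuid.NAMESPACE_DNS, "purchases.example.com")  # invented namespace

    def process(source_event_id, customer, product, units):
        # The same source event always yields the same key, so redeliveries collide.
        key = str(uuid.uuid5(NS, source_event_id))
        # INSERT OR IGNORE drops exact duplicates; fresh events insert normally.
        conn.execute("INSERT OR IGNORE INTO purchases VALUES (?, ?, ?, ?)",
                     (key, customer, product, units))

    process("evt-1", "A", "X", 3)  # first purchase
    process("evt-1", "A", "X", 3)  # redelivery of the same event: ignored
    process("evt-2", "A", "X", 3)  # a genuinely new purchase of the same thing
    assert conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0] == 2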
> Critically, these two things must happen atomically, typically by wrapping them in a database transaction. Either the message gets processed and its idempotency key gets persisted. Or, the transaction gets rolled back and no changes are applied at all.
How do you do that when the processing isn’t persisted to the same database? I.e., what if the side effect happens outside the transaction?
You can’t atomically roll back both the transaction and external side effects.
If you could already use a distributed database transaction, then you wouldn’t need idempotency keys at all. The transaction itself is the guarantee.
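For contrast, the article’s pattern looks roughly like this when the side effects do live in the same database; the moment any effect happens outside it (an HTTP call, an email), the atomicity is gone, which is the gap described above. A hypothetical sketch:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE processed_keys (key TEXT PRIMARY KEY);
        CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER);
        INSERT INTO balances VALUES ('acct-1', 100);""")

    def handle(key, account, delta):
        try:
            with conn:  # one transaction: both statements commit, or neither does
                conn.execute("INSERT INTO processed_keys VALUES (?)", (key,))
                conn.execute("UPDATE balances SET amount = amount + ? WHERE account = ?",
                             (delta, account))
                # An external side effect here (HTTP call, email, ...) would NOT
                # be rolled back with the rest; that is exactly the gap above.
        except sqlite3.IntegrityError:
            pass  # duplicate key: message already processed, change nothing

    handle("msg-1", "acct-1", -30)
    handle("msg-1", "acct-1", -30)  # redelivery: no double debit
    assert conn.execute("SELECT amount FROM balances").fetchone()[0] == 70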
There is also two-phase commit, which is not without its own downsides.
All in all, I think the author is wrong that exactly-once processing is somehow easier to solve than exactly-once delivery; in fact it’s exactly the same problem, just shaped differently. IDs here are secondary.
For example, you can conceive of a software vendor that does the end-to-end of a real estate transaction: escrow, banking, signatures, etc. The IT required to support that model would be staggering. Does it make sense to do that kind of product development? That is inventing all of SAP on top of solving your actual problem. Or making the mistake of adopting Temporal, Trigger, etc., who think they have a smaller problem than building all of SAP and spend considerable resources convincing you that they do.
The status quo is that everyone focuses on their little part, to do it as quickly as possible. The need for durable workflows is BAD. You should look at that problem as: make buying and selling homes much faster and simpler, or even change the order of steps so that less durability is required; don’t re-enact the status quo as an IT-driven workflow.
Why are real-estate transactions complex and full of paperwork? Because there are history books filled with fraud. There are other types of large transactions that also involve a lot of paperwork too, for the same reason.
Why does a company have extensive internal tracing of the progress of its business processes, and those of its customers? Same reason, usually. People want accountability and they want to discourage embezzlement and such things.
"How we've been doing things is wrong and I am going to redesign it in a way that no one else knows about so I don't have to implement the thing that's asked of me"
Businesses that require enterprise sales are probably the worst-performing category of seed investing. They encompass all of ed tech and health tech, which are the two worst industry verticals for VC; and Y Combinator has to focus on an index of B2B services for other programmers because, without that constraint, nearly every “do what you are asked for” business would fail. Most of the IT projects businesses run internally fail!
In fact, I think the idea you are selling is even harder: it is much harder to do B2B enterprise sales than to know whether the thing you are making makes sense and is good.
Which obviously has its own set of tradeoffs.
This illustrates that the webdevs who write articles on “distributed systems” don’t really understand what is already out there. These are all solved problems.
It should be the opposite: with more messages you want to scale with independent consumers, and a monotonic counter is a disaster for that.
You also don’t need to worry about dropping old messages if you implement your processing to respect the commutative property.
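A toy illustration of what commutativity buys you: deltas applied in any order converge to the same state (names invented):

    from itertools import permutations

    def apply(state, delta):
        # Addition is commutative (and associative), so arrival order is irrelevant.
        return state + delta

    deltas = [+5, -3, +10]
    results = set()
    for order in permutations(deltas):
        state = 0
        for d in order:
            state = apply(state, d)
        results.add(state)

    assert results == {12}  # every ordering converges to the same value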
Is there any method for uniqueness testing that works after fan-out?
> You also don’t need to worry about dropping old messages if you implement your processing to respect the commutative property.
Commutative property protects if messages are received out of order. Duplicates require idempotency.
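To make the distinction concrete, a small sketch: a commutative handler (e.g. plain addition) still double-counts a redelivered message; idempotency means remembering which message ids were already applied (identifiers invented):

    def apply_idempotent(state, seen, msg_id, delta):
        # Commutativity handles reordering; this check handles redelivery.
        if msg_id in seen:
            return state  # duplicate: applying it again must change nothing
        seen.add(msg_id)
        return state + delta

    seen = set()
    state = apply_idempotent(0, seen, "m1", 5)
    state = apply_idempotent(state, seen, "m1", 5)  # duplicate delivery
    assert state == 5  # counted once, however many times delivered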
Idempotency means something else to me.