The hunt for the cluster-killer Erlang bug | by Dániel Szoboszlay | Klarna Engineering
https://engineering.klarna.com/the-hunt-for-the-cluster-killer-erlang-bug-81dd0640aa81?gi=e3144c1fcf1b
Saved on 2022-06-15 [19158 edays] via engineering.klarna.com
Modified 2023-08-09 [19578 edays]
erlang kafka postmortem programming
The chain of sad events started with routine maintenance on the Kafka cluster going south. One of the Kafka nodes was terminated kill -9 style, and Kafka failed to quickly elect a new leader for the partitions it handled. This affected fewer than 5% of the partitions, but caused smaller or bigger issues in almost every service at Klarna.