TL;DR: We migrated 10+ microservices from direct HTTP calls to Kafka event-driven communication. Reliability improved massively but the migration was harder than expected. Here are the real lessons including the mistakes. Our system started as a monolith. Then we split it into microservices. The services talked to each other using direct HTTP calls. Service A would POST to Service B which would PO