LinkedIn and Apache Kafka: 7 trillion messages a day

Apache Kafka, an open-source platform for real-time data stream processing, has become an essential tool for companies operating on a large scale.

LinkedIn, which originally developed Kafka, is today one of its main users, handling over 7 trillion messages a day.

To support this volume of data, LinkedIn has built a tailor-made Kafka ecosystem, tackling significant scalability challenges and actively contributing its improvements back to the open-source community.

Kafka was created in 2010 at LinkedIn to manage the real-time data streams generated by its users. The system was so successful that it was open-sourced in 2011 and has since become an industry standard.

Despite its widespread adoption, LinkedIn continues to be one of the most advanced users, with an infrastructure that includes dozens of Kafka clusters, more than 4,000 brokers and 7 million partitions.

This infrastructure must guarantee reliability and high performance.

To do this, LinkedIn has developed a customized version of Kafka, including patches and specific improvements for its own needs. However, many of these innovations are also shared with the open-source community, contributing to the development of the Apache Kafka project.

LinkedIn’s Kafka ecosystem

LinkedIn’s Kafka clusters are designed to handle trillions of messages a day, ensuring that every piece of information is processed in real time.

Internal applications rely on Kafka to record user activities, transmit messages between different services and collect system metrics, allowing the company to continuously analyze and optimize its operations.
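To make the activity-recording idea concrete, here is a minimal Python sketch of how such an event might be keyed and routed to a partition. The topic name, partition count, and field names are hypothetical, and the hash is deliberately simplified: Kafka's default partitioner actually uses murmur2, while crc32 is used here only so the example is self-contained and deterministic.

```python
import json
import zlib

NUM_PARTITIONS = 12  # hypothetical partition count for an activity topic


def make_activity_record(member_id: str, action: str) -> tuple[bytes, bytes]:
    """Build a (key, value) pair for a user-activity event.

    Keying by member_id means all events for one member land on the
    same partition, preserving per-member ordering.
    """
    key = member_id.encode("utf-8")
    value = json.dumps({"member_id": member_id, "action": action}).encode("utf-8")
    return key, value


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Simplified stand-in for Kafka's default partitioner.

    Kafka hashes keys with murmur2; crc32 is used here purely for
    illustration, but the idea is the same: hash modulo partition count.
    """
    return zlib.crc32(key) % num_partitions


key, value = make_activity_record("member-42", "page_view")
print(choose_partition(key))
```

Because the partition is a pure function of the key, every event for the same member is appended to the same partition, which is what gives Kafka per-key ordering at this scale.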

To facilitate communication between the various components of the system, LinkedIn uses tools that improve data integration and management.

For example, a REST proxy allows even non-Java services to interact with Kafka over HTTP, while a schema management system ensures that producers and consumers always agree on the format of the data they exchange.
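The exact API of LinkedIn's internal REST proxy is not public, so as an assumption the sketch below follows the conventions of the widely used open-source Confluent REST Proxy v2: records are wrapped in a `records` envelope and posted to a `/topics/{topic}` URL. The host name is hypothetical, and no request is actually sent; the function only builds the pieces a non-Java client would need.

```python
import json


def build_rest_proxy_request(topic: str, events: list[dict]) -> dict:
    """Assemble an HTTP request (URL, headers, JSON body) for publishing
    events to Kafka through a REST proxy.

    Follows the Confluent REST Proxy v2 envelope format as an assumption;
    rest-proxy.example.com is a hypothetical host.
    """
    return {
        "url": f"http://rest-proxy.example.com/topics/{topic}",
        "headers": {"Content-Type": "application/vnd.kafka.json.v2+json"},
        "body": json.dumps({"records": [{"value": e} for e in events]}),
    }


req = build_rest_proxy_request(
    "page-views", [{"member_id": "member-42", "action": "page_view"}]
)
print(req["body"])
```

Any service that can issue an HTTP POST, whatever its language, can publish this payload, which is precisely the point of fronting Kafka with a REST interface.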

To keep data flowing between its various operational centers, LinkedIn uses cross-datacenter replication (mirroring) technology, which ensures that information is always available and up to date wherever it is needed.
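The core of any such mirroring tool (Kafka MirrorMaker is the best-known open-source example) is a consume-then-produce loop: read every record past the last committed offset from the source cluster and append it, in order, to the target cluster. The toy sketch below uses in-memory lists in place of real Kafka topics so it runs without a broker; real mirroring adds batching, retries, and offset commits.

```python
def mirror(
    source_log: list[tuple[bytes, bytes]],
    target_log: list[tuple[bytes, bytes]],
    committed_offset: int,
) -> int:
    """Toy consume-then-produce mirroring pass.

    Copies every record at or after committed_offset from the source
    log to the target log, preserving order, and returns the new
    committed offset. In-memory lists stand in for Kafka topics.
    """
    for offset in range(committed_offset, len(source_log)):
        target_log.append(source_log[offset])
    return len(source_log)


source = [(b"k1", b"v1"), (b"k2", b"v2")]
target: list[tuple[bytes, bytes]] = []
offset = mirror(source, target, 0)
print(offset)  # all source records are now mirrored
```

Tracking the committed offset is what makes the loop resumable: if the mirroring process restarts, it picks up exactly where it left off instead of re-copying or skipping records.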

Furthermore, to keep the system balanced and prevent overloads, an internal tool, Cruise Control (also open-sourced by LinkedIn), monitors resource usage and automatically redistributes partitions across brokers when necessary.
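The real balancer is a goal-based optimizer that weighs disk, network, CPU, and replica-placement constraints; the sketch below reduces the idea to a single hypothetical per-partition load number and a greedy heuristic: assign each partition, heaviest first, to the currently least-loaded broker. It illustrates the principle of automatic redistribution, not the production algorithm.

```python
import heapq


def rebalance(
    partition_loads: dict[str, float], brokers: list[str]
) -> dict[str, list[str]]:
    """Toy greedy rebalance: heaviest partitions first, each assigned to
    the broker with the smallest load so far.

    A min-heap of (total_load, broker) pairs makes picking the
    least-loaded broker O(log n) per partition.
    """
    heap = [(0.0, b) for b in brokers]
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {b: [] for b in brokers}
    for partition, load in sorted(partition_loads.items(), key=lambda kv: -kv[1]):
        total, broker = heapq.heappop(heap)
        assignment[broker].append(partition)
        heapq.heappush(heap, (total + load, broker))
    return assignment


# Hypothetical per-partition loads (e.g. MB/s of inbound traffic).
loads = {"t0-p0": 5.0, "t0-p1": 3.0, "t1-p0": 2.0, "t1-p1": 2.0}
print(rebalance(loads, ["broker-1", "broker-2"]))
```

Greedy assignment of sorted loads is a classic approximation for this kind of balancing problem; it keeps the heaviest broker within a small factor of the optimum, which is usually enough to prevent the hotspots the paragraph above describes.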

Challenges and innovations for scalability

Managing Kafka on this scale presents numerous challenges.

LinkedIn must address scalability, operability, and maintenance challenges, for which it maintains customized Kafka releases with patches tailored to its own needs.

This approach not only optimizes performance, but also enriches the entire open-source ecosystem.

In addition to internal management, LinkedIn is an active contributor to the Apache Kafka project.

Internally developed features, such as improvements to partition management and automation tools, are shared with the community, allowing other companies to benefit from the innovations introduced.

The use of Apache Kafka by LinkedIn represents an example of how an open-source system can be adapted and scaled to support global operations.

Thanks to a customized ecosystem and careful management of operational challenges, LinkedIn processes enormous amounts of data every day while maintaining high performance, and its contributions help the Kafka platform grow for the entire technology community.