December 09, 2013, 12:42 PM — Internet services and other organizations hoping to plant a newsfeed on their Web sites, or aggregate their log files, could get a hand from a new pub/sub (publish-subscribe) messaging application called Kafka, first developed by professional social networking service LinkedIn. Kafka version 0.8 is the first major release of the middleware since it became an Apache Software Foundation Top Level Project earlier this year, after LinkedIn open-sourced the code. The new release is the first version to be able to work with multiple data directories. It can replicate data within the same cluster, and includes new internal metrics.
While RabbitMQ and other messaging platforms built on AMQP (Advanced Message Queuing Protocol) have been widely used for years, LinkedIn developed Kafka in-house to route a larger volume of messages than those applications typically handle.
Like other pub/sub mechanisms, Kafka can collect messages from multiple contributors and distribute those messages to subscribers of the message feed. Kafka has a distributed architecture, meaning that multiple brokers can be installed across servers to increase the throughput between publishers and subscribers. Observers have noted that Kafka does not offer as many routing capabilities as RabbitMQ, though it offers superior throughput.
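The publish-subscribe pattern described above can be illustrated with a toy, in-memory sketch. This is not Kafka's API: the `ToyBroker` class, its method names, and the topic and subscriber names are all hypothetical, chosen only to show several publishers writing to one feed while each subscriber reads it at its own pace.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub/sub broker (hypothetical, not Kafka's API)."""

    def __init__(self):
        # topic name -> ordered list of every message published so far
        self.logs = defaultdict(list)
        # (subscriber, topic) -> index of the next message to deliver
        self.offsets = {}

    def publish(self, topic, message):
        # Any number of publishers may append to the same topic.
        self.logs[topic].append(message)

    def subscribe(self, subscriber, topic):
        # In this sketch, a new subscriber starts at the beginning of the feed.
        self.offsets[(subscriber, topic)] = 0

    def poll(self, subscriber, topic):
        """Return the messages this subscriber has not yet seen."""
        start = self.offsets[(subscriber, topic)]
        pending = self.logs[topic][start:]
        self.offsets[(subscriber, topic)] = start + len(pending)
        return pending


broker = ToyBroker()
broker.subscribe("dashboard", "page-views")
broker.subscribe("archiver", "page-views")

# Two independent publishers write to the same feed.
broker.publish("page-views", "web-1: /home")
broker.publish("page-views", "web-2: /jobs")

# Each subscriber receives the full feed, independently of the other.
dashboard_batch = broker.poll("dashboard", "page-views")
archiver_batch = broker.poll("archiver", "page-views")
```

A real deployment would spread this work across multiple broker processes on separate servers, which is where the throughput gains come from. The single-process sketch shows only the message flow.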
LinkedIn engineers have found that a Kafka deployment can publish more than 400,000 messages a second, which is two orders of magnitude greater than RabbitMQ's performance and one order of magnitude greater than that of ActiveMQ, a message broker that, like Kafka, is maintained by Apache.
LinkedIn itself uses the software to deliver user newsfeeds, as well as for other duties such as tracking metrics. Internet service Shopify also uses Kafka, and has even released a client to intercept Kafka messages.