Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. It is, in essence, a “massively scalable pub/sub message queue architected as a distributed transaction log”,[2] which makes it highly valuable for enterprise infrastructures.
The design is heavily influenced by transaction logs.[3]
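The idea of a pub/sub queue built on a log can be illustrated with a toy, in-memory sketch. This is not Kafka's API or implementation: the `TinyLog` class and its `append` and `poll` methods are hypothetical names chosen for illustration. It shows only the core idea that records are appended in order, each record is identified by its offset, and every subscriber reads at its own position without removing anything from the log.

```python
from collections import defaultdict


class TinyLog:
    """Toy in-memory stand-in for a log-backed topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []                 # records are only ever appended, never removed
        self._offsets = defaultdict(int)   # next offset to read, tracked per subscriber

    def append(self, record):
        """Publish a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return the next unread records for one subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = TinyLog()
for event in ["page_view", "click", "purchase"]:
    log.append(event)

# Two independent subscribers read the same log at their own pace.
first_batch = log.poll("analytics", max_records=2)
second_batch = log.poll("analytics")
audit_batch = log.poll("audit")
```

Because reading does not delete records, the `audit` subscriber still sees all three events after `analytics` has consumed them. A real distributed log additionally persists and replicates the records across machines, which this sketch does not attempt.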
History
Apache Kafka was originally developed by LinkedIn and was open sourced in early 2011. It graduated from the Apache Incubator on 23 October 2012. In November 2014, several engineers who built Kafka at LinkedIn created a new company named Confluent[4] with a focus on Kafka.
Enterprises that use Kafka
The following is a list of notable enterprises that have used or are using Kafka:
- Cisco Systems[5]
- Daumkakao[6]
- Netflix[7]
- PayPal[8]
- Spotify[9]
- Uber[10]
- Shopify[11]
- Betfair[12]
- Sift Science[13]
- HubSpot[14]
Kafka performance
Because Kafka can scale massively and is widely used in enterprise-level infrastructures, tracking its performance has become increasingly important. Several monitoring platforms currently track Kafka performance, both open-source, such as LinkedIn’s Burrow, and paid, such as Datadog.
The key metrics[15] these platforms track are:
- Kafka server (broker) metrics
- Producer metrics
- Consumer metrics, for both version 0.8.2.2 and versions 0.9.0.0 and later
Kafka is also often used in conjunction with ZooKeeper for deployment management, so ZooKeeper's metrics need to be monitored alongside those of the Kafka cluster.[16]
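As an illustration of the consumer metrics listed above, one commonly tracked value is consumer lag: how far a consumer's committed position trails the newest record in each partition. The following is a minimal sketch of that arithmetic only. The function name `consumer_lag`, the dictionary layout, and the sample offsets are assumptions made for this example, not the output or API of Burrow, Datadog, or Kafka itself.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: newest offset in the log minus the consumer's committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption for this sketch: a partition with no commit yet counts from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Made-up sample values, keyed by partition number.
log_end_offsets = {0: 1500, 1: 1200, 2: 900}
committed_offsets = {0: 1500, 1: 1100}

lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

A lag that keeps growing over time suggests that consumers are not keeping up with producers, which is why monitoring tools typically watch the trend rather than a single reading.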
See also
- Apache ActiveMQ
- Apache Samza
- StormMQ
- Apache Qpid
- Message-oriented middleware
- Enterprise messaging system
- Enterprise Integration Patterns
- Service-oriented architecture
- Event-driven SOA
References
- Repository Mirror at GitHub
- Monitoring Kafka performance metrics, Datadog Engineering Blog, accessed 23 May 2016
- The Log: What every software engineer should know about real-time data’s unifying abstraction, LinkedIn Engineering Blog, accessed 5 May 2014
- Primack, Dan. “LinkedIn engineers spin out to launch ‘Kafka’ startup Confluent”. fortune.com. Retrieved 10 February 2015.
- “OpenSOC: An Open Commitment to Security”. Cisco blog. Retrieved 2016-02-03.
- Doyung Yoon. “S2Graph : A Large-Scale Graph Database with HBase”.
- Cheolsoo Park and Ashwin Shankar. “Netflix: Integrating Spark at Petabyte Scale”.
- Shibi Sudhakaran of PayPal. “PayPal: Creating a Central Data Backbone: Couchbase Server to Kafka to Hadoop and Back (talk at Couchbase Connect 2015)”. Couchbase. Retrieved 2016-02-03.
- Josh Baer. “How Apache Drives Spotify’s Music Recommendations”.
- “Stream Processing in Uber”. InfoQ. Retrieved 2015-12-06.
- “Shopify – Sarama is a Go library for Apache Kafka”.
- “Exchange Market Data Streaming with Kafka”.
- “Concurrency and At Least Once Semantics with the New Kafka Consumer”.
- “Kafka at HubSpot: Critical Consumer Metrics”.
- “Monitoring Kafka performance metrics”. Datadog. 2016-04-06. Retrieved 2016-06-01.
- “Monitor Kafka with Datadog”. Datadog. 2016-04-06. Retrieved 2016-06-01.
External links
- Apache Kafka website
- Discussion of project’s design
- GitHub mirror
- Apache Kafka presentation by Morten Kjetland
- Comparison with RabbitMQ on Quora
- Comparison with RabbitMQ on the Kafka developer mailing list
- Comparison with RabbitMQ and ZeroMQ on Stack Overflow
- Intra-cluster Replication in Apache Kafka
- Kafka Users Mailing List Discussions
- LinkedIn open sourcing announcement