Summary
Synopsis and Commentary
We finally start talking about Apache Kafka! Also, Allen is getting acquainted with Aesop, Outlaw is killing clusters, and Joe is paying attention in drama class.

The full show notes are available on the website at https://www.codingblocks.net/episode235

News
  Atlanta Dev Con is coming up on September 7th, 2024 (www.atldevcon.com)

Intro to Apache Kafka

What is it?
  Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Core capabilities
  High throughput - Deliver messages at network-limited throughput using a cluster of machines with latencies as low as 2ms.
  Scalable - Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing.
  Permanent storage - Store streams of data safely in a distributed, durable, fault-tolerant cluster.
  High availability - Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.

Ecosystem
  Built-in stream processing - Process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing.
  Connect to almost anything - Kafka's out-of-the-box Connect interface integrates with hundreds of event sources and event sinks including Postgres, JMS, Elasticsearch, AWS S3, and more.
  Client libraries - Read, write, and process streams of events in a vast array of programming languages.
  Large ecosystem of open source tools - Leverage a vast array of community-driven tooling.

Trust and Ease of Use
  Mission critical - Support mission-critical use cases with guaranteed ordering, zero message loss, and efficient exactly-once processing.
  Trusted by thousands of organizations - Thousands of organizations use Kafka, from internet giants to car manufacturers to stock exchanges. More than 5 million unique lifetime downloads.
  Vast user community - Kafka is one of the five most active projects of the Apache Software Foundation, with hundreds of meetups around the world.

What is it?
  Getting data in real time from event sources like databases, sensors, mobile devices, cloud services, applications, etc. in the form of streams of events.
  Those events are stored "durably" (in Kafka) for processing, either in real time or retrospectively, and then routed to various destinations depending on your needs.
  It's this continuous flow and processing of data that is known as "streaming data".

How can it be used? (some examples)
  Processing payments and financial transactions in real time
  Tracking automobiles and shipments in real time for logistical purposes
  Capturing and analyzing sensor data from IoT devices or other equipment
  Connecting and sharing data from different divisions in a company

Apache Kafka as an event streaming platform?
  It contains three key capabilities that make it a complete streaming platform:
    Can publish and subscribe to streams of events
    Can store streams of events durably and reliably for as long as necessary (infinitely if you have the storage)
    Can process streams of events in real time or retrospectively
  Can be deployed to bare metal, virtual machines, or containers, on-prem or in the cloud
  Can be run self-managed or via various cloud providers as a managed service

How does Kafka work?
  A distributed system that's composed of servers and clients that communicate using a highly performant TCP protocol

Servers
  Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions
  Brokers - some of these servers form the storage layer
  Kafka Connect - these are servers that constantly import and export data from existing systems in your infrastructure, such as relational databases
  Kafka clusters are highly scalable and fault-tolerant

Clients
  Allow you to write distributed applications that read, write, and process streams of events in parallel, in a way that is fault-tolerant and scales
  These clients are available in many programming languages - both the ones provided by the core platform as well as 3rd party clients

Concepts

Events
  It's a record of something that happened - also called a "record" in the documentation
    Has a key
    Has a value
    Has an event timestamp
    Can have additional metadata

Producers and Consumers
  Producers - these are the client applications that publish/write events to Kafka
  Consumers - these are the client applications that read/subscribe to events from Kafka
  Producers and consumers are completely decoupled from each other

Topics
  Events are stored in topics
  Topics are like folders on a file system - events would be the equivalent of files within that folder
  Topics are multi-producer and multi-subscriber
    There can be zero, one or many producers or subscribers to a topic that ...
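
The event, producer, consumer, and topic concepts above can be sketched as a toy in-memory model. This is only an illustration and not Kafka's client API: the `Event`, `Topic`, and `Consumer` classes, the `publish`, `read_from`, and `poll` methods, and the "payments" topic name are all hypothetical names made up for this sketch. It assumes a topic can be approximated by an append-only list and that each subscriber keeps track of its own read position.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical event (record): key, value, timestamp, optional metadata."""
    key: str
    value: str
    timestamp: float = field(default_factory=time.time)
    headers: dict = field(default_factory=dict)


class Topic:
    """Toy stand-in for a topic: a named, append-only list of events."""

    def __init__(self, name):
        self.name = name
        self._events = []

    def publish(self, event):
        # Any number of producers may append; returns the event's position.
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, position):
        # Reading never removes events, so every subscriber can see them all.
        return self._events[position:]


class Consumer:
    """Hypothetical subscriber that tracks its own read position."""

    def __init__(self, topic):
        self.topic = topic
        self.position = 0

    def poll(self):
        # Return the events published since this consumer's last poll.
        events = self.topic.read_from(self.position)
        self.position += len(events)
        return events


payments = Topic("payments")
billing = Consumer(payments)
audit = Consumer(payments)

# Two writes to the same topic; the writers know nothing about the readers.
payments.publish(Event(key="Alice", value="Paid $200 to Bob"))
payments.publish(Event(key="Bob", value="Paid $50 to Carol"))

# Each subscriber reads independently of the other.
billing_seen = [e.key for e in billing.poll()]
audit_seen = [e.key for e in audit.poll()]
print(billing_seen, audit_seen)
```

Because the writers only talk to the topic and each reader only tracks its own position, adding or removing a subscriber in this sketch requires no change on the producing side, which is the decoupling described above.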