An Introduction to Apache Pulsar

Apache Pulsar is a distributed messaging and streaming platform originally developed at Yahoo in 2012. The main goal was to build a scalable and reliable messaging platform that filled the gaps left by existing Enterprise Messaging Systems (EMS). It was open-sourced in 2016 and contributed to the Apache Software Foundation (ASF).

Apache Pulsar is an evolutionary step forward in the EMS lineage. It combines real-time messaging, real-time compute, and scalable storage for processing large data sets, which makes it suitable for building real-time data pipelines and stream processing applications.

Messages are the basic concept of Pulsar

Like any other Enterprise Messaging System, Pulsar is built around messages. It follows the publish-subscribe (pub-sub) pattern: producers publish messages to topics, and consumers that subscribe to those topics consume the messages. Topics are named channels used to pass messages. Messages are retained until consumers acknowledge that they have been processed successfully, which supports one of the core goals of an Enterprise Messaging System: guaranteed message delivery. Pulsar stores messages in Apache BookKeeper, a log storage system.
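To make the "retain until acknowledged" rule concrete, here is a minimal in-memory sketch. The class `ToyTopic` and its methods are hypothetical illustrations, not the Pulsar client API, and the model ignores persistence, redelivery timeouts, and retention policies. It only shows that a message stays in the topic until every subscription that was waiting for it has acknowledged it.

```python
from collections import OrderedDict
from itertools import count


class ToyTopic:
    """Hypothetical in-memory topic; this is NOT the Pulsar client API.

    A published message is kept until every subscription that existed when
    it was published has acknowledged it. (With no subscriptions at all,
    this sketch simply keeps the message.)
    """

    def __init__(self, name):
        self.name = name
        self._ids = count()             # monotonically increasing message ids
        self._messages = OrderedDict()  # message_id -> payload, oldest first
        self._pending = {}              # subscription name -> set of unacked ids

    def subscribe(self, subscription):
        # In this sketch a new subscription only sees messages published later.
        self._pending.setdefault(subscription, set())

    def publish(self, payload):
        msg_id = next(self._ids)
        self._messages[msg_id] = payload
        for unacked in self._pending.values():
            unacked.add(msg_id)
        return msg_id

    def receive(self, subscription):
        # Return the oldest message this subscription has not yet acknowledged.
        unacked = self._pending[subscription]
        for msg_id, payload in self._messages.items():
            if msg_id in unacked:
                return msg_id, payload
        return None

    def acknowledge(self, subscription, msg_id):
        self._pending[subscription].discard(msg_id)
        # Drop the message once no subscription is still waiting for it.
        if not any(msg_id in unacked for unacked in self._pending.values()):
            self._messages.pop(msg_id, None)

    def backlog(self):
        return len(self._messages)


topic = ToyTopic("orders")
topic.subscribe("billing")
topic.subscribe("shipping")
first = topic.publish(b"order-1")
print(topic.receive("billing"))  # (0, b'order-1')
topic.acknowledge("billing", first)
print(topic.backlog())           # 1: "shipping" has not acknowledged yet
topic.acknowledge("shipping", first)
print(topic.backlog())           # 0: retained only until all acknowledgements
```

Tracking unacknowledged ids per subscription mirrors the idea that each subscription consumes independently; a message can only be discarded once no subscription still needs it.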

Pulsar's multi-layered architecture consists of two conceptual layers: the service layer and the storage layer. The service layer handles data serving through stateless broker nodes, and the storage layer handles data storage through bookie nodes (each bookie is an individual BookKeeper server). Because the layers are decoupled, each can scale independently.
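The separation between stateless brokers and storage nodes can be illustrated with a toy model. `ToyBroker` and `ToyBookie` are hypothetical names, and copying every entry to every bookie is only a crude stand-in for BookKeeper's ledgers and write quorums; topic ownership and failure handling are ignored. The point is that a broker owns no data, so it can be replaced or added without moving any messages.

```python
class ToyBookie:
    """Hypothetical storage node standing in for a BookKeeper bookie."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # topic -> list of payloads stored on this node

    def append(self, topic, payload):
        self.entries.setdefault(topic, []).append(payload)

    def read(self, topic):
        return list(self.entries.get(topic, []))


class ToyBroker:
    """Hypothetical stateless serving node: it keeps no message data itself."""

    def __init__(self, bookies):
        self.bookies = bookies  # the storage layer is shared, not owned

    def publish(self, topic, payload):
        # Copy the entry to every bookie; a crude stand-in for replication
        # to a BookKeeper write quorum.
        for bookie in self.bookies:
            bookie.append(topic, payload)

    def read(self, topic):
        # Serve the data from the first bookie that holds the topic.
        for bookie in self.bookies:
            data = bookie.read(topic)
            if data:
                return data
        return []


storage_layer = [ToyBookie("bookie-1"), ToyBookie("bookie-2")]
broker_a = ToyBroker(storage_layer)
broker_a.publish("orders", "o1")
broker_a.publish("orders", "o2")

# Replace the broker: because it held no state, a fresh one serves the same data.
broker_b = ToyBroker(storage_layer)
print(broker_b.read("orders"))  # ['o1', 'o2']
```

Scaling the two layers independently then amounts to adding brokers (more serving capacity over the same storage) or adding bookies (more storage capacity behind the same brokers).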

Producers connect to message brokers to publish messages, and consumers connect to message brokers to consume them. How messages are delivered to consumers depends on the subscription configuration, such as the subscription type (Exclusive, Shared, Failover, or Key_Shared).
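As an illustration of how delivery rules can differ, the sketch below models two of Pulsar's subscription types in a simplified way. `ToySubscription` is a hypothetical class, not part of any Pulsar client. An exclusive subscription accepts only one consumer, while a shared subscription spreads messages round-robin across its consumers. Real Pulsar dispatch also depends on consumer flow control, which this sketch ignores.

```python
from itertools import cycle


class ToySubscription:
    """Hypothetical dispatcher for two Pulsar subscription types.

    "exclusive": at most one consumer may attach.
    "shared": messages are spread round-robin across consumers. Real Pulsar
    dispatch also depends on consumer flow control; this sketch ignores that.
    """

    def __init__(self, sub_type):
        if sub_type not in ("exclusive", "shared"):
            raise ValueError(f"unsupported subscription type: {sub_type}")
        self.sub_type = sub_type
        self.consumers = []

    def attach(self, consumer):
        if self.sub_type == "exclusive" and self.consumers:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumers.append(consumer)

    def dispatch(self, messages):
        # Return a mapping: consumer name -> messages delivered to it, in order.
        if not self.consumers:
            raise RuntimeError("no consumers attached")
        delivered = {consumer: [] for consumer in self.consumers}
        targets = cycle(self.consumers)
        for message in messages:
            delivered[next(targets)].append(message)
        return delivered


shared = ToySubscription("shared")
shared.attach("consumer-1")
shared.attach("consumer-2")
print(shared.dispatch(["m1", "m2", "m3"]))
# {'consumer-1': ['m1', 'm3'], 'consumer-2': ['m2']}

exclusive = ToySubscription("exclusive")
exclusive.attach("consumer-1")
try:
    exclusive.attach("consumer-2")
except RuntimeError as err:
    print(err)  # exclusive subscription already has a consumer
```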

At the highest level, a Pulsar instance consists of one or more Pulsar clusters. A Pulsar cluster is a set of message brokers and bookies. Clusters can replicate messages to one another using geo-replication; because clusters can reside in different data centers or geographical regions, data can be replicated across multiple locations.
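The basic idea of geo-replication, where messages published in one cluster are copied to its peers, can be sketched as follows. `ToyCluster`, its `peers` list, and the explicit `replicate()` call are hypothetical; this is not Pulsar's geo-replication configuration or admin API. Pulsar replicates in the background, whereas this sketch makes replication an explicit step so the intermediate state is visible.

```python
class ToyCluster:
    """Hypothetical Pulsar-like cluster with a local log per topic.

    This is not the Pulsar geo-replication API; it only mimics the idea that
    messages published in one cluster are copied to its peer clusters.
    """

    def __init__(self, name):
        self.name = name
        self.log = {}     # topic -> list of (origin cluster, payload)
        self.peers = []   # clusters that receive this cluster's messages
        self.outbox = []  # local messages not yet replicated

    def publish(self, topic, payload):
        self.log.setdefault(topic, []).append((self.name, payload))
        self.outbox.append((topic, payload))

    def replicate(self):
        # Forward only messages that originated here, so replicated copies
        # are never sent back and forth in a loop. Pulsar replicates in the
        # background; here it is an explicit call.
        for topic, payload in self.outbox:
            for peer in self.peers:
                peer.log.setdefault(topic, []).append((self.name, payload))
        self.outbox.clear()

    def read(self, topic):
        return [payload for _, payload in self.log.get(topic, [])]


us_east = ToyCluster("us-east")
eu_west = ToyCluster("eu-west")
us_east.peers.append(eu_west)
eu_west.peers.append(us_east)

us_east.publish("orders", "o1")
eu_west.publish("orders", "o2")
print(eu_west.read("orders"))  # ['o2']: nothing replicated yet

us_east.replicate()
eu_west.replicate()
print(us_east.read("orders"))  # ['o1', 'o2']
print(eu_west.read("orders"))  # ['o2', 'o1']
```

Note that the two clusters end up with the same messages in different orders, which is what you would expect when each region applies its own writes first and receives remote ones later.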

Pulsar uses Apache ZooKeeper for cluster-specific configuration and coordination tasks.
