Exploring an Apache Kafka to Pub/Sub migration: Major considerations

The fastest way to migrate a business application into Google Cloud is the lift-and-shift strategy, and part of that transition is migrating any OSS or third-party services that the application uses. But sometimes it can be more efficient and beneficial to adopt Google Cloud services instead. One service that customers often consider migrating is Apache Kafka, a popular message distribution solution that performs asynchronous message exchange between different components of an application. Under a lift-and-shift strategy, the straightforward approach is to migrate to a self-managed Kafka cluster or to use Confluent Cloud, a managed partner service. But in many cases, our Pub/Sub messaging and event distribution service can successfully replace Apache Kafka, with lower maintenance and operational costs and better integration with other Google Cloud services.

Kafka is designed to be a distributed commit log. In other words, it includes the functionality of both a messaging system and a storage system, providing features beyond those of a simple message broker. These features, which include log compaction, partitioned ordering, exactly-once delivery, the ability to browse committed messages, and long message retention times, often complicate the migration decision. The migration is easier when Kafka is used simply as a message broker or event distribution system, but it is also possible to migrate from Kafka to Pub/Sub when Kafka is used for data streaming. In this post, we compare some key differences between Kafka and Pub/Sub to help you evaluate the effort of the migration. Then, in an upcoming post, we'll show you how to implement some Kafka functionality with the Pub/Sub service, as well as how to accomplish the migration itself.

Pub/Sub key advantages

Although Apache Kafka offers more features, many applications that run in Google Cloud can benefit from using Pub/Sub as their messaging service.
Some of Pub/Sub's benefits include:

- Zero maintenance costs – Apache Kafka is highly customizable and flexible, but that can translate into expensive, often manual maintenance. In contrast, running Pub/Sub does not require any manpower.
- Lower operational costs – Running Kafka OSS in Google Cloud incurs operational costs, since you have to provision and maintain the Kafka clusters. In addition, infrastructure costs might be higher in some circumstances, since they are based on allocated resources rather than used resources. In contrast, Pub/Sub pricing is pay-per-use, and the service requires almost no administration.
- Native integration with other Google Cloud services, e.g., Cloud Functions, Cloud Storage or Stackdriver – To use Kafka with these services, you need to install and configure additional software (connectors) for each integration.
- A push mechanism – In addition to the conventional message-pulling mechanism, Pub/Sub can deliver messages posted to a topic via push delivery.
- Implicit scaling – Pub/Sub automatically scales in response to a change in load. In contrast, Kafka's topic partitioning requires additional management, including making decisions about resource consumption vs. performance.
- Integrated logging and monitoring – Pub/Sub is natively integrated with Stackdriver, with no external configuration or tooling required. Kafka provides monitoring using the JMX plugin; when you deploy Kafka on Google Cloud, you need additional development to integrate Kafka logs into Stackdriver logging and monitoring, and you have to maintain multiple sources of logs and alerts.

Key differences affecting migration decisions

It's not easy to know upfront how complex it will be to migrate from Kafka to Pub/Sub. Here's a decision tree that suggests solutions to potential migration blockers.

If you use exactly-once message delivery

Kafka's exactly-once message delivery guarantee comes with a price: a degradation in performance.
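To see what the application-level alternative looks like, here is a minimal, illustrative Python sketch of consumer-side deduplication: the consumer tracks the IDs of messages it has already handled, so redeliveries under at-least-once semantics cause no duplicate side effects. The message IDs and handler are hypothetical, not a real Kafka or Pub/Sub API. Kafka's built-in exactly-once guarantee removes the need for this bookkeeping, but at the performance cost noted above.

```python
# Hypothetical sketch: exactly-once *processing* via consumer-side
# deduplication, instead of relying on broker exactly-once delivery.

processed_ids = set()  # in production: a persistent store, not process memory
results = []

def handle_message(msg_id: str, payload: str) -> None:
    """Process a message at most once, even if it is redelivered."""
    if msg_id in processed_ids:
        return                   # duplicate delivery: skip any side effects
    processed_ids.add(msg_id)
    results.append(payload)      # stand-in for the actual business logic

# At-least-once delivery may redeliver message "m1":
for msg_id, payload in [("m1", "a"), ("m2", "b"), ("m1", "a")]:
    handle_message(msg_id, payload)

print(results)  # duplicates were filtered out
```

In a real system, the set of processed IDs would live in durable storage (e.g., a database keyed by message ID) so deduplication survives consumer restarts.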
You can use Kafka's exactly-once delivery in production environments if you're not expecting high message throughput and you don't need to scale under load. A more effective way to achieve exactly-once processing at high scale might be to make your message processing idempotent, or to use Dataflow to deduplicate the messages. These approaches can be used with Kafka, too.

If you consume messages that were published longer than seven days ago

There are few business reasons to postpone message processing. One of the most common is processing messages that, for some reason, were not processed at the time they were posted by a publisher, for example, due to a commit failure. Another use case is the dead letter queue pattern, where messages that cannot be processed by the current application are stored until the application is modified to accommodate them. In Kafka, you can implement a dead letter queue using Kafka Connect or Kafka Streams. Pub/Sub now has a native dead letter queue too; this functionality is in alpha, so follow the Pub/Sub release notes to see when it will be generally available. Alternatively, you can implement dead letter queue logic using a combination of Google Cloud services. This post shows you how, using Dataflow and a Google Cloud database.

When you use Kafka to store messages over long time periods, the migration guideline is to store the posted messages in a database such as Cloud Bigtable or the BigQuery data warehouse.

If you use log compaction, random message access or message deletion

Being able to overwrite or delete messages is functionality that you usually find in a storage service rather than in a message distribution service. Kafka's log compaction ensures that Kafka always retains at least the last known value for each message key within the log of a single topic partition. There is no equivalent feature in Pub/Sub, so compaction requires explicit reprocessing of messages or incremental aggregation of state.
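That incremental aggregation can be as simple as keeping the last value seen per key, which is essentially what Kafka's compacted log converges to. A minimal, illustrative Python sketch (the keys and values are made up):

```python
# Illustrative sketch: Kafka-style log compaction reduces a keyed log to
# the last value seen per key. Pub/Sub has no built-in equivalent, so a
# consumer would maintain this state itself.

log = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "updated"),
    ("user-2", "deleted"),
    ("user-1", "verified"),
]

compacted = {}
for key, value in log:   # later records overwrite earlier ones
    compacted[key] = value

print(compacted)  # only the latest value per key survives
```

A subscriber doing this in practice would persist the dictionary in a database such as Cloud Bigtable, so the aggregated state survives restarts.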
Pub/Sub does provide the ability to discard messages automatically after as little as 10 minutes. You can consider using the seek functionality for random message access. Seeking to a timestamp also allows you to discard acknowledged messages manually, within a retention period of between 10 minutes and seven days.

If you use keyed message ordering

One of Kafka's flagship features is its partitioned message ordering, sometimes referred to as keyed message ordering. Compared to Kafka, Pub/Sub offers only best-effort ordered message delivery. This feature is often cited as a functional blocker for migrating to another message distribution solution. However, the problem is not that clear-cut. Let's briefly review message ordering in Kafka.

Kafka promises to order messages within a single partition of a topic. This means that when a producer sends messages to a topic in some order, the broker writes the messages to the topic's partition in that order, and all consumers read them in that order too. A broker distributes messages among partitions randomly, and because topics usually have many partitions, it is hard to maintain the ordering of messages across a topic. To solve that problem, Kafka offers keyed messages: a mechanism that allows a producer to assign keys to published messages. All messages with a specific key go to the same partition, so a consumer can process messages with the same key chronologically by reading them from that partition.

Kafka's ordering therefore provides partial message ordering within a topic. Total topic ordering can be achieved in Kafka by configuring only one partition in the topic; however, this configuration eliminates parallelism and is usually not used in production. The Pub/Sub documentation reviews different use cases for message ordering and proposes solutions using additional Google Cloud services. You can also use third-party solutions if you don't want to use these Google Cloud services.
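To make the keyed-ordering discussion concrete, here is an illustrative sketch of key-based partitioning: hashing the key deterministically means every message with the same key lands in the same partition, so per-key publish order is preserved. Kafka's default partitioner actually uses murmur2 hashing on the key bytes; CRC32 stands in here, and the topic contents and key names are hypothetical.

```python
# Illustrative sketch of Kafka-style keyed partitioning: same key ->
# same partition -> per-key ordering preserved within that partition.
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # A stable hash maps a given key to the same partition every time.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("order-7", "created"), ("order-9", "created"),
                   ("order-7", "paid"), ("order-7", "shipped")]:
    partitions[partition_for(key)].append((key, value))

# All "order-7" events sit in one partition, in publish order:
p = partition_for("order-7")
print([v for k, v in partitions[p] if k == "order-7"])
```

Note that this guarantees nothing about the relative order of messages with different keys, which is exactly why Kafka's ordering is partial rather than total.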
In addition, Pub/Sub has an "ordering key" feature (in limited alpha) that guarantees that messages successfully published by a single publisher with the same ordering key are sent to subscribers in that order. Follow the Pub/Sub release notes to see when it will be generally available.

What's next?

If you are considering a migration from Apache Kafka to Pub/Sub, we hope that this post helps you evaluate the change and offers a comparison of the unique features of both tools. In our next post, we'll review the implementation complexity of the migration and how to address it using the unique Pub/Sub features mentioned here.

Resources

Kafka reference documentation
Spotify on replacing Kafka with Pub/Sub
Implement exactly-once delivery using Google Cloud Dataflow
Error handling strategy using Cloud Pub/Sub and Dead Letter queue
Pub/Sub product page
Message ordering documentation
Pub/Sub FAQ
Source: Google Cloud Platform
