In our quest to implement a robust and efficient Mirror Maker service, we find ourselves at a crossroads where architecture, tools, and components converge. Mirror Maker, a vital element of data replication, is an intricate system whose inner workings deserve careful examination. Join us as we break down the components of our implementation and dig into how it all comes together. From Kafka Connect to Mirror Maker and everything in between, we’re about to walk through the pieces that make our data replication solution a reality.
In contrast to our previous articles, we will adopt a different approach in this piece. We will first give a detailed look at our implementation and then walk through the Mirror Maker repository we’ve created, which contains the entire setup.
What we will cover:
- Architectural Insights
- The Image of Mirror Maker
- Customizing Topic Names
- Setting Up Configuration Properties for Kafka Connect
- Building the Mirror Maker
- Deploying the Docker Image
- Deploying Connectors
- Connectors Structure
Architectural Insights
Our approach combines Kafka Connect with Mirror Maker, which is built on top of Kafka Connect.
Kafka Connect is a scalable and reliable tool for streaming data between Apache Kafka and a variety of other systems. It makes it simple to define connectors that move large collections of data into and out of Kafka.
Mirror Maker, in turn, is designed specifically to replicate data, topics, topic configurations, consumer groups and their offsets, as well as ACLs from one or more source Kafka clusters to one or more target Kafka clusters.
In a nutshell, MirrorMaker uses Connectors to consume from source clusters and produce to target clusters.
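To make that consume-and-produce flow concrete, here is a minimal Python sketch of the kind of configuration a MirrorSourceConnector receives. The connector class and property keys are standard MirrorMaker 2 settings; the aliases, broker addresses, and topic name are placeholder assumptions and are not taken from our repository.

```python
import json


def mirror_source_config(source_alias, target_alias, source_brokers, target_brokers, topics):
    """Build a MirrorSourceConnector configuration as a plain dict.

    All argument values are supplied by the caller; nothing here is
    specific to the setup described in this article.
    """
    return {
        "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
        # Where the connector consumes from.
        "source.cluster.alias": source_alias,
        "source.cluster.bootstrap.servers": source_brokers,
        # Where the connector produces to.
        "target.cluster.alias": target_alias,
        "target.cluster.bootstrap.servers": target_brokers,
        # Comma-separated topic names or regular expressions to replicate.
        "topics": topics,
        "tasks.max": "1",
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    config = mirror_source_config(
        "cfnstg", "stg-ue2", "source-broker:9092", "target-broker:9092", "events"
    )
    print(json.dumps(config, indent=2))
```

The source settings tell the connector where to read, and the target settings tell it where to write, which is the whole replication path in miniature.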
The Image of Mirror Maker
We’ve developed our approach by creating a Docker image that includes both the Kafka Connect and Mirror Maker tools. As mentioned earlier, Mirror Maker is built on top of the Kafka Connect framework.
You can find the Dockerfile here.
Our Dockerfile was based on the sample repository provided by AWS, specifically kafka-connect-mm2.
The Dockerfile is structured with the following layers:
- Base image of Ubuntu
- Kafka Connect version
- AWS MSK IAM Auth
- JMX metrics
- Maven tool
- Mirror Maker tool
After downloading Kafka Connect in the Dockerfile, the next step is to download AWS MSK IAM Auth. This library enables the use of AWS Identity and Access Management (IAM) for connecting to Amazon MSK clusters. Our aim is for the Mirror Maker tool to work with AWS as well, rather than being limited to Confluent or other third-party solutions, so we’ve included the necessary AWS library.
Furthermore, we’ve added the JMX Prometheus exporter so that local JVM metrics can be scraped.
We’ve also included Maven in the installation process because we build the Mirror Maker Java project later on.
Customizing Topic Names
By default, Mirror Maker creates mirrored topics with a prefix derived from the source cluster alias. In our setup, however, we give users the flexibility to name mirrored topics according to their preferences. For more details, you can refer to the changes that we applied here, which are based on the Mirror Maker repository.
The mirrored topic will be named based on the source.cluster.alias parameter.
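As a rough illustration of the naming behavior, the Python sketch below mimics how the default MirrorMaker 2 replication policy derives a mirrored topic name (source alias, a separator that defaults to ".", then the topic). The second function is a hypothetical example of what a custom policy could do; it is an assumption for illustration and does not reproduce the logic in our CustomMM2ReplicationPolicy.java.

```python
def default_remote_topic(source_alias: str, topic: str, separator: str = ".") -> str:
    """Default MirrorMaker 2 naming: <source alias><separator><topic>."""
    return f"{source_alias}{separator}{topic}"


def identity_remote_topic(source_alias: str, topic: str) -> str:
    """Hypothetical custom policy: keep the source topic name unchanged."""
    return topic


if __name__ == "__main__":
    print(default_remote_topic("cfnstg", "events"))   # cfnstg.events
    print(identity_remote_topic("cfnstg", "events"))  # events
```

Because the prefix is built from the alias, changing source.cluster.alias changes the name of every mirrored topic under the default policy.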
Take a closer look at the Dockerfile section where we clone mirrormaker2-msk-migration and add our changes before building the project to obtain its JAR file.
RUN git clone https://github.com/aws-samples/mirrormaker2-msk-migration.git
WORKDIR mirrormaker2-msk-migration/CustomMM2ReplicationPolicy
COPY CustomMM2ReplicationPolicy.java ./CustomMM2ReplicationPolicy/src/main/java/com/amazonaws/kafka/samples/
COPY TestCustomMM2ReplicationPolicy.java ./CustomMM2ReplicationPolicy/src/test/java/com/amazonaws/kafka/samples/
RUN mvn clean install
RUN mv target/CustomMM2ReplicationPolicy-1.0-SNAPSHOT.jar /opt/kafka/libs/
Setting Up Configuration Properties for Kafka Connect
The Dockerfile ends by executing the following command: /opt/kafka/bin/connect-distributed.sh.
This command runs the Kafka Connect service. Its configuration properties are supplied in /etc/connect-mirror-maker/connect-mirror-maker.properties.
We generate the properties configuration as a ConfigMap during the service deployment, as you can see here:
# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=
The bootstrap servers identify the target cluster, which is the cluster where we run Kafka Connect.
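For readers who want to see what such a properties file might contain, here is a small Python sketch that renders a distributed-worker configuration as text. The property keys are standard Kafka Connect worker settings; the group id, internal topic names, converter choice, and broker address are assumptions for illustration, not the values our ConfigMap generates.

```python
def render_worker_properties(bootstrap_servers: str, group_id: str) -> str:
    """Render Kafka Connect distributed-worker settings as a .properties string."""
    props = {
        # Target cluster that the Connect worker attaches to.
        "bootstrap.servers": bootstrap_servers,
        # Workers sharing a group.id form one Connect cluster.
        "group.id": group_id,
        # Pass record bytes through unchanged while mirroring.
        "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
        "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
        # Internal topics where Connect stores its own state (names are hypothetical).
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder broker address and group id.
    print(render_worker_properties("target-broker:9092", "mm2-staging"), end="")
```

Only bootstrap.servers appears in the snippet above; the remaining keys are included to show what a complete worker file typically needs.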
Building the Mirror Maker
You can build Mirror Maker from the command-line interface (CLI), or automate the process with tools such as ArgoCD or Jenkins.
# Clone the Mirror Maker repository
git clone https://github.com/naturalett/mirror-maker-confluent-to-msk.git
cd mirror-maker-confluent-to-msk/building-mirror
docker build -t naturalett/mirror-maker:v1 .
Deploying the Docker Image
Deploying the Mirror Maker Helm chart follows a similar approach to the build step. You can run it from the CLI or set up your own automation.
# Clone the Mirror Maker repository
git clone https://github.com/naturalett/mirror-maker-confluent-to-msk.git
cd mirror-maker-confluent-to-msk/building-mirror
helm upgrade -i mm2-cfnstg-to-stg-ue2 --cleanup-on-fail --set 'image.tag=v1' --namespace staging --set 'env=staging' --set 'destination_cluster=stg-ue2' ./helm
Deploying Mirror Maker automatically generates the ConfigMap. The ConfigMap creates the properties file, which includes the bootstrap.servers configuration required for the initial setup of our Mirror Maker.
Deploying Connectors
We’ve created a Python script that takes the destination and source clusters as input and then applies the matching connectors from our repository.
# Clone the Mirror Maker repository
git clone https://github.com/naturalett/mirror-maker-confluent-to-msk.git
cd mirror-maker-confluent-to-msk/building-mirror/connectors
python push_configs.py -dc stg-ue2 -sc cfnstg -env staging
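The sketch below shows one way a script like this could apply a connector, using the Kafka Connect REST API endpoint PUT /connectors/<name>/config, which creates the connector or updates it if it already exists. It is not the contents of push_configs.py; the function names, the Connect URL, and the sample config are hypothetical.

```python
import json
import urllib.request


def build_put_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build a PUT /connectors/<name>/config request for the Kafka Connect REST API."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors/{name}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def push_connector(connect_url: str, name: str, config: dict) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_put_request(connect_url, name, config)) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical URL and config; this only builds the request, it does not send it.
    request = build_put_request("http://localhost:8083", "events-staging", {"topics": "events"})
    print(request.get_method(), request.full_url)
```

Using PUT on the config endpoint keeps the operation idempotent, so rerunning the script with the same files leaves existing connectors in the same state.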
Connectors Structure
Let’s visualize how the configurations for our topics in the repository appear:
connectors
  production
    ...
    ...
  staging → Our environment
    stg-ue2 → Cluster destination (Our initial bootstrap servers)
      cfnstg → Cluster source (From where to replicate the Kafka Topic)
        events-staging.json → The topic configuration
We can create as many connectors as we need. A connector can replicate from any source cluster, including any MSK cluster. The destination, however, is always the cluster where we have set up the Kafka Connect service, as defined by its bootstrap.servers configuration.
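The layout above can be read mechanically: each path segment carries one piece of information. The following Python sketch parses such a path into its parts. The ConnectorFile type and parse_connector_path function are hypothetical helpers written for this illustration, assuming the connectors/<environment>/<destination>/<source>/<file>.json layout shown above.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ConnectorFile(NamedTuple):
    environment: str   # e.g. staging
    destination: str   # cluster where Kafka Connect runs
    source: str        # cluster the topic is replicated from
    name: str          # file name without the .json extension


def parse_connector_path(path: str) -> ConnectorFile:
    """Split connectors/<env>/<destination>/<source>/<file>.json into its parts."""
    parts = PurePosixPath(path).parts
    if len(parts) != 5 or parts[0] != "connectors" or not parts[4].endswith(".json"):
        raise ValueError(f"unexpected connector path: {path}")
    _, environment, destination, source, filename = parts
    return ConnectorFile(environment, destination, source, PurePosixPath(filename).stem)


if __name__ == "__main__":
    print(parse_connector_path("connectors/staging/stg-ue2/cfnstg/events-staging.json"))
```

Encoding the environment, destination, and source in the directory path means a script only needs those three arguments to find every connector file it should apply.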
Wrap It Up…
We’ve designed a flexible setup that lets you name mirrored topics as you prefer. Our process relies on a Docker image and a user-friendly Python script that creates the connectors. We also explored the structure of our repository, showing how the various components come together to form our data replication solution.