Ingesting data from MySQL to ClickHouse (using CDC)

Ingesting data from MySQL to ClickHouse Cloud via ClickPipes is in public beta.

The MySQL ClickPipe provides a fully-managed and resilient way to ingest data from MySQL and MariaDB databases into ClickHouse Cloud. It supports both bulk loads for one-time ingestion and Change Data Capture (CDC) for continuous ingestion.

MySQL ClickPipes can be deployed and managed manually using the ClickPipes UI. In the future, it will be possible to deploy and manage MySQL ClickPipes programmatically using OpenAPI and Terraform.

Prerequisites

To get started, you first need to ensure that your MySQL database is correctly configured for binlog replication. The configuration steps depend on how you're deploying MySQL, so please follow the relevant guide below:

Supported data sources

Name                               Details
Amazon RDS MySQL                   One-time load, CDC. Follow the Amazon RDS MySQL configuration guide.
Amazon Aurora MySQL                One-time load, CDC. Follow the Amazon Aurora MySQL configuration guide.
Cloud SQL for MySQL                One-time load, CDC. Follow the Cloud SQL for MySQL configuration guide.
Azure Flexible Server for MySQL    One-time load. Follow the Azure Flexible Server for MySQL configuration guide.
Self-hosted MySQL                  One-time load, CDC. Follow the Generic MySQL configuration guide.
Amazon RDS MariaDB                 One-time load, CDC. Follow the Amazon RDS MariaDB configuration guide.
Self-hosted MariaDB                One-time load, CDC. Follow the Generic MariaDB configuration guide.
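
Before moving on, it can help to sanity-check the binlog settings on the source database. The sketch below compares a dictionary of server variables (for example, values copied from the output of SHOW VARIABLES) against a small set of expected values. The variable list in ASSUMED_REQUIRED is an assumption based on what row-based binlog replication commonly needs, not the authoritative list for ClickPipes; the configuration guide for your deployment remains the source of truth. The function name is hypothetical.

```python
# Assumed requirements for row-based binlog CDC. This list is illustrative;
# confirm the exact settings in the configuration guide for your deployment.
ASSUMED_REQUIRED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def find_binlog_problems(variables: dict[str, str]) -> list[str]:
    """Return one message per variable that is missing or has an unexpected value."""
    problems = []
    for name, expected in ASSUMED_REQUIRED.items():
        actual = variables.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual.upper() != expected:
            problems.append(f"{name} is {actual} (expected {expected})")
    return problems


# Example input, as it might be copied from SHOW VARIABLES output.
sample = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
for problem in find_binlog_problems(sample):
    print(problem)
```

An empty result only means the assumed checks passed; it does not replace the steps in the relevant configuration guide.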

Once your source MySQL database is set up, you can continue creating your ClickPipe.

Create your ClickPipe

Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up here.

  1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud service.
  2. Select the Data Sources button in the left-side menu and click "Set up a ClickPipe".
  3. Select the MySQL CDC tile.

Add your source MySQL database connection

  1. Fill in the connection details for your source MySQL database which you configured in the prerequisites step.

    Note

    Before you add your connection details, make sure that you have whitelisted the ClickPipes IP addresses in your firewall rules; see the list of ClickPipes IP addresses. For more information, refer to the source MySQL setup guides listed under "Supported data sources" above.
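
As a rough way to confirm that your firewall rules cover every ClickPipes address, the sketch below checks a list of IP addresses against a list of allowed CIDR ranges using Python's standard ipaddress module. The addresses shown are placeholders from the reserved documentation ranges, not real ClickPipes IPs, and the function name is hypothetical; substitute the published ClickPipes addresses and your own rules.

```python
import ipaddress


def uncovered_addresses(required_ips: list[str], allowed_cidrs: list[str]) -> list[str]:
    """Return the required IPs that no allowed CIDR range contains."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    missing = []
    for ip_text in required_ips:
        ip = ipaddress.ip_address(ip_text)
        if not any(ip in network for network in networks):
            missing.append(ip_text)
    return missing


# Placeholder values only (reserved documentation ranges).
clickpipes_ips = ["203.0.113.10", "198.51.100.7"]
firewall_rules = ["203.0.113.0/24"]

print(uncovered_addresses(clickpipes_ips, firewall_rules))
```

Any address the function returns still needs an inbound rule before the connection can be verified.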

(Optional) Set up SSH Tunneling

You can specify SSH tunneling details if your source MySQL database is not publicly accessible.

  1. Enable the "Use SSH Tunnelling" toggle.

  2. Fill in the SSH connection details.

  3. To use key-based authentication, click "Revoke and generate key pair" to generate a new key pair, then copy the generated public key to ~/.ssh/authorized_keys on your SSH server.

  4. Click on "Verify Connection" to verify the connection.
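
Copying the public key to the SSH server (step 3) can be done by hand; if you prefer to script it, the sketch below shows one way to append a key to authorized_keys without duplicating it, while keeping the restrictive permissions that SSH servers typically expect. The function name is hypothetical, and it is meant to run on the bastion host as the user that ClickPipes will connect as.

```python
from pathlib import Path


def add_authorized_key(public_key: str, ssh_dir: Path) -> bool:
    """Append public_key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the file was changed, False if the key was already present.
    """
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    keys_file = ssh_dir / "authorized_keys"
    key = public_key.strip()

    current = keys_file.read_text() if keys_file.exists() else ""
    if key in (line.strip() for line in current.splitlines()):
        return False

    with keys_file.open("a") as handle:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if current and not current.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")

    # Restrict the file to its owner.
    keys_file.chmod(0o600)
    return True
```

A typical call would be add_authorized_key(generated_public_key, Path.home() / ".ssh"), where generated_public_key is the string copied from the ClickPipes UI.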

Note

Make sure to whitelist ClickPipes IP addresses in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.

Once the connection details are filled in, click Next.

Configure advanced settings

You can configure the advanced settings if needed. A brief description of each setting is provided below:

  • Sync interval: This is the interval at which ClickPipes will poll the source database for changes. It affects load, and therefore cost, on the destination ClickHouse service; for cost-sensitive users, we recommend keeping this at a higher value (over 3600).
  • Parallel threads for initial load: This is the number of parallel workers that will be used to fetch the initial snapshot. This is useful when you have a large number of tables and you want to control the number of parallel workers used to fetch the initial snapshot. This setting is per-table.
  • Pull batch size: The number of rows to fetch in a single batch. This is a best effort setting and may not be respected in all cases.
  • Snapshot number of rows per partition: This is the number of rows that will be fetched in each partition during the initial snapshot. This is useful when you have a large number of rows in your tables and you want to control the number of rows fetched in each partition.
  • Snapshot number of tables in parallel: This is the number of tables that will be fetched in parallel during the initial snapshot. This is useful when you have a large number of tables and you want to control the number of tables fetched in parallel.
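
To get a feel for how these settings interact, the following sketch works through the arithmetic they imply. It assumes that the sync interval is expressed in seconds and that a table is split into equal-sized partitions of at most "rows per partition" rows; both are assumptions for illustration, and the function names are hypothetical rather than part of any ClickPipes API.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def partitions_for_table(row_count: int, rows_per_partition: int) -> int:
    """Number of partitions needed to cover row_count rows (ceiling division)."""
    if rows_per_partition <= 0:
        raise ValueError("rows_per_partition must be positive")
    return (row_count + rows_per_partition - 1) // rows_per_partition


def syncs_per_day(sync_interval_seconds: int) -> int:
    """How many polling cycles fit in a day (assumes the interval is in seconds)."""
    if sync_interval_seconds <= 0:
        raise ValueError("sync_interval_seconds must be positive")
    return SECONDS_PER_DAY // sync_interval_seconds


# Example values chosen for illustration only.
print(partitions_for_table(2_500_000, 100_000))  # 25 partitions
print(syncs_per_day(3600))                       # 24 syncs per day
print(syncs_per_day(60))                         # 1440 syncs per day
```

Under these assumptions, raising the sync interval from 60 to 3600 reduces the number of daily polling cycles from 1440 to 24, which is why a higher value suits cost-sensitive workloads.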

Configure the tables

  1. Here you can select the destination database for your ClickPipe. You can either select an existing database or create a new one.

  2. You can select the tables you want to replicate from the source MySQL database. While selecting the tables, you can also choose to rename the tables in the destination ClickHouse database as well as exclude specific columns.
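
The effect of renaming a table and excluding columns can be pictured with a small model. The TableMapping class below is purely illustrative and hypothetical: it is not how ClickPipes represents mappings internally, and the table and column names are made up. It only shows what the two UI options mean for a replicated row.

```python
from dataclasses import dataclass, field


@dataclass
class TableMapping:
    """Hypothetical model of one table selection in the UI."""
    source_table: str
    destination_table: str
    excluded_columns: set[str] = field(default_factory=set)

    def project(self, row: dict) -> dict:
        """Return the row as it would land in the destination, minus excluded columns."""
        return {name: value for name, value in row.items()
                if name not in self.excluded_columns}


# Made-up example: replicate "users" as "users_replica" without the password hash.
mapping = TableMapping("users", "users_replica", {"password_hash"})
source_row = {"id": 1, "email": "a@example.com", "password_hash": "x"}

print(mapping.destination_table, mapping.project(source_row))
```

Excluding a column this way keeps sensitive or unneeded data out of the destination table while the rest of the row is replicated unchanged.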

Review permissions and start the ClickPipe

  1. Select the "Full access" role from the permissions dropdown and click "Complete Setup".

Finally, please refer to the "ClickPipes for MySQL FAQ" page for more information about common issues and how to resolve them.

What's next?

Once you've set up your ClickPipe to replicate data from MySQL to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance. For common questions around MySQL CDC and troubleshooting, see the MySQL FAQs page.