Refine and synchronize en and zh Clustering docs #3104


Open · wants to merge 7 commits into base: release-5.8
27 changes: 27 additions & 0 deletions en_US/deploy/cluster/create-cluster.md
@@ -516,3 +516,30 @@ To use TCP IPv4 and TCP IPv6, you can set the `cluster.proto_dist` in `emqx
To enable SSL, you first need to set the `cluster.proto_dist` to `inet_tls`, then configure the `ssl_dist.conf` file in the `etc` folder and specify the TLS certificate. For details, see [Using TLS for Erlang Distribution](https://www.erlang.org/doc/apps/ssl/ssl_distribution.html).

<!--need an example code here-->
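A minimal sketch of the two steps, assuming placeholder certificate paths and the Erlang `ssl_dist_optfile` term format for `etc/ssl_dist.conf` (see the linked Erlang guide for the full set of options):

```bash
# Sketch only: certificate paths are placeholders.
# 1. Switch the Erlang distribution transport to TLS:
echo 'cluster.proto_dist = inet_tls' >> etc/emqx.conf

# 2. Write etc/ssl_dist.conf in the Erlang ssl_dist_optfile term format:
cat > etc/ssl_dist.conf <<'EOF'
[{server,
  [{certfile,   "etc/certs/node-cert.pem"},
   {keyfile,    "etc/certs/node-key.pem"},
   {cacertfile, "etc/certs/ca.pem"},
   {verify, verify_peer},
   {secure_renegotiate, true}]},
 {client,
  [{cacertfile, "etc/certs/ca.pem"},
   {verify, verify_peer},
   {secure_renegotiate, true}]}].
EOF
```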

## Pseudo-Distributed Cluster

EMQX also provides a pseudo-distributed cluster feature for testing and development purposes. It refers to a cluster setup where multiple instances of EMQX are running on a single machine, with each instance configured as a node in the cluster.

After starting the first node, use the following commands to start the second node and manually join it to the cluster. To avoid port conflicts, adjust the listening ports:

```bash
EMQX_NODE__NAME='emqx2@127.0.0.1' \
EMQX_LOG__FILE_HANDLERS__DEFAULT__FILE='log2/emqx.log' \
EMQX_STATSD__SERVER='127.0.0.1:8124' \
EMQX_LISTENERS__TCP__DEFAULT__BIND='0.0.0.0:1882' \
EMQX_LISTENERS__SSL__DEFAULT__BIND='0.0.0.0:8882' \
EMQX_LISTENERS__WS__DEFAULT__BIND='0.0.0.0:8082' \
EMQX_LISTENERS__WSS__DEFAULT__BIND='0.0.0.0:8085' \
EMQX_DASHBOARD__LISTENERS__HTTP__BIND='0.0.0.0:18082' \
EMQX_NODE__DATA_DIR="./data2" \
./bin/emqx start

./bin/emqx ctl cluster join emqx1@127.0.0.1
```
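Once the join command succeeds, you can confirm that both nodes are in the cluster:

```bash
# Both emqx1@127.0.0.1 and emqx2@127.0.0.1 should appear
# in the list of running nodes.
./bin/emqx ctl cluster status
```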

The example above creates a cluster manually; see the [auto clustering](./create-cluster.md#auto-clustering) section for how to create a cluster automatically.

The Dashboard is designed under the assumption that all cluster nodes use the same port number. Using distinct ports on a single machine may cause Dashboard UI issues, so this setup is not recommended for production.

<!--to add a quickstart with the pseudo-distributed cluster @WIVWIV -->
189 changes: 160 additions & 29 deletions en_US/deploy/cluster/introduction.md

Large diffs are not rendered by default.

114 changes: 74 additions & 40 deletions en_US/deploy/cluster/mria-introduction.md
@@ -1,27 +1,76 @@
# Cluster Architecture

<!--need to add a section about how users can work with a cluster with all nodes as core nodes-->

Starting from EMQX 5.0, a new [Mria](https://github.com/emqx/mria) cluster architecture was introduced, along with a redesigned data replication mechanism. This significantly enhanced EMQX's horizontal scalability and is one of the key factors enabling a single EMQX 5.0 cluster to support up to 100 million MQTT connections.

This page introduces the EMQX cluster deployment model under the new architecture, as well as key considerations during deployment. For automated cluster deployment, refer to the [EMQX Kubernetes Operator](https://www.emqx.com/zh/emqx-kubernetes-operator) and the guide on [Configuring EMQX Core and Replicant Nodes](https://docs.emqx.com/en/emqx-operator/latest/tasks/configure-emqx-core-replicant.html).

::: tip Prerequisite Knowledge

It is recommended to first read [EMQX Clustering](./introduction.md).

:::

## Mria Architecture Overview

Mria is an open-source extension of Erlang’s native database, Mnesia, that enables eventual consistency in data replication. With asynchronous transaction log replication enabled, the node connection topology shifts from Mnesia’s **fully meshed** model to Mria’s **mesh + star** hybrid topology.

<img src="./assets/EMQX_Mria_architecture.png" alt="EMQX Mria" style="zoom: 30%;" />

### Node Role Description

Nodes in the cluster are categorized into two roles: Core nodes and Replicant nodes.

#### Core Nodes

Core nodes form the fully meshed data layer of the cluster. Each core node holds a complete and up-to-date replica of the data, ensuring fault tolerance: as long as one core node remains available, data is not lost. Core nodes are generally static and persistent; they should not be auto-scaled (i.e., frequently added, removed, or replaced).

#### Replicant Nodes

Replicant nodes connect to core nodes and passively replicate data updates from them. They are not allowed to perform write operations; instead, any writes are forwarded to the core nodes for processing. With a full local copy of data, replicants offer fast read access and lower routing latency.

### Advantages of the Mria Architecture

The Mria architecture combines the strengths of leaderless replication and master-slave replication, offering several key benefits:

- **Improved horizontal scalability**: EMQX 5.0 supports large-scale clusters with up to 23 nodes.
- **Simplified cluster auto-scaling**: Replicant nodes can be added or removed dynamically to support automated scaling.

In contrast to EMQX 4.x, where all nodes used a fully connected topology (increasing sync overhead as node count grew), EMQX 5.0 avoids this issue by keeping replicant nodes read-only. As more replicants join the cluster, write efficiency is not affected, enabling the formation of much larger clusters.

Moreover, replicant nodes are designed to be disposable and easily scaled in or out without affecting data redundancy. This makes them ideal for auto-scaling groups and improves DevOps practices.

> **Note**: As the dataset grows, the initial data sync from core nodes to a new replicant can become resource-intensive. Avoid overly aggressive auto-scaling policies for replicant nodes to prevent performance issues.

## Deployment Architecture

By default, all nodes assume the Core node role, so the cluster behaves as it does in [EMQX 4.x](https://docs.emqx.com/en/enterprise/v4.4/getting-started/cluster.html#node-discovery-and-autocluster), which is recommended for a small cluster with 7 nodes or fewer. The Core + Replicant mode is only recommended if there are more than 7 nodes in the cluster.

::: tip Note

The Core + Replicant cluster architecture is available only in EMQX Enterprise. The open-source edition supports Core-only clusters.

:::

::: tip Recommendation

A cluster must include at least one Core node. As a best practice, we recommend starting with 3 Core nodes + N Replicant nodes.

:::

Node role assignment should be based on actual business requirements and the expected cluster size:

| Scenario | Recommended Deployment |
| -------------------------- | ------------------------------------------------------------ |
| Small cluster (≤ 7 nodes) | Core-only mode is sufficient; all nodes handle MQTT traffic. |
| Medium-sized cluster | Whether Core nodes handle MQTT traffic depends on workload; test for best results. |
| Large cluster (≥ 10 nodes) | Core nodes act only as the database layer. Replicant nodes handle all MQTT traffic to maximize stability and scalability. |

## Enable Core + Replicant Mode

To enable the Core + Replicant mode, it is necessary to designate certain nodes as replicant nodes. This is achieved by setting the `node.role` parameter to `replicant`. Additionally, you need to enable an automatic cluster [discovery strategy](./create-cluster.md#node-discovery) (`cluster.discovery_strategy`).

::: tip

Replicant nodes cannot use the `manual` discovery strategy to discover core nodes.
@@ -42,6 +91,18 @@ cluster {
}
```
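The same settings can also be supplied per node through EMQX's `EMQX_`-prefixed environment variables, which is convenient for containerized or scripted deployments. A sketch, assuming hypothetical hostnames `emqx-core-1` through `emqx-core-3` for the core nodes:

```bash
# On a replicant node (sketch; node names and hosts are placeholders):
EMQX_NODE__NAME='emqx@emqx-repl-1' \
EMQX_NODE__ROLE='replicant' \
EMQX_CLUSTER__DISCOVERY_STRATEGY='static' \
EMQX_CLUSTER__STATIC__SEEDS='["emqx@emqx-core-1","emqx@emqx-core-2","emqx@emqx-core-3"]' \
./bin/emqx start
```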

## Network and Hardware Requirements

### Network

- Network latency between Core nodes should be less than 10 ms. Latency exceeding 100 ms may cause cluster failures.
- It is strongly recommended to deploy Core nodes within the same private network.
- Replicant nodes should also be deployed in the same private network as Core nodes, although the network quality requirements are slightly more relaxed.
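As a quick pre-deployment sanity check of inter-node latency (the hostname below is a placeholder):

```bash
# Average round-trip time between core nodes should stay
# well below 10 ms; repeat for every core-to-core pair.
ping -c 10 emqx-core-2
```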

### CPU and Memory

Core nodes require more memory, but consume relatively low CPU when not handling client connections. Replicant nodes follow the same hardware sizing as in EMQX 4.x, and their memory requirements should be estimated based on the expected number of connections and message throughput.

## Monitor and Debug

<!-- TODO: add the metric value types later (Gauge or Counter) -->
@@ -73,35 +134,8 @@ You can integrate with Prometheus to monitor the cluster operations. On how to i

### Console Commands

You can also monitor the operating status of the cluster with the command `emqx eval 'mria_rlog:status().'` on the Erlang console.

If the EMQX cluster is operating normally, you can get a list of status information, for example, the current log level, the number of messages processed, and the number of messages dropped.
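For example, run it directly from the node's shell (the returned status list varies by release, so no sample output is shown here):

```bash
./bin/emqx eval 'mria_rlog:status().'
```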

<!--Here we need a query statement and the returned message, and can we link this Erlang console to https://www.erlang.org/doc/man/shell.html -->

4 changes: 2 additions & 2 deletions en_US/design/clustering.md
@@ -36,15 +36,15 @@ The online configuration management feature allows you to make configuration cha

The most important distributed data structure in an MQTT broker cluster is the routing table, which is used to store the routing information of all topics. The routing table is used to determine which nodes should receive a message published to a particular topic. In this section, we will discuss how EMQX ensures that the routing table is consistent across all nodes in the cluster.

The EMQX cluster makes use of full ACID (Atomicity, Consistency, Isolation, Durability) transactions to ensure that the routing table is consistent across all the `core` nodes in the cluster and employs asynchronous replication from the `core` nodes to the `replica` nodes to ensure that the routing table is eventually consistent across all nodes in the cluster.
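To see the routing table in practice, you can query it from any node's CLI; a sketch (subcommand names can vary between releases, so check `emqx ctl help` on your version):

```bash
# After a client subscribes to t/1 on node emqx1, list the routes:
./bin/emqx ctl topics list
# Each entry maps a topic filter to the node(s) with subscribers,
# e.g. t/1 -> emqx1@127.0.0.1
```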

Let's dive into the details of how EMQX data consistency is achieved.

### Data Replication Channels

In an EMQX cluster, there are two data replication channels.

- Metadata replication, such as routing information on which (wildcard) topics are being subscribed to by which nodes.

- Message delivery, such as when forwarding messages from one node to another.

2 changes: 1 addition & 1 deletion en_US/gateway/ocpp.md
@@ -231,7 +231,7 @@ As the concept of username and password is already defined in the connection mes

The OCPP gateway uses the information in the Basic Authentication of the WebSocket handshake message to generate the authentication fields for the client:

- Client ID: Value of the part of the connection address after the fixed path prefix.
- Username: Value of the Username in the Basic Authentication.
- Password: Value of the Password in the Basic Authentication.
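For example, a hypothetical charge point `CP001` connecting with Basic Auth would yield Client ID `CP001` plus the supplied username and password. A sketch using `websocat`, where the listener port, path prefix, and credentials are all placeholders rather than documented defaults:

```bash
# Hypothetical values throughout; substitute your gateway's listener.
AUTH=$(printf '%s' 'user1:secret' | base64)   # -> Username "user1", Password "secret"
websocat --protocol ocpp1.6 \
  -H="Authorization: Basic $AUTH" \
  ws://127.0.0.1:33033/ocpp/CP001             # suffix after the prefix -> Client ID "CP001"
```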

Binary file added zh_CN/deploy/cluster/assets/EMQX_cluster.png
Binary file added zh_CN/deploy/cluster/assets/mnesia-cluster.png
25 changes: 25 additions & 0 deletions zh_CN/deploy/cluster/create-cluster.md
@@ -507,3 +507,28 @@ $ ./bin/emqx ctl cluster status
- TCP IPv6: `inet6_tcp`

To enable SSL, you first need to set `cluster.proto_dist` to `inet_tls`, then configure the `ssl_dist.conf` file in the `etc` folder and specify the TLS certificate. For details, see [Using TLS for Erlang Distribution](https://www.erlang.org/doc/apps/ssl/ssl_distribution.html).

## Local Testing: Pseudo-Distributed Cluster

EMQX supports a pseudo-distributed cluster feature for testing and development scenarios. A "pseudo-distributed cluster" refers to running multiple EMQX instances on the same machine, with each instance configured as a node in the cluster.

After starting the first node, use the following commands to start the second node and manually join it to the cluster. To avoid port conflicts, adjust the various listening ports:

```bash
EMQX_NODE__NAME='emqx2@127.0.0.1' \
EMQX_LOG__FILE_HANDLERS__DEFAULT__FILE='log2/emqx.log' \
EMQX_STATSD__SERVER='127.0.0.1:8124' \
EMQX_LISTENERS__TCP__DEFAULT__BIND='0.0.0.0:1882' \
EMQX_LISTENERS__SSL__DEFAULT__BIND='0.0.0.0:8882' \
EMQX_LISTENERS__WS__DEFAULT__BIND='0.0.0.0:8082' \
EMQX_LISTENERS__WSS__DEFAULT__BIND='0.0.0.0:8085' \
EMQX_DASHBOARD__LISTENERS__HTTP__BIND='0.0.0.0:18082' \
EMQX_NODE__DATA_DIR="./data2" \
./bin/emqx start

./bin/emqx ctl cluster join emqx1@127.0.0.1
```

The example above shows how to create a cluster manually. To learn how to create a cluster automatically, see the [auto clustering](#自动集群) section.

Note that the Dashboard is designed under the assumption that all cluster nodes use the same port number. Using different ports on the same machine may cause display issues in the Dashboard, so this approach is not recommended for production.