← Series hub ← PrevNext →

This page maps the ideas from the Double 11 architecture story to modern cloud-native equivalents. The goal is not “copy Alipay tooling,” but “copy the patterns.”

1) LDC / Unitization vs Kubernetes Multi-Cluster (Cells)

LDC / unitization is conceptually similar to cell-based architecture:

  • Each cell/unit is self-contained for a slice of users/traffic.
  • You scale by adding cells, not by stretching a shared core.
  • Failures are contained where possible.

Modern equivalents:

  • Kubernetes multi-cluster (or multi-namespace) with strict tenancy boundaries.
  • Sharding strategy (by user-id, region, merchant, etc.).
  • Traffic routing and locality policies (Gateway/API Gateway + service routing).

When to use which:

  • If you are building a new system: cell-based architecture on Kubernetes is a practical default.
  • If you already have a large monolith: cell boundaries can be introduced gradually via routing + data partitioning.

2) OceanBase vs Modern Distributed Databases

OceanBase represents a class of systems: distributed SQL with strong correctness guarantees at scale.

Modern options (depending on constraints):

  • TiDB: practical distributed SQL, often attractive for MySQL ecosystem teams.
  • CockroachDB: strongly consistent, geo-distributed SQL design.
  • Vitess: sharding for MySQL while keeping MySQL as the storage engine (different trade-offs).

Selection heuristics:

  • Need cross-region consistency and clear correctness -> distributed SQL (CockroachDB/TiDB-style).
  • Need to keep MySQL operational model -> Vitess-style sharding (accept application-level complexity).
  • Need HTAP / mixed workloads -> evaluate TiDB / hybrid architectures.

3) RocketMQ vs Kafka vs Pulsar

At Double 11 scale, MQ is a reliability and control plane. Modern choices usually come down to:

  • Kafka: broad ecosystem, strong default choice for logs/streams; operational maturity matters.
  • Pulsar: separation of storage/compute and multi-tenancy features; different operational model.
  • RocketMQ: strong history in certain ecosystems; comparable concepts exist across MQs.

What matters more than the brand:

  • Partitioning strategy (avoid hot partitions).
  • Consumer lag and recovery playbooks.
  • Idempotency, DLQs, retry policies.
  • Observability (lag, throughput, failure rates).

4) SOFA RPC vs gRPC (and modern RPC choices)

Modern default: gRPC (or HTTP/2-based RPC) with:

  • Strict contracts (protobuf/IDL).
  • Built-in deadlines/timeouts.
  • Interceptors for tracing/metrics/logging.

Key design guidance:

  • Make failure semantics explicit (timeouts, retries, circuit breaking).
  • Prevent retry storms.
  • Standardize observability and rollout practices at the framework layer.

5) SOFAMesh vs Istio/Linkerd (Service Mesh)

Service mesh is a method of enforcing policy and observability uniformly:

  • mTLS, identity, and authorization.
  • Traffic policies (timeouts, retries, outlier detection).
  • Consistent telemetry.

Modern choices:

  • Istio: feature-rich, complexity trade-off.
  • Linkerd: simpler operational profile, strong for many teams.

Guidance:

  • Adopt mesh to standardize security and traffic policy; avoid adopting it “just because.”
  • Ensure teams have the operational discipline to run it (SLOs, upgrades, observability).

6) Risk Control Systems vs Modern ML Platforms

“Risk control” is not a single model. It is a system:

  • Streaming feature pipelines
  • Real-time inference
  • Rules + ML layering
  • Auditability and rapid policy updates

Modern building blocks:

  • Stream processing (Kafka Streams/Flink equivalents)
  • Feature stores / online feature serving
  • Real-time model serving (strict latency budgets)

7) Decision Framework: Build vs Adopt

If you are not operating at Alipay scale, you should prefer:

  • managed services,
  • proven OSS components,
  • and a strong operational model (testing, observability, playbooks), over building custom infrastructure.

What you should copy from Alipay regardless of scale:

  • cell/unit thinking
  • deterministic readiness via full-link testing
  • explicit degrade strategies
  • automation as a product