This page maps the ideas from the Double 11 architecture story to modern cloud-native equivalents. The goal is not “copy Alipay tooling,” but “copy the patterns.”
1) LDC / Unitization vs Kubernetes Multi-Cluster (Cells)
LDC / unitization is conceptually similar to cell-based architecture:
- Each cell/unit is self-contained for a slice of users/traffic.
- You scale by adding cells, not by stretching a shared core.
- Failures are contained where possible.
Modern equivalents:
- Kubernetes multi-cluster (or multi-namespace) with strict tenancy boundaries.
- Sharding strategy (by user-id, region, merchant, etc.).
- Traffic routing and locality policies (Gateway/API Gateway + service routing).
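A sharding strategy of this kind can be sketched as a stable mapping from user ID to cell. This is a minimal illustration, not how LDC itself routes traffic: the cell names, the slot count, and the function names are all hypothetical. It assumes an indirection table (user -> slot -> cell), so that rebalancing means reassigning slots rather than rehashing every user.

```python
import hashlib

# Hypothetical cell names; in practice these would come from configuration.
CELLS = ["cell-a", "cell-b", "cell-c", "cell-d"]
NUM_SLOTS = 1024  # fixed number of virtual slots (assumed value for illustration)


def slot_for_user(user_id: str) -> int:
    """Map a user ID to a stable slot.

    Uses SHA-256 rather than the built-in hash(), because hash() of a str
    varies between Python processes and would break routing stability.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLOTS


def build_slot_table(cells: list[str]) -> dict[int, str]:
    """Assign slots to cells round-robin.

    Moving load later means editing this table for a few slots,
    not changing the hash function.
    """
    return {slot: cells[slot % len(cells)] for slot in range(NUM_SLOTS)}


SLOT_TABLE = build_slot_table(CELLS)


def cell_for_user(user_id: str) -> str:
    """Return the cell that owns this user's traffic and data."""
    return SLOT_TABLE[slot_for_user(user_id)]
```

A gateway would call `cell_for_user(user_id)` once per request and forward to that cell's cluster; the same function can decide which data partition a user's records live in, which keeps routing and data placement aligned.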
When to use which:
- If you are building a new system: cell-based architecture on Kubernetes is a practical default.
- If you already have a large monolith: cell boundaries can be introduced gradually via routing + data partitioning.
2) OceanBase vs Modern Distributed Databases
OceanBase represents a class of systems: distributed SQL with strong correctness guarantees at scale.
Modern options (depending on constraints):
- TiDB: practical distributed SQL, often attractive for MySQL ecosystem teams.
- CockroachDB: strongly consistent, geo-distributed SQL design.
- Vitess: sharding for MySQL while keeping MySQL as the storage engine (different trade-offs).
Selection heuristics:
- Need cross-region consistency and well-defined transactional guarantees -> distributed SQL (CockroachDB/TiDB-style).
- Need to keep the MySQL operational model -> Vitess-style sharding (accept that some sharding complexity surfaces in the application).
- Need HTAP / mixed workloads -> evaluate TiDB / hybrid architectures.
3) RocketMQ vs Kafka vs Pulsar
At Double 11 scale, the MQ acts as both a reliability layer and a control plane. Modern choices usually come down to:
- Kafka: broad ecosystem, strong default choice for logs/streams; operational maturity matters.
- Pulsar: separation of storage/compute and multi-tenancy features; different operational model.
- RocketMQ: long production history in the Alibaba ecosystem, where it originated; its core concepts have close equivalents in other MQs.
What matters more than the brand:
- Partitioning strategy (avoid hot partitions).
- Consumer lag and recovery playbooks.
- Idempotency, DLQs, retry policies.
- Observability (lag, throughput, failure rates).
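The idempotency, retry, and DLQ items above can be sketched independently of any broker. The class and field names below are hypothetical, and the in-memory set and list stand in for what would be a durable deduplication store and a real dead-letter topic in Kafka, Pulsar, or RocketMQ.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    message_id: str  # assumed to be unique per logical message
    payload: dict


@dataclass
class Consumer:
    handler: Callable[[Message], None]
    max_attempts: int = 3
    processed_ids: set = field(default_factory=set)   # stand-in for a durable dedup store
    dead_letters: list = field(default_factory=list)  # stand-in for a DLQ topic

    def consume(self, msg: Message) -> str:
        # Idempotency: a redelivered message must not be applied twice.
        if msg.message_id in self.processed_ids:
            return "duplicate"

        # Bounded retries: try the handler a fixed number of times.
        for _ in range(self.max_attempts):
            try:
                self.handler(msg)
            except Exception:
                continue  # a real consumer would also back off and log here
            self.processed_ids.add(msg.message_id)
            return "processed"

        # Retries exhausted: park the message instead of blocking the partition.
        self.dead_letters.append(msg)
        return "dead-lettered"
```

In this sketch the ID is recorded only after the handler succeeds, so a crash between the two steps leads to a reprocess rather than a lost message; that is why the handler itself should also be safe to repeat.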
4) SOFA RPC vs gRPC (and modern RPC choices)
Modern default: gRPC (or another HTTP/2-based RPC framework) with:
- Strict contracts (protobuf/IDL).
- Built-in deadlines/timeouts.
- Interceptors for tracing/metrics/logging.
Key design guidance:
- Make failure semantics explicit (timeouts, retries, circuit breaking).
- Prevent retry storms.
- Standardize observability and rollout practices at the framework layer.
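One common way to make retries explicit and keep them from amplifying an outage is to combine an overall deadline, exponential backoff with jitter, and a shared retry budget. The sketch below is transport-agnostic and does not use any gRPC API; `RetryBudget`, `call_with_retries`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryBudget:
    """Token bucket shared by all callers of one downstream service.

    Every request deposits a fraction of a token; every retry spends a whole
    one. When the downstream is failing broadly, the bucket drains and
    retries stop, which caps the extra load retries can add.
    """

    def __init__(self, ratio: float = 0.1, max_tokens: float = 10.0) -> None:
        self.ratio = ratio
        self.max_tokens = max_tokens
        self.tokens = max_tokens

    def record_request(self) -> None:
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_spend(self) -> bool:
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def call_with_retries(
    fn: Callable[[], T],
    budget: RetryBudget,
    deadline_s: float = 1.0,
    max_attempts: int = 3,
    base_backoff_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    deadline = clock() + deadline_s
    budget.record_request()
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except Exception:
            # Full jitter: pick a random delay up to the exponential cap.
            backoff = random.uniform(0, base_backoff_s * 2 ** (attempt - 1))
            out_of_time = clock() + backoff >= deadline
            # try_spend() is evaluated last, so a token is only consumed
            # when a retry is actually going to happen.
            if attempt >= max_attempts or out_of_time or not budget.try_spend():
                raise
            sleep(backoff)
```

Putting this in a shared client library or interceptor, rather than in each service, is what keeps the behavior uniform. A production version would also retry only on errors known to be transient and only for idempotent calls.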
5) SOFAMesh vs Istio/Linkerd (Service Mesh)
A service mesh is a way to enforce policy and collect telemetry uniformly across services:
- mTLS, identity, and authorization.
- Traffic policies (timeouts, retries, outlier detection).
- Consistent telemetry.
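To make the outlier detection policy concrete, here is a simplified model of consecutive-error ejection. In a real mesh this logic lives in the proxy and is driven by configuration, not application code; the class name, parameters, and thresholds below are hypothetical, and the sketch omits details such as growing ejection times and limits on how many hosts may be ejected at once.

```python
import time
from typing import Callable


class OutlierDetector:
    """Temporarily removes hosts that fail several requests in a row."""

    def __init__(
        self,
        consecutive_errors: int = 5,
        ejection_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.consecutive_errors = consecutive_errors
        self.ejection_s = ejection_s
        self.clock = clock
        self._errors: dict[str, int] = {}            # host -> consecutive error count
        self._ejected_until: dict[str, float] = {}   # host -> time it may serve again

    def record(self, host: str, ok: bool) -> None:
        """Record the outcome of one request to a host."""
        if ok:
            self._errors[host] = 0  # any success resets the streak
            return
        self._errors[host] = self._errors.get(host, 0) + 1
        if self._errors[host] >= self.consecutive_errors:
            self._ejected_until[host] = self.clock() + self.ejection_s
            self._errors[host] = 0

    def healthy_hosts(self, hosts: list[str]) -> list[str]:
        """Return the hosts that are currently eligible for load balancing."""
        now = self.clock()
        return [h for h in hosts if self._ejected_until.get(h, 0.0) <= now]
```

The value of the mesh is that every service gets this behavior with the same thresholds and the same telemetry, without each team reimplementing it.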
Modern choices:
- Istio: feature-rich, complexity trade-off.
- Linkerd: simpler operational profile, strong for many teams.
Guidance:
- Adopt mesh to standardize security and traffic policy; avoid adopting it “just because.”
- Ensure teams have the operational discipline to run it (SLOs, upgrades, observability).
6) Risk Control Systems vs Modern ML Platforms
“Risk control” is not a single model. It is a system:
- Streaming feature pipelines
- Real-time inference
- Rules + ML layering
- Auditability and rapid policy updates
Modern building blocks:
- Stream processing (Kafka Streams, Flink, or equivalents)
- Feature stores / online feature serving
- Real-time model serving (strict latency budgets)
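The "rules + ML layering" idea, together with auditability, can be sketched as a decision function in which hard rules run first and a model score is consulted only if no rule fires. Everything here is hypothetical: the transaction shape, the example rule, the thresholds, and the `score_fn` parameter, which stands in for a call to a real-time model serving endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Transaction = dict  # hypothetical shape, e.g. {"amount": 120.0}
Rule = Callable[[Transaction], Optional[str]]  # returns a reason when the rule fires


@dataclass
class Decision:
    action: str                                  # "allow", "review", or "block"
    reasons: list = field(default_factory=list)  # audit trail of what drove the action


def amount_limit_rule(txn: Transaction) -> Optional[str]:
    # Hypothetical hard limit; in practice limits are policy data, not code.
    return "amount_over_hard_limit" if txn["amount"] > 50_000 else None


def decide(
    txn: Transaction,
    rules: list,
    score_fn: Callable[[Transaction], float],
    review_at: float = 0.5,
    block_at: float = 0.9,
) -> Decision:
    # Layer 1: deterministic rules. Cheap, explainable, quick to update.
    fired = []
    for rule in rules:
        reason = rule(txn)
        if reason is not None:
            fired.append(reason)
    if fired:
        return Decision("block", fired)

    # Layer 2: model score, mapped to an action by thresholds.
    score = score_fn(txn)
    reasons = [f"model_score={score:.2f}"]
    if score >= block_at:
        return Decision("block", reasons)
    if score >= review_at:
        return Decision("review", reasons)
    return Decision("allow", reasons)
```

Because the rules and thresholds are plain arguments, a policy update is a data change rather than a model retrain, and the `reasons` list gives every decision a record that can be audited later.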
7) Decision Framework: Build vs Adopt
If you are not operating at Alipay scale, prefer the following over building custom infrastructure:
- managed services,
- proven OSS components,
- a strong operational model (testing, observability, playbooks).
What you should copy from Alipay regardless of scale:
- cell/unit thinking
- deterministic readiness via full-link (end-to-end) testing
- explicit degrade strategies
- automation as a product
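An explicit degrade strategy means the fallback path is decided and tested in advance, and can be switched on deliberately before a peak rather than discovered during one. The following is a minimal sketch under that assumption; `DegradeSwitch`, `with_degrade`, and the feature name are hypothetical, and a real switch would be backed by a configuration service rather than an in-memory set.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DegradeSwitch:
    """Tracks which non-critical features are currently degraded."""

    def __init__(self) -> None:
        self._degraded: set[str] = set()

    def degrade(self, feature: str) -> None:
        self._degraded.add(feature)

    def restore(self, feature: str) -> None:
        self._degraded.discard(feature)

    def is_degraded(self, feature: str) -> bool:
        return feature in self._degraded


def with_degrade(
    switch: DegradeSwitch,
    feature: str,
    primary: Callable[[], T],
    fallback: Callable[[], T],
) -> T:
    # Planned degradation: skip the expensive path entirely.
    if switch.is_degraded(feature):
        return fallback()
    # Unplanned failure: fall back rather than fail the whole request.
    try:
        return primary()
    except Exception:
        return fallback()


switch = DegradeSwitch()
switch.degrade("recommendations")
items = with_degrade(
    switch,
    "recommendations",
    primary=lambda: ["personalized"],
    fallback=lambda: ["top-sellers"],
)
```

With the switch on, `items` comes from the fallback and the primary path is never called, which is what protects the core flow from load on a non-critical dependency.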