Phase 2: Core Architecture (LDC, Unitization, Multi-Active)

This phase focuses on the architecture that enables peak scale while preserving correctness and operational control.

2.1 LDC and Unitization (Cell Architecture)

The core idea: a “unit”

A unit is a self-contained slice of the system that can handle end-to-end business flows for a subset of users/traffic.

Complete in services: the unit has the full set of required services.
Partial in data: data is sharded so each unit owns a subset (e.g., by user-id range).

The key goal is horizontal scaling with isolation: add units to add capacity, and contain failures within a unit when possible.
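The "partial in data" rule can be sketched as a routing function. This is a minimal illustration in Python, assuming a hypothetical layout of 100 logical shards split across three units and a shard key taken from the last two digits of the user id; the unit names, the ranges, and the key derivation are all assumptions, not details from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    shard_lo: int  # inclusive lower bound of the shard keys this unit owns
    shard_hi: int  # inclusive upper bound of the shard keys this unit owns


# Hypothetical layout: 100 logical shards split across three units.
UNITS = [
    Unit("unit-a", 0, 33),
    Unit("unit-b", 34, 66),
    Unit("unit-c", 67, 99),
]


def shard_key(user_id: int) -> int:
    """Derive the shard key from the last two digits of the user id (an assumption)."""
    return user_id % 100


def unit_for_user(user_id: int, units=UNITS) -> Unit:
    """Return the unit that owns this user's data shard."""
    key = shard_key(user_id)
    for unit in units:
        if unit.shard_lo <= key <= unit.shard_hi:
            return unit
    raise LookupError(f"no unit owns shard key {key}")
```

Under these assumptions, adding capacity means splitting a shard range and handing part of it to a new unit; the routing function itself does not change, only the table it reads.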

LDC zones (conceptual model)

LDC (Logical Data Center) is easiest to understand as three “zones”:

RZone (Regional / Unit Zone): the main workhorse units (multiple active).
GZone (Global): truly global, low-frequency shared data/control (minimize scope).
CZone (City / Common hot data): shared hot data needed frequently across units (optimize latency).

Conceptual sketch:

┌─────────────────────────────────────────────────────────────┐
│                      LDC Architecture                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐       │
│   │   RZone 1   │   │   RZone 2   │   │   RZone N   │       │
│   │  (Region)   │   │  (Region)   │   │  (Region)   │       │
│   ├─────────────┤   ├─────────────┤   ├─────────────┤       │
│   │ • App Layer │   │ • App Layer │   │ • App Layer │       │
│   │ • Data Shard│   │ • Data Shard│   │ • Data Shard│       │
│   │ • Full Svcs │   │ • Full Svcs │   │ • Full Svcs │       │
│   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘       │
│          │                 │                 │              │
│          └─────────────────┼─────────────────┘              │
│                            │                                │
│                  ┌─────────┴─────────┐                      │
│                  │                   │                      │
│           ┌──────┴──────┐     ┌──────┴──────┐               │
│           │   GZone     │     │   CZone     │               │
│           │  (Global)   │     │  (City)     │               │
│           ├─────────────┤     ├─────────────┤               │
│           │ • Config    │     │ • User Info │               │
│           │ • CIF       │     │ • Login     │               │
│           │ • Shared    │     │ • Frequent  │               │
│           └─────────────┘     └─────────────┘               │
└─────────────────────────────────────────────────────────────┘
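The three-zone split can also be read as a data-placement rule. The Python sketch below is a simplification inferred from the zone descriptions above, not a documented algorithm: the two boolean criteria and the function name `place` are assumptions chosen for illustration.

```python
from enum import Enum


class Zone(Enum):
    RZONE = "rzone"  # sharded, unit-owned data
    GZONE = "gzone"  # global, low-frequency shared data
    CZONE = "czone"  # shared hot data read often across units


def place(sharded_by_user: bool, read_often_across_units: bool) -> Zone:
    """Pick a zone for a class of data using two simplified criteria."""
    if sharded_by_user:
        # Data that can be partitioned by user belongs to the owning unit.
        return Zone.RZONE
    if read_often_across_units:
        # Shared data on the hot path is kept close to the units for latency.
        return Zone.CZONE
    # Everything else is global; keeping this set small limits shared dependencies.
    return Zone.GZONE
```

The order of the checks mirrors the intent of the model: prefer unit-local placement first, and fall back to the global zone only when nothing else fits.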

Why this matters

Problem	Unitization/LDC response
One shared core becomes a bottleneck	Split state and traffic into multiple units
One failure can cascade globally	Localize blast radius to a unit when possible
Scaling requires risky “bigger DB” moves	Scale by adding units + partitions
Cross-region latency and DR are hard	Multi-active by design, not an afterthought

2.2 Database Layer: OceanBase (Why the DB became a pillar)

At peak scale, the database is not a component; it is a platform. The architectural thesis is:

If the database cannot scale horizontally with strong correctness, the rest of the architecture becomes fragile.
Financial systems require durable correctness under concurrency, not just throughput benchmarks.

OceanBase is often described as a distributed SQL system that:

Partitions data into table partitions / shards.
Replicates partitions across zones for high availability.
Uses consensus to maintain correctness under failures.

The architectural implication: “scale out” becomes normal for the database, not a once-a-year emergency.
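The replication and consensus points above rest on majority-quorum arithmetic. The Python sketch below shows the generic rule used by majority-based protocols such as Paxos and Raft; it is not OceanBase's implementation, and the function names are made up for illustration.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


def is_committed(acks: int, replica_count: int) -> bool:
    """A write counts as committed once a majority of replicas acknowledged it."""
    return acks >= majority(replica_count)


def tolerated_failures(replica_count: int) -> int:
    """How many replicas can be lost while a majority is still reachable."""
    return replica_count - majority(replica_count)
```

With three replicas a write needs two acknowledgements and one replica can fail; with five replicas it needs three and two can fail. This is why replicas are usually spread over an odd number of zones.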

2.3 Messaging and Asynchronous Boundaries

Large peak systems rely on message-driven decoupling:

Soften coupling between services during spikes.
Buffer bursty workloads.
Enable degrade strategies: keep the core payment path clean while deferring non-critical work.
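A degrade strategy of this kind can be sketched as a dispatcher that always runs critical work and defers the rest while a degraded flag is set. This is a minimal in-memory illustration in Python; the task names, the `CRITICAL` set, and the `Dispatcher` class are hypothetical, and a real system would put the deferred work on a durable message queue rather than a local deque.

```python
from collections import deque

# Hypothetical names for tasks on the core payment path.
CRITICAL = {"authorize_payment", "debit_account"}


class Dispatcher:
    def __init__(self):
        self.degraded = False    # switched on by operators or automation at peak
        self.executed = []       # tasks that ran, in order
        self.deferred = deque()  # non-critical tasks held back during degradation

    def submit(self, task: str) -> str:
        """Run critical tasks always; defer non-critical tasks while degraded."""
        if task in CRITICAL or not self.degraded:
            self.executed.append(task)
            return "executed"
        self.deferred.append(task)
        return "deferred"

    def drain(self) -> int:
        """Leave degraded mode and run the deferred tasks; returns how many ran."""
        self.degraded = False
        count = 0
        while self.deferred:
            self.executed.append(self.deferred.popleft())
            count += 1
        return count
```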

At this scale, messaging is typically treated as a reliability layer:

Clear topic/event naming and contracts.
Back-pressure and throttling strategies.
Retries, dead-letter queues (DLQs), and idempotency by design.
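The last point in the list above can be made concrete with a small consumer sketch in Python. It assumes each message is a dict carrying a unique `id` field and keeps its state in memory; the class name, the return strings, and the in-memory stores are illustrative assumptions, and a production consumer would persist the processed ids and use the broker's own retry and dead-letter facilities.

```python
class IdempotentConsumer:
    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.processed_ids = set()  # ids already handled successfully
        self.dead_letters = []      # messages that exhausted their retries

    def consume(self, message: dict) -> str:
        msg_id = message["id"]
        # Idempotency: a redelivered message must not be applied twice.
        if msg_id in self.processed_ids:
            return "duplicate"
        # Retry: give the handler a bounded number of attempts.
        for _ in range(self.max_attempts):
            try:
                self.handler(message)
            except Exception:
                continue
            self.processed_ids.add(msg_id)
            return "processed"
        # Dead-letter: park the message for inspection instead of blocking the queue.
        self.dead_letters.append(message)
        return "dead-lettered"
```

The bounded retry loop is what keeps a poisoned message from stalling everything behind it, and the id check is what makes at-least-once delivery safe for the handler.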

2.4 Reliability Patterns (What keeps peaks boring)

Regardless of the exact implementation, the patterns are recognizable in most peak systems:

Multi-active: multiple active regions/cells; avoid single-region dependency.
Traffic routing: route requests to the correct unit; enforce locality.
Circuit breakers / throttling / degradation: preserve the payment core by shedding optional load.
Isolation by design: separate critical paths from non-critical paths.
Operational determinism: everything above is validated via full-link testing (Phase 3).
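Of the patterns above, the circuit breaker is the easiest to show in a few lines. The Python sketch below is a simplified version, assuming a consecutive-failure threshold and a fixed cool-down; the class, its parameters, and the injectable `clock` are illustrative choices rather than any particular library's API.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds before trial calls
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None           # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a call to the dependency may be attempted now."""
        if self.opened_at is None:
            return True
        # After the cool-down, calls are let through again as trials.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) the breaker at the threshold."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

Callers check `allow()` before calling an optional dependency and fall back to a degraded response when it returns False, which keeps a failing non-critical service from consuming threads needed by the payment core.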

2.5 Cloud-native evolution (architecture → efficiency)

Once the architecture supports horizontal scaling and isolation, cloud-native adoption can improve:

Elastic capacity (shorter provisioning windows).
Operational automation (repeatable environments and deploys).
Cost per transaction (efficiency becomes measurable and improvable).
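Elastic capacity and cost per transaction both reduce to simple arithmetic once units have a known throughput. The Python sketch below assumes each unit sustains a fixed number of transactions per second and is run at a target utilization to leave headroom; the function names and the default of 0.7 are assumptions for illustration, not figures from any real system.

```python
import math


def units_needed(peak_tps: float, per_unit_tps: float, target_utilization: float = 0.7) -> int:
    """Number of units required to serve peak_tps while leaving headroom."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_tps = per_unit_tps * target_utilization
    return math.ceil(peak_tps / usable_tps)


def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Average infrastructure cost attributed to one transaction."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_cost / transactions
```

For example, at a target utilization of 0.5, a peak of 100,000 TPS with units rated at 10,000 TPS each needs 20 units. Shortening the provisioning window matters because those units only need to exist around the peak, which lowers the total cost in the second function.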

Key Takeaways

Unitization is the scaling unlock: it turns vertical ceilings into horizontal growth.
The database must be designed for peak correctness: correctness and durability are part of the product.
Messaging is a reliability primitive: beyond making work asynchronous, it is how load is buffered, throttled, and deferred at peak.
Architecture only works when operations are deterministic: that’s Phase 3.
