Generative UI architecture brings a new horizon for user experience, but it is the worst nightmare for QA and DevOps teams.
How do you write an automated test script (E2E Test) for an interface when you don’t know what content the AI will generate beforehand? And how do you ensure the system doesn’t burn through API budgets when thousands of users ask the exact same question?
6.1. The Non-deterministic Hurdle in E2E Testing
In traditional (Deterministic) applications, a Cypress or Playwright test script usually looks like this:
- Type “Hanoi” into the Search box.
- Click “Search”.
expect(page.locator('.weather-title')).toHaveText('Hanoi Weather').
However, LLMs are non-deterministic. For the exact same command, today it might return {"title": "Hanoi Weather"}, and tomorrow it might return {"title": "Weather Report for the Capital Hanoi"}. Your static tests will fail continuously (Flaky tests).
Solution 1: Completely Isolate AI and UI During Testing
Golden rule: Never call a real LLM API in your UI E2E Tests. You must Mock the WebSocket Server to return a hardcoded JSON string (e.g., calling the Component Registry directly with a fake Payload). This proves that: “As long as the AI returns a valid A2UI JSON, my Frontend is guaranteed to render it correctly.” Testing the intelligence of the AI must be pushed to a different layer (LLM Evaluation), separate from UI Testing.
Note (For Full Integration Tests): If QA strictly requires testing the entire flow going through the Backend Agent, use the VCR / Cassette Recording technique (like the pollyjs library). The first test run will call the real LLM and “record” the JSON response. Subsequent CI/CD test runs will automatically “replay” that JSON cassette to maintain Determinism.
Solution 2: Property-Based Testing
Instead of checking for a specific text string (Exact Match), check the “Properties” of the Component.
- Wrong:
expect(page).toHaveText("Transfer $500 to user B") - Right:
expect(page.locator('form[data-testid="transfer-form"]')).toBeVisible()expect(page.locator('input[name="amount"]').inputValue()).toBeGreaterThan(0)expect(page.locator('button[type="submit"]')).toBeEnabled()
By testing the presence of structure rather than specific text, your tests will survive any phrasing changes made by the AI.
6.2. Semantic Caching at the Edge
Another major issue is cost and latency. If 1,000 users type “How to change password”, calling the OpenAI API 1,000 times is a horrific waste of both money and waiting time (latency).
What is Semantic Caching?
Traditional caches rely on Exact Matches. If user A types "Change password", and user B types "How do I change my password", a traditional cache will miss.
Semantic Caching solves this using Vector Databases:
- User inputs a question.
- Embed the question into a Vector.
- Compare the Vector distance against questions in the Cache.
"Change password"and"How do I change my password"have very high geometric similarity (Similarity > 0.95).- Cache HIT!
Edge Caching Architecture
To optimize latency down to milliseconds, the Semantic Cache should not be placed on the Backend Server, but at the Edge — for example, using Cloudflare Workers combined with Vectorize (Cloudflare’s Vector Database).
- The user’s request hits the Cloudflare Worker at the nearest Point of Presence (e.g., POP Singapore).
- The Worker embeds the prompt into a vector and queries the Cache.
- If HIT, the Worker immediately returns the A2UI JSON structure to the Client. Latency < 50ms. Zero cost for calling OpenAI.
- Only upon a MISS is the request routed back to the Backend Kubernetes cluster for the AI Agent to process.
🔗 Next Step: You have grasped all the architectural theories from UI, State, Security, to Caching. It’s time to start coding. In the final part of this Series, we will look at the directory structure of a Boilerplate Repo and the strategy for migrating it into a legacy project: Part 7 — Reference Repository & Migration Strategy.