In one sentence
The master-data API is expensive: every call fetches ~1,000 records through the Payload ORM, which uses enough CPU and memory to starve the rest of the system. While a master-data pull is running, other API requests wait for resources and start timing out, which is why opening the landing page while a master-data request is in flight produces seemingly random 500 errors on unrelated pages.
What actually happens
Today — the broken path
flowchart TD
C[Client requests master data] --> ORM[Payload ORM fetches ~1,000 records]
ORM --> SPIKE[Heavy CPU + memory spike<br/>relations, validation, mapping]
SPIKE --> LOCK[Shared resources exhausted<br/>DB connections / event loop / memory]
LOCK --> STARVE[Other in-flight requests starved]
STARVE --> ERR[Landing page & unrelated pages<br/>time out → random 500 errors]
classDef bad fill:#fee2e2,stroke:#ef4444,color:#7f1d1d;
classDef neutral fill:#f1f5f9,stroke:#94a3b8,color:#334155;
class C,ORM neutral;
class SPIKE,LOCK,STARVE,ERR bad;
A master-data pull comes in
The client (WPA / landing) requests master data. Each request loads ~1,000 records through the Payload ORM.
The ORM layer makes it heavy
Going through the ORM means relation hydration, validation, and object mapping for every one of those 1,000 rows, which adds up to a large CPU and memory spike per request.
The system gets locked / starved
That spike consumes shared resources (DB connections, event loop, memory), so other in-flight requests have nothing left to run on.
Other pages time out → random 500s
Unrelated requests (e.g. opening the landing page) exceed their timeout and fail with a 500 — even though nothing is wrong with those pages. The failures look random because they only happen while a master-data pull is running.
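One of the mechanisms named above, event-loop starvation, can be reproduced in a few lines. This is a deliberately simplified model, not Payload code: `hydrateRecord` and the record shape are hypothetical stand-ins for the per-row ORM work. A timer callback cannot run until the current synchronous code returns, so the simulated landing-page handler is stuck behind the master-data pull.

```typescript
// Illustrative only: simulates synchronous per-row work blocking the event loop.
// `hydrateRecord` is a hypothetical stand-in for ORM hydration/validation/mapping.
type RawRecord = { id: number; payload: string };

function hydrateRecord(row: RawRecord): { id: number; fields: string[] } {
  // Fake CPU-bound work: split and re-map the payload.
  return {
    id: row.id,
    fields: row.payload.split(",").map((f) => f.trim().toUpperCase()),
  };
}

const events: string[] = [];

function handleLandingPage(): void {
  events.push("landing page served");
}

function handleMasterDataPull(rows: RawRecord[]): number {
  const hydrated = rows.map((row) => hydrateRecord(row)); // runs to completion first
  events.push("master-data pull finished");
  return hydrated.length;
}

// A landing-page request arrives first and is queued on the event loop...
setTimeout(handleLandingPage, 0);

// ...but a synchronous 1,000-row pull starts before the loop gets a turn.
const rows: RawRecord[] = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  payload: "a, b, c",
}));
const hydratedCount = handleMasterDataPull(rows);

// Here `events` holds only the master-data entry; the landing-page callback
// runs only after this synchronous code has returned.
console.log(hydratedCount, events);
```

Real ORM work is partly asynchronous, so actual starvation is less absolute than in this model, but the queuing effect is of the same kind.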
The fix
Two changes
Skip the ORM, and stop recomputing the same data
Query data directly with Drizzle
Instead of going through the Payload ORM layer, query the database directly with Drizzle. This removes the per-row hydration/validation overhead, so a master-data fetch becomes a cheap, fast SQL read instead of a heavy ORM operation.
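A minimal sketch of the direct-read path, under stated assumptions. The row shape `MasterDataRow`, its column names, and `fetchRows` are hypothetical; `fetchRows` stands in for a Drizzle select that lists only the needed columns and returns plain objects (something like `db.select({ ... }).from(table)`), so the snippet runs without a database.

```typescript
// Hypothetical row shape: the real master-data columns are not given in this document.
interface MasterDataRow {
  id: number;
  code: string;
  label: string;
}

// Stand-in for the Drizzle query. In a real setup this would wrap a select
// that names only the needed columns and returns plain objects.
type FetchRows = () => Promise<MasterDataRow[]>;

interface MasterDataFile {
  body: string; // serialized JSON, ready to upload or return
  rowCount: number;
}

async function buildMasterDataFile(fetchRows: FetchRows): Promise<MasterDataFile> {
  const rows = await fetchRows();
  // Plain rows go straight to JSON: no relation hydration, no per-row validation.
  const body = JSON.stringify({ records: rows });
  return { body, rowCount: rows.length };
}

// Usage with an in-memory fake instead of a database.
const fakeFetchRows: FetchRows = async () => [
  { id: 1, code: "A", label: "Alpha" },
  { id: 2, code: "B", label: "Beta" },
];

buildMasterDataFile(fakeFetchRows).then((file) => {
  console.log(file.rowCount, file.body.length);
});
```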
Cache responses as files in Azure Blob Storage
Build the master-data response once and store it as a file in Azure Blob Storage. On subsequent requests, the API just returns a pre-signed Azure Blob URL (in Azure terms, a shared access signature, or SAS, URL); the client downloads straight from Blob Storage and the API does no heavy processing at all. The cache is invalidated and rebuilt only when the underlying master data actually changes.
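The cache-or-build decision can be sketched as follows. Everything here is an assumption for illustration: the `BlobStore` interface, the versioned blob naming scheme, and `MasterDataCache` are hypothetical and are not the Azure SDK's API. In a real implementation `BlobStore` would wrap the Azure Blob Storage client and `signedUrl` would issue a short-lived signed URL; the in-memory version below only lets the logic run.

```typescript
// Hypothetical abstraction over blob storage; NOT the Azure SDK API.
interface BlobStore {
  exists(name: string): Promise<boolean>;
  upload(name: string, body: string): Promise<void>;
  signedUrl(name: string): Promise<string>; // a short-lived signed URL in practice
}

// In-memory fake so the sketch runs without Azure.
class InMemoryBlobStore implements BlobStore {
  private blobs = new Map<string, string>();
  async exists(name: string): Promise<boolean> {
    return this.blobs.has(name);
  }
  async upload(name: string, body: string): Promise<void> {
    this.blobs.set(name, body);
  }
  async signedUrl(name: string): Promise<string> {
    return `https://example.invalid/${name}?sig=fake`;
  }
}

class MasterDataCache {
  private version = 1; // bumped whenever master data is edited
  buildCount = 0; // exposed only to make the behaviour observable
  private readonly store: BlobStore;
  private readonly build: () => Promise<string>;

  constructor(store: BlobStore, build: () => Promise<string>) {
    this.store = store;
    this.build = build; // e.g. a direct-query builder returning serialized JSON
  }

  // Called from whatever hook fires when master data changes (hypothetical).
  invalidate(): void {
    this.version += 1;
  }

  async getUrl(): Promise<string> {
    const name = `master-data/v${this.version}.json`;
    if (!(await this.store.exists(name))) {
      // Cache miss: only on the first request or after an edit.
      this.buildCount += 1;
      await this.store.upload(name, await this.build());
    }
    return this.store.signedUrl(name);
  }
}
```

The version counter lives in memory here for brevity; with more than one app instance it would presumably need to live in shared storage so that all instances agree on the current file.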
After the fix — the fast path
flowchart TD
C[Client requests master data] --> API{Cache fresh<br/>in Azure Blob?}
API -->|Yes, almost always| URL[API returns pre-signed Blob URL<br/>no ORM, no heavy work]
URL --> DL[Client downloads file<br/>directly from Azure Blob Storage]
DL --> FREE[App server stays free<br/>→ no 500s on other pages]
API -->|No, only when data changed| BUILD[Drizzle direct query builds file once]
BUILD --> STORE[Store file in Azure Blob + bump version]
STORE --> URL
W[Master data is edited] -.invalidate.-> API
classDef good fill:#dcfce7,stroke:#22c55e,color:#14532d;
classDef neutral fill:#f1f5f9,stroke:#94a3b8,color:#334155;
classDef build fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a;
class C,API neutral;
class URL,DL,FREE good;
class BUILD,STORE,W build;
The expensive build runs once per data change (the dashed path), not once per request. Almost every request takes the green path and never touches the ORM.
Request flow after the fix
- ✓ Client asks for master data → API returns a pre-signed Azure Blob URL (no 1,000-row fetch, no ORM work).
- ✓ Client downloads the cached file directly from Azure Blob Storage, not from our app server.
- ✓ The app server stays free for everything else → no more resource starvation, no more random 500s on the landing page.
- ✓ Cache rebuilds only when master data changes; heavy work happens once per change, not once per request.
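From the client's point of view, the flow above is two steps: ask the API for a URL, then download the file from Blob Storage. The sketch below assumes a hypothetical endpoint `/api/master-data` and a hypothetical response shape `{ url: string }`; neither is specified in this document. The HTTP function is injected so the example runs without a network.

```typescript
// Minimal response surface used by the sketch (a subset of the Fetch API's Response).
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpGet = (url: string) => Promise<HttpResponse>;

async function loadMasterData(httpGet: HttpGet, apiUrl: string): Promise<unknown> {
  // Step 1: the API returns only a signed URL, not the data itself.
  const apiRes = await httpGet(apiUrl);
  if (!apiRes.ok) throw new Error(`API request failed: ${apiRes.status}`);
  const { url } = (await apiRes.json()) as { url: string };

  // Step 2: the file is downloaded from Blob Storage, bypassing the app server.
  const blobRes = await httpGet(url);
  if (!blobRes.ok) throw new Error(`Blob download failed: ${blobRes.status}`);
  return blobRes.json();
}

// Fake transport: records which URLs were requested and returns canned bodies.
const requested: string[] = [];
const fakeGet: HttpGet = async (url) => {
  requested.push(url);
  const body = url.startsWith("/api/")
    ? { url: "https://example.invalid/master-data/v1.json?sig=fake" }
    : { records: [{ id: 1 }] };
  return { ok: true, status: 200, json: async () => body };
};

loadMasterData(fakeGet, "/api/master-data").then((data) => {
  console.log(JSON.stringify(data), requested.length);
});
```

In a browser, `httpGet` could simply be `(url) => fetch(url)`.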
Effort estimate
Landing-page side only
| Task | Hours |
|---|---|
| Rewrite master-data logic to query directly with Drizzle (drop the ORM layer) | 2h |
| Set up Azure Storage for staging | 2h |
| Implement cache logic + cache-invalidation logic | 8h |
| Set up Azure Storage for production environment | 1h |
| Total | ~13h |
This estimate covers the landing-page / API side only. It does not include the effort needed on the WPA side to consume the new pre-signed-URL response — that will be estimated and owned separately.