Streaming UI with React Suspense: The Right Mental Model
Suspense lets you stream server-rendered HTML progressively — no client-side loading states needed. Here’s how it works and when to reach for it in AI-native apps.
May 11, 2026 · 5 min read
Most loading UIs follow the same pattern: fetch starts, spinner appears, data arrives, spinner disappears. It works, but it has a hidden cost — the browser receives a complete HTML shell with no useful content, then waits for JavaScript to hydrate and fetch before showing anything real. Suspense with server streaming flips that model entirely.
What streaming actually means
When Next.js renders a route, it can send the HTML response in chunks rather than waiting for all async work to complete. A <Suspense> boundary marks which parts of the tree can arrive late. The server sends the shell immediately — header, layout, static content — then flushes each Suspense boundary as its data resolves. The browser paints something useful in milliseconds, and the rest fills in without a full-page reload.
This is not JavaScript-driven lazy loading. The chunks arrive as raw HTML over the same HTTP response connection. React uses a small inline script to swap in the real content when each chunk lands — the browser has already rendered and painted the shell before any hydration runs.
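To make the flush order concrete, here is a simplified model of the chunks a streaming server might emit. It is a sketch, not React's actual wire format: the Boundary shape, the slot and template ids, the swap function named in the inline script, and the resolvesAtMs timings are all assumptions made up for illustration.

```typescript
// Simplified model of out-of-order HTML streaming.
// NOT React's real wire format: ids, markup, and swap() are hypothetical.
interface Boundary {
  id: string;           // hypothetical boundary id
  fallback: string;     // HTML sent in the shell while data is pending
  content: string;      // HTML that later replaces the fallback
  resolvesAtMs: number; // assumed time at which the data is ready
}

// Yields chunks in flush order: the shell first (with fallbacks in place),
// then one chunk per boundary, ordered by when its data resolves.
function* streamChunks(
  shell: (slots: string) => string,
  boundaries: Boundary[],
): Generator<string> {
  const slots = boundaries
    .map((b) => `<div id="slot-${b.id}">${b.fallback}</div>`)
    .join("");
  yield shell(slots);

  const byReadiness = [...boundaries].sort(
    (a, b) => a.resolvesAtMs - b.resolvesAtMs,
  );
  for (const b of byReadiness) {
    // Late content travels as inert HTML plus a tiny script that swaps it in.
    yield (
      `<template id="late-${b.id}">${b.content}</template>` +
      `<script>swap("slot-${b.id}","late-${b.id}")</script>`
    );
  }
}

const chunks = Array.from(
  streamChunks(
    (slots) => `<main><h1>Dashboard</h1>${slots}</main>`,
    [
      { id: "activity", fallback: "Loading activity", content: "<ul><li>Example event</li></ul>", resolvesAtMs: 800 },
      { id: "stats", fallback: "Loading stats", content: "<p>Example stats</p>", resolvesAtMs: 120 },
    ],
  ),
);
```

The point of the model is the ordering: the shell goes out with both fallbacks in place, and the faster boundary is flushed before the slower one regardless of where each sits in the tree.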
The code is simpler than you expect
Wrap any slow Server Component in <Suspense> with a fallback. React handles the rest:
    import { Suspense } from "react";
    import { RecentActivity } from "./RecentActivity"; // async Server Component
    // Assumed local modules; adjust the paths to match your project.
    import { QuickStats } from "./QuickStats";
    import { ActivitySkeleton } from "./ActivitySkeleton";

    export default function DashboardPage() {
      return (
        <main>
          <h1>Dashboard</h1>
          {/* Renders immediately: no data dependency */}
          <QuickStats />
          {/* Streams in when the DB query in RecentActivity resolves */}
          <Suspense fallback={<ActivitySkeleton />}>
            <RecentActivity />
          </Suspense>
        </main>
      );
    }

RecentActivity is a plain async Server Component that awaits a database query. No useEffect, no client state, no manual fetch. The component suspends while awaiting, and the fallback skeleton holds its place in the HTML until the real content is ready.
Why this matters for AI products
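LLM inference is slow by nature: a typical generation can take 2–10 seconds. Without streaming, users stare at a blank panel or a spinner for that entire window. The same progressive-delivery idea applies here, although the mechanism is a streamed API response rather than a Suspense boundary. With the Vercel AI SDK's streamText on the server and useCompletion on the client, you can start flushing tokens to the UI as soon as the first one arrives: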
    // app/api/generate/route.ts
    import { streamText } from "ai";
    import { openai } from "@ai-sdk/openai";

    export async function POST(req: Request) {
      const { prompt } = await req.json();
      const result = streamText({
        model: openai("gpt-4o-mini"),
        prompt,
      });
      // Method name as of AI SDK 4.x; check the docs for your installed version.
      return result.toDataStreamResponse();
    }

    // components/GenerateButton.tsx
    "use client";
    import { useCompletion } from "@ai-sdk/react";

    export function GenerateButton({ prompt }: { prompt: string }) {
      // completion holds the text received so far and updates as tokens arrive.
      const { completion, complete, isLoading } = useCompletion({ api: "/api/generate" });
      return (
        <div>
          <button onClick={() => complete(prompt)} disabled={isLoading}>
            {isLoading ? "Generating…" : "Generate"}
          </button>
          <p>{completion}</p>
        </div>
      );
    }

The server streams tokens; useCompletion appends them as they arrive. From the user's perspective, output starts appearing as soon as the first token lands rather than all at once after a long wait. Perceived latency drops even though total generation time is identical.
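The "append as they arrive" behavior can be modeled without any SDK. The sketch below is a conceptual stand-in, not the AI SDK's implementation: fakeTokenStream, accumulate, and the hard-coded tokens are hypothetical names and data chosen for illustration.

```typescript
// Conceptual model only: not how the AI SDK is implemented.
// The token source is a hard-coded stand-in for a model response.
async function* fakeTokenStream(): AsyncGenerator<string> {
  for (const token of ["Stream", "ing ", "feels ", "faster."]) {
    yield token;
  }
}

// Appends each token as it arrives and reports the running text.
// Reporting on every token is what lets a UI repaint before the
// generation has finished.
async function accumulate(
  tokens: AsyncIterable<string>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  let completion = "";
  for await (const token of tokens) {
    completion += token;
    onUpdate(completion);
  }
  return completion;
}

// Record every intermediate state the UI would have been able to show.
const snapshots: string[] = [];
const done = accumulate(fakeTokenStream(), (text) => {
  snapshots.push(text);
});
```

Each entry in snapshots is a state the user could already be reading while later tokens are still being generated; a buffered response would expose only the final one.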
Nesting boundaries for granular control
You can nest <Suspense> boundaries to reveal content in stages. A page might stream in the hero section, then a data table, then related recommendations — each on its own timeline, each with its own skeleton. The user always has something to look at.
One practical rule: put a boundary around each independently fetching subtree, not around the entire page. With a single top-level boundary, everything inside it waits for the slowest query, and the user sees only the fallback until then. Granular boundaries let fast data show up fast.
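The difference is easy to see with a little arithmetic. The sketch below assumes three sibling sections with invented query durations; the section names and numbers are hypothetical, and the model ignores nesting, where an inner boundary cannot be revealed before its parent.

```typescript
// Hypothetical query durations in ms; names and numbers are made up.
const queryMs: Record<string, number> = {
  hero: 80,
  table: 600,
  recommendations: 2400,
};

// One boundary around everything: no section is revealed until the
// slowest query inside the boundary has finished.
function revealTimesSingleBoundary(
  durations: Record<string, number>,
): Record<string, number> {
  const slowest = Math.max(...Object.values(durations));
  const revealAt: Record<string, number> = {};
  for (const name of Object.keys(durations)) {
    revealAt[name] = slowest;
  }
  return revealAt;
}

// One boundary per sibling section: each is revealed when its own
// query finishes.
function revealTimesGranular(
  durations: Record<string, number>,
): Record<string, number> {
  return { ...durations };
}

const single = revealTimesSingleBoundary(queryMs);
const granular = revealTimesGranular(queryMs);
```

Under these assumptions the hero section appears at 2400 ms with a single boundary and at 80 ms with its own boundary, while the slowest section appears at the same time either way.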
When not to use Suspense
Suspense boundaries add a small coordination overhead. For data that resolves in under ~100 ms, a skeleton flash can feel worse than a brief wait. Measure first. For navigation-critical data (the content a user clicked to see), consider awaiting it before the route renders, with independent requests issued in parallel, rather than streaming it in afterward. Streaming is a tool for slow, independent data; don't reach for it by default on every component.
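One way to fetch navigation-critical data before the route renders is to start the independent requests together and await them as a group. This is a minimal sketch under stated assumptions: getArticle, getAuthor, and loadArticlePage are hypothetical helpers invented for illustration, standing in for real queries.

```typescript
// Hypothetical loaders standing in for real database or API calls.
async function getArticle(id: string): Promise<{ id: string; title: string }> {
  return { id, title: `Article ${id}` };
}

async function getAuthor(articleId: string): Promise<{ name: string }> {
  return { name: `Author of article ${articleId}` };
}

// Both requests are started before either is awaited, so the total wait
// is roughly the slower of the two rather than their sum.
async function loadArticlePage(id: string) {
  const [article, author] = await Promise.all([getArticle(id), getAuthor(id)]);
  return { article, author };
}

const pageData = loadArticlePage("42");
```

In an async page component, awaiting a helper like this outside any Suspense boundary would keep the critical content in the first flush, at the cost of delaying that flush until both requests finish.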
The key shift
The mental model change is worth internalizing: the HTTP response is no longer atomic. It starts the moment the first byte is ready and continues until the last Suspense boundary resolves. Your UI architecture should reflect that — design pages as a composition of fast shells and slower fills, and let the browser paint progressively rather than all at once.