AI
Streaming chat and AI building blocks.
SyntaxKit ships a complete org-scoped streaming chat at /dashboard/ai-chat on top of the Vercel AI SDK and the Vercel AI Gateway. Everything routes through one adapter, gateway(modelId), so swapping providers is a string change rather than a refactor. The @syntaxkit/ui/components/ai-elements/* package gives you the same composer, transcript, attachments, reasoning, sources, model picker, and mic button to drop into any new AI feature.
We pick the AI Gateway over per-provider SDKs because it removes the "N keys for N providers" problem: one Vercel key, dozens of model ids, one billing surface, one place to switch when a new model lands.
How a chat turn flows
useChat, the oRPC streaming transport, and where persistence happens.
The AI SDK and Gateway
One adapter, one key, model-id strings, and the default model.
AI Elements: building blocks
The reusable transcript, composer, and streaming primitives.
Add a new AI feature
Reuse the handler shape for any streaming or one-shot feature.
How A Chat Turn Flows
The unusual bit is the transport. The kit pairs @ai-sdk/react's useChat with a custom transport.sendMessages that calls the oRPC streaming procedure client.chat.send and pipes the result through eventIteratorToUnproxiedDataStream from @orpc/client. The server returns streamToEventIterator(result.toUIMessageStream({ sendReasoning: true, sendSources: true })) so reasoning and sources arrive as first-class UI message parts. Persistence is split around the stream: the AiUsageEvent reservation and the user message are written before the stream opens, and the assistant message lands in onFinish once tokens stop flowing.
The diagram source lives at apps/docs/diagrams/ai-chat-flow.mmd. Rerun pnpm --filter @syntaxkit/docs diagrams:build after editing it to refresh both SVG variants.
Package Layout
What's Wired In
| Capability | How it's enabled |
|---|---|
| Streaming chat with reasoning + sources | result.toUIMessageStream({ sendReasoning: true, sendSources: true }) in chat.send |
| Image attachments | Browser presigns via storage.presign, PUTs the bytes, calls storage.finalize; the server enforces ownership via assertOwnedAttachmentParts (key prefix images/<userId>/) |
| Web search | Pro-only; flips the model to perplexity/sonar and disables image attachments for that turn |
| Multi-model selection | Pro-only; free plan is locked to CHAT_DEFAULT_MODEL_ID |
| Voice transcription | <SpeechInput> uses the browser's Web Speech API, or hands a recorded Blob to your onAudioRecorded callback |
| Regenerate | chat.regenerate deletes the trailing assistant turn(s) and replaces with a fresh stream |
| Auto-titled chats | First-turn onFinish calls generateText({ model: gateway("openai/gpt-4o-mini") }) and writes Chat.title once |
| Cursor-paginated chat list + search | chat.list (cursor) + chat.search (case-insensitive title and content) |
| Cursor-paginated message history | chat.get returns the latest CHAT_DEFAULT_MESSAGES_PAGE_SIZE messages plus a nextCursor; chat.listMessages pages older messages from that cursor so a long chat can never produce an unbounded payload |
| Per-plan monthly response cap | reserveAiUsageEvent atomically counts and inserts an AiUsageEvent row under a per-organization Postgres advisory lock, so concurrent requests can't overshoot the cap |
| Abuse protection | Sliding-window Upstash limits keyed by userId + organizationId for the chat.send and chat.regenerate surfaces |
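The cursor-paginated message history row above can be sketched without a database. This is a minimal in-memory illustration, not the kit's implementation: the MessageRow shape, the numeric id used as a cursor, and the pageMessages name are all assumptions made for the example.

```typescript
// Hypothetical row shape; the kit's real Message model carries more fields.
interface MessageRow {
  id: number; // assumed to increase over time, so it can double as a cursor
  role: "user" | "assistant";
  content: string;
}

interface MessagePage {
  messages: MessageRow[]; // oldest-first within the page
  nextCursor: number | null; // pass back to fetch the next (older) page
}

// Return up to `limit` messages older than `cursor` (or the latest ones when
// no cursor is given), plus a cursor when older rows remain.
function pageMessages(all: MessageRow[], limit: number, cursor?: number): MessagePage {
  const older = all
    .filter((m) => cursor === undefined || m.id < cursor)
    .sort((a, b) => b.id - a.id); // newest first
  const slice = older.slice(0, limit);
  const hasMore = older.length > limit;
  const oldest = slice[slice.length - 1];
  return {
    messages: slice.reverse(), // flip back to chronological order
    nextCursor: hasMore && oldest !== undefined ? oldest.id : null,
  };
}
```

Because every call is bounded by limit, the payload size stays fixed no matter how long the chat grows; the client keeps requesting older pages until nextCursor is null.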
The AI SDK And Gateway
Two design choices to call out before wiring:
| Choice | Why |
|---|---|
| Vercel AI SDK over per-provider SDKs | One streaming primitive (streamText), one prompt format (ModelMessage), one tool-calling protocol. Nothing in the kit's chat handler is OpenAI-specific. |
| AI Gateway over direct provider keys | One key (AI_GATEWAY_API_KEY), one billing surface, model-id strings like openai/gpt-5.2, anthropic/claude-haiku-4.5, google/gemini-3-flash, xai/grok-4.1-fast-non-reasoning, perplexity/sonar. Switch providers by editing a string. |
The default model is CHAT_DEFAULT_MODEL_ID in packages/shared/src/schemas/chat.ts. The selectable model list lives in apps/web/components/dashboard/ai-chat/chat-view/models.ts (the models array) so each entry can carry display metadata (chef, provider slug for the logo, label). Adding a model is a two-line edit when you also want it in the picker; it's a zero-line edit when you just want the server to accept it (any string is forwarded to gateway()).
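As a rough illustration of what a picker entry might look like, here is a sketch of the models array. The ChatModelOption interface, the exact field names, the labels, and the pickerLabel helper are assumptions for the example; check models.ts for the real shape.

```typescript
// Hypothetical entry shape, based on the metadata listed above.
interface ChatModelOption {
  id: string; // gateway model id, forwarded to gateway(modelId)
  label: string; // text shown in the picker (illustrative values below)
  chef: string; // display name of the model's maker
  provider: string; // slug used to look up the logo
}

const models: ChatModelOption[] = [
  { id: "openai/gpt-5.2", label: "GPT-5.2", chef: "OpenAI", provider: "openai" },
  { id: "anthropic/claude-haiku-4.5", label: "Claude Haiku 4.5", chef: "Anthropic", provider: "anthropic" },
];

// Fall back to the raw id for models the server accepts but the picker
// does not list.
function pickerLabel(modelId: string): string {
  return models.find((m) => m.id === modelId)?.label ?? modelId;
}
```

Adding a model to the picker then amounts to appending one object to the array.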
Env vars. The chat path needs exactly one AI-specific env: AI_GATEWAY_API_KEY (recognized by the AI SDK's gateway() adapter). The kit declares it in turbo.json for cache invalidation, but does not yet ship it in apps/web/.env.example.
Add AI_GATEWAY_API_KEY to your environment when you set up the kit. Without it, every AI request fails at the SDK boundary with an authentication error. Get a key from vercel.com/dashboard under "AI Gateway".
The full list of supported providers and current model ids lives at models.dev; whatever string the gateway accepts there will work in gateway(modelId) here.
The Streaming Procedure
chat.send is a single-purpose oRPC procedure (POST /rpc/chat/send) that flows through five gates before it ever opens a model connection. Listed in order:
Validate attachment ownership
assertOwnedAttachmentParts walks every file part on the incoming message and checks each url against NEXT_PUBLIC_S3_PUBLIC_URL. The key must start with images/<userId>/, where <userId> is the session user. This forces every chat attachment to have already gone through the kit's presign + finalize pipeline before it's allowed near the model.
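The ownership check described above can be sketched as a pure function. This is an assumption-laden sketch rather than the kit's code: the part types are simplified, the function names are hypothetical, the error message is invented, and the public base URL is passed in as a parameter instead of being read from NEXT_PUBLIC_S3_PUBLIC_URL.

```typescript
// Simplified part shapes for the example only.
type IncomingPart =
  | { type: "text"; text: string }
  | { type: "file"; url: string; mediaType: string };

// True when `url` lives under the public bucket URL and its object key
// starts with images/<userId>/.
function isOwnedAttachmentUrl(url: string, publicBaseUrl: string, userId: string): boolean {
  const base = publicBaseUrl.endsWith("/") ? publicBaseUrl : `${publicBaseUrl}/`;
  if (!url.startsWith(base)) return false;
  const key = url.slice(base.length);
  // Reject path traversal so "images/<userId>/../other" cannot slip through.
  if (key.split("/").includes("..")) return false;
  return key.startsWith(`images/${userId}/`);
}

// Throw on the first file part that fails the ownership check.
function assertOwnedParts(parts: IncomingPart[], publicBaseUrl: string, userId: string): void {
  for (const part of parts) {
    if (part.type === "file" && !isOwnedAttachmentUrl(part.url, publicBaseUrl, userId)) {
      throw new Error("Attachment is not owned by the current user");
    }
  }
}
```

Text parts pass through untouched; only file parts are checked, and a single foreign URL rejects the whole message.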
Resolve the model and check billing features
resolveChatModel returns perplexity/sonar when webSearch is on (and rejects an explicit non-default model in that case). Otherwise resolveRequestedModel calls assertBillingFeature(billing, "multiModelAccess", ...) if the requested model is not the default. webSearch itself is gated by assertBillingFeature(billing, "webSearch", ...).
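The decision table behind this gate can be sketched as one function. The names below (resolveModel, BillingFeatures, ModelRequest) and the error strings are hypothetical, and the default model id is a parameter because its actual value is not stated here.

```typescript
// Hypothetical shapes for the example; the kit's billing state is richer.
interface BillingFeatures {
  multiModelAccess: boolean;
  webSearch: boolean;
}

interface ModelRequest {
  model?: string;
  webSearch: boolean;
}

const WEB_SEARCH_MODEL_ID = "perplexity/sonar";

function resolveModel(req: ModelRequest, features: BillingFeatures, defaultModelId: string): string {
  if (req.webSearch) {
    if (!features.webSearch) {
      throw new Error("webSearch is not available on this plan");
    }
    // Web search pins the model, so an explicit non-default choice is rejected.
    if (req.model !== undefined && req.model !== defaultModelId) {
      throw new Error("webSearch cannot be combined with an explicit model");
    }
    return WEB_SEARCH_MODEL_ID;
  }
  const requested = req.model ?? defaultModelId;
  if (requested !== defaultModelId && !features.multiModelAccess) {
    throw new Error("multiModelAccess is not available on this plan");
  }
  return requested;
}
```

This ordering also explains why the client sends model: undefined when web search is on: any other explicit model would be rejected.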
Apply the abuse policy
enforceChatAbusePolicy calls enforceAbusePolicy for the chat.send surface with userId and organizationId characteristics. Sliding-window limits live in packages/shared/src/abuse.ts. Going over returns TOO_MANY_REQUESTS with a retryAfter payload.
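To make the sliding-window idea concrete, here is a small in-memory limiter. It is a plain timestamp-log sketch, not the Upstash algorithm the kit uses, and every name in it (createSlidingWindowLimiter, LimitResult, retryAfterMs) is an assumption for illustration.

```typescript
type LimitResult = { allowed: true } | { allowed: false; retryAfterMs: number };

// Keeps a log of hit timestamps per key and allows at most `limit` hits in
// any trailing window of `windowMs` milliseconds.
function createSlidingWindowLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();
  return function check(key: string, now: number): LimitResult {
    const windowStart = now - windowMs;
    // Drop hits that have slid out of the window.
    const recent = (hits.get(key) ?? []).filter((t) => t > windowStart);
    hits.set(key, recent);
    if (recent.length >= limit) {
      // The caller may retry once the oldest hit leaves the window.
      const oldest = recent[0] ?? now;
      return { allowed: false, retryAfterMs: oldest + windowMs - now };
    }
    recent.push(now);
    return { allowed: true };
  };
}
```

A caller would check one key per characteristic, for example a user key and an organization key, and reject the request if either check is denied.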
Build the model context
buildModelContextMessages walks history newest-first, capped at CHAT_MAX_CONTEXT_MESSAGES (40) and CHAT_MAX_CONTEXT_CHARACTERS (20,000). User parts that include images become structured { type: "image", image: URL, mediaType } content; everything else becomes plain { role, content }. The system prompt is set to "You are a helpful assistant." and lives inline in chat.ts, so changing it is a one-liner.
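The newest-first walk with two caps can be sketched as follows. This is a simplified illustration: messages are reduced to plain text, the selectContext name is hypothetical, and stopping at the first message that would exceed the character budget (rather than skipping it) is an assumption about the kit's behavior.

```typescript
// Simplified stored message; image parts are ignored in this sketch.
interface StoredMessage {
  role: "user" | "assistant";
  text: string;
}

function selectContext(
  history: StoredMessage[], // chronological, oldest first
  maxMessages = 40,
  maxCharacters = 20_000,
): StoredMessage[] {
  const selected: StoredMessage[] = [];
  let characters = 0;
  // Walk from the newest message backwards so the tail is always kept.
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (message === undefined) continue;
    if (selected.length >= maxMessages) break;
    if (characters + message.text.length > maxCharacters) break;
    selected.push(message);
    characters += message.text.length;
  }
  // Restore chronological order before handing the list to the model.
  return selected.reverse();
}
```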
Reserve the monthly response slot
reserveAiUsageEvent atomically counts the active billing window (the active subscription period, or the current calendar month for the free plan) and inserts a new AiUsageEvent row inside a single prisma.$transaction. The transaction begins with pg_advisory_xact_lock(hashtextextended('ai-usage:<orgId>', 0)), which serialises concurrent reservations for the same organization so the count + insert pair is race-free. Free plans are capped at monthlyAiResponses: 100; Pro is null (unlimited, which skips the lock and just inserts). Going over throws CONFLICT. assertWithinAiResponseLimit is still exported as a non-mutating predicate for surfaces that need a soft check (e.g. dashboards), but the chat router never relies on it for enforcement.
If the model call later errors, the onError callback passed to streamText (and a synchronous try/catch around the streamText call) issues a best-effort prisma.aiUsageEvent.delete to refund the reservation, so a failed turn doesn't consume the user's quota.
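The reserve-then-refund flow can be modeled in memory to show the semantics. This sketch is not the kit's implementation: createUsageLedger and its methods are hypothetical, timestamps are plain epoch milliseconds, and atomicity here comes for free from single-threaded synchronous JavaScript, whereas the kit relies on the Postgres advisory lock inside a transaction.

```typescript
interface UsageEvent {
  id: number;
  organizationId: string;
  createdAtMs: number;
}

function createUsageLedger() {
  const events: UsageEvent[] = [];
  let nextId = 1;
  return {
    // Count events in the window and insert a new one, or throw at the cap.
    // A null cap means unlimited and skips the count entirely.
    reserve(organizationId: string, cap: number | null, windowStartMs: number, nowMs: number): UsageEvent {
      if (cap !== null) {
        const used = events.filter(
          (e) => e.organizationId === organizationId && e.createdAtMs >= windowStartMs,
        ).length;
        if (used >= cap) throw new Error("CONFLICT: monthly AI response cap reached");
      }
      const event: UsageEvent = { id: nextId++, organizationId, createdAtMs: nowMs };
      events.push(event);
      return event;
    },
    // Best-effort refund: removing a missing row is a no-op.
    refund(id: number): void {
      const index = events.findIndex((e) => e.id === id);
      if (index !== -1) events.splice(index, 1);
    },
    count(organizationId: string): number {
      return events.filter((e) => e.organizationId === organizationId).length;
    },
  };
}
```

Reserving before the model call means a burst of concurrent requests is rejected at the cap instead of discovering the overshoot after tokens have been spent.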
After the gates pass, the user Message is persisted, then:
```ts
const result = streamText({
  model: gateway(modelId),
  system: "You are a helpful assistant.",
  messages: modelMessages,
  onError: async () => {
    // Refund the reservation row so a failed turn does not consume the cap.
  },
  onFinish: async ({ text }) => {
    // Persist the assistant Message and (on the first turn) call
    // generateText({ model: gateway("openai/gpt-4o-mini") }) to set
    // Chat.title and flip Chat.titleGenerated. The AiUsageEvent row has
    // already been written by reserveAiUsageEvent before streaming began.
  },
});

return streamToEventIterator(
  result.toUIMessageStream({
    sendReasoning: true,
    sendSources: true,
  })
);
```

chat.regenerate mirrors this: same gates, same streamText shape, but onFinish deletes the trailing assistant turn(s) before inserting the replacement, and the AiUsageEvent is recorded with kind: "chat_regenerate" instead of "chat_send".
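Finding "the trailing assistant turn(s)" is a small list operation. A minimal sketch, assuming a simplified Turn shape and a hypothetical splitTrailingAssistantTurns helper:

```typescript
// Simplified turn shape for the example.
interface Turn {
  id: string;
  role: "user" | "assistant";
}

// Split a chronological list into the turns to keep and the run of
// assistant turns at the very end that a regenerate would replace.
function splitTrailingAssistantTurns(turns: Turn[]): { kept: Turn[]; removed: Turn[] } {
  let end = turns.length;
  while (end > 0 && turns[end - 1]?.role === "assistant") {
    end--;
  }
  return { kept: turns.slice(0, end), removed: turns.slice(end) };
}
```

Earlier assistant turns are untouched; only the run after the last user message is removed.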
The Client: useChat With An oRPC Transport
The kit uses @ai-sdk/react's useChat over an oRPC streaming procedure rather than a plain fetch endpoint. The bridge is one helper:
```ts
import { useChat } from "@ai-sdk/react";
import { eventIteratorToUnproxiedDataStream } from "@orpc/client";
import { client } from "@/lib/orpc";

// Inside the chat component. chatId, seedMessages, modelRef, webSearchRef,
// and buildSendPayloadMessages are defined elsewhere in the kit.
const { messages, sendMessage, status, stop } = useChat({
  id: chatId,
  messages: seedMessages,
  transport: {
    async sendMessages(options) {
      const latestMessage = options.messages[options.messages.length - 1];
      return eventIteratorToUnproxiedDataStream(
        await client.chat.send(
          {
            chatId: options.chatId,
            messages: buildSendPayloadMessages(latestMessage),
            model: webSearchRef.current ? undefined : modelRef.current,
            webSearch: webSearchRef.current,
          },
          { signal: options.abortSignal }
        )
      );
    },
    reconnectToStream() {
      throw new Error("Unsupported");
    },
  },
  onFinish: () => {
    // Invalidate the sidebar list and dashboard stats.
  },
});
```

A few patterns worth knowing:
| Pattern | Where |
|---|---|
| seedMessages | [chatId]/page.tsx server-prefetches chat.get (latest CHAT_DEFAULT_MESSAGES_PAGE_SIZE messages plus a nextCursor); ChatView reads it with useSuspenseQuery and maps DB rows to UIMessages once via dbMessagesToUIMessages keyed to chatId. Older history is fetched on demand via chat.listMessages and prepended through useChat's setMessages. |
| status / stop | Pass status to <PromptInputSubmit> so the button toggles between submit, stop, and pending |
| Regenerate | Drains client.chat.regenerate outside useChat (a plain for await) and ends with router.refresh() so RSC data updates |
| initialPrompt | The ?prompt= URL param from the new-chat hub auto-sends once when status === "ready", then router.replace strips the query |
| Refs over closures | modelRef and webSearchRef ensure the transport reads the latest user choice, since useChat captures the transport once per id |
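The "refs over closures" row is easier to see in isolation. The sketch below uses no React; the Ref interface mimics the shape of a React ref object, and createTransport is a hypothetical stand-in for a transport that is built once and reused.

```typescript
// Same shape as the object returned by React's useRef.
interface Ref<T> {
  current: T;
}

// Built once, like a transport that is captured on first render.
function createTransport(modelAtCreation: string, modelRef: Ref<string>) {
  return {
    // Closes over the value that existed when the transport was created.
    modelFromClosure: (): string => modelAtCreation,
    // Reads through the ref at call time, so it sees later updates.
    modelFromRef: (): string => modelRef.current,
  };
}

const modelRef: Ref<string> = { current: "openai/gpt-5.2" };
const transport = createTransport(modelRef.current, modelRef);

// The user picks a different model after the transport already exists.
modelRef.current = "anthropic/claude-haiku-4.5";
```

After the update, modelFromClosure() still returns the original model id while modelFromRef() returns the new one, which is why the kit's transport reads modelRef.current and webSearchRef.current at send time.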
AI Elements: Building Blocks
Reusable presentation primitives. Group them by what they do, not what file they live in. All export from @syntaxkit/ui/components/ai-elements/<file>.
Transcript Primitives
Conversation
Scrollable shell with stick-to-bottom behavior, empty state, jump-to-bottom button, and Markdown download. Pure UI; takes a simple { role, content }[] for download.
Message
Row layout for one chat turn (user vs assistant styling). Required prop: from: UIMessage['role']. Sub-components: MessageContent, MessageActions, MessageBranch*.
MessageResponse
Memoized assistant body that renders Markdown via Streamdown (CJK, code highlighting, math, mermaid). Use it as the children of MessageContent for assistant turns.
MessageBranch
Optional 'multiple drafts' switcher for regenerated turns: MessageBranchSelector, MessageBranchPrevious, MessageBranchNext, MessageBranchPage.
Composer Primitives
PromptInput
Form shell with hidden file input, drag-and-drop, paste support, and an attachments context. Required prop: onSubmit(message: PromptInputMessage, event).
PromptInputTextarea / Submit / Tools
Layout slots that compose inside PromptInput: textarea with sensible defaults, submit/stop button driven by ChatStatus, tool-row container for buttons.
Attachments
Grid, inline, or list layouts for FileUIPart and SourceDocumentUIPart. Pair with usePromptInputAttachments() inside PromptInput to render staged uploads.
ModelSelector
Command-palette dialog model picker. ModelSelectorLogo pulls provider art from models.dev. Pure UI; the kit hides it for free plans.
SpeechInput
Mic button. Uses the browser Web Speech API when available and supported; otherwise records audio and defers transcription to your onAudioRecorded callback.
Suggestions
Horizontally scrollable row of pill buttons for starter prompts. Used by the new-chat hub and as an empty state inside chats.
Streaming-Aware Primitives
Reasoning
Collapsible thinking block. Tracks streaming duration, animates the trigger label with Shimmer, and renders the body via Streamdown. Wire isStreaming to your useChat status.
Sources
Collapsible citations list. Trigger shows 'Used N sources'; Source is a styled external-link row. Render this when a UIMessage has source-url parts.
Message, PromptInput, Attachments, and PromptInputSubmit depend on types from the ai package (UIMessage, ChatStatus, FileUIPart, SourceDocumentUIPart). The rest are pure presentation, so you can use them outside useChat (for one-shot generations or non-chat AI features).
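Deciding when to render Sources comes down to scanning a message's parts for source-url entries. A minimal sketch, using a simplified local part type rather than the real UIMessage types; collectSources and the de-duplication by URL are assumptions for the example.

```typescript
// Simplified stand-in for the parts carried by a UI message.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "source-url"; sourceId: string; url: string; title?: string };

interface SourceLink {
  url: string;
  title: string;
}

// Collect unique source links, falling back to the URL when no title exists.
function collectSources(parts: MessagePart[]): SourceLink[] {
  const seen = new Set<string>();
  const sources: SourceLink[] = [];
  for (const part of parts) {
    if (part.type !== "source-url" || seen.has(part.url)) continue;
    seen.add(part.url);
    sources.push({ url: part.url, title: part.title ?? part.url });
  }
  return sources;
}
```

A component could render the collapsible list only when the returned array is non-empty and use its length for the "Used N sources" trigger.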
Billing And Limits
Two limit surfaces gate AI usage. Per-plan features control what a user is allowed to do; per-request limits control what fits in a single payload.
Per-plan features
| Feature | Free | Pro |
|---|---|---|
| monthlyAiResponses | 100 / month | Unlimited (null) |
| multiModelAccess (model picker beyond default) | No | Yes |
| webSearch (Perplexity Sonar) | No | Yes |
The cap is enforced by counting AiUsageEvent rows (kind: "chat_send" or "chat_regenerate") in the active billing window. See Billing for how plans and entitlements are configured, and Storage: How An Upload Flows for the attachment pipeline that feeds chat images.
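The "active billing window" can be sketched as a small date computation. This is an illustration under assumptions: the SubscriptionPeriod shape and function names are hypothetical, and the calendar month is computed in UTC, which may not match the kit's choice.

```typescript
interface BillingWindow {
  start: Date; // inclusive
  end: Date; // exclusive
}

// Hypothetical subset of a subscription record.
interface SubscriptionPeriod {
  currentPeriodStart: Date;
  currentPeriodEnd: Date;
}

// Use the subscription period when one exists, else the current UTC month.
function activeBillingWindow(now: Date, subscription: SubscriptionPeriod | null): BillingWindow {
  if (subscription !== null) {
    return { start: subscription.currentPeriodStart, end: subscription.currentPeriodEnd };
  }
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  return {
    start: new Date(Date.UTC(year, month, 1)),
    // Date.UTC rolls month 12 over into January of the next year.
    end: new Date(Date.UTC(year, month + 1, 1)),
  };
}

// Count usage events whose timestamps fall inside the window.
function countInWindow(eventTimes: Date[], billingWindow: BillingWindow): number {
  const start = billingWindow.start.getTime();
  const end = billingWindow.end.getTime();
  return eventTimes.filter((t) => t.getTime() >= start && t.getTime() < end).length;
}
```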
Per-request limits
All defined in packages/shared/src/schemas/chat.ts and enforced by Zod on the server.
| Constant | Value | What it bounds |
|---|---|---|
| CHAT_MAX_MESSAGES_PER_REQUEST | 1 | Only the latest user turn is sent on each call |
| CHAT_MAX_PARTS_PER_MESSAGE | 4 | Text + file parts per message |
| CHAT_MAX_TEXT_LENGTH_PER_PART | 4,000 | Characters in any single text part |
| CHAT_MAX_USER_MESSAGE_TEXT_LENGTH | 4,000 | Total characters across all text parts in one message |
| CHAT_MAX_CONTEXT_MESSAGES | 40 | History the server includes when calling the model |
| CHAT_MAX_CONTEXT_CHARACTERS | 20,000 | Total history characters across included messages |
| CHAT_DEFAULT_MESSAGES_PAGE_SIZE | 50 | Messages returned per page from chat.get and chat.listMessages |
| CHAT_MAX_MESSAGES_PAGE_SIZE | 100 | Hard cap for the per-page message limit |
buildModelContextMessages fills the context newest-first and stops at the caps, so the oldest messages are the ones dropped: an active conversation always keeps its tail, and trimming only shows up on long threads.
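The per-message limits in the table can be expressed as a dependency-free check. The kit enforces them with Zod; the sketch below only illustrates the rules, and the validateUserMessage name, the part shape, and the error strings are assumptions.

```typescript
// Values copied from the table above.
const MAX_PARTS_PER_MESSAGE = 4;
const MAX_TEXT_LENGTH_PER_PART = 4_000;
const MAX_USER_MESSAGE_TEXT_LENGTH = 4_000;

type UserPart = { type: "text"; text: string } | { type: "file"; url: string };

// Return a list of violations; an empty list means the message is acceptable.
function validateUserMessage(parts: UserPart[]): string[] {
  const errors: string[] = [];
  if (parts.length > MAX_PARTS_PER_MESSAGE) {
    errors.push(`too many parts: ${parts.length}`);
  }
  let totalText = 0;
  for (const part of parts) {
    if (part.type !== "text") continue;
    if (part.text.length > MAX_TEXT_LENGTH_PER_PART) {
      errors.push(`text part too long: ${part.text.length}`);
    }
    totalText += part.text.length;
  }
  if (totalText > MAX_USER_MESSAGE_TEXT_LENGTH) {
    errors.push(`message text too long: ${totalText}`);
  }
  return errors;
}
```

Because the per-part and per-message caps share the same value, splitting a long prompt across several text parts does not raise the total a user can send.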
Abuse Protection
chat.send and chat.regenerate are surfaces in the shared abuse policy. Each request is keyed by userId and organizationId; both characteristics must be present, otherwise the surface fails closed.
When Upstash Redis is not configured, the chat handler logs a single warning per process and continues without rate limits. This is intentional for local development. Configure Upstash before going live; otherwise nothing throttles a runaway client. See Security: Abuse Protection (Upstash).
Tuning happens in packages/shared/src/abuse.ts: each surface declares its window and limit per characteristic, so you can lower the per-user cap without changing the per-org cap.
Adding A New AI Feature
Reuse the same handler shape for a new feature (summarize a doc, generate alt text, draft an email, anything). The pattern is the streaming chat in miniature.
Define the schema in packages/shared
Drop a new file at packages/shared/src/schemas/<feature>.ts with Zod input/output schemas, plus any per-request constants (max input length, etc.). Re-export from packages/shared/src/schemas/index.ts so the API and the client see them.
```ts
import { z } from "zod";

export const summarizeInputSchema = z.object({
  text: z.string().min(1).max(50_000),
  style: z.enum(["bullet", "tldr", "executive"]).default("tldr"),
});
```

Add usage accounting (optional)
If the feature should count toward a quota, add a model to packages/database/prisma/models/ai.prisma (mirror AiUsageEvent) or extend AiUsageEvent.kind with a new value, then run a migration. Skip this for free, internal-only features.
Write the oRPC procedure
Reuse organizationChatProcedure-style middleware: org-scoped auth, billing assertion, abuse gate, then streamText({ model: gateway(modelId) }) for streams or generateText for one-shot.
```ts
import { gateway, streamText } from "ai";
import { streamToEventIterator } from "@orpc/server";
import { CHAT_DEFAULT_MODEL_ID } from "@syntaxkit/shared";
// Imports for kit internals (authorized, withActiveOrganization,
// getBillingState, reserveAiUsageEvent, prisma, summarizeInputSchema)
// are omitted here.

export const summarize = authorized
  .use(withActiveOrganization)
  .route({ path: "/summarize", method: "POST" })
  .input(summarizeInputSchema)
  .handler(async ({ context, input }) => {
    const billing = await getBillingState(context.organization.id);

    // Atomically reserve a slot in the monthly cap before spending
    // gateway tokens. Refund the reservation if the model errors.
    const usage = await reserveAiUsageEvent(billing, context.organization.id, {
      kind: "chat_send",
      chatId: null,
      createdByUserId: context.user.id,
    });

    try {
      const result = streamText({
        model: gateway(CHAT_DEFAULT_MODEL_ID),
        system: "Summarize the input in the requested style.",
        prompt: input.text,
        // Refund on a streaming error; ignore a failed delete (best effort).
        onError: () =>
          prisma.aiUsageEvent
            .delete({ where: { id: usage.id } })
            .catch(() => {}),
      });
      return streamToEventIterator(result.toUIMessageStream());
    } catch (error) {
      // Refund when streamText throws synchronously, then rethrow.
      await prisma.aiUsageEvent
        .delete({ where: { id: usage.id } })
        .catch(() => {});
      throw error;
    }
  });
```

Pick the right return shape
Streams: return streamToEventIterator(result.toUIMessageStream(...)). One-shots: return { text } from a plain generateText call. Streams give you token-by-token UI; one-shots give you a single mutation result.
Wire the client
For streams, use useChat with the same transport.sendMessages pattern (point it at your new procedure). For one-shots, a plain useMutation(orpc.<feature>.run.mutationOptions()) is fine; the response is a typed object, no streaming bookkeeping.
Compose the UI from ai-elements
Drop in <PromptInput> for input and <Conversation> + <Message> + <MessageResponse> for output. Pull in <Reasoning> and <Sources> if your model returns them. None of these primitives are chat-specific; they work for any UIMessage-shaped surface.
Where To Go Next
API
oRPC patterns: middleware, streaming procedures, OpenAPI surface, calling from React.
Billing
How plans, entitlements, and feature flags drive AI gating.
Storage
The presign + finalize + ownership pipeline that feeds chat image attachments.
Customization
The shadcn / Tailwind layer that the ai-elements primitives extend.
