Build With SyntaxKit

AI

Streaming chat and AI building blocks.


SyntaxKit ships a complete org-scoped streaming chat at /dashboard/ai-chat on top of the Vercel AI SDK and the Vercel AI Gateway. Everything routes through one adapter, gateway(modelId), so swapping providers is a string change rather than a refactor. The @syntaxkit/ui/components/ai-elements/* package gives you the same composer, transcript, attachments, reasoning, sources, model picker, and mic button to drop into any new AI feature.

We pick the AI Gateway over per-provider SDKs because it removes the "N keys for N providers" problem: one Vercel key, dozens of model ids, one billing surface, one place to switch when a new model lands.

How A Chat Turn Flows

A chat turn: PromptInput sends through useChat to /rpc/chat/send; the handler runs billing and abuse checks, reserves an AiUsageEvent, persists the user message, and calls streamText with gateway(modelId); the token stream rides back through toUIMessageStream and streamToEventIterator into the Conversation. onFinish persists the assistant message.

The unusual bit is the transport. The kit pairs @ai-sdk/react's useChat with a custom transport.sendMessages that calls the oRPC streaming procedure client.chat.send and pipes the result through eventIteratorToUnproxiedDataStream from @orpc/client. The server returns streamToEventIterator(result.toUIMessageStream({ sendReasoning: true, sendSources: true })) so reasoning and sources arrive as first-class UI message parts. Persistence is split around the stream: the user message and the AiUsageEvent reservation are written before the stream opens, and onFinish saves the assistant message once tokens stop flowing.

The diagram source lives at apps/docs/diagrams/ai-chat-flow.mmd. Rerun pnpm --filter @syntaxkit/docs diagrams:build after editing it to refresh both SVG variants.

Package Layout

What's Wired In

| Capability | How it's enabled |
| --- | --- |
| Streaming chat with reasoning + sources | result.toUIMessageStream({ sendReasoning: true, sendSources: true }) in chat.send |
| Image attachments | Browser presigns via storage.presign, PUTs the bytes, calls storage.finalize; the server enforces ownership via assertOwnedAttachmentParts (key prefix images/<userId>/) |
| Web search | Pro-only; flips the model to perplexity/sonar and disables image attachments for that turn |
| Multi-model selection | Pro-only; free plan is locked to CHAT_DEFAULT_MODEL_ID |
| Voice transcription | <SpeechInput> uses the browser's Web Speech API, or hands a recorded Blob to your onAudioRecorded callback |
| Regenerate | chat.regenerate deletes the trailing assistant turn(s) and replaces them with a fresh stream |
| Auto-titled chats | First-turn onFinish calls generateText({ model: gateway("openai/gpt-4o-mini") }) and writes Chat.title once |
| Cursor-paginated chat list + search | chat.list (cursor) + chat.search (case-insensitive title and content) |
| Cursor-paginated message history | chat.get returns the latest CHAT_DEFAULT_MESSAGES_PAGE_SIZE messages plus a nextCursor; chat.listMessages pages older messages from that cursor so a long chat can never produce an unbounded payload |
| Per-plan monthly response cap | reserveAiUsageEvent atomically counts and inserts an AiUsageEvent row under a per-organization Postgres advisory lock, so concurrent requests can't overshoot the cap |
| Abuse protection | Sliding-window Upstash limits keyed by userId + organizationId for the chat.send and chat.regenerate surfaces |
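
The cursor-paginated message history can be pictured with a small in-memory sketch. The row shape, the pageMessages name, and the choice of a message id as the cursor are assumptions made for illustration; the kit's chat.get and chat.listMessages run against the database and may encode the cursor differently.

```typescript
// Hypothetical row shape; the kit's Message model carries more fields.
interface MessageRow {
  id: string;
  createdAt: number; // epoch milliseconds
  text: string;
}

interface MessagePage {
  messages: MessageRow[]; // oldest-first within the page
  nextCursor: string | null; // null once the start of the chat is reached
}

// Returns up to `limit` messages strictly older than the message whose id is
// `cursor`, or the latest `limit` messages when no cursor is given.
function pageMessages(
  all: MessageRow[],
  limit: number,
  cursor?: string
): MessagePage {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const sorted = [...all].sort((a, b) => a.createdAt - b.createdAt);
  const end =
    cursor === undefined
      ? sorted.length
      : sorted.findIndex((m) => m.id === cursor);
  if (end < 0) {
    throw new Error(`Unknown cursor: ${cursor}`);
  }
  const start = Math.max(0, end - limit);
  const messages = sorted.slice(start, end);
  // A cursor is only handed back while older rows remain.
  return { messages, nextCursor: start > 0 ? messages[0].id : null };
}
```

Because every page is bounded by limit, the payload size stays fixed no matter how long the chat grows; the client keeps calling with the returned nextCursor until it is null.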

The AI SDK And Gateway

Two design choices to call out before wiring:

| Choice | Why |
| --- | --- |
| Vercel AI SDK over per-provider SDKs | One streaming primitive (streamText), one prompt format (ModelMessage), one tool-calling protocol. Nothing in the kit's chat handler is OpenAI-specific. |
| AI Gateway over direct provider keys | One key (AI_GATEWAY_API_KEY), one billing surface, model-id strings like openai/gpt-5.2, anthropic/claude-haiku-4.5, google/gemini-3-flash, xai/grok-4.1-fast-non-reasoning, perplexity/sonar. Switch providers by editing a string. |

The default model is CHAT_DEFAULT_MODEL_ID in packages/shared/src/schemas/chat.ts. The selectable model list lives in apps/web/components/dashboard/ai-chat/chat-view/models.ts (the models array) so each entry can carry display metadata (chef, provider slug for the logo, label). Adding a model is a two-line edit when you also want it in the picker; it's a zero-line edit when you just want the server to accept it (any string is forwarded to gateway()).

Env vars. The chat path needs exactly one AI-specific environment variable: AI_GATEWAY_API_KEY (recognized by the AI SDK's gateway() adapter). The kit declares it in turbo.json for cache invalidation but does not yet ship it in apps/web/.env.example.

Add AI_GATEWAY_API_KEY to your environment when you set up the kit. Without it, every AI request fails at the SDK boundary with an authentication error. Get a key from vercel.com/dashboard under "AI Gateway".

The full list of supported providers and current model ids lives at models.dev; whatever string the gateway accepts there will work in gateway(modelId) here.

The Streaming Procedure

chat.send is a single-purpose oRPC procedure (POST /rpc/chat/send) that flows through five gates before it ever opens a model connection. Listed in order:

Validate attachment ownership

assertOwnedAttachmentParts walks every file part on the incoming message and checks each url against NEXT_PUBLIC_S3_PUBLIC_URL. The key must start with images/<userId>/, where <userId> is the session user. This forces every chat attachment to have already gone through the kit's presign + finalize pipeline before it's allowed near the model.
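
A minimal sketch of the ownership rule described above, assuming the check reduces to a base-URL match plus a key-prefix match. isOwnedAttachmentUrl is a hypothetical stand-in, not the kit's assertOwnedAttachmentParts, and the ".." rejection is an extra precaution added here rather than something the source confirms.

```typescript
// Hypothetical helper illustrating the "images/<userId>/" prefix rule.
function isOwnedAttachmentUrl(
  url: string,
  publicBaseUrl: string, // e.g. the value of NEXT_PUBLIC_S3_PUBLIC_URL
  userId: string
): boolean {
  // Normalise the base so "https://cdn.example.com" cannot match
  // "https://cdn.example.com.evil.io/...".
  const base = publicBaseUrl.endsWith("/") ? publicBaseUrl : `${publicBaseUrl}/`;
  if (!url.startsWith(base)) {
    return false;
  }
  const key = url.slice(base.length);
  // Assumption: reject parent-directory segments so a key cannot climb out
  // of the caller's own prefix.
  if (key.split("/").includes("..")) {
    return false;
  }
  return key.startsWith(`images/${userId}/`);
}
```

A handler would run this over every file part and reject the request as soon as one part fails, so only objects that went through presign and finalize for the current user can reach the model.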

Resolve the model and check billing features

resolveChatModel returns perplexity/sonar when webSearch is on (and rejects an explicit non-default model in that case). Otherwise resolveRequestedModel calls assertBillingFeature(billing, "multiModelAccess", ...) if the requested model is not the default. webSearch itself is gated by assertBillingFeature(billing, "webSearch", ...).
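
The branching above can be sketched as a pure function. Everything here is illustrative: resolveModelSketch, the feature set, the placeholder default model id, and the error messages are assumptions, and the kit's resolveChatModel and assertBillingFeature may differ in shape.

```typescript
type BillingFeature = "multiModelAccess" | "webSearch";

interface ModelRequest {
  model?: string;
  webSearch: boolean;
}

// Placeholder value; the kit reads CHAT_DEFAULT_MODEL_ID from packages/shared.
const DEFAULT_MODEL_ID = "example/default-model";
const WEB_SEARCH_MODEL_ID = "perplexity/sonar";

function resolveModelSketch(
  request: ModelRequest,
  features: ReadonlySet<BillingFeature>
): string {
  if (request.webSearch) {
    if (!features.has("webSearch")) {
      throw new Error("webSearch is not included in this plan");
    }
    // Web search pins the model, so an explicit non-default choice is rejected.
    if (request.model !== undefined && request.model !== DEFAULT_MODEL_ID) {
      throw new Error("a custom model cannot be combined with webSearch");
    }
    return WEB_SEARCH_MODEL_ID;
  }
  const requested = request.model ?? DEFAULT_MODEL_ID;
  // Only non-default models need the multi-model entitlement.
  if (requested !== DEFAULT_MODEL_ID && !features.has("multiModelAccess")) {
    throw new Error("multiModelAccess is not included in this plan");
  }
  return requested;
}
```

This also explains why the client transport sends model: undefined when web search is on: the server picks the model for that turn.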

Apply the abuse policy

enforceChatAbusePolicy calls enforceAbusePolicy for the chat.send surface with userId and organizationId characteristics. Sliding-window limits live in packages/shared/src/abuse.ts. Going over returns TOO_MANY_REQUESTS with a retryAfter payload.
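
For readers unfamiliar with sliding-window limiting, here is a dependency-free sketch of the idea. It is an in-memory illustration, not the Upstash-backed implementation the kit uses; the class name, the limits, and the retryAfterMs field are assumptions.

```typescript
interface LimitResult {
  allowed: boolean;
  retryAfterMs: number; // 0 when the request is allowed
}

// Sliding-window log: remember the timestamps of recent hits per key.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    if (limit < 1 || windowMs < 1) {
      throw new Error("limit and windowMs must be positive");
    }
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, now: number): LimitResult {
    const cutoff = now - this.windowMs;
    // Keep only hits that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      // The oldest hit leaves the window at recent[0] + windowMs.
      return { allowed: false, retryAfterMs: recent[0] + this.windowMs - now };
    }
    recent.push(now);
    this.hits.set(key, recent);
    return { allowed: true, retryAfterMs: 0 };
  }
}
```

To mirror the per-characteristic policy, a caller could keep one limiter per characteristic and check a user key and an organization key separately, denying the request if either check fails.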

Build the model context

buildModelContextMessages walks history newest-first, capped at CHAT_MAX_CONTEXT_MESSAGES (40) and CHAT_MAX_CONTEXT_CHARACTERS (20,000). User parts that include images become structured { type: "image", image: URL, mediaType } content; everything else becomes plain { role, content }. The system prompt is set to "You are a helpful assistant." and lives inline in chat.ts, so changing it is a one-liner.
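
The two caps can be illustrated with a small trimming function. This is a sketch under assumptions: trimContext is a hypothetical name, messages are reduced to plain text, and stopping at the first message that would exceed the character budget is a guess at the policy rather than the kit's confirmed behavior.

```typescript
interface HistoryMessage {
  role: "user" | "assistant";
  content: string;
}

// Values taken from the limits described in this section.
const MAX_CONTEXT_MESSAGES = 40;
const MAX_CONTEXT_CHARACTERS = 20_000;

// `history` is oldest-first; the returned slice is oldest-first as well.
function trimContext(
  history: HistoryMessage[],
  maxMessages: number = MAX_CONTEXT_MESSAGES,
  maxCharacters: number = MAX_CONTEXT_CHARACTERS
): HistoryMessage[] {
  const kept: HistoryMessage[] = [];
  let characters = 0;
  // Walk newest-first so the tail of the conversation always survives.
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (kept.length >= maxMessages) break;
    if (characters + message.content.length > maxCharacters) break;
    kept.push(message);
    characters += message.content.length;
  }
  // Restore chronological order for the model.
  return kept.reverse();
}
```

Walking from the newest message backwards means short chats pass through untouched, and only long threads lose their oldest turns.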

Reserve the monthly response slot

reserveAiUsageEvent atomically counts the active billing window (the active subscription period, or the current calendar month for the free plan) and inserts a new AiUsageEvent row inside a single prisma.$transaction. The transaction begins with pg_advisory_xact_lock(hashtextextended('ai-usage:<orgId>', 0)), which serialises concurrent reservations for the same organization so the count + insert pair is race-free. Free plans are capped at monthlyAiResponses: 100; Pro is null (unlimited, which skips the lock and just inserts). Going over throws CONFLICT. assertWithinAiResponseLimit is still exported as a non-mutating predicate for surfaces that need a soft check (e.g. dashboards), but the chat router never relies on it for enforcement.

If the model call later errors, streamText.onError (and a synchronous try/catch around streamText) issues a best-effort prisma.aiUsageEvent.delete to refund the reservation, so a failed turn doesn't consume the user's quota.
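
The reserve-then-refund flow is easier to follow in miniature. The sketch below replaces Postgres with an in-memory ledger, so the count and insert are trivially atomic; the kit gets the same guarantee from the advisory lock inside its transaction. UsageLedger and its method names are hypothetical.

```typescript
interface UsageEvent {
  id: number;
  organizationId: string;
  createdAt: number; // epoch milliseconds
}

class UsageLedger {
  private events: UsageEvent[] = [];
  private nextId = 1;

  // Count the window and insert in one step; `cap === null` means unlimited.
  reserve(
    organizationId: string,
    cap: number | null,
    windowStart: number,
    now: number
  ): UsageEvent {
    if (cap !== null && this.countSince(organizationId, windowStart) >= cap) {
      throw new Error("CONFLICT: monthly AI response cap reached");
    }
    const event: UsageEvent = { id: this.nextId++, organizationId, createdAt: now };
    this.events.push(event);
    return event;
  }

  // Best-effort refund: removing an unknown id is a no-op.
  refund(id: number): void {
    this.events = this.events.filter((e) => e.id !== id);
  }

  countSince(organizationId: string, windowStart: number): number {
    return this.events.filter(
      (e) => e.organizationId === organizationId && e.createdAt >= windowStart
    ).length;
  }
}
```

Reserving before the model call and refunding on failure keeps the cap strict under concurrency while ensuring a failed turn does not cost the user a response.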

After the gates pass, the user Message is persisted, then:

const result = streamText({
  model: gateway(modelId),
  system: "You are a helpful assistant.",
  messages: modelMessages,
  onError: async () => {
    // Refund the reservation row so a failed turn does not consume the cap.
  },
  onFinish: async ({ text }) => {
    // Persist the assistant Message and (on the first turn)
    // generateText({ model: gateway("openai/gpt-4o-mini") }) to set
    // Chat.title and flip Chat.titleGenerated. The AiUsageEvent row has
    // already been written by reserveAiUsageEvent before streaming began.
  },
});

return streamToEventIterator(
  result.toUIMessageStream({
    sendReasoning: true,
    sendSources: true,
  })
);

chat.regenerate mirrors this: same gates, same streamText shape, but onFinish deletes the trailing assistant turn(s) before inserting the replacement, and the AiUsageEvent is recorded with kind: "chat_regenerate" instead of "chat_send".

The Client: useChat With An oRPC Transport

The kit uses @ai-sdk/react's useChat over an oRPC streaming procedure rather than a plain fetch endpoint. The bridge is one helper:

import { useChat } from "@ai-sdk/react";
import { eventIteratorToUnproxiedDataStream } from "@orpc/client";
import { client } from "@/lib/orpc";

const { messages, sendMessage, status, stop } = useChat({
  id: chatId,
  messages: seedMessages,
  transport: {
    async sendMessages(options) {
      const latestMessage = options.messages[options.messages.length - 1];
      return eventIteratorToUnproxiedDataStream(
        await client.chat.send(
          {
            chatId: options.chatId,
            messages: buildSendPayloadMessages(latestMessage),
            model: webSearchRef.current ? undefined : modelRef.current,
            webSearch: webSearchRef.current,
          },
          { signal: options.abortSignal }
        )
      );
    },
    reconnectToStream() {
      throw new Error("Unsupported");
    },
  },
  onFinish: () => {
    // Invalidate the sidebar list and dashboard stats.
  },
});

A few patterns worth knowing:

| Pattern | Where |
| --- | --- |
| seedMessages | [chatId]/page.tsx server-prefetches chat.get (latest CHAT_DEFAULT_MESSAGES_PAGE_SIZE messages plus a nextCursor); ChatView reads it with useSuspenseQuery and maps DB rows to UIMessages once via dbMessagesToUIMessages keyed to chatId. Older history is fetched on demand via chat.listMessages and prepended through useChat's setMessages. |
| status / stop | Pass status to <PromptInputSubmit> so the button toggles between submit, stop, and pending |
| Regenerate | Drains client.chat.regenerate outside useChat (a plain for await) and ends with router.refresh() so RSC data updates |
| initialPrompt | The ?prompt= URL param from the new-chat hub auto-sends once when status === "ready", then router.replace strips the query |
| Refs over closures | modelRef and webSearchRef ensure the transport reads the latest user choice, since useChat captures the transport once per id |

AI Elements: Building Blocks

Reusable presentation primitives, grouped here by what they do rather than by the file they live in. All are exported from @syntaxkit/ui/components/ai-elements/<file>.

Transcript Primitives

Conversation

Scrollable shell with stick-to-bottom behavior, empty state, jump-to-bottom button, and Markdown download. Pure UI; takes a simple { role, content }[] for download.

Message

Row layout for one chat turn (user vs assistant styling). Required prop: from: UIMessage['role']. Sub-components: MessageContent, MessageActions, MessageBranch*.

MessageResponse

Memoized assistant body that renders Markdown via Streamdown (CJK, code highlighting, math, mermaid). Use it as the children of MessageContent for assistant turns.

MessageBranch

Optional 'multiple drafts' switcher for regenerated turns: MessageBranchSelector, MessageBranchPrevious, MessageBranchNext, MessageBranchPage.

Composer Primitives

PromptInput

Form shell with hidden file input, drag-and-drop, paste support, and an attachments context. Required prop: onSubmit(message: PromptInputMessage, event).

PromptInputTextarea / Submit / Tools

Layout slots that compose inside PromptInput: textarea with sensible defaults, submit/stop button driven by ChatStatus, tool-row container for buttons.

Attachments

Grid, inline, or list layouts for FileUIPart and SourceDocumentUIPart. Pair with usePromptInputAttachments() inside PromptInput to render staged uploads.

ModelSelector

Command-palette dialog model picker. ModelSelectorLogo pulls provider art from models.dev. Pure UI; the kit hides it for free plans.

SpeechInput

Mic button. Uses the browser Web Speech API when available and supported; otherwise records audio and defers transcription to your onAudioRecorded callback.

Suggestions

Horizontally scrollable row of pill buttons for starter prompts. Used by the new-chat hub and as an empty state inside chats.

Streaming-Aware Primitives

Reasoning

Collapsible thinking block. Tracks streaming duration, animates the trigger label with Shimmer, and renders the body via Streamdown. Wire isStreaming to your useChat status.

Sources

Collapsible citations list. Trigger shows 'Used N sources'; Source is a styled external-link row. Render this when a UIMessage has source-url parts.

Message, PromptInput, Attachments, and PromptInputSubmit lean on ai-package types (UIMessage, ChatStatus, FileUIPart, SourceDocumentUIPart). The rest are pure presentation, so you can use them outside useChat (for one-shot generations or non-chat AI features).

Billing And Limits

Two limit surfaces gate AI usage. Per-plan features control what a user is allowed to do; per-request limits control what fits in a single payload.

Per-plan features

| Feature | Free | Pro |
| --- | --- | --- |
| monthlyAiResponses | 100 / month | Unlimited (null) |
| multiModelAccess (model picker beyond default) | No | Yes |
| webSearch (Perplexity Sonar) | No | Yes |

The cap is enforced by counting AiUsageEvent rows (kind: "chat_send" or "chat_regenerate") in the active billing window. See Billing for how plans and entitlements are configured, and Storage: How An Upload Flows for the attachment pipeline that feeds chat images.
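
The "active billing window" can be sketched as follows: the subscription period when one exists, otherwise the current calendar month. The field names and the use of UTC month boundaries are assumptions; the kit may compute the free-plan month in a different time zone.

```typescript
interface BillingWindow {
  start: Date; // inclusive
  end: Date; // exclusive
}

// Hypothetical subset of a subscription record.
interface SubscriptionPeriod {
  currentPeriodStart: Date;
  currentPeriodEnd: Date;
}

function activeBillingWindow(
  subscription: SubscriptionPeriod | null,
  now: Date
): BillingWindow {
  if (subscription !== null) {
    return {
      start: subscription.currentPeriodStart,
      end: subscription.currentPeriodEnd,
    };
  }
  // Free plan: first day of this month up to the first day of the next.
  // Date.UTC rolls month 12 over into January of the following year.
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  return {
    start: new Date(Date.UTC(year, month, 1)),
    end: new Date(Date.UTC(year, month + 1, 1)),
  };
}
```

Counting AiUsageEvent rows whose timestamp falls in [start, end) then yields the number to compare against monthlyAiResponses.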

Per-request limits

All defined in packages/shared/src/schemas/chat.ts and enforced by Zod on the server.

| Constant | Value | What it bounds |
| --- | --- | --- |
| CHAT_MAX_MESSAGES_PER_REQUEST | 1 | Only the latest user turn is sent on each call |
| CHAT_MAX_PARTS_PER_MESSAGE | 4 | Text + file parts per message |
| CHAT_MAX_TEXT_LENGTH_PER_PART | 4,000 | Characters in any single text part |
| CHAT_MAX_USER_MESSAGE_TEXT_LENGTH | 4,000 | Total characters across all text parts in one message |
| CHAT_MAX_CONTEXT_MESSAGES | 40 | History the server includes when calling the model |
| CHAT_MAX_CONTEXT_CHARACTERS | 20,000 | Total history characters across included messages |
| CHAT_DEFAULT_MESSAGES_PAGE_SIZE | 50 | Messages returned per page from chat.get and chat.listMessages |
| CHAT_MAX_MESSAGES_PAGE_SIZE | 100 | Hard cap for the per-page message limit |
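
To make the request-shape constants concrete, here is a dependency-free sketch of the checks they imply. The kit enforces them with Zod; validateSendPayload, the part types, and the error strings below are hypothetical stand-ins written without Zod so the example runs on its own.

```typescript
type MessagePart =
  | { type: "text"; text: string }
  | { type: "file"; url: string };

interface SendPayload {
  messages: { parts: MessagePart[] }[];
}

// Values from the table above.
const CHAT_MAX_MESSAGES_PER_REQUEST = 1;
const CHAT_MAX_PARTS_PER_MESSAGE = 4;
const CHAT_MAX_TEXT_LENGTH_PER_PART = 4_000;
const CHAT_MAX_USER_MESSAGE_TEXT_LENGTH = 4_000;

// Returns a list of violations; an empty list means the payload is acceptable.
function validateSendPayload(payload: SendPayload): string[] {
  const errors: string[] = [];
  const count = payload.messages.length;
  if (count < 1 || count > CHAT_MAX_MESSAGES_PER_REQUEST) {
    errors.push(`expected exactly ${CHAT_MAX_MESSAGES_PER_REQUEST} message, got ${count}`);
  }
  for (const message of payload.messages) {
    if (message.parts.length > CHAT_MAX_PARTS_PER_MESSAGE) {
      errors.push(`too many parts: ${message.parts.length}`);
    }
    let totalText = 0;
    for (const part of message.parts) {
      if (part.type !== "text") continue;
      if (part.text.length > CHAT_MAX_TEXT_LENGTH_PER_PART) {
        errors.push(`text part too long: ${part.text.length}`);
      }
      totalText += part.text.length;
    }
    if (totalText > CHAT_MAX_USER_MESSAGE_TEXT_LENGTH) {
      errors.push(`message text too long: ${totalText}`);
    }
  }
  return errors;
}
```

Note that the per-part and per-message text limits are separate checks: two 3,000-character parts each pass the first limit but together fail the second.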

buildModelContextMessages collects history newest-first, so the most recent turns always survive; only the oldest messages of a long thread are dropped.

Abuse Protection

chat.send and chat.regenerate are surfaces in the shared abuse policy. Each request is keyed by userId and organizationId; both characteristics must be present, otherwise the surface fails closed.

When Upstash Redis is not configured, the chat handler logs a single warning per process and continues without rate limits. This is intentional for local development. Configure Upstash before going live; otherwise nothing throttles a runaway client. See Security: Abuse Protection (Upstash).
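
The "warn once, then continue" behavior can be sketched as a small wrapper. createAbuseGate, the Limiter type, and the warning text are hypothetical; the sketch only shows the fail-open pattern, not the kit's actual handler.

```typescript
// Returns true when the request identified by `key` is allowed.
type Limiter = (key: string) => boolean;

// Wraps an optional limiter. With no limiter configured, every request is
// allowed and a single warning is emitted for the lifetime of the gate.
function createAbuseGate(
  limiter: Limiter | null,
  warn: (message: string) => void
): Limiter {
  let warned = false;
  return (key: string): boolean => {
    if (limiter === null) {
      if (!warned) {
        warned = true;
        warn("Rate limiting is not configured; requests are not throttled.");
      }
      return true; // fail open, intended for local development only
    }
    return limiter(key);
  };
}
```

Creating the gate once at module scope gives the one-warning-per-process behavior; in production the limiter argument would be backed by Upstash so the fail-open branch is never taken.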

Tuning happens in packages/shared/src/abuse.ts: each surface declares its window and limit per characteristic, so you can lower the per-user cap without changing the per-org cap.

Adding A New AI Feature

Reuse the same handler shape for a new feature (summarize a doc, generate alt text, draft an email, anything). The pattern is the streaming chat in miniature.

Define the schema in packages/shared

Drop a new file at packages/shared/src/schemas/<feature>.ts with Zod input/output schemas, plus any per-request constants (max input length, etc.). Re-export from packages/shared/src/schemas/index.ts so the API and the client see them.

import { z } from "zod";

export const summarizeInputSchema = z.object({
  text: z.string().min(1).max(50_000),
  style: z.enum(["bullet", "tldr", "executive"]).default("tldr"),
});

Add usage accounting (optional)

If the feature should count toward a quota, add a model to packages/database/prisma/models/ai.prisma (mirror AiUsageEvent) or extend AiUsageEvent.kind with a new value, then run a migration. Skip this for free, internal-only features.

Write the oRPC procedure

Reuse organizationChatProcedure-style middleware: org-scoped auth, billing assertion, abuse gate, then streamText({ model: gateway(modelId) }) for streams or generateText for one-shot.

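// Library imports below are an assumption about where these names come from;
// match whatever the kit's chat router already imports.
import { gateway } from "@ai-sdk/gateway";
import { streamToEventIterator } from "@orpc/server";
import { streamText } from "ai";
import { CHAT_DEFAULT_MODEL_ID } from "@syntaxkit/shared";
// authorized, withActiveOrganization, getBillingState, reserveAiUsageEvent,
// prisma, and summarizeInputSchema are kit-local; import them from your packages.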

export const summarize = authorized
  .use(withActiveOrganization)
  .route({ path: "/summarize", method: "POST" })
  .input(summarizeInputSchema)
  .handler(async ({ context, input }) => {
    const billing = await getBillingState(context.organization.id);
    // Atomically reserve a slot in the monthly cap before spending
    // gateway tokens. Refund the reservation if the model errors.
    const usage = await reserveAiUsageEvent(billing, context.organization.id, {
      kind: "chat_send",
      chatId: null,
      createdByUserId: context.user.id,
    });
    try {
      const result = streamText({
        model: gateway(CHAT_DEFAULT_MODEL_ID),
        system: "Summarize the input in the requested style.",
        prompt: input.text,
        onError: () =>
          prisma.aiUsageEvent
            .delete({ where: { id: usage.id } })
            .catch(() => {}),
      });
      return streamToEventIterator(result.toUIMessageStream());
    } catch (error) {
      await prisma.aiUsageEvent
        .delete({ where: { id: usage.id } })
        .catch(() => {});
      throw error;
    }
  });

Pick the right return shape

Streams: return streamToEventIterator(result.toUIMessageStream(...)). One-shots: return { text } from a plain generateText call. Streams give you token-by-token UI; one-shots give you a single mutation result.

Wire the client

For streams, use useChat with the same transport.sendMessages pattern (point it at your new procedure). For one-shots, a plain useMutation(orpc.<feature>.run.mutationOptions()) is fine; the response is a typed object, no streaming bookkeeping.

Compose the UI from ai-elements

Drop in <PromptInput> for input and <Conversation> + <Message> + <MessageResponse> for output. Pull in <Reasoning> and <Sources> if your model returns them. None of these primitives are chat-specific; they work for any UIMessage-shaped surface.

Where To Go Next
