Without streaming, a user submits a query and waits 5–15 seconds staring at a blank screen. With streaming, text starts appearing within 200–500 ms (time to first token, TTFT) and flows continuously as the model generates. The total latency is identical, but the perceived experience is completely different.
SSE vs WebSockets
Server-Sent Events (SSE)
- One-way: server → client
- Built on plain HTTP, so it works through proxies
- Automatic reconnection
- Simpler to implement
- Best for: chatbots, document generation
WebSockets
- Bidirectional: full duplex
- Lower overhead per message
- Better for interactive, back-and-forth
- More complex (connection management)
- Best for: voice interfaces, live collaboration
For most LLM streaming use cases, SSE is the right choice. It's simpler, works through corporate proxies, and the one-directional constraint fits the request-then-stream pattern perfectly.
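On the wire, an SSE response is just text over a long-lived HTTP response: the server sets `Content-Type: text/event-stream` and writes frames prefixed with `data:`, separated by blank lines. A sketch of what a client might receive (the JSON payload shape and the `[DONE]` sentinel are conventions assumed here, not part of the SSE spec):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

data: {"token": "Hel"}

data: {"token": "lo"}

data: [DONE]
```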
Python streaming backend (FastAPI)
React client for streaming
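A framework-free sketch of the client-side logic; in a React component this would be called from an event handler, with `onToken` appending to state and an `AbortController` wired to a cancel button. It assumes the server emits SSE frames like `data: {"token": "..."}` followed by a `data: [DONE]` sentinel (both assumptions, not a fixed protocol):

```typescript
// Split a text buffer into complete SSE frames, keeping any trailing
// partial frame for the next read.
export function parseSseChunk(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  const events = parts
    .map((frame) =>
      frame
        .split("\n")
        .filter((line) => line.startsWith("data: "))
        .map((line) => line.slice(6))
        .join("\n")
    )
    .filter((data) => data.length > 0);
  return { events, rest };
}

export async function streamChat(
  url: string,
  onToken: (token: string) => void,
  signal?: AbortSignal // pass AbortController.signal to support cancellation
): Promise<void> {
  const resp = await fetch(url, { signal });
  const reader = resp.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseChunk(buffer);
    buffer = rest;
    for (const data of events) {
      if (data === "[DONE]") return; // server finished generating
      onToken(JSON.parse(data).token);
    }
  }
}
```

The browser's built-in `EventSource` does this parsing for you, but a `fetch`-based reader like this also supports POST bodies and custom headers, which `EventSource` does not.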
UX patterns for streaming
- Show a blinking cursor immediately: users need feedback that the system is working, even before the first token.
- Render markdown progressively: avoid layout jumps by rendering completed sentences or paragraphs rather than re-rendering per token.
- Buffer code blocks: don't render a code block until its closing backticks arrive; partial syntax highlighting looks broken.
- Allow interruption: users should be able to stop generation mid-stream. Implement a cancel button that closes the SSE connection.
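The code-block buffering rule above can be sketched as a small filter over the accumulated stream text (a minimal sketch that handles triple-backtick fences only; real markdown needs a proper incremental parser):

````typescript
// Hide an unterminated ``` fence: an odd number of fence markers means a
// code block is still open, so withhold it until the closing fence arrives.
function visibleText(streamed: string): string {
  const fenceCount = (streamed.match(/```/g) ?? []).length;
  if (fenceCount % 2 === 1) {
    return streamed.slice(0, streamed.lastIndexOf("```"));
  }
  return streamed;
}
````

On each token, render `visibleText(fullStreamSoFar)` instead of the raw buffer; once the closing fence arrives, the whole block appears at once with correct highlighting.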
Built-in streaming on MoltBot
SSE streaming, progressive markdown rendering, and cancel support with zero config. 14-day free trial.
Start Free Trial →