Without streaming, a user submits a query and waits 5–15 seconds staring at a blank screen. With streaming, text starts appearing within 200–500 ms (time to first token, TTFT) and flows continuously as the model generates. The total latency is identical, but the perceived experience is completely different.
SSE vs WebSockets
Server-Sent Events (SSE)
- One-way: server → client
- Built on plain HTTP, so it works through proxies
- Automatic reconnection
- Simpler to implement
- Best for: chatbots, document generation
WebSockets
- Bidirectional: full duplex
- Lower overhead per message
- Better for interactive, back-and-forth
- More complex (connection management)
- Best for: voice interfaces, live collaboration
For most LLM streaming use cases, SSE is the right choice. It's simpler, works through corporate proxies, and the one-directional constraint fits the request-then-stream pattern perfectly.
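On the wire, an SSE response is just text over a long-lived HTTP response: the server sets `Content-Type: text/event-stream` and writes frames prefixed with `data:`, separated by blank lines. A sketch of what a client might receive (the JSON payload shape and the `[DONE]` sentinel are conventions assumed here, not part of the SSE spec):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

data: {"token": "Hel"}

data: {"token": "lo"}

data: [DONE]
```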
Python streaming backend (FastAPI)
React client for streaming
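A framework-free sketch of the client-side logic; in a React component this would be called from an event handler, with `onToken` appending to state and an `AbortController` wired to a cancel button. It assumes the server emits SSE frames like `data: {"token": "..."}` followed by a `data: [DONE]` sentinel (both assumptions, not a fixed protocol):

```typescript
// Split a text buffer into complete SSE frames, keeping any trailing
// partial frame for the next read.
export function parseSseChunk(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  const events = parts
    .map((frame) =>
      frame
        .split("\n")
        .filter((line) => line.startsWith("data: "))
        .map((line) => line.slice(6))
        .join("\n")
    )
    .filter((data) => data.length > 0);
  return { events, rest };
}

export async function streamChat(
  url: string,
  onToken: (token: string) => void,
  signal?: AbortSignal // pass AbortController.signal to support cancellation
): Promise<void> {
  const resp = await fetch(url, { signal });
  const reader = resp.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseChunk(buffer);
    buffer = rest;
    for (const data of events) {
      if (data === "[DONE]") return; // server finished generating
      onToken(JSON.parse(data).token);
    }
  }
}
```

The browser's built-in `EventSource` does this parsing for you, but a `fetch`-based reader like this also supports POST bodies and custom headers, which `EventSource` does not.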
UX patterns for streaming
- Show a blinking cursor immediately: users need feedback that the system is working, even before the first token.
- Render markdown progressively: avoid layout jumps by rendering completed sentences or paragraphs rather than re-rendering per token.
- Buffer code blocks: don't render a code block until its closing backticks arrive; partial syntax highlighting looks broken.
- Allow interruption: users should be able to stop generation mid-stream. Implement a cancel button that closes the SSE connection.
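The code-block buffering rule above can be sketched as a small filter over the accumulated stream text (a minimal sketch that handles triple-backtick fences only; real markdown needs a proper incremental parser):

````typescript
// Hide an unterminated ``` fence: an odd number of fence markers means a
// code block is still open, so withhold it until the closing fence arrives.
function visibleText(streamed: string): string {
  const fenceCount = (streamed.match(/```/g) ?? []).length;
  if (fenceCount % 2 === 1) {
    return streamed.slice(0, streamed.lastIndexOf("```"));
  }
  return streamed;
}
````

On each token, render `visibleText(fullStreamSoFar)` instead of the raw buffer; once the closing fence arrives, the whole block appears at once with correct highlighting.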
Built-in streaming on MoltBot
SSE streaming, progressive markdown rendering, and cancel support with zero config. 14-day free trial.
Start Free Trial →