
Introduction
A few weeks ago, we built a small internal tool for our legal team. The idea was simple: upload a long vendor contract—sometimes 40 or 50 pages—and get a quick summary.
Nothing fancy in terms of tech. React on the frontend, ASP.NET Core on the backend, and an LLM handling the summarization.
We pushed it to staging, shared the link, and waited to see what would happen.
Within minutes, messages started popping up:
- “Is this working?”
- “Did it freeze?”
- “I clicked summarize… nothing’s happening.”
The awkward part? It was working.
It just didn’t feel like it.
It was a reminder that even in straightforward builds like this, the real challenge often lies in shaping the user experience around AI—something we’ve been focusing on closely at Payoda while building similar enterprise solutions.
Where Things Felt Off
The summarization itself took around 20–25 seconds depending on the document. From a backend perspective, that’s not bad at all. No failures, no timeouts, nothing obviously broken.
But the UI told a different story.
You click the button, a spinner shows up… and then nothing. No progress, no partial results, just a blank wait.
Some users refreshed the page. Others clicked the button again (which, of course, triggered duplicate requests and extra cost).
That’s when it clicked for us: the issue wasn’t speed—it was visibility.
The Problem with Our API Approach
Initially, we treated the LLM like any normal API:
- Send the request
- Wait
- Return the full response
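In code, that first version boiled down to a single awaited call: one request in, one big payload out. A simplified sketch (the service and type names here are illustrative, not our actual code):

using Microsoft.AspNetCore.Mvc;

// Illustrative request type and service interface, just to show the shape.
public record SummaryRequest(string DocumentText);

public interface ISummaryService
{
    Task<string> SummarizeAsync(string documentText, CancellationToken ct = default);
}

[ApiController]
[Route("api/summaries")]
public class SummariesController : ControllerBase
{
    private readonly ISummaryService _summaries;

    public SummariesController(ISummaryService summaries) => _summaries = summaries;

    [HttpPost]
    public async Task<IActionResult> Create(SummaryRequest request, CancellationToken ct)
    {
        // The browser sees nothing until this whole call finishes
        // (20-25 seconds for a long contract).
        var summary = await _summaries.SummarizeAsync(request.DocumentText, ct);
        return Ok(new { summary });
    }
}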
That pattern works fine for quick operations. But LLMs don’t really behave like that—they generate output piece by piece.
By waiting for the entire response:
- Users see nothing for a long time
- The UI feels unresponsive
- There’s a higher chance of timeouts
If you compare this to tools like ChatGPT, the difference is obvious. You start seeing output almost immediately.
That’s not faster processing—it’s just streaming.
What We Changed
We didn’t actually make the system faster.
We just stopped hiding the work.
Instead of waiting for the full response, we started sending chunks of it as they were generated. That meant keeping a connection open and pushing updates continuously.
It sounds like a small change, but it made a big difference.
Why We Chose SignalR
We did look at Server-Sent Events (SSE), which is fine for basic streaming.
But in our case, SignalR made more sense:
- It’s already part of ASP.NET Core
- Supports two-way communication
- Handles reconnections for you
- Fits well if your app is interactive
So we went with that.
How It Works (High Level)
The flow is pretty straightforward:
- The client opens a SignalR connection
- User submits a prompt
- The hub forwards it to a service
- The service calls the LLM with streaming enabled
- Tokens come in one by one
- Each token is pushed back to the client
- The UI updates as each token arrives
One thing we were careful about: keeping responsibilities separate.
- The hub just handles communication
- The service does the actual LLM work
That kept things cleaner than trying to do everything in one place.
A Minimal Example
Just to give an idea of the flow:
Program.cs
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSignalR();
var app = builder.Build();
app.UseRouting();
app.MapHub<LlmChatHub>("/chat-hub");
app.Run();
Hub
public class LlmChatHub : Hub
{
    public async Task SendMessage(string prompt)
    {
        await SimulateStreaming(prompt);
    }

    private async Task SimulateStreaming(string prompt)
    {
        await Task.Delay(500);

        var response = $"You asked: {prompt}. Here's a streamed response example.";
        var parts = response.Split(' ');

        foreach (var part in parts)
        {
            await Clients.Caller.SendAsync("ReceiveToken", part + " ");
            await Task.Delay(70);
        }

        await Clients.Caller.SendAsync("StreamComplete");
    }
}
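In the real version, the hub doesn't simulate anything. It hands the prompt to a service that streams from the LLM, which is the "keep responsibilities separate" part mentioned above. A rough sketch of that split, assuming an ILlmStreamingService you'd back with whatever streaming API your LLM provider exposes:

using Microsoft.AspNetCore.SignalR;

// Illustrative interface; the implementation wraps your provider's streaming API.
public interface ILlmStreamingService
{
    IAsyncEnumerable<string> StreamCompletionAsync(string prompt, CancellationToken ct);
}

// Same hub as above, now delegating instead of simulating.
public class LlmChatHub : Hub
{
    private readonly ILlmStreamingService _llm;

    public LlmChatHub(ILlmStreamingService llm) => _llm = llm;

    public async Task SendMessage(string prompt)
    {
        // The hub only relays tokens; the service owns the LLM call.
        await foreach (var token in _llm.StreamCompletionAsync(prompt, Context.ConnectionAborted))
        {
            await Clients.Caller.SendAsync("ReceiveToken", token);
        }

        await Clients.Caller.SendAsync("StreamComplete");
    }
}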
On the frontend, it’s basically:
- Open a connection
- Listen for tokens
- Append them to the screen
That’s it.
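For reference, the client side looks roughly like this with the @microsoft/signalr package (the element ID and wiring are illustrative):

import * as signalR from "@microsoft/signalr";

const output = document.getElementById("output");

const connection = new signalR.HubConnectionBuilder()
    .withUrl("/chat-hub")
    .withAutomaticReconnect()
    .build();

let fullText = "";

// Append each token as it arrives so the user sees the summary grow.
connection.on("ReceiveToken", (token) => {
    fullText += token;
    output.textContent = fullText;
});

connection.on("StreamComplete", () => {
    // Re-enable the summarize button, hide the "generating..." indicator, etc.
});

await connection.start();
await connection.invoke("SendMessage", "Summarize this vendor contract...");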
Things We Didn’t Think About Initially
A few things caught us off guard.
Markdown Rendering
LLMs don’t always return plain text—often it’s Markdown.
If you render tokens as they come in, formatting breaks.
What worked better was:
- Keep a full accumulated string
- Re-render it each time using a Markdown parser
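Continuing the client sketch above, that just means parsing the accumulated string on every update instead of appending rendered fragments (marked is used here as an example; react-markdown or any other parser works the same way):

import { marked } from "marked";

let raw = "";

connection.on("ReceiveToken", (token) => {
    raw += token;
    // Re-render the whole accumulated string so half-finished Markdown
    // (lists, code fences, bold markers) doesn't break the layout.
    // Sanitize the HTML before inserting it in production.
    output.innerHTML = marked.parse(raw);
});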
Requests Keep Running After Users Leave
If someone closes the tab halfway through, the LLM doesn’t magically stop.
So you’re still paying for tokens no one sees.
We fixed that by:
- Tracking active connections
- Cancelling the request when the client disconnects
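One way to wire that up is a CancellationTokenSource per connection, cancelled from OnDisconnectedAsync. A sketch reusing the illustrative ILlmStreamingService from earlier (if the LLM call runs inside the hub method itself, passing Context.ConnectionAborted already gets you most of this):

using System.Collections.Concurrent;
using Microsoft.AspNetCore.SignalR;

public class LlmChatHub : Hub
{
    // One cancellation source per connection so a disconnect stops the LLM call.
    private static readonly ConcurrentDictionary<string, CancellationTokenSource> _active = new();

    private readonly ILlmStreamingService _llm;

    public LlmChatHub(ILlmStreamingService llm) => _llm = llm;

    public async Task SendMessage(string prompt)
    {
        var cts = _active.GetOrAdd(Context.ConnectionId, _ => new CancellationTokenSource());

        try
        {
            await foreach (var token in _llm.StreamCompletionAsync(prompt, cts.Token))
            {
                await Clients.Caller.SendAsync("ReceiveToken", token, cts.Token);
            }

            await Clients.Caller.SendAsync("StreamComplete");
        }
        catch (OperationCanceledException)
        {
            // The user closed the tab mid-stream; stop paying for tokens nobody sees.
        }
    }

    public override Task OnDisconnectedAsync(Exception? exception)
    {
        if (_active.TryRemove(Context.ConnectionId, out var cts))
        {
            cts.Cancel();
            cts.Dispose();
        }

        return base.OnDisconnectedAsync(exception);
    }
}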
UI Can’t Always Keep Up
On slower devices, tokens can arrive faster than they can be rendered.
That can actually cause lag.
We handled this by:
- Adding a buffer (channels)
- Throttling updates when needed
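On the server side, that can be a small batching pump: the LLM producer writes tokens into a System.Threading.Channels channel, and a loop flushes them every ~100 ms so slow devices get a handful of larger updates instead of hundreds of tiny ones. A sketch (the sendBatchAsync delegate stands in for the SignalR push):

using System.Diagnostics;
using System.Text;
using System.Threading.Channels;

public static class TokenBatcher
{
    public static async Task PumpAsync(
        ChannelReader<string> tokens,
        Func<string, Task> sendBatchAsync,
        CancellationToken ct)
    {
        var buffer = new StringBuilder();
        var flushInterval = TimeSpan.FromMilliseconds(100);
        var sinceFlush = Stopwatch.StartNew();

        // Drain the channel, but only push to the client when the interval elapses.
        await foreach (var token in tokens.ReadAllAsync(ct))
        {
            buffer.Append(token);

            if (sinceFlush.Elapsed >= flushInterval)
            {
                await sendBatchAsync(buffer.ToString());
                buffer.Clear();
                sinceFlush.Restart();
            }
        }

        // Flush whatever arrived after the last tick.
        if (buffer.Length > 0)
        {
            await sendBatchAsync(buffer.ToString());
        }
    }
}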
What Actually Improved
Interestingly, the backend still takes the same 20–25 seconds.
That didn’t change.
But the experience did:
- Users stopped clicking multiple times
- No more refreshes
- Way fewer “is this broken?” messages
Some people even said it felt faster.
All we really did was show progress.
Conclusion
This ended up being more of a UX problem than a performance one.
A spinner with no feedback creates doubt. Streaming—even if it takes the same time—builds confidence.
Once you switch to streaming responses, it’s honestly hard to go back.
If you’re working with LLMs, it’s probably worth doing this early. It saves a lot of confusion later.
If you’re exploring similar patterns or building AI-driven tools at scale, Payoda works closely with teams to design, build, and productionize reliable LLM experiences—happy to connect.
FAQs
- What if the LLM fails halfway through?
Send an error event through SignalR and let the UI handle it gracefully instead of leaving things hanging.
- Can this support multiple users or rooms?
Yes. SignalR has built-in support for groups.
- Do I have to use SignalR?
Not necessarily. SSE works for simpler cases. But SignalR is more flexible if you need two-way communication.
- Does streaming make things faster?
Not really. It just makes the wait feel shorter.