
Introduction
A few weeks ago, we built a small internal tool for our legal team. The idea was simple: upload a long vendor contract—sometimes 40 or 50 pages—and get a quick summary.
Nothing fancy in terms of tech. React on the frontend, ASP.NET Core on the backend, and an LLM handling the summarization.
We pushed it to staging, shared the link, and waited to see what would happen.
Within minutes, messages started popping up:
- “Is this working?”
- “Did it freeze?”
- “I clicked summarize… nothing’s happening.”
The awkward part? It was working.
It just didn’t feel like it.
It was a reminder that even in straightforward builds like this, the real challenge often lies in shaping the user experience around AI—something we’ve been focusing on closely at Payoda while building similar enterprise solutions.
Where Things Felt Off
The summarization itself took around 20–25 seconds depending on the document. From a backend perspective, that’s not bad at all. No failures, no timeouts, nothing obviously broken.
But the UI told a different story.
You click the button, a spinner shows up… and then nothing. No progress, no partial results, just a blank wait.
Some users refreshed the page. Others clicked the button again (which, of course, triggered duplicate requests and extra cost).
That’s when it clicked for us: the issue wasn’t speed—it was visibility.
The Problem with Our API Approach
Initially, we treated the LLM like any normal API:
- Send the request
- Wait
- Return the full response
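In code, that first version boiled down to a single awaited call: one request in, one big payload out. A simplified sketch (the service and type names here are illustrative, not our actual code):

using Microsoft.AspNetCore.Mvc;

// Illustrative request type and service interface, just to show the shape.
public record SummaryRequest(string DocumentText);

public interface ISummaryService
{
    Task<string> SummarizeAsync(string documentText, CancellationToken ct = default);
}

[ApiController]
[Route("api/summaries")]
public class SummariesController : ControllerBase
{
    private readonly ISummaryService _summaries;

    public SummariesController(ISummaryService summaries) => _summaries = summaries;

    [HttpPost]
    public async Task<IActionResult> Create(SummaryRequest request, CancellationToken ct)
    {
        // The browser sees nothing until this whole call finishes
        // (20-25 seconds for a long contract).
        var summary = await _summaries.SummarizeAsync(request.DocumentText, ct);
        return Ok(new { summary });
    }
}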
That pattern works fine for quick operations. But LLMs don’t really behave like that—they generate output piece by piece.
By waiting for the entire response:
- Users see nothing for a long time
- The UI feels unresponsive
- There’s a higher chance of timeouts
If you compare this to tools like ChatGPT, the difference is obvious. You start seeing output almost immediately.
That’s not faster processing—it’s just streaming.
What We Changed
We didn’t actually make the system faster.
We just stopped hiding the work.
Instead of waiting for the full response, we started sending chunks of it as they were generated. That meant keeping a connection open and pushing updates continuously.
It sounds like a small change, but it made a big difference.
Why We Chose SignalR
We did look at Server-Sent Events (SSE), which is fine for basic streaming.
But in our case, SignalR made more sense:
- It’s already part of ASP.NET Core
- Supports two-way communication
- Handles reconnections for you
- Fits well if your app is interactive
So we went with that.
How It Works (High Level)
The flow is pretty straightforward:
- The client opens a SignalR connection
- User submits a prompt
- The hub forwards it to a service
- The service calls the LLM with streaming enabled
- Tokens come in one by one
- Each token is pushed back to the client
- The UI updates as each token arrives
One thing we were careful about: keeping responsibilities separate.
- The hub just handles communication
- The service does the actual LLM work
That kept things cleaner than trying to do everything in one place.
A Minimal Example
Just to give an idea of the flow:
Program.cs
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSignalR();
var app = builder.Build();
app.UseRouting();
app.MapHub<LlmChatHub>("/chat-hub");
app.Run();
Hub
public class LlmChatHub : Hub
{
    public async Task SendMessage(string prompt)
    {
        await SimulateStreaming(prompt);
    }

    private async Task SimulateStreaming(string prompt)
    {
        await Task.Delay(500);

        var response = $"You asked: {prompt}. Here's a streamed response example.";
        var parts = response.Split(' ');

        foreach (var part in parts)
        {
            await Clients.Caller.SendAsync("ReceiveToken", part + " ");
            await Task.Delay(70);
        }

        await Clients.Caller.SendAsync("StreamComplete");
    }
}
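In the real version, the hub doesn't simulate anything. It hands the prompt to a service that streams from the LLM, which is the "keep responsibilities separate" part mentioned above. A rough sketch of that split, assuming an ILlmStreamingService you'd back with whatever streaming API your LLM provider exposes:

using Microsoft.AspNetCore.SignalR;

// Illustrative interface; the implementation wraps your provider's streaming API.
public interface ILlmStreamingService
{
    IAsyncEnumerable<string> StreamCompletionAsync(string prompt, CancellationToken ct);
}

// Same hub as above, now delegating instead of simulating.
public class LlmChatHub : Hub
{
    private readonly ILlmStreamingService _llm;

    public LlmChatHub(ILlmStreamingService llm) => _llm = llm;

    public async Task SendMessage(string prompt)
    {
        // The hub only relays tokens; the service owns the LLM call.
        await foreach (var token in _llm.StreamCompletionAsync(prompt, Context.ConnectionAborted))
        {
            await Clients.Caller.SendAsync("ReceiveToken", token);
        }

        await Clients.Caller.SendAsync("StreamComplete");
    }
}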
On the frontend, it’s basically:
- Open a connection
- Listen for tokens
- Append them to the screen
That’s it.
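For reference, the client side looks roughly like this with the @microsoft/signalr package (the element ID and wiring are illustrative):

import * as signalR from "@microsoft/signalr";

const output = document.getElementById("output");

const connection = new signalR.HubConnectionBuilder()
    .withUrl("/chat-hub")
    .withAutomaticReconnect()
    .build();

let fullText = "";

// Append each token as it arrives so the user sees the summary grow.
connection.on("ReceiveToken", (token) => {
    fullText += token;
    output.textContent = fullText;
});

connection.on("StreamComplete", () => {
    // Re-enable the summarize button, hide the "generating..." indicator, etc.
});

await connection.start();
await connection.invoke("SendMessage", "Summarize this vendor contract...");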
Things We Didn’t Think About Initially
A few things caught us off guard.
Markdown Rendering
LLMs don’t always return plain text—often it’s Markdown.
If you render tokens as they come in, formatting breaks.
What worked better was:
- Keep a full accumulated string
- Re-render it each time using a Markdown parser
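Continuing the client sketch above, that just means parsing the accumulated string on every update instead of appending rendered fragments (marked is used here as an example; react-markdown or any other parser works the same way):

import { marked } from "marked";

let raw = "";

connection.on("ReceiveToken", (token) => {
    raw += token;
    // Re-render the whole accumulated string so half-finished Markdown
    // (lists, code fences, bold markers) doesn't break the layout.
    // Sanitize the HTML before inserting it in production.
    output.innerHTML = marked.parse(raw);
});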
Requests Keep Running After Users Leave
If someone closes the tab halfway through, the LLM doesn’t magically stop.
So you’re still paying for tokens no one sees.
We fixed that by:
- Tracking active connections
- Cancelling the request when the client disconnects
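One way to wire that up is a CancellationTokenSource per connection, cancelled from OnDisconnectedAsync. A sketch reusing the illustrative ILlmStreamingService from earlier (if the LLM call runs inside the hub method itself, passing Context.ConnectionAborted already gets you most of this):

using System.Collections.Concurrent;
using Microsoft.AspNetCore.SignalR;

public class LlmChatHub : Hub
{
    // One cancellation source per connection so a disconnect stops the LLM call.
    private static readonly ConcurrentDictionary<string, CancellationTokenSource> _active = new();

    private readonly ILlmStreamingService _llm;

    public LlmChatHub(ILlmStreamingService llm) => _llm = llm;

    public async Task SendMessage(string prompt)
    {
        var cts = _active.GetOrAdd(Context.ConnectionId, _ => new CancellationTokenSource());

        try
        {
            await foreach (var token in _llm.StreamCompletionAsync(prompt, cts.Token))
            {
                await Clients.Caller.SendAsync("ReceiveToken", token, cts.Token);
            }

            await Clients.Caller.SendAsync("StreamComplete");
        }
        catch (OperationCanceledException)
        {
            // The user closed the tab mid-stream; stop paying for tokens nobody sees.
        }
    }

    public override Task OnDisconnectedAsync(Exception? exception)
    {
        if (_active.TryRemove(Context.ConnectionId, out var cts))
        {
            cts.Cancel();
            cts.Dispose();
        }

        return base.OnDisconnectedAsync(exception);
    }
}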
UI Can’t Always Keep Up
On slower devices, tokens can arrive faster than they can be rendered.
That can actually cause lag.
We handled this by:
- Adding a buffer (channels)
- Throttling updates when needed
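On the server side, that can be a small batching pump: the LLM producer writes tokens into a System.Threading.Channels channel, and a loop flushes them every ~100 ms so slow devices get a handful of larger updates instead of hundreds of tiny ones. A sketch (the sendBatchAsync delegate stands in for the SignalR push):

using System.Diagnostics;
using System.Text;
using System.Threading.Channels;

public static class TokenBatcher
{
    public static async Task PumpAsync(
        ChannelReader<string> tokens,
        Func<string, Task> sendBatchAsync,
        CancellationToken ct)
    {
        var buffer = new StringBuilder();
        var flushInterval = TimeSpan.FromMilliseconds(100);
        var sinceFlush = Stopwatch.StartNew();

        // Drain the channel, but only push to the client when the interval elapses.
        await foreach (var token in tokens.ReadAllAsync(ct))
        {
            buffer.Append(token);

            if (sinceFlush.Elapsed >= flushInterval)
            {
                await sendBatchAsync(buffer.ToString());
                buffer.Clear();
                sinceFlush.Restart();
            }
        }

        // Flush whatever arrived after the last tick.
        if (buffer.Length > 0)
        {
            await sendBatchAsync(buffer.ToString());
        }
    }
}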
What Actually Improved
Interestingly, the backend still takes the same 20–25 seconds.
That didn’t change.
But the experience did:
- Users stopped clicking multiple times
- No more refreshes
- Way fewer “is this broken?” messages
Some people even said it felt faster.
All we really did was show progress.
Conclusion
This ended up being more of a UX problem than a performance one.
A spinner with no feedback creates doubt. Streaming—even if it takes the same time—builds confidence.
Once you switch to streaming responses, it’s honestly hard to go back.
If you’re working with LLMs, it’s probably worth doing this early. It saves a lot of confusion later.
If you’re exploring similar patterns or building AI-driven tools at scale, Payoda works closely with teams to design, build, and productionize reliable LLM experiences—happy to connect.
FAQs
- What if the LLM fails halfway through?
Send an error event through SignalR and let the UI handle it gracefully instead of leaving things hanging.
- Can this support multiple users or rooms?
Yes. SignalR has built-in support for groups.
- Do I have to use SignalR?
Not necessarily. SSE works for simpler cases. But SignalR is more flexible if you need two-way communication.
- Does streaming make things faster?
Not really. It just makes the wait feel shorter.