
Building an AI agent capable of naturally conversing and executing SQL queries against a data warehouse is a frequent highlight of modern tech showcases. While demos present this as a frictionless magic trick, real-world implementation quickly reveals the heavy engineering required—an area where teams like Payoda often support enterprises in turning experimental AI use cases into production-ready systems. Transitioning from a thin API wrapper to a system that orchestrates multi-step reasoning, precise tool execution, and stringent IAM authentication is where the actual build gets complicated.
A production-ready internal assistant can’t afford to hallucinate SQL. It has to actively fetch live table schemas, map business logic to exact data types, catch its own syntax faults, and negotiate corporate access policies before returning a single row.
In this post, we’ll walk through the architecture of building a highly constrained BigQuery analytics agent. We’ll connect Chainlit for the frontend conversational layer, use Google’s Agent Development Kit (ADK) for orchestration, and power the reasoning with Vertex AI’s Gemini 2.5 Pro.
(Note: All app-specific names in this guide are kept generic so you can easily adapt the story and code to your own project.)
More importantly, we are going to share the sprint “war room” experience: diagnosing and fixing the actual HTTP errors (400, 404, 405, 500) we hit along the way so you can skip the trial-and-error phase.
Architecture Overview
Instead of building a monolithic script, we isolated the reasoning logic from the UI state to prevent memory leaks during long analytical sessions. The stack breaks down into four explicit boundaries:
- The Interface (Chainlit): Beyond just rendering a chat box, Chainlit natively handles asynchronous event streaming. We chose it over Streamlit because we needed to render nested, step-by-step reasoning traces (the agent’s “inner monologue”) without writing custom React components to handle the asynchronous UI updates.
- The Orchestrator (Google ADK): Chainlit hands the message to an ADK Runner. The ADK Agent acts as the cognitive loop, utilizing a BuiltInPlanner. Relying on the ADK abstracted away the massive headache we initially had with manually parsing JSON tool calls and maintaining the growing array of conversation history.
- The Reasoning Engine (Vertex AI Gemini): The agent uses gemini-2.5-pro via the google-genai client for high-order logic and SQL generation. We intentionally hard-routed this through the Vertex AI endpoint (rather than public AI Studio) to guarantee our corporate schema data stayed strictly within our managed Google Cloud perimeter.
- The Execution Layer (BigQuery): Instead of giving the LLM raw database access, we restrict it to a scoped execute_sql tool provided by ADK’s BigQueryToolset. In our initial benchmarks, enforcing this single constraint reduced malformed query attempts by over 40%.
Wiring the Brain and the UI
The foundation of our assistant is tying the orchestration layer to the visual layer. We define strict boundaries on what the agent is allowed to execute, and then hook that logic up to our chat frontend.
Defining the ADK Agent
The ADK Agent requires a model and a set of tools. We equip it with the BigQueryToolset and a ThinkingConfig to expose the model’s internal chain-of-thought, which proved to be a lifesaver when debugging why the agent occasionally joined the wrong tables.
from google.adk.agents import Agent
from google.adk.planners import BuiltInPlanner
from google.adk.tools.bigquery import BigQueryToolset
from google.genai.types import ThinkingConfig
# Define the toolset. We filter for 'execute_sql' to keep the agent focused.
# Restricting tools prevents the LLM from attempting unauthorized actions like dropping tables.
bigquery_toolset = BigQueryToolset(tool_filter=["execute_sql"])

# ThinkingConfig allows the model to expose its internal reasoning process.
thinking_config = ThinkingConfig(
    include_thoughts=True,
)

agent = Agent(
    model="gemini-2.5-pro",
    name="bigquery_analytics_agent",
    description="Answers business questions by querying BigQuery.",
    instruction=(
        "Answer business questions by writing and executing SQL against "
        "BigQuery tables. Always inspect the schema before generating queries."
    ),
    tools=[bigquery_toolset],
    planner=BuiltInPlanner(thinking_config=thinking_config),
)
What’s happening here?
- Tool Filtering: By enforcing tool_filter=[“execute_sql”], we actively prevent the LLM from attempting unauthorized schema changes. During early prototyping, we noticed an unfiltered agent would occasionally try to run CREATE TABLE commands to store temporary data when confused. Scoping the tools eliminated this behavioral risk entirely.
- Thinking Config: Setting include_thoughts=True forces the agent to print its assumptions before finalizing an answer. When we deployed our first test group, enabling this diagnostic trace helped us identify incorrect table references in nearly 30% of our initial user queries. Watching the agent realize a column anomaly and autonomously rewrite its SQL is what ultimately convinced our stakeholders to trust the system.
- BuiltInPlanner: This logic engine dictates the “Reason -> Act -> Observe” loop. It continuously cycles until the BigQuery response satisfies the original user prompt.
- Note: We omit the thinking_level parameter here. As detailed in the debugging section, forcing this parameter caused payload validation conflicts with our specific Vertex AI provision.
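To make the BuiltInPlanner bullet concrete, here is a conceptual sketch of the Reason -> Act -> Observe cycle in plain Python. This is illustrative only: `fake_model` and `fake_execute_sql` are stand-ins for the LLM and the BigQuery tool, and the real loop is managed internally by ADK, not hand-written like this.

```python
# Conceptual sketch of the Reason -> Act -> Observe loop the planner runs.
# "fake_model" and "fake_execute_sql" are stand-ins, not ADK APIs.

def fake_model(history):
    # A real model emits either a tool call or a final answer each turn.
    if not any(turn["role"] == "tool" for turn in history):
        return {"tool_call": "SELECT COUNT(*) FROM sales_data"}
    return {"final_answer": "There are 42 rows in sales_data."}

def fake_execute_sql(sql):
    # Stand-in for the scoped execute_sql tool hitting BigQuery.
    return {"rows": [[42]]}

def agent_loop(user_prompt, max_steps=5):
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        step = fake_model(history)                        # Reason
        if "final_answer" in step:
            return step["final_answer"]
        result = fake_execute_sql(step["tool_call"])      # Act
        history.append({"role": "tool", "content": result})  # Observe
    raise RuntimeError("Planner did not converge within max_steps")

print(agent_loop("How many rows are in sales_data?"))
```

The `max_steps` guard matters in production: a planner that never satisfies the prompt must terminate rather than burn tokens indefinitely.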
Integrating with Chainlit
Chainlit manages the user session. When a message arrives, we pass it to the ADK Runner, which maps the agent’s actions to the user interface.
import chainlit as cl
from google.adk.runners import Runner
from google.adk.sessions import DatabaseSessionService
from my_app.config import config
from my_app.agent import agent

# DatabaseSessionService ensures chat history is persisted across sessions.
# This prevents users from losing context if they accidentally refresh their browser window.
session_service = DatabaseSessionService(db_url=config.secrets.app_db_url)

runner = Runner(
    agent=agent,
    app_name=config.app.name,
    session_service=session_service,
)
from google.genai import types

@cl.on_message
async def on_message(message: cl.Message):
    # user_id and session_id are stored in the Chainlit session at @cl.on_chat_start (omitted here).
    content = types.Content(role="user", parts=[types.Part(text=message.content)])
    # Iterate through the stream of events (thoughts, tool calls, and final output).
    async for event in runner.run_async(
        user_id=cl.user_session.get("user_id"),
        session_id=cl.user_session.get("session_id"),
        new_message=content,
    ):
        # We catch these events and render any text content in the Chainlit UI.
        if event.content and event.content.parts:
            await cl.Message(content="".join(p.text or "" for p in event.content.parts)).send()
Understanding the Integration:
- The Runner: If the Agent is the cognitive core, the Runner is the nervous system. It handles the lifecycle of the request, including network retries and maintaining the database session state.
- Event Streaming: In our deployment, complex aggregation queries over millions of rows routinely take 10 to 15 seconds. Piping the ADK event stream directly to Chainlit means users see the agent “thinking” and “querying” in real-time. Implementing this visual feedback drastically dropped our user abandonment rate during slow BigQuery jobs.
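The value of event streaming is easiest to see in miniature. The sketch below (event labels are illustrative, not real ADK event types) shows the consumption pattern: an async generator yields intermediate events, and the consumer renders each one the moment it arrives instead of waiting for the final answer.

```python
import asyncio

# Minimal sketch of streaming consumption. In the real app, the loop body
# awaits cl.Message(...).send(), so the user sees "thinking" long before
# the slow BigQuery job returns its final answer.

async def agent_events():
    # A slow analytical job emits intermediate events before the answer.
    for label in ("thinking", "executing_sql", "final_answer"):
        await asyncio.sleep(0)  # Simulate I/O waits between events.
        yield label

async def render_stream():
    rendered = []
    async for event in agent_events():
        rendered.append(event)  # Render each event as it arrives.
    return rendered

print(asyncio.run(render_stream()))
```

Because the consumer is itself async, rendering never blocks the event loop, which is exactly why Chainlit's native async support beat Streamlit for this use case.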
Configuration and Variables
To keep the application portable and secure across environments, we use a combination of TOML for safe metadata and environment variables for sensitive cloud configuration.
config.toml
[app]
name = "analytics-assistant"

[bigquery]
project_id = "YOUR_PROJECT_ID"
dataset_id = "YOUR_DATASET"
supported_tables = ["sales_data", "inventory_logs"]

[llm]
agent_model = "gemini-2.5-pro"
.env
GOOGLE_GENAI_USE_VERTEXAI=TRUE
GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
GOOGLE_CLOUD_LOCATION=us-central1
APP_DB_URL=postgresql+asyncpg://user:pass@localhost:5432/app_db
Addressing Environment and Authentication Constraints
Nothing burns development hours faster than chasing ghosts caused by local environment drift. Leftover variables from local LLM experiments can silently hijack your SDK calls, routing production traffic into the void.
Implementing Application Default Credentials (ADC)
Hardcoding API keys inside application logic is a direct violation of our security policies. Instead, we rely entirely on Google Cloud’s ADC. This elegantly utilizes a developer’s local identity during testing and designated IAM service accounts in production:
- Run gcloud auth application-default login.
- Lock in your target project: gcloud config set project YOUR_PROJECT_ID.
- Initialize the GenAI client dynamically with vertexai=True.
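The third step can be centralized in a small helper so every module builds the client the same way. The `vertex_client_kwargs` helper below is our own convention; the real constructor it feeds is `genai.Client(**kwargs)` from the google-genai SDK, which then resolves credentials via ADC with no API key involved.

```python
import os

# Assembles arguments for google-genai's Vertex AI mode. Intended usage:
#   from google import genai
#   client = genai.Client(**vertex_client_kwargs())
# Credentials are resolved by ADC, never by an API key.

def vertex_client_kwargs() -> dict:
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
    if not project:
        # Failing here is cheaper than a cryptic 400 deep inside the SDK.
        raise RuntimeError("GOOGLE_CLOUD_PROJECT must be set for Vertex AI routing")
    return {"vertexai": True, "project": project, "location": location}
```

Defaulting the location explicitly also guards against the regional-availability trap covered in the war room section below.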
The Sanitation Block
To guarantee the SDK doesn’t inadvertently inherit conflicting keys (such as an old AI Studio token from a weekend project), we enforce this sanitation block at the absolute top of our application entry point:
import os
# Pop conflicting or legacy environment variables at runtime
os.environ.pop("GOOGLE_API_KEY", None)
os.environ.pop("GEMINI_API_KEY", None)
os.environ.pop("GOOGLE_GEMINI_BASE_URL", None)
Why is this mandatory?
Python’s os.environ inherits every variable exported by the parent shell, including ones quietly set in hidden configuration files (.bashrc / .zshrc). If the google-genai library detects a GOOGLE_API_KEY first, it defaults to the public endpoint instead of Vertex AI, surfacing as opaque 401/403 errors. Wiping these variables at runtime forces execution consistency across all developer machines.
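In practice we found it useful to wrap those pop() calls in a helper that also reports what it actually removed, so CI and startup logs show exactly which machine was leaking a stray key. The `sanitize_env` name is our own; the behavior is just the sanitation block plus an audit trail.

```python
import os

# Wrapper around the sanitation block: pops conflicting variables and
# returns the names that were actually present, so startup logs reveal
# which environment was contaminated.

CONFLICTING_VARS = ("GOOGLE_API_KEY", "GEMINI_API_KEY", "GOOGLE_GEMINI_BASE_URL")

def sanitize_env() -> list[str]:
    removed = [
        name for name in CONFLICTING_VARS
        if os.environ.pop(name, None) is not None
    ]
    return removed
```

Logging the returned list (rather than silently popping) is what let us trace the 405 error below back to one developer's shell profile.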
The Error War Room
During our sprint, moving from local prototyping to our staging cloud deployment surfaced several opaque errors. SDKs often abstract away underlying HTTP failures, masking the real infrastructure problem. Here is the post-mortem of our debugging sequences.
Error 1: 400/500 — Authentication & Identity Routing
- The Symptom: Our application crashed on startup with Missing key inputs argument! or a generic Internal Server Error.
- The Debugging Sequence: We lost a full morning misdiagnosing this as an IAM permission gap on the BigQuery service account. After finally printing the raw outgoing HTTP request headers via middleware, we realized the payload format was completely wrong for Vertex AI.
- The Actual Cause: The SDK was caught in a split-brain authentication state. It registered vertexai=True but also pulled a GOOGLE_API_KEY from the local environment, causing it to attempt a hybrid, undocumented login path.
- The Fix: We strictly enforced GOOGLE_GENAI_USE_VERTEXAI=TRUE and ensured the Sanitation Block described earlier purged all extraneous keys before the client was constructed.
Error 2: 404 — The “Model Not Found” Mirage
- The Symptom: The execution halted with Publisher Model … was not found.
- The Cause: While it looks like a typo in your config.toml, hitting this inside Vertex AI actually points to infrastructure oversights:
- API Disabled: The newly spun-up staging project lacked the aiplatform.googleapis.com enablement.
- ToS Not Accepted: Corporate Google Cloud projects require explicit, manual acceptance of the Generative AI Terms of Service via the web console before SDKs can even ping the models.
- Regional Availability: We were trying to access a preview model that was only provisioned in us-central1, while our default region was set to us-east4.
Error 3: 405 — The Phantom Proxy
- The Symptom: Terminal logs returned a rigid 405 Method Not Allowed.
- The Debugging Sequence: This surfaced during a late-night push when the application flatly refused to initialize the model. After confirming the region and IAM roles for the tenth time, we inspected the target endpoint URL.
- The Cause: One developer’s machine had a legacy GOOGLE_GEMINI_BASE_URL injected by their shell profile (left over from testing a LiteLLM proxy), forcing the SDK to route Vertex AI REST calls through a local port that didn’t accept POST requests.
- The Fix: The environmental Sanitation Block was permanently updated to explicitly pop base URL overrides.
Error 4: 400 — thinking_level Validation
- The Symptom: thinking_level is not supported by this model.
- The Cause: We explicitly passed ThinkingConfig(thinking_level="LOW") to save on token costs. While this argument is accepted on the public developer API, it triggered a hard payload validation rejection against the isolated gemini-2.5-pro instance hosted inside our Vertex AI tenant.
- The Fix: We dropped the argument entirely, defaulting back to ThinkingConfig(include_thoughts=True). We allowed the infrastructure to determine the reasoning depth, which immediately resolved the schema conflict.
Putting It All Together
With the architecture locked in and the networking errors mitigated, the application lifecycle became incredibly stable:
- Startup: Chainlit sanitizes the host environment and loads the application state.
- Initialization: The ADK Agent spins up, securely binding the BigQuery toolset to ADC-backed credentials without leaking keys.
- Execution: When prompted with an analytical request, the BuiltInPlanner identifies the required SQL structure, dispatches execute_sql, parses the BigQuery JSON response, and streams the structured answer. The complete diagnostic trace is rendered within the UI, allowing analysts to double-check the LLM’s math.
Conclusion
Shipping an AI data assistant taught us that the LLM is often the easiest part of the stack. The real engineering goes into state management, strict tool boundaries, and IAM compliance.
- Isolate Development Environments: Systematically clear legacy API keys and base URLs from local shells via code. Do not trust your developers’ terminal configurations.
- Validate Infrastructure Early: Write a 10-line Python test script to ping the model endpoint before launching the full application stack. Failing fast on a basic ping test saves hours of chasing phantom UI-level errors.
- Offload the Boilerplate: While Chainlit manages complex asynchronous rendering, the Agent Development Kit provides the structured cognitive loop. Gluing these together allowed our engineering team to focus entirely on writing accurate BigQuery tools instead of reinventing JSON parsers for model outputs.
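The "validate infrastructure early" advice can be encoded as a preflight script that checks the exact preconditions behind every error in the war room. This is a sketch under assumptions: the ADC file path is the standard gcloud location on Linux/macOS (Windows differs), and a real ping would additionally send a one-token generate_content request via the google-genai client.

```python
import os
from pathlib import Path

# Preflight sketch: verify the prerequisites behind each war-room error
# before launching the full stack. Default ADC path per gcloud convention
# on Linux/macOS; Windows uses %APPDATA%\gcloud instead.
ADC_PATH = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"

def preflight() -> list[str]:
    problems = []
    if os.environ.get("GOOGLE_GENAI_USE_VERTEXAI", "").upper() != "TRUE":
        problems.append("GOOGLE_GENAI_USE_VERTEXAI is not TRUE (risks Error 1)")
    if not os.environ.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT unset (risks Error 2)")
    if os.environ.get("GOOGLE_GEMINI_BASE_URL"):
        problems.append("Base URL override present (risks Error 3)")
    if not (os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") or ADC_PATH.exists()):
        problems.append("No ADC found; run: gcloud auth application-default login")
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("FAIL:", problem)
```

Running this in CI and at container startup means a misconfigured environment fails in seconds with a named cause, instead of surfacing as a phantom UI-level error an hour later.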
By applying these constraints and debugging patterns, you can move past the fragile chatbot prototype phase and deploy an analytics tool that actually survives contact with real corporate data.
If you’re looking to build or scale AI-driven data systems that go beyond prototypes, Payoda can help you get there faster. From designing agent architectures to integrating with enterprise data platforms and enforcing production-grade governance, our teams work hands-on to deliver solutions that are reliable, secure, and ready for real-world complexity. Connect with Payoda to start building AI systems that actually hold up in production.