Slow and inefficient query generation from natural language inputs bottlenecks decision-making. This forces analysts and business users to rely heavily on data teams, delaying insights and limiting agility.
Text-to-SQL is shifting how we interact with data, enabling users to query structured databases in natural language. This is especially useful when deploying a domain-specific model for specialized industries. However, running inference with these models at scale for analytics workloads comes with latency and performance challenges.
In this tutorial, we’ll show how to optimize Vanna’s text-to-SQL solution using NVIDIA NIM, accelerated inference microservices that deliver optimized endpoints for generative AI models. Vanna’s open-source text-to-SQL solution has gained traction among organizations due to its flexibility, security, and adaptability.
You’ll learn how to:
- Insert preprocessed data into a database and connect with Vanna.
- Train Vanna on business terminology or definitions.
- Leverage an NVIDIA NeMo Retriever embedding model to retrieve context for SQL generation.
- Use an NVIDIA NIM for LLMs endpoint for accelerated inference.
For the demonstration, we use two open-source Steam games datasets from Kaggle, a rich source of game metadata that's ideal for analytics queries.
Prerequisites
- Python version 3.10 or higher installed
- NVIDIA supported LangChain package for an NVIDIA NeMo Retriever embedding model
- Access to an NVIDIA NIM endpoint
- Familiarity with SQLite
- Vanna library installed
- Steam datasets
Procedure
1. Download and preprocess the Steam dataset
To follow along, clone the repository located at /NVIDIA/GenerativeAIExamples and open the Vanna_with_NVIDIA notebook in the community section.
The preprocessing steps in this tutorial are adapted from an excellent Kaggle notebook and can be found in the ‘Data Preparation’ section of the Vanna with NVIDIA notebook.
Key preprocessing steps, illustrated in the sketch after this list, include:
- Filtering required columns
- Unpacking columns with a range into separate columns
- Adding game length to each game title
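To make these steps concrete, here is a minimal pandas sketch of the filtering and range-unpacking logic. The raw file path, column names, and owners format (for example, "0 .. 20,000") are assumptions; the notebook's 'Data Preparation' section is the authoritative version.

import pandas as pd

# Load the raw Kaggle export (path and column names are illustrative)
games = pd.read_csv("games.csv")

# Filter to the columns needed for analytics queries
cols = ["app_id", "name", "release_date", "price", "short_description",
        "positive", "negative", "owners", "hltb_single"]
games = games[cols]

# Unpack a range column such as owners ("0 .. 20,000") into numeric bounds
bounds = games["owners"].str.replace(",", "").str.split("..", regex=False, expand=True)
games["min_owners"] = bounds[0].str.strip().astype(int)
games["max_owners"] = bounds[1].str.strip().astype(int)
games = games.drop(columns=["owners"])

games.to_csv("tableau_games.csv", index=False)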
We get three CSVs at the end of the preprocessing step:
- tableau_games.csv
- tableau_categories.csv
- tableau_tags.csv
2. Initialize Vanna with NVIDIA NIM and NeMo Retriever
A Vanna instance needs a connection to a vector database, an embedder, and an LLM endpoint. We use the Milvus vector database for its GPU-acceleration capabilities, the NVIDIA NeMo Retriever embedding model llama-3.2-nv-embedqa-1b-v2, and the Llama 3.1 70B NIM microservice.
With NIM microservices, your inference model runs on NVIDIA accelerated infrastructure, delivering faster response times and cost efficiency for production deployments. NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—can run anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.
For setup instructions on NIM endpoints, refer to the NVIDIA NIM documentation.
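If you deploy a NIM microservice yourself rather than using the hosted endpoint, the same OpenAI-compatible client pattern shown later in this tutorial applies. A sketch, assuming a locally hosted NIM serving on the commonly used port 8000 (verify the URL and auth against your deployment):

from openai import OpenAI

# Hypothetical self-hosted NIM endpoint; port and credentials depend on your deployment
local_client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-used"  # a self-hosted NIM may not require an API key
)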

The following steps walk through initializing a Vanna instance with NVIDIA-accelerated components.
- Define the Vanna class as follows.
import numpy as np
from pymilvus import MilvusClient, model
from vanna.milvus import Milvus_VectorStore
from vanna.openai import OpenAI_Chat
from openai import OpenAI

# Set the NVIDIA API key
nvidia_api_key = '...'

# Define the Vanna class, composing the Milvus vector store with the OpenAI-compatible chat client
class VannaMilvus(Milvus_VectorStore, OpenAI_Chat):
    def __init__(self, llm_client, config=None):
        Milvus_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, client=llm_client, config=config)
- Next, create a NIM client using the OpenAI wrapper and the NVIDIA API key.
def get_openai_client():
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=nvidia_api_key
    )
    return client

llm_client = get_openai_client()
- Use a wrapper for the NeMo Retriever embedding model with the Milvus DB. You can use the same nvidia_api_key.
# Instantiate the NeMo Retriever embedding model through LangChain
# (assumes the langchain-nvidia-ai-endpoints package from the prerequisites is installed)
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

nvidia_embedder = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=nvidia_api_key
)

# Adapt the LangChain embedder to the interface Milvus expects
class EmbeddingWrapper:
    def __init__(self, embedder):
        self.embedder = embedder

    def encode_documents(self, texts):
        result = self.embedder.embed_documents(texts)
        return [np.array(r) for r in result]

    def encode_queries(self, texts):
        embeddings = []
        for text in texts:
            embeddings.append(self.embedder.embed_query(text))
        return embeddings

vanna_embedder = EmbeddingWrapper(nvidia_embedder)
- Use a local Milvus DB instance. Pointing MilvusClient at a local file runs Milvus Lite, which is sufficient for this demo.
milvus_uri = "./milvus_nvidia.db"
milvus_client_nvidia = MilvusClient(uri=milvus_uri)
- Specify the config for the Vanna instance with the components defined previously.
# Specify the LLM used for SQL generation
model_name = "nvidia/llama-3.1-70b-instruct"

# Define the configuration for the Vanna instance
config_nvidia = {
    "model": model_name,
    "milvus_client": milvus_client_nvidia,
    "embedding_function": vanna_embedder,
    "n_results": 2,  # The number of results to return from Milvus semantic search
}

vn_nvidia = VannaMilvus(llm_client, config=config_nvidia)
3. Create and populate the SQLite database
- SQLite provides a lightweight, serverless database for testing text-to-SQL in this demo, but you can use any database supported by Vanna. Insert the three CSVs from the preprocessing step into the DB.
import sqlite3
import pandas as pd

# Specify the path to the SQLite database
sqlite_path = 'steam_data.db'

# Connect to the SQLite database
sql_connect = sqlite3.connect(sqlite_path)
c = sql_connect.cursor()

# Create tables
init_sqls = """
CREATE TABLE IF NOT EXISTS games (
    app_id INTEGER PRIMARY KEY,
    name TEXT,
    release_date TEXT,
    price REAL,
    short_description TEXT,
    positive INTEGER,
    negative INTEGER,
    min_owners INTEGER,
    max_owners INTEGER,
    hltb_single REAL
);

CREATE TABLE IF NOT EXISTS categories (
    app_id INTEGER,
    categories TEXT,
    FOREIGN KEY (app_id) REFERENCES games(app_id)
);

CREATE TABLE IF NOT EXISTS tags (
    app_id INTEGER,
    tags TEXT,
    tag_frequencies TEXT,
    FOREIGN KEY (app_id) REFERENCES games(app_id)
);
"""

for sql in init_sqls.split(";"):
    if sql.strip():  # Skip the empty fragment after the final semicolon
        c.execute(sql)

# Read the CSV files produced by the preprocessing step
games_df = pd.read_csv('processed_dataset/tableau_games.csv')
categories_df = pd.read_csv('processed_dataset/tableau_categories.csv')
tags_df = pd.read_csv('processed_dataset/tableau_tags.csv')

# Insert data into tables
games_df.to_sql('games', sql_connect, if_exists='append', index=False)
categories_df.to_sql('categories', sql_connect, if_exists='append', index=False)
tags_df.to_sql('tags', sql_connect, if_exists='append', index=False)

sql_connect.commit()
- Now, connect the SQL DB to Vanna.
# Connect to the SQLite database
vn_nvidia.connect_to_sqlite(sqlite_path)
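As an optional sanity check (not part of the original notebook), count the rows per table to confirm the load succeeded:

# Optional sanity check: confirm each table was populated
for table in ("games", "categories", "tags"):
    count = c.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} rows")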
4. Train with data
Vanna performs best with context about the queried data and business-specific terminology.
- Remove existing training data to avoid duplication, and train Vanna with the database's data definition language (DDL).
# Remove existing training data
existing_training_data = vn_nvidia.get_training_data()
if len(existing_training_data) > 0:
    for _, training_data in existing_training_data.iterrows():
        vn_nvidia.remove_training_data(training_data["id"])

# Get the DDL of the SQLite database
df_ddl = vn_nvidia.run_sql("SELECT type, sql FROM sqlite_master WHERE sql IS NOT NULL")

# Train the model on the DDL data
for ddl in df_ddl["sql"].to_list():
    vn_nvidia.train(ddl=ddl)
- Next, add documentation about the three tables in the database schema: games, categories, and tags.
# Add documentation about your business terminology or definitions
vn_nvidia.train(
    documentation="""
    This dataset is used to answer questions about game trends.
    """
)

# Add documentation about the tables
vn_nvidia.train(
    documentation="""
    The games table contains information about the games.
    The app_id is the unique identifier for the game. This is a primary key.
    The name is the name of the game.
    The release_date is the date the game was released.
    The price is the price of the game in USD, 0.0 if it is free.
    The short_description is a brief description of the game.
    The positive is the number of positive reviews or votes.
    The negative is the number of negative reviews or votes.
    The min_owners is the minimum number of owners. Used together with max_owners to estimate the player base.
    The max_owners is the maximum number of owners. Used together with min_owners to estimate the player base.
    The hltb_single is the average playtime of the game. This is an estimate.
    """
)

vn_nvidia.train(
    documentation="""
    The categories table contains information about the categories of the games.
    The app_id is the unique identifier for the game.
    The categories column lists the categories of the game.
    The app_id is a foreign key to the games table.
    """
)

vn_nvidia.train(
    documentation="""
    The tags table contains information about the tags of the games.
    The app_id is the unique identifier for the game.
    The tags column lists the tags of the game. These are user defined.
    The tag_frequencies column stores the frequencies of the tags.
    The app_id is a foreign key to the games table.
    """
)
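Vanna can also learn from example question-SQL pairs, which tends to help with recurring query patterns. A minimal, hypothetical pair:

# Optionally, train on a question-SQL pair (hypothetical example)
vn_nvidia.train(
    question="How many games are free?",
    sql="SELECT COUNT(*) FROM games WHERE price = 0.0;"
)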
- Then, view the training data with a simple command.
training_data = vn_nvidia.get_training_data()
training_data
5. Generate SQL
You are all set to start querying the database with NIM-powered Vanna.
- Start with a simple query. You can print the generated SQL before executing it.
sql = vn_nvidia.generate_sql("Which 5 games have the most positive reviews and how many?")
print(sql)
vn_nvidia.run_sql(sql)
- Now run a more complex query.
sql = vn_nvidia.generate_sql("Which indie game has the biggest player base?")
vn_nvidia.run_sql(sql)
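Vanna also offers an ask convenience method that generates the SQL, executes it, and displays the results in a single call (exact behavior and return values vary by Vanna version):

# Generate, execute, and display results in one step
vn_nvidia.ask("What are the 10 most common tags?")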
Conclusion
In this tutorial, you learned how to accelerate text-to-SQL inference on Vanna using NVIDIA NIM. With a clean database schema, real-world dataset, and an optimized inference endpoint, the system is well-positioned to deliver more responsive analytics for user-generated queries.
To explore further:
- Try Vanna on your own dataset using this notebook as a starting point.
- Deploy NVIDIA NIM endpoints for production-scale inference.
- Experiment with adding different training information to unlock better SQL generation.