Generative AI

Accelerating Text-to-SQL Inference on Vanna with NVIDIA NIM for Faster Analytics

Illustration of text-to-SQL Inference.

Slow and inefficient query generation from natural language inputs bottlenecks decision-making. This forces analysts and business users to rely heavily on data teams, delaying insights and limiting agility. 

Text-to-SQL is shifting how we interact with data, enabling users to query structured databases in natural language. This is especially useful when deploying a domain-specific model for specialized industries. However, inferencing with these models at scale for analytics workloads comes with latency and performance challenges.

In this tutorial, we’ll show how to optimize Vanna’s text-to-SQL solution using NVIDIA NIM, accelerated inference microservices that deliver optimized endpoints for generative AI models. Vanna’s open-source text-to-SQL solution has gained traction among organizations due to its flexibility, security, and adaptability.

You’ll learn how to:

  • Insert preprocessed data into a database and connect with Vanna.
  • Train Vanna on business terminology or definitions.
  • Leverage an NVIDIA NeMo Retriever embedding model to retrieve context for SQL generation.
  • Use an NVIDIA NIM for LLMs endpoints for accelerated inference. 

For the demonstration,  we used two open-source Steam games datasets from Kaggle, a rich dataset of game metadata that’s ideal for analytics queries. 

Prerequisites

Procedure

1. Download and preprocess the Steam dataset

To follow along, clone the repository located at /NVIDIA/GenerativeAIExamples and open the Vanna_with_NVIDIA notebook in the community section. 

The preprocessing steps in this tutorial are adapted from an excellent Kaggle notebook and can be found in the ‘Data Preparation’ section of the Vanna with NVIDIA notebook.

Key preprocessing steps include:

  • Filtering required columns 
  • Unpacking columns with a range into separate columns
  • Adding game length to each game title

We get three CSVs at the end of the preprocessing step:

  • tableau_games.csv
  • tableau_categories.csv
  • tableau_tags.csv 

2. Initialize Vanna with NVIDIA NIM and NeMo Retriever

A Vanna instance needs a connection to a vector database, an embedder, and the LLM endpoint. We use the Milvus vector database as it has GPU acceleration capabilities, an NVIDIA retriever embedding model (llama-3.2-nv-embedqa-1b-v2), and the Llama 3.1 70B NIM microservice. 

With NIM microservices, your inference model runs on NVIDIA accelerated infrastructure, delivering faster response times and cost efficiency for production deployments. NVIDIA NIM—which is included with the NVIDIA AI Enterprise software license—can run anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.

For setup instructions on NIM endpoints, refer to the NVIDIA NIM documentation

A diagram showing the NVIDIA-accelerated Vanna text-to-SQL pipeline, including input question, embedding generation, vectorDB retrieval, and SQL generation using GPU-accelerated components.
Figure 1. NVIDIA accelerated Vanna pipeline

In the following text, we will go over the initialization of a Vanna instance with NVIDIA-accelerated components.

  • Define the Vanna Class as follows.
from pymilvus import MilvusClient, model
from vanna.milvus import Milvus_VectorStore
from vanna.openai import OpenAI_Chat
from openai import OpenAI

# Set the nvidia api key 
nvidia_api_key = '...'

# Define Vanna Class
class VannaMilvus(Milvus_VectorStore, OpenAI_Chat):
    def __init__(self, llm_client, config=None):
        Milvus_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, client=llm_client, config=config)
  • Next, create a NIM client based on OpenAI Wrapper with an NVIDIA API key.
def get_openai_client():
    client = OpenAI(
        base_url = "https://integrate.api.nvidia.com/v1",
        api_key = nvidia_api_key
    )
    return client

llm_client = get_openai_client()
  • Use a wrapper for the NeMo Retriever embedding model with Milvus DB. You can use the same nvidia_api_key.
class EmbeddingWrapper:
    def __init__(self, embedder):
        self.embedder = embedder

    def encode_documents(self, texts):
        result = self.embedder.embed_documents(texts)
        return [np.array(r) for r in result]

    def encode_queries(self, texts):
        embeddings = []
        for text in texts:
            embeddings.append(self.embedder.embed_query(text))
        return embeddings

vanna_embedder = EmbeddingWrapper(nvidia_embedder)

  • Next, create a NIM client based on OpenAI Wrapper with an NVIDIA API key.
def get_openai_client():
    client = OpenAI(
        base_url = "https://integrate.api.nvidia.com/v1",
        api_key = nvidia_api_key
    )
    return client

llm_client = get_openai_client()
  • Using a local Milvus DB instance.
milvus_uri = "./milvus_nvidia.db"
milvus_client_nvidia = MilvusClient(uri=milvus_uri)
  • Specify the config for the Vanna instance with the components defined previously.
# Specify the LLM used for SQL generation
model_name = "nvidia/llama-3.1-70b-instruct"

# Define the configuration for Vanna instance
config_nvidia = {
        "model": model_name,
        "milvus_client": milvus_client_nvidia,
        "embedding_function": vanna_embedder,
        "n_results": 2,  # The number of results to return from Milvus semantic search.
    }

vn_nvidia = VannaMilvus(llm_client, config=config_nvidia)

3. Create and populate the SQLite database

  • Using SQLite for this demo provides a lightweight, serverless database for testing text-to-SQL. You can use any database supported by Vanna. Insert the three CSVs from the pre-processing step into the DB.
import sqlite3
import pandas as pd

# Specify the path to the SQLite database
sqlite_path = 'steam_data.db'

# Connect to the SQLite database
sql_connect = sqlite3.connect(sqlite_path)
c = sql_connect.cursor()

# Create tables
init_sqls = """
CREATE TABLE IF NOT EXISTS games (
    app_id INTEGER PRIMARY KEY,
    name TEXT,
    release_date TEXT,
    price REAL,
    short_description TEXT,
    positive INTEGER,
    negative INTEGER,
    min_owners INTEGER,
    max_owners INTEGER,
    hltb_single REAL
);

CREATE TABLE IF NOT EXISTS categories (
    app_id INTEGER,
    categories TEXT,
    FOREIGN KEY (app_id) REFERENCES games(app_id)
);

CREATE TABLE IF NOT EXISTS tags (
    app_id INTEGER,
    tags TEXT,
    tag_frequencies TEXT,
    FOREIGN KEY (app_id) REFERENCES games(app_id)
);
"""

for sql in init_sqls.split(";"):
    c.execute(sql)

# Read the CSV files
games_df = pd.read_csv('processed_dataset/games.csv')
categories_df = pd.read_csv('processed_dataset/categories.csv')
tags_df = pd.read_csv('processed_dataset/tags.csv')

# Insert data into tables
games_df.to_sql('games', sql_connect, if_exists='append', index=False)
categories_df.to_sql('categories', sql_connect, if_exists='append', index=False)
tags_df.to_sql('tags', sql_connect, if_exists='append', index=False)

sql_connect.commit()
  • Now, connect the SQL DB to Vanna.
# Connect to the SQLite database
vn_nvidia.connect_to_sqlite(sqlite_path)

4. Train with data 

Vanna performs best with context on the queried data and the business-specific terminology. 

  • Remove existing training data to avoid de-duplication and train Vanna with the database data definition language (DDL).
# Remove existing training data
existing_training_data = vn_nvidia.get_training_data()
if len(existing_training_data) > 0:
    for _, training_data in existing_training_data.iterrows():
        vn_nvidia.remove_training_data(training_data["id"])

# Get the DDL of the SQLite database
df_ddl = vn_nvidia.run_sql("SELECT type, sql FROM sqlite_master WHERE sql is not null")

# Train the model on the DDL data
for ddl in df_ddl["sql"].to_list():
    vn_nvidia.train(ddl=ddl)
  • Next, add documentation about the three tables in the database schema games, categories, and tags.
# Add documentation about your business terminology or definitions.
vn_nvidia.train(
    documentation="""
    This dataset is used to answer questions about the game trends.
    """
)

# Add documentation about the tables
vn_nvidia.train(
    documentation="""
    The games table contains information about the games.
    The app_id is the unique identifier for the game.This is a primary key.
    The name is the name of the game.
    The release_date is the date the game was released.
    The price is the price of the game. Price in USD, 0.0 if its free.
    The short_description is a brief description of the game.
    The positive is the number of positive reviews or votes.
    The negative is the number of negative reviews or votes.
    The min_owners is the minimum number of owners. Used together with max_owners to get an estimate of the player base.
    The max_owners is the maximum number of owners. Used together with min_owners to get an estimate of the player base.
    The hltb_single is the average playtime of the game. This is an estimate.
    """
)

vn_nvidia.train(
    documentation="""
    The categories table contains information about the categories of the games.
    The app_id is the unique identifier for the game.
    The categories is the categories of the game.
    The app_id is a foreign key to the games table.
    """
)

vn_nvidia.train(
    documentation="""
    The tags table contains information about the tags of the games.
    The app_id is the unique identifier for the game.
    The tags is the tags of the game. These are user defined.
    The tag_frequencies is the frequencies of the tags.
    The app_id is a foreign key to the games table.
    """
)
  • Then, view the training data with a simple command.
training_data = vn_nvidia.get_training_data()
training_data

5. Generate SQL

You are all set to start querying the database with NIM-powered Vanna.

  • Start with a simple query. The generated SQL can also be printed before execution.
sql = vn_nvidia.generate_sql("Which 5 games have the most positive reviews and how many?")
vn_nvidia.run_sql(sql)
  • Run a harder query now.
sql = vn_nvidia.generate_sql("Which indie game has the biggest player base?")
vn_nvidia.run_sql(sql)

Conclusion

In this tutorial, you learned how to accelerate text-to-SQL inference on Vanna using NVIDIA NIM. With a clean database schema, real-world dataset, and an optimized inference endpoint, the system is well-positioned to deliver more responsive analytics for user-generated queries.

To explore further:

  • Try Vanna on your own dataset using this notebook as a starting point.
  • Deploy NVIDIA NIM endpoints for production-scale inference.

Experiment with adding different training information to unlock better SQL generation.

Discuss (0)

Tags