Agentic AI / Generative AI

Advanced AI and Retrieval-Augmented Generation for Code Development in High-Performance Computing

Decorative image of multimodal RAG workflow.

May 12, 2024

By Harry Petty, Geetika Gupta and Sarah Tsai

Discuss (2)

AI-Generated Summary

Dislike

Sandia National Laboratories is developing a coding assistant for Kokkos, a C++ library for high-performance computing (HPC) applications, using NVIDIA NeMo software offerings and advanced retrieval-augmented generation (RAG) techniques to provide accurate, context-aware code suggestions.
The team compiled extensive code repositories that use Kokkos into a dynamic dataset, which is then converted into vector embeddings and stored in a database, allowing for efficient retrieval of relevant code chunks based on cosine similarity.
The Sandia team's initial evaluations showed a 3-4 point increase in scaled mean evaluations with the implementation of the RAG approach, and they are continuing to refine their retrieval methods and AI models to enhance the coding assistant's ability to generate functionally accurate and contextually relevant code suggestions.

AI-generated content may summarize information incompletely. Verify important information. Learn more

In the rapidly evolving field of software development, AI tools such as chatbots and GitHub Copilot have significantly transformed how developers write and manage code. These tools, built on large language models (LLMs), enhance productivity by automating routine coding tasks.

Parallel computing challenges

However, the use of LLMs in generating parallel computing code—essential for high-performance computing (HPC) applications—has encountered challenges. Parallel computing applications require a nuanced understanding of both the logic of functional programming and the complexities involved in managing multiple concurrent operations, such as avoiding deadlocks and race conditions. Traditional AI models can generate serial code effectively but falter with parallel constructs due to these added complexities.

NVIDIA faced a similar challenge with generating complex code but in the context of supporting the design of new accelerated computing semiconductors. Highly relevant work addressing this problem, ChipNeMo, was published during Supercomputing 2023, which combined domain-adaptive pre-training (DAPT) and retrieval-augmented generation (RAG) techniques successfully to produce an EDA co-pilot for engineers. The combination of DAPT with advanced RAG techniques resonated well with the problem facing the scientists at Sandia and kickstarted the following collaboration.

A team of scientists at Sandia National Laboratories embarked on an ambitious project using NVIDIA NeMo software offerings, such as early access to nVolve-40K (Embedding model), to address these challenges. They developed a specialized coding assistant tailored for Kokkos, a C++ library that provides you with the tools to write performance-portable applications. It abstracts hardware complexities and enables code to run efficiently on different types of HPC architectures.

Creating an AI model for such a specific task usually requires fine-tuning a foundation model with domain-specific knowledge, a resource-intensive and time-consuming process. To stay agile with software updates and bug fixes, HPC developers generally prefer a more flexible approach.

This assistant aims to support you by providing accurate, context-aware code suggestions, using the latest in AI advancements.

Implementing advanced RAG

In collaboration with NVIDIA, Sandia is developing an advanced RAG approach, to enable a modular workflow that is adaptable to ongoing data set changes and integrates state-of-the-art models with HPC workloads as soon as they are released.

According to Robert Hoekstra, Ph.D., senior manager of Extreme Scale Computing at Sandia, organizations across several industries, including Sandia, are unlocking valuable insights with NVIDIA generative AI, enabling semantic search of their enterprise data. Sandia and NVIDIA are collaborating to evaluate emerging generative AI tools to maximize data insights while improving accuracy and performance.

By compiling extensive code repositories that use Kokkos, Sandia scientists created a dynamic and continually updated dataset. Using a recursive text splitter for C programming, the dataset can be organized into manageable chunks. These dataset chunks are then converted into vector embeddings and stored in a database, yielding a less resource-intensive process than fine-tuning.

When a query is made, it’s transformed into a vector that retrieves relevant code chunks from the database based on their cosine similarity. These chunks are then used to provide context for generating the response, creating a rich, informed output incorporating real-world coding patterns and solutions.

RAG evaluation

In Sandia’s preliminary evaluation of naive RAG, a variety of metrics including BLEU, ChrF, METEOR, and ROUGE-L were used to assess the effectiveness of the generated code against standard benchmarks provided by Sandia Kokkos developers. Encouragingly, initial results revealed a 3-4 point increase in scaled mean evaluations with the implementation of the RAG approach.

\(Scaled mean = [100(BLEU+RougeL+CodeBLEU+METEOR)+ChrF]\slash5\)

Embedding models	LLMs
BGE – large – EN – V 1.5	MISTRAL – 7B – INSTRUCT – V0.2
E5 – base – V2.	MIXTRAL – 8X7V – INSTRUCT – V 0.1
NVolve40K	WIZARDCODER – 15B – V1.0
UAE – LARGE – V1	MAGICOD ER – S – DS – 6.7 B.

Table 1. Embedding models and open-source LLMs tested

Model	OSS scaled mean	RAG scaled mean
mistralai-MIstral-7B-Instruct-v0.2	18.75	22.59
mistralai-Mixtral-8x7B-Instruct-v0.1	23.68	23.85
WizardLM_WizardCoder-15B-V1.0	25.01	28.61
ise-uiuc_MagicCoder-S-DS-6.7B	25.90	28.96

Table 2. Benchmarking open-source software (OSS) embedding models and LLMs

RAG results for modular HPC workflows

Minor performance improvements were achieved with a naïve RAG pipeline implementation. The Mixtral model (MoE architecture) was not significantly affected by the appended Kokkos context.

No superior embedding model was found. Instead, specific pairings of embedding models and LLMs offer the best results.

RAG for multi-query retrieval

The Sandia team also experimented with advanced RAG techniques to refine how to retrieve relevant content.

One such technique involved generating multiple related queries to broaden the search for applicable code snippets, especially when user queries were vague or lacked specific details. This method improved the likelihood of retrieving more relevant context, enhancing the accuracy and usefulness of the generated code.

The following example shows a query and the generated queries created by the multi-query retriever.

Prompt: Create a Kokkos view of doubles with size 10 x 3

Generated Queries:

1. How can I create a Kokkos view of doubles with size 10 x 3?
2. How can I create a Kokkos view of doubles with size 10 x 3 and initialize it with random values?
3. How can I create a Kokkos view of doubles with size 10 x 3 and initialize it with values from a file?

Dataset context enrichment

Sandia researchers are also exploring the parent document retriever approach. In this approach, the data is segmented into large parent chunks that provide general context and smaller child chunks focused on specific details.

This strategy helps balance the need for specificity with the breadth of context, optimizing both the relevance and comprehensiveness of the information retrieved.

The following prompt example shows the child and parent chunks used to search the database.

Prompt: Create a Kokkos view of doubles with size 10 x 3

Child Chunk:
// Create the Kokkos::View X_lcl.
const size_t numLclRows = 10;
const size_t numVecs = 3;
typename dual_view_type::t_dev X_lcl ("X_lcl", numLclRows, numVecs);


Parent Chunk:
// Create the Kokkos::View X_lcl.
const size_t numLclRows = 10;
const size_t numVecs = 3;
typename dual_view_type::t_dev X_lcl ("X_lcl", numLclRows, numVecs);

// Modify the Kokkos::View's data.
Kokkos::deep_copy (X_lcl, ONE);

{
lclSuccess = success ? 1 : 0;
gblSuccess = 0; // output argument
reduceAll<int, int> (*comm, REDUCE_MIN, lclSuccess, outArg (gblSuccess));
TEST_EQUALITY_CONST(gblSuccess, 1);
if (gblSuccess != 1) {
return;
}
std::ostringstream os;
os << "Proc " << comm->getRank () << ": checkpoint 1" << std::endl;
std::cerr << os.str ();
}

Advanced AI for HPC results

The Sandia team’s initial evaluations used standard NLP metrics like BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation).

However, they recognized the need for a more tailored evaluation metric that better suit code generation, particularly one that can assess the syntactical accuracy and operational efficacy of the generated code without relying on traditional output comparisons.

Enhancing future code functionality

As Sandia scientists refine their retrieval methods and AI models, they anticipate further enhancements in the coding assistant’s ability to generate not only functionally accurate but also contextually relevant code suggestions. Future developments will focus on fine-tuning the base model similar to how the NVIDIA engineers did on ChipNemo and refining the retrieval processes and integrating more advanced LLMs and embedding techniques as they become available.

AI for HPC code development

The integration of AI into code development, especially within the scope of high-performance computing, represents a significant leap forward in productivity and efficiency. The proof of concept work with some of the offerings within NVIDIA NeMo and Sandia is at the forefront of this transformation, pushing the boundaries of what AI can achieve in complex computing environments. By continuously adapting and improving their AI models and methodologies, Sandia aims to provide you with powerful tools that enhance your ability to innovate and solve complex problems.

This advanced RAG initiative showcases the potential of AI in software development and highlights the importance of targeted solutions in technology advancement.

At Sandia, they’re evaluating both in-house and external cloud solutions to host these complex AI models, which require significant GPU resources. The choice depends on the availability of appropriate hardware and the scale of use across Sandia teams. Partnering with vendors like NVIDIA could provide valuable support, especially for integrating their container systems with Sandia’s existing infrastructure.

More information

For more information, see the following resources:

Maximizing The Value of Your Enterprise Data with RAG (requires login): Introduces RAG, a system that combines information finding with text or code generation to maximize the value of enterprise data.

ChipNeMo: Domain-Adapted LLMs for Chip Design: Includes an explanation of how RAG is used to create EDA scripts for chip design, a challenge with some similar characteristics to create a RAG model for parallel code.

Discuss (2)

About the Authors

About Harry Petty
Harry Petty is a senior technical marketing manager for HPC and AI edge applications at NVIDIA. Previously, he was a principal engineer and marketing director at Cisco Systems where he brought SDN innovations to market for hybrid cloud, multitenant security, and data center application performance. Harry has an MBA from Booth Graduate School of Business and a BS in mathematics and computer science from the University of Dayton.

View all posts by Harry Petty

About Geetika Gupta
Geetika Gupta is senior director of Product Management at NVIDIA. She works on defining products for applying AI, HPC, scientific research, and autonomous discovery. Geetika holds an MBA from the Anderson School at UCLA and a bachelor’s in mechanical engineering from IITBHU.

View all posts by Geetika Gupta

About Sarah Tsai
Sarah Tsai is a data scientist at Sandia National Laboratories, focusing on the development of generative AI and machine learning applications. Before joining Sandia, she conducted research on biological modeling aimed at advancing cancer treatment. Sarah holds a B.Sc. in biomedical engineering from the University of Texas at Austin.

View all posts by Sarah Tsai