In the rapidly evolving field of software development, AI tools such as chatbots and GitHub Copilot have significantly transformed how developers write and manage code. These tools, built on large language models (LLMs), enhance productivity by automating routine coding tasks.
Parallel computing challenges
However, using LLMs to generate parallel computing code—essential for high-performance computing (HPC) applications—has proven challenging. Parallel applications require a nuanced understanding of both the program's functional logic and the complexities of managing many concurrent operations, such as avoiding deadlocks and race conditions. Traditional AI models can generate serial code effectively but falter on parallel constructs because of these added complexities.
NVIDIA faced a similar challenge with generating complex code, but in the context of supporting the design of new accelerated computing semiconductors. ChipNeMo, highly relevant work addressing this problem, was published at Supercomputing 2023; it successfully combined domain-adaptive pretraining (DAPT) and retrieval-augmented generation (RAG) to produce an EDA copilot for engineers. The combination of DAPT with advanced RAG techniques resonated with the problem facing the scientists at Sandia and kickstarted the collaboration described here.
A team of scientists at Sandia National Laboratories embarked on an ambitious project using NVIDIA NeMo software offerings, including early access to the NVolve-40K embedding model, to address these challenges. They developed a specialized coding assistant tailored for Kokkos, a C++ library that provides you with the tools to write performance-portable applications. Kokkos abstracts hardware complexities and enables code to run efficiently on different types of HPC architectures.
Creating an AI model for such a specific task usually requires fine-tuning a foundation model with domain-specific knowledge, a resource-intensive and time-consuming process. To stay agile with software updates and bug fixes, HPC developers generally prefer a more flexible approach.
This assistant aims to support you by providing accurate, context-aware code suggestions, using the latest in AI advancements.
Implementing advanced RAG
In collaboration with NVIDIA, Sandia is developing an advanced RAG approach to enable a modular workflow that adapts to ongoing dataset changes and integrates state-of-the-art models with HPC workloads as soon as they are released.
According to Robert Hoekstra, Ph.D., senior manager of Extreme Scale Computing at Sandia, organizations across several industries, including Sandia, are unlocking valuable insights with NVIDIA generative AI, enabling semantic search of their enterprise data. Sandia and NVIDIA are collaborating to evaluate emerging generative AI tools to maximize data insights while improving accuracy and performance.
By compiling extensive code repositories that use Kokkos, Sandia scientists created a dynamic, continually updated dataset. A recursive text splitter tuned for C/C++ source code organizes the dataset into manageable chunks. These chunks are then converted into vector embeddings and stored in a database, a far less resource-intensive process than fine-tuning.
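The indexing step can be sketched in a few lines. The following is a minimal illustration, assuming LangChain's C++-aware recursive splitter, a FAISS index, and the BGE-large embedding model as stand-ins for Sandia's actual stack; the repository path and chunk sizes are illustrative:

```python
# Minimal indexing sketch (assumed stack: LangChain + FAISS + a Hugging Face
# embedding model standing in for Sandia's actual pipeline).
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Load Kokkos-based source files from a local checkout (path is illustrative).
loader = DirectoryLoader("kokkos_repos/", glob="**/*.cpp", loader_cls=TextLoader)
docs = loader.load()

# A recursive splitter tuned for C/C++ syntax breaks files at natural
# boundaries (functions, blocks) rather than mid-statement.
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.CPP, chunk_size=1000, chunk_overlap=100
)
chunks = splitter.split_documents(docs)

# Embed each chunk and store the vectors; BGE-large is one of the
# embedding models Sandia evaluated.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("kokkos_index")
```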
When a query is made, it is transformed into a vector and used to retrieve the most relevant code chunks from the database by cosine similarity. Those chunks then provide context for generating the response, yielding a rich, informed output that incorporates real-world coding patterns and solutions.
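At query time, the flow looks roughly like the following sketch, which reuses the `vectorstore` built above and assumes an `llm` handle to any instruction-tuned model, such as Mistral-7B-Instruct-v0.2; the prompt template is illustrative, not Sandia's exact setup:

```python
# Query-time sketch: embed the query, retrieve by similarity, and assemble
# the retrieved chunks into the generation prompt.
query = "Create a Kokkos view of doubles with size 10 x 3"

# FAISS scores by vector similarity; many setups normalize embeddings so
# inner-product search is equivalent to cosine similarity.
retrieved = vectorstore.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved)

prompt = (
    "You are a Kokkos programming assistant. Use the following code "
    f"examples as context:\n\n{context}\n\nTask: {query}\n"
)
# `llm` is an assumed chat/completion model wrapper.
response = llm.invoke(prompt)
```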
RAG evaluation
Sandia's preliminary evaluation of naive RAG used a variety of metrics, including BLEU, ChrF, METEOR, and ROUGE-L, to assess the generated code against standard benchmarks provided by Sandia Kokkos developers. Encouragingly, initial results showed a 3-4 point increase in scaled mean scores with the RAG approach.
| Embedding models | LLMs |
| --- | --- |
| BGE-large-EN-v1.5 | Mistral-7B-Instruct-v0.2 |
| E5-base-v2 | Mixtral-8x7B-Instruct-v0.1 |
| NVolve-40K | WizardCoder-15B-V1.0 |
| UAE-Large-V1 | Magicoder-S-DS-6.7B |
| Model | OSS scaled mean | RAG scaled mean |
| --- | --- | --- |
| mistralai/Mistral-7B-Instruct-v0.2 | 18.75 | 22.59 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 23.68 | 23.85 |
| WizardLM/WizardCoder-15B-V1.0 | 25.01 | 28.61 |
| ise-uiuc/Magicoder-S-DS-6.7B | 25.90 | 28.96 |
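These reference-based scores can be reproduced with off-the-shelf metric implementations. The following sketch uses the Hugging Face evaluate library to score one generated snippet against a reference solution; the snippets are placeholders, not Sandia's benchmarks:

```python
import evaluate

# Placeholder generation and reference; Sandia's benchmarks come from
# its Kokkos developers.
generated_code = 'Kokkos::View<double**> x("x", 10, 3);'
reference_code = 'Kokkos::View<double**> X_lcl("X_lcl", 10, 3);'

predictions = [generated_code]
refs_single = [reference_code]    # one reference string per prediction
refs_multi = [[reference_code]]   # metrics that allow several references

# Compute the same four reference-based metrics used in the evaluation.
bleu = evaluate.load("bleu").compute(predictions=predictions, references=refs_multi)
chrf = evaluate.load("chrf").compute(predictions=predictions, references=refs_multi)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=refs_single)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=refs_single)

print(bleu["bleu"], chrf["score"], meteor["meteor"], rouge["rougeL"])
```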
RAG results for modular HPC workflows
- Minor performance improvements were achieved with a naive RAG pipeline implementation.
- The Mixtral model (a mixture-of-experts architecture) was not significantly affected by the appended Kokkos context.
- No single embedding model was superior; instead, specific pairings of embedding models and LLMs offered the best results.
RAG for multi-query retrieval
The Sandia team also experimented with advanced RAG techniques to refine how relevant content is retrieved.
One such technique involved generating multiple related queries to broaden the search for applicable code snippets, especially when user queries were vague or lacked specific details. This method improved the likelihood of retrieving more relevant context, enhancing the accuracy and usefulness of the generated code.
The following example shows a query and the generated queries created by the multi-query retriever.
Prompt: Create a Kokkos view of doubles with size 10 x 3
Generated Queries:
1. How can I create a Kokkos view of doubles with size 10 x 3?
2. How can I create a Kokkos view of doubles with size 10 x 3 and initialize it with random values?
3. How can I create a Kokkos view of doubles with size 10 x 3 and initialize it with values from a file?
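Retriever frameworks make this pattern straightforward to wire up. A minimal sketch assuming LangChain's MultiQueryRetriever, with `vectorstore` and `llm` being the index and model handles from the earlier sketches:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# An LLM rewrites the user's question into several variants; the union of
# chunks retrieved for all variants becomes the generation context.
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),  # Kokkos chunk index built earlier
    llm=llm,                               # model used to rephrase the query
)
docs = retriever.invoke("Create a Kokkos view of doubles with size 10 x 3")
```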
Dataset context enrichment
Sandia researchers are also exploring the parent document retriever approach. In this approach, the data is segmented into large parent chunks that provide general context and smaller child chunks focused on specific details.
This strategy helps balance the need for specificity with the breadth of context, optimizing both the relevance and comprehensiveness of the information retrieved.
The following prompt example shows the child and parent chunks used to search the database.
Prompt: Create a Kokkos view of doubles with size 10 x 3
Child Chunk:
// Create the Kokkos::View X_lcl.
const size_t numLclRows = 10;
const size_t numVecs = 3;
typename dual_view_type::t_dev X_lcl ("X_lcl", numLclRows, numVecs);
Parent Chunk:
// Create the Kokkos::View X_lcl.
const size_t numLclRows = 10;
const size_t numVecs = 3;
typename dual_view_type::t_dev X_lcl ("X_lcl", numLclRows, numVecs);
// Modify the Kokkos::View's data.
Kokkos::deep_copy (X_lcl, ONE);
{
lclSuccess = success ? 1 : 0;
gblSuccess = 0; // output argument
reduceAll<int, int> (*comm, REDUCE_MIN, lclSuccess, outArg (gblSuccess));
TEST_EQUALITY_CONST(gblSuccess, 1);
if (gblSuccess != 1) {
return;
}
std::ostringstream os;
os << "Proc " << comm->getRank () << ": checkpoint 1" << std::endl;
std::cerr << os.str ();
}
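This pattern is also directly supported by retriever frameworks. A sketch assuming LangChain's ParentDocumentRetriever with an in-memory Chroma index; chunk sizes are illustrative, and `docs` and `embeddings` come from the indexing sketch above:

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Large parent chunks carry general context; small child chunks are what
# actually get embedded and matched against the query.
parent_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.CPP, chunk_size=2000, chunk_overlap=0
)
child_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.CPP, chunk_size=400, chunk_overlap=0
)

retriever = ParentDocumentRetriever(
    vectorstore=Chroma(collection_name="kokkos_children",
                       embedding_function=embeddings),  # indexes child chunks
    docstore=InMemoryStore(),                           # holds parent chunks
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(docs)

# A match on a small, specific child chunk returns its larger parent chunk.
parents = retriever.invoke("Create a Kokkos view of doubles with size 10 x 3")
```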
Advanced AI for HPC results
The Sandia team’s initial evaluations used standard NLP metrics like BLEU (Bilingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation).
However, they recognized the need for evaluation metrics better suited to code generation, particularly ones that can assess the syntactic accuracy and operational efficacy of generated code without relying on traditional output comparisons.
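One candidate is a syntax-level check that asks a compiler, rather than string overlap, whether generated code is well formed. A hypothetical sketch follows; the wrapper function, compiler flags, and Kokkos include path are assumptions about a local environment, not Sandia's metric:

```python
import subprocess
import tempfile

def compiles(snippet: str, kokkos_include: str = "/opt/kokkos/include") -> bool:
    """Return True if a generated snippet parses as C++ inside a minimal
    translation unit, using the compiler's syntax-only mode."""
    source = f"#include <Kokkos_Core.hpp>\nvoid generated() {{\n{snippet}\n}}\n"
    with tempfile.NamedTemporaryFile(mode="w", suffix=".cpp") as f:
        f.write(source)
        f.flush()
        result = subprocess.run(
            ["g++", "-fsyntax-only", "-std=c++17", f"-I{kokkos_include}", f.name],
            capture_output=True,
        )
    return result.returncode == 0

# Syntactic accuracy over a benchmark: the fraction of generations that parse.
# syntax_score = sum(compiles(c) for c in generations) / len(generations)
```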
Enhancing future code functionality
As Sandia scientists refine their retrieval methods and AI models, they anticipate further gains in the coding assistant's ability to generate code suggestions that are not only functionally accurate but also contextually relevant. Future developments will focus on fine-tuning the base model, much as NVIDIA engineers did for ChipNeMo; refining the retrieval processes; and integrating more advanced LLMs and embedding models as they become available.
AI for HPC code development
The integration of AI into code development, especially within high-performance computing, represents a significant leap forward in productivity and efficiency. This proof-of-concept work with NVIDIA NeMo offerings places Sandia at the forefront of the transformation, pushing the boundaries of what AI can achieve in complex computing environments. By continuously adapting and improving its AI models and methodologies, Sandia aims to provide you with powerful tools that enhance your ability to innovate and solve complex problems.
This advanced RAG initiative showcases the potential of AI in software development and highlights the importance of targeted solutions in technology advancement.
Sandia is evaluating both in-house and external cloud solutions to host these complex AI models, which require significant GPU resources. The choice depends on the availability of appropriate hardware and the scale of use across Sandia teams. Partnering with vendors like NVIDIA could provide valuable support, especially for integrating their container systems with Sandia's existing infrastructure.
More information
For more information, see the following resources:
- Maximizing The Value of Your Enterprise Data with RAG (requires login): Introduces RAG, a system that combines information retrieval with text or code generation to maximize the value of enterprise data.
- ChipNeMo: Domain-Adapted LLMs for Chip Design: Explains how RAG is used to create EDA scripts for chip design, a challenge that shares characteristics with creating a RAG model for parallel code.