Generative AI / LLMs

A Deep Dive into the Latest AI Models Optimized with NVIDIA NIM

Delivered as optimized containers, NVIDIA NIM microservices are designed to accelerate AI application development for businesses of all sizes, paving the way for rapid production and deployment of AI technologies. The set of microservices can be used to build and deploy AI solutions across speech AI, data retrieval, digital biology, digital humans, simulation, and large language models (LLMs).

Each month, NVIDIA works to deliver NIM microservices for leading AI models across industries and domains. This post offers a look at the latest additions.

Speech and translation NIM microservices

The latest NIM microservices for speech and translation enable organizations to integrate advanced multilingual speech and translation capabilities into their world-wide conversational applications. These include automatic speech recognition (ASR), text-to-speech (TTS), and neural machine translation (NMT), catering to diverse industry needs.

Parakeet ASR

The Parakeet ASR-CTC-1.1B-EnUS ASR model, with 1.1 billion parameters, provides record-setting English language transcription capabilities. It delivers exceptional accuracy and robustness, adeptly handling diverse speech patterns and noise levels. It enables businesses to advance their voice-based services, ensuring superior user experiences.

FastPitch-HiFiGAN TTS

A TTS NIM, FastPitch-HiFiGAN-EN integrates FastPitch and HiFiGAN models to generate high-fidelity audio from text. It enables businesses to create natural-sounding voices, elevating user engagement and delivering immersive experiences, setting a new standard in audio quality.

Megatron NMT

A powerful NMT model, Megatron 1B-En32 excels in real-time translation across multiple languages, facilitating seamless multilingual communication. It enables organizations to extend their global reach, engage diverse audiences, and foster efficient international collaboration.

By leveraging these advanced speech and translation NIM microservices, enterprises can revolutionize their conversational AI applications. From creating multilingual intelligent personal assistants and brand ambassadors to developing global customer service platforms, businesses can innovate and enhance user experiences across diverse languages and contexts.

Retrieval NIM microservices

The latest NVIDIA NeMo Retriever NIM microservices help developers efficiently fetch the best proprietary data to generate knowledgeable responses for their AI applications. NeMo Retriever enables organizations to seamlessly connect custom models to diverse business data and deliver highly accurate responses for AI applications using retrieval-augmented generation (RAG).

Embedding QA E5

The NVIDIA NeMo Retriever QA E5 embedding model is optimized for text question-answering retrieval. An embedding model is a crucial component of a text retrieval system, as it transforms textual information into dense vector representations. They are typically transformer decoders that process tokens of input text (for example, question, passage) to output an embedding.

Embedding QA Mistral 7B

The NVIDIA NeMo Retriever QA Mistral 7B embedding model is a popular multilingual community base model fine-tuned for text embedding for high-accuracy question-answering. This embedding model is most suitable for users who want to build a question-and-answer application over a large text corpus, leveraging the latest dense retrieval technologies.

Developers can achieve 2x improved throughput with the NeMo Retriever QA Mistral 7B NIM.

Snowflake Arctic Embed

Snowflake Arctic Embed is a suite of text embedding models for high-quality retrieval, optimized for performance. These models are ready for commercial use, free of charge. The Arctic Embed models have achieved state-of-the-art performance on the MTEB/BEIR leaderboard for each of their size variants.

Reranking QA Mistral 4B

The NVIDIA NeMo Retriever QA Mistral 4B reranking model is optimized for providing a logit score that represents how relevant a document is to a given query. The ranking model is a component in a text retrieval system to improve the overall accuracy. A text retrieval system often uses an embedding model (dense) or lexical search (sparse) index to return relevant text passages given the input. 

A ranking model can be used to rerank the potential candidates into a final order. A ranking model receives the question-passage pairs as an input and can therefore process cross attention between the words. It would not be feasible to apply a ranking model on all documents in the knowledge base, so ranking models are often deployed in combination with embedding models.

Developers can achieve 1.75x improved throughput with the NeMo Retriever QA Mistral 4B reranking NIM.

Digital biology NIM microservices

In the healthcare and life sciences sectors, NVIDIA NIM microservices are transforming digital biology. These advanced AI tools empower pharmaceutical companies, biotechnology, and healthcare facilities with capabilities to expedite innovation and the delivery of life-saving medicine to patients.

MolMIM

MolMIM is a transformer-based model for controlled small molecule generation. It can optimize and sample molecules from the latent space that have improved values of the desired scoring functions. This includes functions from other models and functions based on experimental data testing for various chemical and biological properties. Built on robust inference engines, the MolMIM NIM microservice can be deployed in the cloud or on-premises for enterprise-grade inference in computational drug discovery workflows, including virtual screening, lead optimization, and other lab-in-the-loop approaches. 

DiffDock

NVIDIA DiffDock NIM microservice is built for high-performance, scalable molecular docking at enterprise scale. It requires protein and molecule 3D structures as input but does not require any information about a binding pocket. Driven by a generative AI model and accelerated 3D equivariant graph neural networks, it can predict up to 7x more poses per second compared to the baseline published model, reducing the cost of computational drug discovery workflows, including virtual screening and lead optimization. 

These digital biology NIM microservices enable pharmaceutical companies to streamline their drug development computational workflows, potentially delivering life-saving treatments faster at lower R&D cost.

LLM NIM microservices

LLMs continue to be a cornerstone of AI innovation. New NVIDIA NIM microservices for LLMs offer unprecedented performance and accuracy across various applications and languages.

Llama 3.1 8B and 70B

The Llama 3.1 8B and 70B models provide cutting-edge text generation and language understanding capabilities, serving as powerful tools for creating engaging and informative content. When deploying Llama 3.1 8B NIM on NVIDIA H100 data center GPUs, developers can achieve an out-of-the-box performance increase of up to 2.5x tokens per second for content generation compared to deploying the model without NIM.

Bar chart showing the comparison of tokens used for the Mixtral 8x22B Llama 3.1 8B model with and without NIM. Without NIM, the performance output is 2,679 tokens per second. With NIM, the output is improved by 2.5x, and results in a performance output of 6,372 tokens per second.
Figure 1. Llama 3.1 8B NIM shows improved throughput for translation

Llama3.1 8B Instruct, 1 x H100 SXM; input and output token lengths: 1,000. Concurrent client requests: 200. NIM on: BF16, TTFT: ~1s, ITL: ~30ms. NIM off: BF16, TTFT: ~4s, ITL: ~65ms

Llama 3.1 405B 

Llama 3.1 405B is the largest openly available model that can be used for a wide variety of use cases. One key use case is synthetic data generation, helping businesses enhance model performance and expand their datasets. The Llama 3.1 405B NIM microservice can be downloaded and run anywhere today from the NVIDIA API catalog. 

Simulation NIM microservices

New NVIDIA USD NIM microservices offer the ability to leverage generative AI copilots and agents to develop Universal Scene Description (OpenUSD) tools that accelerate the creation of 3D worlds

The following microservices are now available to preview

USD Code

USD Code is a state-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.

USD Search provides AI-powered search for OpenUSD data, 3D models, images, and assets using text- or image-based inputs.

USD Validate 

USD Validate enables verifying compatibility of OpenUSD assets with instant RTX render and rule-based validation.

With these new USD NIM microservices, more industries will be able to develop applications for visualizing industrial design and engineering projects, or to simulate environments to build the next wave of physical AI and robots.

Video conferencing NIM microservices

NVIDIA Maxine simplifies the deployment for AI features that enhance audio, video, and augmented reality effects for video conferencing and telepresence. 

Maxine Audio2Face-2D

Maxine Audio2Face-2D, now available in the API catalog, animates a 2D image in real time, using speech audio only. Speech signals are interpreted to corresponding facial animation in the portrait photo to produce an H.264 compressed output video. It also enables head pose animation for natural delivery and can be coupled with a chatbot output or translated speech. A common use case is virtual agents. You can begin prototyping with Audio2Face-2D through the API catalog today. 

Eye contact

Eye contact plays a key role in establishing social connections, and in face-to-face conversations it signifies confidence, connection, and attention. To improve, augment, and enhance the user experience, NVIDIA has developed NVIDIA Maxine Eye Contact NIM microservice. This feature uses AI to apply a filter to the user’s webcam feed in real time and redirects their eye gaze toward the camera.

Accelerate AI application development

NVIDIA NIM streamlines the creation of complex AI applications by enabling the integration of specialized microservices across domains. Using NIM microservices, organizations can bypass the complexities of building AI models from scratch, saving time and resources. This frees teams to focus on integrating these pre-trained models into their workflows, accelerating operational transformation. The modular nature of NIM microservices allows for the assembly of customized AI solutions that meet specific business needs.

For example, a company can combine ACE NIM microservices, including speech recognition, with LLM NIM microservices to create digital humans for personalized customer service across industries such as healthcare, finance, and retail. 

Video 1. Learn how digital humans can transform industries

NIM microservices can also be integrated into supply chain management systems, combining cuOpt NIM microservice for route optimization with NeMo Retriever NIM microservices for retrieval-augmented generation (RAG) and  LLM NIM microservices so business can talk to their supply chain.

Video 2. Respond to supply chain changes in seconds using NIM microservices

Get started

NVIDIA NIM empowers enterprises to fully harness AI, accelerating innovation, maintaining a competitive edge, and delivering superior customer experiences. Explore the latest AI models available with NIM microservices and discover how these powerful tools can transform your business.

Discuss (0)

Tags