The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI’s ChatGPT in 2022, the new technology amassed over 100M users within months and drove a surge of development activity across almost every industry. By 2023, developers had begun building proofs of concept (POCs) using APIs and open-source community models from Meta, Mistral, Stability, and more. Entering 2024…
What is the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of increased capacity. The benefits are great, but training and deploying large models can be computationally expensive and resource-intensive. Computationally efficient, cost-effective, and energy-efficient systems, architected to deliver real-time…
Generative AI has the potential to transform every industry. Human workers are already using large language models (LLMs) to explain, reason about, and solve difficult cognitive tasks. Retrieval-augmented generation (RAG) connects LLMs to data, expanding the usefulness of LLMs by giving them access to up-to-date and accurate information. Many enterprises have already started to explore how…
In the era of generative AI, where machines are not just learning from data but generating human-like text, images, video, and more, retrieval-augmented generation (RAG) stands out as a groundbreaking approach. A RAG workflow builds on large language models (LLMs), which can understand queries and generate responses. However, LLMs have limitations, including training complexity and a lack of…
At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5M pandas users without requiring them to change their code. pandas, a flexible and powerful data analysis and manipulation library for Python, is a top choice for data scientists because of its easy-to-use API. However, as dataset sizes grow, it struggles with processing speed and efficiency in…
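The "zero code change" idea can be illustrated with an ordinary pandas script (the DataFrame contents below are invented for illustration):

```python
import pandas as pd

# A typical pandas workflow: aggregate sales by region, largest first.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "north"],
    "sales": [100, 250, 175, 90, 300],
})
totals = df.groupby("region")["sales"].sum().sort_values(ascending=False)
print(totals)
```

Assuming RAPIDS cuDF is installed on a system with a supported GPU, the same script runs unchanged on the GPU via `python -m cudf.pandas script.py`, or `%load_ext cudf.pandas` in a Jupyter notebook.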
Autonomous machine development is an iterative process of data generation and gathering, model training, and deployment characterized by complex multi-stage, multi-container workflows across heterogeneous compute resources. Multiple teams are involved, each requiring shared and heterogeneous compute. Furthermore, teams want to scale certain workloads into the cloud…
Across the globe, enterprises are realizing the benefits of generative AI models. They are racing to adopt these models in various applications, such as chatbots, virtual assistants, coding copilots, and more. While general-purpose models work well for simple tasks, they underperform when it comes to catering to the unique needs of various industries. Custom generative AI models outperform…
Across every industry, and every job function, generative AI is activating the potential within organizations—turning data into knowledge and empowering employees to work more efficiently. Accurate, relevant information is critical for making data-backed decisions. For this reason, enterprises continue to invest in ways to improve how business data is stored, indexed, and accessed.
A random forest is a supervised algorithm that uses an ensemble learning method consisting of a multitude of decision trees, the output of which is the consensus of the best answer to the problem. Random forest can be used for classification or regression.
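The ensemble-of-trees idea can be sketched in a few lines. This is an illustrative toy (depth-1 "stump" trees trained on bootstrap samples, with a majority vote as the consensus), not a production implementation; real libraries grow full decision trees and sample features as well as rows:

```python
import random
from collections import Counter

def train_stump(X, y):
    # Find the (feature, threshold) split with the fewest training errors.
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            errors = sum(yi != lmaj for yi in left) + sum(yi != rmaj for yi in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, lmaj, rmaj)
    if best is None:  # degenerate sample: fall back to the majority label
        maj = Counter(y).most_common(1)[0][0]
        return lambda row: maj
    _, f, t, lmaj, rmaj = best
    return lambda row: lmaj if row[f] <= t else rmaj

def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    # Consensus: each tree votes; the majority answer wins.
    return lambda row: Counter(tree(row) for tree in trees).most_common(1)[0][0]

X = [[1, 5], [2, 6], [8, 1], [9, 2]]
y = ["a", "a", "b", "b"]
predict = random_forest(X, y)
```

Averaging many trees trained on different resamples is what makes the ensemble more robust than any single tree.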
Mixture of experts (MoE) large language model (LLM) architectures have recently emerged, both in proprietary LLMs such as GPT-4 and in community models with the open-source release of Mistral AI’s Mixtral 8x7B. The strong relative performance of the Mixtral model has raised much interest and numerous questions about MoE and its use in LLM architectures. So, what is MoE and why is it important?
As ray tracing becomes the predominant rendering technique in modern game engines, a single GPU RayGen shader can now perform most of the light simulation of a frame. To manage this level of complexity, it becomes necessary to observe a decomposition of shader performance at the HLSL or GLSL source-code level. As a result, shader profilers are now a must-have tool for optimizing ray tracing.
NVIDIA cuSPARSELt harnesses Sparse Tensor Cores to accelerate general matrix multiplications. Version 0.6 adds support for the NVIDIA Hopper architecture.
The development of useful quantum computing is a massive global effort, spanning government, enterprise, and academia. The benefits of quantum computing could help solve some of the most challenging problems in the world related to applications such as materials simulation, climate modeling, risk management, supply chain optimization, and bioinformatics. Realizing the benefits of quantum…
NVIDIA Holoscan for Media is a software-defined platform for building and deploying applications for live media. Recent updates introduce a user-friendly developer interface and new capabilities for application deployment to the platform. Holoscan for Media now includes Helm Dashboard, which delivers an intuitive user interface for orchestrating and managing Helm charts.
Video quality metrics are used to evaluate the fidelity of video content. They provide a consistent quantitative measurement to assess the performance of the encoder. VMAF combines human vision modeling with machine learning techniques that are continuously evolving, enabling it to adapt to new content. VMAF excels in aligning with human visual perception by combining detailed analysis…
GPU-driven rendering has long been a major goal for many game applications. It enables better scalability for handling large virtual scenes and reduces cases where the CPU could bottleneck a game’s performance. Short of running the game’s logic on the GPU, I see the pinnacle of GPU-driven rendering as a scenario in which the CPU sends the GPU only the new frame’s camera information…
When it comes to game application performance, GPU-driven rendering enables better scalability for handling large virtual scenes. Direct3D 12 (D3D12) introduces work graphs as a programming paradigm that enables the GPU to generate work for itself on the fly. For an introduction to work graphs, see Advancing GPU-Driven Rendering with Work Graphs in Direct3D 12. This post features a Direct3D…
Today, NVIDIA and the Alliance for OpenUSD (AOUSD) announced the AOUSD Materials Working Group, an initiative for standardizing the interchange of materials in Universal Scene Description, known as OpenUSD. As an extensible framework and ecosystem for describing, composing, simulating, and collaborating within 3D worlds, OpenUSD enables developers to build interoperable 3D workflows…
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically usage from Python and Julia, and demonstrates the performance of cuTENSOR with benchmarks in a number of application domains. For more information…
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array slices. The release of cuTENSOR 2.0 represents a major update—in both functionality and performance—over its predecessor. This version reimagines its APIs to be more expressive, including advanced just-in-time compilation capabilities all…
Graph neural networks (GNNs) have revolutionized machine learning for graph-structured data. Unlike traditional neural networks, GNNs are good at capturing intricate relationships in graphs, powering applications from social networks to chemistry. They shine particularly in scenarios like node classification, where they predict labels for graph nodes, and link prediction, where they determine the…
Graph analytics, or graph algorithms, are analytic tools used to determine the strength and direction of relationships between objects in a graph. The focus of graph analytics is on pairwise relationships between two objects at a time and the structural characteristics of the graph as a whole.
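One of the simplest such measures is weighted degree: the total strength of the pairwise relationships touching each node. A minimal sketch (node names and weights invented for illustration; libraries such as NetworkX or RAPIDS cuGraph provide this and far richer algorithms at scale):

```python
from collections import defaultdict

# Edges as (source, target, weight): the strength of each pairwise relationship.
edges = [
    ("alice", "bob", 3.0),
    ("alice", "carol", 1.0),
    ("bob", "carol", 2.0),
    ("carol", "dave", 4.0),
]

# Weighted degree: sum of edge weights touching each node, a simple
# structural measure of how strongly connected a node is.
strength = defaultdict(float)
for u, v, w in edges:
    strength[u] += w
    strength[v] += w

for node, s in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(node, s)
```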
In the dynamic realm of generative AI, diffusion models stand out as the most powerful architecture for generating high-quality images with text prompts. Models like Stable Diffusion have revolutionized creative applications. However, the inference process of diffusion models can be computationally intensive due to the iterative denoising steps required. This presents significant challenges…
Learn how AI and NVIDIA Maxine are transforming the video streaming and conferencing industry.
Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated art through denoising diffusion techniques. This can be applied to many enterprise use cases such as creating personalized content for marketing, generating imaginative backgrounds for objects in…
We are so excited to be back in person at GTC this year at the San Jose Convention Center. With thousands of developers, industry leaders, researchers, and partners in attendance, attending GTC in person gives you the unique opportunity to network with legends in technology and AI, and experience NVIDIA CEO Jensen Huang’s keynote live on-stage at the SAP Center. Past GTC alumni? Get 40%
Migrating between major versions of software can present several challenges to infrastructure management teams. These challenges can prevent users from adopting the newer versions, so they miss out on newer, more powerful features. Effective planning and thorough testing are essential to overcoming these challenges and ensuring a smooth transition. Cumulus Linux 3.7.x and 4.x.
Federated learning (FL) is experiencing accelerated adoption due to its decentralized, privacy-preserving nature. In sectors such as healthcare and financial services, FL, as a privacy-enhanced technology, has become a critical component of the technical stack. In this post, we discuss FL and its advantages, delving into why federated learning is gaining traction. We also introduce three key…
From cities and airports to Olympic Stadiums, AI is transforming public spaces into safer, smarter, and more sustainable environments.
The latest release of CUDA Toolkit, version 12.4, continues to push accelerated computing performance using the latest NVIDIA GPUs. This post explains the new features and enhancements included in this release: CUDA and the CUDA Toolkit software provide the foundation for all NVIDIA GPU-accelerated computing applications in data science and analytics, machine learning…
Quantitative finance libraries are software packages that consist of mathematical, statistical, and, more recently, machine learning models designed for use in quantitative investment contexts. They contain a wide range of functionalities, often proprietary, to support the valuation, risk management, construction, and optimization of investment portfolios. Financial firms that develop such…
In 2022, the city of Lismore, Australia, bore the brunt of devastating floods, leaving over 3K homes damaged and communities shattered. With $6B in losses, this was the second-costliest event in the world for insurers in 2022 and the most expensive disaster in Australian history. With each passing year, natural disaster events such as those experienced in Lismore grow in rate and scale across…
For over a decade, traditional industrial process modeling and simulation approaches have struggled to fully leverage multicore CPUs or acceleration devices to run simulation and optimization calculations in parallel. Multicore linear solvers used in process modeling and simulation have not achieved expected improvements, and in certain cases have underperformed optimized single-core solvers.
This week’s model release features the NVIDIA-optimized language model Smaug 72B, which you can experience directly from your browser. NVIDIA AI Foundation Models and Endpoints are a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Try leading models such as Nemotron-3, Mixtral 8x7B, Gemma 7B…
Hear from ExxonMobil, Honeywell, Siemens Energy, and more as they explore AI and HPC innovation in oil, gas, power, and utilities.
Stream processing is the continuous processing of new data events as they’re received. A lot of data is produced as a stream of events, for example financial transactions, sensor measurements, or web server logs.
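A minimal sketch of the idea in Python: events are consumed one at a time as they arrive, and the processor keeps only constant-size running state rather than buffering the whole stream (the sensor values and threshold here are invented for illustration):

```python
def sensor_stream():
    # Stand-in for a live event source (e.g., sensor measurements).
    for value in [21.0, 21.5, 22.0, 35.0, 22.5]:
        yield value

def process(stream, threshold=30.0):
    # Handle each event as it is received, maintaining a running
    # average and flagging anomalous readings on the fly.
    count, total, alerts = 0, 0.0, []
    for value in stream:
        count += 1
        total += value
        if value > threshold:
            alerts.append((count, value))
    return total / count, alerts

mean, alerts = process(sensor_stream())
```

In a production system the generator would be replaced by a real event source (a message queue or socket), but the shape of the computation is the same.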
Hear from Amdocs, Indosat, KT, NTT, ServiceNow, Singtel, SoftBank, and Verizon, plus a special address from NVIDIA at GTC. Explore AI transforming customer service, network operations, sovereign AI factories, and AI-RAN.
Learn how synthetic data is supercharging 3D simulation and computer vision workflows, from visual inspection to autonomous machines.
Gain a foundational understanding of USD, the open and extensible framework for creating, editing, querying, rendering, collaborating, and simulating within 3D worlds.
In the ever-evolving landscape of large language models (LLMs), effective data management is a key challenge. Data is at the heart of model performance. While most advanced machine learning algorithms are data-centric, necessary data can’t always be centralized. This is due to various factors such as privacy, regulation, geopolitics, copyright issues, and the sheer effort required to move vast…
Learn how to build a RAG-powered application with a human voice interface at NVIDIA GTC 2024 Speech and Generative AI Developer Day.
Predicting 3D protein structures from amino acid sequences has been an important long-standing question in bioinformatics. In recent years, deep learning–based computational methods have been emerging and have shown promising results. Among these lines of work, AlphaFold2 is the first method that has achieved results comparable to slower physics-based computational methods.
Join us on March 20 for Cybersecurity Developer Day at GTC to gain insights on leveraging generative AI for cyber defense.
Coding is essential in the digital age, but it can also be tedious and time-consuming. That’s why many developers are looking for ways to automate and streamline their coding tasks with the help of large language models (LLMs). These models are trained on massive amounts of code from permissively licensed GitHub repositories and can generate, analyze, and document code with little human…
Join experts from NVIDIA and the public sector industry to learn how cybersecurity, generative AI, digital twins, and more are impacting the way that government agencies operate.
Retrieval-augmented generation (RAG) is exploding in popularity as a technique for boosting large language model (LLM) application performance. From highly accurate question-answering AI chatbots to code-generation copilots, organizations across industries are exploring how RAG can help optimize processes. According to State of AI in Financial Services: 2024 Trends, 55%
This week’s model release features the NVIDIA-optimized language model Phi-2, which can be used for a wide range of natural language processing (NLP) tasks. You can experience Phi-2 directly from your browser. NVIDIA AI Foundation Models and Endpoints are a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications.
The past few decades have witnessed a surge in rates of waste generation, closely linked to economic development and urbanization. This escalation in waste production poses substantial challenges for governments worldwide in terms of efficient processing and management. Despite the implementation of waste classification systems in developed countries, a significant portion of waste still ends up…
Connect with industry leaders, learn from technical experts, and collaborate with peers at NVIDIA GTC 2024 Developer Days.
For developers working on Microsoft DirectX ray-tracing applications, ray-tracing validation is here to help you improve performance, find hard-to-debug issues, and root cause crashes. Unlike existing debug solutions, ray-tracing validation performs checks at the driver level, which enables it to identify potential problems that cannot be caught by tools such as the D3D12 Debug Layer.
Discover a wide variety of AI tools and resources designed to equip students with practical solutions for real-world problem-solving. Join experts from NVIDIA, Google, OpenAI, Stanford, UC Berkeley, and more throughout GTC week.
Energy efficiency refers to a system or device’s ability to use as little energy as possible to perform a particular task or function within acceptable limits. Essentially, it means using energy in the most effective way possible and minimizing waste. There are many applications, such as energy-efficient windows or homes, but to understand energy efficiency from an NVIDIA perspective…
The conversation about designing and evaluating retrieval-augmented generation (RAG) systems is a long, multi-faceted discussion. Even when we look at retrieval on its own, developers selectively employ many techniques, such as query decomposition, re-writing, building soft filters, and more, to increase the accuracy of their RAG pipelines. While the techniques vary from system to system…
Join experts from Stanford, Cornell, Meta, and more to learn about the latest in AI for academia and what’s next in cutting-edge research.
NVIDIA Spectrum-X is swiftly gaining traction as the leading networking platform tailored for AI in hyperscale cloud infrastructures. Spectrum-X networking technologies help enterprise customers accelerate generative AI workloads. NVIDIA announced significant OEM adoption of the platform in a November 2023 press release, along with an update on the NVIDIA Israel-1 Supercomputer powered by Spectrum…
Developers and enterprises can now deploy lifelike virtual and mixed reality experiences with Varjo’s latest XR-4 series headsets, which are integrated with NVIDIA technologies. These XR headsets match the resolution that the human eye can see, providing users with realistic visual fidelity and performance. The latest XR-4 series headsets support NVIDIA Omniverse and are powered by NVIDIA…
Discover the transformative power of computer vision and video analytics at GTC. Dive into cutting-edge techniques such as vision transformers, AI agents, multi-modal foundation models, 3D technology, large language models (LLMs), vision language models (VLMs), generative AI, and more.
Developers have long been building interfaces like web apps to enable users to leverage the core products being built. To learn how to work with data in your large language model (LLM) application, see my previous post, Build an LLM-Powered Data Agent for Data Analysis. In this post, I discuss a method to add free-form conversation as another interface with APIs. It works toward a solution that…
HOMEE AI, an NVIDIA Inception member based in Taiwan, has developed an “AI-as-a-service” spatial planning solution to disrupt the $650B global home decor market. They’re helping furniture makers and home designers find new business opportunities in the era of industrial digitalization. Using NVIDIA Omniverse, the HOMEE AI engineering team developed an enterprise-ready service to deliver…
Discover why OpenUSD is central to the future of 3D development with Aaron Luk, a founding developer of Universal Scene Description.
Many PC games are designed around an eight-core console with an assumption that their software threading system ‘just works’ on all PCs, especially regarding the number of threads in the worker thread pool. This was a reasonable assumption not too long ago when most PCs had similar core counts to consoles: the CPUs were just faster and performance just scaled. In recent years though…
On March 5, 8am PT, learn how NVIDIA Metropolis microservices for Jetson Orin helps you modernize your app stack, streamline development and deployment, and future-proof your apps with the ability to bring the latest generative AI capabilities to any customer through simple API calls.
NVIDIA is collaborating as a launch partner with Google in delivering Gemma, a newly optimized family of open models built from the same research and technology used to create the Gemini models. An optimized release with TensorRT-LLM enables users to develop with LLMs using only a desktop with an NVIDIA RTX GPU. Created by Google DeepMind, Gemma 2B and Gemma 7B—the first models in the series…
Join us at the Game Developers Conference March 18-22 to discover how the latest generative AI and NVIDIA RTX technologies are accelerating game development.
This week’s model release features NVIDIA cuOpt, a world-record-breaking accelerated optimization engine that helps teams solve complex routing problems and deliver new capabilities. It enables organizations to reimagine logistics, operations research, transportation, and supply chain optimization. NVIDIA cuOpt facilitates many logistics optimization use cases, including: Ultimately…
A virtual digital assistant is a program that understands natural language and can answer questions or complete tasks based on voice commands.
Advances in AI are rapidly transforming every industry. Join us in person or virtually to learn about the latest technologies, from retrieval-augmented generation to OpenUSD.
The quest for new, effective treatments for diseases that remain stubbornly resistant to current therapies is at the heart of drug discovery. This traditionally long and expensive process has been radically improved by AI techniques like deep learning, empowered by the rise of accelerated computing. Receptor.AI, a London-based drug discovery company and NVIDIA Inception member…
Discover how generative AI is powering cybersecurity solutions with enhanced speed, accuracy, and scalability.
The NVIDIA DOCA 2.6 release includes support for NVIDIA Spectrum-X reference architecture with the NVIDIA BlueField-3 SuperNIC and enhances DOCA host-based networking (HBN).
On March 19, learn how to build generative AI-enabled 3D pipelines and tools using Universal Scene Description for industrial digitalization.
Learn how inference for LLMs is driving breakthrough performance for AI-enabled applications and services.
This week’s release features the NVIDIA-optimized Mamba-Chat model, which you can experience directly from your browser. This post is part of Model Mondays, a program focused on enabling easy access to state-of-the-art community and NVIDIA-built models. These models are optimized by NVIDIA using TensorRT-LLM and offered as .nemo files for easy customization and deployment.
With the GTC session catalog now live, it’s time to start building your personalized agenda for the conference. For those of you who will be joining us in San Jose, this post covers the technical training opportunities that you won’t want to miss. If you can’t attend GTC in person, please take advantage of the 15 virtual workshops scheduled in EMEA, India, and China time zones.
Cluster analysis is the grouping of objects such that objects in the same cluster are more similar to each other than they are to objects in another cluster.
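One of the simplest clustering algorithms, k-means, makes this concrete: points are assigned to their nearest cluster center, and centers move to the mean of their assigned points. A minimal 1-D sketch (an illustrative toy with invented data, not a library implementation):

```python
def kmeans_1d(points, k=2, iters=10):
    # Assign each point to its nearest center, then move each center
    # to the mean of its assigned points; repeat until stable.
    centers = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups: values near 1 and values near 8.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, clusters = kmeans_1d(points)
```

After convergence the centers sit at the means of the two groups, and each cluster contains the points most similar to its own center.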
Speakers from NVIDIA, Meta, Microsoft, OpenAI, and ServiceNow will be talking about the latest tools, optimizations, trends, and best practices for large language models (LLMs).
CUDA Quantum is an open-source programming model for building quantum-classical applications. Useful quantum computing workloads will run on heterogeneous computing architectures such as quantum processing units (QPUs), GPUs, and CPUs in tandem to solve real-world problems. CUDA Quantum enables the acceleration of such applications by providing the tools to program these computing architectures…
Visual generative AI is the process of creating images from text prompts. The technology is based on vision-language foundation models that are pretrained on web-scale data. These foundation models are used in many applications by providing a multimodal representation. Examples include image captioning and video retrieval, creative 3D and 2D image synthesis, and robotic manipulation.
Join us in-person or virtually and learn about the power of RAG with insights and best practices from experts at NVIDIA, visionary CEOs, data scientists, and others.
This week’s Model Monday release features the NVIDIA-optimized Code Llama, Kosmos-2, and SeamlessM4T models, which you can experience directly from your browser. With NVIDIA AI Foundation Models and Endpoints, you can access a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. Meta’s Code Llama 70B is the latest…
Large language models (LLMs) have revolutionized the field of AI, creating entirely new ways of interacting with the digital world. While they provide a good generalized solution, they often must be tuned to support specific domains and tasks. AI coding assistants, or code LLMs, have emerged as one domain to help accomplish this. By 2025, 80% of the product development lifecycle will make…
This NVIDIA HPC SDK update includes the cuBLASMp preview library, along with minor bug fixes and enhancements.
NVIDIA Modulus 24.01 updates distributed utilities and samples for physics-informed DeepONet and GNN models.
Synthetic data generation is a data augmentation technique necessary for increasing the robustness of models by supplying training data. Explore the use of Transformers for synthetic tabular data generation in the new self-paced course.
NVIDIA AI Workbench is now in beta, bringing a wealth of new features to streamline how enterprise developers create, use, and share AI and machine learning (ML) projects. Announced at SIGGRAPH 2023, NVIDIA AI Workbench enables developers to create, collaborate, and migrate AI workloads on their GPU-enabled environment of choice. To learn more, see Develop and Deploy Scalable Generative AI Models…
Accelerated networking combines CPUs, GPUs, DPUs (data processing units), or SuperNICs into an accelerated computing fabric specifically designed to optimize networking workloads. It uses specialized hardware to offload demanding tasks to enhance server capabilities. As AI and other new workloads continue to grow in complexity and scale, the need for accelerated networking becomes paramount.
The past decade has seen a remarkable surge in the adoption of deep learning techniques for computer vision (CV) tasks. Convolutional neural networks (CNNs) have been the cornerstone of this revolution, exhibiting exceptional performance and enabling significant advancements in visual perception. By employing localized filters and hierarchical architectures, CNNs have proven adept at…
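The "localized filter" idea behind CNNs can be sketched with a one-dimensional convolution: each output value depends only on a small neighborhood of the input, and the same filter weights are reused at every position (the signal and kernel values here are invented for illustration):

```python
def conv1d(signal, kernel):
    # Slide a small, localized filter across the input; each output
    # value is a weighted sum over one local window of the signal.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference filter responds where neighboring values change,
# the 1-D analogue of an edge detector in image CNNs.
signal = [0, 0, 0, 1, 1, 1]
print(conv1d(signal, [-1, 1]))  # nonzero only at the step
```

Stacking such layers, with learned filter weights, is what gives CNNs their hierarchical feature extraction.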
Learn the basics of retrieval-augmented generation (RAG), an end-to-end architecture used to optimize the output of an LLM.
Building vision AI applications for the edge often comes with notoriously long and costly development cycles. At the same time, quickly developing edge AI applications that are cloud-native, flexible, and secure has never been more important. Now, a powerful yet simple API-driven edge AI development workflow is available with the new NVIDIA Metropolis microservices.
While harnessing the potential of AI is a priority for many of today’s enterprises, developing and deploying an AI model involves time and effort. Often, challenges must be overcome to move a model into production, especially for mission-critical business operations. According to IDC research, only 18% of enterprises surveyed could put an AI model into production in under a month.
Following the introduction of ChatGPT, enterprises around the globe are realizing the benefits and capabilities of AI, and are racing to adopt it into their workflows. As this adoption accelerates, it becomes imperative for enterprises not only to keep pace with the rapid advancements in AI, but also address related challenges such as optimization, scalability, and security.
As a comprehensive software framework for data center infrastructure developers, NVIDIA DOCA has been adopted by leading AI, cloud, enterprise, and ISV innovators. The release of DOCA 2.5 marks its third anniversary. And, due to the stability and robustness of the code base combined with several networking and platform upgrades, DOCA 2.5 is the first NVIDIA BlueField-3 long-term support (LTS)…
Learn how generative AI can help defend against spear phishing in this January 30 webinar.
As industrial automation increases, safety becomes a greater challenge and top priority for enterprises. Safety encompasses multiple aspects: The same technological solution that’s driving automation can be used to also address safety: artificial intelligence. AI-powered stationary outside-in safety platforms, which monitor activity across many distributed machines or robots…
A common technological misconception is that performance and complexity are directly linked. That is, the highest-performance implementation is also the most challenging to implement and manage. When considering data center networking, however, this is not the case. InfiniBand is a protocol that sounds daunting and exotic in comparison to Ethernet, but because it is built from the ground up…
NVIDIA Metropolis Microservices for Jetson provides a suite of easy-to-deploy services that enable you to quickly build production-quality vision AI applications while using the latest AI approaches. This post explains how to develop and deploy generative AI–powered applications with Metropolis Microservices on the NVIDIA Jetson edge AI platform by walking through a reference example that can…
NVIDIA Metropolis microservices provide powerful, customizable, cloud-native APIs and microservices to develop vision AI applications and solutions. The framework now includes NVIDIA Jetson, enabling developers to quickly build and productize performant and mature vision AI applications at the edge. APIs enhance flexibility, interoperability, and efficiency in software development by enabling…
NVIDIA AI Foundation Models and Endpoints provides access to a curated set of community and NVIDIA-built generative AI models to experience, customize, and deploy in enterprise applications. On Mondays throughout the year, we’ll be releasing new models. This week, we released the NVIDIA-optimized DePlot model, which you can experience directly from your browser. If you haven’t already…
Robots are typically equipped with cameras. When designing a digital twin simulation, it’s important to accurately replicate camera performance in the simulated environment. However, to make sure the simulation runs smoothly, it’s crucial to check the performance of the workstation that is running the simulation. In this blog post, we explore the steps to setting up and running a camera benchmark…