Large language models (LLMs) have revolutionized natural language processing (NLP) in recent years, enabling a wide range of applications such as text summarization, question answering, and natural language generation.
Arctic, developed by Snowflake, is a new open LLM designed to deliver high inference performance at low cost across a wide range of NLP tasks.
Arctic
Arctic is based on a new Dense-MoE (Mixture of Experts) hybrid transformer architecture, combining a 10B-parameter dense transformer with a residual MoE multi-layer perceptron (MLP) of 128 experts at 3.66B parameters each. Running the dense and MoE paths side by side lets the architecture hide the additional all-to-all communication overhead of vanilla MoE models behind computation, allowing for more efficient use of resources during training and inference.
The resulting network has 480B total parameters and uses top-2 gating to select 17B active parameters per token: the 10B dense transformer plus two 3.66B experts. The large pool of experts and total parameters provides the capacity for top-tier quality, while choosing from many small, condensed experts engages only a moderate number of active parameters, keeping both training and inference cost-effective.
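To make the hybrid concrete, the following is a minimal PyTorch sketch of one such block, not Snowflake's implementation: a dense MLP that every token passes through, plus a residual MoE MLP in which top-2 gating routes each token to 2 of 128 small experts. The layer sizes are toy placeholders; only the expert count and top-k mirror the figures above. The parameter arithmetic also checks out at full scale: 10B + 128 × 3.66B ≈ 480B total, while 10B + 2 × 3.66B ≈ 17B are active per token.

```python
# Minimal sketch (not Snowflake's implementation) of a Dense-MoE hybrid block:
# a dense MLP path plus a residual MoE path with top-2 gating. Layer sizes
# are toy values; only n_experts=128 and top_k=2 mirror the text above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMoEHybridBlock(nn.Module):
    def __init__(self, d_model=512, d_dense=2048, d_expert=256,
                 n_experts=128, top_k=2):
        super().__init__()
        # Dense path: every token always flows through this MLP.
        self.dense = nn.Sequential(
            nn.Linear(d_model, d_dense), nn.GELU(), nn.Linear(d_dense, d_model))
        # Residual MoE path: many small ("condensed") expert MLPs.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        dense_out = self.dense(x)               # dense path, always computed
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize the top-2 gate scores
        moe_out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():  # dispatch tokens to experts
                mask = idx[:, k] == e
                moe_out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return dense_out + moe_out              # residual MoE added to dense path

block = DenseMoEHybridBlock()
y = block(torch.randn(4, 512))                  # 4 tokens in, (4, 512) out
```

Only the two selected experts per token actually run a forward pass, which is how the total parameter count and the active parameter count can diverge so sharply.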
The LLM for Enterprise Use Cases
The main use cases of Arctic are SQL generation, coding, and instruction following for enterprise applications.
SQL and code generation are challenging tasks: the model must understand the semantics and syntax of both natural language and programming languages, and generate valid, accurate output that matches the user's intent.
The model beats other state-of-the-art open models, achieving 79% accuracy on the Spider benchmark, a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset designed to evaluate how well models translate natural language questions into SQL queries.
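To illustrate what the task demands, here is a hypothetical Spider-style example of ours, not drawn from the benchmark: the model sees a schema and a question, and must infer both a join and an aggregation to produce a correct query.

```python
# Hypothetical, Spider-style text-to-SQL example (ours, not from the benchmark).
schema = """
CREATE TABLE customers (customer_id INT, region TEXT);
CREATE TABLE orders (order_id INT, customer_id INT, total_usd DECIMAL);
"""
question = "What is the average order total per region?"

# The query the model is expected to generate: both the join and the
# aggregation must be inferred from the natural language question alone.
expected_sql = """
SELECT c.region, AVG(o.total_usd)
FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.region;
"""
```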
Additionally, it leads the pack on the HumanEval+ and MBPP+ benchmarks for code generation and outperforms other models on IFEval, a benchmark for assessing the instruction-following capabilities of LLMs.
NVIDIA NIM microservices
Arctic is optimized for latency and throughput, and it now joins more than two dozen popular AI models supported by NVIDIA NIM, microservices designed to simplify the deployment of performance-optimized NVIDIA AI Foundation models and custom models, enabling more enterprise application developers to contribute to AI transformations.
Experience the model
The NVIDIA API catalog is a collection of performance-optimized API endpoints, packaged as an enterprise-grade runtime, that you can experience from a browser.
With free NVIDIA cloud credits, you can start testing the model at scale. You can also build a proof of concept (POC) by connecting your application to the NVIDIA-hosted API endpoint, which runs on a fully accelerated stack. The APIs are integrated with frameworks like LangChain and LlamaIndex, simplifying enterprise application development.
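As a sketch of what that connection can look like, the snippet below uses the API catalog's OpenAI-compatible interface; the endpoint URL and the snowflake/arctic model identifier are assumptions to confirm on the model's page at build.nvidia.com.

```python
# Sketch of a call to the NVIDIA-hosted endpoint via its OpenAI-compatible
# API. The base_url and model id are assumptions; confirm both on the
# model's page at build.nvidia.com.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="snowflake/arctic",
    messages=[{"role": "user", "content":
               "Write a SQL query listing the top 5 customers by total revenue."}],
    temperature=0.2,
    max_tokens=512,
)
print(response.choices[0].message.content)
```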
And when you’re ready to take it to production, deploy the model in minutes with the NIM microservice on-premises, in the cloud, or on your workstation. The flexibility to run anywhere keeps your data secure and private, avoids platform lock-in, and enables you to take advantage of your existing infrastructure investments and cloud commitments.
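Because NIM exposes the same OpenAI-compatible API when self-hosted, the POC code above carries over with only the endpoint changed; the localhost port and model identifier below are assumptions based on common NIM defaults, so check the deployment documentation for your environment.

```python
# The same client pointed at a self-hosted NIM microservice. The localhost
# port (8000) and model id are assumptions based on common NIM defaults.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
response = local.chat.completions.create(
    model="snowflake/arctic",
    messages=[{"role": "user", "content": "Summarize our Q3 sales notes."}],
)
print(response.choices[0].message.content)
```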
To get started, visit build.nvidia.com.