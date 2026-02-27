Technical Blog

Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints

Feb 27, 2026
By Anu Srivastava

Alibaba has introduced the new open source Qwen3.5 series, built for native multimodal agents. The first model in the series is a ~400B parameter (397B total, 17B active) native vision-language model (VLM) with reasoning capabilities, built on a hybrid architecture that combines mixture of experts (MoE) with Gated Delta Networks. Qwen3.5 can understand and navigate user interfaces, an improvement over the previous generation of VLMs.

Qwen3.5 is ideal for a variety of use cases, including:

  • Coding, including web development
  • Visual reasoning, including mobile and web interfaces
  • Chat applications
  • Complex search

Specification              Qwen3.5
Modalities                 Vision, language
Total parameters           397B
Active parameters          17B
Activation rate            4.28%
Input context length       256K tokens, extensible to 1M tokens
Languages supported        200+

Additional configuration information
Experts                    512
Shared experts             1
Experts per token          11 (10 routed + 1 shared)
Layers                     60
Vocabulary size (tokens)   248,320

Table 1. Specifications and configuration details for the Qwen3.5 model
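
The activation rate in Table 1 follows directly from the parameter counts. As a quick sanity check, the short sketch below recomputes it from the rounded figures in the table; the variable names are illustrative and the rounded counts are the only inputs.

```python
# Rounded figures taken from Table 1.
TOTAL_PARAMS = 397e9          # total parameters
ACTIVE_PARAMS = 17e9          # parameters active for each token
ROUTED_EXPERTS_PER_TOKEN = 10
SHARED_EXPERTS = 1

# Fraction of the model's weights that participate in each forward pass.
activation_rate = ACTIVE_PARAMS / TOTAL_PARAMS

# Experts consulted for each token: routed experts plus the shared expert.
experts_per_token = ROUTED_EXPERTS_PER_TOKEN + SHARED_EXPERTS

print(f"Activation rate: {activation_rate:.2%}")
print(f"Experts used per token: {experts_per_token}")
```

Because only about 4% of the weights are active per token, the per-token compute is closer to that of a 17B dense model than a 397B one, although all 397B parameters still need to be held in memory.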

Build with NVIDIA endpoints

You can start building with Qwen3.5 today with free access to GPU-accelerated endpoints on build.nvidia.com, powered by NVIDIA Blackwell GPUs. As part of the NVIDIA Developer Program, you can explore quickly in the browser, experiment with prompts, and even test the model with your own data to evaluate real-world performance.

Video 1. Learn how you can test Qwen3.5 on NVIDIA GPU-accelerated endpoints

You can also use the NVIDIA-hosted model through the API, free with registration in the NVIDIA Developer Program.  

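import os

import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"

# Read the API key from the environment. Python does not expand
# "$NVIDIA_API_KEY" inside a string literal.
api_key = os.environ["NVIDIA_API_KEY"]

headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": "",  # put your prompt here
        }
    ],
    "model": "qwen/qwen3.5-397b-a17b",
    "chat_template_kwargs": {
        "thinking": True
    },
    "frequency_penalty": 0,
    "max_tokens": 16384,
    "presence_penalty": 0,
    # Non-streaming, so the full reply arrives as one JSON body
    # that response.json() can parse.
    "stream": False,
    "temperature": 1,
    "top_p": 1,
}

# Re-use connections across requests
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)

# Raise an exception on HTTP error status codes
response.raise_for_status()
response_body = response.json()
print(response_body)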
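
When stream is set to True, an OpenAI-compatible endpoint typically returns server-sent events instead of a single JSON body, so the response has to be read line by line. The sketch below assumes that event format (lines beginning with "data:", a final "data: [DONE]" marker, and text in choices[].delta.content); the helper name iter_stream_text is hypothetical and not part of any NVIDIA library.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines.

    Assumes each event line looks like 'data: {json chunk}' and that the
    stream ends with 'data: [DONE]'.
    """
    for raw in lines:
        if not raw:
            continue  # skip blank separator lines between events
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # ignore comments and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Usage with requests (not executed here; needs a valid API key and a
# payload with "stream": True):
#
# with session.post(invoke_url, headers=headers, json=payload, stream=True) as response:
#     response.raise_for_status()
#     for text in iter_stream_text(response.iter_lines()):
#         print(text, end="", flush=True)
```

Keeping the parsing in a generator separates it from the network call, which makes it easy to exercise with recorded lines before pointing it at a live endpoint.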

To take advantage of tool calling, define an array of OpenAI-compatible tool definitions and pass it in the tools parameter of the chat completions request.
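
As a minimal sketch of what that can look like, the example below builds a request payload with one tool and reads tool calls back out of a response. The get_weather tool, the helper names, and the max_tokens value are hypothetical and chosen for illustration; the response shape assumes the standard OpenAI chat completions format.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def build_tool_payload(prompt, tools):
    """Build a chat completions payload that offers the given tools."""
    return {
        "model": "qwen/qwen3.5-397b-a17b",
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "max_tokens": 1024,
        "stream": False,
    }


def extract_tool_calls(response_body):
    """Return (name, arguments) pairs from an OpenAI-style response body."""
    message = response_body["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


payload = build_tool_payload("What is the weather in Austin?", [get_weather_tool])
print(json.dumps(payload, indent=2))
```

The payload can be sent with the same session.post call used for the basic request. If the model decides to call a tool, your application runs the function itself and returns the result to the model in a follow-up message with the role set to tool.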

NVIDIA NIM makes it easy to take Qwen3.5 from development into production. Available as optimized, containerized inference microservices, NIM packages the model with the performance tuning, standardized APIs, and deployment flexibility that enterprises need. Download and run it anywhere: on-premises, in the cloud, or across hybrid environments.

Customize with NVIDIA NeMo  

While Qwen3.5 offers impressive “out-of-the-box” multimodal capabilities, the NVIDIA NeMo framework provides the tools to adapt it to specialized domains. Using the NeMo Automodel library, developers can fine-tune the 397B-parameter Qwen3.5 model with high training throughput.

NeMo Automodel is a PyTorch-native training library that offers Day 0 Hugging Face support, enabling direct training on existing checkpoints without tedious model conversions. This facilitates rapid experimentation, whether performing full supervised fine-tuning (SFT) or using memory-efficient methods such as LoRA.
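
To illustrate why LoRA is memory-efficient, recall that it freezes each original weight matrix and trains only two small low-rank factors added alongside it. The sketch below compares trainable weight counts for a single linear layer; the layer shape and rank are hypothetical values chosen for illustration and are not Qwen3.5's actual dimensions or a NeMo Automodel configuration.

```python
def full_params(d_in, d_out):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable weights in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Hypothetical layer shape and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 16

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)

print(f"Full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights ({lora / full:.2%} of full)")
```

Fewer trainable weights means fewer gradients and optimizer states to hold in memory, which is where most of the savings relative to full SFT come from.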

As a reference implementation, developers can follow the technical tutorial on Medical Visual QA, which details how to fine-tune Qwen3.5 on radiological datasets. For large-scale training, NeMo supports multinode Slurm and Kubernetes deployments, so even the largest MoE models can be adapted for domain-specific reasoning and complex agentic workflows.

Get started with Qwen3.5 

From data center deployments on NVIDIA Blackwell to NVIDIA NIM microservices for containerized deployment anywhere, NVIDIA offers multiple ways to integrate Qwen3.5. To get started, check out the Qwen3.5 model page on Hugging Face and test Qwen3.5 on build.nvidia.com.

Tags

Agentic AI / Generative AI | Developer Tools & Techniques | General | NeMo | NIM | Beginner Technical | Intermediate Technical | Tutorial | AI Agent | Mixture of Experts (MoE) | Open Source | VLMs

About the Authors

About Anu Srivastava
Anu Srivastava is a senior technical marketing manager who focuses on NVIDIA’s lighthouse AI model collaborations. She works with key partners and foundations to enable NVIDIA accelerated platform support for the open source developer ecosystem. Prior to NVIDIA, she worked at Google for over a decade in various engineering and management roles and holds a degree in computer science from the University of Texas at Austin.