Today’s demanding AI developer workloads often need more memory than desktop systems provide, or require software that laptops and PCs lack. This forces work to move to the cloud or the data center.
NVIDIA DGX Spark provides an alternative to cloud instances and data-center queues. The compact, Blackwell-powered supercomputer delivers 1 petaflop of FP4 AI compute performance, 128 GB of coherent unified system memory, 273 GB/s of memory bandwidth, and the NVIDIA AI software stack preinstalled. With DGX Spark, you can run large, compute-intensive tasks locally, without moving to the cloud or data center.
We’ll walk you through how DGX Spark’s compute performance, large memory, and preinstalled AI software accelerate fine-tuning, image generation, data science, and inference workloads, with benchmarks for each.
Fine-tuning workloads on DGX Spark
Fine-tuning pre-trained models is a common task for AI developers. To show how DGX Spark handles this workload, we ran three tuning tasks using different methodologies: full fine-tuning, LoRA, and QLoRA.
In full fine-tuning of a Llama 3.2 3B model, we reached a peak of 82,739.2 tokens per second. Tuning a Llama 3.1 8B model using LoRA on DGX Spark reached a peak of 53,657.6 tokens per second, and tuning a Llama 3.3 70B model using QLoRA reached a peak of 5,079.04 tokens per second (see Table 1 below).
Because fine-tuning is so memory-intensive, none of these workloads fits on a 32 GB consumer GPU.
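To make the workload concrete, here is a minimal LoRA fine-tuning sketch in the spirit of the Table 1 runs. It assumes the Hugging Face `transformers`, `peft`, and `datasets` packages; the dataset and some hyperparameters are illustrative stand-ins, not the exact benchmark configuration.

```python
# Minimal LoRA fine-tuning sketch (illustrative, not the benchmark harness).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "meta-llama/Llama-3.1-8B"  # gated checkpoint; accept the license first
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Wrap the base model with low-rank adapters; only the adapter weights train.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Any instruction-tuning dataset works here; this one is a stand-in.
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1000]")

def tokenize(batch):
    return tokenizer(batch["output"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-8b-lora",
                           per_device_train_batch_size=4,  # as in Table 1
                           num_train_epochs=1, max_steps=125, bf16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```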
| Model | Method | Backend | Configuration | Peak tokens/sec |
|---|---|---|---|---|
| Llama 3.2 3B | Full fine-tuning | PyTorch | Sequence length: 2048, batch size: 8, epochs: 1, steps: 125, BF16 | 82,739.20 |
| Llama 3.1 8B | LoRA | PyTorch | Sequence length: 2048, batch size: 4, epochs: 1, steps: 125, BF16 | 53,657.60 |
| Llama 3.3 70B | QLoRA | PyTorch | Sequence length: 2048, batch size: 8, epochs: 1, steps: 125, FP4 | 5,079.04 |

*Table 1. Fine-tuning performance on DGX Spark*
DGX Spark’s image-generation capabilities
Image-generation models continually push for greater accuracy, higher resolutions, and faster performance. Creating high-resolution images, or multiple images per prompt, demands more memory as well as the compute required to generate them.
DGX Spark’s large GPU memory and strong compute performance let you work with larger-resolution images and higher-precision models for higher image quality. Support for the FP4 data format enables DGX Spark to generate images quickly, even at high resolutions.
Using the Flux.1 12B model at FP4 precision, DGX Spark can generate a 1K image every 2.6 seconds (see Table 2 below). DGX Spark’s large system memory provides the capacity necessary to run a BF16 SDXL 1.0 model and generate seven 1K images per minute.
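The benchmark in Table 2 used a TensorRT backend; as a more approachable sketch of the same workload, the snippet below runs SDXL 1.0 through the standard Hugging Face `diffusers` pipeline with the Table 2 settings. The prompt is arbitrary, and performance through this path will differ from the TensorRT numbers.

```python
# SDXL image-generation sketch via diffusers (not the TensorRT benchmark path).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
).to("cuda")

images = pipe(
    prompt="a compact supercomputer on a desk, studio lighting",  # arbitrary
    height=1024, width=1024,          # 1024x1024 resolution, as in Table 2
    num_inference_steps=50,           # denoising steps, as in Table 2
    num_images_per_prompt=2,          # batch size 2, as in Table 2
).images

for i, image in enumerate(images):
    image.save(f"sdxl_{i}.png")
```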
| Model | Precision | Backend | Configuration | Images/min |
|---|---|---|---|---|
| Flux.1 12B Schnell | FP4 | TensorRT | Resolution: 1024×1024, denoising steps: 4, batch size: 1 | 23 |
| SDXL 1.0 | BF16 | TensorRT | Resolution: 1024×1024, denoising steps: 50, batch size: 2 | 7 |

*Table 2. Image-generation performance on DGX Spark*
Using DGX Spark for data science
DGX Spark supports foundational CUDA-X libraries like NVIDIA cuML and cuDF. NVIDIA cuML accelerates machine-learning algorithms in scikit-learn, as well as UMAP and HDBSCAN, on GPUs with zero code changes required.
For computationally intensive ML algorithms like UMAP and HDBSCAN, DGX Spark can process 250 MB datasets in seconds (see Table 3 below). NVIDIA cuDF significantly speeds up common pandas data analysis tasks like joins and string methods; cuDF pandas operations on datasets with tens of millions of records run in just seconds on DGX Spark.
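Here is a sketch of the zero-code-change pattern: an unmodified umap-learn script that runs on the GPU when launched under cuML’s accelerator. The synthetic dataset and its size are illustrative, not the benchmark dataset.

```python
# umap_demo.py -- an unmodified umap-learn script. Launching it as
#   python -m cuml.accel umap_demo.py
# runs UMAP on the GPU via NVIDIA cuML with no code changes; the same
# pattern applies to pandas scripts via `python -m cudf.pandas script.py`.
import numpy as np
import umap

# Synthetic feature matrix: 250,000 rows x 64 float32 columns (~64 MB).
X = np.random.rand(250_000, 64).astype(np.float32)

embedding = umap.UMAP(n_neighbors=15, n_components=2).fit_transform(X)
print(embedding.shape)  # (250000, 2)
```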
| Library | Benchmark | Dataset size | Time |
|---|---|---|---|
| NVIDIA cuML | UMAP | 250 MB | 4 sec |
| NVIDIA cuML | HDBSCAN | 250 MB | 10 sec |
| NVIDIA cuDF pandas | Key data analysis operations (joins, string methods, UDFs) | 0.5 to 5 GB | 11 sec |

*Table 3. Data science performance on DGX Spark*
Using DGX Spark for inference
DGX Spark’s Blackwell GPU supports the FP4 data format, specifically the NVFP4 format, which provides near-FP8 accuracy (<1% degradation). This enables smaller model footprints without sacrificing accuracy, and the smaller data footprint of FP4 also improves performance. Table 4 below provides inference performance data for DGX Spark.
DGX Spark supports a range of 4-bit data formats, including NVFP4 and MXFP4, as well as multiple inference backends, such as TRT-LLM, llama.cpp, and vLLM. The system’s 1 petaflop of AI performance enables fast prompt processing, as shown in Table 4. Quick prompt processing yields a faster time to first response token, which delivers a better user experience and speeds up end-to-end throughput.
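For reference, here is a minimal sketch using the TensorRT-LLM high-level Python LLM API. The checkpoint name is an assumption standing in for any prequantized NVFP4 model; this is not the exact harness behind the Table 4 numbers.

```python
# TensorRT-LLM inference sketch using the high-level LLM API.
from tensorrt_llm import LLM, SamplingParams

# Assumed prequantized NVFP4 checkpoint -- substitute your own model ID.
llm = LLM(model="nvidia/Llama-3.1-8B-Instruct-FP4")

params = SamplingParams(max_tokens=128)  # OSL = 128, matching Table 4

outputs = llm.generate(
    ["Summarize the benefits of developing AI models locally."], params
)
print(outputs[0].outputs[0].text)
```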
| Model | Precision | Backend | Prompt processing throughput (tokens/sec) | Token generation throughput (tokens/sec) |
|---|---|---|---|---|
| Qwen3 14B | NVFP4 | TRT-LLM | 5,928.95 | 22.71 |
| GPT-OSS-20B | MXFP4 | llama.cpp | 3,670.42 | 82.74 |
| GPT-OSS-120B | MXFP4 | llama.cpp | 1,725.47 | 55.37 |
| Llama 3.1 8B | NVFP4 | TRT-LLM | 10,256.90 | 38.65 |
| Qwen2.5-VL-7B-Instruct | NVFP4 | TRT-LLM | 65,831.77 | 41.71 |
| Qwen3 235B (on dual DGX Spark) | NVFP4 | TRT-LLM | 23,477.03 | 11.73 |

*Table 4. Inference performance on DGX Spark (ISL/OSL = 2048/128, batch size = 1)*
- NVFP4: A 4-bit floating-point format introduced with the NVIDIA Blackwell GPU architecture.
- MXFP4: Microscaling FP4, a 4-bit floating-point format created by the Open Compute Project (OCP).
- ISL (input sequence length): The number of tokens in the input prompt (also known as prefill tokens).
- OSL (output sequence length): The number of tokens the model generates in response (also known as decode tokens).
We also connected two DGX Spark systems via their ConnectX-7 network interfaces to run the Qwen3 235B model, which uses over 120 GB of memory, including overhead. Such models typically run on large cloud or data-center servers; the fact that they can run on dual DGX Spark systems shows what’s possible for developer experimentation. As shown in the last row of Table 4, token generation throughput on dual DGX Sparks was 11.73 tokens per second.
The new NVFP4 version of the NVIDIA Nemotron Nano 2 model also performs well on DGX Spark, delivering up to 2x higher throughput with little to no accuracy degradation. Download the model checkpoints from Hugging Face or as an NVIDIA NIM.
Get your DGX Spark, join the DGX Spark developer community, and start your AI-building journey today.