AI Pipeline
NVIDIA® Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.
Riva Benchmarks
H100 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
conformer | n-gram | 1 | 14.6 | 1 | H100 SXM5-80GB |
conformer | n-gram | 64 | 71 | 64 | H100 SXM5-80GB |
conformer | n-gram | 128 | 110 | 126 | H100 SXM5-80GB |
conformer | n-gram | 256 | 184 | 249 | H100 SXM5-80GB |
H100 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 9.4 | 1 | H100 SXM5-80GB |
citrinet | n-gram | 8 | 13 | 8 | H100 SXM5-80GB |
citrinet | n-gram | 16 | 17.6 | 16 | H100 SXM5-80GB |
citrinet | n-gram | 32 | 25 | 32 | H100 SXM5-80GB |
citrinet | n-gram | 48 | 32 | 48 | H100 SXM5-80GB |
citrinet | n-gram | 64 | 40 | 64 | H100 SXM5-80GB |
conformer | n-gram | 1 | 13 | 1 | H100 SXM5-80GB |
conformer | n-gram | 8 | 23 | 8 | H100 SXM5-80GB |
conformer | n-gram | 16 | 26 | 16 | H100 SXM5-80GB |
conformer | n-gram | 32 | 42 | 32 | H100 SXM5-80GB |
conformer | n-gram | 48 | 54 | 48 | H100 SXM5-80GB |
H100 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
conformer | n-gram | 32 | 1900 | H100 SXM5-80GB |
ASR Throughput (RTFX) - Number of seconds of audio processed per second | Riva version: v2.9.0 on H100, L40, T4, A40 and v2.8.0 on other hardware | ASR Dataset - LibriSpeech | Hardware: DGX H100 (1x H100 SXM5-80GB) with Platinum 8480@2.00GHz, GIGABYTE G482-Z54-00 (1x NVIDIA L40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA L4) with EPYC 7763@2.45GHz, DGX A100 (1x A100 SXM4-40GB) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A30) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A10) with EPYC 7763@2.45GHz | Best Streaming Throughput Mode = 800ms chunk, Best Streaming Latency Mode = 160ms chunk, Offline Mode = 1600ms chunk
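The RTFX definition in the note above can be illustrated with a short calculation. This is a minimal sketch, not Riva code: the function names `rtfx` and `per_stream_rtfx` are hypothetical, and the sample inputs are the H100 figures from the tables above.

```python
def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def per_stream_rtfx(total_rtfx: float, streams: int) -> float:
    """Aggregate throughput divided evenly across concurrent streams."""
    if streams <= 0:
        raise ValueError("streams must be positive")
    return total_rtfx / streams


if __name__ == "__main__":
    # Offline mode on H100: 1900 s of audio processed in 1 s of wall time.
    print(rtfx(1900.0, 1.0))
    # Best Streaming Throughput Mode on H100: 256 streams, RTFX 249.
    print(round(per_stream_rtfx(249.0, 256), 3))
```

For the 256-stream row the per-stream value is about 0.97. One plausible reading is that each stream is keeping up with audio arriving at close to real-time rate, but this assumes the streaming benchmark feeds audio in real time, which the page does not state.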
L40 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
conformer | n-gram | 1 | 12.7 | 1 | NVIDIA L40 |
conformer | n-gram | 64 | 75 | 64 | NVIDIA L40 |
conformer | n-gram | 128 | 113 | 126 | NVIDIA L40 |
conformer | n-gram | 256 | 180 | 250 | NVIDIA L40 |
L40 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 8.7 | 1 | NVIDIA L40 |
citrinet | n-gram | 8 | 15 | 8 | NVIDIA L40 |
citrinet | n-gram | 16 | 20 | 16 | NVIDIA L40 |
citrinet | n-gram | 32 | 32 | 32 | NVIDIA L40 |
citrinet | n-gram | 48 | 40 | 48 | NVIDIA L40 |
citrinet | n-gram | 64 | 53 | 64 | NVIDIA L40 |
conformer | n-gram | 1 | 11 | 1 | NVIDIA L40 |
conformer | n-gram | 8 | 21 | 8 | NVIDIA L40 |
conformer | n-gram | 16 | 28 | 16 | NVIDIA L40 |
conformer | n-gram | 32 | 40 | 32 | NVIDIA L40 |
conformer | n-gram | 48 | 53 | 48 | NVIDIA L40 |
L40 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
conformer | n-gram | 32 | 2100 | NVIDIA L40 |
L4 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
conformer | n-gram | 1 | 15 | 1 | NVIDIA L4 |
conformer | n-gram | 64 | 140 | 63 | NVIDIA L4 |
conformer | n-gram | 128 | 228 | 125 | NVIDIA L4 |
conformer | n-gram | 256 | 455 | 244 | NVIDIA L4 |
L4 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 11.25 | 1 | NVIDIA L4 |
citrinet | n-gram | 8 | 16 | 8 | NVIDIA L4 |
citrinet | n-gram | 16 | 28 | 16 | NVIDIA L4 |
citrinet | n-gram | 32 | 46 | 32 | NVIDIA L4 |
citrinet | n-gram | 48 | 60 | 48 | NVIDIA L4 |
citrinet | n-gram | 64 | 84 | 63 | NVIDIA L4 |
conformer | n-gram | 1 | 13.5 | 1 | NVIDIA L4 |
conformer | n-gram | 8 | 24 | 8 | NVIDIA L4 |
conformer | n-gram | 16 | 33 | 16 | NVIDIA L4 |
conformer | n-gram | 32 | 56 | 32 | NVIDIA L4 |
conformer | n-gram | 48 | 90 | 47 | NVIDIA L4 |
L4 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
conformer | n-gram | 32 | 920 | NVIDIA L4 |
A100 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 11 | 1 | A100 SXM4-40GB |
citrinet | n-gram | 64 | 64 | 63 | A100 SXM4-40GB |
citrinet | n-gram | 128 | 108 | 126 | A100 SXM4-40GB |
citrinet | n-gram | 256 | 172 | 248 | A100 SXM4-40GB |
citrinet | n-gram | 384 | 240 | 367 | A100 SXM4-40GB |
citrinet | n-gram | 512 | 320 | 482 | A100 SXM4-40GB |
citrinet | n-gram | 768 | 484 | 703 | A100 SXM4-40GB |
conformer | n-gram | 1 | 16 | 1 | A100 SXM4-40GB |
conformer | n-gram | 64 | 97 | 63 | A100 SXM4-40GB |
conformer | n-gram | 128 | 140 | 126 | A100 SXM4-40GB |
conformer | n-gram | 256 | 230 | 247 | A100 SXM4-40GB |
conformer | n-gram | 384 | 330 | 365 | A100 SXM4-40GB |
conformer | n-gram | 512 | 470 | 478 | A100 SXM4-40GB |
A100 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 9.91 | 1 | A100 SXM4-40GB |
citrinet | n-gram | 8 | 14.48 | 8 | A100 SXM4-40GB |
citrinet | n-gram | 16 | 23 | 16 | A100 SXM4-40GB |
citrinet | n-gram | 32 | 35 | 32 | A100 SXM4-40GB |
citrinet | n-gram | 48 | 46 | 48 | A100 SXM4-40GB |
citrinet | n-gram | 64 | 55 | 63 | A100 SXM4-40GB |
conformer | n-gram | 1 | 13.92 | 1 | A100 SXM4-40GB |
conformer | n-gram | 8 | 26.19 | 8 | A100 SXM4-40GB |
conformer | n-gram | 16 | 37 | 16 | A100 SXM4-40GB |
conformer | n-gram | 32 | 52 | 32 | A100 SXM4-40GB |
conformer | n-gram | 48 | 62 | 48 | A100 SXM4-40GB |
conformer | n-gram | 64 | 76 | 63 | A100 SXM4-40GB |
A100 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
citrinet | n-gram | 32 | 4000 | A100 SXM4-40GB |
conformer | n-gram | 32 | 1500 | A100 SXM4-40GB |
A40 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
conformer | n-gram | 1 | 15 | 1 | A40 |
conformer | n-gram | 64 | 130 | 63 | A40 |
conformer | n-gram | 128 | 203 | 126 | A40 |
conformer | n-gram | 256 | 354 | 247 | A40 |
A40 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 10 | 1 | A40 |
citrinet | n-gram | 8 | 17.44 | 8 | A40 |
citrinet | n-gram | 16 | 28 | 16 | A40 |
citrinet | n-gram | 32 | 42 | 32 | A40 |
citrinet | n-gram | 48 | 53 | 48 | A40 |
citrinet | n-gram | 64 | 69 | 63 | A40 |
conformer | n-gram | 1 | 12.8 | 1 | A40 |
conformer | n-gram | 8 | 26 | 8 | A40 |
conformer | n-gram | 16 | 37 | 16 | A40 |
conformer | n-gram | 32 | 60 | 32 | A40 |
conformer | n-gram | 48 | 92.5 | 48 | A40 |
A40 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
conformer | n-gram | 32 | 1200 | A40 |
A30 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 15 | 1 | A30 |
citrinet | n-gram | 64 | 106 | 63 | A30 |
citrinet | n-gram | 128 | 150 | 125 | A30 |
citrinet | n-gram | 256 | 274 | 245 | A30 |
citrinet | n-gram | 384 | 420 | 359 | A30 |
citrinet | n-gram | 512 | 620 | 467 | A30 |
conformer | n-gram | 1 | 21 | 1 | A30 |
conformer | n-gram | 64 | 140 | 63 | A30 |
conformer | n-gram | 128 | 210 | 125 | A30 |
conformer | n-gram | 256 | 374 | 243 | A30 |
A30 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 13.84 | 1 | A30 |
citrinet | n-gram | 8 | 23 | 8 | A30 |
citrinet | n-gram | 16 | 40.5 | 16 | A30 |
citrinet | n-gram | 32 | 60 | 32 | A30 |
citrinet | n-gram | 48 | 64 | 48 | A30 |
citrinet | n-gram | 64 | 80 | 63 | A30 |
conformer | n-gram | 1 | 18.934 | 1 | A30 |
conformer | n-gram | 8 | 40 | 8 | A30 |
conformer | n-gram | 16 | 53 | 16 | A30 |
conformer | n-gram | 32 | 66 | 32 | A30 |
conformer | n-gram | 48 | 91 | 47 | A30 |
A30 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
citrinet | n-gram | 32 | 2500 | A30 |
conformer | n-gram | 32 | 1020 | A30 |
A10 ASR Benchmarks - Best Streaming Throughput Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 12.18 | 1 | A10 |
citrinet | n-gram | 64 | 83 | 63 | A10 |
citrinet | n-gram | 128 | 150 | 126 | A10 |
citrinet | n-gram | 256 | 292 | 247 | A10 |
citrinet | n-gram | 384 | 433 | 363 | A10 |
citrinet | n-gram | 512 | 600 | 476 | A10 |
conformer | n-gram | 1 | 15.66 | 1 | A10 |
conformer | n-gram | 64 | 140 | 63 | A10 |
conformer | n-gram | 128 | 240 | 125 | A10 |
conformer | n-gram | 256 | 440 | 245 | A10 |
A10 ASR Benchmarks - Best Streaming Latency Mode
Acoustic Model | Language Model | # of Streams | Avg Latency (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
citrinet | n-gram | 1 | 11.53 | 1 | A10 |
citrinet | n-gram | 8 | 18 | 8 | A10 |
citrinet | n-gram | 16 | 30 | 16 | A10 |
citrinet | n-gram | 32 | 50 | 32 | A10 |
citrinet | n-gram | 48 | 68 | 48 | A10 |
citrinet | n-gram | 64 | 80 | 63 | A10 |
conformer | n-gram | 1 | 13.988 | 1 | A10 |
conformer | n-gram | 8 | 27 | 8 | A10 |
conformer | n-gram | 16 | 39 | 16 | A10 |
conformer | n-gram | 32 | 70 | 32 | A10 |
conformer | n-gram | 48 | 102 | 47.445 | A10 |
A10 ASR Benchmarks - Offline Mode
Acoustic Model | Language Model | # of Streams | Throughput (RTFX) | GPU Version |
---|---|---|---|---|
citrinet | n-gram | 32 | 2400 | A10 |
conformer | n-gram | 32 | 920 | A10 |
Riva Benchmarks
H100 TTS Benchmarks
Model | # of streams | Avg Latency to first audio (ms) | Avg Latency between audio chunks (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
FastPitch+Hifi-GAN | 1 | 20 | 2.67 | 150 | H100 SXM5-80GB |
FastPitch+Hifi-GAN | 4 | 30 | 3.92 | 420 | H100 SXM5-80GB |
FastPitch+Hifi-GAN | 6 | 50 | 4.6 | 450 | H100 SXM5-80GB |
FastPitch+Hifi-GAN | 8 | 60 | 5.3 | 510 | H100 SXM5-80GB |
FastPitch+Hifi-GAN | 10 | 68 | 5.6 | 530 | H100 SXM5-80GB |
TTS Throughput (RTFX) - Number of seconds of audio generated per second | Riva version: v2.8.0 | TTS Dataset - LJSpeech | Hardware: DGX H100 (1x H100 SXM5-80GB) with Platinum 8480@2.00GHz, GIGABYTE G482-Z54-00 (1x NVIDIA L40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA L4) with EPYC 7763@2.45GHz, DGX A100 (1x A100 SXM4-40GB) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A30) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A10) with EPYC 7763@2.45GHz
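Because TTS throughput is stated as seconds of audio generated per second, a reported RTFX can be turned into a rough estimate of how long a given amount of speech takes to synthesize. The sketch below is illustrative only: `synthesis_seconds` is a hypothetical helper, and it treats RTFX as a steady aggregate rate, ignoring startup latency.

```python
def synthesis_seconds(audio_seconds: float, throughput_rtfx: float) -> float:
    """Estimated wall-clock time to generate `audio_seconds` of speech."""
    if throughput_rtfx <= 0:
        raise ValueError("throughput_rtfx must be positive")
    return audio_seconds / throughput_rtfx


if __name__ == "__main__":
    # H100, 1 stream, RTFX 150: one minute of speech.
    print(synthesis_seconds(60.0, 150.0))
    # H100, 10 streams, RTFX 530: one minute of speech in aggregate.
    print(synthesis_seconds(60.0, 530.0))
```

At the single-stream H100 rate of 150 RTFX, one minute of audio works out to 0.4 seconds of generation time under these assumptions.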
L40 TTS Benchmarks
Model | # of streams | Avg Latency to first audio (ms) | Avg Latency between audio chunks (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
FastPitch+Hifi-GAN | 1 | 20 | 2.54 | 160 | NVIDIA L40 |
FastPitch+Hifi-GAN | 4 | 40 | 4 | 350 | NVIDIA L40 |
FastPitch+Hifi-GAN | 6 | 60 | 5 | 400 | NVIDIA L40 |
FastPitch+Hifi-GAN | 8 | 80 | 5.5 | 400 | NVIDIA L40 |
FastPitch+Hifi-GAN | 10 | 90 | 6 | 430 | NVIDIA L40 |
L4 TTS Benchmarks
Model | # of streams | Avg Latency to first audio (ms) | Avg Latency between audio chunks (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
FastPitch+Hifi-GAN | 1 | 24 | 3.57 | 130 | NVIDIA L4 |
FastPitch+Hifi-GAN | 4 | 50 | 7 | 250 | NVIDIA L4 |
FastPitch+Hifi-GAN | 6 | 80 | 9.4 | 255 | NVIDIA L4 |
FastPitch+Hifi-GAN | 8 | 115 | 11 | 260 | NVIDIA L4 |
FastPitch+Hifi-GAN | 10 | 133 | 12.4 | 262 | NVIDIA L4 |
A100 TTS Benchmarks
Model | # of streams | Avg Latency to first audio (ms) | Avg Latency between audio chunks (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
FastPitch+Hifi-GAN | 1 | 20 | 3.14 | 140 | A100 SXM4-40GB |
FastPitch+Hifi-GAN | 4 | 43 | 5 | 320 | A100 SXM4-40GB |
FastPitch+Hifi-GAN | 6 | 60 | 6 | 360 | A100 SXM4-40GB |
FastPitch+Hifi-GAN | 8 | 73 | 7.4 | 400 | A100 SXM4-40GB |
FastPitch+Hifi-GAN | 10 | 80 | 8 | 420 | A100 SXM4-40GB |
A30 TTS Benchmarks
Model | # of streams | Avg Latency to first audio (ms) | Avg Latency between audio chunks (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
FastPitch+Hifi-GAN | 1 | 24 | 4 | 120 | A30 |
FastPitch+Hifi-GAN | 4 | 50 | 7 | 250 | A30 |
FastPitch+Hifi-GAN | 6 | 84 | 7.6 | 270 | A30 |
FastPitch+Hifi-GAN | 8 | 103 | 8.5 | 300 | A30 |
FastPitch+Hifi-GAN | 10 | 120 | 9.03 | 310 | A30 |
A10 TTS Benchmarks
Model | # of streams | Avg Latency to first audio (ms) | Avg Latency between audio chunks (ms) | Throughput (RTFX) | GPU Version |
---|---|---|---|---|---|
FastPitch+Hifi-GAN | 1 | 20 | 3.9 | 120 | A10 |
FastPitch+Hifi-GAN | 4 | 57 | 7.5 | 230 | A10 |
FastPitch+Hifi-GAN | 6 | 94 | 8.6 | 240 | A10 |
FastPitch+Hifi-GAN | 8 | 118 | 10 | 260 | A10 |
FastPitch+Hifi-GAN | 10 | 140 | 11 | 264 | A10 |
Training to Convergence
Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. This is the best methodology to test whether AI systems are ready to be deployed in the field to deliver meaningful results.
AI Inference
Real-world inferencing demands high throughput and low latencies with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.