Best AI models for the NVIDIA L40

The NVIDIA L40 has 48.0GB of VRAM. Below are the top 30 open-source AI models that fit, ranked by composite benchmark score. Each entry shows the best quantization that fits within that VRAM.
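The "best quantization that fits" selection can be sketched roughly as follows. This is a minimal illustration and not the site's actual method: the precision ordering, the `best_fit` helper, and the candidate sizes are assumptions. Only the 48.0GB VRAM figure and the 29.7GB Q8_0 size come from this page.

```python
from typing import Optional

VRAM_GB = 48.0  # NVIDIA L40, as stated on this page

# Assumed ordering from highest to lowest precision (illustrative only).
PRECISION_ORDER = ["f32", "fp16", "Q8_0", "Q4_K_M", "Q4_K_S"]


def best_fit(sizes_gb: dict[str, float], vram_gb: float = VRAM_GB) -> Optional[str]:
    """Return the highest-precision quantization whose size fits in vram_gb."""
    for quant in PRECISION_ORDER:
        size = sizes_gb.get(quant)
        if size is not None and size <= vram_gb:
            return quant
    return None  # nothing fits


# Hypothetical candidate sizes for one model. Only the Q8_0 value (29.7GB)
# appears in the listing; the fp16 and Q4_K_M values are made-up placeholders.
candidates = {"fp16": 55.0, "Q8_0": 29.7, "Q4_K_M": 17.0}
print(best_fit(candidates))  # Q8_0
```

A real check would also need headroom for the KV cache and runtime overhead, which this sketch ignores.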

VRAM: 48.0GB

Brand: NVIDIA

Models that fit: 30

Generation: Data Center

Top 30 models for the NVIDIA L40

DeepSeek V4 Flash
Best fit: Q8_0 · 40.3GB

Qwen 3.6 27B
Best fit: Q8_0 · 29.7GB

Qwen3.5-27B
Best fit: Q8_0 · 30.5GB

Qwen 3.6 35B A3B
Best fit: Q8_0 · 38.2GB

Gemma 4 31B
Best fit: Q8_0 · 33.9GB

Qwen 3.5 35B A3B
Best fit: Q8_0 · 38.2GB

Gemma 4 26B A4B
Best fit: Q8_0 · 29.2GB

Devstral 2
Best fit: Q8_0 · 26.5GB

gemma 4 12B it
Best fit: fp16 · 25.0GB

Qwen 3.5 9B
Best fit: fp16 · 19.0GB

Qwen 3.5 4B
Best fit: fp16 · 9.0GB

Devstral Small 2
Best fit: fp16 · 15.0GB

Seed OSS 36B Instruct
Best fit: Q8_0 · 39.5GB

Devstral Small 2 24B Instruct 2512
Best fit: Q8_0 · 26.5GB

Qwen3 Next 80B A3B Thinking
Best fit: Q4_K_S · 47.5GB

Ministral 3 14B
Best fit: fp16 · 29.0GB

Ministral 3 8B
Best fit: fp16 · 17.0GB

Gemma 4 E2B
Best fit: f32 · 9.0GB

Nemotron 3 Nano Omni (free)
Best fit: Q8_0 · 36.1GB

GPT-OSS 20B
Best fit: fp16 · 44.0GB

Qwen3 30B A3B Thinking 2507
Best fit: Q8_0 · 33.4GB

Nemotron 3 Nano 30B A3B
Best fit: Q8_0 · 34.6GB

Qwen3 Coder 30B A3B Instruct
Best fit: Q8_0 · 33.4GB

diffusiongemma 26B A4B it
Best fit: Q8_0 · 28.4GB

QwQ 32B
Best fit: Q8_0 · 35.9GB

Qwen3 Next 80B A3B Instruct
Best fit: Q4_K_S · 47.5GB

Qwen3 VL 30B A3B Thinking
Best fit: Q8_0 · 34.0GB

Mistral Medium 3.5
Best fit: Q4_K_M · 43.4GB

Qwen3 4B Thinking 2507
Best fit: fp16 · 9.0GB

gemma 4 E4B it
Best fit: f32 · 33.0GB

FAQ — running AI on the NVIDIA L40

How many AI models can the NVIDIA L40 run?

With 48.0GB of VRAM, the NVIDIA L40 can run at least 30 open-source models from our database, including DeepSeek V4 Flash, Qwen 3.6 27B, and Qwen3.5-27B.

What's the largest LLM I can run on an NVIDIA L40?

The biggest model that fits has approximately 81.3B parameters. Larger models would need to be quantized further or won't fit at all.
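As a rough back-of-the-envelope check, file size and parameter count together imply an average number of bits per parameter. The sketch below assumes that the 81.3B figure corresponds to the 47.5GB Q4_K_S entries in the list, that 1GB means 10^9 bytes, and that the listed size is dominated by the weights. None of these is confirmed by the page, and both helper functions are hypothetical.

```python
def implied_bits_per_param(size_gb: float, params_billion: float) -> float:
    """Average bits per parameter implied by a model size and parameter count."""
    # (size_gb * 1e9 bytes * 8 bits) / (params_billion * 1e9 params)
    return size_gb * 8 / params_billion


def max_params_billion(vram_gb: float, bits_per_param: float) -> float:
    """Largest parameter count (in billions) whose weights fit in vram_gb."""
    return vram_gb * 8 / bits_per_param


# Assumed pairing: 47.5GB (Q4_K_S entry) with the 81.3B figure from the FAQ.
bpp = implied_bits_per_param(47.5, 81.3)
print(round(bpp, 2))  # 4.67

# At 8 bits per parameter, weights alone would cap out at 48B on a 48.0GB card.
print(max_params_billion(48.0, 8.0))  # 48.0
```

Under these assumptions, a model at the size limit averages a little under 5 bits per parameter, which is why anything much larger needs a lower-bit quantization to fit.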

Is 48.0GB of VRAM enough for local AI?

Yes. 48.0GB comfortably runs most popular open-source models, including 30B-class LLMs at Q4_K_M and, per the list above, several at Q8_0.