
Can I Run GLM-5 on an NVIDIA RTX 4090?

No

Won't fit: even the smallest quant (Q4_K_M) needs 140.4GB of VRAM, nearly six times the 24.0GB available.

Model size: 230B
GPU memory: 24.0GB
Smallest quant: Q4_K_M
Best fit: none

None of GLM-5's quantizations fit

Even the most aggressive quantization needs more memory than the NVIDIA RTX 4090 provides. That leaves two options: rent a larger GPU in the cloud, or upgrade your hardware.
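The fit check behind this verdict is simple arithmetic, and a minimal sketch of it follows. The function names are illustrative and not taken from this site's tooling. The device count is a lower bound that ignores the per-device overhead of splitting a model across cards.

```python
import math


def fits(required_gb: float, vram_gb: float) -> bool:
    """True when the quantized model fits in the available memory."""
    return required_gb <= vram_gb


def min_devices(required_gb: float, device_gb: float) -> int:
    """Lower bound on identical devices needed, ignoring split overhead."""
    return math.ceil(required_gb / device_gb)


def implied_bits_per_weight(required_gb: float, params_billion: float) -> float:
    """Average bits per parameter if the whole footprint were weights.

    GB (1e9 bytes) and billions of parameters (1e9) cancel out.
    """
    return required_gb * 8 / params_billion


if __name__ == "__main__":
    required_gb, vram_gb, params_b = 140.4, 24.0, 230.0  # figures from this page
    print("fits:", fits(required_gb, vram_gb))
    print("shortfall (GB):", round(required_gb - vram_gb, 1))
    print("min 24GB cards:", min_devices(required_gb, vram_gb))
    print("bits/weight:", round(implied_bits_per_weight(required_gb, params_b), 2))
```

With the figures on this page, the shortfall is over 116GB, and the stated footprint works out to a little under 5 bits per parameter on average.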

Or upgrade your hardware

Hardware that would let you run this model locally:

Apple Mac Studio M3 Ultra (192GB) ~$7,499

Unified memory means ~190GB of usable model RAM in a single quiet box, enough for GLM-5's 140.4GB Q4_K_M quant.

NVIDIA H100 80GB ~$30,000

Datacenter-grade. A single 80GB card is still short of the 140.4GB GLM-5 needs, so you would need at least two. Most users should rent rather than buy; see cloud options.

Full model details
GLM-5

All quant variants, benchmark scores, and use-case tags.

Best models for this GPU
NVIDIA RTX 4090

Top-ranked open-source models that fit in 24.0GB.

FAQ

Can the NVIDIA RTX 4090 run GLM-5?

No. GLM-5 (230B) needs at least 140.4GB even at its smallest quantization, more than the 24.0GB on the NVIDIA RTX 4090.

What's the best quantization to use?

None of GLM-5's available quantizations fit in 24.0GB. You'll need either a larger GPU, a smaller model, or to run it in the cloud.

What if I need more headroom for context length?

KV cache memory grows with context length. The numbers above assume a baseline 2K-4K context. For long-context use (32K+), add another 2-6GB depending on the model architecture.
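To see where a figure like that comes from, here is a rough sketch of the usual KV cache estimate for a standard attention stack: one key tensor and one value tensor per layer, each sized by KV heads, head dimension, and context length. The layer count, head count, and head dimension below are hypothetical placeholders, not GLM-5's actual configuration, and architectures that compress or window the cache will come out smaller.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_elem: int = 2,  # 2 bytes assumes an fp16/bf16 cache
) -> int:
    """Approximate KV cache size for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical config for illustration only (NOT GLM-5's real values).
    layers, kv_heads, dim = 32, 8, 128
    for ctx in (4096, 32768):
        gib = kv_cache_bytes(layers, kv_heads, dim, ctx) / 1024**3
        print(f"{ctx} tokens: {gib:.2f} GiB")
```

Because the estimate is linear in context length, going from 4K to 32K multiplies the cache by eight. For this placeholder config that is a move from 0.5 GiB to 4 GiB, which lands inside the 2-6GB range quoted above.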