Question 1

Can the AMD Ryzen AI Max+ 395 (64GB) run Llama 3.3 Nemotron Super 49B V1.5?

Accepted Answer

Yes. The AMD Ryzen AI Max+ 395 (64GB) has 48.0GB of usable unified memory, which is enough to run Llama 3.3 Nemotron Super 49B V1.5 at Q6_K quantization (42.1GB required).

Question 2

What's the best quantization to use?

Accepted Answer

Q6_K is the highest-precision quantization that fits in your 48.0GB. It needs at least 42.1GB of memory, with 43.6GB recommended for comfortable inference.

Question 3

What if I need more headroom for context length?

Accepted Answer

KV cache memory grows with context length. The numbers above assume a baseline 2K-4K context. For long-context use (32K+), add another 2-6GB depending on the model architecture.
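To see why long contexts cost extra memory, here is a rough sketch of the usual KV cache estimate for a standard transformer decoder: two tensors (keys and values) per attention layer, per token. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real values for Llama 3.3 Nemotron Super 49B V1.5; take the actual numbers from the model's config before relying on the result.

```python
def kv_cache_gb(n_tokens, n_attn_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size in GB (10**9 bytes) for a given context length.

    Assumes plain attention with an unquantized 16-bit cache (2 bytes per value).
    """
    # Keys and values: 2 tensors per attention layer, per token.
    bytes_per_token = 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * n_tokens / 1e9


# HYPOTHETICAL architecture values, for illustration only.
EXAMPLE_ARCH = dict(n_attn_layers=48, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for ctx in (4_096, 32_768):
        print(f"{ctx:>6} tokens: {kv_cache_gb(ctx, **EXAMPLE_ARCH):.2f} GB")
```

The estimate scales linearly with context length, so doubling the context doubles the cache. Models that skip attention in some layers or quantize the cache will come in lower than this.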

Quant	Min VRAM	Recommended	File size	Headroom
Q6_K (best)	42.1 GB	43.6 GB	40.9 GB	+5.9 GB
Q5_1	38.4 GB	39.9 GB	37.4 GB	+9.6 GB
Q5_K_M	36.4 GB	37.9 GB	35.4 GB	+11.6 GB
Q5_K_S	35.4 GB	36.9 GB	34.4 GB	+12.6 GB
Q5_0	35.3 GB	36.8 GB	34.3 GB	+12.7 GB
Q4_1	32.2 GB	33.7 GB	31.4 GB	+15.8 GB
Q4_K_M	31.3 GB	32.8 GB	30.2 GB	+16.8 GB
Q4_K_S	29.6 GB	31.1 GB	28.6 GB	+18.4 GB
Q4_0	29.1 GB	30.6 GB	28.5 GB	+18.9 GB
Q3_K_L	23.2 GB	24.7 GB	26.3 GB	+24.8 GB
Q3_K_M	21.9 GB	23.4 GB	24.3 GB	+26.1 GB
Q3_K_S	20.2 GB	21.7 GB	22.0 GB	+27.8 GB
Q2_K	17.4 GB	18.9 GB	18.7 GB	+30.6 GB
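The table can also be read programmatically. The sketch below copies the Min VRAM and Recommended columns and picks the first quantization, in table order, that still fits after reserving extra memory for context. The function name `best_fit` and the `extra_context_gb` parameter are illustrative choices, not part of any tool.

```python
AVAILABLE_GB = 48.0  # usable memory figure quoted on this page

# (name, min GB, recommended GB), copied from the table, largest first.
QUANTS = [
    ("Q6_K", 42.1, 43.6),
    ("Q5_1", 38.4, 39.9),
    ("Q5_K_M", 36.4, 37.9),
    ("Q5_K_S", 35.4, 36.9),
    ("Q5_0", 35.3, 36.8),
    ("Q4_1", 32.2, 33.7),
    ("Q4_K_M", 31.3, 32.8),
    ("Q4_K_S", 29.6, 31.1),
    ("Q4_0", 29.1, 30.6),
    ("Q3_K_L", 23.2, 24.7),
    ("Q3_K_M", 21.9, 23.4),
    ("Q3_K_S", 20.2, 21.7),
    ("Q2_K", 17.4, 18.9),
]


def best_fit(available_gb, extra_context_gb=0.0, use_recommended=True):
    """Return (quant name, remaining headroom in GB), or None if nothing fits."""
    for name, min_gb, rec_gb in QUANTS:
        need = (rec_gb if use_recommended else min_gb) + extra_context_gb
        if need <= available_gb:
            return name, round(available_gb - need, 1)
    return None


if __name__ == "__main__":
    print(best_fit(AVAILABLE_GB))                        # baseline context
    print(best_fit(AVAILABLE_GB, extra_context_gb=6.0))  # long-context budget
```

With the baseline context, Q6_K is selected. Reserving the upper end of the 2-6GB long-context allowance pushes the choice down to Q5_1, because Q6_K's recommended 43.6GB plus 6GB exceeds 48.0GB.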

Can I Run Llama 3.3 Nemotron Super 49B V1.5 on an AMD Ryzen AI Max+ 395 (64GB)?

13 quantizations fit your 48.0GB

