In a recent benchmark showdown, AMD’s top-tier RX 7900 XTX was pitted against Nvidia’s RTX 4090 and RTX 4080 Super using the DeepSeek AI model. AMD’s David McAfee reported on X that the RDNA 3-based RX 7900 XTX outshone the RTX 4090 by as much as 13% and exceeded the performance of the RTX 4080 Super by up to 34%.
Detailed tests compared the three GPUs across multiple DeepSeek R1 distilled large language models (LLMs) and configurations. The RX 7900 XTX achieved its highest lead over the RTX 4090 with the DeepSeek R1 Distill Qwen 7B model, surpassing the Ada Lovelace-based card by 13%. In three further LLM setups against the RTX 4090, the RX 7900 XTX was faster in two out of three scenarios—11% quicker with Distill Llama 8B and 2% faster with Distill Qwen 14B. However, with Distill Qwen 32B, the RTX 4090 edged out the RX 7900 XTX by 4%.
Against the RTX 4080 Super, AMD tested three configurations: the RX 7900 XTX outpaced its competitor by 34% with DeepSeek R1 Distill Qwen 7B, with the advantage narrowing to 27% with Distill Llama 8B and 22% with Distill Qwen 14B.
However, these results should be approached with caution, as it’s unclear how Nvidia’s GPUs were configured for these AMD-conducted tests. Not every AI task fully utilizes a GPU’s computational capacity—as evidenced in our own Stable Diffusion evaluations, which didn’t leverage FP8 computations or TensorRT code paths.
While the RX 7900 XTX is not typically known for AI tasks, its underlying RDNA 3 architecture supports the matrix operations essential to AI workloads, including both BF16 and INT8 data types. AMD recently adopted the term “AI Accelerator” for RDNA 3 to highlight its capacity for handling AI-driven tasks; the RX 7900 XTX is equipped with 192 of these AI accelerators.
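To illustrate why INT8 support matters for inference, here is a minimal, hardware-agnostic sketch (plain NumPy, not AMD-specific code) of symmetric per-tensor INT8 quantization—the kind of precision reduction that lets lower-precision data types cut memory traffic at a small accuracy cost:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w is approximated as scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # toy "weight" tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes, "bytes vs", w.nbytes, "bytes")  # 16 bytes vs 64 bytes (4x smaller)
print("max abs error:", np.abs(w - w_hat).max())
```

The 4x storage reduction versus FP32 (and 2x versus BF16) is the practical payoff: more of the model fits in VRAM and more weights move per memory transaction.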
AMD has also released a guide to help users run DeepSeek R1 on AMD-compatible consumer hardware, including the RX 7900 XTX. DeepSeek R1, a new AI model, delivers performance on par with leading Western AI models while requiring significantly less computational power. It reportedly benefits from hardware-level optimizations—aided by low-level programming approaches such as Nvidia’s PTX—that allow it to operate up to 11 times faster than its competitors.
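AMD’s guide walks through its own recommended tooling; as one illustrative alternative (not the method from AMD’s guide), the distilled DeepSeek R1 models can also be run locally with the open-source `ollama` runtime, which supports Radeon GPUs via ROCm. The commands below assume ollama is installed and use its published `deepseek-r1` distill tags:

```shell
# Pull the 7B distill (DeepSeek R1 Distill Qwen 7B) -- a multi-gigabyte download
ollama pull deepseek-r1:7b

# Run a one-off prompt against the local model
ollama run deepseek-r1:7b "Summarize the difference between BF16 and INT8."

# List loaded models and check how inference is being served
ollama ps
```

Larger distills need proportionally more VRAM; the RX 7900 XTX’s 24 GB comfortably holds quantized 14B builds, and roughly accommodates 4-bit 32B builds.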