Testing the AMD MI300X GPU: Superior Performance Over Nvidia H100

Gamingdeputy reported on June 27 that the technology blog Chips and Cheese, in a post published on June 25, tested the AMD MI300X GPU in detail. After comparing cache, latency, inference, and other aspects, it found that the MI300X's performance far exceeds that of NVIDIA's H100.

MI300X Introduction

The AMD Instinct MI300X artificial intelligence accelerator is built on the CDNA 3 architecture, using a hybrid 5nm and 6nm process technology, and has up to 153 billion transistors.

Memory capacity has also increased substantially: the MI300X's HBM3 capacity is 50% higher than that of the previous-generation MI250X (128 GB), which works out to 192 GB.

Compared with NVIDIA H100, MI300X has the following advantages:

  • 2.4 times more memory capacity

  • 1.6 times higher memory bandwidth

  • 1.3 times higher FP8 performance (TFLOPS)

  • 1.3 times higher FP16 performance (TFLOPS)

  • Up to 20% higher performance than the H100 in 1v1 comparison tests (Llama 2 70B)

  • Up to 20% higher performance than the H100 in 1v1 comparison tests (FlashAttention 2)

  • Up to 40% higher performance than the H100 in 8v8 server comparison tests (Llama 2 70B)

  • Up to 60% higher performance than the H100 in 8v8 server comparison tests (Bloom 176B)
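The memory capacity ratio in the list above can be checked against the capacities quoted elsewhere in this article. The following is a minimal sketch, assuming the 128 GB MI250X figure, the "50% higher" statement, and the 80 GB H100 card mentioned in the text; the variable names are made up for illustration.

```python
# Capacities in GB. The MI250X and H100 figures are quoted in the article;
# the MI300X figure is derived from the "50% higher" statement.
MI250X_HBM_GB = 128
H100_HBM_GB = 80

mi300x_hbm_gb = MI250X_HBM_GB * 1.5            # 50% more than the MI250X
capacity_ratio = mi300x_hbm_gb / H100_HBM_GB   # relative to the 80 GB H100

print(f"MI300X HBM3 capacity: {mi300x_hbm_gb:.0f} GB")     # 192 GB
print(f"MI300X vs H100 capacity: {capacity_ratio:.1f}x")   # 2.4x
```

The result matches the 2.4 times capacity advantage in the list.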

Cache

The AMD MI300X is equipped with 32KB L1 cache, 16KB scalar cache, 4MB L2 cache, and a huge 256MB Infinity Cache (as L3 cache).

The cache benchmarks from Chips and Cheese show that the cache bandwidth of the MI300X is significantly better than that of Nvidia's H100 at all relevant cache levels.

The MI300X's L1 cache bandwidth is 1.6 times that of the H100, its L2 cache bandwidth is 3.49 times that of the H100, and the bandwidth of its last-level cache (i.e., the Infinity Cache) is 3.12 times that of the H100.

Memory Bandwidth

The AMD GPU has 2.72 times as much local HBM3 memory as the H100 PCIe, and its VRAM bandwidth is 2.66 times that of the H100 PCIe.

The only memory test in which the AMD GPU loses out is memory latency, where the H100 is 57% faster than the AMD GPU.
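Memory latency is commonly measured with a pointer-chasing loop, in which each load depends on the result of the previous one so that accesses cannot overlap. Whether the Chips and Cheese test works exactly this way is an assumption here. The sketch below only illustrates the idea on the CPU in Python, where interpreter overhead dominates the timings; the function names are hypothetical.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a random single-cycle permutation (Sattolo's algorithm)."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps):
    """Follow the chain; every lookup depends on the previous result."""
    idx = 0
    for _ in range(steps):
        idx = chain[idx]
    return idx


def ns_per_access(n, steps=200_000):
    """Average time per dependent access, in nanoseconds."""
    chain = build_chain(n)
    start = time.perf_counter()
    chase(chain, steps)
    return (time.perf_counter() - start) / steps * 1e9


if __name__ == "__main__":
    # Larger working sets spill out of successive cache levels.
    for n in (1 << 10, 1 << 16, 1 << 20):
        print(f"{n:>8} elements: {ns_per_access(n):.1f} ns per access")
```

A real GPU latency test would run the same dependent-load loop in a single-threaded kernel, sweeping the array size to expose each cache level and finally VRAM.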

Note that the H100 tested is the lowest-spec PCIe card with 80 GB of HBM2E. Later products such as the H200 include up to 141 GB of HBM3E with bandwidth of up to 4.8 TB/s.

Raw Throughput

Chips and Cheese's instruction throughput results cover INT32, FP32, FP16, and INT8 computations.

The instruction throughput of the AMD chip is ridiculously high. At times, the MI300X is 5 times faster than the H100, and at worst, it's 40% faster.

AI Inference Test

Gamingdeputy has attached the original URL; interested readers can read the full post in depth.
