AMD Instinct MI355X Achieves MLPerf Inference v6.0 Gains with Over 1 Million Tokens per Second and Supports Scalable ROCm



April 15, 2026
AMD has announced its MLPerf Inference v6.0 benchmark results, positioning the Instinct MI355X GPU as a highly scalable inference platform capable of supporting single-node, multinode, and heterogeneous deployments. Beyond incremental performance gains, the submission introduces new workloads, demonstrates cluster-scale throughput exceeding 1 million tokens per second, and validates consistent performance reproducibility across an expanding partner ecosystem.

CDNA 4 Architecture Targets High-Capacity Inference


The Instinct MI355X is built on AMD’s CDNA 4 architecture, leveraging a TSMC dual-process chiplet design: compute dies (XCDs) use a 3nm node, while I/O dies use 6nm FinFET technology. The multi-chiplet package integrates 185 billion transistors and supports FP4 and FP6 data formats, which are critical for efficient large-model inference. Each GPU is equipped with up to 288GB of HBM3E memory (delivering 8 TB/s of memory bandwidth), enabling support for models of up to 520 billion parameters on a single device. AMD emphasizes that this combination of compute density and memory capacity eliminates the need for excessive model partitioning, a key advantage for large-scale inference workloads.
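As a back-of-envelope check of the capacity claim (my own arithmetic, not AMD's published math): at FP4 each parameter occupies half a byte, so 520 billion parameters fit within 288GB with room to spare:

```python
# Rough weight-memory estimate for a 520B-parameter model at FP4 precision.
# Assumption (mine, not AMD's): weights dominate; 4 bits = 0.5 bytes/parameter.
params = 520e9                      # 520 billion parameters
bytes_per_param_fp4 = 0.5           # 4-bit weights
weight_gb = params * bytes_per_param_fp4 / 1e9
hbm_gb = 288                        # MI355X HBM3E capacity
print(f"weights: {weight_gb:.0f} GB, headroom: {hbm_gb - weight_gb:.0f} GB")
# → weights: 260 GB, headroom: 28 GB
```

The remaining headroom would hold the KV cache and activations, which is why single-device inference becomes feasible at this model scale.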

Available in UBB8 configurations, the platform offers both air-cooled and direct liquid-cooled options, aligning with diverse data center deployment requirements. Notably, the MI355X features a 1400W TBP (Total Board Power) with liquid cooling, delivering higher performance than its air-cooled counterpart, the MI350X.

Multinode Throughput Surpasses 1 Million Tokens per Second


A standout achievement from the MLPerf v6.0 round is AMD’s cluster-scale throughput exceeding 1 million tokens per second. Using Instinct MI355X GPUs, AMD hit this milestone with Llama 2 70B in both Server and Offline scenarios, as well as with GPT-OSS-120B in Offline mode.


AMD MLPerf 1M tokens per second graphic

These results reflect a growing industry shift toward evaluating inference performance at the cluster level, rather than per individual accelerator. Aggregate throughput and time-to-serve have become primary metrics for determining production readiness in large-scale AI deployments.

AMD also demonstrated exceptional scaling efficiency. For Llama 2 70B, an 11-node, 87-GPU configuration achieved over 1 million tokens per second across Offline, Server, and Interactive scenarios, with scale-out efficiency ranging from 93% to 98%. For GPT-OSS-120B, a 12-node, 94-GPU cluster delivered similar throughput with over 90% scaling efficiency—proving performance translates effectively as deployments expand beyond a single system.
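The scale-out efficiency figures follow the usual definition: measured cluster throughput divided by the ideal of N times the single-node throughput. A minimal sketch, where the per-node figure is an assumed illustration rather than a number from the submission:

```python
# Scale-out efficiency = measured cluster throughput / (nodes * per-node throughput).
def scale_out_efficiency(cluster_tps, nodes, single_node_tps):
    ideal_tps = nodes * single_node_tps
    return cluster_tps / ideal_tps

# Illustrative: 11 nodes at an assumed 97,000 tok/s each vs 1.0M tok/s measured.
eff = scale_out_efficiency(1_000_000, 11, 97_000)
print(f"{eff:.1%}")                 # → 93.7%
```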

Generational Gains and Competitive Single-Node Performance


AMD reported significant generational improvements, with the Instinct MI355X delivering 3.1x better performance on Llama 2 70B Server compared to the prior-generation Instinct MI325X, reaching 100,282 tokens per second. This improvement stems from both CDNA 4 architectural enhancements and ROCm software optimizations. Offline scores improved by 4.4x and Server scores by 4.8x compared to prior MLPerf rounds, primarily driven by FP4 quantization—a key feature of the MI355X that unlocks higher throughput for AI workloads.
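To illustrate why 4-bit quantization is such a throughput lever, here is a minimal sketch of symmetric 4-bit quantization. This is a generic integer scheme for illustration only; AMD's FP4 is a floating-point microformat (e.g. E2M1), not the integer grid shown here:

```python
# Generic symmetric 4-bit quantization sketch (illustrative; not AMD's FP4).
# Halving bits per weight vs 8-bit halves memory traffic per token.
def quantize4(xs):
    scale = max(abs(x) for x in xs) / 7        # int4 range -8..7; use ±7
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def dequantize4(q, scale):
    return [v * scale for v in q]

xs = [0.1, -0.7, 0.35, 0.02]
q, s = quantize4(xs)
approx = dequantize4(q, s)          # each value within scale/2 of the original
```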

AMD Inference results vs previous gen graphic

In single-node comparisons against NVIDIA platforms, the MI355X demonstrated strong competitiveness. On Llama 2 70B, it matched NVIDIA B200 in Offline throughput, achieved near parity in Server performance, and outperformed it in Interactive mode. Against NVIDIA B300, the MI355X delivered 92% of Offline performance, 93% of Server performance, and exceeded it by 4% in Interactive mode. Notably, the MI355X also offers superior cost-efficiency, delivering 40% more tokens per dollar compared to the NVIDIA B200.
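The tokens-per-dollar metric is simply throughput divided by system cost. The prices below are invented placeholders to show the metric's shape; they are not real pricing and do not reproduce AMD's 40% figure from actual data:

```python
# Tokens-per-dollar metric; both prices here are invented for illustration only.
def tokens_per_dollar(tokens_per_sec, system_price_usd):
    return tokens_per_sec / system_price_usd

mi355x = tokens_per_dollar(100_282, 250_000)   # hypothetical system price
b200 = tokens_per_dollar(100_000, 350_000)     # hypothetical system price
advantage = mi355x / b200 - 1
print(f"advantage: {advantage:.0%}")           # → advantage: 40%
```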

First-Time Model Enablement Expands Coverage


MLPerf Inference v6.0 introduced several new workloads, and AMD used this round to showcase rapid model enablement. GPT-OSS-120B, a mixture-of-experts model, made its MLPerf debut with the MI355X, achieving competitive results against NVIDIA systems in both Offline and Server scenarios.

AMD also submitted results for Wan-2.2 text-to-video generation, marking its entry into multimodal and generative video inference. While the official submission focused on Single Stream latency, the results were on par with existing platforms. Post-submission tuning further improved performance, highlighting room for optimization as the software stack matures.

These additions underscore AMD’s commitment to expanding beyond traditional LLM benchmarks to support emerging AI workloads across diverse use cases.

ROCm Software Enables Scaling and Heterogeneous Inference


AMD credits much of the MI355X’s performance and scalability to its ROCm software stack. Key enhancements include optimized FP4 execution, improved GPU-to-GPU communication for distributed inference, and support for dynamic workload distribution across heterogeneous environments—critical for mixed-GPU deployments.
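One way dynamic workload distribution across mixed GPU generations can be sketched (my own illustration; the results do not describe ROCm's scheduler at this level) is to route requests proportionally to each pool's measured throughput:

```python
# Proportional request routing across heterogeneous GPU pools (illustrative).
# Each pool receives a share of requests matching its fraction of total capacity.
def split_requests(total_requests, pool_throughput):
    capacity = sum(pool_throughput.values())
    return {pool: round(total_requests * tps / capacity)
            for pool, tps in pool_throughput.items()}

# Per-pool tok/s figures below are invented, not benchmark data.
mix = {"MI300X": 30_000, "MI325X": 40_000, "MI355X": 90_000}
print(split_requests(16_000, mix))
# → {'MI300X': 3000, 'MI325X': 4000, 'MI355X': 9000}
```

Faster pools absorb proportionally more traffic, which is the basic property a mixed MI300X/MI325X/MI355X deployment needs to avoid stranding capacity on its slowest generation.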

AMD MLPerf inference results Instinct MI355X graphic

A milestone heterogeneous submission—developed by Dell and MangoBoost—used three AMD Instinct GPU models: MI300X, MI325X, and MI355X. This configuration achieved 141,521 tokens per second on Llama 2 70B Server and 151,843 tokens per second on Llama 2 70B Offline. Notably, the MI355X platform was located in Dell’s U.S. lab, while the MI300X and MI325X systems were in Korea—demonstrating the ability to coordinate distributed systems across geographic locations.

Ecosystem Growth and Reproducibility


AMD’s partner ecosystem expanded significantly in this MLPerf round, with nine companies submitting results across multiple Instinct GPU generations. Participating vendors include Cisco, Dell, Giga Computing, HPE, MangoBoost, MiTAC, Oracle, Supermicro, and Red Hat—reflecting broad industry adoption of AMD’s inference solutions.

Partner submissions closely aligned with AMD’s internal results, typically within 4% and in some cases within 1%. This consistency confirms that MI355X performance is reproducible across OEM and cloud platforms, reducing deployment risk and boosting confidence in real-world performance outcomes.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com/www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!