Google Dethrones NVIDIA With Split Results In Latest Artificial Intelligence Benchmarking Tests


A portion of one of Google’s Cloud TPU v4 Pods (image: Google)

Digital transformation is responsible for artificial intelligence workloads being created at an unprecedented scale. These workloads require corporations to collect and store mountains of data. Even as business intelligence is being extracted from current machine learning models, new data inflows are being used to create new models and update existing models.

Building AI models is complex and expensive. It is also very much different than traditional software development. Artificial intelligence models need specialized hardware for accelerated compute and high-performance storage as well as a purpose-built infrastructure to handle AI’s technical nuances.

In today’s world, many critical business decisions and customer-facing services rely on accurate machine learning insights. To train, run, and scale models as quickly and accurately as possible, an enterprise must have the knowledge to choose the best hardware and software for its machine learning applications.

Benchmarking


MLCommons is an open engineering consortium that has made it easier for companies to make machine learning decisions through standardized benchmarking. Its mission is to make machine learning better for everyone. Its tests and unbiased comparisons help companies determine which vendor best suits their artificial intelligence application requirements. The foundation for MLCommons was laid when it began its first MLPerf benchmarking in 2018.

MLCommons recently conducted a benchmarking program called MLPerf Training v2.0 to measure the performance of hardware and software used to train machine learning models. There were 250 performance results reported from 21 different submitters, including Azure, Baidu, Dell, Fujitsu, GIGABYTE, Google, Graphcore, HPE, Inspur, Intel-Habana Labs, Lenovo, Nettrix, NVIDIA, Samsung, and Supermicro.

This round of testing focused on determining how long it takes to train various neural networks. Faster model training leads to speedier model deployment, improving the model’s total cost of ownership (TCO) and return on investment (ROI).

A new object detection benchmark was added to MLPerf Training 2.0, which trains the new RetinaNet reference model on a larger and more diverse dataset called Open Images. This new test reflects state-of-the-art ML training for applications like collision avoidance for vehicles and robotics, retail analytics, and many others.
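For a rough sense of what the RetinaNet benchmark is measuring, here is a minimal, purely illustrative sketch that times a few training steps of torchvision’s RetinaNet reference model on synthetic data. It assumes a recent torchvision (0.13 or later) and is not the MLPerf reference harness, which trains to a target accuracy on the full Open Images dataset rather than timing a handful of steps.

```python
# Illustrative sketch only -- NOT the MLPerf harness.
# Times a few training steps of torchvision's RetinaNet on synthetic data
# to show the kind of "time to train" measurement MLPerf formalizes.
import time
import torch
import torchvision

# Untrained RetinaNet with a small, hypothetical 3-class label space.
model = torchvision.models.detection.retinanet_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=3
)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Two synthetic RGB images, each with one dummy bounding box and label.
images = [torch.rand(3, 224, 224) for _ in range(2)]
targets = [
    {"boxes": torch.tensor([[10.0, 10.0, 100.0, 100.0]]),
     "labels": torch.tensor([1])}
    for _ in range(2)
]

start = time.perf_counter()
for step in range(2):
    losses = model(images, targets)   # dict of classification / box-regression losses
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
elapsed = time.perf_counter() - start
print(f"2 training steps took {elapsed:.1f} s")
```

In the actual MLPerf submissions, the clock runs from the start of training until the model reaches a defined quality target, so both raw hardware speed and software efficiency (data pipelines, compilers, scaling across accelerators) shape the reported time.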

Results

Machine learning has seen much innovation since 2021, both in hardware and software. For the first time since MLPerf began, Google’s cloud-based TPU v4 ML supercomputer outperformed NVIDIA’s A100 in four out of eight training tests covering language (2), computer vision (4), reinforcement learning (1), and recommender systems (1).

Higher is better. TPUs demonstrated significant speedup in all five published benchmarks over the fastest non-Google submission (NVIDIA on-premises). The numbers inside the bars represent the quantity of chips/accelerators used for each submission. (Chart: Google)

According to the graphic comparing the performance of Google and NVIDIA, Google had the quickest training times for BERT (language), ResNet (image recognition), RetinaNet (object detection), and Mask R-CNN (instance segmentation). As for DLRM (recommendation), Google came in narrowly ahead of NVIDIA, but that result was a research submission and not available for public use.

Overall, Google submitted scores for five of the eight benchmarks; its best training times are shown below:

Data: MLCommons


Source: https://www.forbes.com/sites/moorinsights/2022/07/12/google-dethrones-nvidia-in-latest-artificial-intelligence-benchmarking-tests/
