Sustainable Metal Cloud achieves record AI efficiency with new tech
Sustainable Metal Cloud (SMC) has announced a significant breakthrough in AI training efficiency through the use of Firmus Technologies' single-phase immersion cooling technology. The AI GPU cloud platform, which leverages this innovative cooling method, has demonstrated remarkable energy savings and performance improvements in the latest MLPerf training benchmarks.
SMC's recent submission to MLPerf Training, the industry benchmark suite for machine learning training performance, trained the GPT-3 175B workload on 512 NVIDIA H100 Tensor Core GPUs. The submission showcased SMC's capability to achieve world-class performance while significantly cutting energy consumption. Compared with traditional air-cooled data centres, SMC's platform offers up to 50% energy savings, reducing both the cost and the carbon footprint of AI development.
David Kanter, Executive Director of MLCommons, commended SMC's efforts, stating, "It's fantastic to see Sustainable Metal Cloud (SMC), one of our newest members, submit to MLPerf Training with our first-ever power measurements." He added, "SMC's release establishes a baseline for best practice power consumption. The MLPerf benchmarks help buyers understand how systems perform on relevant workloads. The addition of the new power consumption benchmark gives our members, buyers, and the entire AI community a new way to rate energy efficiency and environmental impact."
Edward Pretty, Chairman of SMC, expressed his enthusiasm about the results. "We are thrilled to be a part of MLCommons and contribute to advancements in energy-efficient AI training," he said. "These results, verified by MLCommons members, validate the transformative power of our Sustainable AI Factories in reducing the environmental impact of large-scale AI. As the demand for AI grows, addressing resource consumption is critical."
The cornerstone of SMC's Sustainable AI Factories is its single-phase immersion cooling technology. This method submerges servers directly in tanks of dielectric liquid, eliminating the need for energy-intensive fans and air-conditioning systems. The technique delivers a 30% energy saving at the server level, with a further 20% saving achieved by retrofitting the immersion platform into traditional air-cooled data centres.
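As a rough illustration of how those two figures relate to the "up to 50%" headline number, the sketch below combines a server-level saving with a facility-level saving under two simple readings. The split and the way the savings are combined are assumptions for illustration, not SMC's published methodology.

```python
# Toy model of how the quoted savings could combine (illustrative only;
# the additive vs. compounded treatment is an assumption, not taken
# from SMC's published methodology).

def additive_saving(server_saving=0.30, facility_saving=0.20):
    """Treat the two savings as additive shares of the original total,
    which is the reading consistent with the 'up to 50%' figure."""
    return server_saving + facility_saving

def compounded_saving(server_saving=0.30, facility_saving=0.20):
    """Alternative reading: the facility-level saving applies to the
    energy that remains after the server-level saving."""
    return 1 - (1 - server_saving) * (1 - facility_saving)

if __name__ == "__main__":
    print(f"Additive reading:   {additive_saving():.0%}")    # 50%
    print(f"Compounded reading: {compounded_saving():.0%}")  # 44%
```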
David Kanter further remarked, "As a community, it's important that we can measure things, so we can improve them. I hope SMC's initial results will help drive transparency around the power consumption of AI training. This is an example of why MLCommons exists - to bring the best of industry together and have new, scaling tech platforms benchmarked against the world's largest infrastructure providers."
Additionally, SMC's collaboration with technology partners such as NVIDIA has led to further optimisations. By adopting an enhanced VBOOST-enabled software stack, the company reduced energy consumption further, to 451 kWh, while registering a 7% performance improvement. This places the performance of SMC's customer cloud training environment just 6% below that of NVIDIA's flagship Eos AI supercomputer, although this result has not been verified by the MLCommons Association.
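For readers unfamiliar with VBOOST: it is a clock-steering setting exposed through NVIDIA's driver tooling that shifts frequency headroom toward the GPU's compute domain. A minimal, hedged sketch of how an operator might apply it per node is shown below; the exact `nvidia-smi` subcommand and value are assumptions drawn from publicly available MLPerf training configurations, not from SMC's own software stack.

```python
# Hedged sketch: applying NVIDIA's vboost clock-steering setting before a
# training run. The "boost-slider --vboost" subcommand and the value 1 are
# assumptions based on public MLPerf configs; requires root and a recent
# driver on Hopper-class GPUs.

import subprocess

def set_vboost(value: int = 1) -> None:
    """Shift clock allocation toward the compute (SM) domain on this node."""
    subprocess.run(
        ["nvidia-smi", "boost-slider", "--vboost", str(value)],
        check=True,
    )

if __name__ == "__main__":
    set_vboost(1)  # applied on each node before launching the training job
```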
SMC's commitment to sustainability is aimed not only at achieving industry-leading efficiency but also at making sustainable AI accessible. By minimising environmental impact and delivering cost savings, the company strives to democratise AI technology. SMC is actively involved in MLCommons both to showcase its technology and to contribute to industry standards for energy-efficient AI training.
Moving forward, Sustainable Metal Cloud remains dedicated to supporting the development of advanced AI models and promoting the broader adoption of energy-efficient AI technologies. The company continues to invest in research to ensure its platform delivers top-tier performance for next-generation AI workloads while minimising environmental impact.