How big is the data centre energy problem? We've been through a decade or two of a virtualisation-led cloud, effectively sharing physical servers among users. Whenever you get a new email or scroll through social media, you ask a server somewhere to do a tiny piece of work. Leveraging the elasticity of the cloud means you rent that microsecond of use rather than buying a physical machine that sits there powered on but idle. You also get that cloud/data centre's commitment to sustainability.
Generative AI flips this dynamic on its head. Training large models requires tens to hundreds of GPUs running full tilt for weeks. As we all engage with AI, our collective demand consumes vastly more energy, and hence carbon and water, than our Web 2.0, old-school cloud use ever did. Given that, even before generative AI and in the cloud era, data centres accounted for almost a fifth of all electricity used in the Republic of Ireland in 2022, a 400% rise since 2015 [1], the impact the AI era will have on data centres is a valid concern.
Today, every data centre and cloud has a sustainability, net zero or liquid cooling play. But how do we know which are real? Which has the most significant impact? Which has the greatest promise of attaining sustainability? The issue here is that measures of data centre efficiency are globally poor:
- PUE (power usage effectiveness) is imprecise, leaving much to interpretation and hence inconsistencies between claims (see the sketch after this list), and
- building codes, such as NABERS (an otherwise relevant and excellent code in Australia), rely on PUE and have yet to catch up with the changes of the AI era.
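To see why PUE leaves so much to interpretation, here is a minimal sketch with illustrative, made-up numbers (not any real facility's data). The same site can report noticeably different PUE figures depending on where the "IT energy" is metered and what gets folded into it, and PUE says nothing about whether the IT load itself is doing useful work.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

# Hypothetical annual energy figures for one facility.
it_at_server_input_kwh = 8_000_000          # measured at the server power supplies
ups_and_distribution_losses_kwh = 600_000   # losses between the UPS and the racks
cooling_and_overheads_kwh = 3_400_000       # chillers, pumps, lighting, offices
total_kwh = (it_at_server_input_kwh
             + ups_and_distribution_losses_kwh
             + cooling_and_overheads_kwh)

# Metering IT load at the UPS output folds distribution losses into "IT",
# flattering the ratio; metering at the server input does not.
pue_at_ups_output = pue(total_kwh, it_at_server_input_kwh + ups_and_distribution_losses_kwh)
pue_at_server_input = pue(total_kwh, it_at_server_input_kwh)

print(f"PUE (IT metered at UPS output):   {pue_at_ups_output:.2f}")   # ~1.40
print(f"PUE (IT metered at server input): {pue_at_server_input:.2f}") # ~1.50
```

Two defensible measurement choices, two different headline numbers for the same building, and neither tells you whether the GPUs behind that "IT energy" were training a model or sitting idle.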
Sustainability decision-makers are increasingly conscious of the materiality of claims. Proactive authenticity has an advantage, but sometimes you must pave the way. To this end, we're incredibly proud of our friends at SMC. A year ago, we celebrated the formation of Sustainable Metal Cloud, a partnership informed in part by our work with Firmus. Setting authenticity as a principle, SMC has validated their pioneering technology and the efficiency standard for AI factories built on it. They are the first to publish the full suite of power results and performance data for MLPerf, the de facto standard for benchmarking AI systems, on clusters as large as 64 nodes (512 GPUs). In their news article, they claim:
"This showcases significant energy savings over conventional air-cooled infrastructure, which when combined within our Singapore data centre, has proven to save close to 50% total energy."
We already knew that 50% figure, but how could SMC authentically prove it? Publishing the full power results alongside their MLPerf benchmark submission is an excellent way.
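This is why published power data matters: anyone can redo the arithmetic. As a minimal sketch, with hypothetical numbers rather than SMC's published results, the energy of a benchmark run is average power draw multiplied by wall-clock time, and the saving is the relative difference between two systems running the same MLPerf task.

```python
def run_energy_kwh(avg_power_kw: float, runtime_hours: float) -> float:
    """Energy consumed by one benchmark run = average power x duration."""
    return avg_power_kw * runtime_hours

# Hypothetical figures for the same MLPerf training task on two clusters.
air_cooled_kwh = run_energy_kwh(avg_power_kw=400.0, runtime_hours=10.0)  # 4000 kWh
immersion_kwh = run_energy_kwh(avg_power_kw=230.0, runtime_hours=9.5)    # 2185 kWh

saving = 1.0 - immersion_kwh / air_cooled_kwh
print(f"Total energy saving: {saving:.0%}")  # ~45% with these made-up numbers
```

Layer the facility's own overheads (its PUE) on top of the cluster-level comparison and you arrive at the kind of total-energy claim quoted above, with every input independently checkable.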
It's so good to see regional innovation and partnership coming to fruition and leading the global conversation. Well done, SMC!
[1] Data centres use almost a fifth of Irish electricity, BBC News (https://www.bbc.com/news/articles/cpe9l5ke5jvo.amp)