In a previous article written in March 2024, I briefly laid out some of the environmental issues related to AI, including its carbon footprint. At the time, I didn’t report any quantitative data, because to my knowledge it didn’t exist. That changed a few weeks ago when the MIT Technology Review published a five-article series examining the positive and negative impacts of AI on the environment. For those who don’t have subscriptions or don’t want to spend the time reading all five, here are my notes on the series:
To understand how to calculate emissions, you first need to understand how the models work. You can refer to my inaugural post for an in-depth look, but the TL;DR is that AI models are just lots of mathematical operations.
You tell your computer what math you want it to do and give it some data to learn from. This is called training.
When you’re done training, you can then query the model to do whatever it is you intended it to do. This is called inference.
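To make those two phases concrete, here’s a minimal sketch in PyTorch. The toy model and training loop are my own illustration, not anything from the series; the point is just that both phases are nothing but repeated math:

```python
import torch
import torch.nn as nn

# A tiny model: just a stack of mathematical operations.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

# --- Training: show the model data and adjust its parameters ---
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
inputs = torch.randn(100, 4)    # toy data to learn from
targets = torch.randn(100, 1)   # toy labels
for _ in range(50):             # every pass is more math, more energy
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

# --- Inference: query the trained model ---
with torch.no_grad():
    prediction = model(torch.randn(1, 4))
```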
For small models, you can do this all on your own machine. But for large language models and other large systems, you will instead load this math onto chips in a data center.
To access them, you open a connection between your computer and the data center and tell it what you want. That math is performed on the chips, and the answer is sent back to you.
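In practice, that round trip is usually just an HTTPS request. Here’s a rough sketch of the shape of the exchange; the endpoint and request fields are hypothetical, since every provider’s API looks a little different:

```python
import requests

# Hypothetical hosted-model endpoint; real providers use their own URLs and schemas.
response = requests.post(
    "https://api.example-model-host.com/v1/generate",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"prompt": "Explain what a data center is in one sentence."},
)
# The heavy math ran on the data center's chips; we only receive the answer.
print(response.json())
```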
It may seem like the math is done by magic, but it’s actually the result of electrical signals being sent through the chips. Creating those signals and managing their side effects is where most of the carbon emissions come from.
Originally, it seemed like most emissions would come from training. This was a logical guess: you are continuously performing mathematical operations during training, and you are also loading large amounts of data onto your chips. Training can take weeks, months, or years, depending on how big the model is. In addition, some models perform something called “online training,” where the model continues to learn as people perform inference, adding to the energy requirements.
However, with the rise in popularity of chatbots, it’s now estimated that inference consumes 80–90% of AI’s energy. This is because the number of people using these systems is orders of magnitude larger than the number of people training them.
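A quick back-of-envelope calculation shows why inference can overtake training so quickly. All of these numbers are made-up assumptions for illustration, not figures from the series:

```python
# Assumed one-time cost to train a large model, in kWh.
training_energy_kwh = 1_000_000
# Assumed energy per chat query, and daily query volume at chatbot scale.
energy_per_query_kwh = 0.003
queries_per_day = 100_000_000

daily_inference_kwh = energy_per_query_kwh * queries_per_day
days_to_match_training = training_energy_kwh / daily_inference_kwh
print(f"Inference matches the entire training cost in {days_to_match_training:.1f} days")
# With these assumptions: ~3.3 days.
```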
It’s really hard to estimate the carbon emissions of an AI model. It depends on many factors, including how big the model is, which power plants generate the electricity for the computation, and even the time of day (some grids rely mostly on renewables but switch back to fossil fuels during peak demand).
But we can still make some educated guesses. The majority of the emissions don’t come from the model itself, but from the systems data centers use to cool the machines. The chips that perform these operations aren’t maximally efficient, and they dissipate much of their energy as heat. If a machine gets too hot, its electrical components stop working, so data centers have elaborate cooling systems meant to keep the machines at a constant temperature. Traditionally, that means an elaborate network of pipes pumping cool water through the building. And due to an interesting quirk in the design, air-cooled systems actually use more water than the traditional systems, so that’s not a great solution. Burning fuel to power all of this is where most emissions come from.
Side note: As anyone who has long hair and showers regularly knows, if you want pipes to work, there can’t be anything in the water. So data centers want potable water so they don’t have to spend lots of time cleaning the pipes. This leads to political tensions in places like Reno, where politicians are attempting to bring jobs to their communities by luring tech companies with tax incentives, but there isn’t enough water to supply the data centers and still provide for communities in the long term, especially as climate change worsens. It’s even worse than you might expect, because the places in the US that are best for data centers in terms of land costs are in the American Southwest, which is where the water crisis is worst and hits underserved communities like indigenous tribes the most.
Back to the emissions: if you know how big the model is, which data center it lives in, and where that center gets its energy, you can create a formula that estimates the total emissions.
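A simplified version of such a formula might look like the sketch below. It follows the general shape of public estimates (hardware energy, scaled up by the data center’s overhead, times the carbon intensity of the local grid); the constants in the example are placeholders I made up, not measured values:

```python
def estimate_emissions_kg(power_draw_kw: float, hours: float,
                          pue: float, grid_kg_co2_per_kwh: float) -> float:
    """Rough CO2 estimate: chip energy, scaled up by the data center's
    overhead factor (PUE), times the carbon intensity of the local grid."""
    energy_kwh = power_draw_kw * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh

# Placeholder inputs: a 10 kW node running for one week in a data center
# with a PUE of 1.5, on a grid emitting 0.4 kg of CO2 per kWh.
print(estimate_emissions_kg(10, 24 * 7, 1.5, 0.4))  # ~1008 kg of CO2
```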
Several organizations have done just that, including HuggingFace and ML.Energy.
These organizations will have different rankings because of different assumptions about data center energy sources, as well as different methodologies. That’s not a bad thing, especially when the research is so new. There’s a saying in statistics: “All models are wrong, but some are useful.” What’s important is that we can gain insight into what’s going on with these models so we can try to find a way to use them while still managing our emissions.
Something counterintuitive to me that came out of the rankings is that text-based models are less efficient than image-based ones. However, if you think about it in terms of math, it starts to make sense. Each pixel in a black-and-white image can take only 256 possible values. Compare that to the enormous number of words across different languages, and you can see why it would be harder to train a text-based model.
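A rough way to see the gap is to compare how many options the model must choose between at each output step. The vocabulary size below is an assumption on my part (modern tokenizers commonly use tens of thousands of tokens):

```python
import math

pixel_values = 256    # possible intensities for an 8-bit grayscale pixel
vocab_size = 50_000   # assumed tokenizer vocabulary for a text model

print(vocab_size / pixel_values)   # ~195x more choices per output step
print(math.log2(pixel_values))     # 8 bits of information per pixel
print(math.log2(vocab_size))       # ~15.6 bits per token
```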
Now that the energy use has been quantified, you can try to figure out how to power the models. Let’s say that you work at a power company and OpenAI wants to build a data center in your area. They say they need continuous access to enough energy to train a single model, and they want you to build enough infrastructure to support that. They’re willing to pay for part of it. How do you approach that problem?
One way to do this is with renewables. But unfortunately this doesn’t seem to be a common option. The sun and the wind can’t be called on command, and the energy storage options needed to efficiently store and transmit clean energy are new and require thinking about the power grid in new ways. The current administration is openly hostile to such projects, and utilities aren’t willing to shoulder the financial risks themselves.
The most common way to solve the problem of needing more energy is to burn natural gas. This is an especially common solution in the South, where most data centers are being built. It’s true that burning natural gas emits less carbon dioxide than burning coal, but the fracking required to extract natural gas leaks more methane than coal mining. Methane is a more potent greenhouse gas than carbon dioxide, and the air and water pollution caused by fracking is both environmentally devastating and politically unpopular.
Nuclear power plants are also an option, but they take decades to build and the tech companies want the energy now.
While AI seems like a great bet right now, the field is so new that anything could happen. It could fizzle out, and there could be another dot-com bust. Or, as discussed later, the AI industry could end up using less energy than currently estimated. So if municipalities focus on short-term goals and rush to build more natural gas pipelines or nuclear plants, and the situation changes, citizens could be left footing the bill for expensive infrastructure that no one is using. According to one study, new natural gas pipelines in Virginia could cost residents an additional $37.50 a month on their electricity bills even if the tech companies held up their end of the bargain.
But on the other hand, it makes sense that politicians want to bring new industries and opportunities to their communities. So, we should be thinking of ideas that balance economic opportunity with the potential risks.
According to a Duke study, most utilities are only using 53% of their total capacity at any given time. The other 47% is saved for large spikes, like when there’s severe heat or cold. So, local governments could negotiate with the tech companies. The companies could agree to only train their models during times of lower demand. In exchange, they’d save money by not having to pay for new infrastructure. They’d also get good press and not have to spend time fighting local groups.
From my personal experience: It’s annoying for the engineers to have to program the training like this, but it is possible. In fact, it’s pretty common to learn to do this in graduate research when all students have to share limited computer resources. If you have seemingly unlimited access to compute power, it’s easy to just train for a long time and see what happens. If you have limited resources, you’re forced to be more creative and efficient. I won’t go as far as to say that this would have a noticeable impact on AI’s carbon footprint, but I would be interested to hear from the engineers about what they’d do differently.
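The scheduling itself isn’t hard to sketch. Here’s a minimal illustration of what demand-aware training could look like; the hardcoded peak window and the pause-and-checkpoint pattern are my own assumptions (a real system would query the utility or a grid API rather than a fixed schedule):

```python
import datetime
import time

PEAK_HOURS = range(16, 21)  # assumed peak-demand window: 4 PM to 9 PM

def wait_for_off_peak():
    """Block until the current hour falls outside the peak-demand window."""
    while datetime.datetime.now().hour in PEAK_HOURS:
        time.sleep(15 * 60)  # check again in 15 minutes

def run_training_step():
    """Placeholder for one optimizer step of a real training loop."""
    time.sleep(0.01)

def save_checkpoint(step):
    """Placeholder for persisting model weights to disk."""
    print(f"checkpoint at step {step}")

def train(num_steps=10_000, checkpoint_every=1_000):
    for step in range(num_steps):
        wait_for_off_peak()        # pause whenever grid demand spikes
        run_training_step()
        if step % checkpoint_every == 0:
            save_checkpoint(step)  # an interruption never loses much work
```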
Some tech companies, like Amazon and OpenAI, have agreed to invest in nuclear, carbon capture, or other forms of energy. Politicians could push them to invest in green energy instead.
It’s not all doom and gloom. The last article lists four reasons for optimism:
More efficient models: As the technology becomes more widespread, people can innovate by creating their own smaller models with more curated datasets for specific tasks. These would produce fewer emissions (and probably be more accurate) than larger models. There’s also reason to believe that breakthroughs in more efficient inference and parallel computing can reduce emissions.
More efficient chips: Most older AI models were trained on chips designed for computer graphics, not specifically for the matrix math used in AI models. There’s lots of research being done to design AI-specific chips that would be more efficient; I discuss it in more detail in a previous article. Ironically, I also recently attended a talk about using AI to design more efficient chips for AI, so it’d be interesting to see what happens with that.
Better cooling in data centers: We don’t have to just waste the heat that data centers dump into their cooling water. Countries like Denmark build infrastructure to reuse that heated water to warm homes. We could also develop new technologies to cool more efficiently.
Cutting cost/energy use: It’s expensive to use so much energy. It makes economic sense for companies to try to reduce their energy usage to save money, especially as the field gets more competitive.