Apr 20, 2023·edited Apr 23, 2023

Dear David,

We appreciate your interest in our work and the time you took to comment on our article. Although responding on your website may not be the ideal venue, we believe some potential misunderstandings need to be clarified.

#1. Your comment on "3.8-5.2 L/kWh".

Response: Our paper only briefly mentions that an average data center could consume 3.8 L/kWh and that some data centers could use as much as about 5 L/kWh. These numbers are not used in any of our estimates for training GPT-3 in Microsoft's data centers. They are not used for estimating the water footprint of Google's LaMDA either, unless they happen to coincide with our modeled WUEs.

When we estimate the water consumption for training GPT-3 in Microsoft's data centers, we use Microsoft's own water usage effectiveness (WUE) number, 0.55 L/kWh [1], which, along with GPT-3's energy consumption of 1,287 MWh, yields our estimate of 700,000 liters of direct water consumption. This is very simple math: 1,287,000 kWh × 0.55 L/kWh ≈ 708,000 L, which we round to 700,000. This number would increase to 2.1 million liters if training were done in Microsoft's data centers in Asia (again, based on Microsoft's own WUE number of 1.65 L/kWh).
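For readers who want to check the arithmetic, here is a minimal sketch of the calculation described above. The function and constant names are illustrative assumptions and are not taken from our paper or repository; the energy and WUE values are the ones quoted in this comment.

```python
def direct_water_liters(energy_mwh: float, wue_l_per_kwh: float) -> float:
    """On-site water consumption in liters: energy (kWh) times WUE (L/kWh)."""
    energy_kwh = energy_mwh * 1000  # 1 MWh = 1,000 kWh
    return energy_kwh * wue_l_per_kwh


# Values quoted in this comment (names are hypothetical, for illustration only).
GPT3_TRAINING_ENERGY_MWH = 1287  # GPT-3 training energy
WUE_MICROSOFT_US = 0.55          # L/kWh, Microsoft's U.S. data centers [1]
WUE_MICROSOFT_ASIA = 1.65        # L/kWh, Microsoft's data centers in Asia [1]

us_liters = direct_water_liters(GPT3_TRAINING_ENERGY_MWH, WUE_MICROSOFT_US)
asia_liters = direct_water_liters(GPT3_TRAINING_ENERGY_MWH, WUE_MICROSOFT_ASIA)

print(f"U.S.: {us_liters:,.0f} L")    # U.S.: 707,850 L (about 700,000)
print(f"Asia: {asia_liters:,.0f} L")  # Asia: 2,123,550 L (about 2.1 million)
```

The only inputs are the training energy and the on-site WUE, so substituting a different WUE shows directly how sensitive the estimate is to location and season.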

There is no publication stating when or where GPT-3 was trained, so the actual number could be a few times higher than 700,000 liters if GPT-3 was trained in the summer and/or in dry areas like Arizona. We also note that our number is fairly conservative, as it does not include the additional overheads prior to training (e.g., hyperparameter tuning, architecture design), which can account for another significant amount of water consumption.

We also estimate the water consumption for inference: a conversation of 20-50 questions can consume 500 mL of water. The methodology is available in Footnote 3 of our paper. As you can see, we set the WUE for on-site water consumption to 0.5--5 L/kWh. This is a fairly reasonable range, as 0.5 L/kWh is what Microsoft reports as its **annualized** average across all its U.S. data centers, and 5 L/kWh can be the number in certain dry regions and/or hot months.

The industry's average on-site WUE in the U.S. may be around 1.8 L/kWh. This number is from a DOE data center energy report published in 2016. However, the calculation method was not clearly specified in DOE's report; it seems to be based on the lower range reported by a WSJ news article and other anecdotal data. Even if we use this number of 1.8 L/kWh, we get on-site water consumption for GPT-3 of around 2.3 million liters, instead of 700,000 liters! Recall that Microsoft's WUE for its U.S. data centers is about 30% of the U.S. average (0.55 vs. 1.80). If we assume the same 30% relative standing for Microsoft's WUE in Asia (1.65 L/kWh), then the average WUE in Asia would be around 5.5 L/kWh. Therefore, the 5 L/kWh we get from a real data center in Beijing is not outside the reasonable range.
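The scaling argument above can be reproduced in a few lines. This is only a sketch: the variable names are hypothetical, and the assumption that Microsoft's relative standing in Asia matches its standing in the U.S. is the same rough assumption made in the paragraph above, not a measured quantity.

```python
# Values quoted in this comment (names are hypothetical, for illustration only).
GPT3_TRAINING_ENERGY_KWH = 1_287_000
US_AVERAGE_WUE = 1.80      # L/kWh, from the 2016 DOE report
MICROSOFT_US_WUE = 0.55    # L/kWh
MICROSOFT_ASIA_WUE = 1.65  # L/kWh

# GPT-3 on-site water use if the U.S. industry average WUE applied instead.
water_at_us_average = GPT3_TRAINING_ENERGY_KWH * US_AVERAGE_WUE

# Microsoft's U.S. WUE relative to the U.S. average (a little over 30%).
relative_standing = MICROSOFT_US_WUE / US_AVERAGE_WUE

# Implied average WUE in Asia, ASSUMING the same relative standing holds there.
implied_asia_wue_rounded = MICROSOFT_ASIA_WUE / 0.30             # using the rounded 30%
implied_asia_wue_exact = MICROSOFT_ASIA_WUE / relative_standing  # using the unrounded ratio

print(f"{water_at_us_average:,.0f} L")   # 2,316,600 L (about 2.3 million)
print(f"{relative_standing:.1%}")        # 30.6%
print(f"{implied_asia_wue_rounded:.1f}") # 5.5
print(f"{implied_asia_wue_exact:.1f}")   # 5.4
```

Whether the rounded or unrounded ratio is used, the implied Asian average stays above 5 L/kWh, which is the point of the comparison.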

In fact, a recent study shows that the on-site WUE in hot regions (Phoenix, AZ) can be as high as 9 L/kWh in the summer: https://www.sciencedirect.com/science/article/pii/S0921344922000428 (This number is based on actual operational data from a large commercial data center.) This again confirms that our empirical WUE model is reasonable.

To sum up, our estimate of 700,000 liters for training GPT-3 is based on Microsoft's own on-site WUE for its U.S. data centers and is a very conservative estimate (it excludes the significant overheads and does not consider regional or seasonal differences). The "3.8 L/kWh -- 5 L/kWh" range was only mentioned as a general reference and was not used in our estimates for training GPT-3 or running ChatGPT in Microsoft's data centers.

#2. Your comment on Facebook and Google WUEs.

Response: Facebook mostly uses "outside air cooling", in which water is used only to control humidity and when the outside temperature is very high. Hence, Facebook has a WUE of around 0.2-0.3 L/kWh. In contrast, Google uses conventional cooling towers, as stated in its sustainability reports. You can refer to Google's and Facebook's data center webpages for detailed descriptions of their cooling systems.

Google did not disclose their on-site WUE, but we found one estimate: "Aaron Wemhoff at Villanova University in Pennsylvania found that Google’s data centers had an on-site water usage effectiveness of about 1.1 litres per kilowatt hour of energy usage." according to a January 17, 2023, report "Google has finally revealed how much water its data centres use" [2]. Again, this number is annualized and averaged over all different locations, including both hot and cold regions. Nonetheless, this is enough to show that your statement "Google's efficiency is likely similar, making the graphed values excessive" may not be correct.

Given the lack of detailed data from Google, we used a generic cooling tower model and selected four different locations (Nevada, Oregon, Virginia, and Texas). You can see from the figures in our appendix that our estimated on-site real-time WUE is between 0 and 5.5 L/kWh, depending on the location and season. We did not pick the highest water number for Google's LaMDA model; instead, we showed the water footprint for different locations and different starting months (see Figure 4). You can see that the water footprint varies widely with season and location.

We do not claim our estimate is perfectly accurate. Instead, we clearly acknowledge in our paper that "our estimated water footprint of LaMDA only serves as an approximate reference point for the research community and general public, rather than an accurate calculation that is impossible to know without further transparency from the model developer."

"While it is impossible to know the actual water footprint without detailed information from Google" (as stated in our paper), we believe that our estimate is fairly reasonable and on the same order of magnitude as Google's actual value.

All the numbers we used in our paper are open, transparent, and backed by concrete sources. Our estimates and methodology are technically correct and scientifically supported. The source code is available at: https://github.com/ren-research/making-ai-less-thirsty

#3. Your comments on "the actual figures are not provided (a footnote, strangely not in the methodology section, explains a range of “0.5L/kWh and 5L/kWh depending on weather conditions”)"

Response: If you read our paper, you'll see that the formula we used in the footnote is the same as the one presented in Eqn. (1) in our methodology section. We put the formula in the footnote just to improve readability, so that readers do not have to go to the methodology section to see how we estimated the water footprint for ChatGPT. If you plug in the numbers we present in the footnote, you'll get our conclusion that a conversation of 20-50 questions has a total water footprint of about 500 mL. Additionally, you'll see that Fig. 5 (quoted in your blog) has nothing to do with our estimate for ChatGPT or GPT-3. Fig. 5 shows a snapshot of the on-site hourly WUE, based on an empirical formula, for four different locations, to highlight that the on-site WUE can change over time and across locations.

#4. Your comments on contextualizing data centers' water usage.

Response: We believe that water conservation needs "whole-society" efforts shared by everyone across the board, including data centers, regardless of how much water each one consumes. Tech giants agree with us, e.g., "Water is a finite resource, and every drop matters," as Meta wrote in its 2020 sustainability report. As the Colorado River is at a historically low level and the Biden administration is weighing whether to enforce water cuts across the western states, we believe that AI and data centers in general should take on their share of the responsibility. We believe that you also agree with us on this point.

Also, there's growing evidence that data centers' water use is in tension with local communities: examples include Microsoft's data centers in Chandler, AZ, and the Netherlands, and Google's data center in The Dalles, OR. In addition, bill SB-1078, proposed in Virginia earlier this year, would require that "a site assessment shall be performed to examine the effect of the data center on water usage and carbon emissions as well as any impacts on agricultural resources." (URL: https://lis.virginia.gov/cgi-bin/legp604.exe?231+sum+SB1078 )

We hope that our responses can address your concerns. If you have any questions, please feel free to reach out to us.

Finally, we're also happy to share one of our early papers on data center water footprint published in the IEEE International Conference on Cloud and Green Computing in 2013: "Optimizing Water Efficiency in Distributed Data Centers" ( https://ieeexplore.ieee.org/document/6686011 )

We sincerely hope to receive your valuable feedback if you'd like to engage in technical discussions with us.

Shaolei Ren

(on behalf of co-authors)

**********

References:

[1] https://azure.microsoft.com/en-us/blog/how-microsoft-measures-datacenter-water-and-energy-use-to-improve-azure-cloud-sustainability/

[2] https://www.newscientist.com/article/2354801-google-has-finally-revealed-how-much-water-its-data-centres-use/
