Optimizing Data Center Efficiency: A Deep Dive into Free Cooling Techniques

14 May 2024

In the previous article, we discussed the rapid expansion of data center infrastructure and the resulting increase in electricity consumption. Because servers convert the electricity they consume into heat, managing high temperatures and cooling both the facility and the equipment becomes the number one problem for DC teams.

While traditional cooling methods, including air conditioners and chillers, cool data center premises and servers effectively, their cost remains a significant drawback. Free cooling, in contrast, does not demand substantial investment, yet it offers the same level of efficiency and reliability. In this article, I will give a detailed overview of free cooling technology, highlighting its benefits, limitations, and the requirements for successful implementation.

Physics of Free Cooling

To understand the physics behind free cooling, we'll need to revisit the heat energy formula:

Q = mcΔT

Here, 'Q' represents the amount of heat gained or lost, 'm' stands for the mass of the sample (in our case, the mass of air in the data centre), 'c' denotes the specific heat capacity of air, and ΔT signifies the temperature differential.

In a data center, the primary heat source is the CPU. A typical server has 2 to 4 CPUs, each drawing approximately 200 watts. As discussed earlier, all the electrical energy consumed by the CPUs is converted into heat. With 2 CPUs, for instance, we generate 400 watts of heat that needs to be dissipated. Our objective is to determine the amount of air required for this purpose.

The temperature differential ΔT is the difference between the temperature of the air leaving the servers and that of the air entering them. For a fixed amount of heat, the larger ΔT is, the less air mass is needed, so the lower the outdoor air temperature, the less air is required to cool the CPUs. For instance, if the inlet air temperature is 0°C and the outlet temperature is 35°C, ΔT is 35°C, which means a relatively low air mass requirement. During the summer, however, cooling becomes more challenging as ambient temperatures rise: the higher the outdoor temperature, the more air is required to cool the servers.
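
As a rough illustration of this relationship, the sketch below rearranges Q = mcΔT per unit of time to estimate the air flow needed for the 400-watt example. The specific heat and density of air used here are typical textbook values and are assumptions, not figures from this article; the 25°C summer inlet temperature is also a made-up comparison point.

```python
# Assumed constants (not from the article): typical values for dry air.
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K), specific heat capacity at constant pressure
AIR_DENSITY = 1.2           # kg/m^3, rough value near room temperature


def required_air_mass_flow(heat_watts: float, inlet_c: float, outlet_c: float) -> float:
    """Return the air mass flow in kg/s needed to carry away heat_watts.

    Rearranges Q = m * c * dT per unit time: m_dot = P / (c * dT).
    """
    delta_t = outlet_c - inlet_c
    if delta_t <= 0:
        raise ValueError("outlet temperature must exceed inlet temperature")
    return heat_watts / (AIR_SPECIFIC_HEAT * delta_t)


if __name__ == "__main__":
    # 0 °C is the winter example from the text; 25 °C is a hypothetical summer day.
    for inlet in (0.0, 25.0):
        m_dot = required_air_mass_flow(400.0, inlet, 35.0)
        volume_m3_per_hour = m_dot / AIR_DENSITY * 3600
        print(f"inlet {inlet:4.1f} °C: {m_dot:.4f} kg/s, about {volume_m3_per_hour:.0f} m^3/h")
```

With the outlet fixed at 35°C, raising the inlet from 0°C to 25°C shrinks ΔT from 35 to 10, so the same 400 watts needs 3.5 times as much air.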

Server and Network Components Temperature Limitations

Though free cooling may be efficient for moderate and cold climates, it still has limitations due to temperature constraints on server components. Critical components in IT and network equipment, such as processors, RAM, HDDs, SSDs, and NVMe drives, have operational temperature requirements:

  • Processors: max 89°C
  • RAM: max 75°C
  • HDDs: max 50°C
  • SSDs and NVMe drives: max 47-48°C

These limitations directly impact the suitability of outdoor air temperatures for cooling. Free cooling would not be viable in regions where outdoor temperatures exceed these thresholds or even get close to them, as it could damage the system due to overheating.

Regional Limitations

As we have already explained, outdoor temperatures must consistently remain lower than the IT equipment's maximum operational temperatures for free cooling to be effective. This necessitates careful consideration of the DC location's climate conditions. Organizations must analyze long-term weather forecasts to ensure that temperatures do not exceed the required thresholds, even on specific days or hours. Additionally, considering the long lifespan of data centers (typically 10-15 years), the effects of global warming should also be factored into location decisions.
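
One simple way to screen a candidate location is to count how many hours in a temperature record exceed a chosen limit. The sketch below assumes hourly outdoor temperatures are already available as a list; the function names, the sample data, and the 30°C limit are all hypothetical and only illustrate the idea.

```python
from typing import Sequence


def hours_above_limit(hourly_temps_c: Sequence[float], limit_c: float) -> int:
    """Count hourly readings that are strictly above limit_c."""
    return sum(1 for temp in hourly_temps_c if temp > limit_c)


def fraction_of_hours_above(hourly_temps_c: Sequence[float], limit_c: float) -> float:
    """Share of readings above limit_c (0.0 for an empty record)."""
    if not hourly_temps_c:
        return 0.0
    return hours_above_limit(hourly_temps_c, limit_c) / len(hourly_temps_c)


if __name__ == "__main__":
    # Made-up hourly outdoor temperatures in °C for one hot day.
    sample_day = [18, 17, 16, 16, 17, 19, 22, 25, 28, 30, 32, 34,
                  35, 36, 36, 35, 33, 31, 28, 25, 23, 21, 20, 19]
    limit = 30.0  # hypothetical maximum acceptable inlet temperature
    print(hours_above_limit(sample_day, limit), "hours above", limit, "°C")
    print(f"{fraction_of_hours_above(sample_day, limit):.1%} of the record")
```

In practice the same check would be run over many years of records, and the limit would be chosen with a safety margin for future warming.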

Server Node Architecture Requirements

In the context of physics, achieving efficient cooling in servers relies on ensuring an ample flow of air through the system. The architecture of the server plays an important role in this process.

An example of server architecture featuring ventilation holes that facilitate the necessary airflow and allow effective free cooling

Conversely, servers lacking appropriate design features, such as perforations or openings, can impede airflow, potentially compromising the overall efficiency of the free cooling mechanism.

Humidity Control

Humidity is another critical consideration in free cooling. Since we cannot control outdoor humidity, two questions arise: first, how to deal with humidity approaching or reaching 100% inside the data center (DC); and second, how to deal with very low humidity, such as on a frosty February day with an outdoor temperature of -30°C and relative humidity of 2% to 5%. Let's examine these situations in turn.

At high humidity, a common concern is condensation and its adverse effect on equipment. Contrary to this concern, condensation does not occur in the zones of the DC where cooling takes place. Condensation forms when warm, moist air comes into contact with a colder surface, but within a free cooling system no element is colder than the surrounding air. Condensation is therefore inherently prevented, and no proactive measures are needed.

Conversely, at low humidity the concern shifts to static electricity, which threatens equipment stability. This issue is unrelated to condensation and requires a different solution: grounding and a specialized floor coating. These are the established methods for protecting internal equipment against static electricity. When structural elements, racks, and IT equipment are grounded, static charge dissipates harmlessly to the ground, preserving the integrity of the equipment.

In natural climates, extremely high or low humidity is rare. Notable exceptions include events such as a July thunderstorm bringing humidity to 100% or a severe frost causing very low humidity. Most of the time, however, humidity levels remain well within ranges that pose no harm to the equipment, even without active intervention.

Air Quantity and Speed

As we have already discussed, effective cooling requires a substantial volume of outside air. At the same time, a seemingly counterintuitive requirement emerges: maintaining a low airspeed within the building. This apparent paradox stems from the problems caused by high-speed air currents circulating inside.

To simplify, imagine high-speed air as a strong jet from a tube, creating swirls and turbulence around the IT equipment. This turbulence can lead to irregular air movement and localized overheating. To avoid this, we aim for an overall low airspeed of 1-2 meters per second throughout the space.

Keeping airspeed within the 1-2 meters per second range suppresses turbulence and fosters a smooth, uniform airflow, which avoids localized overheating and cools the IT equipment evenly.

As can be seen, the free cooling approach revolves around the efficient use of external air while prioritizing a controlled low internal airspeed. This deliberate strategy helps maintain a laminar and uniform airflow, ensuring the effectiveness of IT equipment cooling.

Building Concept

In the free cooling paradigm, traditional air ducts are not employed within the building's structure. Unlike conventional setups with designated air ducts in walls, ceilings, or specific areas, data processing centers adopt an unconventional approach. The building itself is conceived as an air duct, rendering traditional air-conditioning units obsolete. The sheer scale of these air ducts transforms them into integral components of rooms and floors.

A schematic depiction of the free-cooling building design

The airflow process initiates as external air enters the building, passing through two types of filters – coarse filters and fine filters. Once the air undergoes the cleaning process, it is propelled by fans into expansive building volumes, approximately equivalent to four floors in height. This substantial volume serves its own purpose: to decelerate the airflow, reducing its speed to the required range of 1-2 metres per second. Subsequently, the air descends to the machinery room.

After traversing the machinery room, the air continues its journey through IT racks, progressing into the hot aisle. From there, it enters the hot air collector before being expelled outside through exhaust fans. This structured airflow path ensures an efficient cooling process while maintaining controlled airspeed.

Airspeed and Volume

The deliberate design choice of using expansive building volumes serves a dual purpose. First and foremost, it allows for a gradual reduction in airspeed, ensuring that the airflow achieves the desired velocity of 1-2 metres per second. This controlled airspeed is essential to prevent turbulence and maintain a laminar flow, particularly important as the air progresses through sensitive IT equipment. Secondly, the significant volume accommodates the necessary air volume to dissipate the generated heat efficiently. The synchronised interplay of airspeed and volume contributes to the overall success of the system.
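
To see why the air passages end up being the size of rooms, it helps to estimate the cross-section needed to move the required air at 1-2 meters per second. The sketch below is an illustration under assumptions: the 1 MW IT load and the 15-degree ΔT are made-up inputs, and the air properties are typical textbook values rather than figures from this article.

```python
# Assumed constants (not from the article): typical values for dry air.
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K)
AIR_DENSITY = 1.2           # kg/m^3


def volumetric_flow_m3_s(heat_watts: float, delta_t_k: float) -> float:
    """Air volume per second needed to remove heat_watts at a given delta T."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_watts / (AIR_SPECIFIC_HEAT * delta_t_k)
    return mass_flow_kg_s / AIR_DENSITY


def cross_section_m2(flow_m3_s: float, airspeed_m_s: float) -> float:
    """Cross-sectional area that carries flow_m3_s at the given airspeed."""
    if airspeed_m_s <= 0:
        raise ValueError("airspeed_m_s must be positive")
    return flow_m3_s / airspeed_m_s


if __name__ == "__main__":
    flow = volumetric_flow_m3_s(heat_watts=1_000_000, delta_t_k=15.0)  # hypothetical hall
    for speed in (1.0, 2.0):
        print(f"{speed:.0f} m/s -> {cross_section_m2(flow, speed):.1f} m^2 of free cross-section")
```

Under these assumptions the hall needs roughly 55 square meters of free cross-section at 1 m/s, or about 28 square meters at 2 m/s, which is far beyond what a conventional duct provides.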

Differential Pressure as the Sole Management Driver

In a free cooling setup, we don't have control over the external air temperature, leading to variations in the air temperature entering the Data Center (DC). Despite this, estimating the required airflow for equipment cooling is essential. To address this, we rely on the method of differential pressure.

Inside each IT rack, servers with internal fans operate at different speeds, collectively creating a differential pressure between the rack's front and back. With numerous servers, each contributing to the overall airflow, this pressure difference gradually builds up between the cold and hot aisles. Using pressure sensors in both aisles and outside the DC building, we can measure this differential pressure.

The calculation involves subtracting the hot-aisle pressure reading from atmospheric pressure, and likewise subtracting the cold-aisle pressure reading from atmospheric pressure, as in the example below:

Real-World Example

The resulting values then guide us in determining the necessary air supply to the DC and the required exhaust to offset the server fans' operation. In simpler terms, we gauge our airflow needs based on the pressure differentials, allowing us to manage the cooling process within the DC efficiently.
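
The calculation can be sketched in a few lines. The two differentials follow the description above (atmospheric pressure minus each aisle reading). The decision rule, the deadband, and all names are hypothetical: they assume that server fans lower the cold-aisle pressure and raise the hot-aisle pressure, and that the DC fans should bring both back toward atmospheric pressure. A real control system may work differently.

```python
from dataclasses import dataclass


@dataclass
class PressureReadings:
    atmospheric_pa: float
    cold_aisle_pa: float
    hot_aisle_pa: float


def aisle_differentials(r: PressureReadings) -> tuple[float, float]:
    """Return (cold, hot) differentials as atmospheric minus aisle pressure."""
    return (r.atmospheric_pa - r.cold_aisle_pa, r.atmospheric_pa - r.hot_aisle_pa)


def fan_adjustments(r: PressureReadings, deadband_pa: float = 1.0) -> dict[str, str]:
    """Hypothetical rule: push both aisles back toward atmospheric pressure."""
    cold_diff, hot_diff = aisle_differentials(r)
    # Cold aisle below atmospheric: servers pull more air than is supplied.
    if cold_diff > deadband_pa:
        supply = "increase"
    elif cold_diff < -deadband_pa:
        supply = "decrease"
    else:
        supply = "hold"
    # Hot aisle above atmospheric: servers push more air than is exhausted.
    if hot_diff < -deadband_pa:
        exhaust = "increase"
    elif hot_diff > deadband_pa:
        exhaust = "decrease"
    else:
        exhaust = "hold"
    return {"supply": supply, "exhaust": exhaust}


if __name__ == "__main__":
    # Made-up readings in pascals.
    readings = PressureReadings(atmospheric_pa=101325.0, cold_aisle_pa=101320.0, hot_aisle_pa=101331.0)
    print(aisle_differentials(readings))
    print(fan_adjustments(readings))
```

The deadband keeps the fans from reacting to sensor noise around zero.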

Heating and Mixing Chamber

Traditional heating systems are usually not installed in data centers with free cooling. Water-based heating is considered impractical because of its cost and the potential risk to equipment. This poses a challenge during extreme cold, when outdoor temperatures reach -20°C to -30°C. While the equipment handles such temperatures well, engineers seek a gentler approach. The most elegant and logical solution is to reuse the hot air generated by the IT equipment: by directing hot air from the servers into a mixing chamber and returning part of it to the main air stream, the system keeps the premises warm in winter and saves on heating costs.
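
How much hot air to return can be estimated with a simple mixing balance. The sketch below assumes ideal mixing of two air streams with the same specific heat, so the mixed temperature is a mass-weighted average. The function name and the example temperatures are hypothetical.

```python
def recirculation_fraction(outdoor_c: float, hot_aisle_c: float, target_c: float) -> float:
    """Fraction of the supply stream that should be recirculated hot air.

    Solves target = f * hot_aisle + (1 - f) * outdoor for f, clamped to [0, 1].
    """
    if hot_aisle_c <= outdoor_c:
        raise ValueError("hot aisle air must be warmer than outdoor air")
    if target_c <= outdoor_c:
        return 0.0  # outdoor air is already warm enough
    if target_c >= hot_aisle_c:
        return 1.0  # even pure hot-aisle air cannot exceed its own temperature
    return (target_c - outdoor_c) / (hot_aisle_c - outdoor_c)


if __name__ == "__main__":
    # Made-up example: -30 °C outside, 35 °C hot aisle, 10 °C desired supply air.
    fraction = recirculation_fraction(outdoor_c=-30.0, hot_aisle_c=35.0, target_c=10.0)
    print(f"recirculate about {fraction:.0%} of the supply stream")
```

In this made-up case a little over 60% of the supply stream would be returned hot air, with the rest drawn from outside.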

Simplicity and Reliability

A key thesis in reliability theory asserts that simplicity begets reliability. This holds true for the free cooling system, which is a remarkably simple concept: the system draws outside air in through filters, passes it through the IT equipment, and then simply expels it.

The absence of complex systems enhances reliability, with only fans posing a vulnerability in hot weather. The free-cooling approach exemplifies radical system simplification, substantially improving reliability by reducing the number of elements.

DC Fans vs Server Fans

The hierarchical authority of the fans is another fundamental question in the dynamics of airflow within DCs. As we have discussed, there are large-scale fans at the DC level and those at the server level. The question is: do the data center fans merely supply air, leaving the server fans to consume as much as needed? Or does the demand originate from the server fans, compelling the DC fans to fulfill their requirements?

The mechanism is as follows: the server fans play the dominant role, determining the necessary airflow. The DC fans then respond by delivering the required volume of air. It follows that if the cumulative demand from all servers exceeds the supply capacity of the DC fans, overheating can result.

So the answer is that server fans have the primacy in this dynamic. They orchestrate the airflow, specifying the needed air quantity.
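
The capacity condition described above amounts to a simple comparison between summed server demand and DC fan capacity. The sketch below uses hypothetical names and made-up airflow figures purely to illustrate that check.

```python
from typing import Iterable


def total_server_demand(server_airflows_m3_h: Iterable[float]) -> float:
    """Sum the airflow each server's fans are currently pulling."""
    return sum(server_airflows_m3_h)


def supply_headroom(server_airflows_m3_h: Iterable[float], dc_fan_capacity_m3_h: float) -> float:
    """Positive: DC fans can cover demand. Negative: shortfall and overheating risk."""
    return dc_fan_capacity_m3_h - total_server_demand(server_airflows_m3_h)


if __name__ == "__main__":
    # Made-up figures: 1000 servers at 120 m^3/h each, DC fans rated at 150000 m^3/h.
    demand_per_server = [120.0] * 1000
    headroom = supply_headroom(demand_per_server, dc_fan_capacity_m3_h=150_000.0)
    status = "ok" if headroom >= 0 else "shortfall"
    print(f"headroom: {headroom:.0f} m^3/h ({status})")
```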

Efficiency and PUE Calculation

To evaluate the efficiency of a DC project, Power Usage Effectiveness (PUE) is traditionally used. PUE is the ratio of Total Facility Power to IT Equipment Power:

  PUE = Total Facility Power / IT Equipment Power

Ideally, it equals 1, signifying that all energy is directed to IT equipment without any wastage. However, achieving this perfect scenario is rare in real-world projects.

Another issue is establishing a clear methodology for computing PUE. In our system, for example, we have a metric for instantaneous power consumption in watts, which makes it possible to calculate PUE in real time.

Moreover, we can derive an average PUE over a year, which accounts for seasonal fluctuations such as the difference in cooling requirements between summer and winter. For a more reliable evaluation, the annual average should take priority, as it provides a more balanced and comprehensive assessment.
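
Both views of PUE can be computed from the same power samples. The sketch below assumes samples are taken at equal time intervals, so summing power readings is proportional to summing energy. The function names and sample values are hypothetical. Note that the annual figure here is the ratio of totals rather than the mean of instantaneous ratios, so periods of heavy load carry proportionally more weight.

```python
from typing import Sequence, Tuple


def instantaneous_pue(total_facility_w: float, it_equipment_w: float) -> float:
    """PUE = Total Facility Power / IT Equipment Power for a single reading."""
    if it_equipment_w <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_w / it_equipment_w


def average_pue(samples: Sequence[Tuple[float, float]]) -> float:
    """Period PUE from (total_facility_w, it_equipment_w) samples at equal intervals."""
    total_facility = sum(total for total, _ in samples)
    total_it = sum(it for _, it in samples)
    if total_it <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility / total_it


if __name__ == "__main__":
    # Made-up readings in watts: one summer sample and one winter sample.
    samples = [(1300.0, 1000.0), (1050.0, 1000.0)]
    for total, it in samples:
        print(f"instantaneous PUE: {instantaneous_pue(total, it):.3f}")
    print(f"average PUE over the period: {average_pue(samples):.3f}")
```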

It is also important to examine PUE not only in terms of energy but also in monetary units, thereby incorporating seasonal fluctuations in electricity prices. Evaluating PUE in monetary terms gives a more holistic perspective on operational efficiency.

Besides, this approach makes it possible to achieve a PUE value of less than 1 when measured in dollars, for instance when waste heat is used to heat water that is then sold to nearby cities. Notable examples, such as Google's data center in the USA and Yandex's facility in Finland, demonstrate the viability of such practices, particularly in regions with high energy costs.
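
The article does not give a formula for PUE in monetary terms, so the sketch below is one possible interpretation and not an established definition: the cost of all facility energy, minus revenue from sold heat, divided by the cost of the energy used by IT equipment. All names and numbers are hypothetical.

```python
def monetary_pue(facility_energy_cost: float, it_energy_cost: float, heat_revenue: float = 0.0) -> float:
    """Assumed definition: (facility energy cost - heat revenue) / IT energy cost."""
    if it_energy_cost <= 0:
        raise ValueError("IT energy cost must be positive")
    return (facility_energy_cost - heat_revenue) / it_energy_cost


if __name__ == "__main__":
    # Made-up monthly figures in dollars.
    without_heat_sales = monetary_pue(facility_energy_cost=110_000.0, it_energy_cost=100_000.0)
    with_heat_sales = monetary_pue(110_000.0, 100_000.0, heat_revenue=20_000.0)
    print(f"without heat sales: {without_heat_sales:.2f}")
    print(f"with heat sales:    {with_heat_sales:.2f}")
```

Under this interpretation, the value drops below 1 as soon as heat revenue exceeds the cost of the non-IT overhead energy.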

Efficiency vs. Reliability

Concerns about reducing costs and increasing efficiency often raise questions about potential negative impacts on reliability. However, I would like to emphasize that in free cooling the pursuit of efficiency does not compromise reliability. Instead, its technological side effects can even enhance efficiency. For example, as we have already discussed, redirecting excess heat to heat pumps for additional benefits, such as generating hot water for nearby cities, becomes a financially advantageous practice without sacrificing reliability.

Future of Free Cooling

Despite all the advantages free cooling offers, the data center industry remains conservative: it demands proven reliability and tends to resist innovative solutions. Reliance on certifications from bodies like the Uptime Institute for marketing poses another hurdle, since free cooling solutions lack an established certification, which leads commercial providers to view them with skepticism.

Yet there is a trend among corporate hyperscalers to adopt free cooling as the main solution for their DCs. With a growing number of companies acknowledging the cost-effectiveness and operational benefits of this technology, we expect more corporate free-cooling data centers to appear in the next 10-20 years.