708 Heat Pumps Exposed a 2–3x Performance Spread. One Slider Nobody Touches After Install Day Explains Most of It.

By Priya Greenwood · June 2, 2026

Somewhere in your heat pump's control panel, there is a setting called the heating curve. It maps outdoor air temperature to supply water temperature: when it is 20°F outside, the system pushes 120°F water through your radiators; when it is 45°F, maybe 95°F. Your installer configured this curve on the day the system went in, probably by selecting a preset that roughly matched your building type, and then drove away. The curve has not been touched since. It drives your heating bill every winter, and it is quite possibly set wrong.
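As a rough illustration, a heating curve can be sketched as a straight line through two set points. The sketch below assumes a linear curve through the two example points in the paragraph above (20°F outside gives 120°F supply, 45°F outside gives 95°F supply) and clamps the result between hypothetical minimum and maximum supply temperatures. Real controllers use vendor-specific presets and often nonlinear curves; the function name and the limits here are illustrative assumptions, not any manufacturer's settings.

```python
def supply_temp_f(outdoor_f, cold=(20.0, 120.0), mild=(45.0, 95.0),
                  min_supply=85.0, max_supply=130.0):
    """Linear heating curve through two (outdoor °F, supply °F) points.

    The two points are the examples from the text; min_supply and
    max_supply are assumed limits, not values from a real controller.
    """
    (x0, y0), (x1, y1) = cold, mild
    slope = (y1 - y0) / (x1 - x0)  # °F of supply per °F outdoors (negative)
    target = y0 + slope * (outdoor_f - x0)
    # Clamp so the curve never asks for water outside the assumed limits.
    return max(min_supply, min(max_supply, target))


for outdoor in (10, 20, 32, 45, 60):
    print(f"{outdoor:>3}°F outside -> {supply_temp_f(outdoor):.1f}°F supply")
```

Tuning the curve amounts to moving the two points (or the slope and offset they imply) until rooms stay comfortable at the lowest supply temperature that works.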

A Nature Communications study monitoring 708 residential heat pump systems across central Europe found a factor-of-two-to-three performance spread between the worst and best installations. Not because some systems were defective or some houses were dramatically leakier than others. Because the operational parameters, above all the heating curve, were set once and abandoned to entropy while the building aged, the weather patterns shifted, and the occupants changed how they lived in the space.

17.2%
Percentage of air-source heat pumps in the 708-system study that fell below the optimization threshold of 3.01 SCOP, meaning they consumed at least 24% more electricity than even an average-performing system in the same dataset.

What the Numbers Actually Say

Air-source heat pumps averaged a seasonal coefficient of performance (SCOP) of 3.72 across the 612 ASHPs in the study. Ground-source systems averaged 4.80 across 96 units, but those averages obscure a brutal distribution. Some air-source units achieved 5.55 while others limped along below 3.01, the threshold below which the researchers flagged a system as needing optimization. One in six ASHPs fell below that line. Burning money. Every winter.
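For reference, a seasonal figure like SCOP is the ratio of total heat delivered to total electricity consumed over the season, not the average of daily COPs. The sketch below uses made-up daily readings (hypothetical, not data from either study) to show how the two can differ: in this example the cold day delivers the most heat at the worst efficiency, so it dominates the seasonal ratio.

```python
def seasonal_cop(heat_kwh, electricity_kwh):
    """Seasonal COP: total heat out divided by total electricity in."""
    if len(heat_kwh) != len(electricity_kwh):
        raise ValueError("readings must be the same length")
    total_electricity = sum(electricity_kwh)
    if total_electricity <= 0:
        raise ValueError("total electricity must be positive")
    return sum(heat_kwh) / total_electricity


# Hypothetical daily readings (kWh): a mild day, a cool day, a cold day.
heat = [20.0, 40.0, 90.0]
electricity = [4.0, 10.0, 36.0]

daily_cops = [h / e for h, e in zip(heat, electricity)]
mean_daily_cop = sum(daily_cops) / len(daily_cops)

print("daily COPs:", [round(c, 2) for c in daily_cops])
print("mean of daily COPs:", round(mean_daily_cop, 2))
print("seasonal COP:", round(seasonal_cop(heat, electricity), 2))
```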

A separate Fraunhofer ISE four-year field study of 77 heat pumps in German buildings dating from 1826 to 2001 confirmed the pattern. Average seasonal performance factor: 3.4, up from 3.1 in their prior study, which sounds encouraging until you see the range. Bottom: 2.6. Top: 5.4. That spread is not noise; it is the difference between a system that justifies its $15,000 to $25,000 installation cost and one that leaves you with a higher energy bill than the gas furnace it replaced.

"We have also uncovered optimization potential," said Danny Günther of Fraunhofer ISE, which is the polite German way of saying that a significant fraction of installed systems are burning money because nobody tuned them after the installer left.

Your Heating Curve Is a Guess That Never Gets Updated

A heating curve is brutally simple. Cold outside, hot water in the pipes; warm outside, cooler water; one line on a graph. The problem is that the correct curve for your house depends on thermal mass, insulation quality, window orientation, solar gain patterns, how many people live there, whether the third-floor bathroom renovation added a heated floor that draws more from the buffer tank, and about forty other variables that change over the life of the building. Your installer accounts for none of this when selecting Curve 3 out of 7 presets, checking that the rooms get warm, and signing the commissioning sheet.

Consequences compound silently in both directions. Supply water temperature runs 5 to 10 degrees higher than necessary on mild days, forcing the compressor into less efficient operating ranges. On bitter cold days, the curve undershoots, triggering the backup electric resistance heater, which converts every kilowatt-hour at a COP of 1.0 instead of 3.5. Both failure modes cost money, and neither produces a visible error code, an alarm, or any signal that something is wrong. Your utility bill is the only diagnostic, and most homeowners do not correlate a $60/month winter energy increase with a miscalibrated heating curve, because they have never heard of a heating curve and the installer never mentioned it.
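To see how quickly the backup heater drags down efficiency, the sketch below blends the two COP values from the paragraph above (3.5 for the compressor, 1.0 for resistance heat). The share of heat supplied by the backup element is a hypothetical input, not a measured figure, and the function name is illustrative.

```python
def blended_cop(backup_share, compressor_cop=3.5, backup_cop=1.0):
    """Effective COP when part of the heat comes from the backup heater.

    backup_share is the fraction of delivered heat (0..1) supplied by the
    resistance element. Electricity per unit of heat is summed per source,
    then inverted to get the combined COP.
    """
    if not 0.0 <= backup_share <= 1.0:
        raise ValueError("backup_share must be between 0 and 1")
    electricity_per_heat = (backup_share / backup_cop
                            + (1.0 - backup_share) / compressor_cop)
    return 1.0 / electricity_per_heat


for share in (0.0, 0.1, 0.25, 0.5):
    print(f"backup supplies {share:.0%} of heat -> COP {blended_cop(share):.2f}")
```

Under these assumptions, a backup share of just 10% pulls the effective COP from 3.5 down to 2.8.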

AI Controllers: 13% Average, 25% Peak, and One Model That More Than Quadrupled Consumption

Fraunhofer ISE's AI4HP project, a Franco-German consortium including Stiebel Eltron and EDF R&D, built a neural network using transformer architecture to model a building's thermal behavior in real time, then used the model to adjust the heating curve continuously rather than leaving it static. Simulations across three buildings over one heating season each showed 13% average energy savings versus the standard fixed heating curve. A one-week field test recorded a 25% COP improvement. If that gain held across full heating seasons, it would be worth roughly $100 per year, or about $1,500 over 15 years, for a home using 12,000 kWh of heat annually at $0.16/kWh and starting from a SCOP of 3.72.

Neither figure is transformative, but both are real, and both came from software rather than hardware.

But the AI4HP field test lasted one week. Dr. Lilli Frison, who leads the project, acknowledged that "AI methods must become more robust and scalable in order to implement them cost-effectively in a large number of different building types." The 25% number needs longer evaluation. It might hold. It might not. Seven days of data does not settle that question, and the difference between 13% and 25% is the difference between a nice-to-have optimization and a compelling financial case for deploying AI controllers across millions of residential heat pumps.

And then there is the cautionary study from Zadar.

A hybrid physical-LSTM model published in MDPI Energies in April 2026 attempted to combine physics-based thermal modeling with a long short-term memory neural network for air-to-air heat pump control. The physical model was carefully calibrated against 52,128 real IoT measurements from the 2024/2025 heating season. Root mean squared error: 0.076°C, which looks extremely accurate on paper. Over longer horizons, it was not. In 15-day continuous simulations, the LSTM correction caused indoor temperature underestimation of 1.25 to 1.31°C, and simulated electricity consumption exploded from 72 kWh to 316 kWh.

339%
Increase in simulated electricity consumption when a hybrid physics-LSTM AI model was used for heat pump control over 15 continuous days, despite achieving an RMSE of only 0.076°C on short-horizon predictions. Accuracy on a one-step forecast does not guarantee thermodynamic consistency over time.

That is a 339% increase in energy use, more than four times the baseline, from an AI controller that looked excellent by the standard short-horizon accuracy metric. The researchers identified the root cause as an "implicit virtual heat flux" problem: the LSTM correction term breaks thermodynamic consistency, injecting phantom energy flows that accumulate over multi-day horizons. RMSE alone, the metric that most AI papers use to claim success, is insufficient for evaluating whether a model is safe to deploy for continuous building control.
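The gap between one-step accuracy and long-horizon behavior can be shown with a toy example. The sketch below is not the Zadar paper's model: it is a hypothetical first-order room model with arbitrary coefficients, and the `phantom` term is an assumed stand-in for a learned correction that adds a small unphysical heat flux at every step. It compares the one-step error (the model restarts from the true temperature each hour) with the free-running error (the model feeds on its own output for 15 days).

```python
import math


def step(temp, outdoor, heat, phantom=0.0, loss=0.02, gain=0.5):
    """One hourly step of a toy first-order room model (all values assumed)."""
    return temp + loss * (outdoor - temp) + gain * heat + phantom


def evaluate(phantom, hours=360, outdoor=0.0, heat=0.8):
    """Return (one-step RMSE, final free-running error) for a biased model."""
    true_temp = model_temp = 20.0
    squared_errors = []
    for _ in range(hours):
        # One-step test: the biased model restarts from the true state.
        one_step = step(true_temp, outdoor, heat, phantom)
        true_next = step(true_temp, outdoor, heat)
        squared_errors.append((one_step - true_next) ** 2)
        # Free-running test: the biased model feeds on its own output.
        model_temp = step(model_temp, outdoor, heat, phantom)
        true_temp = true_next
    rmse = math.sqrt(sum(squared_errors) / hours)
    return rmse, model_temp - true_temp


rmse, drift = evaluate(phantom=-0.02)
print(f"one-step RMSE: {rmse:.3f} °C")
print(f"error after 15 days of free-running: {drift:.2f} °C")
```

In this toy setup the one-step error stays at 0.02°C while the free-running error settles near 1°C, because each small bias is carried into the next step instead of being reset. A controller acting on the drifted temperature would heat against an error that the one-step metric never reveals.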

What This Means If You Own a Heat Pump

Ask your installer to come back. Pay them. A post-installation commissioning checkup after one full heating season costs a few hundred dollars and can adjust your heating curve based on actual performance data rather than the guess that was made before your system had ever run through a winter. Worth it. This is not a technology problem but a service model problem. Installers are paid to install, not to optimize, and the HVAC industry has not built a business model around post-commissioning tuning visits because nobody taught homeowners that the tuning matters.

If you are installing a new heat pump, demand a monitoring system. The Fraunhofer study measured minute-accurate compressor data, heating rod activation, and hydraulic circuit flows. Consumer-grade solutions from Sense, Emporia, or circuit-level monitoring can approximate this at a fraction of the cost of a research-grade installation. You cannot optimize what you do not measure, and a $200 energy monitor pays for itself if it catches one season of a badly configured heating curve.

If a vendor pitches you an AI-optimized heat pump controller, ask three questions. How many buildings has this system been deployed in for more than one heating season? Does the optimization model maintain thermodynamic consistency over multi-week horizons, or was it validated only on short-term prediction accuracy? And does installing a third-party controller void your heat pump manufacturer's warranty? The answer to that last question is almost certainly yes unless the controller manufacturer has a formal partnership with your heat pump brand. Stiebel Eltron partnered with Fraunhofer on AI4HP specifically to avoid this problem, embedding the optimization algorithm inside their own hardware stack so the controller and the heat pump share a single engineering team, a single warranty, and a single point of accountability when something goes wrong. Most startups selling AI thermostats have no such partnership, and the warranty exposure remains your problem.

The Original Contribution, and Its Limits

We calculated the lifetime cost of heating curve miscalibration using the 708-system study's performance distribution. A system running at a SCOP of 3.0 versus the dataset average of 3.72 consumes 24% more electricity. For a home using 12,000 kWh/year of thermal energy for space heating, that means about 4,000 kWh of electricity per year instead of about 3,230 kWh, a gap of roughly 770 kWh. At the US average residential electricity rate of $0.16/kWh, that gap translates to about $124 per year, or approximately $1,860 over a 15-year heat pump lifespan.
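A minimal sketch for reproducing this kind of estimate from the inputs stated above. It assumes the 12,000 kWh figure is delivered heat, that electricity use equals heat divided by SCOP, and that prices and performance stay flat over the lifespan with no discounting. The function names are illustrative.

```python
def annual_electricity_cost(heat_kwh, scop, price_per_kwh):
    """Electricity cost to deliver heat_kwh of heat at a given seasonal COP."""
    if scop <= 0:
        raise ValueError("scop must be positive")
    return heat_kwh / scop * price_per_kwh


def miscalibration_penalty(heat_kwh, scop_low, scop_ref, price_per_kwh, years):
    """Extra cost per year and over the lifespan versus a reference SCOP."""
    per_year = (annual_electricity_cost(heat_kwh, scop_low, price_per_kwh)
                - annual_electricity_cost(heat_kwh, scop_ref, price_per_kwh))
    # Assumes flat prices and performance, with no discounting.
    return per_year, per_year * years


# Inputs as stated in the text: 12,000 kWh heat, SCOP 3.0 vs 3.72,
# $0.16/kWh, 15-year lifespan.
per_year, lifetime = miscalibration_penalty(12_000, 3.0, 3.72, 0.16, 15)
print(f"extra cost per year: ${per_year:,.0f}")
print(f"extra cost over 15 years: ${lifetime:,.0f}")
```

Swapping in a local electricity rate or a different heating demand shows how sensitive the estimate is to those two inputs.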

This calculation uses US electricity rates applied to a European performance dataset, and the mismatch matters in both directions. European electricity prices are generally higher ($0.25 to $0.40/kWh in Germany), which means the financial penalty for miscalibration is even steeper in the countries where the research was conducted. US homeowners with cheaper electricity face a smaller per-year cost but are also less likely to notice the waste, which means the problem persists longer before anyone investigates. We assumed 12,000 kWh/year of heating demand, which is reasonable for a moderately insulated 2,000-square-foot home in ASHRAE climate zones 4 through 6 but would be lower in milder climates and higher in poorly insulated older buildings. The 708-system study monitored systems in central Europe, where installation practices, equipment brands, and climate differ from US conditions, and Fraunhofer's 77-system study covered German buildings exclusively.

Neither study was conducted in the United States. American heat pump installations may face different performance distributions due to differences in installer training, equipment brands, climate zones, and duct-versus-hydronic distribution systems. We are not aware of a comparable large-scale US field study of residential heat pump SCOP in the published literature. The ASHRAE Journal article by David Pogosian analyzed Bay Area heat pump economics but focused on sizing rather than heating curve optimization, a related but distinct problem.

AI controller performance claims rest on small samples: three simulated buildings for the 13% figure, and a single one-week field test for the 25% figure. The Zadar counter-study is itself a simulation, not a deployed failure. No consumer has reported a 339% energy spike from an AI thermostat. Not yet. But the mechanism the researchers identified, LSTM drift over multi-day horizons, is a structural risk that applies to any neural network used for continuous building control, and it has not been ruled out in any commercial product's published validation data.
