1Mercator Research Institute on Global Commons and Climate Change, Berlin, Germany. 2Hertie School, Berlin, Germany. 3Department of Geographical Sciences, University of Maryland, College Park, MD, USA. 4School of Earth and Environment, University of Leeds, Leeds, UK. 5Technische Universität Berlin, Berlin, Germany. 6Center for Applied Economic Research, University of Münster, Münster, Germany. 7Stockholm Environment Institute, Stockholm, Sweden. 8Africa Centre for Evidence, University of Johannesburg, Johannesburg, South Africa. 9Neon Neue Energieökonomik, Berlin, Germany. 10Institute for Labour Economics, Bonn, Germany. ✉e-mail: [email protected]
Finding low-energy demand pathways is necessary to hedge against the risks involved in decarbonizing energy supply and is key to finding socially acceptable ways to meet the Paris climate goals1–4. Energy demand from buildings was responsible for 28% of global energy-related CO2 emissions in 2019, when indirect emissions from upstream power generation are considered. After plateauing between 2013 and 2016, buildings-related CO2 emissions increased to an all-time high of 10 GtCO2 in 2019 (ref. 5), with residential buildings accounting for almost 60% of these emissions. This recent growth in emissions was driven by an increased demand for building energy services that has outpaced energy efficiency and decarbonization efforts6. Besides technologies and architecture, behaviour, lifestyle and culture have a major effect on the energy demand from buildings, with a three- to fivefold difference in energy use for the provision of similar building-related energy service levels7. Hence, addressing current growth trends in energy demand from residential buildings is one important component for guiding developments towards responsible, low-energy pathways in climate change mitigation. But a lack of systematic efforts to quantify demand-side solutions in general, and interventions in household energy demand specifically, in a way amenable to climate change assessments such as those by the Intergovernmental Panel on Climate Change (IPCC), has led to a bias towards riskier supply-side solutions1,8,9.

There is a rich and diverse literature available on demand-side solutions1,10. Since the oil price shock in the 1970s, interventions to reduce energy use in buildings and appliance use have been
researched extensively11. Experiments that use monetary incentives to reduce consumption have been trialled widely, even more so since the introduction of smart metering at scale over the last decade12. Evidence has accumulated on the use of other behavioural interventions, which encompass a range of initiatives that may, either by themselves or in conjunction with more typical policy tools (for example, infrastructure and incentives), achieve greater energy consumption reductions than the typical tools achieve alone13. Despite this vast pool of evidence that can be employed for policy-making, little is known about the global carbon emissions reduction potential of such interventions.
In this study we have addressed this gap through an interdisciplinary meta-analysis of interventions that foster behavioural change for reducing energy consumption by households, excluding incentives for equipment adoption and structural changes14. Previous meta-analyses tended to be disciplinary or focused on subsets of interventions: Faruqui et al.15 on pricing interventions, Karlin et al.16 on feedback, and Abrahamse et al.17 and Andor and Fels18 on social comparison, commitment devices, goal setting and labelling. Nisa et al.19 considered evidence from a wider range of household behaviours relevant to climate change mitigation, but did not review interventions in energy consumption exhaustively. The meta-analysis by Delmas et al.20 broke new ground but was based on a narrower literature search and did not include studies published after 2012, which constitute about half of our sample. In this paper we provide a comprehensive, up-to-date meta-analysis that critically assesses the energy savings potential of interventions aimed
A multi-country meta-analysis on the role of behavioural change in reducing energy consumption and CO2 emissions in residential buildings

Tarun M. Khanna1,2, Giovanni Baiocchi3, Max Callaghan1,4, Felix Creutzig1,5, Horia Guias6, Neal R. Haddaway1,7,8, Lion Hirth2,9, Aneeque Javaid1, Nicolas Koch1,10, Sonja Laukemper1, Andreas Löschel6, Maria del Mar Zamora Dominguez1 and Jan C. Minx1,4✉
Despite the importance of evaluating all mitigation options to inform policy decisions addressing climate change, a comprehensive analysis of household-scale interventions and their emissions reduction potential is missing. Here, we address this gap for interventions aimed at changing individual households' use of existing equipment, such as monetary incentives or feedback. We have performed a machine learning-assisted systematic review and meta-analysis to comparatively assess the effectiveness of these interventions in reducing energy demand in residential buildings. We extracted 360 individual effect sizes from 122 studies representing trials in 25 countries. Our meta-regression confirms that both monetary and non-monetary interventions reduce the energy consumption of households, but monetary incentives, of the sizes reported in the literature, tend to show on average a more pronounced effect. Deploying the right combinations of interventions increases the overall effectiveness. We have estimated a global carbon emissions reduction potential of 0.35 GtCO2 yr−1, although deploying the most effective packages of interventions could result in greater reductions. While modest, this potential should be viewed in conjunction with the need to de-risk mitigation pathways with energy-demand reductions.
at household energy consumption, as well as their implications for carbon emissions reduction, to inform upcoming climate change assessments.
Our findings extend previous meta-analytic literature in important ways. Our sample of 122 relevant primary studies is twice as large as previous analyses and is geographically diverse. Based on these studies, we have found a statistically robust, medium-sized average effect of interventions targeted at behavioural change in energy consumption in residential buildings. We also evaluated the effectiveness of subsets, and of combinations, of interventions. Our findings support the idea that behavioural interventions aimed at household energy consumption should not be looked at only individually, but rather as packages, to increase effectiveness.

Smart packaging can ensure that the overall effect of a portfolio of well-integrated interventions is larger than the separate effects when interventions are applied in isolation. We also translated the evidence on interventions in household energy consumption into estimates of CO2 reduction potential. We found that the interventions studied deliver on average a modest reduction of 0.35 GtCO2 yr−1 in global carbon emissions from residential buildings, although deploying the most effective packages of interventions, and including other interventions that target maintenance and the replacement or upgrade of heating and non-heating equipment, could result in greater reductions.
Interventions in household energy consumption are diverse
We performed a systematic review and meta-analysis of the literature (see Methods) on interventions in residential energy demand to reduce energy use by existing equipment. These interventions included monetary incentives that offer households a tangible financial reward for reducing energy consumption, and other behavioural interventions, such as nudging, appealing to norms, providing easily interpretable and credible information at the point of decision making, and improving the skills required to perform or forgo behaviours13. Following previous studies16,18,20, we classified the studied interventions into five categories: monetary incentives, information, feedback, social norms and motivation interventions (Fig. 1). We note that our focus in this study was on demand reductions through behavioural change that is mainly associated with the use and, to an extent, maintenance of existing equipment; we did not specifically look at interventions that target household actions relating to the replacement or upgrade of heating and non-heating equipment, or structural changes in buildings14.
We identified and coded a total of 122 relevant studies across disciplines and geographies. This is twice the number of studies included in previous meta-analyses (Supplementary Table 3). We extracted 360 effect sizes from these studies, an average of about three effects per study. Evidence is scattered unevenly across interventions. For example, more than half the studies in our sample involved forms of feedback, and only about one-fifth involved a motivation experiment (Table 1). Moreover, several of the studies evaluated not only the effects of individual interventions, but also combinations of two or more (see Supplementary Table 5 for descriptive statistics). This allowed us to assess, across a large body of evidence, to what extent these packages of interventions are synergistic, as proposed in the literature16,21,22.
Overall, our final sample represents research on a total of 1.1 million households across 25 countries. About half of the effect sizes in the sample come from studies in economics or business,
Fig. 1 | Typology of interventions in household energy consumption. Interventions in household energy consumption have been grouped into five categories for this analysis. The category 'monetary incentives' captures all interventions that involve a financial reward. Non-monetary interventions have been grouped into information, feedback, social norms and motivation43–50.

Monetary incentives (critical peak pricing, seasonal pricing, time-of-use pricing, real-time pricing, rewards and rebates). Time-of-use pricing aligns the prices faced by households with the underlying cost of supply, which is higher during peak demand periods43. Other interventions reward consumers for reducing peak-period consumption. Households are expected to reduce consumption as long as the financial savings from reduced consumption outweigh the costs of shifting or reducing consumption21.

Information (home audits, tips, reminders). These policies focus on promoting energy-saving behaviour by reducing the information deficit faced by households with activities and actions that can help reduce energy consumption17. The information provided may be general advice, like energy-saving tips and practices through workshops44 and mass media campaigns45, or tailored advice in the form of home audits46.

Feedback (historical comparisons, in-home displays). Feedback interventions are rooted in psychological research that posits that directing an individual's attention to a feedback–standard gap that is relevant to the individual can engender behavioural change16. Most experiments provide individuals with information about their energy use, drawing comparison with their historical consumption. The effect of feedback seems to depend on its frequency, medium and duration16,47.

Social comparison (home energy reports, norms-based comparison). Households are benchmarked against the performance of their social group20,48. Norm-based communication has been widely adopted by utilities in the form of home energy reports49, which seem to be effective in some cases even years after households received their initial reports50.

Motivation (commitment devices, goal setting, gamification). Social pressure has also been employed in the form of public pledges or commitments by households to practice energy-conserving behaviours17. Goal-setting interventions, in which households commit to reducing energy consumption by a certain percentage over the course of the experiment, are another type of commitment device18. Some recent experiments have used web-based gamified platforms or mobile apps to induce behavioural change.
about a fifth from psychology and around a fifth from engineering or technology literature. The earliest studies date back to the mid-1970s, but around half of the sample is from studies conducted after 2013. About 45% of the effect sizes come from studies in the United States, 25% from continental Europe and another 10% from the United Kingdom. The number of effect sizes from studies in Asia has increased recently and constitutes 10% of the sample, with the remaining 10% coming from Australia, Latin America, Africa and the Middle East. The mean (standard deviation) baseline consumption across studies was 7,399 (8,823) kWh yr−1 and the mean duration of the underlying experiments was 21.5 (26.7) weeks (Supplementary Table 4).
The studies in our sample reported effects in terms of the relative change in energy consumption, but the exact dependent variable and statistical technique employed varied across studies. To estimate the aggregate effect size, we first standardized the effects by converting the estimates of energy reduction reported in each study to Fisher's Z (ref. 23), and then used meta-analysis models to calculate the aggregate effect across studies (see Methods).
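As a concrete illustration of this standardization step, Fisher's Z and its approximate sampling variance can be computed as below. This is a generic sketch: the function name and example values are illustrative, and the mapping from each study's reported estimate to a correlation-type effect r is described in ref. 23 and Methods.

```python
import math

def fisher_z(r: float, n: int) -> tuple[float, float]:
    """Convert a correlation-type effect size r (from a study with
    n observations) to Fisher's Z and its approximate variance 1/(n - 3)."""
    z = 0.5 * math.log((1 + r) / (1 - r))  # equivalently math.atanh(r)
    return z, 1.0 / (n - 3)

# Illustrative values only: a small positive effect from a 500-household trial
z, var = fisher_z(0.15, 500)
```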
What interventions work best
Our analysis reveals a medium average effect size across all interventions. The estimated average effect (in terms of Fisher's Z) varies between 0.10 and 0.15, depending on the estimation model used, and is both statistically significant and substantive across model specifications (Table 2). The average effect size is 0.10 in a random effects model with the DerSimonian–Laird (DL) estimator and 0.15 in a random effects model with the restricted maximum likelihood (REML) estimator. The REML estimator is recommended when heterogeneity is large, as is the case in our sample24. We also estimated a multilevel model to account for the dependence between effect sizes coming from the same study. This also gave an average effect size of 0.15, although with a slightly larger confidence interval. While an average effect size of 0.10 can still be considered small at the level of a single household, it is relevant if scaled up; an average effect size of 0.15 indicates a medium effect and is considered consequential both at the single-household level and cumulatively over many households25,26. Removing studies that used pre-post analysis and including only those with control–treatment or difference-in-difference (DID) study designs reduced the average effect size from 0.15 to 0.13. The average effect size of only the studies that employed randomization is 0.14 (Table 2). Our main results are robust to influential study analysis and variance matrix specification (see Methods).
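The DL estimator mentioned above can be sketched in a few lines. This is the standard textbook implementation, not the paper's exact specification (which also includes REML and multilevel variants):

```python
def dersimonian_laird(effects, variances):
    """Pooled effect under a random-effects model with the
    DerSimonian-Laird estimator of between-study variance tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    mean_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mean_fe) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                    # DL tau^2, truncated at 0
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return pooled, se, tau2
```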
Note, however, that these results may be subject to publication bias, which has been reported in this literature before19,20. Both visual inspection of the funnel plot and Egger's test hint at the potential presence of small-study bias in our dataset as well (Supplementary Note 1).
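Egger's test amounts to regressing each standardized effect on its precision and checking whether the intercept differs from zero. A minimal sketch (illustrative; it omits the standard error and P value of the intercept that the full test reports):

```python
def egger_intercept(effects, ses):
    """Egger's regression: standardized effect on precision.
    A non-zero intercept hints at small-study (publication) bias."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1.0 / s for s in ses]                  # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx                      # OLS intercept
```

With a symmetric funnel (identical underlying effects, varying precision) the intercept is zero; asymmetry pushes it away from zero.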
Our analysis reveals differences in average effect sizes across individual interventions (Fig. 2a). Studies that solely focused on monetary incentives (0.21; 95% confidence interval (CI) = (0.12, 0.30)) have the highest average effect size. Amongst the non-monetary interventions, studies concerned only with information (0.15; 95% CI = (0.06, 0.24)) or feedback (0.12; 95% CI = (0.06, 0.19)) have a higher average effect size than studies concerned with social comparison (0.10; 95% CI = (0.00, 0.19)) and motivation (0.10; 95% CI = (0.00, 0.21)).
The analysis shows that many interventions are complementary, in that the effect of a combination of interventions is higher than the effect of the individual interventions (Fig. 2b). For studies that combined motivation, feedback and monetary incentives, the effect (0.48; 95% CI = (0.17, 0.79)) is additive, and is higher than the sum of the individual effect sizes (0.43). For other combinations, the interventions are complementary but not strictly additive, and the combined effect is lower than the sum of individual effects. For studies that combined feedback, social comparison and monetary interventions, the effect size (0.35; 95% CI = (0.12, 0.58)) is higher than the effect size for feedback, social comparison and monetary interventions studied individually, although it is lower than the sum of the individual effect sizes (0.43). Similarly, studies that combined information and feedback (0.17; 95% CI = (0.09, 0.25)) and studies that combined information and social comparison (0.18; 95% CI = (0.08, 0.28)) find an effect that is higher than that of the individual interventions.
In other cases, the effect size of the combination is about the same as the individual effects, indicating little gain from combining interventions. For example, the effect size of feedback and social comparison (0.12; 95% CI = (0.05, 0.19)) is about the same as that of feedback (0.12). The average effect size of studies with the combination of feedback, information and social comparison (0.13; 95% CI = (0.06, 0.20)) is slightly lower than that of information (0.15) and about the same as that of feedback (0.12). Interestingly, the effect size of the combination of feedback and monetary incentives (0.17; 95% CI = (0.08, 0.26)) is lower than the effect size of monetary incentives alone (0.21), suggesting non-complementarities in combining them.

Table 1 | Descriptive statistics of the included studies

                      No. of    % of total   Reduction in energy consumption (%)    Standardized effect size Z
                      effects   sample       Average   s.d.   Weighted average      Average   s.d.
Feedback              228       63.3         5.04      6.91   1.77                  0.13      0.21
Information           174       48.3         5.61      6.84   1.91                  0.17      0.24
Monetary incentives   74        20.6         6.12      6.43   1.59                  0.15      0.19
Motivation            73        20.3         9.51      9.73   1.87                  0.19      0.16
Social comparison     135       37.5         5.34      7.62   1.81                  0.13      0.21
All interventions     360                    6.24      7.42   1.92                  0.16      0.22

Table 2 | Estimated average treatment effect across model specifications

                                                        Average       95%           95% prediction
                                                        treatment     confidence    interval
                                                        effect        interval
Random effects model with DL estimator                  0.10          0.08, 0.11    0.02, 0.18
Random effects model with REML estimator                0.15          0.12, 0.17    −0.23, 0.53
Multilevel model with REML estimator                    0.15          0.12, 0.18    −0.22, 0.52
Multilevel model with REML estimator                    0.13          0.10, 0.16    −0.16, 0.42
(excluding pre-post studies)
Multilevel model with REML estimator                    0.14          0.11, 0.18    −0.20, 0.49
(excluding studies without randomization)
Overall, these results support the idea that interventions in household energy consumption should not be looked at only individually, but rather as packages to increase effectiveness13, although there might be trade-offs in combining them.
Explaining heterogeneity in effect sizes
The meta-analysis models used to estimate the aggregate treatment effects indicate a high degree of heterogeneity in effect sizes across studies (I2 is 94.12 for the DL model and 99.74 for the REML model). To understand what drives effect size heterogeneity, we performed a meta-regression controlling for a range of study characteristics, including region and time of study, study design and study-level controls (Table 3).
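For reference, the I2 statistic quoted here is derived from Cochran's Q and the number of studies k; a minimal sketch of the generic formula (not tied to the paper's exact models):

```python
def i_squared(q: float, k: int) -> float:
    """I^2: percentage of variability in effect sizes attributable to
    between-study heterogeneity rather than sampling error."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0
```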
The impact of household interventions varies across regions and countries. Our results show that, compared with studies from the United States, studies from Asia involving monetary incentives report a higher effect size that is also statistically significant (Table 3). While the underlying cause of this heterogeneity is not clear, it could be on account of differences in, for example, the size of the monetary incentive, energy costs and the use of electricity for heating and cooling; this needs further research. Studies from the United Kingdom report lower effect sizes for motivation, monetary incentives and feedback studies compared with the United States. The average effect size reported by studies from continental Europe as compared with the United States is higher for monetary incentives but smaller for social comparison (as has also been reported in some primary studies27), although the coefficients are not statistically significant in either case.
The design of studies also affects the effect sizes they report. The primary studies in our dataset either compared the electricity consumption of households before and after an intervention (pre-post), or across control–treatment groups, or both before and after the intervention and across treatment groups (DID). The control–treatment and DID studies report lower average effect sizes: the coefficients of these moderator variables are statistically significant and negative when all interventions are considered together, and for all subsets of interventions except motivation interventions.
We also found differences in the reported outcomes according to the statistical methods employed in the underlying studies. Studies that employed a difference of means instead of regression designs (ordinary least squares (OLS) or household/time fixed effects panels) show higher effect sizes on average. Because electricity demand and prices are correlated with weather conditions, weather could be an important confounding factor, especially in time-of-use pricing interventions. Indeed, the coefficient of the weather variable is negative and statistically significant for monetary interventions, indicating the need to control for weather in primary studies. We did not find a statistically significant impact of other control variables such as household size and demographics.
Household selection also affects study outcomes. With monetary incentives, which have the largest effect size, the average effect is lower in studies in which households were not required to opt into the experiment: the coefficient of the opted-in moderator variable is positive and statistically significant. There are no statistically significant differences in the results between studies that employed randomization and studies that did not, except in the case of feedback interventions.
Our results show that studies with longer treatment durations tend to find smaller effects on average for feedback and monetary incentives, although the coefficient is quite small. However, long-term studies are scarce: the mean (median) treatment duration in our sample was only 21.5 (12) weeks, indicating the need for long-term trials. Our meta-regression indicates that the average effect reported by newer studies is lower, although the trend is statistically significant only for feedback studies. Even though we controlled for quality and study design, there could be further differences in study design over time that we are unable to capture.
The five-category classification of interventions shown in Fig. 1 contains subgroups of interventions for which effects might differ. For the group of interventions classified under monetary incentives, we introduced a moderator variable for monetary rewards, which are distinct from the other types of monetary incentives that are concerned with pricing. The coefficient of rewards is negative, indicating that rewards may not be as powerful in influencing
Fig. 2 | Estimated average effect size of different categories of reviewed interventions. a, The average effect size (Fisher's Z) for individual interventions (monetary incentives, information, feedback, social comparison and motivation) along with the corresponding 95% confidence interval. b, The average effect size for combinations of interventions, plotted against the number of interventions used (1–4) and against the overall average of 0.14, distinguishing combinations where the average effect is higher than the individual parts from those where it is lower. Only combinations with an average effect size that is statistically significant at the 5% level of significance are labelled. Z > 0 implies a reduction in energy consumption. The results are from a multilevel meta-regression model with interacted dummy variables for the five interventions. Only studies that employed randomization are included.
household behaviour as pricing, but the effect is not statistically significant. For the category of interventions tagged as feedback, we introduced a moderator variable for studies that deployed in-home displays. A negative coefficient, but one that is small and not statistically significant, indicates that such devices may not necessarily be a better means of providing feedback. It should, however, be noted that in-home displays come with different functionalities and their impact might vary, something we are not able to capture here. For motivation interventions, we introduced moderator variables for studies that employed gamification or commitment devices in conjunction with, or instead of, simple goal setting. We found that these subcategories of interventions report higher effect sizes on average. While the number of studies using such experiments is small (11 and 4 effect sizes for gamification and commitment devices, respectively), it points to a promising new area of research.
Emissions reductions from behavioural interventions

To address the lack of synthetic evidence on demand-side solutions1, we extended our meta-analysis to provide an initial estimate of the carbon emissions mitigation potential of the studied interventions for climate change assessments. We did this by using the percentage reduction in electricity consumption as the dependent variable in our meta-analytical models along with the aggregate emissions of residential buildings (see Methods). Interventions aimed at changing household behaviour deliver on average a reduction of 0.35 GtCO2 yr−1 (95% CI = (0.17, 0.43)) in global carbon emissions from residential buildings. While the estimated emissions reductions are relevant in size, they are very modest compared with the approximately 5.6 GtCO2 of emissions from residential buildings in 2018. Cumulatively, the emissions reductions would add up to 1.05–1.75 GtCO2 if these effects persist over 3–5 years. We do not consider reductions over a longer period because of doubts over the persistence of these effects.
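The headline number is consistent with a simple scaling of the mean percentage reduction across studies (Table 1) by aggregate emissions from residential buildings; the sketch below is a back-of-envelope illustration only, as the actual estimate is derived from the meta-analytical models (see Methods):

```python
# Illustrative back-of-envelope; inputs taken from the text and Table 1
RESIDENTIAL_EMISSIONS_GT = 5.6   # GtCO2 from residential buildings in 2018
MEAN_REDUCTION_PCT = 6.24        # mean reported reduction in consumption (%)

annual_saving_gt = RESIDENTIAL_EMISSIONS_GT * MEAN_REDUCTION_PCT / 100
# Cumulative savings if the effect persists for 3-5 years
cumulative_gt = [annual_saving_gt * years for years in (3, 5)]
```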
The reductions in emissions could be higher if interactions between the various interventions highlighted in the previous sections, between injunctive and descriptive norms28, and between social norms, behavioural interventions and infrastructure provisions29 are considered while devising policies. The cost effectiveness of a basket of interventions should also be assessed by considering the costs of the different interventions (monetary incentives, for example, could entail higher infrastructure and regulatory costs). Further, our estimate is based on the current average emissions intensity of electricity grids, but it would increase if the reductions in energy demand were to reduce generation from coal power plants at the margin, as has been the case in the recent COVID-induced demand reductions30. Our estimate only considers the reduction in energy consumption from household interventions, not the shift in consumption from peak to non-peak hours, which could reduce electricity consumption during peak carbon emissions hours by up to 10% (ref. 31). Finally, our moderator variable analysis does not find substantial differences in the effectiveness of interventions across regions, and it is reasonable to expect that interventions in energy demand can temper the rapid growth of energy demand in developing countries in Asia and sub-Saharan Africa, leading to higher savings in emissions.
Table 3 | Results from the meta-regression random effects model with REML estimator

                                          All     Feedback   Information   Monetary incentives   Motivation   Social comparison
Intercept 5.45* 7.21** 3.31 1.26 2.53 6.65
Study design: DID −0.15*** −0.25*** −0.19*** −0.08 0.20** −0.30***
Study design: control–treatment −0.10* −0.23*** −0.17** −0.03 0.19* −0.30***
Statistical method: difference of means 0.11** 0.11** 0.19** −0.02 0.00 0.16**
Statistical method: panel effects 0.02 −0.02 −0.02 0.14 −0.03 −0.04
Weather −0.01 0.01 0.00 −0.19* 0.13** 0.06
Household type 0.00 0.03 0.05 0.30 −0.05 0.00
Residence type (demographics) −0.03 −0.03 −0.06 −0.20 0.01 −0.03
Opted-in: yes 0.04 0.02 0.03 0.17** −0.10 −0.08
Randomization: yes 0.03 0.09* 0.04 −0.05 0.06 −0.01
Intervention treatment period −0.001* −0.002*** −0.002 −0.004** −0.001 −0.001
Region: Asia 0.15*** 0.03 0.04 0.16 0.01 0.06
Region: United Kingdom 0.03 −0.07 0.03 −0.19 −0.22* 0.04
Region: continental Europe 0.02 0.01 0.05 0.13 −0.04 −0.07
Region: others 0.03 −0.01 0.08 −0.12 0.06 −0.08
Study year −0.003* −0.003** −0.002 −0.001 −0.001 −0.003
In-home display −0.05
Rewards −0.03
Commitment strategies 0.25**
Gamification 0.08*
No. of effects 324 198 150 71 72 115
I2 99.41 99.09 99.58 97.34 24.54 99.38
R2 29.48 56.98 37.92 41.38 93.09 55.68
The dependent variable is Fisher’s Z calculated for each effect size. Z > 0 implies a reduction in energy consumption. The results from the multilevel model are broadly consistent and are given in Supplementary Table 6. ***P < 0.001; **P < 0.01; *P < 0.05.
Discussion
We have performed an interdisciplinary meta-analysis of the effectiveness of interventions in household energy consumption, comprising 122 primary studies yielding 360 effect sizes that represent 1.1 million households in 25 countries. Our results reveal, on average, a medium-sized impact of interventions targeted towards fostering behavioural change in residential energy use among households.
The effect is robust across the meta-analytical models and subsets of interventions. For the studies included in this analysis, monetary incentives have the highest average effect size, while motivation and social comparison have lower average effect sizes. However, we note that this depends on the size of the monetary incentives. For example, a lower or higher peak to off-peak price ratio can decrease or increase the effectiveness of dynamic pricing32. Similarly, increasing or decreasing the timing and frequency of feedback can improve or impair the outcomes of feedback interventions16,17. The design parameters of the primary studies and their location can also have a bearing on the outcomes of interventions14. Because the relative effects of interventions may vary with context, comparisons between the average effect sizes of different interventions should be interpreted with caution.
Our findings support the idea that such interventions should not be looked at only individually but rather as synergistic packages to increase effectiveness. Smart packaging can ensure that the overall effect of a portfolio of well-integrated interventions is larger than the effect of the individual interventions. Although some mechanisms relating to how various interventions interact have been suggested in the literature13,22, further research is needed to test these hypotheses, to harvest synergies and to avoid possible trade-offs when combining interventions.
Beyond the interactions between interventions, we also used meta-regressions to investigate the heterogeneity of effect sizes.
The moderator variable analysis points towards possibly lower effects for interventions implemented at scale, owing to self-selection bias, a concern that has also been noted in primary studies33. Our analysis also highlights the need for more long-term trials using rigorous methodology and controls for contiguous factors. We were unable to assess the persistence of effects after the treatment period, which is critical especially for non-monetary interventions, but to an extent also for monetary incentives. This is because the studies did not always include follow-up periods, and even when they did, they were not consistent in the energy consumption metric and the comparator used (follow-up period consumption relative to treatment period consumption or to baseline consumption).
We drew on our meta-analysis to estimate a global emissions reduction potential of 0.35 GtCO2 yr−1 through interventions aimed at behavioural change in the use of existing equipment in residential buildings. This estimate is a first attempt to quantitatively evaluate possible emissions reductions from behavioural interventions based on meta-analytic evidence, and it needs to be refined further, for example, with regional- and country-specific estimates as the data become available.
Our initial estimate highlights the limited effect of behavioural interventions compared with the emission reductions required on the way to net-zero emissions8. However, the interventions considered in this paper do not cover all the interventions targeting household behaviours that are relevant to climate change mitigation.
Most importantly, there is a series of interventions that specifically target the replacement or upgrade of heating and non-heating equipment, or structural changes in buildings, that could be associated with more substantial emission reductions, but these were not the focus of this study. Previous analysis for the United States suggests stronger impacts of these interventions on emissions, and their careful design can also yield higher behavioural plasticity14. A comprehensive meta-analysis of the full spectrum of interventions available to building residents would fill an important research gap.
The IPCC has started giving more consideration to demand-side solutions and it is important that the scientific community supports this attempt by providing rigorous, gold-standard assessments of the available evidence.
Methods
Literature search and data extraction. Our data collection strategy involved (1) a search of the relevant existing literature reviews and the studies referenced in them, (2) string-based searches of bibliographic databases and (3) searches of grey literature on Google. In accordance with the guidance on rigorous evidence syntheses, we searched a broad set of bibliographic databases (Web of Science Core Collections Citation Indexes, Scopus, JSTOR, MEDLINE) and the web-based academic search engine Google Scholar, based on a comprehensive search string.
The Web of Science Core Collection Citation Indexes included in our search were as follows: Science Citation Index Expanded (SCI-Expanded) – 1900 to present, Social Sciences Citation Index (SSCI) – 1900 to present, Arts & Humanities Citation Index (A&HCI) – 1975 to present, Conference Proceedings Citation Index - Science (CPCI-S) – 1990 to present, Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) – 1990 to present, Emerging Sources Citation Index (ESCI) – 2015 to present.
We developed a search string that followed the PICOS (population, intervention, comparator, outcome and study design) logic recommended by the Campbell Collaboration (Supplementary Table 1). The search string was developed iteratively by checking the results of the search against a set of studies of known relevance. We searched for articles that dealt with household energy (or electricity) consumption along with one or more of the interventions of interest (see Supplementary Table 2 for the inclusion/exclusion criteria). We only tagged as relevant studies that dealt with the energy consumption of households or dormitories and contained a quantitative estimate of the energy saved through a relevant intervention. We did not include studies that focused on price effects but only referenced load effects (changes in kW and not kWh) or those that only reported the effect on peak consumption and not total consumption. Studies that only provided an effect size but not the associated variance were not included in the final synthesis. In addition, studies in which no obvious comparator group was available (untreated control group or pre-intervention data) or in which the sample size was too small to extract meaningful estimates were excluded from the analysis.
Studies that only measured the energy consumption of certain appliances or the direct load control of appliances by utilities were not considered. We reviewed quantitative studies aimed at behaviours that reduce energy demand mainly by leveraging the use and maintenance of existing equipment in residences, rather than through the adoption of long-lasting structural changes that reduce carbon emissions, such as energy-efficient appliances, building insulation or switching to green sources of energy (for example, installing solar panels or switching to a greener supplier). Three of the included studies involved the provision of information/tips to households for reducing energy consumption through both home improvements and behavioural changes (Supplementary Table 7); excluding these studies from the analysis did not materially change the results.
Because we did not make any exclusions based on the date, methodology or the field of publication, the searches returned a large number of studies (64,931) after removing duplicates. To enable the screening of relevant papers, we applied a machine learning algorithm using support vector machines34 to rank the studies in the order of relevance of their abstracts (Supplementary Note 2). A team of four reviewers then manually screened the abstracts of the top 6,023 studies. Full text screening was performed on a selection of 939 studies deemed relevant from this initial screening. The final sample included 122 studies after critical appraisal.
The ROSES (RepOrting standards for Systematic Evidence Syntheses)35 flowchart for screening and coding is available in Supplementary Fig. 1, and the complete list of studies included in the analysis is provided in Supplementary Table 7. Four reviewers extracted the relevant data from these studies using the rules laid out in a codebook. To ensure consistency, a sample of 50 studies was screened at the abstract level (Kappa = 0.77). The reviewers next carried out full text screening and coded the relevant papers from this sample, then discussed the coded fields to identify any disagreements and finally made suitable adjustments to the codebook. A single reviewer double-checked the final data collected from all the included studies. We used the NACSOS software36 to manage the search results, remove duplicates, screen records and extract data.
Standardizing effect sizes. Although the dependent variable in the studies in our sample was uniform (the relative change in energy consumption), the exact functional form and the precision of the estimates varied across studies. Because most of the original studies employed regression analysis, following convention23, we standardized the effects by first converting the regression coefficients extracted from the studies into correlation coefficients r using the total sample size, which we then converted to Fisher's Z. For studies that employed the difference-of-means design, we first calculated the standardized mean differences (Cohen's d) and then converted them to Fisher's Z. The conversions were performed using the standard formulae prescribed by Ringquist23.
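These conversions follow standard textbook formulae23. As a stdlib-only Python sketch (the paper's own analysis was done in R), and assuming the regression studies report a t-statistic with residual degrees of freedom and that the d-to-r conversion applies to equal-sized groups, the two conversion paths might look like:

```python
import math

def r_from_t(t, df):
    """Convert a regression t-statistic to a correlation coefficient r."""
    return t / math.sqrt(t**2 + df)

def r_from_d(d):
    """Convert Cohen's d to r (equal-sized groups assumed)."""
    return d / math.sqrt(d**2 + 4)

def fishers_z(r):
    """Fisher's variance-stabilizing transformation of r."""
    return 0.5 * math.log((1 + r) / (1 - r))

def var_z(n):
    """Sampling variance of Fisher's Z for total sample size n."""
    return 1 / (n - 3)

# Illustrative values, not taken from any primary study:
# a regression-based study reporting t = -2.5 with 400 residual df
r = r_from_t(-2.5, 400)
z = fishers_z(r)
```

A negative t-statistic (energy savings) yields a negative r and hence a negative Fisher's Z, so the sign convention of the primary study carries through unchanged.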
Synthesis. To estimate the aggregate effect size, we first standardized the effects by converting the estimates reported by each study to Fisher's Z (ref. 23). We used a random effects model to aggregate the standardized Fisher's Z from the original studies. A random effects model is appropriate when effect sizes in primary studies do not consistently converge to a central population mean23,37, which is certainly the case for studies relating to energy consumption in households with heterogeneous treatment effects20. We used the metafor package in R (ref. 38) to implement the random effects model with the DL (DerSimonian–Laird) and REML (restricted maximum likelihood) estimators.
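The random-effects aggregation can be made concrete with a stdlib-only Python sketch of the DL method-of-moments estimator; the actual analysis used metafor's implementations, so this is illustration only:

```python
import math

def dersimonian_laird(effects, variances):
    """Method-of-moments estimate of tau^2 and the pooled random-effects mean."""
    w = [1 / v for v in variances]          # fixed-effect (inverse-variance) weights
    sw, sw2 = sum(w), sum(wi**2 for wi in w)
    mean_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q measures excess dispersion around the fixed-effect mean
    q = sum(wi * (y - mean_fe)**2 for wi, y in zip(w, effects))
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / (sw - sw2 / sw))
    # Random-effects weights incorporate the between-study variance tau^2
    w_re = [1 / (v + tau2) for v in variances]
    mean_re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return tau2, mean_re, se_re
```

When the effects are perfectly homogeneous, Q falls below k − 1, tau² is truncated at zero and the estimator collapses to the fixed-effect mean, which is one reason REML is preferred when heterogeneity matters.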
Although the DL method is relatively simple and popular, it can severely underestimate the variance when the number of studies is limited or the heterogeneity is large. REML is often recommended instead, especially when the heterogeneity is relatively high24, so the REML estimator was preferred here. To check that no single study exerted undue influence on the aggregate effect sizes, we followed best practice by calculating three influence metrics, that is, Cook's distance, the covariance ratio and tau2, using the influence function in the metafor package in R. This function calculates the value of these metrics for each effect size included in the analysis.
The values of these metrics were distinctly different for eight effects belonging to four studies, which were therefore suspected of being influential observations. Dropping these observations reduced the estimated average effect size to 0.08–0.12, but the results remained statistically significant, and the estimate of tau2 decreased, leading to a smaller prediction interval.
The ordinary random effects model is inappropriate when the effect sizes included are not statistically independent23. Effect sizes are likely to be dependent in our sample because we extracted multiple effect sizes from each study. In addition, several of the studies in our set employed multiple treatments, and some used data from the same underlying experiments. We employed a multilevel meta-analysis model to account for such dependence. The multilevel model explicitly assumes that several of the effect sizes come from the same study. For the multilevel analysis we used the default variance–covariance structure in the metafor package38. To test the robustness of our findings, we also applied cluster-robust variance estimation, using the clubSandwich package in R to estimate the variance–covariance matrix. The results presented in the main paper were robust to the use of these methods.
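The core idea of cluster-robust variance estimation is to aggregate weighted residuals within each study before squaring them, so that correlated effects from one study do not masquerade as independent evidence. The sketch below implements a CR0-type estimator for a simple weighted mean; `cluster_robust_se` is an illustrative helper and a considerable simplification of what the clubSandwich package computes for the full multilevel model:

```python
import math
from collections import defaultdict

def cluster_robust_se(effects, weights, study_ids):
    """CR0-type robust standard error for a weighted-mean effect size,
    treating effect sizes from the same study as one cluster."""
    sw = sum(weights)
    theta = sum(w * y for w, y in zip(weights, effects)) / sw
    # Sum weighted residuals within each cluster, then square the cluster sums
    cluster_sums = defaultdict(float)
    for y, w, sid in zip(effects, weights, study_ids):
        cluster_sums[sid] += w * (y - theta)
    var = sum(s**2 for s in cluster_sums.values()) / sw**2
    return theta, math.sqrt(var)
```

Two effects from the same study whose residuals offset each other contribute nothing to the variance, whereas the naive formula would count them twice; this is precisely the dependence the multilevel model is meant to absorb.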
Interaction effects between the various interventions were estimated by including treatment type (monetary incentives, information, feedback, social comparison and motivation) as interacted dummy variables in the multilevel model.
The resulting output gives the estimated effect when a single intervention is applied alone and estimates for all possible combinations of effects seen in the dataset.
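The coding of treatment combinations as interacted dummies can be made concrete with a small sketch; `interaction_design` is a hypothetical helper (the actual estimation was done in metafor), shown only to illustrate how main-effect and pairwise-interaction columns are constructed from per-observation treatment sets:

```python
from itertools import combinations

def interaction_design(rows, treatments):
    """Expand per-observation treatment sets into main-effect and
    pairwise-interaction dummy columns (illustrative helper)."""
    cols = list(treatments) + [f"{a}x{b}" for a, b in combinations(treatments, 2)]
    matrix = []
    for r in rows:
        main = [int(t in r) for t in treatments]
        inter = [int(a in r and b in r) for a, b in combinations(treatments, 2)]
        matrix.append(main + inter)
    return cols, matrix
```

An observation receiving both feedback and a monetary incentive then activates both main-effect dummies plus their interaction column, so the fitted coefficient on the interaction captures the departure from additivity of the two interventions.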
Moderator variables for effect heterogeneity. The meta-regression models used to investigate the causes of heterogeneity in effect sizes were estimated using the random effects and multilevel models and by introducing moderator variables into the estimation equation. Mathematically, the interpretation of a parameter estimate on a moderator variable in a meta-regression is the same as for a parameter estimate from a traditional regression, that is, it represents the average change in the effect size associated with a one-unit change in the moderator.
Moderator variables could represent factors that genuinely affect the magnitude of the relationship between the focal predictor and the outcome of interest, or could represent the design elements of the original studies that may affect the effect size from coded studies23. In this study we included both types of moderator variables.
The design elements of the original studies were captured as dummy variables: weather controls (assigned the value 1 if a study controls for any aspect of weather), demographic controls (1 if a study controls for demographic variables, such as age, income and composition of the family), residence controls (1 if a study controls for the characteristics of the house, such as size) and randomization (1 if households are assigned randomly between interventions). We also included as moderator variables the study design (difference-in-difference, control–treatment or pre–post) and the statistical method (panel regression, OLS regression or difference-of-means tests) employed in the studies. Other moderator variables captured factors that were likely to affect the relationship between energy use and treatment, for example, the duration of the experiment or the region in which it was performed.
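For a single moderator with fixed weights, the meta-regression coefficient has a closed form, which makes the "average change in effect size per one-unit change in the moderator" interpretation concrete. This sketch is a simplification of the random-effects and multilevel meta-regressions actually fitted with metafor (there, tau² is itself estimated rather than supplied):

```python
def meta_regression_slope(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares fit of effect size on one moderator.
    Weights are inverse-variance, optionally inflated by tau^2."""
    w = [1 / (v + tau2) for v in variances]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    num = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, moderator, effects))
    den = sum(wi * (x - xbar)**2 for wi, x in zip(w, moderator))
    beta1 = num / den          # average change in effect size per unit of moderator
    beta0 = ybar - beta1 * xbar
    return beta0, beta1
```

With a binary moderator such as the randomization dummy, beta1 is simply the weighted difference in mean effect size between the two subgroups, and beta0 is the weighted mean of the reference group.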
Emissions reductions. To calculate the mitigation wedge, we used the data on direct and indirect CO2 emissions of buildings from the International Energy Agency (IEA)39. The emissions reduction was calculated by multiplying the estimated CO2 emissions of residential buildings in 2018 (5.6 Gt) by the average percentage reduction in the energy consumption of households due to interventions computed in the meta-analysis models. The meta-analysis models for this part were run using the percentage change in energy consumption reported in primary studies as the dependent variable. The corresponding variance was approximated using the square root of the sample size20. The weighted percentage reduction in energy consumption corresponding to the weights of the meta-analysis models was estimated to be 6.3% (95% CI = (4.9%, 7.7%)).
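The wedge calculation itself is a one-line multiplication of the numbers reported above:

```python
residential_co2_2018 = 5.6                        # GtCO2, direct + indirect (IEA)
reduction, ci_low, ci_high = 0.063, 0.049, 0.077  # weighted reduction and 95% CI

wedge = residential_co2_2018 * reduction          # central estimate, GtCO2 per year
wedge_ci = (residential_co2_2018 * ci_low, residential_co2_2018 * ci_high)
# 5.6 * 0.063 = 0.3528, which rounds to the 0.35 GtCO2 yr-1 quoted in the text
```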
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its Supplementary Information and on GitHub (https://github.com/tarun-hertie/Household-Interventions). All the information collected in this project is publicly available in line with the systematic reviews reporting protocol40,41, providing the transparency and reproducibility required to conform with Open Synthesis principles42 (see the ROSES checklist35). Source data are provided with this paper.
Code availability
We used the NACSOS software36 to manage search results, remove duplicates, screen records and extract data, and the metafor package in R (ref. 38) for the meta-regressions. All the software packages used are open source and freely accessible. The code developed for the paper and its Supplementary Information is available on GitHub (https://github.com/tarun-hertie/Household-Interventions).
Received: 12 December 2020; Accepted: 15 June 2021; Published: xx xx xxxx
References
1. Creutzig, F. et al. Beyond technology: demand-side solutions for climate change mitigation. Annu. Rev. Environ. Resour. 41, 173–198 (2016).
2. Grubler, A. et al. A low energy demand scenario for meeting the 1.5 °C target and sustainable development goals without negative emission technologies. Nat. Energy 3, 515–527 (2018).
3. van Vuuren, D. P. et al. Alternative pathways to the 1.5 °C target reduce the need for negative emission technologies. Nat. Clim. Change 8, 391–397 (2018).
4. von Stechow, C. et al. 2 °C and SDGs: united they stand, divided they fall? Environ. Res. Lett. 11, 34022 (2016).
5. The Critical Role of Buildings (IEA, 2019); https://www.iea.org/reports/the-critical-role-of-buildings
6. Tracking Buildings 2020 (IEA, 2020); https://www.iea.org/reports/tracking-buildings-2020
7. Ürge-Vorsatz, D. et al. in Climate Change 2014: Mitigation of Climate Change (eds Edenhofer, O. et al.) 671–738 (IPCC, Cambridge Univ. Press, 2014).
8. IPCC Special Report on Global Warming of 1.5 °C (eds Masson-Delmotte, V. et al.) (WMO, 2018).
9. von Stechow, C. et al. Integrating global climate change mitigation goals with other sustainability objectives: a synthesis. Annu. Rev. Environ. Resour. 40, 363–394 (2015).
10. Ivanova, D. et al. Quantifying the potential for climate change mitigation of consumption options. Environ. Res. Lett. 15, 9 (2020).
11. Hahn, R. & Metcalfe, R. The impact of behavioral science experiments on energy policy. Econ. Energy Environ. Policy 5, 1–23 (2016).
12. Faruqui, A., Arritt, K. & Sergici, S. The impact of advanced metering infrastructure on energy conservation: a case study of two utilities. Electr. J. 30, 56–63 (2017).
13. Stern, P. C. A reexamination on how behavioral interventions can promote household action to limit climate change. Nat. Commun. 11, 918 (2020).
14. Dietz, T., Gardner, G. T., Gilligan, J., Stern, P. C. & Vandenbergh, M. P. Household actions can provide a behavioral wedge to rapidly reduce US carbon emissions. Proc. Natl Acad. Sci. USA 106, 18452–18456 (2009).
15. Faruqui, A. & Sergici, S. Arcturus: international evidence on dynamic pricing. Electr. J. 26, 55–65 (2013).
16. Karlin, B., Zinger, J. F. & Ford, R. The effects of feedback on energy conservation: a meta-analysis. Psychol. Bull. 141, 1205–1227 (2015).
17. Abrahamse, W., Steg, L., Vlek, C. & Rothengatter, T. A review of intervention studies aimed at household energy conservation. J. Environ. Psychol. 25, 273–291 (2005).
18. Andor, M. A. & Fels, K. M. Behavioral economics and energy conservation – a systematic review of non-price interventions and their causal effects. Ecol. Econ. 148, 178–210 (2018).
19. Nisa, C. F., Bélanger, J. J., Schumpe, B. M. & Faller, D. G. Meta-analysis of randomised controlled trials testing behavioural interventions to promote household action on climate change. Nat. Commun. 10, 4545 (2019).
20. Delmas, M. A., Fischlein, M. & Asensio, O. I. Information strategies and energy conservation behavior: a meta-analysis of experimental studies from 1975 to 2012. Energy Policy 61, 729–739 (2013).
21. Chen, V. L., Delmas, M. A., Locke, S. L. & Singh, A. Dataset on information strategies for energy conservation: a field experiment in India. Data Brief 16, 713–716 (2018).
22. Wolske, K. S. & Stern, P. C. in Psychology and Climate Change: Human Perceptions, Impacts, and Responses (eds Clayton, S. & Manning, C.) 127–160 (Academic Press, 2018); https://doi.org/10.1016/B978-0-12-813130-5.00007-2
23. Ringquist, E. J. Meta-Analysis for Public Management and Policy (Wiley, 2013).
24. van der Linden, S. & Goldberg, M. H. Alternative meta-analysis of behavioral interventions to promote action on climate change yields different conclusions. Nat. Commun. 11, 3915 (2020).
25. Funder, D. C. & Ozer, D. J. Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2, 156–168 (2019).
26. Abelson, R. P. A variance explanation paradox: when a little is a lot. Psychol. Bull. 97, 129–133 (1985).
27. Andor, M. A., Gerster, A., Peters, J. & Schmidt, C. M. Social norms and energy conservation beyond the US. J. Environ. Econ. Manage. 103, 102351 (2017).
28. Bonan, J., Cattaneo, C., d'Adda, G. & Tavoni, M. The interaction of descriptive and injunctive social norms in promoting energy conservation. Nat. Energy 5, 900–909 (2020).
29. Javaid, A., Creutzig, F. & Bamberg, S. Determinants of low-carbon transport mode adoption: systematic review of reviews. Environ. Res. Lett. 15, 103002 (2020).
30. Bertram, C. et al. COVID-19-induced low power demand and market forces starkly reduce CO2 emissions. Nat. Clim. Change 11, 193–196 (2021).
31. Faruqui, A. & Sergici, S. Household response to dynamic pricing of electricity—a survey of the empirical evidence. J. Regul. Econ. 38, 193–225 (2010).
32. Faruqui, A. & Palmer, J. The discovery of price responsiveness—a survey of experiments involving dynamic pricing of electricity. Preprint at https://doi.org/10.2139/ssrn.2020587 (2012).
33. Löschel, A., Rodemeier, M. & Werthschulte, M. When nudges fail to scale: field experimental evidence from goal setting on mobile phones. CESifo working paper no. 8485. Preprint at https://doi.org/10.2139/ssrn.3693673 (2020).
34. Callaghan, M. W. & Müller-Hansen, F. Statistical stopping criteria for automated screening in systematic reviews. Syst. Rev. 9, 273 (2020).
35. Haddaway, N. R., Macura, B., Whaley, P. & Pullin, A. S. ROSES RepOrting standards for Systematic Evidence Syntheses: pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps. Environ. Evid. 7, 7 (2018).
36. Callaghan, M., Müller-Hansen, F., Hilaire, J. & Lee, Y. T. NACSOS: NLP Assisted Classification, Synthesis and Online Screening Version v0.1.0 (2020); https://doi.org/10.5281/zenodo.4121525
37. Nelson, J. P. & Kennedy, P. E. The use (and abuse) of meta-analysis in environmental and natural resource economics: an assessment. Environ. Resour. Econ. 42, 345–377 (2009).
38. Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).
39. CO2 Emissions from Fuel Combustion: Overview (IEA, 2020); https://www.iea.org/reports/co2-emissions-from-fuel-combustion-overview
40. Haddaway, N. R. & Macura, B. The role of reporting standards in producing robust literature reviews. Nat. Clim. Change 8, 444–447 (2018).
41. Guidelines for Systematic Review and Evidence Synthesis in Environmental Management Version 4.2 (Collaboration for Environmental Evidence, 2013); www.environmentalevidence.org/Documents/Guidelines/Guidelines4.2.pdf
42. Haddaway, N. R. Open Synthesis: on the need for evidence synthesis to embrace Open Science. Environ. Evid. 7, 4–8 (2018).
43. Sexton, R. J. & Sexton, T. A. Theoretical and methodological perspectives on consumer response to electricity information. J. Consum. Aff. 21, 238–257 (1987).
44. Gaskell, G. & Pike, R. Residential energy use: an investigation of consumers and conservation strategies. J. Consum. Policy 6, 285–302 (1983).
45. Winett, R. A., Leckliter, I. N., Chinn, D. E., Stahl, B. & Love, S. Q. Effects of television modeling on residential energy conservation. J. Appl. Behav. Anal. 18, 33–44 (1985).
46. Winett, R. A., Love, S. Q. & Kidd, C. The effectiveness of an energy specialist and extension agents in promoting summer energy conservation by home visits. J. Environ. Syst. 12, 61–70 (1982).
47. Fischer, C. Feedback on household electricity consumption: a tool for saving energy? Energy Effic. 1, 79–104 (2008).
48. Abrahamse, W. & Shwom, R. Domestic energy consumption and climate change mitigation. Wiley Interdiscip. Rev. Clim. Change 9, 1–16 (2018).
49. Allcott, H. Social norms and energy conservation. J. Public Econ. 95, 1082–1095 (2011).
50. Allcott, H. & Rogers, T. The short-run and long-run effects of behavioral interventions: experimental evidence from energy conservation. Am. Econ. Rev. 104, 3003–3037 (2014).
Acknowledgements
Funding by the German Ministry for Education and Research (BMBF) from the START (ref. no. 03EK3046B, A.L. and J.C.M.) and Ariadne (ref. no. 03SFK5, J.C.M.) projects is gratefully acknowledged. T.M.K. is supported by a PhD stipend from the Hertie School.
Author contributions
J.C.M., T.M.K. and M.M.Z.D. designed the research. N.R.H., T.M.K. and M.M.Z.D. developed the literature screening strategy. T.M.K., M.M.Z.D., S.L., A.J. and H.G. manually screened the literature and collected the data. M.C. performed the machine learning-enabled screening. T.M.K. performed the meta-analysis. T.M.K., J.C.M., F.C., N.K., L.H., A.L. and G.B. analysed the results. T.M.K. and J.C.M. wrote the manuscript with contributions from all authors.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41560-021-00866-x.
Correspondence and requests for materials should be addressed to J.C.M.
Peer review information Nature Energy thanks Paul Stern, Massimo Tavoni and Jeroen van den Bergh for their contribution to the peer review of this work.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2021
April 2020 Corresponding author(s): Jan Minx
Last updated by author(s): 25/05/2021
Reporting Summary
Software and code
Data collection: To enable the screening of relevant papers, we applied a machine learning algorithm using support vector machines to rank the studies in the order of relevance of their abstracts. Data were collected manually from the underlying studies.
Data analysis: Analysis was done using R (version 4.0.3, 2020-10-10) and the metafor package (development version, Oct 2020). Code is available as supplementary information on GitHub.
Data
All the data collected from the underlying studies are available on the GitHub page. There are no restrictions on the availability of these data. The data regarding buildings emissions were taken from the IEA (2020) Tracking Buildings report.