Interpretable and sample-efficient machine learning enables the joint optimization of energy efficiency and indoor comfort in residential buildings

More than half of the world's population lives in cities, which consume about 75% of global energy and produce 80% of greenhouse gas emissions [1]. In China, the building sector is the second largest energy consumer, with residential buildings accounting for over 38% of total energy consumption [2]. Space heating and cooling loads dominate annual operational energy consumption, particularly in climates with harsh winters or hot, humid summers [3]. This excessive energy demand is largely due to suboptimal building morphology and poorly controlled heat transfer mechanisms at the envelope level during the design phase. Therefore, improving thermal performance through early optimization of architectural form is crucial for achieving energy-efficient and climate-adaptive building design.

According to the standard of climatic regionalization for architecture, China is divided into five main climatic zones: severe cold, cold, hot summer and cold winter, hot summer and warm winter, and mild [4]. This climatic diversity requires housing design strategies that are highly responsive to local environmental conditions. Architectural form, defined by orientation, roof pitch, floor height and window geometry, directly determines the thermal boundary conditions of the building and its ability to resist or utilize heat flows. Adjustments to these parameters affect heat gain and loss through conduction, radiation, and infiltration, and thereby influence indoor air-conditioning loads throughout the year. Therefore, early design decisions play a central role in determining long-term thermal energy consumption and indoor comfort [5,6,7].

Traditional simulation-based approaches to assessing thermal performance are computationally intensive, especially when considering multi-objective trade-offs (e.g., between heating/cooling demand and indoor comfort). While machine learning (ML) techniques have emerged as faster alternatives for predicting building performance, most existing applications rely on black-box models that lack physical interpretability and design transparency [8,9]. This limits their usefulness as a guide for thermal design strategies, where it is important to understand the influence of each morphological parameter on heating and cooling loads.

To address these limitations, this study proposes a sample-efficient and interpretable ML framework for early optimization of building thermal performance. The method integrates SHapley Additive exPlanations (SHAP), locally weighted scatterplot smoothing (LOWESS), XGBoost regression, and the non-dominated sorting genetic algorithm II (NSGA-II) to balance three core objectives: reducing annual heating and cooling energy requirements, ensuring adequate indoor daylighting, and minimizing thermal discomfort. By embedding SHAP-LOWESS into the interpretation pipeline and using learning curve analysis to reduce data requirements by over 60%, the proposed framework provides a transparent, computationally efficient and scalable solution. This integrated approach not only accelerates the optimization process but also provides actionable design insights that are often missing in traditional optimization pipelines, thereby increasing its relevance to real-world architectural practice.
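The SHAP-LOWESS step can be pictured as smoothing per-sample SHAP values against one design variable and reading trends or thresholds off the smoothed curve. The following is a minimal sketch under stated assumptions, not the implementation used in this study: the smoother is a simplified LOWESS (local linear fits with tricube weights, no robustness iterations), the function names `lowess` and `zero_crossings` are hypothetical, and the window-to-wall ratios and SHAP values are synthetic placeholders rather than results.

```python
from typing import List


def lowess(x: List[float], y: List[float], frac: float = 0.5) -> List[float]:
    """Simplified LOWESS: at each x[i], fit a weighted straight line to the
    nearest frac*n points (tricube weights) and return the fitted value.
    No robustness iterations are performed."""
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))
    fitted = []
    for x0 in x:
        # k nearest neighbours of x0 and the bandwidth they span
        nearest = sorted(range(n), key=lambda j: abs(x[j] - x0))[:k]
        h = max(abs(x[j] - x0) for j in nearest) or 1.0
        w = [(1 - (abs(x[j] - x0) / h) ** 3) ** 3 for j in nearest]
        # closed-form weighted least squares for y = a + b * (x - x0)
        sw = sum(w)
        sx = sum(wi * (x[j] - x0) for wi, j in zip(w, nearest))
        sy = sum(wi * y[j] for wi, j in zip(w, nearest))
        sxx = sum(wi * (x[j] - x0) ** 2 for wi, j in zip(w, nearest))
        sxy = sum(wi * (x[j] - x0) * y[j] for wi, j in zip(w, nearest))
        denom = sw * sxx - sx * sx
        if abs(denom) < 1e-12:
            fitted.append(sy / sw)  # degenerate window: weighted mean
        else:
            b = (sw * sxy - sx * sy) / denom
            fitted.append((sy - b * sx) / sw)  # intercept = fit at x0
    return fitted


def zero_crossings(x: List[float], smooth: List[float]) -> List[float]:
    """x positions (linear interpolation) where the smoothed curve changes
    sign. Assumes x is sorted in ascending order."""
    out = []
    for i in range(len(x) - 1):
        s1, s2 = smooth[i], smooth[i + 1]
        if s1 * s2 < 0:
            out.append(x[i] + (x[i + 1] - x[i]) * (-s1) / (s2 - s1))
    return out


# Synthetic placeholder data: window-to-wall ratio vs. SHAP value for a load.
wwr = [i / 10 for i in range(11)]
shap_vals = [2.0 * v - 0.9 for v in wwr]

smooth = lowess(wwr, shap_vals, frac=0.5)
thresholds = zero_crossings(wwr, smooth)
```

A sign change in the smoothed curve marks the value of the design variable at which its contribution to the predicted load switches from reducing to increasing; with real SHAP values the curve would typically be noisy and nonlinear, which is where the smoothing matters.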

Optimizing thermal performance is central to reducing the heating and cooling energy requirements of buildings. Existing studies have examined multiple strategies in the areas of envelope design, architectural form, and system operation. At the envelope level, facade insulation, energy-efficient glazing and heat-sensitive roofs have been shown to reduce heat transfer and improve indoor thermal stability [10,11,12]. Architectural design parameters such as window-to-wall ratio (WWR), orientation, floor height, and shading configuration significantly affect thermal loads by influencing solar gain, infiltration, and surface exposure [13,14,15,16]. System-level efforts, including refining HVAC control strategies, raising the coefficient of performance (COP), and selecting appropriate heating and cooling equipment, also contribute to energy savings [17,18,19]. However, these interventions are often carried out only after the design phase. In contrast, early optimization of building form offers greater potential for passive heat load reduction and long-term thermal efficiency.

In practical building design, different building types must meet multiple, often conflicting, performance criteria. Unlike single-objective optimization, which typically yields a unique optimal solution, multi-objective optimization (MOO) rarely admits a single design that is best on every objective. To address this challenge, the concept of the Pareto optimal solution (POS) was introduced from economic theory [20]. A POS is a state in which no objective can be improved without worsening another, so the result is a set of equally optimal trade-off solutions rather than a single best outcome. Identifying the POS set is therefore the central goal of solving MOO problems.
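The definition of a Pareto optimal set can be made concrete with a small filter. The sketch below assumes that every objective is minimized; the function names and the candidate designs (annual load, discomfort hours) are hypothetical values chosen for illustration only.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if design a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates: (annual load in kWh/m2, discomfort hours).
designs = [
    (120.0, 300.0),
    (100.0, 420.0),
    (130.0, 310.0),
    (90.0, 500.0),
    (100.0, 400.0),
]
front = pareto_front(designs)
```

Here the second design is dominated by the fifth (same load, fewer discomfort hours) and the third by the first, so the remaining three designs form the trade-off set from which a designer would choose.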

Current architectural optimization studies increasingly combine MOO with ML and evolutionary algorithms to improve search efficiency. In particular, genetic algorithms (GA) are widely used to explore large design spaces and generate optimal trade-off solutions for building performance [21,22,23,24,25]. A summary of representative studies on the use of GA-based MOO in architectural design is presented in Table 1.

Among the various algorithms used for MOO, NSGA-II has emerged as one of the most widely used approaches in building performance research. Its predecessor, NSGA, was proposed by Srinivas and Deb in 1995 on the basis of Pareto optimality and evolutionary computation [40]. To address the limitations of NSGA in computational efficiency and elitism preservation, Deb et al. later introduced NSGA-II, which significantly improved sorting efficiency, diversity preservation and convergence speed [41,42]. Due to its robust performance and scalability, NSGA-II has been widely applied to complex MOO problems in construction, including building design optimization, energy efficiency trade-offs, and thermal comfort balancing [43,44].
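Two ingredients commonly associated with NSGA-II are non-dominated sorting, which ranks a population into successive fronts, and a crowding distance, which favors well-spread solutions within a front. The sketch below illustrates both for minimization problems; it is a simplified illustration with hypothetical function names and toy objective values, and it omits selection, crossover, and mutation.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Group solution indices into fronts: front 0 is non-dominated,
    front 1 is non-dominated once front 0 is removed, and so on."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # solutions that i dominates
    counts = [0] * n                    # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(sorted(nxt))
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs: List[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Sum over objectives of the normalized gap between each solution's
    two neighbours; boundary solutions get infinite distance."""
    if not front:
        return {}
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist


# Toy population with two minimized objectives.
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(population)
dist = crowding_distance(population, fronts[0])
```

When the next generation is assembled, solutions in lower-numbered fronts are preferred, and ties within a front are broken in favor of larger crowding distance, which is how the algorithm keeps both convergence pressure and diversity.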

ML algorithms are increasingly being used as an efficient alternative to simulation-based methods for predicting the thermal performance of buildings. With these data-driven approaches, metrics such as energy consumption, thermal comfort and daylight availability can be estimated quickly and with minimal computational effort [45,46,47]. For example, Santos et al. [48] applied artificial neural networks (ANN) to predict hourly energy consumption without detailed physical modeling. Xie et al. [49] combined neural networks with spatial analysis to predict mean radiant temperature (MRT), supporting layout optimization. Palladino et al. [50] used ANN to estimate the predicted mean vote (PMV) with only three design inputs. Xu et al. [51] developed a data-driven framework for predicting energy savings potential using operational data, and Pittarello et al. [52] validated ANN models in different building types.

Despite these advances, most ML applications in architectural design remain “black box” models and provide limited insight into the causal or correlative effects of design features on performance. This lack of interpretability limits their practical utility, as architects and engineers require transparent, explainable models to make informed design decisions. Furthermore, many studies focus on narrow climatic conditions or isolated design variables, which limits their generalizability across building typologies and environmental contexts.

Numerous studies have attempted to optimize building performance during the design phase [53,54]. A review of the literature reveals four key limitations that motivate this study:

  1. Heavy reliance on simulations: Existing frameworks often rely on detailed physical simulations, which are computationally intensive and inefficient for exploring large design spaces.

  2. Limited model interpretability: While machine learning has been used to predict performance, most models act as black boxes and provide little insight into the impact of design variables on thermal loads and comfort.

  3. Narrow scope of variables: Many studies target isolated components, such as windows or atriums, but neglect broader design parameters such as orientation, building shape or layout, and therefore do not capture the full complexity of thermal design.

  4. Limited climatic generalizability: Most models are trained for single climate zones, which limits their ability to adapt to different environmental conditions.

These gaps highlight the need for an optimization framework that is computationally efficient, interpretable, and generalizable across climates—a goal that this study aims to achieve.

To address high simulation costs, limited interpretability, and limited climate adaptability in residential building optimization, this study proposes an interpretable and sample-efficient machine learning framework. By integrating SHAP-LOWESS analysis, learning curve-driven sampling, XGBoost modeling and NSGA-II optimization, the framework enables transparent, fast and robust co-optimization of heating and cooling loads, sufficient daylight and thermal comfort in the early design phase.

The key innovation lies in the synergistic integration of these techniques, which together overcome the main bottlenecks (computational cost, limited model interpretability, and limited generalizability) that restrict practical ML applications in early architectural design. The main contributions are:

  • Integrated workflow: A novel combination of SHAP-LOWESS interpretability, learning curve-based sampling, XGBoost surrogate modeling, and NSGA-II optimization that represents a significant methodological advance.

  • Sample-efficient optimization: Learning curve analysis reduces computational cost by over 60% without sacrificing prediction accuracy, making ML-based optimization practical in early design.

  • Transparent design insights: SHAP-LOWESS uncovers nonlinear relationships and threshold effects between design variables and performance metrics, going beyond “black box” predictions.

  • Robust and generalizable framework: Validated in five different climates in China, demonstrating adaptability and eliminating the limitations of context-specific studies.
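The idea behind learning curve-based sampling can be sketched as a simple stopping rule: train the surrogate on increasing numbers of simulated samples and keep the smallest training set whose validation score is already close to the best score observed. The rule, the tolerance, the function name, and the scores below are illustrative assumptions, not the criterion or the results of this study.

```python
from typing import List, Tuple


def smallest_sufficient_size(curve: List[Tuple[int, float]], tol: float = 0.01) -> int:
    """curve holds (training-set size, validation score) pairs sorted by size,
    where a higher score is better. Return the smallest size whose score is
    within tol of the best score on the curve."""
    if not curve:
        raise ValueError("learning curve is empty")
    if tol < 0:
        raise ValueError("tol must be non-negative")
    best = max(score for _, score in curve)
    for size, score in curve:
        if best - score <= tol:
            return size
    return curve[-1][0]  # not reached for tol >= 0; kept as a safe default


# Hypothetical validation R^2 of a surrogate at increasing sample counts.
curve = [(100, 0.78), (200, 0.88), (400, 0.945), (800, 0.951), (1600, 0.953)]

chosen = smallest_sufficient_size(curve, tol=0.01)
reduction = 1 - chosen / curve[-1][0]  # share of simulations avoided
```

In this toy curve the score plateaus early, so most of the simulations in the largest set add little accuracy; the tolerance controls how much accuracy one is willing to trade for fewer simulation runs.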
