Talking carbon: a lexical approach to predictive embodied carbon analysis via machine learning
ABSTRACT Sustainable architecture faces significant challenges, particularly during the early design stages, where critical decisions often lack sufficient detail for traditional environmental analysis. This paper presents the first development and use of an artificial intelligence-based tool designed to predict embodied carbon emissions from high-level natural-language descriptions of buildings. The approach combines a Histogram-based Gradient Boosting regression model with a multi-step system of Natural Language Processing techniques to convert complex, unstructured text into structured features and quantities suitable for predictive modelling. The work rests on a foundation of 150,000 new synthetic training samples, generated by systematically randomising building specifications. The method's performance was evaluated along four strands: extraction sensitivity, relative accuracy, linguistic robustness, and usability. In tests of extraction sensitivity, the method successfully identified core structural and external elements over 80% of the time. Relative accuracy assessments with seven real-world buildings yielded a Spearman's rank correlation of 0.71, confirming the system's ability to distinguish differences in carbon intensity. Linguistic robustness was demonstrated by describing identical buildings in multiple ways, with predicted values differing by only 10%. A user study of 43 industry professionals produced a System Usability Scale score of 84.74, reflecting strong acceptance of the method and the potential for its integration into existing workflows, and emphasising its promise for advancing architectural practice. Collectively, these outcomes highlight the success of this new approach to embodied-carbon assessment and underscore the potential of AI-enabled insight in sustainable design.
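As a rough illustration of the text-to-features step described above, a first pass might use pattern matching to pull storey count, floor area, and structural material out of a free-text description. This is a hypothetical sketch only; the names, patterns, and material list are assumptions, and the paper's actual pipeline is a multi-step NLP system feeding a Histogram-based Gradient Boosting model.

```python
import re

# Hypothetical regex-based sketch of the extraction stage (the patterns and
# material list here are assumptions, not the paper's pipeline).
MATERIALS = ("concrete", "steel", "timber", "masonry")

def extract_features(description: str) -> dict:
    """Turn an unstructured building description into structured features."""
    text = description.lower()
    floors = re.search(r"(\d+)[ -]stor(?:ey|y|ies)", text)
    area = re.search(r"(\d[\d,]*)\s*(?:m2|m\^2|square met)", text)
    material = next((m for m in MATERIALS if m in text), None)
    return {
        "floors": int(floors.group(1)) if floors else None,
        "floor_area_m2": int(area.group(1).replace(",", "")) if area else None,
        "structural_material": material,
    }

feats = extract_features("A 6-storey timber office building of 4,500 m2 gross floor area.")
print(feats)
```

The structured dictionary, not the raw text, is what a downstream regressor would consume.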
- Research Article
- 10.1109/tpami.2023.3333949
- Feb 1, 2024
- IEEE Transactions on Pattern Analysis and Machine Intelligence
When the amount of parallel sentences available to train a neural machine translation system is scarce, a common practice is to generate new synthetic training samples from them. A number of approaches have been proposed to produce synthetic parallel sentences that are similar to those in the parallel data available. These approaches work under the assumption that non-fluent target-side synthetic training samples can be harmful and may deteriorate translation performance. Even so, in this paper we demonstrate that synthetic training samples with non-fluent target sentences can improve translation performance if they are used in a multilingual machine translation framework as if they were sentences in another language. We conducted experiments on ten low-resource and four high-resource translation tasks and found that this simple approach consistently improves translation performance compared to state-of-the-art methods for generating synthetic training samples similar to those found in corpora. Furthermore, this improvement is independent of the size of the original training corpus, and the resulting systems are much more robust against domain shift and produce fewer hallucinations.
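The core trick, treating synthetic sentences as if they belonged to another language, can be sketched with the language-tag convention common in multilingual NMT. The `<2syn>`/`<2trg>` tags below are illustrative assumptions, not taken from the paper.

```python
# Illustrative language-tagging sketch: synthetic pairs are routed through a
# distinct pseudo-language tag so their non-fluent targets are kept apart
# from the main target language during multilingual training.
def tag_sample(source: str, target: str, synthetic: bool) -> tuple:
    tag = "<2syn>" if synthetic else "<2trg>"
    return (f"{tag} {source}", target)

real = tag_sample("la casa es azul", "the house is blue", synthetic=False)
synth = tag_sample("el perro corre", "runs dog the", synthetic=True)
print(real[0])
print(synth[0])
```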
- Dissertation
- 10.1184/r1/16623154.v1
- Sep 15, 2021
Synthetic 3D Building Energy Model (BEM) Dataset Generation for Human + AI Synergies in Early-Phase High Performance Building Design
- Conference Article
- 10.69997/sct.141240
- Jul 9, 2024
Initial design stages are inherently complex and often lack comprehensive information, posing challenges in evaluating sustainability metrics. Machine Learning (ML) emerges as a valuable solution to address these challenges. ML algorithms, particularly effective in predicting environmental impacts of new chemicals with limited data, enable more informed decisions in sustainable design. This study focuses on employing ML to predict the environmental impacts related to human health, ecosystem quality, climate change, and resource utilization, to aid early-stage environmental impact assessment of chemical processes. The effectiveness of the ML algorithm eXtreme Gradient Boosting (XGBoost) was tested using a dataset of 350 points, divided into training, testing, and validation sets. The study also includes a practical application of the model in a cradle-to-cradle LCA of N-Methylpyrrolidone (NMP), demonstrating its utility in sustainable chemical process design. This approach represents a significant advancement in the early stages of process design, highlighting the potential of ML in enhancing environmental sustainability in the chemical industry.
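A minimal sketch of splitting the 350-point dataset into training, testing, and validation sets as described above. The 70/15/15 ratio and the index data are assumptions for illustration; the study itself trained XGBoost on real chemical descriptors.

```python
import random

# Assumed 70/15/15 split of the 350 data points into train/test/validation.
random.seed(0)
points = list(range(350))
random.shuffle(points)
train, test, valid = points[:245], points[245:298], points[298:]
print(len(train), len(test), len(valid))
```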
- Research Article
- 10.1371/journal.pone.0030412
- Jan 19, 2012
- PLoS ONE
Background: Electronic health records are invaluable for medical research, but much of the information is recorded as unstructured free text which is time-consuming to review manually. Aim: To develop an algorithm to identify relevant free texts automatically based on labelled examples. Methods: We developed a novel machine learning algorithm, the 'Semi-supervised Set Covering Machine' (S3CM), and tested its ability to detect the presence of coronary angiogram results and ovarian cancer diagnoses in free text in the General Practice Research Database. For training the algorithm, we used texts classified as positive and negative according to their associated Read diagnostic codes, rather than by manual annotation. We evaluated the precision (positive predictive value) and recall (sensitivity) of S3CM in classifying unlabelled texts against the gold standard of manual review. We compared the performance of S3CM with the Transductive Support Vector Machine (TSVM), the original fully-supervised Set Covering Machine (SCM) and our 'Freetext Matching Algorithm' natural language processor. Results: Only 60% of texts with Read codes for angiogram actually contained angiogram results. However, the S3CM algorithm achieved 87% recall with 64% precision on detecting coronary angiogram results, outperforming the fully-supervised SCM (recall 78%, precision 60%) and TSVM (recall 2%, precision 3%). For ovarian cancer diagnoses, S3CM had higher recall than the other algorithms tested (86%). The Freetext Matching Algorithm had better precision than S3CM (85% versus 74%) but lower recall (62%). Conclusions: Our novel S3CM machine learning algorithm effectively detected free texts in primary care records associated with angiogram results and ovarian cancer diagnoses, after training on pre-classified test sets. It should be easy to adapt to other disease areas as it does not rely on linguistic rules, but needs further testing in other electronic health record datasets.
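The semi-supervised workflow, seeding a classifier from code-derived labels instead of manual annotation, can be illustrated with a toy word-scoring classifier. This is not the S3CM algorithm, just a stand-in for the labelling idea; the seed texts are invented.

```python
from collections import Counter

# Toy stand-in (NOT S3CM): labels come from diagnostic codes, and a simple
# word-score model then classifies unlabelled free text.
def word_scores(labelled):
    pos, neg = Counter(), Counter()
    for text, label in labelled:
        (pos if label else neg).update(text.lower().split())
    return {w: pos[w] - neg[w] for w in pos | neg}

def classify(text, scores, threshold=1):
    return sum(scores.get(w, 0) for w in text.lower().split()) >= threshold

# Seed set labelled via (hypothetical) Read codes rather than manual review.
seed = [("angiogram shows stenosis", True), ("routine blood test normal", False)]
scores = word_scores(seed)
print(classify("angiogram shows severe stenosis", scores))
print(classify("blood test came back normal", scores))
```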
- Research Article
- 10.1016/j.scs.2019.101596
- May 22, 2019
- Sustainable Cities and Society
Assessing environmental performance in early building design stage: An integrated parametric design and machine learning method
- Conference Article
- 10.52842/conf.ecaade.2014.1.227
- Jan 1, 2014
In architectural design, computer-aided design tools have an important impact on the design process, but the early design stage and sustainable design remain problematic issues. During the sustainable architectural design process, the designer needs to comply with regulations, which requires calculations and comparisons. Green building certification systems have been developed to assist designers during this complicated process, but for efficient sustainable design in different regions, environmental information and local building codes must be considered alongside green building certification system criteria. In this paper, LEED and BREEAM are considered as the most representative building environment assessment schemes in use. As there are conflicting criteria across LEED and BREEAM sustainable site parameters, local building codes, and environmental conditions, an efficient decision support system can be developed using a multi-objective genetic algorithm. This paper presents an effective site-use multi-objective optimization model that uses a Pareto genetic algorithm to determine the most efficient sustainable site layout design for social housing, which could assist designers in the early stage of the design process.
- Conference Article
- 10.1145/3508230.3508250
- Dec 17, 2021
Extracting causality information from unstructured natural language text is a challenging problem in natural language processing, and no mature special-purpose causality extraction systems exist. Most work uses basic sequence labeling methods, such as the BERT-CRF model, to extract causal elements from unstructured text, and the results are usually not good. At the same time, there are a large number of causal event relations in the field of finance. If we can extract financial causality at scale, this information will help us better understand the relationships between financial events and build related event evolutionary graphs in the future. In this paper, we propose a causality extraction method named CBCP (Center word-based BERT-CRF with Pattern extraction), which can directly extract cause and effect elements from unstructured text. Compared to the BERT-CRF model, our model incorporates the information of center words as prior conditions and performs better at entity extraction. Moreover, combining our method with patterns can further improve causality extraction. We evaluate our method against the basic sequence labeling method and show that it performs better than other basic extraction methods on causality extraction tasks in the finance field. Finally, we summarize our work and outline future directions.
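The pattern half of such a hybrid can be sketched with a few causal cue-phrase patterns. These regexes are illustrative only; the paper's CBCP method pairs pattern extraction with a center-word-aware BERT-CRF model.

```python
import re

# Illustrative cue-phrase patterns only (not CBCP itself).
CAUSAL_PATTERNS = [
    re.compile(r"(?P<cause>.+?) (?:led to|caused|resulted in) (?P<effect>.+)"),
    re.compile(r"(?:because of|due to) (?P<cause>.+?), (?P<effect>.+)"),
]

def extract_causality(sentence: str):
    """Return a (cause, effect) pair, or None if no cue phrase matches."""
    for pattern in CAUSAL_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None

pair = extract_causality("Rising interest rates led to a fall in bond prices")
print(pair)
```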
- Research Article
- 10.1093/jamia/ocab236
- Dec 13, 2021
- Journal of the American Medical Informatics Association
Objective: To determine the effects of using unstructured clinical text in machine learning (ML) for prediction, early detection, and identification of sepsis. Materials and Methods: PubMed, Scopus, ACM DL, dblp, and IEEE Xplore databases were searched. Articles utilizing clinical text for ML or natural language processing (NLP) to detect, identify, recognize, diagnose, or predict the onset, development, progress, or prognosis of systemic inflammatory response syndrome, sepsis, severe sepsis, or septic shock were included. Sepsis definition, dataset, types of data, ML models, NLP techniques, and evaluation metrics were extracted. Results: The clinical text used in models includes narrative notes written by nurses, physicians, and specialists in varying situations. This is often combined with common structured data such as demographics, vital signs, laboratory data, and medications. Area under the receiver operating characteristic curve (AUC) comparison of ML methods showed that utilizing both text and structured data predicts sepsis earlier and more accurately than structured data alone. No meta-analysis was performed because of incomparable measurements among the 9 included studies. Discussion: Studies focused on sepsis identification or early detection before onset; no studies used patient histories beyond the current episode of care to predict sepsis. Sepsis definition affects reporting methods, outcomes, and results. Many methods rely on continuous vital sign measurements in intensive care, making them not easily transferable to general ward units. Conclusions: Approaches were heterogeneous, but studies showed that utilizing both unstructured text and structured data in ML can improve identification and early detection of sepsis.
- Research Article
- 10.2352/ei.2022.34.5.mlsi-202
- Jan 16, 2022
- Electronic Imaging
Limited-angle X-ray tomography reconstruction is an ill-conditioned inverse problem in general. Especially when the projection angles are limited and the measurements are taken in a photon-limited condition, reconstructions from classical algorithms such as filtered backprojection may lose fidelity and acquire artifacts due to the missing-cone problem. To obtain satisfactory reconstruction results, prior assumptions, such as total variation minimization and nonlocal image similarity, are usually incorporated within the reconstruction algorithm. In this work, we introduce deep neural networks to determine and apply a prior distribution in the reconstruction process. Our neural networks learn the prior directly from synthetic training samples. The neural nets thus obtain a prior distribution that is specific to the class of objects we are interested in reconstructing. In particular, we used deep generative models with 3D convolutional layers and 3D attention layers which are trained on 3D synthetic integrated circuit (IC) data from a model dubbed CircuitFaker. We demonstrate that, when the projection angles and photon budgets are limited, the priors from our deep generative models can dramatically improve the IC reconstruction quality on synthetic data compared with maximum likelihood estimation. Training the deep generative models with synthetic IC data from CircuitFaker illustrates the capabilities of the learned prior from machine learning. We expect that if the process were reproduced with experimental data, the advantage of the machine learning would persist. The advantages of machine learning in limited angle X-ray tomography may further enable applications in low-photon nanoscale imaging.
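The benefit of a prior on an ill-conditioned inversion can be mimicked in miniature: a toy two-unknown system with nearly collinear measurement rows, where a simple Tikhonov penalty stands in for the much richer generative prior used in the paper. Everything below (the matrix, noise, and regularisation weight) is an invented illustration, not the paper's setup.

```python
# Toy 2-unknown analogue of an ill-conditioned inverse problem: nearly
# parallel measurement rows plus one noisy reading. Plain inversion blows up;
# a Tikhonov-style prior term stabilises the reconstruction.
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ x = (e, f) by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

A = [[1.0, 1.0], [1.0, 1.001]]   # nearly collinear projection directions
y = [2.0, 2.1]                   # true object is (1, 1); second reading noisy

# Unregularised reconstruction: direct solve of A x = y.
ml = solve_2x2(A[0][0], A[0][1], A[1][0], A[1][1], y[0], y[1])

# Prior-regularised reconstruction: (A^T A + lam I) x = A^T y, lam = 0.1.
lam = 0.1
ata11 = A[0][0] ** 2 + A[1][0] ** 2 + lam
ata12 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
ata22 = A[0][1] ** 2 + A[1][1] ** 2 + lam
aty = (A[0][0] * y[0] + A[1][0] * y[1], A[0][1] * y[0] + A[1][1] * y[1])
prior = solve_2x2(ata11, ata12, ata12, ata22, aty[0], aty[1])
print(ml, prior)
```

The unregularised estimate lands far from (1, 1), while the regularised one stays close, which is the same qualitative role the learned prior plays in the reconstruction above.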
- Research Article
- 10.21608/jes.2018.35442
- Dec 1, 2018
- Journal of Environmental Science
This paper offers a conceptual framework for sustainable space design optimization problems and objectives, on which a space optimization model is built. The study is based on understanding the philosophy of optimization, its problems and objectives, and then classifying them into logical groups from which an optimization model framework can be created. This framework links classical design optimization problems to sustainable design goals through space as the smallest building design component. The research starts by defining general building design optimization problems and sustainable design goals, specifically green design practice as the most prevalent sustainable practice approach, and then moves forward to project these two subjects onto space design. This framework offers a wider vision of the meaning of sustainable space. The most common understanding of sustainable space relates to the enhancement of indoor environmental quality, as recognized through different green rating and assessment methods. Through this model, more dimensions can be added to improve the efficiency of sustainable space design. The importance of this research is that it provides the conceptual approach for the optimization model, which can be enhanced mathematically and implemented practically through future research. It also raises awareness of further dimensions that can be taken into consideration at early design stages, which will have a positive impact on space design quality. This conceptual framework can also serve as a guideline for architects to customize their space designs to fit space optimization problems, generally enhancing the sustainable design movement.
- Research Article
- 10.1002/sd.70134
- Aug 5, 2025
- Sustainable Development
ABSTRACT Research examining the role of life cycle assessment (LCA) in sustainable product design (SPD) has experienced substantial growth in recent years. This article employs bibliometric network techniques to analyze this domain based on 1567 Web of Science (WoS) and Scopus documents. The documents were written by 4366 authors from 69 countries over two decades (2003–2023). The findings reveal that the most relevant journals in this field are the Journal of Cleaner Production, International Journal of Life Cycle Assessment, Journal of Industrial Ecology, Sustainability, and Building and Environment. The thematic evolution timeline shows key trends in sustainable design, sustainability assessment, early design stages, building design, and global warming potential. This study provides valuable insights for scholars by offering a comprehensive analysis of LCA and SPD research, highlighting important trends, citation networks, and thematic maps. It establishes a robust framework for advancing sustainability knowledge and practices within the LCA and SPD domains.
- Conference Article
- 10.1109/wacv45572.2020.9093425
- Mar 1, 2020
In this work, we argue that conditioning on the natural language (NL) description of a target provides information for longer-term invariance, and thus helps cope with typical tracking challenges. However, deriving a formulation to combine the strengths of appearance-based tracking with the language modality is not straightforward. Therefore, we propose a novel deep tracking-by-detection formulation that can take advantage of NL descriptions. Regions that are related to the given NL description are generated by a proposal network during the detection stage of the tracker. Our LSTM based tracker then predicts the update of the target from regions proposed by the NL based detection stage. Our method runs at over 30 fps on a single GPU. In benchmarks, our method is competitive with state of the art trackers that employ bounding boxes for initialization, while it outperforms all other trackers on targets given unambiguous and precise language annotations. When conditioned on NL descriptions only, our model doubles the performance of the previous best attempt [25].
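The role of the NL description in filtering candidate regions can be caricatured with a word-overlap scorer. This is a deliberately crude stand-in for the paper's proposal network and LSTM tracker; the boxes and labels are invented.

```python
# Crude word-overlap stand-in for the NL-conditioned proposal stage.
def score_proposals(description, proposals):
    """Pick the candidate region whose label shares the most words with the
    target description."""
    query = set(description.lower().split())
    return max(proposals, key=lambda p: len(query & set(p["label"].split())))

proposals = [
    {"box": (10, 10, 50, 80), "label": "red car on road"},
    {"box": (60, 20, 90, 60), "label": "man in red shirt"},
]
best = score_proposals("a man wearing a red shirt", proposals)
print(best["box"])
```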
- Research Article
- 10.1017/s0890060424000118
- Jan 1, 2024
- Artificial Intelligence for Engineering Design, Analysis and Manufacturing
Due to their significant role in creative design ideation, databases of causal ontology-based models for biological and technical systems have been developed. However, creating structured database entries through system models using a causal ontology requires the time and effort of experts. Researchers have worked toward developing methods that can automatically generate representations of systems from documents using causal ontologies by leveraging machine learning (ML) techniques. However, these methods use limited, hand-annotated data for building the ML models and have manual touchpoints that are not documented. While opportunities exist to improve the accuracy of these ML models, more importantly, it is required to understand the complete process of generating structured representations using causal ontology. This research proposes a new method and a set of rules to extract information relevant to the constructs of the SAPPhIRE model of causality from descriptions of technical systems in natural language, and reports the performance of this process. This process aims to understand the information in the context of the entire description. The method starts by identifying the system interactions involving material, energy and information and then builds the causal description of each system interaction using the SAPPhIRE ontology. This method was developed iteratively, verifying the improvements through user trials in every cycle. The user trials of this new method and rules with specialists and novice users of the SAPPhIRE modeling showed that the method helps in accurately and consistently extracting the information relevant to the constructs of the SAPPhIRE model from a given natural language description.
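A first step of such a method, spotting which interaction kinds a sentence mentions, might be sketched with a keyword lookup. The vocabularies below are invented for illustration; the paper's rules and SAPPhIRE constructs are far richer.

```python
# Keyword-lookup sketch of the first step only: tagging which interaction
# kinds (material, energy, information) a sentence mentions.
KEYWORDS = {
    "material": {"fluid", "water", "air", "particle"},
    "energy": {"heat", "voltage", "force", "pressure"},
    "information": {"signal", "data", "command", "reading"},
}

def tag_interactions(sentence: str):
    words = set(sentence.lower().replace(".", "").split())
    return sorted(kind for kind, vocab in KEYWORDS.items() if words & vocab)

print(tag_interactions("The sensor converts pressure into an electrical signal."))
```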
- Conference Article
- 10.2495/arc060191
- Jun 7, 2006
Long-term sustainability—including maintenance, operation, and life cycle cost analysis—should start during the concept design stage, where most critical decisions are determined. This paper provides advice for owners, facility managers, and designers on optimizing sustainable design options. Furthermore, by front-loading the costs of implementing those design options, owners' returns on their investments are likely to be long-term, operational maintenance costs are likely to be reduced, and global energy conservation is likely to increase. This paper proposes how to formalize a "Sustainable Methodology" (SM) to facilitate effective contributions by decision makers during the early concept design stage of a facility development project. The SM framework has five phases: input, evaluation, summarization, synthesizing, and output. These phases are initiated by the owner's preliminary architectural program and sustainable design goals, starting with site planning. The site planning elements are climate (macro- and micro-climate), orientation, use, function, shape/form, and surroundings (landscaping and buildings). The SM framework evaluates planning elements and suggests implementation options in harmony with environmental sustainability objectives. In addition, this paper describes how the SM framework was tested on a multi-story mixed-use development project during its site planning. Further studies can extend the SM framework to include other aspects of facility design such as envelope, structure, services, and space planning.
- Conference Article
- 10.1145/3417990.3421385
- Oct 16, 2020
Domain modelling transforms domain problem descriptions written in natural language (NL) into analyzable and concise domain models (class diagrams) during requirements analysis or the early stages of design in software development. Since the practice of domain modelling requires time in addition to modelling skills and experience, several approaches have been proposed to automate or semi-automate the construction of domain models from problem descriptions expressed in NL. Despite the existing work on domain model extraction, some significant challenges remain unaddressed: (i) the extracted domain models are not accurate enough to be used directly or with minor modifications in software development, (ii) existing approaches do not facilitate the tracing of the rationale behind the modelling decisions taken by the model extractor, and (iii) existing approaches do not provide interactive interfaces to update the extracted domain models. Therefore, in this paper, we introduce a domain modelling bot called DoMoBOT, explain its architecture, and implement it in the form of a web-based prototype tool. The bot automatically extracts a domain model from a problem description written in NL with an accuracy higher than existing approaches. Furthermore, the bot enables modellers to update a part of the extracted domain model, and in response, the bot proactively re-configures the other parts of the domain model. To improve the accuracy of extracted domain models, we combine the techniques of Natural Language Processing and Machine Learning. Finally, we evaluate the accuracy of the extracted domain models.
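A deliberately naive version of the extraction step, treating capitalised nouns as class candidates and "has/have" sentences as associations, shows both the idea and why accuracy is hard. DoMoBOT itself combines NLP and ML techniques, not these regexes; the stopword list and patterns are assumptions.

```python
import re

# Naive illustration of domain model extraction (not DoMoBOT).
STOPWORDS = {"A", "An", "The", "Each", "Every"}

def extract_domain_model(description: str) -> dict:
    """Very naive class/association extraction; no lemmatisation, so 'Book'
    and 'Books' appear as separate candidates."""
    nouns = set(re.findall(r"\b[A-Z][a-z]+\b", description)) - STOPWORDS
    associations = re.findall(
        r"\b([A-Z][a-z]+)s? (?:has|have) (?:many |an |a )?(\w+)", description)
    return {"classes": sorted(nouns), "associations": associations}

model = extract_domain_model("A Library has many Books. Each Book has an Author.")
print(model)
```

The duplicate 'Book'/'Books' candidates illustrate challenge (i) above: raw extraction output still needs modeller review.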