Abstract
• Built flexible ML pipeline for robust model selection, validation, and explanation
• Applied modular ML approach to stepwise empirical material development workflow
• Optimized hydrogel bioblocks, granular matrices, complex rheology, and extrudability
• Produced data-driven models and extracted human-readable predictive design insights

Granular hydrogel matrices are promising for biomedical applications ranging from extrusion-based bioprinting to injectable tissue engineering. However, they remain challenging to design, assemble, and optimize. Each development stage involves multidimensional input-output spaces affected by poorly understood multi-scale, multi-physics phenomena. Here, we demonstrate the utility of a flexible and modular machine learning (ML) approach to advance complex materials in a stepwise fashion. We apply our ML approach to automatically construct, validate, and explain predictive design frameworks for each set of empirical results. These data-driven models allow one to assess each experimental design space and provide condensed design insights extracted from high-dimensional input-output maps. The resulting bioblock materials have broad biomedical applications, yet our approach should be applicable for data-driven advancement of any complex material system.

Granular hydrogel matrices have emerged as promising candidates for cell encapsulation, bioprinting, and tissue engineering. However, it remains challenging to design and optimize these materials given their broad compositional and processing parameter space. Here, we combine experimentation and computation to create granular matrices composed of alginate-based bioblocks with controlled structure, rheological properties, and injectability profiles. A custom machine learning pipeline is applied after each phase of experimentation to automatically map the multidimensional input-output patterns into condensed data-driven models. These models are used to assess generalizable predictability and define high-level design rules to guide subsequent phases of development and characterization. Our integrated, modular approach opens new avenues to understanding and controlling the behavior of complex soft materials.

Granular hydrogel matrices are an emerging class of soft matter that offers several advantages over traditional biomaterials. Composed of discrete, yet densely packed building blocks, these materials are promising for a wide range of biomedical applications [1,2].
For example, granular matrices composed of hydrogel bioblocks can encapsulate drugs, biologics, and cells [3–5], serve as an ink or support matrix for in vitro bioprinting [6–9],
or be injected into cavities, open wounds, or damaged cardiac tissue for in vivo tissue engineering [10–13]. Despite such promise, granular matrices remain challenging to design, assemble, and optimize. Individual hydrogel bioblocks must first be generated (e.g., via microfluidics, fragmentation, bulk emulsion) and then consolidated into densely packed granular matrices that exhibit the reversible yielding and shear-thinning behavior required for bioprinting and injectability [1–3,14–18].
Open challenges in this workflow include the scalable and tunable formation of user-defined hydrogel bioblocks, the dynamic evolution of bioblocks and their compaction into densely packed granular matrices, the emergent non-linear rheology of soft granular matrices, and the controlled flow of soft granular matrices in confined geometries [19–25].
In data-driven modeling (Figure 1A), supervised machine learning (ML) is applied to material databases to automatically build predictive frameworks directly from the data itself [26–29].
Unlike humans, computers can learn arbitrarily complex patterns from heterogeneous and high-dimensional data without pre-defined theoretical frameworks and without a bias toward positive or recent trials. To date, researchers have successfully harnessed this approach at the atomic and molecular levels, often relying on large simulation-derived databases [28–30]. By contrast, less attention has been given to using ML for experimental granular-scale soft matter [30–32].
Further, ML tools are often used to map a single input-output space, whereas full material lifecycles involve many potential input-output spaces at many different stages of development [33].

In experimental optimization (Figure 1B), researchers explore potential design spaces to uncover reliable processing routes and structure, property, and performance insights [34,35]. Links between parameters and outcomes are informally encoded as “expertise” or formally encoded in design plots and mathematical models. Many materials have been advanced this way, but the approach has limitations. First, humans are not adept at handling high-dimensional data, so processes with multiple inputs pose challenges for evaluation [31,33].
Second, humans are subject to positive-results bias, recency bias, or confirmation bias that can inadvertently distort analysis [26,36]. The omission of negative results, reporting of subsets, or failure to include confounders can skew conclusions. Finally, complex materials like granular bioblock matrices exhibit multi-scale, multi-physics phenomena that are difficult to describe or model, hindering the translation of experimental results into predictive design frameworks [31,33,37]. Indeed, manual derivation of governing equations at each step of development would be intractable [38].
We posit that data-driven modeling could be coupled with experimental optimization to assess the predictability of soft granular material design spaces and delineate high-level input-output relationships. Hence, we propose to combine a flexible ML workflow with structured empirical results to derive predictive design frameworks at each stage of material development. Specifically, we integrate data-driven modeling with experiments to create extrudable and injectable granular matrices composed of alginate-based bioblocks (Figures 1C–1F). Computationally, we focused on automated tuning and selection of algorithms, rigorous evaluation via multi-metric grouped and nested cross-validation, and simplified predictive maps for human-readable insight into n-dimensional design spaces. Experimentally, we focused on the scalable generation of alginate-based bioblocks (Figure 1C), which are subsequently compacted into dense granular matrices (Figure 1D) with tunable rheological properties (Figure 1E) that facilitate controlled delivery during extrusion or injection (Figure 1F). At each step, we leverage our modular ML approach to (1) assess whether the empirical data structure is learnable and generalizable, and (2) identify the underlying relationships among design, structure, property, and performance outcomes. We find that these ML models facilitate transparent data-driven progression through the material processing pipeline, from initial bioblock assembly to final functional characterization. We expect this integrated approach will be applicable for a broad range of soft and living materials.

We converted unstructured experimental results from physical and digital records into structured machine-readable datasets with input design matrices and corresponding output vectors [39].
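As a minimal sketch of this structuring step, the snippet below turns a few loose experiment records into a design matrix, an output vector, and a list of experiment IDs. The field names (`experiment_id`, `alginate_pct`, `flow_rate_ml_h`, `diameter_um`), the helper `to_design_matrix`, and all values are hypothetical placeholders chosen for illustration, not data or code from this study.

```python
from typing import Dict, List, Sequence, Tuple, Union

Record = Dict[str, Union[str, float]]

# Hypothetical records: one row per measurement, one column per variable.
# Column names and numbers are invented for illustration only.
records: List[Record] = [
    {"experiment_id": "E1", "alginate_pct": 1.0, "flow_rate_ml_h": 5.0, "diameter_um": 210.0},
    {"experiment_id": "E1", "alginate_pct": 1.0, "flow_rate_ml_h": 5.0, "diameter_um": 205.0},
    {"experiment_id": "E2", "alginate_pct": 2.0, "flow_rate_ml_h": 10.0, "diameter_um": 150.0},
]


def to_design_matrix(
    rows: Sequence[Record],
    input_cols: Sequence[str],
    output_col: str,
    group_col: str = "experiment_id",
) -> Tuple[List[List[float]], List[float], List[str]]:
    """Split tidy rows into inputs X, output y, and per-row experiment IDs."""
    X = [[float(row[c]) for c in input_cols] for row in rows]
    y = [float(row[output_col]) for row in rows]
    groups = [str(row[group_col]) for row in rows]
    return X, y, groups


X, y, groups = to_design_matrix(
    records, ["alginate_pct", "flow_rate_ml_h"], "diameter_um"
)
```

Keeping the experiment ID alongside each row is a design choice that pays off later, since a grouped validation split needs to know which measurements belong to the same experiment.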
We based our computational workflow on tree-based ensemble algorithms, namely random forest (RF) and gradient boosting (GB) (Figure 2A). These non-parametric algorithms were chosen because they can flexibly handle classification or regression tasks, non-linear relationships, high-dimensional data, and mixed datatypes [40]. RF constructs a simple-averaged ensemble in parallel, using random bootstrapped data with random feature subsets [41]. GB constructs a weighted-average ensemble in series, using weighted bootstrapped data with all features [42]. In general, ensembles provide better predictive capacity and stability than single models [43–45]. Given the variability between datasets, base algorithms are tuned for optimal performance on each problem [46,47].
We applied an automated grid search to appraise hyperparameter configurations and select models with sufficient flexibility to fit signal without overfitting to noise [48] (Figure 2A).

Because our datasets are empirically derived, we must consider issues like repeated measures, batch effects, and uneven sampling [49–51]. To rigorously evaluate the model-building process, we applied a nested and grouped cross-validation (CV) procedure [49,52,53] (Figure 2B).
The full dataset is subjected to an “outer” k-fold CV and each outer training fold is subjected to an “inner” k-fold CV. Configured algorithms are trained, scored, and selected via the inner protocol, then top performers are re-fit and scored on unseen outer test data. Because no scoring metric is perfect, we used three different metrics for each problem to obtain more comprehensive estimates of model performance (accuracy, area under the receiver operating characteristic curve [ROC-AUC], F1 for classification; r², mean absolute error [MAE], median absolute error [AE] for regression) [54–56].
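For readers less familiar with these six metrics, here is a small standard-library sketch of how each can be computed. These are textbook definitions written out for illustration (binary classification only, and ROC-AUC computed by pairwise comparison), not the implementation used in the study; in practice an ML library would normally supply them.

```python
from statistics import mean, median


def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def median_ae(y_true, y_pred):
    """Median absolute error, less sensitive to a few large misses than MAE."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))
```

Reporting several of these together is useful because they fail differently: for example, MAE is pulled up by a single badly predicted trial, whereas the median absolute error is not.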
To avoid data leakage, we assigned unique IDs for each experiment and used a grouping procedure to ensure the same experiment could not appear in both the training and test sets simultaneously [57]. With a single holdout set, performance estimates may display high variance or optimistic bias (if experimental errors are minimized over time or if the samples are in a densely sampled region of the training distribution) [58,59]. However, our approach synthetically used all available experiments as holdouts to create a composite score independent of (1) when the trials were conducted, and (2) what the specific trial conditions were. Accordingly, if some trials display significant noise or batch effects, or if a sparsely sampled region is unpredictable, the composite score would be properly penalized (unlike a single holdout that may not capture such variability or failure) [58,60].
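The nested, grouped procedure can be sketched in a few dozen lines. To stay dependency-free, the example below swaps in a toy k-nearest-neighbor regressor for the RF and GB models and searches a one-parameter grid over `k`; the fold assignment, the data, and averaging the outer-fold MAEs into one composite score are all assumptions made for illustration rather than the study's exact protocol.

```python
from statistics import mean


def group_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs with every group kept inside one fold."""
    fold_of = {g: i % n_folds for i, g in enumerate(sorted(set(groups)))}
    for fold in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


def knn_predict(X_train, y_train, x, k):
    """Average the outputs of the k training rows closest to x (toy stand-in model)."""
    ranked = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )
    return mean(y_train[i] for i in ranked[:k])


def holdout_mae(X, y, train, test, k):
    """Fit on the train rows and return the mean absolute error on the test rows."""
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    return mean(abs(y[i] - knn_predict(X_tr, y_tr, X[i], k)) for i in test)


def nested_grouped_cv(X, y, groups, grid, outer_folds=3, inner_folds=2):
    """Select k on inner folds, then score the selection on unseen outer groups."""
    outer_scores, chosen = [], []
    for outer_train, outer_test in group_folds(groups, outer_folds):
        inner_groups = [groups[i] for i in outer_train]

        def inner_score(k):
            # Inner fold positions index into outer_train; map them back to rows.
            return mean(
                holdout_mae(
                    X, y,
                    [outer_train[i] for i in tr],
                    [outer_train[i] for i in va],
                    k,
                )
                for tr, va in group_folds(inner_groups, inner_folds)
            )

        best_k = min(grid, key=inner_score)  # inner loop: pick a configuration
        chosen.append(best_k)
        # Outer loop: refit on the full outer training set, score on held-out groups.
        outer_scores.append(holdout_mae(X, y, outer_train, outer_test, best_k))
    return mean(outer_scores), chosen


# Toy data: 6 hypothetical experiments with 3 replicate measurements each.
groups = [f"E{e}" for e in range(1, 7) for _ in range(3)]
X = [[float(e) + 0.1 * r] for e in range(1, 7) for r in range(3)]
y = [2.0 * row[0] + 0.05 * (i % 3) for i, row in enumerate(X)]
grid = [1, 3, 5]

score, chosen = nested_grouped_cv(X, y, groups, grid)
print(f"composite MAE: {score:.3f}; selected k per outer fold: {chosen}")
```

Because folds are built from experiment IDs rather than individual rows, replicate measurements of one experiment never straddle the train/test boundary, and every experiment serves as held-out data exactly once in the outer loop.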
Thus, we report scores that should reflect the generalizability and reproducibility of our data-driven models for each particular experimental phase.

Next, we applied a standard k-fold CV to automatically configure and select top-performing algorithms using the validated modeling process (Figure 2C). These final configurations are trained on all available data and averaged into a final ensemble to further improve stability [43–45]. At this stage, our data-driven models can be deployed directly for predictive analytics (i.e., predicting material outcomes for candidate design inputs) or combinatorial optimization (i.e., searching for optimal design inputs for target material outcomes). However, black-box models and high-dimensional patterns are incomprehensible to human users [61–63].
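A rough sketch of the final-ensemble idea follows: several selected configurations are fit on all available data, their predictions are averaged, and the averaged model is then used both to predict an outcome and to search candidate inputs for a target outcome. The toy nearest-neighbor models, the `selected_k` values, and the brute-force search in `closest_to_target` are hypothetical stand-ins, not the tuned RF/GB configurations or the optimization routine from the study.

```python
from statistics import mean


def fit_knn(X_train, y_train, k):
    """Return a predictor closure; a toy stand-in for one tuned configuration."""
    def predict(x):
        ranked = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )
        return mean(y_train[i] for i in ranked[:k])
    return predict


def average_ensemble(models):
    """Combine fitted predictors by simple averaging of their outputs."""
    return lambda x: mean(m(x) for m in models)


def closest_to_target(model, candidates, target):
    """Brute-force search for the candidate whose prediction is nearest the target."""
    return min(candidates, key=lambda x: abs(model(x) - target))


# Hypothetical data: every available row is used for the final fit.
X_all = [[0.0], [1.0], [2.0], [3.0]]
y_all = [0.0, 2.0, 4.0, 6.0]
selected_k = [1, 3]  # assumed outcome of the configuration-selection step

ensemble = average_ensemble([fit_knn(X_all, y_all, k) for k in selected_k])

prediction = ensemble([1.0])                                   # predictive analytics
best_input = closest_to_target(ensemble, X_all, target=4.1)    # design search
```

Exhaustive search is only workable for small candidate sets; it is used here because it makes the link between "predict" and "optimize" explicit in a few lines.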
As experimentalists, we desired a human-in-the-loop approach to develop data-driven intuition about the input-output associations in our problem domain, identify promising avenues for further exploration, and explain the models via condensed design summaries [61,63]. Accordingly, we extracted human-readable predictive maps of the design space. First, simplified 2 × n synthetic datasets are created by selecting low and high values for each input (within the training distributions) and then generating a matrix containing pairwise combinations (to obtain coverage of the design space) (Figure 2D).
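One possible reading of this step is a two-level factorial grid: take a low and a high value for each of the n inputs and enumerate every combination. The sketch below assumes that reading, and it assumes the observed minimum and maximum of each training column as the low and high levels; the excerpt does not specify either choice, and the input names and values are hypothetical.

```python
from itertools import product


def two_level_grid(X_train, names):
    """Build rows covering every low/high combination of the inputs (2**n rows)."""
    columns = list(zip(*X_train))
    # Assumed level choice: observed min and max, which stay inside the training range.
    levels = [(min(col), max(col)) for col in columns]
    return [dict(zip(names, combo)) for combo in product(*levels)]


# Hypothetical training inputs: three design variables, three experiments.
X_train = [
    [1.0, 5.0, 20.0],
    [2.0, 10.0, 25.0],
    [1.5, 7.5, 37.0],
]
names = ["alginate_pct", "flow_rate_ml_h", "temperature_c"]

design_grid = two_level_grid(X_train, names)
for row in design_grid:
    print(row)
```

Feeding each row of such a grid to a fitted model would give one predicted outcome per low/high combination, which is compact enough to read as a table.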
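References
1. Riley L. Schirmer L. Segura T. Granular Hydrogels: Emergent Properties of Jammed Hydrogel Microparticles and Their Applications in Tissue Repair and Regeneration. Elsevier Ltd, 2019. https://doi.org/10.1016/j.copbio.2018.11.001
2. Daly A.C. Riley L. Segura T. Burdick J.A. Hydrogel microparticles for biomedical applications. Nat. Rev. Mater. 2020; 5: 20-43. https://doi.org/10.1038/s41578-019-0148-6
3. Newsom J.P. Payne K.A. Krebs M.D. Microgels: modular, tunable constructs for tissue regeneration. Acta Biomater. 2019; 88: 32-41. https://doi.org/10.1016/j.actbio.2019.02.011
4. McClements D.J. Designing biopolymer microgels to encapsulate, protect and deliver bioactive components: physicochemical aspects. Adv. Colloid Interface Sci. 2017; 240: 31-59. https://doi.org/10.1016/j.cis.2016.12.005
5. Truong N.F. Kurt E. Tahmizyan N. Lesher-Pérez S.C. Chen M. Darling N.J. Xi W. Segura T. Microporous annealed particle hydrogel stiffness, void space size, and adhesion properties impact cell proliferation, cell spreading, and gene transfer. Acta Biomater. 2019; 94: 160-172. https://doi.org/10.1016/j.actbio.2019.02.054
6. Shin M. Song K.H. Burrell J.C. Cullen D.K. Burdick J.A. Injectable and conductive granular hydrogels for 3D printing and electroactive tissue support. Adv. Sci. 2019; 6: 1901229. https://doi.org/10.1002/advs.201901229
7. Highley C.B. Song K.H. Daly A.C. Burdick J.A. Jammed microgel inks for 3D printing applications. Adv. Sci. 2019; 6: 1801076. https://doi.org/10.1002/advs.201801076
8. Highley C.B. Rodell C.B. Burdick J.A. Direct 3D printing of shear-thinning hydrogels into self-healing hydrogels. Adv. Mater. 2015; 27: 5075-5079. https://doi.org/10.1002/adma.201501234
9. Xin S. Chimene D. Garza J.E. Gaharwar A.K. Alge D.L. Clickable PEG hydrogel microspheres as building blocks for 3D bioprinting. Biomater. Sci. 2019; 7: 1179-1187. https://doi.org/10.1039/C8BM01286E
10. Nih L.R. Sideris E. Carmichael S.T. Segura T. Injection of microporous annealing particle (MAP) hydrogels in the stroke cavity reduces gliosis and inflammation and promotes NPC migration to the lesion. Adv. Mater. 2017; 29: 1606471. https://doi.org/10.1002/adma.201606471
11. Mealy J.E. Chung J.J. Jeong H.H. Issadore D. Lee D. Atluri P. Burdick J.A. Injectable granular hydrogels with multifunctional properties for biomedical applications. Adv. Mater. 2018; 30: e1705912. https://doi.org/10.1002/adma.201705912
12. Griffin D.R. Weaver W.M. Scumpia P.O. di Carlo D. Segura T. Accelerated wound healing by injectable microporous gel scaffolds assembled from annealed building blocks. Nat. Mater. 2015; 14: 737-744. https://doi.org/10.1038/nmat4294
13. Béduer A. Bonini F. Verheyen C.A. Genta M. Martins M. Brefie-Guth J. Tratwal J. Filippova A. Burch P. Naveiras O. Braschler T. An injectable meta-biomaterial: from design and simulation to in vivo shaping and tissue induction. Adv. Mater. 2021; 33: 2102350. https://doi.org/10.1002/ADMA.202102350
14. Cloitre M. Borrega R. Monti F. Leibler L. Structure and flow of polyelectrolyte microgels: from suspensions to glasses. C. R. Phys. 2003; 4: 221-230.
15. Bonnecaze R.T. Cloitre M. Micromechanics of soft particle glasses. Adv. Polym. Sci. 2010; 236: 117-161. https://doi.org/10.1007/12_2010_90
16. Schiller U.D. Krüger T. Henrich O. Mesoscopic modelling and simulation of soft matter. Soft Matter. 2017; 14: 9-26. https://doi.org/10.1039/C7SM01711A
17. Pellet C. Cloitre M. The glass and jamming transitions of soft polyelectrolyte microgel suspensions. Soft Matter. 2016; 12: 3710-3720. https://doi.org/10.1039/c5sm03001c
18. Muir V.G. Qazi T.H. Shan J. Groll J. Burdick J.A. Influence of microgel fabrication technique on granular hydrogel properties. ACS Biomater. Sci. Eng. 2021; 7: 4269-4281. https://doi.org/10.1021/ACSBIOMATERIALS.0C01612
19. Coussot P. Rheometry of Pastes, Suspensions, and Granular Materials. John Wiley & Sons, Inc., 2005. https://doi.org/10.1002/0471720577
20. Shewan H. Rheology of Soft Particle Suspensions. 2015. https://doi.org/10.14264/UQL.2015.533
21. Alzanbaki H. Moretti M. Hauser C.A.E. Engineered microgels—their manufacturing and biomedical applications. Micromachines. 2021; 12: 45. https://doi.org/10.3390/MI12010045
22. Scheffold F. Pathways and challenges towards a complete characterization of microgels. Nat. Commun. 2020; 11: 4315. https://doi.org/10.1038/s41467-020-17774-5
23. Stokes J.R. Frith W.J. Rheology of gelling and yielding soft matter systems. Soft Matter. 2008; 4: 1133-1140. https://doi.org/10.1039/B719677F
24. Villone M.M. Maffettone P.L. Dynamics, rheology, and applications of elastic deformable particle suspensions: a review. Rheol. Acta. 2019; 58: 109-130. https://doi.org/10.1007/S00397-019-01134-2
25. van der Gucht J. Grand challenges in soft matter physics. Front. Phys. 2018; 6: 87. https://doi.org/10.3389/FPHY.2018.00087
26. de Pablo J.J. Jackson N.E. Webb M.A. Chen L.Q. Moore J.E. Morgan D. Jacobs R. Pollock T. Schlom D.G. Toberer E.S. et al. New frontiers for the materials genome initiative. npj Comput. Mater. 2019; 5 (41–23). https://doi.org/10.1038/s41524-019-0173-4
27. Liu Y. Zhao T. Ju W. Shi S. Materials discovery and design using machine learning. J. Materiomics. 2017; 3: 159-177. https://doi.org/10.1016/J.JMAT.2017.08.002
28. Himanen L. Geurts A. Foster A.S. Rinke P. Data-driven materials science: status, challenges, and perspectives. Adv. Sci. 2019; 6: 1900808. https://doi.org/10.1002/ADVS.201900808
29. Agrawal A. Choudhary A. Perspective: materials informatics and big data: realization of the “fourth paradigm” of science in materials science. APL Mater. 2016; 4: 053208. https://doi.org/10.1063/1.4946894
30. Butler K.T. Davies D.W. Cartwright H. Isayev O. Walsh A. Machine learning for molecular and materials science. Nature. 2018; 559: 547-555. https://doi.org/10.1038/s41586-018-0337-2
31. Peerless J.S. Milliken N.J.B. Oweida T.J. Manning M.D. Yingling Y.G. Soft matter informatics: current progress and challenges. Adv. Theory Simul. 2019; 2: 1800129. https://doi.org/10.1002/ADTS.201800129
32. Zhai C. Li T. Shi H. Yeo J. Discovery and design of soft polymeric bio-inspired materials with multiscale simulations and artificial intelligence. J. Mater. Chem. B. 2020; 8: 6562-6587. https://doi.org/10.1039/D0TB00896F
33. Li J. Lim K. Yang H. Ren Z. Raghavan S. Chen P.Y. Buonassisi T. Wang X. AI applications through the whole life cycle of material discovery. Matter. 2020; 3: 393-432. https://doi.org/10.1016/J.MATT.2020.06.011
34. Olson G.B. Computational design of hierarchically structured materials. Science. 1997; 277: 1237-1242. https://doi.org/10.1126/SCIENCE.277.5330.1237
35. Arróyave R. McDowell D.L. Systems Approaches to Materials Design: Past, Present, and Future. Annu. Rev. Mater. Res. 2019; 49: 103-126. https://doi.org/10.1146/annurev-matsci-070218-125955
36. Raccuglia P. Elbert K.C. Adler P.D.F. Falk C. Wenny M.B. Mollo A. Zeller M. Friedler S.A. Schrier J. Norquist A.J. Machine-learning-assisted materials discovery using failed experiments. Nature. 2016; 533: 73-76. https://doi.org/10.1038/nature17439
37. Karniadakis G.E. Kevrekidis I.G. Lu L. Perdikaris P. Wang S. Yang L. Physics-informed machine learning. Nat. Rev. Phys. 2021; 3: 422-440. https://doi.org/10.1038/s42254-021-00314-5
38. Radjai F. Roux J.-N. Daouadji A. Modeling granular materials: century-long research across scales. J. Eng. Mech. 2017; 143: 04017002. https://doi.org/10.1061/(ASCE)EM.1943-7889.0001196
39. Wickham H. Tidy data. J. Stat. Softw. 2014; 59: 1-23. https://doi.org/10.18637/JSS.V059.I10
40. Song Y.Y. Lu Y. Decision tree methods: applications for classification and prediction. Shanghai Arch. Psychiatry. 2015; 27: 130-135. https://doi.org/10.11919/J.ISSN.1002-0829.215044
41. Breiman L. Random forests. Mach. Learn. 2001; 45: 5-32. https://doi.org/10.1023/A:1010933404324
42. Friedman J.H. Greedy function approximation: a gradient boosting machine. Ann. Statist. 2001; 29: 1189-1232. https://doi.org/10.1214/AOS/1013203451
43. Polikar R. Zhang C. Ma Y. Ensemble learning. Ensemble Machine Learning. 2012: 1-34. https://doi.org/10.1007/978-1-4419-9326-7_1
44. Sagi O. Rokach L. Ensemble learning: a survey. Wiley Interdiscip Rev Data Min Knowl Discov. 2018; 8: e1249. https://doi.org/10.1002/WIDM.1249
45. Dietterich T.G. Ensemble methods in machine learning. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2000. https://doi.org/10.1007/3-540-45014-9_1
46. Probst P. Wright M.N. Boulesteix A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip Rev Data Min Knowl Discov. 2019; 9: e1301. https://doi.org/10.1002/WIDM.1301
47. Natekin A. Knoll A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013; 7: 21. https://doi.org/10.3389/FNBOT.2013.00021
48. Claesen M. de Moor B. Hyperparameter search in machine learning. ArXiv. 2015 (Preprint). https://doi.org/10.48550/arxiv.1502.02127
49. Roberts D.R. Bahn V. Ciuti S. Boyce M.S. Elith J. Guillera-Arroita G. Hauenstein S. Lahoz-Monfort J.J. Schröder B. Thuiller W. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography. 2017; 40: 913-929. https://doi.org/10.1111/ECOG.02881
50. Soneson C. Gerster S. Delorenzi M. Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation. PLoS One. 2014; 9: e100335. https://doi.org/10.1371/JOURNAL.PONE.0100335
51. Hsieh K. Phanishayee A. Mutlu O. Gibbons P.B. The non-IID data quagmire of decentralized machine learning. In: Proceedings of the 37th International Conference on Machine Learning. 2020. https://doi.org/10.5281/zenodo.3676081
52. Vabalas A. Gowen E. Poliakoff E. Casson A.J. Machine learning algorithm validation with a limited sample size. PLoS One. 2019; 14: e0224365. https://doi.org/10.1371/JOURNAL.PONE.0224365
53. Cawley G.C. Talbot N.L.C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 2010; 11: 2079-2107.
54. Caruana R. Niculescu-Mizil A. Data mining in metric space: an empirical analysis of supervised learning performance criteria. In: Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’04. 2004. https://doi.org/10.1145/1014052
55. Spuler M. Sarasola-Sanz A. Birbaumer N. Rosenstiel W. Ramos-Murguialday A. Comparing metrics to evaluate performance of regression methods for decoding of neural signals. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2015; 2015: 1083-1086. https://doi.org/10.1109/EMBC.2015.7318553
56. Vishwakarma G. Sonpal A. Hachmann J. Metrics for benchmarking and uncertainty quantification: quality, applicability, and best practices for machine learning in chemistry. Trends Chem. 2021; 3: 146-156. https://doi.org/10.1016/J.TRECHM.2020.12.004
57. Jones D.T. Setting the standards for machine learning in biology. Nat. Rev. Mol. Cell Biol. 2019; 20: 659-660. https://doi.org/10.1038/s41580-019-0176-5
58. Webb G.I. Conilione P. Estimating Bias and Variance from Data. 2005.
59. Yadav S. Shukla S. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In: Proceedings - 6th International Advanced Computing Conference IACC 2016. 2016: 78-83. https://doi.org/10.1109/IACC.2016.25
60. Saeb S. Lonini L. Jayaraman A. Mohr D.C. Kording K.P. The need to approximate the use-case in clinical machine learning. GigaScience. 2017; 6: 1-9. https://doi.org/10.1093/GIGASCIENCE/GIX019
61. Roscher R. Bohn B. Duarte M.F. Garcke J. Explainable machine learning for scientific insights and discoveries. IEEE Access. 2020; 8: 42200-42216. https://doi.org/10.1109/ACCESS.2020.2976199
62. Kovalerchuk B. Ahmad M.A. Teredesai A. Survey of explainable machine learning with visual and granular methods beyond quasi-explanations. Stud. Comput. Intell. 2021; 937: 217-267. https://doi.org/10.1007/978-3-030-64949-4_8
63. Lipton Z.C. The mythos of model interpretability. Commun. ACM. 2018; 61: 36-43. https://doi.org/10.48550/arxiv.1606.03490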