Abstract
Assessing the impact of chemicals on the environment and addressing subsequent issues are two central challenges to their safe use. Environmental data are continuously expanding, requiring flexible, scalable, and extendable data management solutions that can harmonize multiple data sources with potentially differing nomenclatures or levels of specificity. Here, we present the methodological steps taken to construct a rule-based labeled property graph database, the “Meta-analysis of the Global Impact of Chemicals” (MAGIC) graph, for potential environmental impact chemicals (PEICs) and its subsequent application to harmonizing multiple large-scale databases. The resulting data encompass 16,739 unique PEICs attributed to their corresponding chemical class, stereochemical information, valid synonyms, use types, unique identifiers (e.g., Chemical Abstracts Service Registry Number, CAS RN), and others. These data provide researchers with additional chemical information for a large number of PEICs and can also be publicly accessed through a web interface. Our analysis has shown that data harmonization can increase up to 98% when using the MAGIC graph approach compared to relational data systems for datasets with different nomenclatures. The graph database system and its data appear more suitable for large-scale analysis where traditional (i.e., relational) data systems are reaching conceptual limitations.
Highlights
Introduction
We present the methodological steps taken to construct a rule-based labeled property graph database, the “Meta-analysis of the Global Impact of Chemicals” (MAGIC) graph, for potential environmental impact chemicals (PEICs) and its subsequent application to harmonizing multiple large-scale databases.
Science can rely on numerous databases providing these data (Table 1) for potential environmental impact chemicals (PEICs, e.g., pesticides, industrial chemicals, flame retardants, and solvents).
When linking data spatially, problems typically arise around specificity, i.e., data being available at different spatial scales or resolutions, while linking data within the chemical dimension is often impeded by the use of different nomenclatures.
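The nomenclature problem can be illustrated with a minimal sketch. It is not the MAGIC implementation: the class `ToyChemicalGraph`, the edge labels `HAS_SYNONYM` and `HAS_CAS_RN`, and the two small datasets are hypothetical and chosen only for illustration. The sketch contrasts an exact-string join, as a stand-in for a naive relational match, with a lookup that first resolves every name or CAS RN to a shared chemical node.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Trim and case-fold a name or identifier before lookup."""
    return label.strip().casefold()


class ToyChemicalGraph:
    """Hypothetical toy graph: (:Chemical)-[:HAS_SYNONYM|:HAS_CAS_RN]->(alias)."""

    def __init__(self):
        self.edges = defaultdict(set)  # chemical node id -> {(edge label, alias)}
        self.alias_index = {}          # normalized alias -> chemical node id

    def add_chemical(self, node_id, synonyms=(), cas_rn=None):
        # The node's own name is treated as one of its synonyms.
        aliases = [("HAS_SYNONYM", s) for s in (node_id, *synonyms)]
        if cas_rn:
            aliases.append(("HAS_CAS_RN", cas_rn))
        for edge_label, alias in aliases:
            key = normalize(alias)
            self.edges[node_id].add((edge_label, key))
            self.alias_index[key] = node_id

    def resolve(self, label):
        """Return the chemical node for any known alias, else None."""
        return self.alias_index.get(normalize(label))


def exact_matches(a, b):
    """Naive join: two records match only if their strings are identical."""
    return set(a) & set(b)


def graph_matches(graph, a, b):
    """Join on the resolved chemical node instead of the raw string."""
    ids_a = {graph.resolve(x) for x in a} - {None}
    ids_b = {graph.resolve(x) for x in b} - {None}
    return ids_a & ids_b


# Illustrative data only; these records are not taken from the MAGIC graph.
graph = ToyChemicalGraph()
graph.add_chemical("glyphosate", ["N-(phosphonomethyl)glycine"], "1071-83-6")
graph.add_chemical("atrazine", [], "1912-24-9")
graph.add_chemical("DDT", ["dichlorodiphenyltrichloroethane"], "50-29-3")

dataset_a = ["glyphosate", "atrazine", "DDT"]
dataset_b = [
    "N-(phosphonomethyl)glycine",
    "1912-24-9",
    "Dichlorodiphenyltrichloroethane",
    "unknown compound",
]

exact = exact_matches(dataset_a, dataset_b)
linked = graph_matches(graph, dataset_a, dataset_b)
print(f"exact-string matches: {len(exact)}, graph-resolved matches: {len(linked)}")
```

In this toy case the two datasets share no identical strings, yet all three chemicals in `dataset_a` are linked once names and CAS RNs are resolved to common nodes. Entries the graph does not know, such as `"unknown compound"`, remain unmatched.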
Summary
We present the methodological steps taken to construct a rule-based labeled property graph database, the “Meta-analysis of the Global Impact of Chemicals” (MAGIC) graph, for potential environmental impact chemicals (PEICs) and its subsequent application to harmonizing multiple large-scale databases. The resulting data encompass 16,739 unique PEICs attributed to their corresponding chemical class, stereochemical information, valid synonyms, use types, unique identifiers (e.g., Chemical Abstracts Service Registry Number, CAS RN), and others. These data provide researchers with additional chemical information for a large number of PEICs and can be publicly accessed through a web interface. Our analysis has shown that data harmonization can increase up to 98% when using the MAGIC graph approach compared to relational data systems for datasets with different nomenclatures. The graph database system and its data appear more suitable for large-scale analysis where traditional (i.e., relational) data systems are reaching conceptual limitations.