Particle tracking is commonly used to study time-dependent behavior in many types of physical and chemical systems whose constituents span many length scales, including atoms, molecules, nanoparticles, granular particles, and even larger objects. Behaviors studied using particle tracking data include disorder-order transitions, thermodynamic phase transitions, structural transitions, protein folding, crystallization, gelation, swarming, avalanches, and fracture. A common challenge in studying these systems is change detection. Change point detection discerns when a temporal signal undergoes a change in distribution. These changes can be local or global, instantaneous or prolonged, obvious or subtle. Moreover, system-wide changes marking an interesting physical or chemical phenomenon (e.g., crystallization of a liquid) are often preceded by localized events (e.g., pre-nucleation clusters) that can occur anywhere in the system at any time. For these reasons, detecting events in particle trajectories generated by molecular simulation is challenging and is typically accomplished via ad hoc solutions specific to the behavior and system under study. Consequently, event detection methods lack generality, and those developed in one field are not easily adopted by scientists in other fields. Here we present dupin, a new Python-based tool for universal event detection in particle trajectory data, irrespective of system details. dupin works by creating a signal that represents the simulation and partitioning that signal at events (changes within the trajectory). This approach enables studies in which manually annotating event boundaries would take a prohibitive amount of time. Furthermore, dupin can serve as a tool in automated and reproducible workflows. We demonstrate the application of dupin with three examples and discuss its applicability to a wider class of problems.
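The central idea above, building a signal from particle data and partitioning it at events, can be illustrated with a minimal NumPy sketch. It is not dupin's API: the per-particle descriptor, the choice of the per-frame mean and max as reductions, and the single-split piecewise-constant fit are all assumptions made for illustration. It shows why the reduction matters for localized events such as pre-nucleation clusters: the mean is dominated by the system-wide change, while the max responds as soon as a few particles change.

```python
import numpy as np


def aggregate_signal(per_particle):
    """Reduce a (frames, particles) descriptor array to a (frames, 2) signal of per-frame mean and max."""
    return np.column_stack([per_particle.mean(axis=1), per_particle.max(axis=1)])


def best_single_split(feature):
    """First frame of the second segment in the best two-level piecewise-constant fit."""
    n = len(feature)
    # Squared error of fitting each side of split s with its own mean.
    costs = [
        ((feature[:s] - feature[:s].mean()) ** 2).sum()
        + ((feature[s:] - feature[s:].mean()) ** 2).sum()
        for s in range(1, n)
    ]
    return 1 + int(np.argmin(costs))


# Hypothetical trajectory: 100 frames, 500 particles, one noisy descriptor per particle.
rng = np.random.default_rng(1)
descriptor = rng.normal(0.0, 0.05, size=(100, 500))
descriptor[40:, :10] += 1.0  # local event: a 10-particle cluster changes at frame 40
descriptor[70:, 10:] += 1.0  # global event: the remaining particles change at frame 70

signal = aggregate_signal(descriptor)
print("mean feature splits at frame", best_single_split(signal[:, 0]))
print("max feature splits at frame", best_single_split(signal[:, 1]))
```

With these synthetic data the mean feature should split near the global event at frame 70 and the max feature near the local event at frame 40. A real workflow would combine many descriptors and reductions and allow more than one change point.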
Program summary
Program Title: dupin
CPC Library link to program files: https://doi.org/10.17632/kjcn97zc46.1
Developer's repository link: https://github.com/glotzerlab/dupin
Licensing provisions: BSD 3-clause
Programming language: Python
Nature of problem: In molecular simulations, detecting structural transitions or other events within trajectories is challenging and, because it usually requires a manual approach, time-consuming for large studies. This issue is particularly pronounced in studies involving hundreds or thousands of simulations, where manual detection and analysis of transitions become infeasible. Our goal is to develop an automated, accurate, and efficient method for detecting transition points in simulation trajectories that both saves time and helps researchers uncover important events and their underlying causes in various systems. Additionally, we aim to enable new machine learning applications to important materials problems, such as predicting and designing crystallization pathways, predicting defect formation, and describing the behavior of active matter, all of which involve structural transitions occurring over time. The method should support both offline and online detection, enabling event-dependent triggers for advanced simulation and experimental protocols as well as efficient processing and storage of data.
Solution method: We develop a versatile Python package, dupin, for detecting molecular events and structural transitions in simulation trajectories. dupin's workflow pipeline has three major stages: data preprocessing, data augmentation, and detection. Together, these stages improve the accuracy and efficiency of identifying structural changes in particle trajectory data. In data preprocessing, we generate and aggregate data into a comprehensive representation of the system.
Data augmentation techniques such as feature selection and dimensionality reduction counteract the noise arising from high-dimensional data and improve computational performance. In the detection stage, we locate change points that indicate transition events using a cost-based detection method. dupin implements two cost functions based on piecewise linear fits, which offer different levels of sensitivity to sudden shifts and changes in the signal. The package can use any cost-based detection algorithm but provides a dedicated interface to the Python package ruptures. Regardless of the detection algorithm, we determine the number of change points from the cost function using “elbow” detection. The detection scheme can be applied both offline and online, enabling real-time analysis of molecular events as simulations progress. For example, dupin may be used to trigger high-frequency storage of simulation frames upon nucleation and subsequent solidification of a liquid into a crystal. Our method detects transition points within simulation trajectories with a high degree of accuracy when provided with informative descriptors. By automating the detection process, our solution enables efficient change point detection in large-scale simulation studies.
Additional comments including restrictions and unusual features: dupin shows great promise for detecting transition points within simulation trajectories with a high degree of accuracy; however, it relies heavily on the selection of informative descriptors. Detection accuracy may be compromised if the chosen descriptors do not effectively capture the changes in a system's properties. This restriction can be mitigated by selecting a diverse range of descriptors and applying a feature selection tool to refine the signal.
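The detection stage described under Solution method (a cost based on piecewise linear fits, plus elbow detection over the number of change points) can be sketched in plain NumPy under stated assumptions. This is not dupin's or ruptures' API: `linear_fit_cost`, `segment_all`, and `elbow` are hypothetical names. The cost here is the sum of squared residuals of a least-squares line per segment, one possible piecewise-linear cost and not necessarily either of dupin's two. Segments are found by exact dynamic programming, and the elbow is taken as the point of the normalized cost curve farthest below the chord joining its endpoints, a Kneedle-style heuristic.

```python
from functools import lru_cache

import numpy as np


def linear_fit_cost(signal, start, end):
    """Sum of squared residuals of a least-squares line fit to each feature of signal[start:end]."""
    segment = signal[start:end]
    t = np.arange(end - start, dtype=float)
    design = np.column_stack([t, np.ones_like(t)])  # slope and intercept columns
    coeffs, *_ = np.linalg.lstsq(design, segment, rcond=None)
    residuals = segment - design @ coeffs
    return float(np.sum(residuals**2))


def segment_all(signal, max_change_points, min_size=3):
    """Exact dynamic-programming segmentation for every k = 0..max_change_points.

    Returns a list whose k-th entry is (total_cost, change_points) for the best
    split of the signal into k + 1 segments of at least min_size frames.
    """
    n = len(signal)

    @lru_cache(maxsize=None)
    def seg_cost(i, j):
        return linear_fit_cost(signal, i, j)

    inf = float("inf")
    # best[k][j]: lowest cost of splitting signal[:j] into k + 1 segments.
    best = [[inf] * (n + 1) for _ in range(max_change_points + 1)]
    split = [[-1] * (n + 1) for _ in range(max_change_points + 1)]
    for j in range(min_size, n + 1):
        best[0][j] = seg_cost(0, j)
    for k in range(1, max_change_points + 1):
        for j in range((k + 1) * min_size, n + 1):
            for i in range(k * min_size, j - min_size + 1):
                candidate = best[k - 1][i] + seg_cost(i, j)
                if candidate < best[k][j]:
                    best[k][j], split[k][j] = candidate, i
    results = []
    for k in range(max_change_points + 1):
        # Walk the split table backwards to recover the change points.
        change_points, j = [], n
        for level in range(k, 0, -1):
            j = split[level][j]
            change_points.append(j)
        results.append((best[k][n], sorted(change_points)))
    return results


def elbow(costs):
    """Index of the cost farthest below the chord joining the first and last costs."""
    costs = np.asarray(costs, dtype=float)
    x = np.arange(len(costs), dtype=float) / (len(costs) - 1)
    span = costs[0] - costs[-1]
    y = (costs - costs[-1]) / span if span > 0 else np.zeros_like(costs)
    # After normalization the chord is the line x + y = 1; a convex cost curve lies below it.
    return int(np.argmax(1.0 - x - y))


# Hypothetical one-feature signal: rising, falling, rising ramps (true change points at 20 and 40).
rng = np.random.default_rng(0)
t = np.arange(20, dtype=float)
signal = np.concatenate([0.5 * t, 12.0 - 0.5 * t, 0.5 * t])[:, None]
signal = signal + rng.normal(scale=0.1, size=signal.shape)

results = segment_all(signal, max_change_points=8)
k = elbow([cost for cost, _ in results])
print("number of change points:", k, "at frames", results[k][1])
```

A piecewise-linear cost fits a ramp within one segment, so a steadily drifting descriptor is not broken into many constant pieces. Dynamic programming is exact but evaluates O(n²) segment costs; for long trajectories a faster search such as binary segmentation or PELT (both available in ruptures) is the usual trade-off.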