In smart manufacturing, human-cyber-physical systems host digital twins and IoT-based networks. The networks interconnect manufacturing enablers such as CNC machine tools, robots, CAD/CAM systems, process planning systems, enterprise resource planning systems, and human resources. The twins act as the brains of these enablers: they supply the required knowledge and help the enablers solve problems autonomously and in real time. Since surface roughness is a major concern in all manufacturing processes, twins capable of solving surface roughness-related problems are needed. Such twins must machine-learn the required knowledge from relevant datasets available in big data. Therefore, preparing surface roughness-related datasets for inclusion in human-cyber-physical system-friendly big data is a critical issue. However, preparing such datasets is challenging because no established procedure exists. This study addresses this issue by proposing a method for preparing such datasets for surface roughness, wherein each dataset consists of four segments: semantic annotation, roughness model, simulation algorithm, and simulation system. These segments supply the information required by a digital twin's input, modeling, simulation, and validation modules. The semantic annotation segment takes the form of a concept map. Accordingly, a human- and machine-readable concept map is developed into which the information of the other segments (roughness model, simulation algorithm, and simulation system) is integrated. The delay map of surface roughness profile heights plays a pivotal role in the proposed dataset preparation method. The successful preparation of surface roughness datasets for milling, turning, grinding, electric discharge machining, and polishing demonstrates the efficacy of the proposed method. In the next phase of this study, the method will be extended to other manufacturing processes.
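To make the two technical ingredients above more concrete, the following is a minimal sketch under stated assumptions, not the study's implementation. It assumes the common delay-coordinate definition of a delay map, that is, the set of points (h[i], h[i + d]) formed from a sequence of profile heights h sampled at uniform spacing with a chosen delay d. The `RoughnessDataset` container, its field names, and the `linked_concepts` helper are hypothetical; they only mirror the four segments listed above and do not represent the study's actual data format.

```python
from typing import Dict, List, Tuple


def delay_map(heights: List[float], delay: int = 1) -> List[Tuple[float, float]]:
    """Pair each profile height h[i] with h[i + delay] (assumed delay-map definition)."""
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    # A point exists only while i + delay still indexes into the profile.
    return [(heights[i], heights[i + delay]) for i in range(len(heights) - delay)]


class RoughnessDataset:
    """Hypothetical record holding the four dataset segments named in the abstract."""

    def __init__(
        self,
        process: str,
        semantic_annotation: Dict[str, List[str]],  # concept map: concept -> linked concepts
        roughness_model: Dict[str, object],         # e.g. chosen delay and delay-map points
        simulation_algorithm: str,                  # free-text placeholder (assumed)
        simulation_system: str,                     # free-text placeholder (assumed)
    ) -> None:
        self.process = process
        self.semantic_annotation = semantic_annotation
        self.roughness_model = roughness_model
        self.simulation_algorithm = simulation_algorithm
        self.simulation_system = simulation_system

    def linked_concepts(self, concept: str) -> List[str]:
        """Return the concepts linked to `concept` in the concept map (empty if absent)."""
        return self.semantic_annotation.get(concept, [])


# Illustrative five-point profile (made-up values, not measured data).
profile = [0.2, -0.1, 0.4, -0.3, 0.1]
points = delay_map(profile, delay=1)

dataset = RoughnessDataset(
    process="milling",
    semantic_annotation={
        "surface roughness": ["roughness model", "simulation algorithm", "simulation system"],
    },
    roughness_model={"delay": 1, "delay_map": points},
    simulation_algorithm="placeholder: algorithm description",
    simulation_system="placeholder: system description",
)

print(points)
print(dataset.linked_concepts("surface roughness"))
```

Storing the concept map as a plain adjacency dictionary keeps it both human- and machine-readable, which is the property the abstract emphasizes; a real dataset would more likely use a standard graph or ontology format.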