A primary concern in mechanical engineering is ensuring efficient and accurate data entry for hardware components. Fasteners are mechanical devices that rigidly connect or affix two surfaces or objects. Because they are small and different fastener types often look alike, manually analyzing them to classify and record their associated information is a slow and error-prone procedure. With the widespread adoption of AI frameworks across many domains, equipment manufacturers have begun to rely on AI technologies for such labor-intensive tasks. Automatically classifying fasteners by type and extracting metadata from natural language questions are two important tasks that fastener manufacturers and suppliers face. In this paper, we address both challenges. For the first task, we introduce an augmentation methodology that starts from a small set of 3D models representing each of the 21 fastener types we aim to classify and efficiently generates multiple 2D images from these models. We then train a vision transformer on the collected data for a single-label, multi-class classification task. For the second task, we introduce a prompt-engineering technique designed for conversational agents, which leverages in-context knowledge to extract (metadata field, value) pairs from natural language questions. We subsequently address a question-answering task over the description fields of the extracted fasteners. Our evaluation demonstrates the effectiveness of both approaches, which surpass the baselines we tested.
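To illustrate the second task, the sketch below shows one plausible way to assemble a few-shot, in-context prompt for a conversational agent and to parse (metadata field, value) pairs from its reply. The example fields, questions, and helper names are illustrative assumptions, not the paper's actual prompt or schema.

```python
# Hypothetical sketch of in-context (metadata field, value) extraction.
# The few-shot examples and field names below are invented for illustration;
# they are not taken from the paper.
import re

FEW_SHOT = [
    ("Do you have M8 hex bolts in stainless steel?",
     "(type, hex bolt); (diameter, M8); (material, stainless steel)"),
    ("I need a 50 mm countersunk screw.",
     "(type, countersunk screw); (length, 50 mm)"),
]

def build_prompt(question: str) -> str:
    """Assemble a few-shot prompt to send to a conversational agent."""
    lines = ["Extract (metadata field, value) pairs from the question."]
    for q, a in FEW_SHOT:
        lines.append(f"Question: {q}\nPairs: {a}")
    lines.append(f"Question: {question}\nPairs:")
    return "\n\n".join(lines)

def parse_pairs(response: str) -> list[tuple[str, str]]:
    """Parse '(field, value)' pairs from the agent's textual reply."""
    return re.findall(r"\(([^,()]+),\s*([^()]+)\)", response)
```

The extracted pairs could then be matched against catalog metadata to select the fasteners whose description fields feed the downstream question-answering step.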