Artificial Intelligence (AI) becomes more prevalent in data science as well as in areas of computational science. Commonly used classification methods in AI can also be used for unorganized databases, if a proper model is trained. Most of the classification work is done on image data for purposes such as object detection and face recognition. If an object is well detected from an image, the classification may be done to organize image data. In this work, we try to identify images from an Integrated Digitized Biocollections (iDigBio) dataset and to classify these images to generate metadata to use as an AI-ready dataset in the future. The main problem of the museum image datasets is the lack of metadata information on images, wrong categorization, or poor image quality. By using AI, it maybe possible to overcome these problems. Automatic tools can help find, eliminate or fix these problems. For our example, we trained a model for 10 classes (e.g., complete fish, photograph, notes/labels, X-ray, CT (computerized tomotography) scan, partial fish, fossil, skeleton) by using a manually tagged iDigBio image dataset. After training a model for each for class, we reclassified the dataset by using these trained models. Some of the results are given in Table 1. As can be seen in the table, even manually classified images can be identified as different classes, and some classes are very similar to each other visually such as CT scans and X-rays or fossils and skeletons. Those kind of similarities are very confusing for the human eye as well as AI results.
Read full abstract