Abstract

A procedure for identifying discoveries in the biomedical sciences is described that makes use of citation context information, or more precisely citing sentences, drawn from the PubMed Central database. The procedure focuses on the use of specific terms in the citing sentences and the joint appearance of cited references. After a manual screening process to remove non-discoveries, a list of over 100 discoveries and their associated articles is compiled and characterized by subject matter and by type of discovery. The phenomenon of multiple discovery is shown to play an important role. The onset and timing of recognition of the articles are studied by comparing the number of citing sentences with and without discovery terms; the results show both early onset and delays in recognition. A comparative analysis of the vocabularies of the discovery and non-discovery sentences reveals the types of words and concepts that scientists associate with discoveries. A machine learning application is used to efficiently extend the list. Implications of the findings for understanding the nature and justification of scientific discoveries are discussed.

