Abstract
There is an increasing need to digitally preserve and provide access to historical document collections residing (and possibly decaying) in libraries, museums and archives. Documents range from ancient manuscripts, through early printed books, to typewritten administrative documents of the twentieth century. A common thread is that the documents are typically valued for their physical appearance as much as their content. The documents to be analysed can be originals (paper, parchment, etc.) or in image form (already scanned, possibly using now-outdated technology). The key requirement is to be able to process these unique manuscripts, whether they are presented as free-flowing text (e.g., treatises and novels) or structured at different levels of physical-logical structure correspondence (e.g., letters, census lists, trade forms). Degradation may be caused by a lifetime of use and physical deterioration. In addition to the original content, access to user annotations and corrections, stamps and unique artwork must also be preserved. Each class of document requires a different approach throughout the conversion process and lends itself to different levels of information extraction and description. As the application of existing technology to the analysis of historical documents exposes a myriad of weaknesses, novel and more robust methods are being developed to cope with this challenging problem. The issues involved in the analysis of historical documents are highly topical, as is evident from
More From: International Journal of Document Analysis and Recognition (IJDAR)