Abstract

Mathematical formula identification is an important part of mathematical formula recognition and retrieval. Extracting formulas from document images in PDF files is especially difficult because these images are acquired in many different ways. To address this problem, this paper designs a method for identifying mathematical formulas in English PDF document images. The method consists of three steps: judging columns, extracting mathematical formula character blocks, and merging mathematical formula character blocks. By analyzing and summarizing the characteristics of document images in PDF files and their effects on mathematical formula identification, this paper also designs a parameter adjustment algorithm that prevents variation in resolution from degrading identification performance. Experimental results in some applications show that the adaptability of the mathematical formula identification algorithm is improved.
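The abstract does not describe how the parameter adjustment algorithm works. As a purely illustrative sketch, one common way to make pixel-based thresholds independent of resolution is to scale them linearly with the image DPI. Everything below is an assumption and not the paper's method: the names `PixelThresholds`, `min_column_gap`, `max_merge_distance`, and `scale_thresholds`, the 300 DPI reference value, and the linear scaling rule are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical reference resolution at which the thresholds were tuned.
REFERENCE_DPI = 300.0


@dataclass(frozen=True)
class PixelThresholds:
    """Hypothetical pixel-based parameters for formula identification."""
    min_column_gap: int      # minimum blank width treated as a column gutter
    max_merge_distance: int  # maximum gap between character blocks to merge


def scale_thresholds(base: PixelThresholds, dpi: float,
                     reference_dpi: float = REFERENCE_DPI) -> PixelThresholds:
    """Scale thresholds linearly from reference_dpi to the given dpi.

    Linear scaling is an assumption made for illustration only.
    """
    if dpi <= 0 or reference_dpi <= 0:
        raise ValueError("dpi and reference_dpi must be positive")
    factor = dpi / reference_dpi
    # Keep every threshold at least 1 pixel so it stays meaningful.
    return PixelThresholds(
        min_column_gap=max(1, round(base.min_column_gap * factor)),
        max_merge_distance=max(1, round(base.max_merge_distance * factor)),
    )


if __name__ == "__main__":
    tuned = PixelThresholds(min_column_gap=40, max_merge_distance=12)
    print(scale_thresholds(tuned, dpi=150))
    print(scale_thresholds(tuned, dpi=600))
```

With thresholds adjusted this way, the column-judging and block-merging steps could use the same logic on low- and high-resolution page images, which is the kind of adaptability the abstract refers to.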
