Abstract

Numbers and data mining are easy. Our numeral system has 10 digits, any combination is possible, and every measured value can be captured in a number. Large quantities of measurements can be analysed efficiently using incredibly powerful calculators, and the resulting information can be shown in simple, clear graphs. Text is hard. Hundreds of letters and millions of different combinations can be used in the personal interpretation of information, in words and phrases that reflect one's personality rather than objective measurements. Depending on context and language, the same expression carries totally different information, or no meaning at all. Text mining requires 'education' at different levels: for providing information; for capturing, storing and retrieving that information; and for interpreting the results of the mining process. I will give a few examples of text mining tools in daily practice.

Highlights

  • Pitfalls in applying text mining to scientific literature



Summary

Introduction

Pitfalls in applying text mining to scientific literature. From the Workshop on Advances in Bio Text Mining, Ghent, Belgium.

