Abstract

Information systems encapsulate digital data, although many sources of information are analog, or continuous, in origin. Digital signal representation has been widely used since Shannon formulated his sampling theorem. Still, several questions regarding digital information processing remain unresolved. Among the most pressing current goals are data storage and data mining. The purpose of this work is to analyze the relation between continuous signals, or data sources, and their digitized representation. We concentrate on the operation most widely used to compare vector representations and find similarities between them, the inner product. Applying the sampling scheme to continuous information sources that do not satisfy Shannon's conditions (that is, sources that are not band-limited), but at a sufficiently high sampling rate, is commonly assumed to yield only small approximation errors. In this work it is shown, however, that this assumption should be made with considerable caution. In some cases, the inner product computed from the samples is likely to differ from that of the original continuous signals. We provide an analytic estimate of this digitization error, and consider several applications, including medical records. Our results provide a quantitative tool for calculating sampling errors, thus offering a useful means of evaluating the ongoing worldwide process of digitizing analog information sources.
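As a rough illustration of the effect described above, the following sketch compares a continuous inner product with the one computed from uniform samples. It is not the analysis of this work: the two signals (one-sided decaying exponentials, which are not band-limited because of the jump at t = 0), the scaling of the sampled sum by the sampling interval, and all function names are assumptions chosen only because the exact value is known in closed form.

```python
import math

# Hypothetical test signals (an assumption, not taken from the paper):
#   f(t) = exp(-t) and g(t) = exp(-2t) for t >= 0, and 0 for t < 0.
# Both jump at t = 0, so neither is band-limited.

def continuous_inner_product():
    # Exact integral of exp(-t) * exp(-2t) over [0, infinity) is 1/3.
    return 1.0 / 3.0

def sampled_inner_product(step, horizon=40.0):
    # Inner product of the sample vectors, scaled by the sampling interval
    # so that it approximates the integral (a Riemann sum).
    # The sum is truncated at `horizon`, where the terms are negligible.
    n_samples = int(math.ceil(horizon / step)) + 1
    total = sum(
        math.exp(-n * step) * math.exp(-2.0 * n * step)
        for n in range(n_samples)
    )
    return step * total

if __name__ == "__main__":
    exact = continuous_inner_product()
    for step in (0.5, 0.1, 0.01):
        approx = sampled_inner_product(step)
        print(f"step={step}: sampled={approx:.6f}, error={approx - exact:.6f}")
```

For these particular signals the sampled sum equals T / (1 - exp(-3T)) for sampling interval T, so the error relative to the exact value 1/3 is about T/2 for small T. It shrinks only linearly with the sampling interval, which is a large relative error at coarse sampling rates.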
