Abstract
The aim of this paper is to find an accurate and efficient algorithm for summing large sets of floating-point numbers. We present a new representation of the floating-point number system in which a number is expressed as a linear combination of integers whose coefficients are powers of the base of the floating-point system. This representation allows us to build an accurate floating-point summation algorithm based on the fact that no rounding error occurs when two integers are summed or when a floating-point number is multiplied by a power of the base. The proposed algorithm appears to be competitive in terms of computational effort and, under some assumptions, the computed sum is highly accurate. Under these assumptions, which are less conservative in practical applications, we prove that the relative error of the computed sum is bounded by the unit roundoff.
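As a rough illustration of the idea the algorithm builds on, and not the algorithm proposed in this paper, the sketch below writes each number as an integer times a power of the base, adds the integers exactly, and rounds only once at the end. It assumes IEEE 754 binary64 inputs (base 2, 53-bit significand), finite values only, and Python's arbitrary-precision integers. The names `decompose` and `integer_sum` are hypothetical and chosen for this example.

```python
import math
from fractions import Fraction

def decompose(x):
    """Return integers (m, e) such that x == m * 2**e exactly.

    Assumes x is a finite, nonzero binary64 value.
    """
    mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    # Multiplying by a power of the base is exact, so this yields an
    # integer-valued float that converts to int without loss.
    m = int(math.ldexp(mantissa, 53))
    return m, exponent - 53

def integer_sum(values):
    """Sum finite floats through exact integer arithmetic, rounding once."""
    parts = [decompose(x) for x in values if x != 0.0]
    if not parts:
        return 0.0
    e_min = min(e for _, e in parts)
    # Align every term to the smallest exponent; integer shifts and
    # integer additions introduce no rounding error.
    total = sum(m << (e - e_min) for m, e in parts)
    # The only rounding happens here, when the exact rational value
    # total * 2**e_min is converted back to a float.
    return float(Fraction(total) * Fraction(2) ** e_min)
```

For example, `integer_sum([1e16, 1.0, -1e16])` returns `1.0`, whereas adding the same three values left to right in binary64 loses the `1.0` and gives `0.0`. This sketch trades speed for simplicity by relying on unbounded integers, so it says nothing about the computational cost claimed for the proposed algorithm.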