Data has become the cornerstone of decision-making in every company. Within a vast flow of information, choosing the right data is the first step toward successful prediction. Once the requirements, the purpose of the analysis, and the prediction target have been determined, three tasks usually arise: handling outliers, handling missing values, and handling duplicate values. This paper discusses in detail the limitations, advantages, and disadvantages of the different methods available for these tasks. It introduces several methods for filling missing values that are grounded in mathematical statistics, such as hot-deck imputation, Lagrange interpolation, and Newton interpolation. It also presents a method based on the normal distribution, which performs well on outlier problems, and the popular K-nearest neighbor algorithm. Finally, it presents a logic diagram of data cleaning in the data preparation stage. Overall, these results offer a guideline for selecting the appropriate treatment for a given situation during the data cleaning process.
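Two of the techniques named above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the function names `lagrange_fill` and `three_sigma_outliers` are hypothetical, the normal-distribution method is taken to be the common rule of flagging values more than three standard deviations from the mean, and the sample data is invented for illustration only.

```python
import statistics


def lagrange_fill(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x.

    Hypothetical helper: xs are the positions of the known observations,
    ys their values, and x the position of the missing value to estimate.
    The xs must be distinct.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis polynomial factor: equals 1 at xi and 0 at every other xj.
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean.

    Assumes the data is roughly normally distributed; k=3 is a common
    convention, not necessarily the setting used in the paper.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values are identical, so nothing can be flagged
    return [v for v in values if abs(v - mean) > k * std]


# Illustrative data only: a series with a gap at position 3,
# and a sample containing one extreme reading.
estimate = lagrange_fill([1, 2, 4], [1, 4, 16], 3)
flagged = three_sigma_outliers([10] * 20 + [100])
```

In practice the interpolation is usually built from only a few neighbors of the missing point, because a high-degree polynomial through many points can oscillate strongly between them.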