Predicting yield is an important problem in semiconductor manufacturing: detecting defective products early in the manufacturing process can substantially reduce production costs. Data generated by the semiconductor manufacturing process have non-normal distributions, random missing patterns, and high missing rates, all of which complicate yield prediction. We propose the Dirichlet Process-Naive Bayes model (DPNB), which simultaneously imputes missing values and addresses classification problems. Because the DPNB is based on the infinite Gaussian mixture model, it can estimate complex data distributions and, thanks to the convenient analytical properties of the Gaussian distribution, make predictions for incomplete datasets under a range of missing patterns. The DPNB also performs well at high missing rates because it uses all the information in the observed components. Experiments on various real datasets, including semiconductor manufacturing data, show that the DPNB outperforms state-of-the-art methods in predicting both missing values and target variables as the percentage of missing values increases.
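To illustrate why a Gaussian mixture can classify and impute using only the observed components, the following is a minimal sketch and not the authors' implementation. It assumes a finite mixture with fixed, already-known parameters (no Dirichlet process inference) and diagonal covariances, so that marginalizing a missing feature reduces to skipping it. The function names `log_gauss_observed` and `posterior_and_impute`, the parameter layout, and the toy numbers are all hypothetical.

```python
import numpy as np

def log_gauss_observed(x, mean, var):
    """Log density of the observed entries of x under a diagonal Gaussian.

    Missing entries are encoded as NaN. With a diagonal covariance (an
    assumption of this sketch), marginalizing them out means skipping them.
    """
    obs = ~np.isnan(x)
    diff = x[obs] - mean[obs]
    return -0.5 * np.sum(np.log(2 * np.pi * var[obs]) + diff ** 2 / var[obs])

def posterior_and_impute(x, class_priors, weights, means, variances):
    """Class posterior and imputed vector for one sample with NaN gaps.

    class_priors: shape (C,)       prior probability of each class
    weights:      shape (C, K)     mixture weights within each class
    means:        shape (C, K, D)  component means
    variances:    shape (C, K, D)  component variances (diagonal)
    """
    C, K, _ = means.shape
    log_joint = np.empty((C, K))
    for c in range(C):
        for k in range(K):
            log_joint[c, k] = (
                np.log(class_priors[c])
                + np.log(weights[c, k])
                + log_gauss_observed(x, means[c, k], variances[c, k])
            )
    # Normalize in log space to get responsibilities over (class, component).
    resp = np.exp(log_joint - log_joint.max())
    resp /= resp.sum()
    class_post = resp.sum(axis=1)
    # Under diagonal covariance, the conditional mean of a missing feature
    # is the responsibility-weighted average of the component means.
    expected = np.einsum("ck,ckd->d", resp, means)
    x_imputed = np.where(np.isnan(x), expected, x)
    return class_post, x_imputed

# Toy example: two classes, one component each, two features.
class_priors = np.array([0.5, 0.5])
weights = np.array([[1.0], [1.0]])
means = np.array([[[0.0, 0.0]], [[5.0, 5.0]]])
variances = np.ones((2, 1, 2))

x = np.array([4.8, np.nan])  # second feature is missing
class_post, x_imputed = posterior_and_impute(
    x, class_priors, weights, means, variances
)
print(class_post, x_imputed)
```

In this toy case the single observed feature is enough to assign the sample almost entirely to the second class, and the missing feature is filled in from that class's mean. A full model of the kind described would additionally infer the number of components and their parameters from data.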