Machine-learning-based time-series forecasting has been studied intensively in recent years. Deep learning (DL) models, specifically deep neural networks (DNNs) and long short-term memory (LSTM) networks, are popular approaches for this purpose. However, these methods have several problems. First, a DNN needs a large amount of data to avoid overfitting. Without sufficient data, the model fails to generalize and may perform poorly on unseen data. Second, corrupted data degrade forecasting accuracy. In general, a model is trained on the assumption that its inputs are normal. When anomalous data enter the input, however, the forecasting accuracy of the model may decrease substantially, which underscores the importance of data integrity. This paper focuses on these two problems. In time-series forecasting, and especially in photovoltaic (PV) forecasting, the data available from solar power plants are insufficient. Because the solar panels are newly installed, a sufficiently long data record cannot be obtained. We also find that the data from many solar power plants may contain a substantial share of anomalous data, e.g., 30%. To address this, we propose a data preprocessing technique that leverages a convolutional autoencoder and principal component analysis (PCA) to make use of insufficient data with a high anomaly rate. We evaluate the performance of the PV forecasting model after applying the proposed anomaly detection when constructing a virtual power plant (VPP). Extensive experiments with 2517 PV sites in the Republic of Korea, which are used for VPP construction, confirm that the proposed technique can filter out anomalous PV sites with very high accuracy, e.g., 99%, which in turn reduces the forecasting error by 23%.