Conventional machine learning approaches often rest on somewhat unrealistic assumptions about training and test data, such as data sufficiency, ideal statistical properties (e.g., Gaussian, i.i.d.), stationarity, and complete observations (i.e., no missing variables). These assumptions ensure the convergence and universality of learning algorithms, the stability of system behaviors, and so on. In our daily lives, however, we can easily find cases that violate such assumptions.

In face recognition, for example, human faces vary widely with expression, lighting conditions, makeup, hairstyle, and so forth, so it is hard to account for all variations of a face in advance. This makes it quite difficult for a system to remain robust against the spatial and temporal variations of human faces. Indeed, conventional face recognition systems can achieve excellent performance when tested on a limited set of face images, but their accuracy can drop drastically when they operate in a practical environment, because the training set of face images may be insufficient or inappropriate for future events. Even if a large number of face images are collected when a face recognition system is constructed, all the variations that will occur in the future cannot be anticipated. A realistic solution is therefore for the system to learn incrementally whenever it misclassifies an object. Consequently, it is natural to assume that, under realistic environments, face images for training are given as a stream.

Here is another face recognition example that we should consider under realistic situations. In general, the objects we encounter in daily life can be described in various ways. From a human face, for instance, we recognize multiple attributes such as name, gender, age, and health condition. When we see a person, we might have to answer the person's name from his or her face; in other cases, we might have to answer the age, gender, or health condition. The right answer depends on the context we are facing. It is generally believed that we humans learn and perform multiple pattern recognition tasks in parallel or sequentially. In machine learning, the recognition of each attribute (e.g., name, gender, age) is usually defined as an individual task: the task of answering a person's name is called person identification, and the tasks of answering gender and age are called gender recognition and age estimation, respectively. It is therefore natural to expect that a learning system can also learn multiple tasks as humans do (Thrun et al. 1998).

Many other situations should be considered when constructing a learning system under realistic environments. For example, part of the training data might lack supervised information (e.g., class labels and target signals), so a learner should predict this information and generate training data on its own. In some cases, data distributions change over time with slow or sudden drifts. There might also be no prior information on useful features of high-dimensional inputs, in which case a learner should select or extract optimal features dynamically. Furthermore, training data could also include
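The error-driven, stream-based learning mentioned above can be illustrated with a minimal sketch. The snippet below is not the method of this paper but an illustrative assumption: it uses scikit-learn's SGDClassifier as a stand-in incremental learner, and the feature dimensionality, class set, and face_stream generator are hypothetical placeholders for a real stream of face descriptors and identity labels.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

N_FEATURES = 64           # assumed dimensionality of a face descriptor (illustrative)
CLASSES = np.arange(10)   # assumed set of person identities (illustrative)

def face_stream(n_samples=1000, rng=np.random.default_rng(0)):
    """Hypothetical stand-in for a stream of (face descriptor, identity) pairs."""
    for _ in range(n_samples):
        label = int(rng.integers(len(CLASSES)))
        yield rng.standard_normal((1, N_FEATURES)), label

learner = SGDClassifier()                       # online linear classifier used as the learner
x0, y0 = next(face_stream(1))
learner.partial_fit(x0, [y0], classes=CLASSES)  # first call must declare all classes

for x, y in face_stream():
    # Predict first, then update only when the prediction is wrong,
    # mirroring "learn incrementally whenever it misclassifies an object".
    if learner.predict(x)[0] != y:
        learner.partial_fit(x, [y])
```

In a real system the learner would be a face recognition model updated online (e.g., an incremental neural network or a prototype-based classifier), but the predict-then-update loop over a stream is the essential point the sketch is meant to convey.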