Document classification is the task of assigning one or more predefined categories to a document, based on the likelihood suggested by a training set of labeled documents. Many machine learning algorithms play an important role in training such a system on the predefined categories. Because of the growing importance of the machine learning approach, this study examines text classification based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies employed for text document classification, while also raising awareness of some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques.

Keywords: mining, Web mining, Document classification, Information retrieval, Event models.

I. Introduction

With the rapid growth of the World Wide Web and the increasing availability of electronic documents, automatic categorization of documents has become important for organizing information and for knowledge discovery. Proper categorization of electronic documents, online news, blogs, e-mails and digital libraries requires text mining, machine learning and natural language processing techniques to extract the required knowledge.

The term text document refers to written, printed, or online material that presents or communicates narrative or tabulated data in the form of an article, letter, memorandum, report, etc. Text expresses a vast range of information, but encodes it in a form that is difficult to decipher automatically. In today's online world, a huge amount of information is available in textual form in databases and various other sources. This information may be structured or unstructured. Unstructured data is data that does not reside in fixed locations; the term generally refers to free-form text, which is present everywhere.
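To make the contrast between free-form text and structured data concrete, the sketch below converts a few raw sentences into fixed-length term-count vectors, a common bag-of-words text representation. It is a minimal illustration under stated assumptions, not the specific method of this paper: the regular-expression tokenizer and the function names (`tokenize`, `build_vocabulary`, `to_count_vector`) are hypothetical, and the two example sentences are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token across the documents to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(vocab)}


def to_count_vector(text, vocabulary):
    """Turn a document into a fixed-length vector of term counts.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        idx = vocabulary.get(term)
        if idx is not None:
            vector[idx] = count
    return vector


# Hypothetical example documents (free-form, unstructured text).
docs = [
    "Stock prices rose sharply today.",
    "The team won the match today.",
]
vocab = build_vocabulary(docs)
matrix = [to_count_vector(d, vocab) for d in docs]

for row in matrix:
    print(row)
```

Running it prints `[0, 1, 1, 1, 1, 0, 0, 1, 0]` and `[1, 0, 0, 0, 0, 1, 2, 1, 1]`: each column corresponds to one vocabulary term (sorted alphabetically), so every document now occupies the same fixed set of fields, much like a row of structured data. Word order is discarded, which is the main simplification of this representation.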
Data that resides in fixed fields within a record or file is termed structured data; relational databases and spreadsheets are examples. In reality, a large portion of the available information does not appear in structured databases but in collections of text articles drawn from various sources. Unstructured information refers to computerized information that either has no data model or has one that is not easily usable by a computer program. The term distinguishes such information from data stored in fielded form in databases or annotated in documents. Data mining traditionally deals with structured data, whereas text has special characteristics and is unstructured. The key task is therefore how such documents can be properly retrieved, presented and classified. Extracting, integrating and classifying electronic documents from different sources, and discovering knowledge from them, are important problems.

In data mining, machine learning is often used for prediction or classification. Classification involves finding rules that partition the data into disjoint groups. The input to classification is a training data set whose class labels are already known. A classification method analyzes the training data set and constructs a model based on the class labels. The goal of classification is to build a set of models that can correctly predict the class of previously unseen objects.

Machine learning is an area of artificial intelligence concerned with developing techniques that allow computers to learn. More specifically, machine learning is a method for creating computer programs by analyzing data sets. Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the data, while others adopt a collaborative approach between human and machine.
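The classification workflow described above (learn from a training set with known labels, then predict the class of a new document) can be illustrated with the multinomial naive Bayes event model, one widely used statistical event model for text. The sketch below is a minimal, assumed implementation for illustration only: it uses a simple regular-expression tokenizer and add-one (Laplace) smoothing, the class name `MultinomialNaiveBayes` is hypothetical, and the tiny labeled training set is invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_doc_counts = Counter(labels)   # number of training documents per class
        self.term_counts = defaultdict(Counter)   # occurrences of each term within each class
        self.total_terms = Counter()              # total number of tokens per class
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = tokenize(doc)
            self.term_counts[label].update(tokens)
            self.total_terms[label] += len(tokens)
            self.vocabulary.update(tokens)
        self.num_docs = len(documents)
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text) = log P(label) + sum of log P(term | label)."""
        score = math.log(self.class_doc_counts[label] / self.num_docs)
        denom = self.total_terms[label] + len(self.vocabulary)
        for token in tokenize(text):
            if token in self.vocabulary:  # terms never seen during training are skipped
                score += math.log((self.term_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        """Return the class with the highest log posterior."""
        return max(self.class_doc_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy training set: each document's class label is known in advance.
train_docs = [
    "stock market shares fell",
    "investors bought shares",
    "the team won the match",
    "players scored a late goal",
]
train_labels = ["finance", "finance", "sports", "sports"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("shares of the market"))
print(model.predict("the goal won the match"))
```

With this toy data the model prints `finance` and then `sports`. Working in log space avoids numerical underflow when many small term probabilities are multiplied, and add-one smoothing keeps a term that never occurred in a class from forcing that class's probability to zero.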
Human intuition cannot be entirely eliminated since the designer of the system must specify how the data are to be represented and what mechanisms will be used to search for a characterization of the data. Machine learning has a wide spectrum of applications including search engines, medical diagnosis, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, game playing and robot locomotion.