Abstract

With the rise of big data, machine learning has become particularly important for solving problems. Machine learning uses two types of techniques: supervised learning and unsupervised learning. Clustering is the most common unsupervised learning technique, while classification and regression are supervised learning techniques. Clustering algorithms fall into two broad groups: hard clustering, in which each data point belongs to exactly one cluster, and soft clustering, in which a data point can belong to more than one cluster. K-Means, K-Medoids, hierarchical clustering and the Self-Organizing Map are hard clustering methods; Fuzzy C-Means and the Gaussian Mixture Model are soft clustering methods. In a classification problem, the classes may be binary or multiclass. A multiclass classification problem is generally more challenging because it requires a more complex model. Common classification algorithms include Logistic Regression, k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Neural Network, Naïve Bayes, Discriminant Analysis, Decision Tree, and Bagged and Boosted Decision Trees. Regression algorithms include the Gaussian Process Regression Model, SVM Regression, the Generalized Linear Model and the Regression Tree. Depending on the application, some problems require pre-processing and optimization. Real-world datasets can be messy, incomplete and in a variety of formats, so pre-processing is necessary before solving the problem. Machine learning is an effective method for finding patterns in big datasets, but bigger data brings added complexity. As datasets get bigger, it becomes essential to reduce the number of features. The three most commonly used dimensionality reduction techniques are Principal Component Analysis (PCA), Factor Analysis and Nonnegative Matrix Factorization. The performance of a method appears to improve when machine learning algorithms are used. Selecting a machine learning algorithm is a process of trial and error. The characteristics to weigh when comparing algorithms include speed of training, memory usage, predictive accuracy on new data, and transparency or interpretability.
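
The distinction between hard and soft clustering can be illustrated with a minimal sketch of the assignment step alone. This is not the full K-Means or Fuzzy C-Means procedure: the cluster centers are fixed by hand, and the names `hard_assign` and `soft_assign`, the example coordinates, and the fuzzifier value `m=2.0` are assumptions chosen for illustration.

```python
import math

def hard_assign(point, centers):
    """Return the index of the nearest center (K-Means-style hard assignment)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def soft_assign(point, centers, m=2.0):
    """Return Fuzzy C-Means-style membership degrees, one per center, summing to 1.

    m is the fuzzifier (assumed to be 2.0 here); it must be greater than 1.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in distances:
        zero_count = distances.count(0.0)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]

# Hypothetical two-cluster example with hand-picked centers.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard_label = hard_assign(point, centers)   # a single cluster index
memberships = soft_assign(point, centers)  # one degree per cluster

print(hard_label)
print([round(u, 3) for u in memberships])
```

The hard assignment returns one cluster index, while the soft assignment returns a degree of membership for every cluster. In a complete algorithm, this step would alternate with re-estimating the centers until the assignments stabilize.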

