CSE 6363: Machine Learning

Outline of covered topics:

Revisit classifiers: kNN, centroid method, probabilistic classification, Bayes error

Linear regression for 2-class and multi-class classification; ridge regression

Logistic Regression and softmax regression

Decision Trees, ensemble learning, and random forest

Revisit Support Vector Machines and kernel methods, with in-depth analysis

Analyze K-means clustering from many different points of view.

Gaussian mixture model, EM algorithm

Principal component analysis for dimension reduction and subspace learning

Linear discriminant analysis (LDA) for supervised dimension reduction

Feature selection

Multidimensional scaling and graph embedding.

Semi-supervised learning

Sparse coding

In the last 3 weeks, cover neural networks/deep learning. The emphasis is on practical algorithms, coding, and data sets.

There are 4 projects. Together they form a coherent mini package for machine learning.
The following are the topics that we hope the students are already familiar with:
-----------------------------------------------------------------------------

Week 1.
We start with three concrete examples:
1. Data mining example: market basket data analysis
2. Pattern recognition example: handwritten letter recognition
3. Cancer prediction using DNA expression levels recorded on microarrays

From these examples, key ideas, concepts and methods will be introduced.
Data mining uses many techniques from Machine Learning and Pattern Recognition.

Week 2.
- Brief introduction to Information Retrieval: text processing, the vector space model
- Naive Bayes Classification
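
As a concrete illustration (not part of the official course materials), a Gaussian Naive Bayes classifier can be sketched in a few lines of NumPy. The function names and the toy 2-class data below are made up for this sketch.

```python
import numpy as np

def gaussian_nb_fit(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # Small constant added to the variance to avoid division by zero.
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def gaussian_nb_predict(params, X):
    """Pick the class maximizing log prior + sum of log Gaussian likelihoods."""
    labels = sorted(params)
    scores = []
    for c in labels:
        prior, mu, var = params[c]
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        scores.append(np.log(prior) + ll)
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]

# Hypothetical toy data: class 0 near (0,0), class 1 near (3,3)
X = np.array([[0.1, 0.2], [0.0, -0.1], [3.1, 2.9], [2.8, 3.2]])
y = np.array([0, 0, 1, 1])
params = gaussian_nb_fit(X, y)
pred = gaussian_nb_predict(params, np.array([[0.0, 0.0], [3.0, 3.0]]))
```

The "naive" assumption is the per-feature independence implicit in summing the per-feature log likelihoods.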

Week 3.
Classification, decision boundary, Bayes classifier
- k Nearest Neighbors (kNN)
- Centroid Method
- Linear Regression
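
A minimal sketch of the first two classifiers above, using plain NumPy (the toy blobs and function names are illustrative, not project specifications):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test point by majority vote among its k nearest training points."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nn])

def centroid_predict(X_train, y_train, X_test):
    """Label each test point by the nearest class centroid (mean of each class)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Two made-up point clouds, one per class
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
Xt = np.array([[0.2, 0.2], [5.5, 5.5]])
knn_labels = knn_predict(X, y, Xt, k=3)
cen_labels = centroid_predict(X, y, Xt)
```

The centroid method can be viewed as kNN with each class compressed to a single prototype.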

Project 1. Classification using kNN, Centroid method, Linear Regression
Due Feb 24

Week 4.
- Support Vector Machine
- Multi-class classification using binary classifiers
- Kernels (Gaussian, polynomial)
- Evaluation of classifiers: Precision, Recall, cross-validation
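
Two of the ingredients above, the Gaussian kernel and precision/recall, are small enough to sketch directly (the toy labels below are made up for illustration):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2)

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN) for the chosen positive label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical predictions: one false positive and one false negative
p, r = precision_recall([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
K = rbf_kernel(np.array([[0., 0.], [3., 0.]]),
               np.array([[0., 0.], [3., 0.]]), gamma=0.5)
```

Note that K always has 1s on the diagonal, since each point is at distance zero from itself.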

Project 2. Split the data into training and testing sets. Run kNN, the centroid method, and linear regression. Run SVM with linear and Gaussian kernels. Do 5-fold cross-validation.
Due March 9
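
The 5-fold split itself is just index bookkeeping; a minimal sketch (function name and seed are illustrative, not a project requirement):

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle the n sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

folds = kfold_indices(20, k=5)
# Each fold serves once as the test set; the remaining folds form the training set.
train0 = np.concatenate([f for i, f in enumerate(folds) if i != 0])
```

Averaging the test accuracy over the k folds gives the cross-validation estimate.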

Week 5 - 6.
Clustering
- K-means clustering
- Gaussian Mixture Model and EM Algorithm
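
K-means (Lloyd's algorithm) fits in a few lines; the GMM/EM algorithm generalizes the same alternation with soft assignments. The deterministic initialization and toy blobs below are illustrative choices for this sketch (k-means++ is the usual initialization in practice):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Alternate: assign each point to its nearest center, then move each
    center to the mean of its assigned points, until centers stop moving."""
    # Simple deterministic init: evenly spaced sample indices.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Two well-separated made-up blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(10, 0.1, (5, 2))])
labels, centers = kmeans(X, 2)
```

Each iteration never increases the sum of squared distances to the assigned centers, which is why the loop terminates.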

Project 3. Clustering using K-means, use data in Project 1.
Due March 30

Week 7.
Data types, preprocessing, normalization, etc.
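
One common normalization is z-scoring; a sketch (the helper name is made up), with the key point that test data must be scaled by the *training* statistics:

```python
import numpy as np

def zscore(X_train, X_test):
    """Standardize features to zero mean, unit variance, using training
    statistics only (avoids leaking test-set information)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12   # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma

Xtr = np.array([[1., 2.], [3., 4.], [5., 6.]])
Ztr, Zte = zscore(Xtr, np.array([[3., 4.]]))
```

Without normalization, distance-based methods such as kNN and K-means are dominated by the features with the largest raw scale.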

Week 8 - 9.
Feature Selection
- t-statistic, f-statistic
- mutual information
- minimum redundancy, maximum relevance
- filters, wrappers, feature set selection
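
The f-statistic filter above scores each feature by the ratio of between-class to within-class variance; a sketch with made-up toy data (feature 0 informative, feature 1 pure noise):

```python
import numpy as np

def f_statistic(X, y):
    """One-way ANOVA F score per feature: between-class mean square
    divided by within-class mean square."""
    classes = np.unique(y)
    n = len(X)
    overall = X.mean(axis=0)
    ss_between = sum(np.sum(y == c) * (X[y == c].mean(axis=0) - overall) ** 2
                     for c in classes)
    ss_within = sum(np.sum((X[y == c] - X[y == c].mean(axis=0)) ** 2, axis=0)
                    for c in classes)
    df_b, df_w = len(classes) - 1, n - len(classes)
    return (ss_between / df_b) / (ss_within / df_w + 1e-12)

rng = np.random.default_rng(0)
X = np.column_stack([
    np.r_[np.zeros(10), np.ones(10)] + rng.normal(0, 0.1, 20),  # informative
    rng.normal(0, 1, 20),                                       # noise
])
y = np.r_[np.zeros(10, int), np.ones(10, int)]
F = f_statistic(X, y)
top = np.argsort(F)[::-1]   # rank features; keep the top-m as the filter
```

This is a filter method: features are scored independently of any classifier, in contrast to wrapper methods.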

Project 4a. Use the f-statistic to select features. Run kNN, centroid, linRegression, SVM. Run K-means on the selected data.

Week 10 - 11.
Dimension Reduction
- principal component analysis
- linear discriminant analysis
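
PCA reduces to an eigendecomposition of the covariance matrix; a sketch on made-up data whose variance lies almost entirely along one direction (LDA follows the same pattern but diagonalizes the within-class vs. between-class scatter instead):

```python
import numpy as np

def pca(X, d):
    """Center the data and project onto the top-d eigenvectors
    of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :d]           # top-d principal directions
    return Xc @ W, W

# Toy data: 3 features, but effectively one degree of freedom
rng = np.random.default_rng(0)
t = rng.normal(0, 1, 100)
X = np.column_stack([t,
                     2 * t + rng.normal(0, 0.05, 100),
                     rng.normal(0, 0.05, 100)])
Z, W = pca(X, 1)   # one component captures nearly all the variance
```

The retained eigenvalues equal the variances of the projected coordinates, which is the usual criterion for choosing d.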

Project 4b. Run PCA and LDA on the data to obtain a low-dimensional representation. Run kNN, centroid, linRegression, SVM. Run K-means on the reduced data.

Week 12 - 13.
Graph Embedding
- Embedding a graph (distance matrix) in a metric space: multi-dimensional scaling
- Embedding a graph (similarity matrix) in a metric space
- Laplacian embedding
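
For a similarity matrix W, Laplacian embedding uses the low eigenvectors of L = D - W as coordinates; a sketch on a made-up graph of two triangles joined by one weak edge:

```python
import numpy as np

def laplacian_embedding(W, d):
    """Embed graph nodes using the eigenvectors of L = D - W belonging to
    the d smallest nonzero eigenvalues (constant eigenvector skipped)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)   # ascending eigenvalues; vals[0] = 0
    return vecs[:, 1:d + 1]

# Toy similarity graph: two triangles {0,1,2} and {3,4,5}, weakly linked
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
Y = laplacian_embedding(W, 1)   # 1-D coordinates; sign splits the clusters
```

The first nontrivial eigenvector (the Fiedler vector) places the two weakly connected triangles on opposite sides of zero, which is the basis of spectral clustering.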

Project 4c. Compute a kernel and embed it in a low-dimensional space. Run kNN, centroid, linRegression, SVM. Run K-means.
Due May 6. Project 4 presentation at 4-6pm.

Week 14 - 15.
Semi-supervised Learning
- A large number of data points are unlabeled; only a small number have class labels
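
One standard semi-supervised technique is label propagation over a similarity graph; a sketch (the function name, graph, and labeling below are made up for illustration):

```python
import numpy as np

def label_propagation(W, labels, n_iter=200):
    """Spread class scores from the few labeled nodes over the similarity
    graph W, re-clamping the known labels each step (labels: -1 = unlabeled)."""
    classes = np.unique(labels[labels >= 0])
    F = np.zeros((len(labels), len(classes)))
    clamped = labels >= 0
    onehot = (labels[clamped, None] == classes[None, :]).astype(float)
    F[clamped] = onehot
    P = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    for _ in range(n_iter):
        F = P @ F                          # average scores over neighbors
        F[clamped] = onehot                # keep the known labels fixed
    return classes[np.argmax(F, axis=1)]

# Toy graph: two triangles joined by a weak edge, one labeled node per cluster
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = np.array([0, -1, -1, 1, -1, -1])
pred = label_propagation(W, labels)
```

The unlabeled points shape the geometry through the graph, which is exactly how the few labels get leveraged.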

Final Exam