CSE-5334 Data Mining

Data Sets (Examples):

  • UCI KDD archive (http://kdd.ics.uci.edu/)
  • UCI machine learning repository (http://www.ics.uci.edu/~mlearn/MLRepository.html)
  • KDD cup data sets (e.g., KDD Cup 2000, http://www.ecn.purdue.edu/KDDCUP/)
  • PKDD discovery challenge (e.g., PKDD discovery challenge 2006, http://www.ecmlpkdd2006.org/challenge.html)
  • Data mining cup (http://www.data-mining-cup.com/)
  • The Netflix Prize (http://www.netflixprize.com/)
  • Frequent itemset mining dataset repository (http://fimi.cs.helsinki.fi/data/)
  • MIT reality mining dataset (http://reality.media.mit.edu/download.php)
  • KDnuggets' collection of datasets for data mining (http://www.kdnuggets.com/datasets/)
  • A collection of datasets for data mining (http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html)

Proposal (Due Wednesday, October 3, before class): Your proposal should clearly state the following.

Description of the task: Define the problem you propose to address in the project. State the motivation and the challenges.

Data set(s): List the data set(s) you intend to use in the project. Include a brief description of the data set(s) and the source(s).

Review of related work: Briefly review existing work related to your proposed task. Describe previous results on the data set(s) you choose for the project.

Outlines of deliverables: Clearly state

  • what you will definitely accomplish in the project (stage 0.8)
  • what you expect to accomplish in the project (stage 1.0)
  • what you will accomplish if you have time (stage 1.2)

In particular, describe which data mining techniques you propose to apply and how you intend to measure the performance of the data mining models you construct.
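
As one illustration of what "measuring performance" might look like for a classification task, the sketch below computes accuracy, precision, recall, and F1 from true and predicted labels. It is only an example under stated assumptions: the function name `binary_metrics` and the sample labels are hypothetical, and your project may call for different measures (for example, RMSE for a rating-prediction task).

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical held-out labels and model predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Whichever measures you choose, compute them on data held out from training (a separate test set or cross-validation folds) so that the reported numbers reflect performance on unseen examples.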

Final Report (Due Friday, December 7, midnight): You should submit a zip file that contains a final report (pdf file) and any programs (source code and executables) or scripts you wrote in the project. The final report should include:

Description of the task: A formal statement of the problem you addressed in the project. List the data mining tasks you attempted in the project.

Data set(s): List the data set(s) you used in the project. Include a brief description of the data set(s) and the source(s).

Review of related work: Include a comprehensive review of existing work related to your project. Describe previous results on the adopted data set(s).

Outlines of deliverables: A detailed description of your findings. In particular, describe which data mining techniques you applied and which measures you adopted to evaluate the performance of the data mining models you constructed. You may also include a summary of the steps you took to reach the findings, such as your initial attempts, the data mining models you obtained, and how you subsequently improved them.