Matteo Riondato

COSC-254 Data Mining (Spring 2019)

Course info

Times & Location: MW 2—3.20pm, Science Center E110

Website: http://rionda.to/courses/cosc-254-s19/, Moodle for assignments and forum

Prerequisites: COSC-211 Data Structures

Instructor: Matteo Riondato (he/his, please call me "Matteo")
Contact: mriondato@amherst.edu (please put [COSC254] at the start of your subject line; email is only for confidential messages that cannot go to the forum)
Office Hours: T 3.30–5.30pm, Science Center C214. Please reserve a 15-minute slot by the day before (Sunday) at 4pm.

TA: Alexander Einarsson
Office Hours: Th 3—5.00pm, Science Center E210.

Description

This course is an introduction to data mining, the area of computer science that deals with the development of efficient algorithms for extracting information from data. We will:
  • talk about the key tasks in the analysis of transactional datasets, time series, and graphs, and the most efficient algorithms to solve them;
  • learn about parallel/distributed systems to perform the analysis of massive datasets;
  • use interactive notebooks and large-scale systems to evaluate algorithms and analyze data.

Syllabus

Most of the information you need is available in the syllabus. For anything else, please ask on the Moodle forum or, if it is confidential, email Matteo (please use [COSC254] in front of your subject).

Schedule & Diary

For past dates, the listed topics are those actually covered. For future dates, they are the planned topics and are subject to change. In the readings, MMD (also written MMDS) denotes the Mining of Massive Datasets book, and DMT denotes Data Mining: The Textbook.

  • List of covered topics
  • Lecture of 4/18: Ranking (Slides on HITS, Slides on PageRank).
    HW07 is out. Due on 4/24 at 1.59pm.
  • Lecture of 4/15: Indexing and Ranking Slides
  • Lecture of 4/10: Link prediction. The Web and crawling Slides
  • No lecture on 4/8
  • Lecture of 4/3: Community detection Slides.
  • Lecture of 4/1: Closeness and Betweenness Centrality Slides.
  • Lecture of 3/27: Centrality measures Slides. Readings: MMDS 10.1, DMT from 19.2.1 to 19.2.5.2.
    HW06 is out! Due 4/3 at 1.59pm.
  • Lecture of 3/25: Social network analysis Slides.
  • Project 02 is out: proj02.pdf, triest.zip. Due on 4/15.
  • Lecture of 3/20: Counting triangles on MapReduce Slides. Readings: MMDS 2.3.7, 2.5.3, 10.7.4.
    HW05 is out! Due 3/27 at 1.59pm.
  • Lecture of 3/18: Counting triangles on static graphs. Homework correction Slides, Homework Slides. Readings: MMDS 10.7.1, 10.7.2, 10.7.3.
  • Lecture of 3/6: Graphs and TRIÈST Slides.
  • Lectures of 2/27 and 3/4: Data Streams: DGIM algorithm Slides. Readings: MMDS 4.6.
    HW04 is out! Due 3/6 at 1.59pm.
  • Project 01 is out: proj01.pdf
  • Lecture of 2/25: Data Streams: Bloom filter, Flajolet-Martin approach Slides. Readings: MMDS 4.3, 4.4.
  • Lecture of 2/20: Data Streams: Intro, Reservoir sampling Slides. Readings: MMDS 4.1, 4.2.
  • Lecture of 2/18: Eclat algorithm (Slides), Compressing Patterns (Slides). Readings: N/A.
  • Due to the network outage, both HW02 and HW03 are due on Wed 2/20 at 2pm.
  • Lecture of 2/13: Association Rules, Apriori algorithm Slides. Readings: MMD 6.2.5, DMT 4.4.1, 4.4.2.
    HW03 is out! Due 2/20 at 1.59pm.
  • Lecture of 2/11: Intro to Association Rules Slides. Readings: MMD 6.1.3, DMT 4.3.
  • Lecture of 2/6: Communication costs, Intro to Pattern Mining Slides. Readings: MMD 2.5, 6.1.1, 6.1.2, 6.2.1, 6.2.3, 6.2.4, DMT 4.1, 4.2.
    HW02 is out! Due 2/13 at 1.59pm.
  • Lecture of 2/4: Matrix-by-Vector Multiplication in Hadoop. Readings: MMD 2.3.
  • Lecture of 1/30: MapReduce & Hadoop Slides. Readings: MMD 2.1, 2.2.
    HW01 is out! Due 2/6 at 1.59pm.
  • Lecture of 1/28: What is Data Mining? Slides. Readings: MMD Ch.1, DMT Ch. 1.
    HW00 is out! Due 1/30 at 1.59pm.

Future classes

  • Week of 4/22: PageRank and Review
  • Week of 4/29: Review