About the Course
Much of the world’s recorded data is locked up in
structured sources such as databases, which are often the proprietary
information of private corporations and government agencies. Searching and
exploring databases for information is currently very cumbersome: the
data explorer often has to know complex query languages (such as
SQL), as well as details of how the data is organized into
different tables and columns (the database schema). In recent years,
researchers have studied the problem of improving the search and
exploration capabilities of relational databases. This work includes adapting
probabilistic and approximate querying methods to improve the scalability
of query answering, as well as applying information retrieval techniques such as
relevance ranking and keyword search. This class will explore recent
research efforts in these important and challenging
areas. We will read and discuss the latest research literature drawn from
premier conferences in databases and information retrieval. It is hoped
that this class will spur students to pursue further research in these
areas.
The following is a tentative list of topics which we
will attempt to cover:
1. Probabilistic Methods in Databases
   Sampling Methods in Databases: Basics
   Approximate Query Processing
   Processing of Fuzzy/Uncertain Data
2. Unstructured Search in Databases
   Keyword Queries in Databases
   Ranking of Database Query Results
3. DB and IR Integration
   Top-K Algorithms
We will cover these topics in breadth, identify
the central contributions of each line of work, and try to predict future
research directions.
Prerequisites
Advanced Algorithms and Database II are the
prerequisite courses. However, exceptions will be made on a case-by-case
basis, especially if the student has prior exposure to the material or demonstrates the
initiative to learn these concepts quickly on his/her own.
Presentations
The actual reading list, consisting of recent
research papers, will be selected and finalized by the first week of
classes. Each student will present one or more papers (depending on
enrollment) during the semester. All students are expected to participate in class
discussions during and after each presentation. Attendance is required.
Project
In addition to reading papers, students will have
the option of undertaking a programming project during the semester. The
projects will involve developing portions of information retrieval systems
for structured databases based on the techniques proposed in the papers.
The projects will also be tested on real data sets that the students
are expected to obtain access to. A long-term goal is for the more promising
projects to serve as infrastructure and test beds for students continuing
their research in these areas beyond the course.
Evaluation
The grade will be based on the paper presentations,
class attendance and participation, and performance on the projects.