Paper #9: Randomized algorithms for optimizing large join queries


Y. Ioannidis and Y.C. Kang.
Randomized algorithms for optimizing large join queries.
In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, pages 312-321, May 1990.
Presented by: Deodhar


Query optimization for relational database management systems is a very important issue. It is a combinatorial optimization problem, which makes exhaustive search unacceptable as the query size grows. Randomized algorithms such as Simulated Annealing (SA) and Iterative Improvement (II) are viable alternatives to exhaustive search. In randomized algorithms for query optimization, each solution to the combinatorial optimization problem can be thought of as a state in a space, i.e. a node in the graph that includes all such solutions. Each node has an associated cost, and the randomized algorithm performs random walks in the state space to find a state with the globally minimum cost.

In this paper on "Randomized Algorithms For Optimizing Large Join Queries", the authors address the issues associated with the above two randomized algorithms for optimizing Project-Select-Join queries. It is important to note that the benefit of these randomized algorithms becomes significant as the query size grows. The authors present results from applying these algorithms to large queries of various types over different databases, concluding that in most cases Simulated Annealing identifies a lower-cost access plan than Iterative Improvement. It is also observed that II performs better than SA initially, but if SA is given enough time, it outperforms II. To explain these results, the authors study the shape of the cost function over the solution space associated with such queries and conjecture that it resembles a 'cup' with relatively small variations at the bottom. This observation inspired a new algorithm called the "Two Phase Optimization" algorithm, which is a combination of Simulated Annealing and Iterative Improvement.
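The two random walks described above can be sketched as follows. This is only a minimal illustration, not the paper's implementation: the state space here is permutations of relation indices (standing in for join orders), and cost() is a hypothetical toy function, not a real plan cost model. II accepts only downhill moves from repeated random starts, while SA can accept uphill moves with a probability that decays with temperature.

```python
import math
import random

random.seed(0)

# Toy stand-in for a join-order cost model (hypothetical): a state is a
# permutation of relation indices, and cost() rewards orders close to
# the identity permutation. The paper uses actual plan costs instead.
def cost(order):
    return sum(abs(r - i) for i, r in enumerate(order)) + 1

def random_neighbor(order):
    # Move to an adjacent state by swapping two relations in the order.
    i, j = random.sample(range(len(order)), 2)
    n = list(order)
    n[i], n[j] = n[j], n[i]
    return n

def iterative_improvement(n_rels=8, restarts=20, local_tries=100):
    # II: greedy descent from many random starting states; only
    # downhill moves are accepted, so each walk stops in a local minimum.
    best, best_cost = None, float("inf")
    for _ in range(restarts):
        state = random.sample(range(n_rels), n_rels)
        for _ in range(local_tries):
            cand = random_neighbor(state)
            if cost(cand) < cost(state):
                state = cand
        if cost(state) < best_cost:
            best, best_cost = state, cost(state)
    return best, best_cost

def simulated_annealing(n_rels=8, temp=100.0, cooling=0.95, steps=30):
    # SA: uphill moves are accepted with probability exp(-delta/temp),
    # so the walk can escape local minima while the temperature is high.
    state = random.sample(range(n_rels), n_rels)
    best, best_cost = state, cost(state)
    while temp > 0.1:
        for _ in range(steps):
            cand = random_neighbor(state)
            delta = cost(cand) - cost(state)
            if delta < 0 or random.random() < math.exp(-delta / temp):
                state = cand
                if cost(state) < best_cost:
                    best, best_cost = state, cost(state)
        temp *= cooling
    return best, best_cost

print(iterative_improvement())
print(simulated_annealing())
```

On this toy cost surface both walks converge quickly; the paper's point is that on real join-query cost functions II finds a good plan faster, while SA finds a better plan if allowed to run longer.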
The performance of all three algorithms is compared, and the authors conclude that the new algorithm outperforms both SA and II in terms of output quality as well as running time.
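The combination of the two phases can be sketched as below. Again this is a hedged illustration on a hypothetical toy cost function over permutations, not the paper's cost model: phase one runs II to reach a good local minimum, and phase two anneals from that minimum with a low starting temperature, exploring only the shallow variations at the bottom of the conjectured 'cup'.

```python
import math
import random

random.seed(1)

# Hypothetical join-order cost model: states are permutations of
# relation indices; cost() is a toy surrogate for real plan costs.
def cost(order):
    return sum(abs(r - i) for i, r in enumerate(order)) + 1

def neighbor(order):
    i, j = random.sample(range(len(order)), 2)
    n = list(order)
    n[i], n[j] = n[j], n[i]
    return n

def two_phase_optimization(n_rels=8, ii_restarts=10, ii_tries=100,
                           start_temp=5.0, cooling=0.9):
    # Phase 1 (II): quick greedy descents from random states; keep the
    # best local minimum found.
    best, best_cost = None, float("inf")
    for _ in range(ii_restarts):
        state = random.sample(range(n_rels), n_rels)
        for _ in range(ii_tries):
            cand = neighbor(state)
            if cost(cand) < cost(state):
                state = cand
        if cost(state) < best_cost:
            best, best_cost = state, cost(state)
    # Phase 2 (SA): anneal from that minimum with a low initial
    # temperature, so the walk stays near the bottom of the 'cup'.
    state, temp = best, start_temp
    while temp > 0.1:
        for _ in range(20):
            cand = neighbor(state)
            delta = cost(cand) - cost(state)
            if delta < 0 or random.random() < math.exp(-delta / temp):
                state = cand
                if cost(state) < best_cost:
                    best, best_cost = state, cost(state)
        temp *= cooling
    return best, best_cost

print(two_phase_optimization())
```

The design intuition matches the paper's conjecture: II cheaply gets the walk into the cup, after which only a short, cool annealing run is needed, which is why 2PO can beat both II and SA in running time as well as plan quality.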