Paper #9: Randomized algorithms for optimizing large join queries
Y. Ioannidis and Y.C. Kang.
Randomized algorithms for optimizing large join queries.
In Proceedings of the 1990 ACM SIGMOD International Conference
on Management of Data, Atlantic City, NJ, pages 312-321, May 1990.
Presented by: Deodhar
Query optimization for relational database management systems
is a very important issue. It is a combinatorial optimization
problem which makes the exhaustive search unacceptable as the
query size grows. There are some randomized algorithms such as
Simulated Annealing (SA) and Iterative Improvement (II) which are
viable alternatives to exhaustive search. In randomized algorithms
for query optimization, each solution to combinatorial optimization
problem can be thought of as a state in the space, i.e. a
node in the graph, that includes all such solutions. Each of
these nodes has a associated cost and the randomized
algorithm performs random walks in the state space to find
a state with globally minimum cost. In this paper on
"Randomized Algorithms For Optimizing Large Join Queries",
the author has addressed all the issues associated with the
above two randomized algorithms for optimizing Project-
Select-Join queries. It is very important to note that the
importance of these randomized algorithms become significant
as the query size grows. He has presented the results after
applying these algorithms on large queries of various types
with different databases, concluding that in most cases
Simulated Annealing identifies a lower cost access plan than
Iterative Improvement. It has also been observed that II
performs better than SA initially, but if enough time is
given to SA, it outperforms II. In order to explain these
results, he has presented the study of the shape of the cost
function over the solution space associated with such
queries and has conjectured that it resembles a 'cup' with
relatively small variations at the bottom. This has inspired
them to come up with a new algorithm called "Two Phase
Optimization algorithm" which is a combination of Simulated
Annealing and Iterative Improvement. The performance of all
these three algorithms is compared with each other and it
has been concluded that this new algorithm outperforms both
SA and II in terms of output quality as well as running
time.