Paper #25: Magic sets and other strange ways to implement logic programs

F. Bancilhon, D. Maier, Y. Sagiv, and J. Ullman.
Magic sets and other strange ways to implement logic programs.
In ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, Cambridge MA, March 1986.
Presented by: Wilson

Logic programming is a simple but powerful formalism for representing data and queries. The relational data model, for example, is easily represented, as are extensions to the model. Functions and programs can be defined along with data such that the distinction between data and programs is blurred. Logic programming is "very-high-level" in the sense that one specifies program functionality ("what it does") without giving an algorithm ("how it does it"). The downside of logic programming is its implementation inefficiency. To achieve efficient execution, we must rewrite the rules defining data, functions, and queries in a more efficient form. This optimization problem is orders of magnitude harder than the relational optimization problem.

This paper presents several methods to transform logic programs so that they run more efficiently. The authors deal with a specific class of programs written in a constrained form of logic programming called "Datalog". In Datalog the terms of predicates are allowed to be only constants, or variables that refer to constants, but not functions (of terms). Datalog expressions are like 1NF relation tuples and it is a useful subset of logic programming for database applications.

The example of the paper is to transform a program that computes "same-generation cousins". The database contains a child-parent relation represented by expressions of the form "par(c,p)", which states that the parent of c is p. The same-generation cousins relation can be defined by the following rules:

      sg(X,X).
      sg(X,Y) <- par(X,Xp), par(Y,Yp), sg(Xp,Yp).

The first rule states that everyone is his own cousin. The second says that X and Y are cousins if their parents are cousins. We wish to compute same-generation cousins for a particular individual "a" using the query sg(a,X). A straightforward "top-down" Prolog implementation can take exponential time using these rules. A "bottom-up" approach is faster (perhaps O(n^3)), but still wastes time computing sg tuples which are never used in the solution of the query.

To improve on this the authors define a "magic set" of individuals that limits the generation of sg tuples to just those needed to solve the query. A key observation is that each sg(b,c) "fact" that is used to generate sg(a,X) answers must have b as an ancestor to a. This leads to the following definition of the magic set and rewritten sg rules:

      magic(a).
      magic(U) <- magic(V), par(V,U).     -- defines ancestors of "a"

      sg(X,X) <- magic(X).
      sg(X,Y) <- magic(X), par(X,Xp), par(Y,Yp), sg(Xp,Yp).

These rules give a more efficient bottom-up evaluation of the query.

The authors present an algorithm which computes the magic set transformation for a class of Datalog programs involving "linear" recursion. Two other methods are presented, "counting" and "reverse counting", which are less generally applicable but possibly more efficient. These three methods, along with a fourth from another paper, are compared asymptotically for example relations with different patterns of data. It turns out that no one algorithm is always best -- one must know the relation contents to make good optimization decisions. The authors emphasize the difficulty of defining an "optimal" way to evaluate queries.