CSE 230 Notes, Sethi 11

CSE 230: Principles of Programming Languages

Notes on Chapter 11 of Sethi (logic programming and Prolog)

The path towards what eventually became known as Prolog starts with Alan Robinson, who around 1965 developed his resolution method (which includes unification) as a computational approach to theorem proving. The next step occurred in the early 1970s, when a group headed by Robert Kowalski in the AI Department of the University of Edinburgh was using interpreters for logic in experiments on natural language understanding. Although this work was very interesting, the programs were too inefficient to be practical. A little later, Alain Colmerauer in France built a much more efficient interpreter, and called the system Prolog (for "programming in logic"); he also introduced some features which departed significantly from pure logic, such as the "cut". In the 1980s, David Warren at SRI designed an efficient abstract machine which became the basis for all subsequent Prolog compilers. Logic programming was the focus of the Japanese Fifth Generation Project, which aimed to build highly efficient machines for logic programming languages, and use them for applications that required much greater intelligence than was then available. The fact that the Fifth Generation Project was a very expensive failure contributed to a decline in interest in logic programming for large applications. However, logic programming remains one of the best approaches for natural language processing, as well as a very interesting adventure in programming language design. Note that all these projects were AI oriented; as usual, this cultural context helps to understand the design decisions that went into Prolog, such as its reuse of the Lisp data structure for lists.

As a footnote, Alfred Horn, for whom Horn clauses are named, had nothing to do with logic programming; he was a professor of logic at UCLA who in 1951 wrote paper using the sentences that now bear his name for reasons having little to do with computer science. As a second footnote, it seems to me rather misleading to call Prolog a "logic programming" language, since it departs rather far from logic; I would rather have had it called a "relational programming" language, because it is the use and manipulation of relations that is most characteristic of its programming style.

BinProlog is the "official" version of Prolog for this class. You should read the short "getting started" Notes on BinProlog, where many important and useful facts are given. However, if you prefer to use a different version of Prolog for exercises, that is certainly permissible.

We may divide programming languages into procedural languages, and non-procedural languages, where the first includes imperative languages (where assignment is fundamental, as in C, Java, etc.) and functional languages (where functions are first class citizens, as in ML, LISP, etc.). Non-procedural languages are not just what's left over, but are supposed to have declarative flavor, i.e., to (more or less) allow programers to state a problem without stating how to solve it. Most non-procedural languages are logic languages, in which the syntax and semantics are supposed to be based on some form of logic. For example, Prolog is (partially) based on Horn clause logic, and OBJ is based on equational logic. However, SETL is based on set theory, which is not a form of logic, so it is a non-logical and non-procedural language.

The following may help you to convert from thinking in the imperative programming paradigm to that of the logic paradigm; this material was adapted by Fox Harrell from The Art of Prolog, by Leon Sterling and Ehud Shapiro.

This is a sample thought process for writing a recursive procedure in Prolog; in this view, the key to writing the Prolog program is being able to switch back and forth between procedural and declarative thought. The example involves writing a delete procedure which removes all occurrences of some element from a list.

Specify the meaning of the arguments of the relation. In this case there are three: the element X to be deleted, the element L1 which may contain occurrences of X, and the element L2 which does not contain any occurrences of X. The input will have the form
Think of an example query, such as ?- delete([a,b,c,b],b,X), and its answer, which in this case should be X = [a,c].
Begin by thinking procedurally about the first recursive element. The recursive form for a list is [X|Xs]. There are two possibilities.
1. X is the element to be deleted.
2. It is not.
In the first case we need to remove X from the head of the list and recursively delete X from Xs. The appropriate rule is delete([X|Xs],X,Ys) :- delete(Xs,X,Ys). Now we should take a step back and look at this declaratively. The declarative reading of this rule is "The deletion of X from [X|Xs] is Ys if the deletion of X from Xs is Ys."
In the second case since the head of the list is not equal to X we want to keep it around. In this case an appropriate rule is
delete([X|Xs],Z,[X|Ys]) :- X != Z, delete(Xs,Z,Ys). The declarative reading of this rule is: "The deletion of Z from [X|Xs] is [X|Ys] if Z is not equal to X and deleting Z from Xs is Ys."
So are we done? No! We need a base case to stop the recursion (after you get used to writing Prolog programs you may want to start coming up with a base case before the other rules). The base case is straightforward here. No elements can be removed from the empty list. The rule for this is Now we are done, and here is the final program:

For practice, you should try to define a program that removes only the first occurrence of an element from a list!

There is an interesting reason why it is difficult to explain cut well: whereas Prolog is supposed to be a non-procedural language, cut cannot be understood in non-procedural terms; i.e. cut is procedural, and thus is in conflict with the basic design philosophy of the language; i.e., cut is a kind of kludge, and kludges are usually difficult to explain. Also, one can do some rather surprising things with cut, which at first may seem counter-intuitive, and are easy to get wrong.

Cut is a nullary predicate, denoted !, which always succeeds when the search reaches it, but which also has a rather drastic side effect on the search. I like to think of a cut in a clause as a way for the programmer to tell Prolog something like the following:

Trust me - if you get this far in this clause, there's no need to backtrack and try another clause for proving its goal, or to try any other way of satisfying any of the subgoals that were already proved for this goal.

Cuts can be useful for reducing the search space of a Prolog program, which sometimes can be really huge. In fact, this was the original reason that Colmerauer introduced cut. Here is a simple example: Suppose we have two predicates, p1(X) and p2(X), which are mutually exclusive, i.e. exactly one of them is true for each value of X; an example is predicates for even and odd integers. Now suppose we have the following: q(X) :- even(X), a(X). q(X) :- odd(X), b(X). Then if X is even, we should check a(X), but there is no need to check the second q(X) clause, because we know it will fail. Therefore it is safe to write the following, which is also more efficient: q(X) :- even(X), !, a(X). q(X) :- odd(X), b(X). When X is even, the first literal of the first clause succeeds, so the cut prevents backtracking on q(X) or even(X), and the second clause will not be tried.

Here is another example, which is a little more complex, and which also illustrates the Prolog search strategy:

a(1) :- b. a(2) :- e. b :- !, c. b :- d. c :- fail. d. e. If we try the query a(X), the first clause will match, giving us a tentative value 1 for X, provided the body b of the clause succeeds; next, the third clause matches b, and raises the goal c, which fails; because of the cut, Prolog will not undo its commitment to the third clause for b, and so it will never find the fifth clause, which would have succeeded; therefore it backtracks all the way to the top, undoes the first clause, and tries the second, now with a tentative value 2 for X and a subgoal e, which then succeeds. So in this case, the cut causes Prolog to miss a solution. Of course, without the cut, Prolog will find both solutions. (Note that the fifth clause would not be needed for some other versions of Prolog, which automatically fail if there is no clause for some literal.) Here is a link to the BinProlog output for this example, which includes a trace.

Now let's look at a couple of stranger things that one can do with cut. The first is something that rather excited researchers when they first discovered it, a way of introducing exceptions into Horn clause rule sets. This was considered important because exceptions are common in knowledge representation, and knowledge representation was supposed to be one of the most important applications of Prolog. This example says that all birds fly, except for penguins, which are birds, but do not fly (the version of this example on page 201 of Stansifer is wrong):

bird(eagle). bird(sparrow). bird(penguin). fly(penguin) :- !, fail. fly(X) :- bird(X). For example, the query fly(sparrow) will succeed in the usual way, but fly(penguin) will fail due to the fourth clause, and the cut will prevent to fifth clause from being tried, so penguins will not be claimed to fly, even though they are birds. The query bird(X) will find all three birds, but the query fly(X) does not work correctly (at least, not in BinProlog); this is due to the conflict between the logic and procedural features.

Another odd looking use of cut is to implement so-called negation by failure, the code for which is as follows:

not(X) :- X, !, fail. not(X). This is a meta-logical predicate, which means that it takes predicates as its argument. For an example, in the query not(fly(penguin)). the argument to not is fly(penguin). The above query succeeds by showing that fly(penguin) fails. Here's how it works: the first clause for not will try to make fly(penguin) succeed; if it did succeed, then the subsequent cut and fail would make that clause fail without trying the next not clause. However, since fly(penguin) fails, the second not clause is tried; it matches, and causes not(fly(penguin)) to succeed. On the other hand, the query not(fly(X)). produces incorrect results (at least, in BinProlog).

We can look at unification as solving systems of equations, using a process rather like what is called Gaussian elimination in linear algebra. Let's do some simple examples to illustrate the idea. Suppose the two terms to be unified are

f(g(X), h(X,Y)). f(Y, h(Z,g(Z))). What we want is a substitution s such that s( f(g(X), h(X,Y)) ) = s( f(Y, h(Z,g(Z))) ). Or more simply, we could say we are looking for values of X, Y, Z such that f(g(X), h(X,Y)) = f(Y, h(Z,g(Z))) . To solve this equation, we first check that the "head functors" of the two terms are equal; both are f, so this is OK. Then in order for the two terms to be equal, the following equalities have to hold: g(X) = Y h(X,Y) = h(Z,g(Z)) The first equation actually gives a solution for Y in terms of X, while the second is solved in the same way that we started this process, checking that the head functors are equal, and then noting that for the two terms to be equal, we must have X = Z Y = g(Z) and since we already have Y = g(X), from the second of these equations we get g(X) = g(Z) which in the usual way gives X = Z, which we have already deduced. So putting everything together, we get Y = g(X) Z = X which cannot be simplified any further. This says that we can take any term at all for X, and then the values of Y, Z are uniquely determined from that. In fact, the above is a substitution, and moreover, it is the most general substitution which is a solution to the original unification problem, in the sense that any other solution is a substitution instance of this one. The existence of such solutions when there are any solutions at all, is the main result about unification, first proved by Alan Robinson.

The above procedure is essentially the unification algorithm devised by Montanari and Martelli; it is linear in the size of the problem. Notice that if the problem had been slightly different, there would have been no solution. For example, if the original terms had been

f(g(X), h(X,Y)). f(Y, h(Y,g(Z))). then unification would have failed because we would have the equations Y = g(X) Y = X which imply that X = g(X) which contains a circularity, and therefore in the jargon of unification theory, is said to "fail the occurs check". Note that whereas early implementations of Prolog tended to omit the occurs check, today it is usual to check for the occurence of a variable in the term that defines it, except in those implementations that allow circular terms as solutions, such as Colmerauer's Prolog III system, which was the first to take this approach.

Of course, unification can also fail in simpler ways, for example, if the heads of subterms fail to match, as in

f(g(X), f(X,Y)). f(Y, h(Y,g(Z))). where the subterms f(X, Y) and h(Y, g(Z)) cannot be unified because f is unequal to h.

The algorithm used for solving certain queries in BinProlog actually goes beyond unification, in that it introduces new variables, and searches for values for them; this is called narrowing. Examples of this can be given using the the code in pl/addn.html, such as

addn(X, X, s(s(0))). The details of this are beyond the scope of this course, although examples were discussed in class.

Other topics

Difference lists are an important idiom in Prolog programming; they are a tricky way of representing lists that allow for certain kinds of computation on lists to go much faster, including parsing. First, the intuition behind this idiom is to represent a list as the "difference" of two other lists. As an analogy, the number 6 can be represented as the difference 9 - 3, or 13 - 7, or more generally, (6 + Y) - Y for any number Y. Similarly, we can represent the list [1, 2, 3] as the difference between [1, 2, 3, 4, 5] and [4, 5], or between [1, 2, 3, a, b, c] and [a, b, c], or more generally between [1, 2, 3 | Y] and Y, for any list Y. Symbolically, we may write

(X + Y) - Y = X . It is conventional to represent difference lists using a functor, and dl is particularly common. Thus, the above representations of [1, 2, 3] would be written dl([1, 2, 3, 4, 5], [4, 5]) dl([1, 2, 3, a, b, c], [a, b, c]) dl([1, 2, 3 | Y], Y) respectively.

One very nice example of the use of this idiom is to get a very fast version of append for difference lists; the definition is also very simple.

appendl(dl(X,Y), dl(Y,Z), dl(X,Z)). Although its operational semantics is rather tricky, since unification may be involved, this has a very simple declarative reading: the append of the difference of X and Y with the difference of Y and Z is the difference of X and Z. In the analoguous arithmatic notation, we may write this as (X - Y) + (Y - Z) = (X - Z) . Here are links to a simple example and its output, including a trace.

It is common to use difference lists for parsing in Prolog, instead of using append, because difference lists can be made to do appends implicitly, which is much more efficient. This is illustrated in the linked Prolog code for parsing a simple English language grammar; and here is a link to its output. A good reference for difference lists and their use in parsing is The Art of Prolog, by Leon Sterling and Ehud Shapiro (MIT Press).

There is a small bug on page 458, on the 4th line from the bottom, "L = [a]" should be "X = [a]" (or else previous X's should be L's).

To CSE 230 notes page
To CSE 230 homepage