Since Datalog is a subset of Prolog, one could create and execute Datalog programs as soon as Prolog evaluators existed, around 1972. However, as a named concept and object of study, Datalog emerged in the mid-1980s, following work in deductive databases in the late 1970s. (See Section 1.3 for more on the naming of Datalog.) Why did Datalog emerge as a topic of interest at that point? Because it served as a “sweet spot,” or “middle ground,” for related research lines from Logic Programming, Database Systems, and Artificial Intelligence.
Why was Datalog interesting to the three communities? Pure Datalog was very simple and had a clean syntax and semantics, yet it was expressive enough to serve as the basis for theoretical investigations and examination of evaluation alternatives, as well as a foundation from which extensions could be explored and a starting point for knowledge-representation systems. We summarize relevant trends in each of the three communities below, following each with more specific background.
Logic Programming
Logic programmers saw relational databases as an implementation of an important sublanguage, and worked to integrate them into Prolog systems. Since early Prolog implementations assumed all rules and facts were memory-resident, it was clear that for very large fact bases, something like relational database technology was necessary. Sometimes this enhancement took the form of a connection to a relational database management system (RDBMS). Translators were written to convert a subset of Prolog programs into SQL to be passed to an RDBMS and evaluated there. Datalog was a more powerful, more Prolog-ish, data-oriented subset of Prolog than the fragment that could be translated to SQL. Datalog was also more “declarative” than Prolog, and many in the Logic Programming community liked that aspect. With Prolog, there was a serious gap between Logic Programming theory (programs understood as logical axioms and deduction) and Logic Programming practice (programs as evaluated by Prolog interpreters). For one thing, Prolog contained a number of non-logical primitives; for another, many perfectly logical Prolog programs would not terminate due to the weaknesses of the evaluation strategy used by the Prolog interpreters. With Datalog, that gap almost disappeared: deduction largely coincided with evaluation in terms of the produced results. Also, Datalog provided a basis on which to work out solutions for recursion with negation, some of which could be applied to more general logic languages.
Background. Logic programming grew out of resolution theorem proving proposed by Robinson [1965] and, especially, out of a particularly simplified version of it, called SLD resolution [Lloyd 1993, Kowalski 1974], which worked for special cases of logic, such as Horn clauses.⁵ In the early 1970s, researchers began to realize that SLD resolution combined with backtracking provided a computational model. In particular, Colmerauer and collaborators developed Prolog [Colmerauer and Roussel 1996] and Kowalski made significant contributions to the theory of logic programming, as it came to be called [Kowalski 1988]. Prolog was the starting point for languages in the Japanese Fifth-Generation Computer Systems (FGCS) project in the early 1980s [Fuchi and Furukawa 1987, Moto-oka and Stone 1984]. The FGCS project sought advances in hardware, databases, parallel computing, deduction, and user interfaces to build high-performance knowledge-base systems. In the context of this project, Fuchi [1981] describes Prolog as a basis for bringing together programming languages and database query languages. D. H. D. Warren [1982b] provides interesting insights into the focus on Prolog for the FGCS. The use of Prolog to express queries dates from around the same time. For example, the Chat-80 system [Warren and Pereira 1982] analyzed natural-language questions and turned them into Prolog clauses that could be evaluated against stored predicates. (All examples there are actually in Datalog.) Warren [1981] showed that such queries were amenable to some database-style optimizations via Prolog rewriting and annotation. Prolog itself had been proposed for database query. For example, Maier [1986b] considers Prolog as a database query language and notes its advantages (avoiding the “impedance mismatch” between DBMS and programming language, expressive power, ease of transformation) but also points out its limits in terms of data definition, update, secondary storage, concurrency control, and recovery. Zaniolo [1986] also notes that Prolog can be used to write complete database applications, avoiding the impedance mismatch. He further proposes extensions to Prolog for use with a data model supporting entity identity. For a recent survey of the history of logic programming, see Kowalski [2014].
5. A Horn clause is a logical implication among positive literals with at most one literal in the consequent.
Database Systems
As the relational model gained traction in the 1980s, limitations on the expressiveness of its query languages became widely recognized by researchers and practitioners. In particular, fairly common applications, such as the transitive closure of a graph and bill-of-materials roll-ups (i.e., aggregation of costs, weights, etc. in a part-subpart hierarchy), could not be expressed with a single query in most relational query languages. Various approaches for enhancing expressiveness to handle such cases were proposed, such as adding control structures or a fixpoint operator to relational algebra. Datalog was a simple alternative that was similar to domain relational calculus, with which the database theory community was familiar. Thus, it was readily understood, and provided a natural setting in which to study topics such as deductive databases, recursion, its interaction with negation, and evaluation techniques. Much of the early discussion and presentations on Datalog took place at the informal “XP” workshops⁶ (particularly XP 4.5 in 1983 and XP 7.52 in 1986) and the early symposia on Principles of Database Systems (PODS), which were a follow-up to the XP workshops to some extent. Work on Datalog also highlighted the difference between query answering as evaluation in a model vs. deduction in a theory.
6. A list of these workshops can be found at http://dblp.uni-trier.de/db/conf/xp/index.html
Background. It was recognized early on that there were recursive queries expressible neither in relational algebra nor in relational calculus. For example, Aho and Ullman [1979] prove the inexpressibility of transitive closure by finite relational expressions, and consider various extensions to handle it, such as a least-fixed-point operator and embedding in a host programming language. Paredaens [1978] and Bancilhon [1978] also explore this issue. The PROBE system supported traversal recursion over directed graphs [Rosenthal et al. 1986].
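To make the least-fixed-point idea concrete, the following is a minimal Python sketch, not taken from any of the cited papers. It treats a binary relation as a set of pairs and computes transitive closure as the least fixed point of T = E ∪ π(E ⋈ T). The names `compose`, `least_fixed_point`, and the sample `edge` relation are illustrative assumptions.

```python
from typing import Callable, FrozenSet, Tuple

Relation = FrozenSet[Tuple[str, str]]


def compose(r: Relation, s: Relation) -> Relation:
    """Join r(x, z) with s(z, y) and project the result onto (x, y)."""
    return frozenset((x, y) for (x, z1) in r for (z2, y) in s if z1 == z2)


def least_fixed_point(step: Callable[[Relation], Relation]) -> Relation:
    """Apply a monotone step function, starting from the empty relation,
    until the result stops changing."""
    current: Relation = frozenset()
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical edge relation: the chain a -> b -> c -> d.
edge: Relation = frozenset({("a", "b"), ("b", "c"), ("c", "d")})

# Transitive closure as the least fixed point of T = edge U (edge joined with T).
closure = least_fixed_point(lambda t: edge | compose(edge, t))
```

Any fixed relational expression applies `compose` a bounded number of times, so it only reaches paths up to some fixed length. The loop in `least_fixed_point` is what lets the number of joins depend on the data.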
The connection of logic and databases predates the relational model. For example, Green and Raphael [1968] use theorem proving as the basis for the question-answering system QA1 that can “deduce facts that are not explicitly available in its data base.” The 1970s was an active time for investigating the connections between logic and databases, such as model-theoretic vs. proof-theoretic views of a database [Nicolas and Gallaire 1977] and “closed-world” vs. “open-world” assumptions about the information in a database [Reiter 1977b]. It was also a time when prototype deductive databases based on logic began appearing, such as MRPPS 3.0 [Minker 1977], DADM [Kellogg et al. 1977], and DEDUCE 2 [Chang 1977]. While many researchers of deductive databases focused on function-free logic, Reiter [1977a] was explicit in his opinion that function-free logic “approximates [his] own intuitive concept of what should be a database,” for otherwise “any first-order theory is a database,” such as point-set topology. For a history of deductive databases, see Minker et al. [2014].
Artificial Intelligence
The use of logic and deduction as a basis for question-answering and reasoning in expert systems dates to at least the late 1960s. Rule-based systems were also a common approach to AI problem solving. Logic languages such as Prolog were attractive to this community because they encompassed both basic information and rule-based “intelligence” to work with that knowledge in a uniform model, while providing a formal foundation for rule-based reasoning. There were other attractions, such as the natural use of meta-programming features to manipulate programs and implement alternative evaluation systems. Prolog rules seemed an accessible means for domain experts (who were assumed not to be sophisticated programmers) to directly capture their reasoning strategies. Furthermore, the resolution-based deduction methods used with Prolog were at once a close analog of human reasoning and an efficient evaluation mechanism. By the early 1980s, there was interest in working with large fact bases in expert systems, and imbuing database systems with intelligence, manifested in the closely related areas of Knowledge-Based Systems (KBS) and Expert Database Systems (EDS). Some saw the function-free subset of Prolog as a happy medium between databases and general rule-based reasoning. Datalog became a common basis for work in the KBS and especially the EDS communities.
Background. As mentioned, one of the earliest proposals to use theorem proving as a mechanism for query answering was that of Green [1969]. The 1970s saw a proliferation of expert systems that tried to replicate human expertise in computational form [Puppe 1993]. These early systems tended to be ad hoc, with much of their knowledge encoded procedurally. Toward 1980, KBS emerged as an architectural approach to make expert systems (and other reasoning applications) easier to construct and maintain, especially when involving large collections of information [Davis 1986]. In a KBS, there is a separation of knowledge structures and the computational mechanism to apply that knowledge. These two parts are often called the knowledge base and the inference engine. (The inference engine did not necessarily use logical inference. It could, for example, make use of probabilistic reasoning.) The knowledge base contains both concrete information (facts) and more abstract forms of knowledge, such as rules, templates, or classification hierarchies. Logic was the representation for the knowledge base in some KBS, although there were competing approaches, such as frame-based systems [Minsky 1975] and semantic networks [Findler 1979]. (It is interesting to note that there was some debate at the time as to the suitability of logic for this role [Hayes 1977, Hayes 1980, Winograd 1975].) The logic-based KBS often structured the knowledge base in the form of facts and rules, similar to a logic program. For example, the DLOG system [Goebel 1985] was a KBS using logic, and its implemented subset consisted of facts and Horn clauses that were passed to a Prolog-based interpreter. Another example of a logic-based framework for KBS was the Syllog system [Fellenstein et al. 1985], which had a database and Prolog-style rules, but presented a structured natural-language interface to them.
In a similar vein to KBS, Expert Database Systems (EDS) were an effort to imbue database systems with richer representation and reasoning capabilities. Logic (usually in the guise of logic programming) was often the choice for providing capabilities such as classification hierarchies [Dahl 1982] and incorporating constraints into query answering [Dahl 1986, Kifer and Li 1988]. Kerschberg [1990] provides an overview of EDS.
1.2.1 Uptake
While Datalog had precursors from Logic Programming, Database Systems, and Artificial Intelligence, the bulk of the early work on it took place in the database theory and query-language communities. Deductive-database researchers showed some interest, but their focus was more on removing non-declarative features, such as cut, from Prolog while retaining top-down, SLD-resolution approaches to evaluation [Minker et al. 2014]. On the database-query side, however, there was strong interest in adding rules and recursion to existing query frameworks, which mainly employed bottom-up techniques based on relational algebra for evaluation.
Since Datalog did not have function symbols, a safe program implied a finite number of IDB facts from a finite EDB, meaning that bottom-up techniques would converge. Much early work on implementation techniques for Datalog and similar languages was based on bottom-up approaches, and researchers found ways to adapt these approaches to support additional features, such as complex objects, aggregation, and negation. These extensions motivated new semantic constructs, such as stratification and stable models for negation and other non-monotone extensions. Therefore, it should be no surprise that the majority of sources cited in this chapter appeared in database venues, though there is a significant body of references from logic programming and AI publications.
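The convergence argument can be illustrated with a small Python sketch, under stated assumptions. It hand-codes bottom-up evaluation of the two-rule program shown in the docstring, in a semi-naive style where each round joins only against facts that are new from the previous round. The function name, the sample `edge` facts, and the round counter are hypothetical and do not come from any particular Datalog system.

```python
from itertools import product


def evaluate_path(edb):
    """Bottom-up evaluation of the (assumed) Datalog program
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- edge(X, Z), path(Z, Y).
    Returns the derived path facts and the number of rounds executed."""
    path = set(edb)    # facts derived by the non-recursive rule
    delta = set(edb)   # facts that were new in the previous round
    rounds = 0
    while delta:
        rounds += 1
        # Apply the recursive rule only to the newly derived facts.
        derived = {(x, y) for (x, z) in edb for (z2, y) in delta if z == z2}
        delta = derived - path
        path |= delta
    return path, rounds


# Hypothetical EDB: a three-node cycle 1 -> 2 -> 3 -> 1.
edge = {(1, 2), (2, 3), (3, 1)}
path, rounds = evaluate_path(edge)

# Without function symbols, every derived fact is built from EDB constants,
# so a binary IDB predicate can hold at most |constants|^2 facts.
constants = {c for fact in edge for c in fact}
upper_bound = len(constants) ** 2
all_pairs = set(product(constants, repeat=2))
```

Each round either adds at least one new fact or ends the loop, and `upper_bound` caps the facts that can ever be added, so the loop must terminate even though the data is cyclic. A depth-first, top-down evaluation of the same rules can loop on such a cycle.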