5.1 Differential-Dataflow Computation Model and Tool Suites

5.1.3 Differential Datalog

The main use case for Datalog is to take a database of facts and iteratively infer additional facts from the current knowledge base by applying a given set of rules. Datalog and related languages are commonly referred to as logic programming languages. Differential-Datalog (DDLog) is a general-purpose logic programming language that extends traditional Datalog and is built upon Differential-dataflow. The output of the DDLog compiler is a dataflow graph, which may contain cycles (introduced by recursion). The nodes of the graph represent relations; the relations are computed by dataflow relational operators. Edges connect each operator to its input and output relations. Differential-dataflow natively implements the following operators: map, filter, distinct, join, antijoin, groupby, union, aggregation, and flatmap, with a highly optimized implementation in Rust.
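To make the relational operators named above more concrete, the following is a minimal sketch in plain Python of what join, antijoin, and distinct compute over sets of tuples. It is not the Differential-dataflow API and it is not incremental; the function names, the (key, value) layout, and the example edges are assumptions made only for illustration.

```python
from collections import defaultdict

def join(left, right):
    """Join two sets of (key, value) pairs on the key."""
    index = defaultdict(list)
    for key, rval in right:
        index[key].append(rval)
    return {(key, lval, rval) for key, lval in left for rval in index.get(key, [])}

def antijoin(left, right_keys):
    """Keep the (key, value) pairs of `left` whose key is absent from `right_keys`."""
    return {(key, val) for key, val in left if key not in right_keys}

def distinct(rows):
    """Collapse duplicate rows into a set."""
    return set(rows)

# Hypothetical example data: directed edges as (src, dst) pairs.
edges = {("a", "b"), ("b", "c"), ("b", "d")}

# Re-key the edges by destination so the join matches the dst of one edge
# with the src of the next edge.
by_dst = {(dst, src) for src, dst in edges}
two_hop = distinct((src, dst) for _mid, src, dst in join(by_dst, edges))

# Nodes that never appear as the source of an edge.
no_outgoing = antijoin({(n, n) for n in "abcd"}, {src for src, _ in edges})
```

In this sketch `two_hop` holds the pairs reachable in exactly two steps, and `no_outgoing` holds the nodes without outgoing edges; a dataflow engine would maintain the same results under changes to `edges` instead of recomputing them.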

typedef NID = StrId {nid: string} | NumId {nnid: u32} | Constant1 | Constant2
input relation Node(id: NID)
input relation Edge(src: Node, dst: Node)
output relation Path(src: Node, dst: Node)
output relation NoCycle(node: Node)

typedef NodeListNxt = Nxt {list: Ref<NodeList>} | NULL
output relation NodeList(item: Node, next: NodeListNxt)

NodeList(node, nxt) :- Node[node], var tail = NodeList{node, NULL}, var nxt = Nxt{ref_new(tail)}.

Path(a, c) :- Path(a, b), Path(b, c).

HasCycle(HasCycleConstant) :- Path(u1, u2), u1 == u2, var g = u1.group_by(()), var count = g.group_count(), count == 0.

Outdegree(Node{src}, sum) :- Edge(Node{src}, Node{dst}), var sum = dst.group_by(src).group_count().

Listing 5.2: Introduction to Differential-Datalog with Examples

In Listing 5.2 we give a short example to introduce the major language features of DDLog and the meaning of its syntax. The keyword typedef introduces the definition of a tagged union type that contains at least one constructor. For example, the type NID can be either a string ID or a numeric ID. Constant1, Constant2 and NULL are implicit constructors that take no arguments. Differential-Datalog also supports advanced types such as the generic types List<T>, Set<T> and Ref<T> in the same flavor as the Rust language, even though Differential-Datalog has a different imperative programming language for writing external functions. For example, Nxt {list: Ref<NodeList>} is a constructor that takes a reference to a NodeList as its only argument, so no second copy of the same data is kept.
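For readers less familiar with tagged unions, the NID type can be approximated in Python as a union of small immutable classes, one per constructor. This is only an analogy under stated assumptions: the class names mirror the constructors of Listing 5.2, while the helper `describe` is hypothetical and not part of DDLog.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class StrId:
    nid: str

@dataclass(frozen=True)
class NumId:
    nnid: int  # u32 in the listing; Python has no fixed-width integer type

@dataclass(frozen=True)
class Constant1:
    pass

@dataclass(frozen=True)
class Constant2:
    pass

# A value of type NID is exactly one of the four alternatives.
NID = Union[StrId, NumId, Constant1, Constant2]

def describe(node_id: NID) -> str:
    """Hypothetical helper: dispatch on the constructor of a NID value."""
    if isinstance(node_id, StrId):
        return f"string id {node_id.nid}"
    if isinstance(node_id, NumId):
        return f"numeric id {node_id.nnid}"
    # Zero-argument constructors carry no data, only their tag.
    return type(node_id).__name__
```

Because the classes are frozen dataclasses, two values built with the same constructor and the same arguments compare equal, which matches the set semantics of a relation such as Node(id: NID).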

Relations marked with the keywords input and output are the entry and exit points of the dataflow generated from the DDLog program. For example, output relation Path(src: Node, dst: Node) declares a data container named Path that holds a set of tuples. Input relations only receive changes from the data input, while output relations incrementally reflect the changes as they propagate through the dataflow graph from the inputs all the way to the final outputs.
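The idea that inputs receive changes and outputs reflect only the resulting changes can be sketched in Python as follows. This is a simplified illustration, not how Differential-dataflow is implemented: the class `OutdegreeView`, its method `apply`, and the encoding of a change as a (tuple, +1 or -1) pair are assumptions chosen for the example, which maintains an out-degree count per source node.

```python
from collections import Counter

class OutdegreeView:
    """Maintains the out-degree per source node under weighted edge changes."""

    def __init__(self):
        self.edge_weights = Counter()  # (src, dst) -> multiplicity
        self.outdegree = Counter()     # src -> number of distinct outgoing edges

    def apply(self, changes):
        """Apply a batch of ((src, dst), +1 | -1) changes; return output changes."""
        touched = {}  # src -> out-degree before this batch
        for (src, dst), weight in changes:
            before = self.edge_weights[(src, dst)] > 0
            self.edge_weights[(src, dst)] += weight
            after = self.edge_weights[(src, dst)] > 0
            if before != after:
                touched.setdefault(src, self.outdegree[src])
                self.outdegree[src] += 1 if after else -1
        output = []
        for src, old in touched.items():
            new = self.outdegree[src]
            if old == new:
                continue  # changes cancelled out within the batch
            if old > 0:
                output.append(((src, old), -1))  # retract the stale fact
            if new > 0:
                output.append(((src, new), +1))  # assert the updated fact
        return output

view = OutdegreeView()
first = view.apply([(("a", "b"), +1), (("a", "c"), +1)])
second = view.apply([(("a", "b"), -1)])
third = view.apply([(("a", "c"), -1), (("a", "c"), +1)])
```

Only the source nodes touched by a batch are revisited, and a batch whose changes cancel out produces no output change at all.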

Figure 5.1: Differential-Datalog Internal Workflow

Figure 5.2: Generated Differential-Dataflow from a DDLog Rule

The dataflow graph and its pipelines are described by the rules of the DDLog program: each rule is translated into operators in Differential-Dataflow. For example, the pattern-matching predicates in a rule are mapped to the join operator, which takes two incoming streams of data and incrementally updates the results. DDLog also has a special group_by operator that groups the incoming data stream by a key. For example, the last rule in Listing 5.2 groups all the edges Edge(src, dst) in the graph by the key src in order to compute the outdegree of each node. A DDLog program may have recursion in its rules, which repeatedly feeds updates from the output back to the input until a fixed point is reached, that is, until the dataflow sub-graph containing the cycle cannot derive any new facts.
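The fixed-point behavior of a recursive rule can be illustrated with a small Python sketch of semi-naive evaluation for the rule Path(a, c) :- Path(a, b), Path(b, c). Two points are assumptions rather than part of Listing 5.2: the sketch seeds Path with every edge (a base rule the listing does not show), and the function name `transitive_closure` is hypothetical. DDLog evaluates the rule through a cyclic dataflow rather than an explicit loop.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of Path(a, c) :- Path(a, b), Path(b, c)."""
    path = set(edges)   # assumed base rule: Path(a, b) :- Edge(a, b)
    delta = set(edges)  # facts that were new in the previous round
    while delta:        # the fixed point is reached when nothing new appears
        derived = set()
        # New fact used as the first premise: Path(a, b) from delta.
        for a, b in delta:
            derived |= {(a, c) for b2, c in path if b2 == b}
        # New fact used as the second premise: Path(b, c) from delta.
        for b, c in delta:
            derived |= {(a, c) for a, b2 in path if b2 == b}
        delta = derived - path
        path |= delta
    return path

chain = transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})
loop = transitive_closure({("a", "b"), ("b", "a")})
has_cycle = any(u == v for u, v in loop)
```

Each round joins only the facts derived in the previous round against the full relation, so the loop stops as soon as a round derives nothing new, even when the graph itself contains a cycle.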

In a nutshell, Differential-Datalog is a declarative language for defining the data structures that are passed around in the dataflow and for describing, at a higher level of abstraction, how timestamped streams of data should be computed. Differential-Datalog also has a compiler that compiles this description of the dataflow into a runtime implemented in Rust that performs the incremental computation. Users either apply changes to the inputs from the command line or call the runtime APIs from Rust.

Before we dive into the implementation of an incremental version of the FORMULA language for metamodeling, we also studied each tool and the potential for tool integration, as summarized below:

1. We introduce several frameworks based on differential computation models and analyze the relationship between Timely-dataflow, Differential-dataflow, and Differential-datalog.

2. We investigate the possibility of integrating the differential computation model into our integrated modeling framework, mainly FORMULA, to achieve better performance and incremental computation.

3. We dissect and compare the computation models of FORMULA and Differential-Dataflow to obtain a one-to-one mapping from FORMULA semantics to the dataflow operators. The data structures of FORMULA terms and the data types in Differential-dataflow and DDLog are also compared, in preparation for implementing a translator from concrete FORMULA terms to tagged unions in DDLog.

In this chapter, we describe in detail the novel idea of generating an incremental version of the logic-style modeling language FORMULA by modeling the language domains, identifying the semantic mismatches, and performing a model transformation for code generation. We use a general logic programming language, Differential-Datalog, to perform the model transformation incrementally, because DDLog is itself an incremental version of Datalog with extensions and additional features. The whole process of model transformation and extraction from models to generate executable code can thus be viewed as incrementally generating an equivalent DDLog program by reasoning and rule execution in that same incremental DDLog language.