4 A Matching-Based Edit Distance - Lecture Notes in Computer Science 5240

The algorithm to calculate weighted quantitative simulation can be used as a similarity measure for service automata or OGs, but it has two drawbacks. First, it is not an edit distance: it calculates a value that expresses the similarity between the service automata, but gives no information about the modification actions needed to achieve simulation. Second, it does not take the formulae of the OG into account. Therefore, a high similarity between a service automaton and an OG does not guarantee deadlock freedom, as the example of Fig. 3 demonstrates: the service automaton of the customer is perfectly simulated by the OG, but the overall choreography deadlocks.

4.1 Simulation-Based Edit Distance

Before we consider the OG’s formulae, we show how the similarity result of the algorithm of [18] can be transformed into an edit distance. Given two states q1 and q2, Def. 1 determines the best simulation between the transitions of q1 and q2. In addition, one service automaton can stutter (i.e., remain in the same state). The weighted quantitative simulation function calculates the best label matching to maximize the similarity between the root nodes of the service automata. From the transition pairs belonging to the maximum, we can derive the corresponding edit actions (cf. Table 1).

Table 1. Deriving edit actions from transition pairs of Def. 1

transition of S1   transition of S2   resulting edit action      similarity
a                  a                  keep transition a          L(a, a)
a                  b                  modify transition a to b   L(a, b)
a                  ε (stutter)        delete transition a        L(a, ε)
ε (stutter)        a                  insert transition a        L(ε, a)
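The case distinction of Table 1 can be written down as a small function. The following is only a minimal sketch: the name `edit_action`, the use of `None` for the stutter label ε, and the returned strings are illustrative assumptions, not part of the original algorithm.

```python
EPS = None  # assumption: None stands for the stutter label ε


def edit_action(a, b):
    """Map a transition pair (a, b) to the edit action listed in Table 1."""
    if a is EPS and b is EPS:
        # Both sides stuttering is not a row of Table 1.
        raise ValueError("the pair (ε, ε) has no edit action")
    if a is EPS:
        return f"insert transition {b}"
    if b is EPS:
        return f"delete transition {a}"
    if a == b:
        return f"keep transition {a}"
    return f"modify transition {a} to {b}"
```

For instance, `edit_action("?a", EPS)` yields the delete action of the third row, whose similarity would be L(a, ε).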

These are the basic edit actions; their similarity is determined by the edge similarity function L. To simplify the representation of a large number of edit actions, the basic edit actions may be grouped into macros that express more complex operations such as swapping or moving edges and nodes, duplicating subgraphs, or partially unfolding loops.

The simulation-based edit distance does not respect the OG’s formulae. One possibility to achieve a matching would be to first calculate the most similar simulating service using the edit distance for Def. 1 and then, in a second step, to simply add and remove all necessary nodes and edges. With the weighted quantitative simulation function of Def. 1, the resulting edit actions (cf. Table 1) only insert or remove edges to existing nodes rather than to new nodes. In general, this approach does not achieve matching with an OG; see Fig. 6 for a counterexample. Inserting nodes as well would still not determine the most similar partner service, because it may yield sub-optimal solutions, as Fig. 7 illustrates.

4.2 Combining Formula-Checking and Graph Similarity

Because a-posteriori formula satisfaction by node insertion yields sub-optimal results, we need to modify the algorithm of [18] so that it does not statically take all outgoing transitions of an OG’s state into account, but instead checks every formula-fulfilling subset of outgoing transitions. Therefore, we need some additional definitions on which to base formula satisfaction and which cover the dynamic presence of OG transitions.

Definition 2 (Satisfying label set, label permutation). Let S = [Q_S, δ_S, F_S, q0_S, I] be a service automaton and O = [Q_O, δ_O, F_O, q0_O, I] an OG, and let q1 ∈ Q_S and q2 ∈ Q_O.

– Define Sat(ϕ(q2)) ⊆ P(I ∩ {b | ∃q2′ ∈ Q_O : q2 −b→ q2′}) to be the set of all sets of labels of transitions leaving q2 that satisfy the formula ϕ of state q2.

– For β ∈ Sat(ϕ(q2)), define perm(q1, q2, β) ⊆ (I ∪ {ε}) × (I ∪ {ε}) to be a label permutation of q1, q2 and β such that:

(a) if q1 −a→ q1′, then (a, c) ∈ perm(q1, q2, β) for a label c ∈ β ∪ {ε},

(b) if q2 −b→ q2′ and b ∈ β, then (d, b) ∈ perm(q1, q2, β) for a label d ∈ I ∪ {ε},

(c) (ε, ε) ∉ perm(q1, q2, β), and

(d) if (a, b) ∈ perm(q1, q2, β), then (a, c), (d, b) ∉ perm(q1, q2, β) for all labels c ∈ β ∪ {ε} and all labels d ∈ I ∪ {ε}.

– Define Perms(q1, q2, β) to be the set of all label permutations of q1, q2 and β.

Fig. 6. Matching cannot be achieved solely by transition insertion. The service automaton (a) does not match with the OG (b) because of a missing ?b-branch. In service automaton (c), a loop edge was inserted. However, the state reached by ?b in the OG requires a ?c-branch to be present. After inserting this edge (d), the resulting service automaton is not simulated by the OG (b).

Fig. 7. Adding states to a simulating service automaton may yield sub-optimal results. The service automaton (a) does not match with the OG (b), because the formula (?c ∧ ?d ∧ ?e) is not satisfied. The OG, however, perfectly simulates the service automaton (a), and adding two edges achieves matching (c). However, changing the edge label of (a) from !a to !b also achieves matching, but only requires a single edit action (d).

The set Sat consists of all sets of labels that fulfill a state’s formula. For example, consider the OG in Fig. 3(b): for state q2 of the OG O_agency⊕airline, we have Sat(ϕ(q2)) = {{?confirmation, ?refusal}}. Likewise, Sat(ϕ(q3)) = {{?offer}, {!payment}, {?offer, !payment}}.
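To make the set Sat more tangible, the following sketch enumerates it by brute force over all subsets of the outgoing labels. It assumes that a formula is given as a Python predicate over a set of labels; the function name `sat` and the two concrete formulas (a conjunction for q2, a disjunction for q3) are assumptions chosen to be consistent with the sets quoted above, since the formulas themselves are only shown in Fig. 3.

```python
from itertools import combinations


def sat(outgoing_labels, formula):
    """Return all subsets of the outgoing labels that satisfy the formula."""
    labels = sorted(outgoing_labels)
    return {
        frozenset(subset)
        for size in range(len(labels) + 1)
        for subset in combinations(labels, size)
        if formula(frozenset(subset))
    }


# Assumed formulas, consistent with the example sets in the text.
def phi_q2(labels):
    return "?confirmation" in labels and "?refusal" in labels


def phi_q3(labels):
    return "?offer" in labels or "!payment" in labels


sat_q2 = sat({"?confirmation", "?refusal"}, phi_q2)
sat_q3 = sat({"?offer", "!payment"}, phi_q3)
```

The enumeration is exponential in the number of outgoing edges of a state, which is acceptable for a sketch but would need pruning for states with many transitions.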

The set Perms consists of all permutations of outgoing edges of two states. In a permutation, each outgoing edge of a state of the service automaton has to be present as the first element of a pair (a), and each outgoing edge of a state of the OG that is part of the label set β has to be present as the second element of a pair (b). As the number of outgoing edges of both states may be different, ε-labels can occur in the pairs, but no pair (ε, ε) is allowed (c). Finally, each edge is only allowed to occur once in a pair (d).

For β = {?confirmation, ?refusal} and state q1 of the service automaton S1 in Fig. 3(a), {(?confirmation, ?confirmation), (ε, ?refusal)} is one of the permutations in Perms(q1, q2, β). Another permutation is {(?confirmation, ?refusal), (ε, ?confirmation)}. The permutations can be interpreted like the label pairs of the simulation edit distance: (?confirmation, ?confirmation) describes keeping ?confirmation, (?confirmation, ?refusal) describes changing ?confirmation to ?refusal, and (ε, ?refusal) describes the insertion of a ?refusal transition. Insertion and deletion have to be adapted to avoid incorrect or sub-optimal results (see Fig. 6–7).
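The permutations of this example can also be enumerated mechanically. The sketch below is one possible reading of Def. 2, under two assumptions: every label occurs at most once among the outgoing edges of a state, and condition (d) means that each non-ε label appears in exactly one pair. The name `perms` and the use of `None` for ε are illustrative choices.

```python
from itertools import combinations, permutations

EPS = None  # assumption: None stands for the stutter label ε


def perms(labels_q1, beta):
    """Enumerate all label permutations of the labels leaving q1 and the set beta."""
    a_labels = sorted(labels_q1)
    b_labels = sorted(beta)
    result = set()
    for k in range(min(len(a_labels), len(b_labels)) + 1):
        for matched_a in combinations(a_labels, k):
            for matched_b in permutations(b_labels, k):
                # Matched labels are kept (a == b) or modified (a != b).
                pairs = set(zip(matched_a, matched_b))
                # Unmatched labels of q1 are deleted, unmatched labels of beta inserted.
                pairs |= {(a, EPS) for a in a_labels if a not in matched_a}
                pairs |= {(EPS, b) for b in b_labels if b not in matched_b}
                result.add(frozenset(pairs))
    return result


example = perms({"?confirmation"}, {"?confirmation", "?refusal"})
```

Under these assumptions, `example` contains the two permutations named in the text plus a third one that deletes ?confirmation and inserts both labels of β.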

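Definition 3 (Subgraph insertion, subgraph deletion). Let S = [Q_S, δ_S, F_S, q0_S, I] be a service automaton and O = [Q_O, δ_O, F_O, q0_O, I] an OG. Define

\[
\mathit{ins}(q_2) =
\begin{cases}
1, & \text{if } q_2 \in F_O,\\[2pt]
(1-p) + \max\limits_{\beta \in \mathit{Sat}(\varphi(q_2))} \dfrac{p}{|\beta|} \cdot \sum\limits_{b \in \beta} L(\varepsilon, b) \cdot \mathit{ins}(\delta_O(q_2, b)), & \text{otherwise,}
\end{cases}
\]

\[
\mathit{del}(q_1) =
\begin{cases}
1, & \text{if } q_1 \in F_S,\\[2pt]
(1-p) + \dfrac{p}{n} \cdot \sum\limits_{q_1 \xrightarrow{a} q_1'} L(a, \varepsilon) \cdot \mathit{del}(q_1'), & \text{otherwise,}
\end{cases}
\]

where n is the number of outgoing edges of q1.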

Function ins(q2) calculates the insertion cost of the optimal subgraph of the OG O beginning at q2 which fulfills the formulae. Likewise, del(q1) calculates the cost of deleting the whole subgraph of the service automaton S from state q1. Both functions only depend on one of the graphs; that is, ins and del can be calculated independently from the service automaton and the OG, respectively.
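A direct recursive reading of Def. 3 might look as follows. This is a minimal sketch under several assumptions that are not part of the definition: graphs are acyclic and given as dictionaries `{state: {label: successor}}`, formulas are predicates over label sets, `None` stands for ε, the empty label set is skipped in `ins` to avoid dividing by |β| = 0, and a non-final state without outgoing edges gets the value 1 − p. The names `make_ins` and `make_del` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

EPS = None  # assumption: None stands for the stutter label ε


def make_ins(og, og_finals, formulas, L, p):
    """Return ins(q2) for an acyclic OG given as {state: {label: successor}}."""
    @lru_cache(maxsize=None)
    def ins(q2):
        if q2 in og_finals:
            return 1.0
        labels = sorted(og[q2])
        best = 0.0
        # Try every non-empty label subset that satisfies the formula of q2.
        for size in range(1, len(labels) + 1):
            for beta in combinations(labels, size):
                if formulas[q2](frozenset(beta)):
                    total = sum(L(EPS, b) * ins(og[q2][b]) for b in beta)
                    best = max(best, p / len(beta) * total)
        return (1 - p) + best
    return ins


def make_del(sa, sa_finals, L, p):
    """Return del(q1) for an acyclic service automaton in the same format."""
    @lru_cache(maxsize=None)
    def dele(q1):
        if q1 in sa_finals:
            return 1.0
        n = len(sa[q1])
        if n == 0:
            return 1 - p  # assumption: nothing to sum over
        total = sum(L(a, EPS) * dele(succ) for a, succ in sa[q1].items())
        return (1 - p) + p / n * total
    return dele
```

Because `make_ins` only receives the OG and `make_del` only the service automaton, the sketch mirrors the independence of the two functions noted above. Cyclic graphs would require a fixed-point computation instead of plain recursion.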

Definition 3 does not actually insert or delete nodes, but only calculates the similarity value of the resulting subgraphs. Only this similarity is needed to find the most similar partner service, and the actual edit actions can be easily derived from the state from which nodes are inserted or deleted (cf. Table 1).

With Def. 2 describing means to respect the OG’s formulae and Def. 3 coping with insertion and deletion, we can finally define the weighted quantitative matching function:

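Definition 4 (Weighted quantitative matching). Let S = [Q_S, δ_S, F_S, q0_S, I] be a service automaton and O = [Q_O, δ_O, F_O, q0_O, I] an OG. A weighted quantitative matching is a function M : Q_S × Q_O → [0, 1], such that:

\[
M(q_1, q_2) =
\begin{cases}
1, & \text{if } (q_1 \in F_S \wedge q_2 \in F_O),\\
(1-p) + W_1(q_1, q_2), & \text{otherwise,}
\end{cases}
\]

\[
W_1(q_1, q_2) = \max_{\beta \in \mathit{Sat}(\varphi(q_2))} \; \max_{P \in \mathit{Perms}(q_1, q_2, \beta)} \frac{p}{|P|} \cdot \sum_{(a,b) \in P} W_2(q_1, q_2, a, b),
\]

\[
W_2(q_1, q_2, a, b) =
\begin{cases}
L(a, b) \cdot M(\delta_S(q_1, a), \delta_O(q_2, b)), & \text{if } (a \neq \varepsilon \wedge b \neq \varepsilon),\\
L(\varepsilon, b) \cdot \mathit{ins}(\delta_O(q_2, b)), & \text{if } a = \varepsilon,\\
L(a, \varepsilon) \cdot \mathit{del}(\delta_S(q_1, a)), & \text{otherwise.}
\end{cases}
\]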

The weighted quantitative matching function is similar to the weighted quantitative simulation function (Def. 1). It recursively compares the states of the service automaton and the OG, but instead of statically taking the OG’s edges into consideration, it uses the formulae and checks all satisfying subsets (W1). Additionally, W2 handles the successor states determined by the labels a and b, or the insertion or deletion of a subgraph.
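The interplay of M, W1 and W2 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used for the paper: graphs are acyclic dictionaries `{state: {label: successor}}`, formulas are predicates over label sets, `None` stands for ε, the functions ins and del of Def. 3 are passed in as callables, and empty permutations are skipped. All state names, weights and the function name `matching` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, permutations

EPS = None  # assumption: None stands for the stutter label ε


def matching(sa, sa_finals, og, og_finals, formulas, L, p, ins, dele):
    """Return M(q1, q2) of Def. 4; ins and dele are the functions of Def. 3."""
    def sat(q2):
        labels = sorted(og[q2])
        return [c for k in range(len(labels) + 1)
                for c in combinations(labels, k) if formulas[q2](frozenset(c))]

    def perms(q1, beta):
        a_labels = sorted(sa[q1])
        for k in range(min(len(a_labels), len(beta)) + 1):
            for ma in combinations(a_labels, k):
                for mb in permutations(beta, k):
                    yield (list(zip(ma, mb))
                           + [(a, EPS) for a in a_labels if a not in ma]
                           + [(EPS, b) for b in beta if b not in mb])

    def w2(q1, q2, a, b):
        if a is not EPS and b is not EPS:
            return L(a, b) * m(sa[q1][a], og[q2][b])
        if a is EPS:
            return L(EPS, b) * ins(og[q2][b])
        return L(a, EPS) * dele(sa[q1][a])

    @lru_cache(maxsize=None)
    def m(q1, q2):
        if q1 in sa_finals and q2 in og_finals:
            return 1.0
        best = 0.0  # W1: maximum over satisfying sets and permutations
        for beta in sat(q2):
            for pairs in perms(q1, beta):
                if pairs:
                    total = sum(w2(q1, q2, a, b) for a, b in pairs)
                    best = max(best, p / len(pairs) * total)
        return (1 - p) + best

    return m


# Hypothetical example: one ?confirmation edge against an OG state requiring two edges.
sa = {"q1": {"?confirmation": "f"}, "f": {}}
og = {"q2": {"?confirmation": "q3", "?refusal": "q4"}, "q3": {}, "q4": {}}
formulas = {"q2": lambda s: {"?confirmation", "?refusal"} <= s}


def L(a, b):
    if a is EPS or b is EPS:
        return 0.5  # assumed weight for insertion and deletion
    return 1.0 if a == b else 0.25  # assumed weights for keep and modify


# All successors are final here, so constant stand-ins for ins and del suffice.
m = matching(sa, {"f"}, og, {"q3", "q4"}, formulas, L, 0.5,
             ins=lambda q: 1.0, dele=lambda q: 1.0)
value = m("q1", "q2")  # 0.875 under these assumed weights
```

With these assumed weights, the maximum is reached by the permutation that keeps ?confirmation and inserts ?refusal, so the edit actions can be read off the maximizing permutation.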

4.3 Matching-Based Edit Distance

Again, we can straightforwardly extend the weighted quantitative matching function towards an edit distance, because the permutations indicate how to modify the graph. Keeping and modifying transitions is handled as in Table 1, whereas adding and deleting nodes can be derived from Def. 3.

In fact, the weighted quantitative matching function is not a classical distance. It expresses the similarity between a service automaton and an OG (i.e., a characterization of many service automata) and is hence not symmetric. We still use the term “edit distance” to express the concept of a similarity measure from which edit actions can be derived.

Consider the example from Fig. 3. During the calculation of M(q1, q2), the permutation {(?confirmation, ?confirmation), (ε, ?refusal)} is considered. The first label pair denotes that the ?confirmation transition is kept unmodified. The second label pair denotes an insertion of a ?refusal transition. The value of this insertion is defined by

\[
L(\varepsilon, \text{?refusal}) \cdot \mathit{ins}(\delta_{O_{agency \oplus airline}}(q_2, \text{?refusal})) = L(\varepsilon, \text{?refusal}) \cdot \mathit{ins}(q_4) = L(\varepsilon, \text{?refusal}).
\]

The last step uses ins(q4) = 1, so the value only depends on the similarity function L.

Fig. 8. Matching-based edit distance applied to the customer’s service. [Figure: the customer’s service automaton with transitions ?offer, !booking, !rejection, !payment, ?confirmation and ?refusal; its states are annotated with the edit actions keep transition "?offer" to state q6, keep transition "!booking" to state q7, keep transition "!rejection" to state q8, keep transition "!payment" to state q1, keep transition "?confirmation" to state q8, and insert transition "?refusal" to new state q9.]

Figure 8 shows the result of applying the matching-based edit distance to the service automaton of Fig. 3(a). The states are annotated with edit actions. The service automaton was automatically generated from a BPEL process, and the state in which a modification has to be made can be mapped back to the original BPEL activity. In the example, a receive activity has to be replaced by a pick activity with an additional onMessage branch to receive the refusal message.

From the document: Lecture Notes in Computer Science 5240 (pages 150-154)