Learning and Deducing Temporal Orders
Wenfei Fan
Shenzhen Institute of Computing Sciences, China; University of Edinburgh, United Kingdom; Beihang University, China

Resul Tugay
University of Edinburgh, United Kingdom
[email protected]

Yaoshu Wang, Min Xie∗
Shenzhen Institute of Computing Sciences, China
[email protected], [email protected]

Muhammad Asif Ali
King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
[email protected]

∗Corresponding author
ABSTRACT
This paper studies how to determine temporal orders on attribute values in a set of tuples that pertain to the same entity, in the absence of complete timestamps. We propose a creator-critic framework to learn and deduce temporal orders by combining deep learning and rule-based deduction, referred to as GATE (Get the lATEst). The creator of GATE trains a ranking model via deep learning, to learn temporal orders and rank attribute values based on correlations among the attributes. The critic then validates the temporal orders learned and deduces more ranked pairs by chasing the data with currency constraints; it also provides augmented training data as feedback for the creator to improve the ranking in the next round.
The process proceeds until the temporal order obtained becomes stable. Using real-life and synthetic datasets, we show that GATE is able to determine temporal orders with F-measure above 80%, improving deep learning by 7.8% and rule-based methods by 34.4%.
PVLDB Reference Format:
Wenfei Fan, Resul Tugay, Yaoshu Wang, Min Xie, and Muhammad Asif Ali.
Learning and Deducing Temporal Orders. PVLDB, 16(8): 1944 - 1957, 2023.
doi:10.14778/3594512.3594524
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/resultugay/GATE.
1 INTRODUCTION
Real-life data keeps changing. As reported by Royal Mail, on average, "9590 households move, 1496 people marry, 810 people divorce, 2011 people retire and 1500 people die" each day in the UK [60]. It is estimated that inaccurate customer data costs organizations 6% of their annual revenues [60]. Outdated data incurs damage not only to Royal Mail. When the data at a search engine is out of date, a restaurant search may return a business that had closed three years ago. When the data about the condition of infrastructure assets is obsolete, it may delay the maintenance of equipment and cause outages. Moreover, data-driven decisions based on outdated data can be worse than making decisions with no data [49]. Indeed, "as a healthcare, retail, or financial services business you cannot afford to
make decisions based on yesterday’s data” [25]. Unfortunately, “82%
of companies are making decisions based on stale information” [8].
These highlight the need for determining the currency of data, i.e., how up-to-date the information is. This is, however, highly nontrivial. Consider a set of tuples that pertain to the same entity.
Their attribute values may become obsolete and inaccurate over time. Worse yet, only partial reliable timestamps might be available, where a timestamp is reliable if it is precise, correct and, moreover, it indicates that at the time, the values are correct and up-to-date.
Apart from mechanical reasons (malicious attacks or hardware failures), the logical reasons below are the major temporal issues in data quality [5], and account for the absence of reliable timestamps.
(1) Missing timestamps. Timestamps may simply not be recorded, e.g., in an e-health database [40], only 16 out of 26 relations are timestamped. Even when a relation has timestamps, it may not be complete, e.g., 25.36% missing in [53], up to 82.28% in [40].
(2) Imprecise timestamps. Timestamps may be too coarse, leading to unreliable ordering. An e-form may be submitted multiple times to an OA system by employees during a day. If the forms are recorded in datestamps, it is not clear which form (all on the same day) is the latest. Similar problems are often encountered in hospital data, where only dates are recorded [5]. Another example concerns inconsistent granularity of timestamps (e.g., minutes vs. days [5, 40]). If two values have timestamps "12-8-2021" and "12-8-2021 20:41", it is not clear which one is more up-to-date. As reported in [53], 90.41% of appointment records have imprecise timestamps.
(3) Incorrect timestamps. Many factors can lead to incorrect timestamps. Taking medical data [5] as an example, an X-ray machine has many asynchronous modules, each of which has a local clock and a local buffer. There can be a discrepancy between when a value is actually updated and when it is recorded, since the value is first queued in the buffer before it is recorded. Moreover, 36.96% of appointments have the overlapping issue (i.e., the next appointment in a particular room appears to have started before the current one has ended) [53], indicating incorrect timestamps.
As remarked earlier by Royal Mail [60], customer data changes frequently. To prevent the data from being outdated, the sales department may contact the customers and get regular updates on some critical information (e.g., email and phone). Due to the impatience of customers and the high cost (e.g., man power) of contact, the rest (e.g., jobs, marital status) may not be frequently updated and may gradually become obsolete, no longer having a reliable timestamp. Add to this the complication that data values are often copied or imported from other sources [21, 22], and even in the same database, values may come from multiple relations (e.g., by join), where no uniform scheme of timestamps is granted, e.g., some values are more frequently timestamped than the others [40]. This calls for an attribute-level time-stamping scheme (i.e., each value has a timestamp). It subsumes the tuple-level scheme (i.e., all values in the same tuple have the same timestamp) as a special case.

          FN    LN         sex  address       status    job           kids  SZ
t1, e1:   Mary  Goldsmith  F    19 xin st     single    n/a           -     5
t2, e1:   Mary  Taylor     F    6 gold plaza  married   journalist    1     6
t3, e1:   Mary  Taylor     F    19 mall st    espoused  assoc editor  2     7
t4, e1:   Mary  Taylor     F    7 ave         divorced  chief editor  2     7
t5, e1:   Mary  Taylor     F    7 ave         detached  chief editor  2     7
t6, e1:   Mary  Goldsmith  F    6 const. ave  wed       producer      3     7

Figure 1: Customer records dataset
No matter how desirable, the percentage of reliable timestamps is far lower than expected. How can we determine the temporal orders on the attribute values, i.e., for tuples 𝑡1 and 𝑡2, whether the value in the 𝐴-attribute of 𝑡1 is more current than the value in the 𝐴-attribute of 𝑡2, denoted by 𝑡2 ≺𝐴 𝑡1, in the absence of complete timestamps?
Example 1: Consider customer records 𝑡1-𝑡6 shown in Figure 1, which have been identified to refer to the same person Mary. Each tuple 𝑡𝑖 has attributes FN (first name), LN (last name), sex, address, marital status, job, kids and SZ (shoe size). Some attribute values of these tuples have become stale since Mary's data changes over the years, e.g., her job, address and last name have changed four times, five times and twice, respectively. Only some attribute values might be associated with reliable timestamps, e.g., the timestamps of 𝑡5[job] and 𝑡6[job] are 2016 and 2019 (not shown), respectively, indicating that at those times, the values were up-to-date. In the absence of complete timestamps, it is hard to know whether 𝑡2 ≺LN 𝑡6, i.e., whether the value of 𝑡2[LN] is more up-to-date than 𝑡6[LN]. Moreover, what is Mary's current job title, e.g., does 𝑡𝑖 ≺job 𝑡6 hold for 𝑖 ∈ [1, 4]? □

One may want to approach this problem by training a ranking model that sorts objects according to their degrees of relevance or importance [51]. The state-of-the-art systems in this regard employ deep learning [7, 36, 57, 63] or reinforcement learning [37, 68], and have been used in search engines [11] and machine translation [24, 70]. By means of ranking models one can learn temporal orders and decide whether 𝑡1 ≺𝐴 𝑡2 for all tuples 𝑡1, 𝑡2 and attribute 𝐴. However, it is hard to justify whether the ranking conforms to the temporal orders in the real world. For data-driven decision making, we need to ensure that the learned orders are reliable. Moreover, these approaches cannot explain the ranking of objects that follow a complex interlinked structure (e.g., address in Figure 1).
Another approach is to employ logic rules, e.g., currency constraints (CCs) [19, 23, 28, 30, 44, 48, 66]. These rules help us deduce temporal orders. Taking Example 1 as an example, the shoe size of the same person typically increases monotonically (before 20 years old), and the address of a person may be associated with the marital status. As will be seen in Section 2, using CCs one can deduce that 𝑡2 ≺SZ 𝑡3 and 𝑡1 ≺address 𝑡2. However, it is hard to find enough rules to deduce relative orders on each and every pair of values. For instance, we cannot find rules to decide whether 𝑡2 ≺LN 𝑡6. When testing existing rule-based methods on a dataset with 5% initial timestamps, even the best of them can only deduce 16.3% of the temporal orders (Section 6), leaving the remaining 78.7% undecided.
Besides, it is hard to generalize rules to handle lexically different but semantically similar values, since tuples might be extracted from different sources (e.g., status in Figure 1: married vs. wed).
Neither deep/ML learning nor logic rules work well when used separately. A natural question is whether it is possible to combine ML models and logic rules in a uniform framework, such that we can learn temporal orders, and use the rules to validate the ranking and improve the learning. How well can this framework improve the accuracy of the ML models and logic rules when used alone?
Contributions & organization. This paper makes a first effort to answer these questions, all in the affirmative.
(1) Temporal orders (Section 2). We define the notion of temporal orders 𝑡1 ≺𝐴 𝑡2 and 𝑡1 ⪯𝐴 𝑡2 on attributes, and formulate the problem for determining temporal orders. We also review the currency constraints (CCs) of [30], for deducing temporal orders by our framework. We show that CCs can specify interesting temporal properties, e.g., the monotonicity, comonotonicity and transitivity.
(2) GATE (Section 3). We propose GATE (Get the lATEst), a creator-critic framework to determine temporal orders by combining deep learning and logic deduction. GATE iteratively invokes a creator to rank the temporal orders on attribute values, followed by a critic that validates the ranking of the creator and deduces more ranked pairs via discovered CCs. The critic also produces augmented training data for the creator to improve its ranking in the next round.
This process proceeds, with the creator and critic mutually enhancing each other, until the temporal order cannot be further improved.
(3) Creator (Section 4). We propose a deep learning model underlying the creator of GATE, to learn temporal orders on attribute values. It departs from previous ranking models in that it learns the temporal orders based on contexts (i.e., attribute correlations) and calculates the confidence of the ranking, by employing chronological embeddings and adaptive pairwise ranking strategies.
(4) Critic (Section 5). The critic complements GATE with discovered rules (CCs). We show how it justifies the ranked pairs learned by the creator, and (incrementally) deduces latent temporal orders with the chase [62] using CCs. The chase has the Church-Rosser property (cf. [2]), i.e., it is guaranteed to converge to the same result no matter in what order the CCs are applied. Moreover, the critic augments the training data for the creator to improve its model in the next round.
(5) Experimental study (Section 6). Using real-life and synthetic data, we find that (a) the F-measure of GATE on dataset Career [41] is 0.866, versus 0.41 and 0.39 by rule-based UncertainRule [44] and Improve3C [18] (resp. 0.54 and 0.53 by ML-based RANKBert [56] and DittoRank [47]). (b) On average, GATE is 34.4% and 7.8% more accurate than the critic and the creator, respectively, verifying the effectiveness of combining deep learning and logic rules. (c) GATE is feasible in practice; it takes only 7 rounds to terminate on a real-life dataset of 1,983,698 tuples, on a single machine.
Related work. We categorize the related work as follows.
Rule-based methods. Currency constraints (CCs) were first studied in [29, 30] by employing partial currency orders and later extended in [27, 28] for conflict resolution, by considering both CCs and conditional functional dependencies (CFDs) [26]. A class of currency repairing rules (CRRs) was proposed in [43], which combines logic rules and statistics. A two-step approach was developed in [17] to determine the currency of dynamic data, by means of a topological graph. A class of uncertain currency rules was supported by UncertainRule [44] that considers both temporal orders and data certainty. Improve3C [18] and Imp3C [19] are 4-step data cleaning frameworks, including data consistency, completeness and currency, which use a metric to repair noisy data using CFDs [26] and currency orders. There has also been work [23] on parallel incremental updating algorithms by employing traditional currency rules.
This work differs from the prior work. To the best of our knowledge, we make the first effort to combine deep learning and rules for timeliness, for the two to enhance each other, while the prior work at most extends rules with statistics. We propose a model for inferring temporal orders and use logic deduction to derive more.
ML models. There have also been efforts on inferring temporal orders by ML models [6, 10, 35, 54, 55, 64]. A related topic is to incorporate temporal information into knowledge graphs, e.g., for temporal link predictions [20, 61] and reasoning [16, 65, 69]. These methods often embed temporal information in ML models, for inference varying over time. Learning temporal orders has been modeled as a learning-to-rank problem (pointwise, pairwise and listwise), where global orders are learned for currency attributes. Temporal problems have also been studied in, e.g., search and recommendation [31, 67], IR [50] and NLP [47, 56]. In particular, ranking in NLP (e.g., in question-answering tasks) can be handled by pre-trained language models with cross-entropy loss or by a ranking function with a binary classifier as the comparison operator; we adopted a baseline from each kind (i.e., RANKBert and DittoRank) in Section 6.
In contrast, we propose a creator-critic framework that not only learns temporal orders via deep learning, but also adopts rules to deduce new ones and help ML models learn better. The prior models can be embedded into our framework.
Entity resolution (ER). Temporal clustering methods were proposed in [46] based on a concept of time decay, to improve ER with timestamps. [12] dynamically assesses and adjusts similarities among records based on their attributes and temporal differences. [42] estimates the probability that a value changes, and shows how to link tuples in a correct time period. To cope with incorrect timestamps, [66] adopts matching dependencies (MDs) and currency constraints to capture the attributes that were changed over time.
This work focuses on deducing temporal orders of each entity, while the prior work mainly adopts temporal information for ER.
2 DETERMINING TEMPORAL ORDERS
In this section we first formulate temporal orders and the problem for determining temporal orders (Section 2.1). We then review currency constraints (CCs) of [30] and show that such rules are able to express monotonicity, comonotonicity and transitivity (Section 2.2).
2.1 The Problem
Consider a relation schema 𝑅 = (EID, 𝐴1, . . . , 𝐴𝑛), where 𝐴𝑖 is an attribute (𝑖 ∈ [1, 𝑛]), and EID is an entity id as introduced by Codd [14]. For a tuple 𝑡 of schema 𝑅, we denote by 𝑡[𝐴] the value in the 𝐴-attribute of 𝑡, and by 𝑡[EID] the entity represented by tuple 𝑡. We assume that each tuple is associated with a tuple id, denoted by tid. We refer to a relation 𝐷 of 𝑅 as a normal instance 𝐷 of 𝑅.
Temporal orders and temporal instances. An entity instance is (𝐷𝑒, 𝑇𝑒), where (a) 𝐷𝑒 is a normal instance of schema 𝑅 such that all tuples 𝑡1 and 𝑡2 in 𝐷𝑒 refer to the same real-life entity 𝑒 and hence 𝑡1[EID] = 𝑡2[EID]; and (b) 𝑇𝑒 is a partial function that associates a timestamp 𝑇𝑒(𝑡[𝐴]) with the 𝐴-attribute of a tuple 𝑡 in 𝐷𝑒. We refer to (𝐷𝑒, 𝑇𝑒) as an entity instance pertaining to 𝑒.
Here the timestamp indicates that at the time 𝑇𝑒(𝑡[𝐴]), the 𝐴-attribute value of tuple 𝑡 is correct and up-to-date; it does not necessarily refer to the time when 𝑡[𝐴] was created or last updated. If 𝑇𝑒(𝑡[𝐴]) is undefined, a reliable timestamp is not available for 𝑡[𝐴].
Intuitively, an entity instance extends a normal instance with available timestamps. Its tuples may be extracted from a variety of data sources, and are identified to refer to the same entity 𝑒 via entity resolution. In the same tuple 𝑡, 𝑡[𝐴] and 𝑡[𝐵] may bear different timestamps (or even no timestamp) for different 𝐴 and 𝐵. Note that we do not assume a timestamp for the entire tuple, since we often find only parts of a tuple to be correct and up-to-date.
Temporal orders. A temporal order on attribute 𝐴 of 𝐷𝑒 is a partial order ⪯𝐴 such that for all tuples 𝑡1 and 𝑡2 in 𝐷𝑒, 𝑡2 ⪯𝐴 𝑡1 if the value in 𝑡1[𝐴] is at least as current as 𝑡2[𝐴]. We also use a strict partial order 𝑡2 ≺𝐴 𝑡1 if 𝑡1[𝐴] is more current than 𝑡2[𝐴]. To simplify the discussion we focus on ⪯𝐴 in the sequel; ≺𝐴 is handled analogously.
In particular, if 𝑇𝑒(𝑡1[𝐴]) and 𝑇𝑒(𝑡2[𝐴]) are both defined and if 𝑇𝑒(𝑡2[𝐴]) ≤ 𝑇𝑒(𝑡1[𝐴]), i.e., when timestamp 𝑇𝑒(𝑡1[𝐴]) is no earlier than 𝑇𝑒(𝑡2[𝐴]), then 𝑡2 ⪯𝐴 𝑡1, i.e., 𝑡1[𝐴] is confirmed at a later timestamp and is thus considered at least as current as 𝑡2[𝐴].
Temporal order ⪯𝐴 is represented as a set of tuple pairs such that (𝑡2[tid], 𝑡1[tid]) ∈ ⪯𝐴 iff 𝑡2 ⪯𝐴 𝑡1. We write (𝑡2[tid], 𝑡1[tid]) as (𝑡2, 𝑡1) if it is clear in the context. Note that the same value may bear different timeliness in different tuples, e.g., Mary's marital status changed from married (𝑡2) to divorced (𝑡4) to married (𝑡7, not shown in Figure 1). While 𝑡2[status] = 𝑡7[status], 𝑡2 ≺status 𝑡7. Here 𝑡2 ≺status 𝑡7 ranks the timeliness of the status-attributes of tuples 𝑡2 and 𝑡7, not values (married vs. married) detached from the tuples.
We say that a temporal order ⪯1𝐴 extends ⪯2𝐴, written as ⪯2𝐴 ⊆ ⪯1𝐴, if for all tuples 𝑡1, 𝑡2 in 𝐷𝑒, whenever 𝑡2 ⪯2𝐴 𝑡1 is defined, so is 𝑡2 ⪯1𝐴 𝑡1. That is, ⪯1𝐴 includes all tuple pairs in ⪯2𝐴 and possibly more.
Temporal instances. A temporal instance 𝐷𝑡 of 𝑅 is given as (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇), where each ⪯𝐴𝑖 is a temporal order on 𝐴𝑖 (𝑖 ∈ [1, 𝑛]), 𝐷 = ∪𝑖∈[𝑘] 𝐷𝑒𝑖, 𝑇 = ∪𝑖∈[𝑘] 𝑇𝑒𝑖, and for all 𝑖 ∈ [1, 𝑘], (𝐷𝑒𝑖, 𝑇𝑒𝑖) is an entity instance of 𝑅. Here tuples 𝑡1 and 𝑡2 in 𝐷 are compatible under ⪯𝐴 if they pertain to the same entity, i.e., 𝑡1[EID] = 𝑡2[EID].

Intuitively, 𝐷𝑡 is a collection of entity instances, such that each (𝐷𝑒𝑖, 𝑇𝑒𝑖) pertains to the same entity 𝑒𝑖. We do not rank the currency of tuples if they refer to different entities. A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) is said to extend another temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) if for all 𝑖 ∈ [1, 𝑛], ⪯𝐴𝑖 extends ⪯′𝐴𝑖.

Problem statement. We study the problem for determining the temporal orders of a temporal instance, stated as follows.
◦ Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇).
◦ Output: An extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) such that for all 𝑖 ∈ [1, 𝑛], (a) ⪯′𝐴𝑖 extends ⪯𝐴𝑖, and (b) ⪯′𝐴𝑖 is a total order on all compatible 𝑡1 and 𝑡2 with 𝑡1[EID] = 𝑡2[EID].
Intuitively, our goal is to extend ⪯𝐴𝑖 such that for all tuples 𝑡1 and 𝑡2 in 𝐷, if 𝑡1[EID] = 𝑡2[EID], we can decide which of 𝑡1[𝐴𝑖] and 𝑡2[𝐴𝑖] is more up-to-date. As a consequence, we can deduce the latest value for all attributes. Note that we define a total order on each attribute, not a global order on tuples. The total orders on different attributes can be different. This said, we can use the temporal orders learned on one attribute to help the deduction on other attributes, via correlation expressed as currency constraints (see below).
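To make the data model concrete, the following is a minimal Python sketch (ours, not part of the GATE artifact) of a temporal instance whose per-attribute orders are stored as sets of tuple-id pairs, together with the seeding of each ⪯𝐴 from the available reliable timestamps as described above; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]          # (tid of t2, tid of t1), meaning t2 ⪯_A t1

@dataclass
class TemporalInstance:
    tuples: Dict[str, Dict[str, object]]              # tid -> {attribute: value}, including "EID"
    timestamps: Dict[Tuple[str, str], float]          # (tid, attribute) -> reliable timestamp T_e(t[A])
    orders: Dict[str, Set[Pair]] = field(default_factory=dict)   # attribute -> ⪯_A as tid pairs

    def add_order(self, attr: str, t2: str, t1: str) -> None:
        """Record t2 ⪯_attr t1."""
        self.orders.setdefault(attr, set()).add((t2, t1))

def init_orders_from_timestamps(inst: TemporalInstance) -> None:
    """Seed each ⪯_A with the pairs implied by reliable timestamps: if T_e(t1[A]) and
    T_e(t2[A]) are both defined, T_e(t2[A]) <= T_e(t1[A]), and the two tuples refer to
    the same entity, then t2 ⪯_A t1."""
    for (tid1, attr), ts1 in inst.timestamps.items():
        for (tid2, attr2), ts2 in inst.timestamps.items():
            if attr != attr2 or tid1 == tid2:
                continue
            if inst.tuples[tid1]["EID"] == inst.tuples[tid2]["EID"] and ts2 <= ts1:
                inst.add_order(attr, tid2, tid1)
```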
2.2 Currency Constraints
We next review the class of currency constraints proposed in [30].
The predicates over a relation schema 𝑅 are defined as:
𝑝 ::= 𝑡[𝐴] ⊕ 𝑐 | 𝑡1[𝐴] ⊕ 𝑡2[𝐴] | 𝑡1 ⪯𝐴 𝑡2,
where 𝑡, 𝑡1, 𝑡2 are tuple variables denoting tuples of 𝑅, 𝐴 is an attribute of 𝑅, 𝑐 is a constant, and ⊕ is an operator from {=, ≠, >, ≥, <, ≤}; 𝑡[𝐴] ⊕ 𝑐 and 𝑡1[𝐴] ⊕ 𝑡2[𝐴] are defined on attribute values, while 𝑡1 ⪯𝐴 𝑡2 compares the timeliness of 𝑡1[𝐴] and 𝑡2[𝐴]. In particular, 𝑡1[EID] = 𝑡2[EID] says that 𝑡1 and 𝑡2 refer to the same entity.
A currency constraint (CC) over schema 𝑅 is defined as follows:
𝜑 = 𝑋 → 𝑝0,
where 𝑋 is a conjunction of predicates over 𝑅 with tuple variables 𝑡1, . . . , 𝑡𝑚, and 𝑝0 has the form 𝑡𝑢 ⪯𝐴𝑖 𝑡𝑣 for 𝑢, 𝑣 ∈ [1, 𝑚]. We refer to 𝑋 as the precondition of 𝜑, and 𝑝0 as the consequence of 𝜑.
As defined in [30], a CC can be equivalently expressed as a universal first-order logic sentence of the following form:
𝜑 = ∀𝑡1, . . . , 𝑡𝑚 (⋀𝑗∈[1,𝑚] (𝑡1[EID] = 𝑡𝑗[EID]) ∧ 𝑋 → 𝑡𝑢 ⪯𝐴 𝑡𝑣).
Semantics. Currency constraints are defined over temporal instances 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) of 𝑅. A valuation of the tuple variables of a CC 𝜑 in 𝐷𝑡, or simply a valuation of 𝜑, is a mapping ℎ that instantiates variables 𝑡1, . . . , 𝑡𝑚 with tuples in 𝐷 that refer to the same real-world entity, as required by 𝑡1[EID] = 𝑡𝑗[EID].

A valuation ℎ satisfies a predicate 𝑝 over 𝑅, written as ℎ |= 𝑝, if the following is satisfied: (1) if 𝑝 is 𝑡[𝐴] ⊕ 𝑐 or 𝑡1[𝐴] ⊕ 𝑡2[𝐴], then it is interpreted as in tuple relational calculus, following the standard semantics of first-order logic [2]; and (2) if 𝑝 is 𝑡1 ⪯𝐴 𝑡2, then 𝑡2[𝐴] is at least as current as 𝑡1[𝐴], i.e., (𝑡1, 𝑡2) is in ⪯𝐴.

For a conjunction 𝑋 of predicates, we write ℎ |= 𝑋 if ℎ |= 𝑝 for all 𝑝 in 𝑋. A temporal instance 𝐷𝑡 satisfies CC 𝜑, denoted by 𝐷𝑡 |= 𝜑, if for all valuations ℎ of 𝑋 → 𝑝0 in 𝐷𝑡, if ℎ |= 𝑋 then ℎ |= 𝑝0.

Properties. Currency constraints are able to specify interesting temporal properties. Below we exemplify some properties.
Monotonicity. A temporal order ⪯𝐴 over relation schema 𝑅 is monotonic if for any tuples 𝑡1 and 𝑡2 of 𝑅 that refer to the same entity, if 𝑡1[𝐴] ≤ 𝑡2[𝐴] then 𝑡1 ⪯𝐴 𝑡2. For instance, consider the SZ (shoe size) attribute of the customer relation of Example 1. Then ⪯SZ is monotonic. It can be expressed as the following CC:

𝜑1 = 𝑡1[SZ] ≤ 𝑡2[SZ] → 𝑡1 ⪯SZ 𝑡2.

As another example, marital status only changes from single to married, not the other way around [9]. This is expressed as CC:

𝜑2 = 𝑡1[status] = "single" ∧ 𝑡2[status] = "married" → 𝑡1 ⪯status 𝑡2.

With a slight abuse of terminology, by considering timestamps as an "attribute" associated with attribute 𝐴, we have:

𝜑3 = 𝑇𝑒(𝑡1[𝐴]) ≤ 𝑇𝑒(𝑡2[𝐴]) → 𝑡1 ⪯𝐴 𝑡2,

since 𝑡2[𝐴] is confirmed at a later timestamp.
Comonotonicity. For attributes 𝐴 and 𝐵 of 𝑅, we say ⪯𝐴 and ⪯𝐵 are comonotonic in a temporal instance 𝐷𝑡 of 𝑅 if for all tuples 𝑡1 and 𝑡2 in 𝐷𝑡 that refer to the same entity, 𝑡1 ⪯𝐴 𝑡2 if and only if 𝑡1 ⪯𝐵 𝑡2. Intuitively, ⪯𝐴 and ⪯𝐵 are comonotonic if the two are correlated such that when one is updated, the other will also change. For instance, ⪯status and ⪯address are often comonotonic: when the marital status of a person changes from single to married, this person may move to a larger house. This can be expressed as a CC:

𝜑4 = 𝑡1 ⪯status 𝑡2 → 𝑡1 ⪯address 𝑡2.
Transitivity. Transitivity can also be expressed for any attribute 𝐴:

𝜑5 = 𝑡1 ⪯𝐴 𝑡2 ∧ 𝑡2 ⪯𝐴 𝑡3 → 𝑡1 ⪯𝐴 𝑡3.
Correlating different attributes. One can correlate multiple different attributes to capture implicit temporal ordering. For example, marital status may change from "married" to "divorced" and further from "divorced" to "married" again. To deduce an order on 𝑡[status], we can use the number of kids as an additional condition:

𝜑6 = 𝑡1[status] = "married" ∧ 𝑡2[status] = "divorced" ∧ 𝑡1[kids] < 𝑡2[kids] → 𝑡1 ⪯status 𝑡2.
Deduction. Making use of CCs, we can deduce certain temporal orders, e.g., from 𝜑1-𝜑6 and Figure 1, we can deduce the following:
◦ 𝑡1 ⪯status 𝑡2 by 𝜑2, 𝑡2 ⪯status 𝑡4 by 𝜑6 and 𝑡1 ⪯status 𝑡4 by 𝜑5;
◦ 𝑡1 ⪯address 𝑡2 by 𝜑4; hence 𝑡2[address] is more current for Mary;
◦ 𝑡1 ⪯SZ 𝑡2 ⪯SZ 𝑡3 by 𝜑1; hence Mary's current shoe size is 7.
However, we cannot determine whether 𝑡2 ⪯LN 𝑡6 or 𝑡6 ⪯LN 𝑡2 by deduction with the currency constraints 𝜑1-𝜑6.
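For illustration only (this is not the CC discovery or chase algorithms of GATE), the following Python sketch hard-codes simplified counterparts of 𝜑1, 𝜑2 and 𝜑6 as predicates over tuple pairs and applies them to a few of the tuples of Figure 1 to collect ranked pairs; the tuple contents are abridged and the kids value of 𝑡1 is set to 0 for simplicity.

```python
# Each simplified CC is a pair (attribute A of the consequence, precondition over (t1, t2));
# when the precondition holds for tuples of the same entity, we deduce t1 ⪯_A t2.
ccs = [
    ("SZ",     lambda t1, t2: t1["SZ"] <= t2["SZ"]),                  # cf. phi_1
    ("status", lambda t1, t2: t1["status"] == "single"
                              and t2["status"] == "married"),         # cf. phi_2
    ("status", lambda t1, t2: t1["status"] == "married"
                              and t2["status"] == "divorced"
                              and t1["kids"] < t2["kids"]),           # cf. phi_6
]

def deduce(tuples):
    """Apply the simplified CCs above to every ordered pair of tuples of the same entity."""
    deduced = set()   # elements (A, i1, i2) meaning t_{i1} ⪯_A t_{i2}
    for i1, t1 in tuples.items():
        for i2, t2 in tuples.items():
            if i1 == i2 or t1["EID"] != t2["EID"]:
                continue
            for attr, pre in ccs:
                if pre(t1, t2):
                    deduced.add((attr, i1, i2))
    return deduced

tuples = {   # abridged from Figure 1; kids of t1 set to 0 for simplicity
    "t1": {"EID": "e1", "status": "single",   "kids": 0, "SZ": 5},
    "t2": {"EID": "e1", "status": "married",  "kids": 1, "SZ": 6},
    "t4": {"EID": "e1", "status": "divorced", "kids": 2, "SZ": 7},
}
print(deduce(tuples))   # contains ("status", "t1", "t2"), ("status", "t2", "t4"), ("SZ", "t1", "t2"), ...
```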
Discovery of CCs. As noted in [30], CCs are a special case of denial constraints (DCs) [3] extended with temporal orders ⪯𝐴 and ≺𝐴. Algorithms are in place for discovering DCs, e.g., [4, 13, 52, 58]. We can readily extend these algorithms for discovering CCs (see [1]).
3 GATE: A CREATOR-CRITIC SYSTEM
In this section, we propose a creator-critic framework for determining temporal orders, and develop the system GATE to implement it. A unique feature of GATE is its combined use of deep learning and logic deduction. Below we start with the architecture of GATE, and then present its overall workflow, with a termination guarantee.
Architecture. The ultimate goal of GATE is to obtain a total order ⪯𝐴 for each attribute 𝐴. As shown in Figure 2, GATE first discovers a set Σ of CCs on 𝐷𝑡 offline for performing logic deduction. Then it takes a temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) as input, and learns and deduces more temporally ranked pairs for 𝐷𝑡 online. For simplicity, we assume w.l.o.g. that 𝐷 consists of a single entity instance 𝐷𝑒, i.e., all tuples in 𝐷 pertain to the same entity 𝑒 and thus their attributes can be pairwise compared; the methods of this paper can be readily extended to 𝐷𝑡 with multiple entity instances.
More specifically, the learning and deducing process in GATE iteratively executes two phases, namely, the creator and the critic, as follows.
(1) Creator (Section 4). In this phase, GATE (incrementally) trains a ranking model Mrank via deep learning. Taking 𝐷𝑡 and the augmented training data 𝐷aug (see its definition shortly) from the critic as input, it predicts new orders for extending each ⪯𝐴 of 𝐷𝑡.
Figure 2: Architecture of GATE (the creator, with its ranking model and incremental training; the critic, with chasing by CCs and conflict resolution; augmented training data fed back to the creator; currency constraints discovered offline).
Our model has three features: (a) For each tuple 𝑡 in 𝐷, the embedding for 𝑡[𝐴] is created using both 𝑡[𝐴] and other correlated values in 𝑡, so that 𝑡[𝐴] can be ranked comprehensively. (b) The attribute embeddings [15] are arranged to preserve chronological orders. (c) We adopt an attribute-centric adaptive pairwise ranking strategy in Mrank, so that the ranking result can be justified semantically. Moreover, the temporal instance 𝐷𝑡 will also be extended based on 𝐷aug.

Given an attribute 𝐴, we associate each (𝑡1, 𝑡2) ∈ ⪯𝐴 with a confidence, denoted by conf(𝑡1 ⪯𝐴 𝑡2), indicating how likely 𝑡1 ⪯𝐴 𝑡2 holds. If 𝑡2[𝐴] has a later timestamp than 𝑡1[𝐴], then conf(𝑡1 ⪯𝐴 𝑡2) is 1. If 𝑡1 ⪯𝐴 𝑡2 is predicted by Mrank, its confidence ranges from 0 to 1. We only consider (𝑡1, 𝑡2) predicted by Mrank with conf(𝑡1 ⪯𝐴 𝑡2) ≥ 𝛿, where 𝛿 is a predefined threshold, as candidate pairs to be extended to ⪯𝐴. We denote the set of all candidate pairs of ⪯𝐴 by ⪯M𝐴. The input and output of the creator are as follows:

◦ Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) and the augmented training data 𝐷aug from the critic.
◦ Output: An extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) based on 𝐷aug and the predicted orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛).
(2) Critic (Section 5). In this phase, the critic of GATE justifies and deduces more temporally ranked pairs, by applying CCs in Σ via the chase [62]. Denote the result of chasing by ⪯Σ𝐴. Depending on the validity of ⪯Σ𝐴, we construct augmented training data, denoted by 𝐷aug, containing the ranked pairs justified (resp. the conflicts caught by CCs); 𝐷aug will be fed back to the creator for the next round of model learning, so that the creator can learn from more unseen data and get higher accuracy iteratively. Specifically, 𝐷aug is a temporal instance extended with a validity flag fvalid, i.e., 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid): (a) if fvalid is true, ⪯Σ𝐴 is valid and all temporal orders deduced by the chase will be added to 𝐷aug; and (b) otherwise, the chasing is invalid, i.e., both (𝑡1, 𝑡2) ∈ ⪯𝐴 and (𝑡2, 𝑡1) ∈ ⪯𝐴 are deduced, and either 𝑡1 ≺𝐴 𝑡2 or 𝑡2 ≺𝐴 𝑡1. In this case, these two conflicting orders will be added to 𝐷aug, and the creator will be asked to resolve the conflict by revising its model accordingly.

Formally, the input and output of the critic are as follows:

◦ Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇), the predicted temporal orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛) and a set Σ of CCs.
◦ Output: Augmented data 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid).
The novelty of the critic consists of (a) the deduction using the chase, and (b) an efficient algorithm implementing the chase.
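As a hedged sketch of how the critic could assemble 𝐷aug (the function and variable names are ours, not GATE's), the chased orders can be merged with the current ones while flagging the result invalid when a pair is deduced in both directions.

```python
def construct_augmented(orders_chased, orders_current):
    """orders_*: attribute A -> set of tid pairs (t2, t1) meaning t2 ⪯_A t1.
    Merge the chased orders into the current ones; the result is flagged invalid
    when the chase derives a pair in both directions on some attribute (the
    paper's condition additionally requires a strict order between the two)."""
    augmented, conflicts, f_valid = {}, [], True
    for attr, chased in orders_chased.items():
        merged = set(orders_current.get(attr, set()))
        for (t2, t1) in chased:
            merged.add((t2, t1))
            if t1 != t2 and (t1, t2) in chased and (attr, t2, t1) not in conflicts:
                f_valid = False
                conflicts.append((attr, t1, t2))   # conflicting pair, to be resolved by the creator
        augmented[attr] = merged
    return augmented, f_valid, conflicts
```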
Workflow. As shown in Figure 3, GATE takes a temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) and a set Σ of CCs discovered offline as input, and it outputs an extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) with a total order ⪯′𝐴 defined for each attribute 𝐴. GATE first initializes the augmented training data 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid = true) (Line 1, details omitted) for the first round of GATE, by deducing its initial temporal orders ⪯′𝐴 via the CCs in Σ whose preconditions do not involve timeliness comparison, e.g., we can create temporal orders for those tuples with available timestamps using 𝜑3. The initialization can be done efficiently by [33].
Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇), and a set Σ of CCs.
Output: An extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) such that for all 𝑖 ∈ [1, 𝑛], (a) ⪯′𝐴𝑖 extends ⪯𝐴𝑖 and (b) ⪯′𝐴𝑖 is a total order.
1.  𝐷aug := Initialize(𝐷𝑡, fvalid = true, Σ); initialize the ranking model Mrank;
2.  while true do
3.      /* The creator of GATE */
4.      𝐷𝑡 := Extend(𝐷𝑡, 𝐷aug);
5.      train Mrank incrementally based on 𝐷𝑡;
6.      (⪯M𝐴1, . . . , ⪯M𝐴𝑛) := the temporal orders predicted by Mrank;
7.      /* The critic of GATE */
8.      (⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛) := Chase(𝐷𝑡, Σ);
9.      𝐷aug := ConstructAugmented(𝐷, ⪯M𝐴1, . . . , ⪯M𝐴𝑛, ⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛, 𝑇);
10.     if 𝐷𝑡 no longer changes then
11.         break;
12. 𝐷𝑡 := Extend(𝐷𝑡, Mrank);
13. return 𝐷𝑡;
Figure 3: Workflow of GATE
Then GATE iteratively executes the creator and the critic in rounds. In the 𝑖-th round, (a) the creator extends the temporal instance 𝐷𝑡 (Line 4, see Section 4) based on the augmented training data 𝐷aug returned by the critic in the (𝑖−1)-th round, by possibly revising its model to resolve the conflicts. Then the creator incrementally trains Mrank (Line 5, see Section 4) and predicts new temporal orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛) (Line 6), which are candidate pairs (𝑡1, 𝑡2) with confidences at least 𝛿; and (b) the critic deduces more ranked pairs (⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛) by chasing with CCs in Σ (Line 8). Based on the result of chasing, the critic constructs augmented training data 𝐷aug via procedure ConstructAugmented (Line 9, see Section 5); this 𝐷aug is fed back to the creator for the next round.
Finally, when 𝐷𝑡 no longer changes (Lines 10-11), the iteration ends. If 𝐷𝑡 is still not a temporal instance with total orders defined on all attributes, we extend it using procedure Extend (Line 12), such that for each pair (𝑡1, 𝑡2), the one of (𝑡1, 𝑡2) and (𝑡2, 𝑡1) learned with the higher confidence is added to ⪯𝐴, until each ⪯𝐴 becomes a total order.
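The final Extend step (Line 12) can be sketched as follows, assuming (our assumption) access to a function conf(𝑡, 𝑡′) that exposes the confidences predicted by Mrank; ties and user inspection are omitted.

```python
from itertools import combinations

def extend_to_total(order, tids, conf):
    """order: the current ⪯_A for one attribute, as a set of tid pairs (t1, t2) with t1 ⪯_A t2;
    tids: all tuple ids of the entity; conf(t, t_prime): confidence of t ⪯_A t_prime from M_rank.
    For every still-undecided pair, keep the direction predicted with the higher confidence."""
    for ta, tb in combinations(sorted(tids), 2):
        if (ta, tb) in order or (tb, ta) in order:
            continue                       # already ranked (stable), leave untouched
        if conf(ta, tb) >= conf(tb, ta):
            order.add((ta, tb))
        else:
            order.add((tb, ta))
    return order
```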
Termination. We prove in [1] that GATE will eventually terminate (Line 10), i.e., with more iterations, 𝐷𝑡 is gradually extended with more orders and finally becomes stable and does not change.
Theorem 1: GATE is guaranteed to terminate. □

Example 2: Continuing with Example 1, assume that Σ = {𝜑1-𝜑6} and 𝐷𝑡 has empty temporal orders. We first initialize the training data 𝐷aug by applying the CCs in Σ that do not compare timeliness in their preconditions, e.g., we can deduce 𝑡5 ⪯job 𝑡6 by 𝜑3. Suppose that after training Mrank based on the initial 𝐷aug, no tuple pair predicted by Mrank is at least 𝛿-confident in the first round. The creator outputs an empty ⪯M𝐴 for each 𝐴 due to the lack of information. Then by applying the CCs in Σ, the critic can deduce new temporal orders ⪯Σ𝐴, e.g., 𝑡1 ⪯address 𝑡2 by 𝜑4 and 𝑡1 ⪯status 𝑡4 by 𝜑5. Both ⪯M𝐴 and ⪯Σ𝐴 will be used to construct the augmented training data 𝐷aug, based on which the creator extends 𝐷𝑡 and incrementally trains Mrank in the second round. This time the creator might be able to predict confident temporal orders, e.g., 𝑡1 ⪯LN 𝑡2, based on the augmented training data 𝐷aug. The iteration continues until 𝐷𝑡 does not change anymore. Each temporal order ⪯𝐴 in 𝐷𝑡 will be extended to a total order by Mrank if it is still not total. □

Remark. We adopt a confidence threshold 𝛿 to ensure the reliability of ML predictions, so that only reliable orders are considered. When confident orders cannot be decided for the lack of initial information, we may opt to invite user inspection to ensure the correctness of a few initial ranked pairs, from which more reliable orders can be iteratively deduced/learned. The parameter 𝛿 plays an important role in model Mrank, e.g., when testing Mrank on procedural billing data (with 82.28% real missing timestamps [40]), we find that 22.6% of the pairs are identified as confident when 𝛿 = 0.55. The percentage decreases to 16.7% when 𝛿 = 0.6. We will test the impact of 𝛿 in Section 6.
4 CREATOR
In this section, we develop the creator of GATE. Given a temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) and the augmented training data 𝐷aug (from the critic), the creator (a) extends each ⪯𝐴 of 𝐷𝑡 based on 𝐷aug and (b) trains a ranking model Mrank to predict new orders.
Review on ranking models. Learning to rank [50] aims to learn a ranking model so that objects can be ranked based on their degrees of relevance, preference, or importance (in our setting, timeliness).
We adopt pairwise ranking since (a) its semantics is consistent with temporal orders, which is a set of tuple pairs on which partial orders are defined (Section 2); and (b) by transitivity of temporal orders, pairwise ranking helps us get a total order for all attributes.
Challenges. One naturally wants to adopt an existing model. However, it is hard to directly apply one, since the unique features of temporal orders are not well considered, resulting in poor performance.
(1) Attribute correlation. Due to the correlated nature of temporal order, we often need to reference other attributes to determine the orders of a given attribute. Moreover, since a value may change back and forth (e.g., the marital status changes from "married" to "divorced", and back from "divorced" to "married"), it is hard to determine the up-to-date value based on a single attribute only.
(2) Limitation of embedding models. To determine the timeliness, care should be taken for lexically different but semantically similar values (e.g., "baby" vs. "birth" and "dead" vs. "expired" for attribute status). Although existing embedding models (e.g., ELMo [59] or Bert [15]) are widely adopted, they cannot be directly used here since they are not trained to organize data chronologically.
(3) Adaptive margin. Existing ranking strategies do not consider real-life characteristics of timeliness, e.g., the timespan for a person's status to move from "birth" to "engaged" is typically longer than from "engaged" to "married" (Figure 4). Instead of ranking status with a fixed margin as most existing strategies do, we need a new methodology to embed values using adaptive margins, to conform to their real-life behaviors and justify the semantics of ranking.
Model overview. We propose a ranking model to tackle the above challenges, whose novelty includes (a) a context-aware scheme that embeds each value along with other correlated values, (b) an encoding mechanism to re-organize the embeddings in a chronological manner, and (c) an attribute-centric adaptive ranking strategy.
As shown in Figure 4, our ranking model Mrank takes the current 𝐷𝑡 as input, and outputs new ranked pairs, where each (𝑡1, 𝑡2) is associated with a confidence, indicating how likely 𝑡1 ⪯𝐴 𝑡2 holds.
Our ranking model consists of three stages as follows: context-aware embedding, chronological encoding and order prediction.
(1) Mrank first builds a context-aware embedding for each attribute value using pre-trained language models (ELMo [59] or Bert [15]), in a way that information of correlated values is also embedded.
(2) Based on the embeddings, Mrank encodes a target vector 𝜙𝐴 for attribute 𝐴, via non-linear transformation (see below). Similarly, a value vector 𝜙𝑡[𝐴] is encoded for each 𝐴-attribute value of tuple 𝑡, so that (a) if 𝑡[𝐴] is more current, 𝜙𝑡[𝐴] is closer to 𝜙𝐴 and (b) the gap between 𝜙𝑡[𝐴] and 𝜙𝐴 is trained adaptively, to reflect real semantics.
(3) Finally, given 𝜙𝑡1[𝐴] and 𝜙𝑡2[𝐴] of 𝑡1 and 𝑡2, Mrank predicts the order, i.e., whether 𝑡1 ⪯𝐴 𝑡2 holds with high enough confidence.
We next briefly elaborate on the context-aware embedding and the chronological encoding scheme with the adaptive margin.
Context-aware embedding. To reference correlated values, we treat tuples as sequences and adopt the idea of serialization [47] (so that tuples can be meaningfully ingested by models) to embed values. Following [47], given a tuple 𝑡 in 𝐷𝑡, we serialize its values:

serialize(𝑡) = ⟨COL⟩ 𝐴1 ⟨VAL⟩ 𝑡[𝐴1] . . . ⟨COL⟩ 𝐴𝑛 ⟨VAL⟩ 𝑡[𝐴𝑛],

where ⟨COL⟩ and ⟨VAL⟩ are special tokens, denoting the start of an attribute and a value, respectively (see Figure 4). This serialization is fed as input to a pre-trained language model emb(·) to compute a 𝑑-dimensional embedding for each 𝐴-attribute value, denoted by emb(𝑡[𝐴]) ∈ R𝑑. Besides, we average out the embedding vectors for all 𝑡[𝐴] to get a context representation of tuple 𝑡, i.e., emb(𝑡) = (1/𝑛) Σ𝑛𝑖=1 emb(𝑡[𝐴𝑖]). Finally, for each 𝐴-attribute value of 𝑡, we get the context-aware embedding of 𝑡[𝐴], denoted by 𝐸𝑡[𝐴] ∈ R2𝑑:

𝐸𝑡[𝐴] = [emb(𝑡[𝐴]); emb(𝑡)],

where [·;·] denotes vector concatenation. In this way, 𝐸𝑡[𝐴] embeds not only the 𝐴-attribute value, but also the contextual information from other attribute values, to allow comprehensive ranking.

Similarly, a schema embedding is computed for each attribute 𝐴: 𝐸𝐴 = emb(𝐴), by feeding the attribute name, e.g., status, to the model.
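A minimal Python sketch of the serialization and the context-aware embedding 𝐸𝑡[𝐴] = [emb(𝑡[𝐴]); emb(𝑡)] is given below; the stand-in emb(·) replaces the pre-trained language model and embeds each value independently, whereas GATE feeds the whole serialization to ELMo/Bert to obtain contextualized value embeddings.

```python
import zlib
import numpy as np

D = 16   # dimension of the stand-in embeddings (a real system would use ELMo/Bert)

def emb(text: str) -> np.ndarray:
    """Deterministic stand-in for a pre-trained language-model embedding."""
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    return rng.standard_normal(D)

def serialize(t: dict) -> str:
    """serialize(t) = <COL> A1 <VAL> t[A1] ... <COL> An <VAL> t[An]."""
    return " ".join(f"<COL> {a} <VAL> {v}" for a, v in t.items())

def context_aware_embeddings(t: dict) -> dict:
    """E_t[A] = [emb(t[A]); emb(t)], where emb(t) averages the value embeddings."""
    value_embs = {a: emb(str(v)) for a, v in t.items()}
    ctx = np.mean(list(value_embs.values()), axis=0)   # context representation emb(t)
    return {a: np.concatenate([e, ctx]) for a, e in value_embs.items()}

t2 = {"FN": "Mary", "LN": "Taylor", "status": "married", "job": "journalist"}
print(serialize(t2))
print(context_aware_embeddings(t2)["status"].shape)    # (2 * D,)
```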
Chronological encoding with adaptive margins. While pre-trained embeddings are widely adopted to capture semantics, they are not trained to organize temporal orders. Thus we propose chronological encoding to re-organize the embeddings to preserve timeliness. The idea is to use the schema embedding as the target and make the embedding of a more current value closer to the target; moreover, instead of ranking in fixed margins, embeddings are ordered adaptively.

Specifically, given the embedding of the 𝐴-attribute of 𝑡, i.e., 𝐸𝑡[𝐴], we encode it using a context encoder ENCcxtx(·) as follows:

𝜙𝑡[𝐴] = ENCcxtx(𝐸𝑡[𝐴]) = 𝜎(W2 ∗ 𝜎(W1 ∗ 𝐸𝑡[𝐴])),

where W1 and W2 are learnable parameters of the encoder, and 𝜎 is the non-linear sigmoid activation function given by 𝜎(𝑥) = 1/(1 + 𝑒^(−𝑥)). Similarly, the target vector for attribute 𝐴 is encoded as 𝜙𝐴 = ENCattr(𝐸𝐴), where ENCattr(·) denotes the schema encoder.

To train the encoders with ordered embeddings and adaptive margins, we adopt an attribute-centric adaptive margin-based triplet loss. Given ⪯𝐴 in 𝐷𝑡, the loss on 𝐴 is formulated as follows:

loss(𝐴) = Σ_{(𝑡1,𝑡2) ∈ ⪯𝐴} max{ −tanh(⟨𝜙𝑡2[𝐴], 𝜙𝐴⟩) + 𝛾𝑡1,𝑡2 + tanh(⟨𝜙𝑡1[𝐴], 𝜙𝐴⟩), 0 },

where ⟨·,·⟩ is the inner product and 𝛾𝑡1,𝑡2 is the adaptive margin between the two tuples; we set 𝛾𝑡1,𝑡2 to be 1 + cos(𝜙𝑡1[𝐴], 𝜙𝑡2[𝐴]) in practice, which characterizes the co-occurrence of 𝑡1[𝐴] and 𝑡2[𝐴].

Intuitively, by minimizing the loss, for each training instance 𝑡1 ⪯𝐴 𝑡2, (a) we make the encoded vector of the more current value 𝑡2[𝐴] closer to the target 𝜙𝐴 (the first term), and (b) we ensure an adaptive margin 𝛾𝑡1,𝑡2 between the two tuples (the second term). In other words, the attribute values in the encoded space are not only arranged chronologically by their "distances" to 𝜙𝐴, from which temporal orders can be easily derived; their margins are also adaptively determined, to reflect the semantics of timeliness ranking.

Figure 4: The Network Architecture of the Creator
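A hedged PyTorch sketch of the encoders and the attribute-centric adaptive-margin triplet loss follows; the use of bias-free nn.Linear layers and the layer sizes are our assumptions, while the loss mirrors the formula above with 𝛾𝑡1,𝑡2 = 1 + cos(𝜙𝑡1[𝐴], 𝜙𝑡2[𝐴]).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """phi = sigma(W2 * sigma(W1 * E)); the same form is used for the context
    encoder ENC_cxtx (input E_t[A], dimension 2d) and the schema encoder
    ENC_attr (input E_A, dimension d)."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)   # W1 (bias-free: our assumption)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)  # W2

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.w2(torch.sigmoid(self.w1(e))))

def ranking_loss(phi_t1: torch.Tensor, phi_t2: torch.Tensor, phi_attr: torch.Tensor) -> torch.Tensor:
    """Adaptive-margin triplet loss over a batch of training pairs t1 ⪯_A t2:
    max{ -tanh(<phi_t2[A], phi_A>) + gamma_{t1,t2} + tanh(<phi_t1[A], phi_A>), 0 },
    with gamma_{t1,t2} = 1 + cos(phi_t1[A], phi_t2[A])."""
    gamma = 1.0 + F.cosine_similarity(phi_t1, phi_t2, dim=-1)
    close_t2 = torch.tanh((phi_t2 * phi_attr).sum(dim=-1))   # <phi_t2[A], phi_A>
    close_t1 = torch.tanh((phi_t1 * phi_attr).sum(dim=-1))   # <phi_t1[A], phi_A>
    return torch.clamp(-close_t2 + gamma + close_t1, min=0.0).mean()
```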
Model training and instance extension. In each round, the creator receives augmented training data 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid), based on which it (a) extends the temporal instance 𝐷𝑡 with 𝐷aug and (b) incrementally trains Mrank via back-propagation.

In the first round, 𝐷aug is initialized to be the temporal orders constructed by applying those CCs in Σ without timeliness comparison in their preconditions [33]. In the following rounds, 𝐷aug is constructed by the critic, based on the result of chasing with CCs.
Specifically, depending on flag fvalid in 𝐷aug, we have two cases:

(1) If fvalid is true, the result of chasing is valid and Mrank is incrementally trained based on the orders in 𝐷aug (see below). Moreover, the temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) is extended with the ranked pairs in 𝐷aug, i.e., for each (𝑡1, 𝑡2) in ⪯′𝐴 of 𝐷aug, (𝑡1, 𝑡2) is added to ⪯𝐴. In this case, we say that (𝑡1, 𝑡2) becomes stable. Once a temporal order becomes stable, it will not be removed from 𝐷𝑡.

(2) If fvalid is false, the result of chasing is invalid, i.e., there is an attribute 𝐴 such that conflicting orders (𝑡1, 𝑡2) and (𝑡2, 𝑡1) are both in ⪯′𝐴 of 𝐷aug, i.e., {(𝑡1, 𝑡2), (𝑡2, 𝑡1)} ⊆ ⪯′𝐴, and either 𝑡1 ≺𝐴 𝑡2 or 𝑡2 ≺𝐴 𝑡1. In this case, we decide which of (𝑡1, 𝑡2) and (𝑡2, 𝑡1) is added to ⪯𝐴 using Mrank, taking the one with the higher confidence (and possibly user inspection). Assume w.l.o.g. that (𝑡1, 𝑡2) is added to ⪯𝐴 (i.e., it becomes stable). Then, the creator fine-tunes Mrank so that 𝑡1 and 𝑡2 are better separated in the encoded space (see [1] for details).
Incremental training. Incremental training of Mrank might lead to the catastrophic forgetting issue [39], i.e., the model might forget some temporal orders learned in prior rounds. To overcome this, we adopt a simple strategy to retain prior knowledge: in each round, 𝐷𝑡 is augmented with 𝐷aug and Mrank is incrementally trained on 𝐷𝑡 from the last checkpoint. In this way, the model is able to learn from previous training instances and avoid catastrophic forgetting.
Monotonicity. One can verify that the number of stable temporal orders in 𝐷𝑡 is (strictly) monotonically increasing as more rounds of GATE are performed (see a detailed lemma in [1]). The termination of GATE (Theorem 1) partly depends on this monotonicity.
Confidence. Given ⪯𝐴, a tuple pair (𝑡1, 𝑡2) and a closeness function 𝑓𝐿, Mrank predicts (𝑡1, 𝑡2) ∈ ⪯𝐴 if 𝑓𝐿(𝜙𝑡2[𝐴], 𝜙𝐴) > 𝑓𝐿(𝜙𝑡1[𝐴], 𝜙𝐴); its confidence indicates how likely 𝑡1 ⪯𝐴 𝑡2 holds, i.e., we have

conf(𝑡1 ⪯𝐴 𝑡2) = 𝜎(𝑓𝐿(𝜙𝑡2[𝐴], 𝜙𝐴) − 𝑓𝐿(𝜙𝑡1[𝐴], 𝜙𝐴)),

and set conf(𝑡2 ⪯𝐴 𝑡1) to 0. Thus, given a confidence threshold 𝛿 > 0, we will not predict both 𝑡1 ⪯𝐴 𝑡2 and 𝑡2 ⪯𝐴 𝑡1 as confident orders.

In practice, we use the negative Euclidean distance −𝐿2(𝜙𝑡[𝐴], 𝜙𝐴) to measure the "closeness" between 𝜙𝑡[𝐴] and 𝜙𝐴, where the one with a larger closeness value is more current. The distance gap between 𝜙𝑡1[𝐴] and 𝜙𝑡2[𝐴] quantifies the confidence: the larger the gap, the larger the confidence, which ranges from 0 to 1.
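Continuing the sketch above, the prediction and its confidence can be computed with the negative Euclidean distance as the closeness function 𝑓𝐿; the default threshold value below is arbitrary and only for illustration.

```python
import torch

def closeness(phi_val: torch.Tensor, phi_attr: torch.Tensor) -> torch.Tensor:
    """f_L: negative Euclidean distance to the target phi_A; larger means more current."""
    return -torch.linalg.norm(phi_val - phi_attr, dim=-1)

def predict_order(phi_t1, phi_t2, phi_attr, delta: float = 0.55):
    """Predict t1 ⪯_A t2 when phi_t2[A] is closer to phi_A than phi_t1[A] is, with
    conf(t1 ⪯_A t2) = sigmoid(f_L(phi_t2[A], phi_A) - f_L(phi_t1[A], phi_A));
    only pairs with conf >= delta are kept as candidate pairs."""
    conf = torch.sigmoid(closeness(phi_t2, phi_attr) - closeness(phi_t1, phi_attr))
    return conf >= delta, conf
```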
Example 3: Consider the example in Figure 4, where we focus on attribute status. After creating the context-aware embeddings based on a pre-trained model, the creator chronologically encodes a target vector 𝜙status and value vectors for all values, so that they are arranged by their distances to 𝜙status. To illustrate, we also label the unknown timestamp of each value vector in the figure (e.g., 𝑇𝑒(Married) = 2018). Since 𝜙Married is the closest to 𝜙status, it is predicted to be the latest status value and new ranked pairs are constructed accordingly for ⪯status, as augmented training data in the next round. □

Remark. Our creator learns temporal orders by utilizing context-aware embedding, chronological encoding and attribute-centric adaptive ranking. However, it does not explicitly take into account some temporal properties, such as transitivity. This motivates us to use the critic to deduce and justify the orders in the next section.
5 CRITIC
In this section, we develop the critic under GATE for justifying and deducing temporal orders. Taking a temporal instance 𝐷𝑡, the temporal orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛) predicted by the creator and a set Σ of mined CCs as input, the critic (a) deduces more ranked pairs (⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛) by applying CCs via the chase, and (b) constructs the augmented training data 𝐷aug and feeds 𝐷aug back to the creator.
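As a minimal illustration (not the optimized incremental chase algorithm developed in this section), chasing with CCs can be implemented as a fixpoint computation that repeatedly applies rules until no new ranked pair is added; by the Church-Rosser property, the outcome does not depend on the order of rule application. The rule representation below is ours.

```python
def chase(tuples, orders, ccs):
    """tuples: tid -> {attribute: value} (tuples of one entity);
    orders: attribute A -> set of tid pairs (t1, t2) meaning t1 ⪯_A t2;
    ccs: rules of the form rule(tuples, orders) -> set of new (A, t1, t2) facts.
    Applies the rules to a fixpoint."""
    changed = True
    while changed:
        changed = False
        for rule in ccs:
            for attr, t1, t2 in rule(tuples, orders):
                if (t1, t2) not in orders.setdefault(attr, set()):
                    orders[attr].add((t1, t2))
                    changed = True
    return orders

# Example rule: transitivity (phi_5), closing each ⪯_A under composition.
def transitivity(tuples, orders):
    new = set()
    for attr, pairs in orders.items():
        for (a, b) in pairs:
            for (c, d) in pairs:
                if b == c and (a, d) not in pairs:
                    new.add((attr, a, d))
    return new
```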