Learning and Deducing Temporal Orders
Wenfei Fan
Shenzhen Institute of Computing Sciences, China; University of Edinburgh, United Kingdom; Beihang University, China

Resul Tugay
University of Edinburgh, United Kingdom
[email protected]

Yaoshu Wang, Min Xie∗
Shenzhen Institute of Computing Sciences, China
[email protected], [email protected]

Muhammad Asif Ali
King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
[email protected]

∗Corresponding author
ABSTRACT
This paper studies how to determine temporal orders on attribute values in a set of tuples that pertain to the same entity, in the absence of complete timestamps. We propose a creator-critic framework to learn and deduce temporal orders by combining deep learning and rule-based deduction, referred to as GATE (Get the lATEst). The creator of GATE trains a ranking model via deep learning, to learn temporal orders and rank attribute values based on correlations among the attributes. The critic then validates the temporal orders learned and deduces more ranked pairs by chasing the data with currency constraints; it also provides augmented training data as feedback for the creator to improve the ranking in the next round.
The process proceeds until the temporal order obtained becomes stable. Using real-life and synthetic datasets, we show that GATE is able to determine temporal orders with F-measure above 80%, improving deep learning by 7.8% and rule-based methods by 34.4%.
PVLDB Reference Format:
Wenfei Fan, Resul Tugay, Yaoshu Wang, Min Xie, and Muhammad Asif Ali.
Learning and Deducing Temporal Orders. PVLDB, 16(8): 1944 - 1957, 2023.
doi:10.14778/3594512.3594524
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/resultugay/GATE.
1 INTRODUCTION
Real-life data keeps changing. As reported by Royal Mail, on average, "9590 households move, 1496 people marry, 810 people divorce, 2011 people retire and 1500 people die" each day in the UK [60]. It is estimated that inaccurate customer data costs organizations 6% of their annual revenues [60]. Outdated data incurs damage not only to Royal Mail. When the data at a search engine is out of date, a restaurant search may return a business that had closed three years ago. When the data about the condition of infrastructure assets is obsolete, it may delay the maintenance of equipment and cause outages. Moreover, data-driven decisions based on outdated data can be worse than making decisions with no data [49]. Indeed, "as a healthcare, retail, or financial services business you cannot afford to
make decisions based on yesterday’s data” [25]. Unfortunately, “82%
of companies are making decisions based on stale information” [8].
These highlight the need for determining the currency of data, i.e., how up-to-date the information is. This is, however, highly nontrivial. Consider a set of tuples that pertain to the same entity.
Their attribute values may become obsolete and inaccurate over time. Worse yet, only partial reliable timestamps might be available, where a timestamp is reliable if it is precise, correct and, moreover, it indicates that at the time, the values are correct and up-to-date.
Apart from mechanical reasons (malicious attacks or hardware failures), the logical reasons below are the major temporal issues in data quality [5], and account for the absence of reliable timestamps.
(1) Missing timestamps. Timestamps may simply not be recorded, e.g., in an e-health database [40], only 16 out of 26 relations are timestamped. Even when a relation has timestamps, it may not be complete, e.g., 25.36% missing in [53], up to 82.28% in [40].
(2) Imprecise timestamps. Timestamps may be too coarse, leading to unreliable ordering. An e-form may be submitted multiple times to an OA system by employees during a day. If the forms are recorded in datestamps, it is not clear which form (all on the same day) is the latest. Similar problems are often encountered in hospital data, where only dates are recorded [5]. Another example concerns inconsistent granularity of timestamps (e.g., minutes vs. days [5, 40]). If two values have timestamps "12-8-2021" and "12-8-2021 20:41", it is not clear which one is more up-to-date. As reported in [53], 90.41% of appointment records have imprecise timestamps.
(3) Incorrect timestamps. Many factors can lead to incorrect timestamps. Taking medical data [5] as an example, an X-ray machine has many asynchronous modules, each of which has a local clock and a local buffer. There can be a discrepancy between when a value is actually updated and when it is recorded, since the value is first queued in the buffer before it is recorded. Moreover, 36.96% of appointments have the overlapping issue (i.e., the next appointment in a particular room appears to have started before the current one has ended) [53], indicating incorrect timestamps.
As remarked earlier by Royal Mail [60], customer data changes frequently. To prevent the data from being outdated, the sales department may contact the customers and get regular updates on some critical information (e.g., email and phone). Due to the impatience of customers and the high cost (e.g., man power) of contact, the rest (e.g., jobs, marital status) may not be frequently updated and may gradually become obsolete, no longer having a reliable timestamp. Add to this the complication that data values are often copied or imported from other sources [21, 22], and even in the same database, values may come from multiple relations (e.g., by join), where no uniform scheme of timestamps is granted, e.g., some values are more frequently timestamped than the others [40]. This calls for an attribute-level time-stamping scheme (i.e., each value has a timestamp). It subsumes the tuple-level scheme (i.e., all values in the same tuple have the same timestamp) as a special case.

          FN    LN         sex  address       status    job           kids  SZ
t1, e1:   Mary  Goldsmith  F    19 xin st     single    n/a           -     5
t2, e1:   Mary  Taylor     F    6 gold plaza  married   journalist    1     6
t3, e1:   Mary  Taylor     F    19 mall st    espoused  assoc editor  2     7
t4, e1:   Mary  Taylor     F    7 ave         divorced  chief editor  2     7
t5, e1:   Mary  Taylor     F    7 ave         detached  chief editor  2     7
t6, e1:   Mary  Goldsmith  F    6 const. ave  wed       producer      3     7

Figure 1: Customer records dataset
No matter how desirable, the percentage of reliable timestamps is far lower than expected. How can we determine the temporal orders on the attribute values, i.e., for tuples 𝑡1 and 𝑡2, whether the value in the 𝐴-attribute of 𝑡1 is more current than the value in the 𝐴-attribute of 𝑡2, denoted by 𝑡2 ≺𝐴 𝑡1, in the absence of complete timestamps?
Example 1: Consider customer records 𝑡1-𝑡6 shown in Figure 1, which have been identified to refer to the same person Mary. Each tuple 𝑡𝑖 has attributes FN (first name), LN (last name), sex, address, marital status, job, kids and SZ (shoe size). Some attribute values of these tuples have become stale since Mary's data changes over the years, e.g., her job, address and last name have changed four times, five times and twice, respectively. Only some attribute values might be associated with reliable timestamps, e.g., the timestamps of 𝑡5[job] and 𝑡6[job] are 2016 and 2019 (not shown), respectively, indicating that at those times, the values were up-to-date. In the absence of complete timestamps, it is hard to know whether 𝑡2 ≺LN 𝑡6, i.e., whether the value of 𝑡2[LN] is more up-to-date than 𝑡6[LN]. Moreover, what is Mary's current job title, e.g., does 𝑡𝑖 ≺job 𝑡6 hold for 𝑖 ∈ [1, 4]? □

One may want to approach this problem by training a ranking model that sorts objects according to their degrees of relevance or importance [51]. The state-of-the-art systems in this regard employ deep learning [7, 36, 57, 63] or reinforcement learning [37, 68], and have been used in search engines [11] and machine translation [24, 70]. By means of ranking models one can learn temporal orders and decide whether 𝑡1 ≺𝐴 𝑡2 for all tuples 𝑡1, 𝑡2 and attribute 𝐴. However, it is hard to justify whether the ranking conforms to the temporal orders in the real world. For data-driven decision making, we need to ensure that the learned orders are reliable. Moreover, these approaches cannot explain the ranking of objects that follow a complex interlinked structure (e.g., address in Figure 1).
Another approach is to employ logic rules, e.g., currency constraints (CCs) [19, 23, 28, 30, 44, 48, 66]. These rules help us deduce temporal orders. Taking Example 1 as an example, the shoe size of the same person typically increases monotonically (before 20 years old), and the address of a person may be associated with the marital status. As will be seen in Section 2, using CCs one can deduce that 𝑡2 ≺SZ 𝑡3 and 𝑡1 ≺address 𝑡2. However, it is hard to find enough rules to deduce relative orders on each and every pair of values. For instance, we cannot find rules to decide whether 𝑡2 ≺LN 𝑡6. When testing existing rule-based methods on a dataset with 5% initial timestamps, even the best of them can only deduce 16.3% of the temporal orders (Section 6), leaving the remaining 78.7% undecided.
Besides, it is hard to generalize rules to handle lexically different but semantically similar values, since tuples might be extracted from different sources (e.g., status in Figure 1: married vs. wed).
Neither deep/ML learning nor logic rules work well when used separately. A natural question is whether it is possible to combine ML models and logic rules in a uniform framework, such that we can learn temporal orders, and use the rules to validate the ranking and improve the learning. How well can this framework improve the accuracy of the ML models and logic rules when used alone?
Contributions & organization. This paper makes a first effort to answer these questions, all in the affirmative.
(1) Temporal orders (Section 2). We define the notion of temporal orders 𝑡1 ≺𝐴 𝑡2 and 𝑡1 ⪯𝐴 𝑡2 on attributes, and formulate the problem for determining temporal orders. We also review the currency constraints (CCs) of [30], for deducing temporal orders by our framework. We show that CCs can specify interesting temporal properties, e.g., the monotonicity, comonotonicity and transitivity.
(2) GATE (Section 3). We propose GATE (Get the lATEst), a creator-critic framework to determine temporal orders by combining deep learning and logic deduction. GATE iteratively invokes a creator to rank the temporal orders on attribute values, followed by a critic that validates the ranking of the creator and deduces more ranked pairs via discovered CCs. The critic also produces augmented training data for the creator to improve its ranking in the next round.
This process proceeds, with the creator and critic mutually enhancing each other, until the temporal order cannot be further improved.
(3) Creator (Section 4). We propose a deep learning model underlying the creator of GATE, to learn temporal orders on attribute values. It departs from previous ranking models in that it learns the temporal orders based on contexts (i.e., attribute correlations) and calculates the confidence of the ranking, by employing chronological embeddings and adaptive pairwise ranking strategies.
(4) Critic (Section 5). The critic complements GATE with discovered rules (CCs). We show how it justifies the ranked pairs learned by the creator, and (incrementally) deduces latent temporal orders with the chase [62] using CCs. The chase has the Church-Rosser property (cf. [2]), i.e., it is guaranteed to converge to the same result no matter in what order the CCs are applied. Moreover, the critic augments the training data for the creator to improve its model in the next round.
(5) Experimental study (Section 6). Using real-life and synthetic data, we find that (a) the F-measure of GATE on dataset Career [41] is 0.866, versus 0.41 and 0.39 by rule-based UncertainRule [44] and Improve3C [18] (resp. 0.54 and 0.53 by ML-based RANKBert [56] and DittoRank [47]). (b) On average, GATE is 34.4% and 7.8% more accurate than the critic and the creator, respectively, verifying the effectiveness of combining deep learning and logic rules. (c) GATE is feasible in practice; it takes only 7 rounds to terminate on a real-life dataset of 1,983,698 tuples, on a single machine.
Related work. We categorize the related work as follows.
Rule-based methods. Currency constraints (CCs) were first studied in [29, 30] by employing partial currency orders and later extended in [27, 28] for conflict resolution, by considering both CCs and conditional functional dependencies (CFDs) [26]. A class of currency repairing rules (CRRs) was proposed in [43], which combines logic rules and statistics. A two-step approach was developed in [17] to determine the currency of dynamic data, by means of a topological graph. A class of uncertain currency rules was supported by UncertainRule [44] that considers both temporal orders and data certainty. Improve3C [18] and Imp3C [19] are 4-step data cleaning frameworks, including data consistency, completeness and currency, which use a metric to repair noisy data using CFDs [26] and currency orders. There has also been work [23] on parallel incremental updating algorithms by employing traditional currency rules.
This work differs from the prior work. To the best of our knowledge, we make the first effort to combine deep learning and rules for timeliness, for the two to enhance each other, while the prior work at most extends rules with statistics. We propose a model for inferring temporal orders and use logic deduction to derive more.
ML models. There have also been efforts on inferring temporal orders by ML models [6, 10, 35, 54, 55, 64]. A related topic is to incorporate temporal information into knowledge graphs, e.g., for temporal link predictions [20, 61] and reasoning [16, 65, 69]. These methods often embed temporal information in ML models, for inference varying over time. Learning temporal orders has been modeled as a learning-to-rank problem (pointwise, pairwise and listwise), where global orders are learned for currency attributes. Temporal problems have also been studied in, e.g., search and recommendation [31, 67], IR [50] and NLP [47, 56]. In particular, ranking in NLP (e.g., in question-answering tasks) can be handled by pre-trained language models with cross-entropy loss or by a ranking function with a binary classifier as the comparison operator; we adopted a baseline from each kind (i.e., RANKBert and DittoRank) in Section 6.
In contrast, we propose a creator-critic framework that not only learns temporal orders via deep learning, but also adopts rules to deduce new ones and help ML models learn better. The prior models can be embedded into our framework.
Entity resolution (ER). Temporal clustering methods were proposed in [46] based on a concept of time decay, to improve ER with timestamps. [12] dynamically assesses and adjusts similarities among records based on their attributes and temporal differences. [42] estimates the probability that a value changes, and shows how to link tuples in a correct time period. To cope with incorrect timestamps, [66] adopts matching dependencies (MDs) and currency constraints to capture the attributes that were changed over time.
This work focuses on deducing temporal orders of each entity, while the prior work mainly adopts temporal information for ER.
2 DETERMINING TEMPORAL ORDERS
In this section we first formulate temporal orders and the problem for determining temporal orders (Section 2.1). We then review currency constraints (CCs) of [30] and show that such rules are able to express monotonicity, comonotonicity and transitivity (Section 2.2).
2.1 The Problem
Consider a relation schema 𝑅 = (EID, 𝐴1, . . . , 𝐴𝑛), where 𝐴𝑖 is an attribute (𝑖 ∈ [1, 𝑛]), and EID is an entity id as introduced by Codd [14]. For a tuple 𝑡 of schema 𝑅, we denote by 𝑡[𝐴] the value in the 𝐴-attribute of 𝑡, and by 𝑡[EID] the entity represented by tuple 𝑡. We assume that each tuple is associated with a tuple id, denoted by tid. We refer to a relation 𝐷 of 𝑅 as a normal instance 𝐷 of 𝑅.
Temporal orders and temporal instances. An entity instance is (𝐷𝑒, 𝑇𝑒), where (a) 𝐷𝑒 is a normal instance of schema 𝑅 such that all tuples 𝑡1 and 𝑡2 in 𝐷𝑒 refer to the same real-life entity 𝑒 and hence 𝑡1[EID] = 𝑡2[EID]; and (b) 𝑇𝑒 is a partial function that associates a timestamp 𝑇𝑒(𝑡[𝐴]) with the 𝐴-attribute of a tuple 𝑡 in 𝐷𝑒. We refer to (𝐷𝑒, 𝑇𝑒) as an entity instance pertaining to 𝑒.
Here the timestamp indicates that at the time 𝑇𝑒(𝑡[𝐴]), the 𝐴-attribute value of tuple 𝑡 is correct and up-to-date; it does not necessarily refer to the time when 𝑡[𝐴] was created or last updated. If 𝑇𝑒(𝑡[𝐴]) is undefined, a reliable timestamp is not available for 𝑡[𝐴].
Intuitively, an entity instance extends a normal instance with available timestamps. Its tuples may be extracted from a variety of data sources, and are identified to refer to the same entity 𝑒 via entity resolution. In the same tuple 𝑡, 𝑡[𝐴] and 𝑡[𝐵] may bear different timestamps (or even no timestamp) for different 𝐴 and 𝐵. Note that we do not assume a timestamp for the entire tuple, since we often find only parts of a tuple to be correct and up-to-date.
Temporal orders. A temporal order on attribute 𝐴 of 𝐷𝑒 is a partial order ⪯𝐴 such that for all tuples 𝑡1 and 𝑡2 in 𝐷𝑒, 𝑡2 ⪯𝐴 𝑡1 if the value in 𝑡1[𝐴] is at least as current as 𝑡2[𝐴]. We also use a strict partial order 𝑡2 ≺𝐴 𝑡1 if 𝑡1[𝐴] is more current than 𝑡2[𝐴]. To simplify the discussion we focus on ⪯𝐴 in the sequel; ≺𝐴 is handled analogously.
In particular, if 𝑇𝑒(𝑡1[𝐴]) and 𝑇𝑒(𝑡2[𝐴]) are both defined and if 𝑇𝑒(𝑡2[𝐴]) ≤ 𝑇𝑒(𝑡1[𝐴]), i.e., when timestamp 𝑇𝑒(𝑡1[𝐴]) is no earlier than 𝑇𝑒(𝑡2[𝐴]), then 𝑡2 ⪯𝐴 𝑡1, i.e., 𝑡1[𝐴] is confirmed at a later timestamp and is thus considered at least as current as 𝑡2[𝐴].
Temporal order ⪯𝐴 is represented as a set of tuple pairs such that (𝑡2[tid], 𝑡1[tid]) ∈ ⪯𝐴 iff 𝑡2 ⪯𝐴 𝑡1. We write (𝑡2[tid], 𝑡1[tid]) as (𝑡2, 𝑡1) if it is clear in the context. Note that the same value may bear different timeliness in different tuples, e.g., Mary's marital status changed from married (𝑡2) to divorced (𝑡4) to married (𝑡7, not shown in Figure 1). While 𝑡2[status] = 𝑡7[status], 𝑡2 ≺status 𝑡7. Here 𝑡2 ≺status 𝑡7 ranks the timeliness of the status-attributes of tuples 𝑡2 and 𝑡7, not values (married vs. married) detached from the tuples.
We say that a temporal order ⪯1𝐴 extends ⪯2𝐴, written as ⪯2𝐴 ⊆ ⪯1𝐴, if for all tuples 𝑡1, 𝑡2 in 𝐷𝑒, whenever 𝑡2 ⪯2𝐴 𝑡1 is defined, so is 𝑡2 ⪯1𝐴 𝑡1. That is, ⪯1𝐴 includes all tuple pairs in ⪯2𝐴 and possibly more.
Temporal instances. A temporal instance 𝐷𝑡 of 𝑅 is given as (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇), where each ⪯𝐴𝑖 is a temporal order on 𝐴𝑖 (𝑖 ∈ [1, 𝑛]), 𝐷 = ∪𝑖∈[𝑘] 𝐷𝑒𝑖, 𝑇 = ∪𝑖∈[𝑘] 𝑇𝑒𝑖, and for all 𝑖 ∈ [1, 𝑘], (𝐷𝑒𝑖, 𝑇𝑒𝑖) is an entity instance of 𝑅. Here tuples 𝑡1 and 𝑡2 in 𝐷 are compatible under ⪯𝐴 if they pertain to the same entity, i.e., 𝑡1[EID] = 𝑡2[EID].

Intuitively, 𝐷𝑡 is a collection of entity instances, such that each (𝐷𝑒𝑖, 𝑇𝑒𝑖) pertains to the same entity 𝑒𝑖. We do not rank the currency of tuples if they refer to different entities. A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) is said to extend another temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) if for all 𝑖 ∈ [1, 𝑛], ⪯𝐴𝑖 extends ⪯′𝐴𝑖.

Problem statement. We study the problem for determining the temporal orders of a temporal instance, stated as follows.
◦ Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇).
◦ Output: An extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) such that for all 𝑖 ∈ [1, 𝑛], (a) ⪯′𝐴𝑖 extends ⪯𝐴𝑖, and (b) ⪯′𝐴𝑖 is a total order on all compatible 𝑡1 and 𝑡2 with 𝑡1[EID] = 𝑡2[EID].
Intuitively, our goal is to extend ⪯𝐴𝑖 such that for all tuples 𝑡1 and 𝑡2 in 𝐷, if 𝑡1[EID] = 𝑡2[EID], we can decide which of 𝑡1[𝐴𝑖] and 𝑡2[𝐴𝑖] is more up-to-date. As a consequence, we can deduce the latest value for all attributes. Note that we define a total order on each attribute, not a global order on tuples. The total orders on different attributes can be different. This said, we can use the temporal orders learned on one attribute to help the deduction on other attributes, via correlation expressed as currency constraints (see below).
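To make the data model concrete, the following is a minimal Python sketch (ours, not part of the GATE artifact) of a temporal instance whose per-attribute orders are stored as sets of tuple-id pairs, together with the seeding of each ⪯𝐴 from the available reliable timestamps as described above; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]          # (tid of t2, tid of t1), meaning t2 ⪯_A t1

@dataclass
class TemporalInstance:
    tuples: Dict[str, Dict[str, object]]              # tid -> {attribute: value}, including "EID"
    timestamps: Dict[Tuple[str, str], float]          # (tid, attribute) -> reliable timestamp T_e(t[A])
    orders: Dict[str, Set[Pair]] = field(default_factory=dict)   # attribute -> ⪯_A as tid pairs

    def add_order(self, attr: str, t2: str, t1: str) -> None:
        """Record t2 ⪯_attr t1."""
        self.orders.setdefault(attr, set()).add((t2, t1))

def init_orders_from_timestamps(inst: TemporalInstance) -> None:
    """Seed each ⪯_A with the pairs implied by reliable timestamps: if T_e(t1[A]) and
    T_e(t2[A]) are both defined, T_e(t2[A]) <= T_e(t1[A]), and the two tuples refer to
    the same entity, then t2 ⪯_A t1."""
    for (tid1, attr), ts1 in inst.timestamps.items():
        for (tid2, attr2), ts2 in inst.timestamps.items():
            if attr != attr2 or tid1 == tid2:
                continue
            if inst.tuples[tid1]["EID"] == inst.tuples[tid2]["EID"] and ts2 <= ts1:
                inst.add_order(attr, tid2, tid1)
```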
2.2 Currency Constraints
We next review the class of currency constraints proposed in [30].
The predicates over a relation schema 𝑅 are defined as:
𝑝 ::= 𝑡[𝐴] ⊕ 𝑐 | 𝑡1[𝐴] ⊕ 𝑡2[𝐴] | 𝑡1 ⪯𝐴 𝑡2,
where 𝑡, 𝑡1, 𝑡2 are tuple variables denoting tuples of 𝑅, 𝐴 is an attribute of 𝑅, 𝑐 is a constant, and ⊕ is an operator from {=, ≠, >, ≥, <, ≤}; 𝑡[𝐴] ⊕ 𝑐 and 𝑡1[𝐴] ⊕ 𝑡2[𝐴] are defined on attribute values, while 𝑡1 ⪯𝐴 𝑡2 compares the timeliness of 𝑡1[𝐴] and 𝑡2[𝐴]. In particular, 𝑡1[EID] = 𝑡2[EID] says that 𝑡1 and 𝑡2 refer to the same entity.
A currency constraint (CC) over schema 𝑅 is defined as follows:
𝜑 = 𝑋 → 𝑝0,
where 𝑋 is a conjunction of predicates over 𝑅 with tuple variables 𝑡1, . . . , 𝑡𝑚, and 𝑝0 has the form 𝑡𝑢 ⪯𝐴𝑖 𝑡𝑣 for 𝑢, 𝑣 ∈ [1, 𝑚]. We refer to 𝑋 as the precondition of 𝜑, and 𝑝0 as the consequence of 𝜑.
As defined in [30], a CC can be equivalently expressed as a universal first-order logic sentence of the following form:
𝜑 = ∀𝑡1, . . . , 𝑡𝑚 (⋀𝑗∈[1,𝑚] (𝑡1[EID] = 𝑡𝑗[EID]) ∧ 𝑋 → 𝑡𝑢 ⪯𝐴 𝑡𝑣).
Semantics. Currency constraints are defined over temporal instances 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) of 𝑅. A valuation of the tuple variables of a CC 𝜑 in 𝐷𝑡, or simply a valuation of 𝜑, is a mapping ℎ that instantiates variables 𝑡1, . . . , 𝑡𝑚 with tuples in 𝐷 that refer to the same real-world entity, as required by 𝑡1[EID] = 𝑡𝑗[EID].

A valuation ℎ satisfies a predicate 𝑝 over 𝑅, written as ℎ |= 𝑝, if the following is satisfied: (1) if 𝑝 is 𝑡[𝐴] ⊕ 𝑐 or 𝑡1[𝐴] ⊕ 𝑡2[𝐴], then it is interpreted as in tuple relational calculus, following the standard semantics of first-order logic [2]; and (2) if 𝑝 is 𝑡1 ⪯𝐴 𝑡2, then 𝑡2[𝐴] is at least as current as 𝑡1[𝐴], i.e., (𝑡1, 𝑡2) is in ⪯𝐴.

For a conjunction 𝑋 of predicates, we write ℎ |= 𝑋 if ℎ |= 𝑝 for all 𝑝 in 𝑋. A temporal instance 𝐷𝑡 satisfies CC 𝜑, denoted by 𝐷𝑡 |= 𝜑, if for all valuations ℎ of 𝑋 → 𝑝0 in 𝐷𝑡, if ℎ |= 𝑋 then ℎ |= 𝑝0.

Properties. Currency constraints are able to specify interesting temporal properties. Below we exemplify some properties.
Monotonicity. A temporal order ⪯𝐴 over relation schema 𝑅 is monotonic if for any tuples 𝑡1 and 𝑡2 of 𝑅 that refer to the same entity, if 𝑡1[𝐴] ≤ 𝑡2[𝐴] then 𝑡1 ⪯𝐴 𝑡2. For instance, consider the SZ (shoe size) attribute of the customer relation of Example 1. Then ⪯SZ is monotonic. It can be expressed as the following CC:

𝜑1 = 𝑡1[SZ] ≤ 𝑡2[SZ] → 𝑡1 ⪯SZ 𝑡2.

As another example, marital status only changes from single to married, not the other way around [9]. This is expressed as CC:

𝜑2 = 𝑡1[status] = "single" ∧ 𝑡2[status] = "married" → 𝑡1 ⪯status 𝑡2.

With a slight abuse of terminology, by considering timestamps as an "attribute" associated with attribute 𝐴, we have:

𝜑3 = 𝑇𝑒(𝑡1[𝐴]) ≤ 𝑇𝑒(𝑡2[𝐴]) → 𝑡1 ⪯𝐴 𝑡2,

since 𝑡2[𝐴] is confirmed at a later timestamp.
Comonotonicity. For attributes 𝐴 and 𝐵 of 𝑅, we say ⪯𝐴 and ⪯𝐵 are comonotonic in a temporal instance 𝐷𝑡 of 𝑅 if for all tuples 𝑡1 and 𝑡2 in 𝐷𝑡 that refer to the same entity, 𝑡1 ⪯𝐴 𝑡2 if and only if 𝑡1 ⪯𝐵 𝑡2. Intuitively, ⪯𝐴 and ⪯𝐵 are comonotonic if the two are correlated such that when one is updated, the other will also change. For instance, ⪯status and ⪯address are often comonotonic: when the marital status of a person changes from single to married, this person may move to a larger house. This can be expressed as a CC:

𝜑4 = 𝑡1 ⪯status 𝑡2 → 𝑡1 ⪯address 𝑡2.
Transitivity. Transitivity can also be expressed for any attribute 𝐴:

𝜑5 = 𝑡1 ⪯𝐴 𝑡2 ∧ 𝑡2 ⪯𝐴 𝑡3 → 𝑡1 ⪯𝐴 𝑡3.
Correlating different attributes. One can correlate multiple different attributes to capture implicit temporal ordering. For example, marital status may change from "married" to "divorced" and further from "divorced" to "married" again. To deduce an order on 𝑡[status], we can use the number of kids as an additional condition:

𝜑6 = 𝑡1[status] = "married" ∧ 𝑡2[status] = "divorced" ∧ 𝑡1[kids] < 𝑡2[kids] → 𝑡1 ⪯status 𝑡2.
Deduction. Making use of CCs, we can deduce certain temporal orders, e.g., from 𝜑1-𝜑6 and Figure 1, we can deduce the following:
◦ 𝑡1 ⪯status 𝑡2 by 𝜑2, 𝑡2 ⪯status 𝑡4 by 𝜑6 and 𝑡1 ⪯status 𝑡4 by 𝜑5;
◦ 𝑡1 ⪯address 𝑡2 by 𝜑4; hence 𝑡2[address] is more current for Mary;
◦ 𝑡1 ⪯SZ 𝑡2 ⪯SZ 𝑡3 by 𝜑1; hence Mary's current shoe size is 7.
However, we cannot determine whether 𝑡2 ⪯LN 𝑡6 or 𝑡6 ⪯LN 𝑡2 by deduction with the currency constraints 𝜑1-𝜑6.
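For illustration only (this is not the CC discovery or chase algorithms of GATE), the following Python sketch hard-codes simplified counterparts of 𝜑1, 𝜑2 and 𝜑6 as predicates over tuple pairs and applies them to a few of the tuples of Figure 1 to collect ranked pairs; the tuple contents are abridged and the kids value of 𝑡1 is set to 0 for simplicity.

```python
# Each simplified CC is a pair (attribute A of the consequence, precondition over (t1, t2));
# when the precondition holds for tuples of the same entity, we deduce t1 ⪯_A t2.
ccs = [
    ("SZ",     lambda t1, t2: t1["SZ"] <= t2["SZ"]),                  # cf. phi_1
    ("status", lambda t1, t2: t1["status"] == "single"
                              and t2["status"] == "married"),         # cf. phi_2
    ("status", lambda t1, t2: t1["status"] == "married"
                              and t2["status"] == "divorced"
                              and t1["kids"] < t2["kids"]),           # cf. phi_6
]

def deduce(tuples):
    """Apply the simplified CCs above to every ordered pair of tuples of the same entity."""
    deduced = set()   # elements (A, i1, i2) meaning t_{i1} ⪯_A t_{i2}
    for i1, t1 in tuples.items():
        for i2, t2 in tuples.items():
            if i1 == i2 or t1["EID"] != t2["EID"]:
                continue
            for attr, pre in ccs:
                if pre(t1, t2):
                    deduced.add((attr, i1, i2))
    return deduced

tuples = {   # abridged from Figure 1; kids of t1 set to 0 for simplicity
    "t1": {"EID": "e1", "status": "single",   "kids": 0, "SZ": 5},
    "t2": {"EID": "e1", "status": "married",  "kids": 1, "SZ": 6},
    "t4": {"EID": "e1", "status": "divorced", "kids": 2, "SZ": 7},
}
print(deduce(tuples))   # contains ("status", "t1", "t2"), ("status", "t2", "t4"), ("SZ", "t1", "t2"), ...
```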
Discovery of CCs. As noted in [30], CCs are a special case of denial constraints (DCs) [3] extended with temporal orders ⪯𝐴 and ≺𝐴. Algorithms are in place for discovering DCs, e.g., [4, 13, 52, 58]. We can readily extend these algorithms for discovering CCs (see [1]).
3 GATE: A CREATOR-CRITIC SYSTEM
In this section, we propose a creator-critic framework for determining temporal orders, and develop the system GATE to implement it. A unique feature of GATE is its combined use of deep learning and logic deduction. Below we start with the architecture of GATE, and then present its overall workflow, with a termination guarantee.
Architecture. The ultimate goal of GATE is to obtain a total order ⪯𝐴 for each attribute 𝐴. As shown in Figure 2, GATE first discovers a set Σ of CCs on 𝐷𝑡 offline for performing logic deduction. Then it takes a temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) as input, and learns and deduces more temporally ranked pairs for 𝐷𝑡 online. For simplicity, we assume w.l.o.g. that 𝐷 consists of a single entity instance 𝐷𝑒, i.e., all tuples in 𝐷 pertain to the same entity 𝑒 and thus their attributes can be pairwise compared; the methods of this paper can be readily extended to 𝐷𝑡 with multiple entity instances.
More specifically, the learning and deducing process in GATE iteratively executes two phases, namely, the creator and the critic, as follows.
(1) Creator (Section 4). In this phase, GATE (incrementally) trains a ranking model Mrank via deep learning. Taking 𝐷𝑡 and the augmented training data 𝐷aug (see its definition shortly) from the critic as input, it predicts new orders for extending each ⪯𝐴 of 𝐷𝑡.
Figure 2: Architecture of GATE (the creator, with its ranking model and incremental training; the critic, with chasing by CCs and conflict resolution; augmented training data fed back to the creator; currency constraints discovered offline).
Our model has three features: (a) For each tuple 𝑡 in 𝐷, the embedding for 𝑡[𝐴] is created using both 𝑡[𝐴] and other correlated values in 𝑡, so that 𝑡[𝐴] can be ranked comprehensively. (b) The attribute embeddings [15] are arranged to preserve chronological orders. (c) We adopt an attribute-centric adaptive pairwise ranking strategy in Mrank, so that the ranking result can be justified semantically. Moreover, the temporal instance 𝐷𝑡 will also be extended based on 𝐷aug.

Given an attribute 𝐴, we associate each (𝑡1, 𝑡2) ∈ ⪯𝐴 with a confidence, denoted by conf(𝑡1 ⪯𝐴 𝑡2), indicating how likely 𝑡1 ⪯𝐴 𝑡2 holds. If 𝑡2[𝐴] has a later timestamp than 𝑡1[𝐴], then conf(𝑡1 ⪯𝐴 𝑡2) is 1. If 𝑡1 ⪯𝐴 𝑡2 is predicted by Mrank, its confidence ranges from 0 to 1. We only consider (𝑡1, 𝑡2) predicted by Mrank with conf(𝑡1 ⪯𝐴 𝑡2) ≥ 𝛿, where 𝛿 is a predefined threshold, as candidate pairs to be extended to ⪯𝐴. We denote the set of all candidate pairs of ⪯𝐴 by ⪯M𝐴. The input and output of the creator are as follows:

◦ Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) and the augmented training data 𝐷aug from the critic.
◦ Output: An extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) based on 𝐷aug and the predicted orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛).
(2) Critic (Section 5). In this phase, the critic of GATE justifies and deduces more temporally ranked pairs, by applying CCs in Σ via the chase [62]. Denote the result of chasing by ⪯Σ𝐴. Depending on the validity of ⪯Σ𝐴, we construct augmented training data, denoted by 𝐷aug, containing the ranked pairs justified (resp. the conflicts caught by CCs); 𝐷aug will be fed back to the creator for the next round of model learning, so that the creator can learn from more unseen data and get higher accuracy iteratively. Specifically, 𝐷aug is a temporal instance extended with a validity flag fvalid, i.e., 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid): (a) if fvalid is true, ⪯Σ𝐴 is valid and all temporal orders deduced by the chase will be added to 𝐷aug; and (b) otherwise, the chasing is invalid, i.e., both (𝑡1, 𝑡2) ∈ ⪯𝐴 and (𝑡2, 𝑡1) ∈ ⪯𝐴 are deduced, and either 𝑡1 ≺𝐴 𝑡2 or 𝑡2 ≺𝐴 𝑡1. In this case, these two conflicting orders will be added to 𝐷aug, and the creator will be asked to resolve the conflict by revising its model accordingly.

Formally, the input and output of the critic are as follows:

◦ Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇), the predicted temporal orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛) and a set Σ of CCs.
◦ Output: Augmented data 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid).
The novelty of the critic consists of (a) the deduction using the chase, and (b) an efficient algorithm implementing the chase.
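As a hedged sketch of how the critic could assemble 𝐷aug (the function and variable names are ours, not GATE's), the chased orders can be merged with the current ones while flagging the result invalid when a pair is deduced in both directions.

```python
def construct_augmented(orders_chased, orders_current):
    """orders_*: attribute A -> set of tid pairs (t2, t1) meaning t2 ⪯_A t1.
    Merge the chased orders into the current ones; the result is flagged invalid
    when the chase derives a pair in both directions on some attribute (the
    paper's condition additionally requires a strict order between the two)."""
    augmented, conflicts, f_valid = {}, [], True
    for attr, chased in orders_chased.items():
        merged = set(orders_current.get(attr, set()))
        for (t2, t1) in chased:
            merged.add((t2, t1))
            if t1 != t2 and (t1, t2) in chased and (attr, t2, t1) not in conflicts:
                f_valid = False
                conflicts.append((attr, t1, t2))   # conflicting pair, to be resolved by the creator
        augmented[attr] = merged
    return augmented, f_valid, conflicts
```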
Workflow. As shown in Figure 3, GATE takes a temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) and a set Σ of CCs discovered offline as input, and it outputs an extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) with a total order ⪯′𝐴 defined for each attribute 𝐴. GATE first initializes the augmented training data 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid = true) (Line 1, details omitted) for the first round of GATE, by deducing its initial temporal orders ⪯′𝐴 via the CCs in Σ whose preconditions do not involve timeliness comparison, e.g., we can create temporal orders for those tuples with available timestamps using 𝜑3. The initialization can be done efficiently by [33].
Input: A temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇), and a set Σ of CCs.
Output: An extended temporal instance 𝐷′𝑡 = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇) such that for all 𝑖 ∈ [1, 𝑛], (a) ⪯′𝐴𝑖 extends ⪯𝐴𝑖 and (b) ⪯′𝐴𝑖 is a total order.
1.  𝐷aug := Initialize(𝐷𝑡, fvalid = true, Σ); initialize the ranking model Mrank;
2.  while true do
3.      /* The creator of GATE */
4.      𝐷𝑡 := Extend(𝐷𝑡, 𝐷aug);
5.      train Mrank incrementally based on 𝐷𝑡;
6.      (⪯M𝐴1, . . . , ⪯M𝐴𝑛) := the temporal orders predicted by Mrank;
7.      /* The critic of GATE */
8.      (⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛) := Chase(𝐷𝑡, Σ);
9.      𝐷aug := ConstructAugmented(𝐷, ⪯M𝐴1, . . . , ⪯M𝐴𝑛, ⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛, 𝑇);
10.     if 𝐷𝑡 no longer changes then
11.         break;
12. 𝐷𝑡 := Extend(𝐷𝑡, Mrank);
13. return 𝐷𝑡;
Figure 3: Workflow of GATE
Then GATE iteratively executes the creator and the critic in rounds. In the 𝑖-th round, (a) the creator extends the temporal instance 𝐷𝑡 (Line 4, see Section 4) based on the augmented training data 𝐷aug returned by the critic in the (𝑖−1)-th round, by possibly revising its model to resolve the conflicts. Then the creator incrementally trains Mrank (Line 5, see Section 4) and predicts new temporal orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛) (Line 6), which are candidate pairs (𝑡1, 𝑡2) with confidences at least 𝛿; and (b) the critic deduces more ranked pairs (⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛) by chasing with CCs in Σ (Line 8). Based on the result of chasing, the critic constructs augmented training data 𝐷aug via procedure ConstructAugmented (Line 9, see Section 5); this 𝐷aug is fed back to the creator for the next round.
Finally, when 𝐷𝑡 no longer changes (Lines 10-11), the iteration ends. If 𝐷𝑡 is still not a temporal instance with total orders defined on all attributes, we extend it using procedure Extend (Line 12), such that for each pair (𝑡1, 𝑡2), the one of (𝑡1, 𝑡2) and (𝑡2, 𝑡1) learned with the higher confidence is added to ⪯𝐴, until each ⪯𝐴 becomes a total order.
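The final Extend step (Line 12) can be sketched as follows, assuming (our assumption) access to a function conf(𝑡, 𝑡′) that exposes the confidences predicted by Mrank; ties and user inspection are omitted.

```python
from itertools import combinations

def extend_to_total(order, tids, conf):
    """order: the current ⪯_A for one attribute, as a set of tid pairs (t1, t2) with t1 ⪯_A t2;
    tids: all tuple ids of the entity; conf(t, t_prime): confidence of t ⪯_A t_prime from M_rank.
    For every still-undecided pair, keep the direction predicted with the higher confidence."""
    for ta, tb in combinations(sorted(tids), 2):
        if (ta, tb) in order or (tb, ta) in order:
            continue                       # already ranked (stable), leave untouched
        if conf(ta, tb) >= conf(tb, ta):
            order.add((ta, tb))
        else:
            order.add((tb, ta))
    return order
```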
Termination. We prove in [1] that GATE will eventually terminate (Line 10), i.e., with more iterations, 𝐷𝑡 is gradually extended with more orders and finally becomes stable and does not change.
Theorem 1: GATE is guaranteed to terminate. □

Example 2: Continuing with Example 1, assume that Σ = {𝜑1-𝜑6} and 𝐷𝑡 has empty temporal orders. We first initialize the training data 𝐷aug by applying the CCs in Σ that do not compare timeliness in their preconditions, e.g., we can deduce 𝑡5 ⪯job 𝑡6 by 𝜑3. Suppose that after training Mrank based on the initial 𝐷aug, no tuple pair predicted by Mrank is at least 𝛿-confident in the first round. The creator outputs an empty ⪯M𝐴 for each 𝐴 due to the lack of information. Then by applying the CCs in Σ, the critic can deduce new temporal orders ⪯Σ𝐴, e.g., 𝑡1 ⪯address 𝑡2 by 𝜑4 and 𝑡1 ⪯status 𝑡4 by 𝜑5. Both ⪯M𝐴 and ⪯Σ𝐴 will be used to construct the augmented training data 𝐷aug, based on which the creator extends 𝐷𝑡 and incrementally trains Mrank in the second round. This time the creator might be able to predict confident temporal orders, e.g., 𝑡1 ⪯LN 𝑡2, based on the augmented training data 𝐷aug. The iteration continues until 𝐷𝑡 does not change anymore. Each temporal order ⪯𝐴 in 𝐷𝑡 will be extended to a total order by Mrank if it is still not total. □

Remark. We adopt a confidence threshold 𝛿 to ensure the reliability of ML predictions, so that only reliable orders are considered. When confident orders cannot be decided for the lack of initial information, we may opt to invite user inspection to ensure the correctness of a few initial ranked pairs, from which more reliable orders can be iteratively deduced/learned. The parameter 𝛿 plays an important role in model Mrank, e.g., when testing Mrank on procedural billing data (with 82.28% real missing timestamps [40]), we find that 22.6% of the pairs are identified as confident when 𝛿 = 0.55. The percentage decreases to 16.7% when 𝛿 = 0.6. We will test the impact of 𝛿 in Section 6.
4 CREATOR
In this section, we develop the creator of GATE. Given a temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) and the augmented training data 𝐷aug (from the critic), the creator (a) extends each ⪯𝐴 of 𝐷𝑡 based on 𝐷aug and (b) trains a ranking model Mrank to predict new orders.
Review on ranking models. Learning to rank [50] aims to learn a ranking model so that objects can be ranked based on their degrees of relevance, preference, or importance (in our setting, timeliness).
We adopt pairwise ranking since (a) its semantics is consistent with temporal orders, which is a set of tuple pairs on which partial orders are defined (Section 2); and (b) by transitivity of temporal orders, pairwise ranking helps us get a total order for all attributes.
Challenges. One naturally wants to adopt an existing model. However, it is hard to directly apply one, since the unique features of temporal orders are not well considered, resulting in poor performance.
(1) Attribute correlation. Due to the correlated nature of temporal order, we often need to reference other attributes to determine the orders of a given attribute. Moreover, since a value may change back and forth (e.g., the marital status changes from "married" to "divorced", and back from "divorced" to "married"), it is hard to determine the up-to-date value based on a single attribute only.
(2) Limitation of embedding models. To determine the timeliness, care should be taken for lexically different but semantically similar values (e.g., "baby" vs. "birth" and "dead" vs. "expired" for attribute status). Although existing embedding models (e.g., ELMo [59] or Bert [15]) are widely adopted, they cannot be directly used here since they are not trained to organize data chronologically.
(3) Adaptive margin. Existing ranking strategies do not consider real-life characteristics of timeliness, e.g., the timespan for a person's status to move from "birth" to "engaged" is typically longer than from "engaged" to "married" (Figure 4). Instead of ranking status with a fixed margin as most existing strategies do, we need a new methodology to embed values using adaptive margins, to conform to their real-life behaviors and justify the semantics of ranking.
Model overview. We propose a ranking model to tackle the above challenges, whose novelty includes (a) a context-aware scheme that embeds each value along with other correlated values, (b) an encoding mechanism to re-organize the embeddings in a chronological manner, and (c) an attribute-centric adaptive ranking strategy.
As shown in Figure 4, our ranking model Mrank takes the current 𝐷𝑡 as input, and outputs new ranked pairs, where each (𝑡1, 𝑡2) is associated with a confidence, indicating how likely 𝑡1 ⪯𝐴 𝑡2 holds.
Our ranking model consists of three stages as follows: context-aware embedding, chronological encoding and order prediction.
(1) Mrank first builds a context-aware embedding for each attribute value using pre-trained language models (ELMo [59] or Bert [15]), in a way that information of correlated values is also embedded.
(2) Based on the embeddings, Mrank encodes a target vector 𝜙𝐴 for attribute 𝐴, via non-linear transformation (see below). Similarly, a value vector 𝜙𝑡[𝐴] is encoded for each 𝐴-attribute value of tuple 𝑡, so that (a) if 𝑡[𝐴] is more current, 𝜙𝑡[𝐴] is closer to 𝜙𝐴 and (b) the gap between 𝜙𝑡[𝐴] and 𝜙𝐴 is trained adaptively, to reflect real semantics.
(3) Finally, given 𝜙𝑡1[𝐴] and 𝜙𝑡2[𝐴] of 𝑡1 and 𝑡2, Mrank predicts the order, i.e., whether 𝑡1 ⪯𝐴 𝑡2 holds with high enough confidence.
We next briefly elaborate on the context-aware embedding and the chronological encoding scheme with the adaptive margin.
Context-aware embedding. To reference correlated values, we treat tuples as sequences and adopt the idea of serialization [47] (so that tuples can be meaningfully ingested by models) to embed values. Following [47], given a tuple 𝑡 in 𝐷𝑡, we serialize its values:

serialize(𝑡) = ⟨COL⟩ 𝐴1 ⟨VAL⟩ 𝑡[𝐴1] . . . ⟨COL⟩ 𝐴𝑛 ⟨VAL⟩ 𝑡[𝐴𝑛],

where ⟨COL⟩ and ⟨VAL⟩ are special tokens, denoting the start of an attribute and a value, respectively (see Figure 4). This serialization is fed as input to a pre-trained language model emb(·) to compute a 𝑑-dimensional embedding for each 𝐴-attribute value, denoted by emb(𝑡[𝐴]) ∈ R𝑑. Besides, we average out the embedding vectors for all 𝑡[𝐴] to get a context representation of tuple 𝑡, i.e., emb(𝑡) = (1/𝑛) Σ𝑛𝑖=1 emb(𝑡[𝐴𝑖]). Finally, for each 𝐴-attribute value of 𝑡, we get the context-aware embedding of 𝑡[𝐴], denoted by 𝐸𝑡[𝐴] ∈ R2𝑑:

𝐸𝑡[𝐴] = [emb(𝑡[𝐴]); emb(𝑡)],

where [·;·] denotes vector concatenation. In this way, 𝐸𝑡[𝐴] embeds not only the 𝐴-attribute value, but also the contextual information from other attribute values, to allow comprehensive ranking.

Similarly, a schema embedding is computed for each attribute 𝐴: 𝐸𝐴 = emb(𝐴), by feeding the attribute name, e.g., status, to the model.
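A minimal Python sketch of the serialization and the context-aware embedding 𝐸𝑡[𝐴] = [emb(𝑡[𝐴]); emb(𝑡)] is given below; the stand-in emb(·) replaces the pre-trained language model and embeds each value independently, whereas GATE feeds the whole serialization to ELMo/Bert to obtain contextualized value embeddings.

```python
import zlib
import numpy as np

D = 16   # dimension of the stand-in embeddings (a real system would use ELMo/Bert)

def emb(text: str) -> np.ndarray:
    """Deterministic stand-in for a pre-trained language-model embedding."""
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    return rng.standard_normal(D)

def serialize(t: dict) -> str:
    """serialize(t) = <COL> A1 <VAL> t[A1] ... <COL> An <VAL> t[An]."""
    return " ".join(f"<COL> {a} <VAL> {v}" for a, v in t.items())

def context_aware_embeddings(t: dict) -> dict:
    """E_t[A] = [emb(t[A]); emb(t)], where emb(t) averages the value embeddings."""
    value_embs = {a: emb(str(v)) for a, v in t.items()}
    ctx = np.mean(list(value_embs.values()), axis=0)   # context representation emb(t)
    return {a: np.concatenate([e, ctx]) for a, e in value_embs.items()}

t2 = {"FN": "Mary", "LN": "Taylor", "status": "married", "job": "journalist"}
print(serialize(t2))
print(context_aware_embeddings(t2)["status"].shape)    # (2 * D,)
```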
Chronological encoding with adaptive margins. While pre-trained embeddings are widely adopted to capture semantics, they are not trained to organize temporal orders. Thus we propose chronological encoding to re-organize the embeddings to preserve timeliness. The idea is to use the schema embedding as the target and make the embedding of a more current value closer to the target; moreover, instead of ranking in fixed margins, embeddings are ordered adaptively.

Specifically, given the embedding of the 𝐴-attribute of 𝑡, i.e., 𝐸𝑡[𝐴], we encode it using a context encoder ENCcxtx(·) as follows:

𝜙𝑡[𝐴] = ENCcxtx(𝐸𝑡[𝐴]) = 𝜎(W2 ∗ 𝜎(W1 ∗ 𝐸𝑡[𝐴])),

where W1 and W2 are learnable parameters of the encoder, and 𝜎 is the non-linear sigmoid activation function given by 𝜎(𝑥) = 1/(1 + 𝑒^(−𝑥)). Similarly, the target vector for attribute 𝐴 is encoded as 𝜙𝐴 = ENCattr(𝐸𝐴), where ENCattr(·) denotes the schema encoder.

To train the encoders with ordered embeddings and adaptive margins, we adopt an attribute-centric adaptive margin-based triplet loss. Given ⪯𝐴 in 𝐷𝑡, the loss on 𝐴 is formulated as follows:

loss(𝐴) = Σ_{(𝑡1,𝑡2) ∈ ⪯𝐴} max{ −tanh(⟨𝜙𝑡2[𝐴], 𝜙𝐴⟩) + 𝛾𝑡1,𝑡2 + tanh(⟨𝜙𝑡1[𝐴], 𝜙𝐴⟩), 0 },

where ⟨·,·⟩ is the inner product and 𝛾𝑡1,𝑡2 is the adaptive margin between the two tuples; we set 𝛾𝑡1,𝑡2 to be 1 + cos(𝜙𝑡1[𝐴], 𝜙𝑡2[𝐴]) in practice, which characterizes the co-occurrence of 𝑡1[𝐴] and 𝑡2[𝐴].

Intuitively, by minimizing the loss, for each training instance 𝑡1 ⪯𝐴 𝑡2, (a) we make the encoded vector of the more current value 𝑡2[𝐴] closer to the target 𝜙𝐴 (the first term), and (b) we ensure an adaptive margin 𝛾𝑡1,𝑡2 between the two tuples (the second term). In other words, the attribute values in the encoded space are not only arranged chronologically by their "distances" to 𝜙𝐴, from which temporal orders can be easily derived; their margins are also adaptively determined, to reflect the semantics of timeliness ranking.

Figure 4: The Network Architecture of the Creator
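A hedged PyTorch sketch of the encoders and the attribute-centric adaptive-margin triplet loss follows; the use of bias-free nn.Linear layers and the layer sizes are our assumptions, while the loss mirrors the formula above with 𝛾𝑡1,𝑡2 = 1 + cos(𝜙𝑡1[𝐴], 𝜙𝑡2[𝐴]).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """phi = sigma(W2 * sigma(W1 * E)); the same form is used for the context
    encoder ENC_cxtx (input E_t[A], dimension 2d) and the schema encoder
    ENC_attr (input E_A, dimension d)."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)   # W1 (bias-free: our assumption)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)  # W2

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.w2(torch.sigmoid(self.w1(e))))

def ranking_loss(phi_t1: torch.Tensor, phi_t2: torch.Tensor, phi_attr: torch.Tensor) -> torch.Tensor:
    """Adaptive-margin triplet loss over a batch of training pairs t1 ⪯_A t2:
    max{ -tanh(<phi_t2[A], phi_A>) + gamma_{t1,t2} + tanh(<phi_t1[A], phi_A>), 0 },
    with gamma_{t1,t2} = 1 + cos(phi_t1[A], phi_t2[A])."""
    gamma = 1.0 + F.cosine_similarity(phi_t1, phi_t2, dim=-1)
    close_t2 = torch.tanh((phi_t2 * phi_attr).sum(dim=-1))   # <phi_t2[A], phi_A>
    close_t1 = torch.tanh((phi_t1 * phi_attr).sum(dim=-1))   # <phi_t1[A], phi_A>
    return torch.clamp(-close_t2 + gamma + close_t1, min=0.0).mean()
```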
Model training and instance extension. In each round, the creator receives augmented training data 𝐷aug = (𝐷, ⪯′𝐴1, . . . , ⪯′𝐴𝑛, 𝑇, fvalid), based on which it (a) extends the temporal instance 𝐷𝑡 with 𝐷aug and (b) incrementally trains Mrank via back-propagation.

In the first round, 𝐷aug is initialized to be the temporal orders constructed by applying those CCs in Σ without timeliness comparison in their preconditions [33]. In the following rounds, 𝐷aug is constructed by the critic, based on the result of chasing with CCs.
Specifically, depending on flag fvalid in 𝐷aug, we have two cases:

(1) If fvalid is true, the result of chasing is valid and Mrank is incrementally trained based on the orders in 𝐷aug (see below). Moreover, the temporal instance 𝐷𝑡 = (𝐷, ⪯𝐴1, . . . , ⪯𝐴𝑛, 𝑇) is extended with the ranked pairs in 𝐷aug, i.e., for each (𝑡1, 𝑡2) in ⪯′𝐴 of 𝐷aug, (𝑡1, 𝑡2) is added to ⪯𝐴. In this case, we say that (𝑡1, 𝑡2) becomes stable. Once a temporal order becomes stable, it will not be removed from 𝐷𝑡.

(2) If fvalid is false, the result of chasing is invalid, i.e., there is an attribute 𝐴 such that conflicting orders (𝑡1, 𝑡2) and (𝑡2, 𝑡1) are both in ⪯′𝐴 of 𝐷aug, i.e., {(𝑡1, 𝑡2), (𝑡2, 𝑡1)} ⊆ ⪯′𝐴, and either 𝑡1 ≺𝐴 𝑡2 or 𝑡2 ≺𝐴 𝑡1. In this case, we decide which of (𝑡1, 𝑡2) and (𝑡2, 𝑡1) is added to ⪯𝐴 using Mrank, taking the one with the higher confidence (and possibly user inspection). Assume w.l.o.g. that (𝑡1, 𝑡2) is added to ⪯𝐴 (i.e., it becomes stable). Then, the creator fine-tunes Mrank so that 𝑡1 and 𝑡2 are better separated in the encoded space (see [1] for details).
Incremental training. Incremental training of Mrank might lead to the catastrophic forgetting issue [39], i.e., the model might forget some temporal orders learned in prior rounds. To overcome this, we adopt a simple strategy to retain prior knowledge: in each round, 𝐷𝑡 is augmented with 𝐷aug and Mrank is incrementally trained on 𝐷𝑡 from the last checkpoint. In this way, the model is able to learn from previous training instances and avoid catastrophic forgetting.
Monotonicity. One can verify that the number of stable temporal orders in 𝐷𝑡 is (strictly) monotonically increasing as more rounds of GATE are performed (see a detailed lemma in [1]). The termination of GATE (Theorem 1) partly depends on this monotonicity.
Confidence. Given ⪯𝐴, a tuple pair (𝑡1, 𝑡2) and a closeness function 𝑓𝐿, Mrank predicts (𝑡1, 𝑡2) ∈ ⪯𝐴 if 𝑓𝐿(𝜙𝑡2[𝐴], 𝜙𝐴) > 𝑓𝐿(𝜙𝑡1[𝐴], 𝜙𝐴); its confidence indicates how likely 𝑡1 ⪯𝐴 𝑡2 holds, i.e., we have

conf(𝑡1 ⪯𝐴 𝑡2) = 𝜎(𝑓𝐿(𝜙𝑡2[𝐴], 𝜙𝐴) − 𝑓𝐿(𝜙𝑡1[𝐴], 𝜙𝐴)),

and set conf(𝑡2 ⪯𝐴 𝑡1) to 0. Thus, given a confidence threshold 𝛿 > 0, we will not predict both 𝑡1 ⪯𝐴 𝑡2 and 𝑡2 ⪯𝐴 𝑡1 as confident orders.

In practice, we use the negative Euclidean distance −𝐿2(𝜙𝑡[𝐴], 𝜙𝐴) to measure the "closeness" between 𝜙𝑡[𝐴] and 𝜙𝐴, where the one with a larger closeness value is more current. The distance gap between 𝜙𝑡1[𝐴] and 𝜙𝑡2[𝐴] quantifies the confidence: the larger the gap, the larger the confidence, which ranges from 0 to 1.
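Continuing the sketch above, the prediction and its confidence can be computed with the negative Euclidean distance as the closeness function 𝑓𝐿; the default threshold value below is arbitrary and only for illustration.

```python
import torch

def closeness(phi_val: torch.Tensor, phi_attr: torch.Tensor) -> torch.Tensor:
    """f_L: negative Euclidean distance to the target phi_A; larger means more current."""
    return -torch.linalg.norm(phi_val - phi_attr, dim=-1)

def predict_order(phi_t1, phi_t2, phi_attr, delta: float = 0.55):
    """Predict t1 ⪯_A t2 when phi_t2[A] is closer to phi_A than phi_t1[A] is, with
    conf(t1 ⪯_A t2) = sigmoid(f_L(phi_t2[A], phi_A) - f_L(phi_t1[A], phi_A));
    only pairs with conf >= delta are kept as candidate pairs."""
    conf = torch.sigmoid(closeness(phi_t2, phi_attr) - closeness(phi_t1, phi_attr))
    return conf >= delta, conf
```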
Example 3: Consider the example in Figure 4, where we focus on attribute status. After creating the context-aware embeddings based on a pre-trained model, the creator chronologically encodes a target vector 𝜙status and value vectors for all values, so that they are arranged by their distances to 𝜙status. To illustrate, we also label the unknown timestamp of each value vector in the figure (e.g., 𝑇𝑒(Married) = 2018). Since 𝜙Married is the closest to 𝜙status, it is predicted to be the latest status value and new ranked pairs are constructed accordingly for ⪯status, as augmented training data in the next round. □

Remark. Our creator learns temporal orders by utilizing context-aware embedding, chronological encoding and attribute-centric adaptive ranking. However, it does not explicitly take into account some temporal properties, such as transitivity. This motivates us to use the critic to deduce and justify the orders in the next section.
5 CRITIC
In this section, we develop the critic under GATE for justifying and deducing temporal orders. Taking a temporal instance 𝐷𝑡, the temporal orders (⪯M𝐴1, . . . , ⪯M𝐴𝑛) predicted by the creator and a set Σ of mined CCs as input, the critic (a) deduces more ranked pairs (⪯Σ𝐴1, . . . , ⪯Σ𝐴𝑛) by applying CCs via the chase, and (b) constructs the augmented training data 𝐷aug and feeds 𝐷aug back to the creator.
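As a minimal illustration (not the optimized incremental chase algorithm developed in this section), chasing with CCs can be implemented as a fixpoint computation that repeatedly applies rules until no new ranked pair is added; by the Church-Rosser property, the outcome does not depend on the order of rule application. The rule representation below is ours.

```python
def chase(tuples, orders, ccs):
    """tuples: tid -> {attribute: value} (tuples of one entity);
    orders: attribute A -> set of tid pairs (t1, t2) meaning t1 ⪯_A t2;
    ccs: rules of the form rule(tuples, orders) -> set of new (A, t1, t2) facts.
    Applies the rules to a fixpoint."""
    changed = True
    while changed:
        changed = False
        for rule in ccs:
            for attr, t1, t2 in rule(tuples, orders):
                if (t1, t2) not in orders.setdefault(attr, set()):
                    orders[attr].add((t1, t2))
                    changed = True
    return orders

# Example rule: transitivity (phi_5), closing each ⪯_A under composition.
def transitivity(tuples, orders):
    new = set()
    for attr, pairs in orders.items():
        for (a, b) in pairs:
            for (c, d) in pairs:
                if b == c and (a, d) not in pairs:
                    new.add((attr, a, d))
    return new
```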