SPL Profile
5. Difference Analysis
5.3. Difference Algorithm
5.3.1. Matching
5.3.1.1. Traversing
As shown in Algorithm 1, the matching takes the copies’ SoftwareModels as input and returns a set of the root match elements to be stored in the DifferenceModel. Matches are structured in a hierarchy according to the hierarchies of the input models. Each match can reference matching elements of the two input models (i.e., a Regular Match), or only one if no match exists (i.e., a Single Side Match).
Filter At the beginning of the algorithm, the ScopeFilter and the ElementTypeFilter are applied to the SoftwareModels to filter out elements that can be ignored by the rest of the difference analysis.
The ScopeFilter scans the model trees for Resources and SoftwareElements in scopes (e.g., namespaces) explicitly excluded in the process configuration.
What counts as a scope depends on the technology a software model relates to.
For example, in Java, a scope can be defined by packages. In PHP or C++, namespaces can define a scope. Because scopes are technology-specific, a ScopeFilter is technology-specific as well, and thus the specification of scopes in the process configuration also depends on the ScopeFilter used.
The ElementTypeFilter scans the model trees for Resources and SoftwareElements of types not relevant for the copies’ behavior. For example, comments or layout information might be modified without affecting the copies’ behavior. The concrete set of element types that can be ignored is technology-specific because, for example, layout information can be relevant in some languages such as PHP. Thus, the ElementTypeFilter provides another point of technology-specific adaptation.
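The two filters can be illustrated with a minimal sketch. It is not the implementation described here: the `Element` class, the `kind` strings, and the dotted scope names are hypothetical stand-ins, and scopes are assumed to be Java-like nested packages.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a SoftwareElement in a model tree."""
    name: str
    kind: str  # assumed type labels, e.g. "package", "class", "comment"
    children: list = field(default_factory=list)


def filter_tree(element, excluded_scopes, ignored_kinds, scope=""):
    """Return a filtered copy of the tree, or None if the element is dropped."""
    # ElementTypeFilter: drop element types that are irrelevant for behavior.
    if element.kind in ignored_kinds:
        return None
    # ScopeFilter: a scope is assumed to be a dotted package path.
    if element.kind == "package":
        scope = f"{scope}.{element.name}" if scope else element.name
        if scope in excluded_scopes:
            return None
    kept = []
    for child in element.children:
        filtered = filter_tree(child, excluded_scopes, ignored_kinds, scope)
        if filtered is not None:
            kept.append(filtered)
    return Element(element.name, element.kind, kept)


model = Element("app", "package", [
    Element("generated", "package", [Element("Stub", "class")]),
    Element("Main", "class", [
        Element("header", "comment"),
        Element("run", "method"),
    ]),
])
filtered_model = filter_tree(model, {"app.generated"}, {"comment"})
```

In this example, the excluded package `app.generated` and the comment are removed, so only `Main` and its method `run` remain for the later matching steps.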
Resource matching When the filtering is done, the Resources of the resulting SoftwareModels are matched with each other. First, the resources of the Integration Copy’s SoftwareModel are stored as a list of matching candidates.
Now, for each resource of the Leading Copy, the BestMatchResource algorithm component identifies the best matching candidate. A resource is identified by its Uniform Resource Identifier (URI), which is an identifier string consisting of several segments (i.e., the strings between “/” characters).
For example, a URI identifying a file of a Leading Copy in the file system might look like file:/C:/project1/example/File.xyz. In contrast, files of an Integration Copy might be placed in different folders or file systems. Thus, the URIs of a copy’s files might start with a string such as file:/C:/project-copy/... or even file:/D:/copy/... To identify the best matching resource, it is not possible to simply match the full URI, as the algorithm does not know the base path of the resources. Furthermore, a copy can consist of several projects. To cope with this, the segments of the resources’ URIs are compared. This is done from back to front, as the front segments differ anyway because of the different source directories used.
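The back-to-front comparison of URI segments can be sketched as follows. The function name `trailing_matches` and the example URIs are hypothetical and chosen for illustration only.

```python
def trailing_matches(uri_a, uri_b):
    """Count how many segments two URIs share at their ends."""
    segments_a = uri_a.split("/")
    segments_b = uri_b.split("/")
    count = 0
    # Walk both segment lists from the last segment towards the first.
    for seg_a, seg_b in zip(reversed(segments_a), reversed(segments_b)):
        if seg_a != seg_b:
            break
        count += 1
    return count


# Hypothetical URIs of a leading and an integration resource.
leading_uri = "file:/home/a/lead/src/util/Parser.xyz"
integration_uri = "file:/mnt/b/copy/src/util/Parser.xyz"
shared = trailing_matches(leading_uri, integration_uri)  # 3: src, util, Parser.xyz
```

The comparison stops at the first differing segment, so differing base paths at the front of the URIs do not influence the result.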
Now, the BestMatchResource algorithm identifies the matching candidate that shares the highest number of matching segments at the end of its URI with the leading resource. As a cross check, the identified best matching resource from the Integration Copy is compared with all other not yet matched Leading Copy resources. Again, this is done by comparing their URI segments from back to front. If another leading resource shares a higher number of segments at the end of its URI with this integration resource, that pair is an even better match than the one identified before. In this case, the integration resource with the next lower number of matched segments is used instead, and the cross check is repeated. The BestMatchResource algorithm considers renaming patterns specified in the process configuration. If a renaming pattern influences the resources’ URIs (e.g., renamed Java packages reflected in classes’ file system paths), it is used here to normalize the URIs before matching their segments.
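A minimal sketch of the best-match selection with cross check and renaming normalization is given below. Several details are assumptions not stated in the text: renaming patterns are modeled as plain (old, new) substring pairs applied to the leading URIs, a candidate without any shared trailing segment is never accepted, and all names and URIs are hypothetical.

```python
from itertools import takewhile


def trailing_matches(uri_a, uri_b):
    """Count the segments two URIs share at their ends."""
    pairs = zip(reversed(uri_a.split("/")), reversed(uri_b.split("/")))
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], pairs))


def normalize(uri, renamings):
    """Apply renaming patterns, assumed here to be (old, new) substrings."""
    for old, new in renamings:
        uri = uri.replace(old, new)
    return uri


def best_match_resource(leading, other_leading, candidates, renamings=()):
    """Return the best candidate URI for `leading`, or None if there is none."""
    lead = normalize(leading, renamings)
    others = [normalize(o, renamings) for o in other_leading]
    # Rank the candidates by their number of shared trailing segments.
    scored = sorted(
        ((trailing_matches(lead, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    for score, candidate in scored:
        if score == 0:  # assumption: no shared segment means no match
            break
        # Cross check: no other unmatched leading resource may fit better.
        if all(trailing_matches(o, candidate) <= score for o in others):
            return candidate
    return None


best = best_match_resource(
    "file:/C:/p1/a/File.xyz",
    ["file:/C:/p1/b/File.xyz"],
    ["file:/D:/c/b/File.xyz", "file:/D:/c/a/File.xyz"],
)
```

Here `best` is `file:/D:/c/a/File.xyz`: it shares two trailing segments with the leading resource, and the other leading resource shares only one segment with it, so the cross check succeeds.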
If a best matching resource was detected, a new Match element is created for the root elements of the two resources (i.e., Match(se_l, se_i)). Next, those root elements are used to recursively detect their matching child elements (i.e., by calling SubMatchTraversing recursively), and the resulting set of submatches is assigned to the newly created match element (i.e., Match: m). Afterwards, m is added to the list of detected rootMatches, and the matched integration resource is removed from the candidates list. If no matching resource was found for a leading resource, a Single Side Match is created that references the leading resource’s root SoftwareElement only. Finally, if there are resources remaining in the list of matchingCandidates, further Single Side Match elements are created for them and stored in the rootMatches set.
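The resource-level loop described above can be sketched as follows. This is an illustration under assumptions, not Algorithm 1 itself: resources are modeled as dictionaries mapping a URI to its root element, and the best-match and submatch steps are passed in as functions. The `Match` class and the stand-in `same_file_name` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Match:
    """Regular Match (both sides set) or Single Side Match (one side None)."""
    leading: Optional[Any]
    integration: Optional[Any]
    submatches: list = field(default_factory=list)


def match_resources(leading, integration, best_match, sub_match):
    """Match resources given as dicts mapping a URI to its root element."""
    candidates = dict(integration)  # matching candidates (Integration Copy)
    pending = list(leading)         # leading URIs not yet processed
    root_matches = []
    while pending:
        uri = pending.pop(0)
        best = best_match(uri, pending, list(candidates))
        if best is None:
            # No partner found: Single Side Match for the leading root.
            root_matches.append(Match(leading[uri], None))
            continue
        # Partner found: remove it from the candidates and recurse.
        m = Match(leading[uri], candidates.pop(best))
        m.submatches = sub_match(m.leading, m.integration)
        root_matches.append(m)
    # Remaining candidates become Single Side Matches as well.
    root_matches.extend(Match(None, root) for root in candidates.values())
    return root_matches


def same_file_name(uri, _others, candidates):
    """Stand-in for BestMatchResource: compares the last segment only."""
    name = uri.rsplit("/", 1)[-1]
    return next((c for c in candidates if c.rsplit("/", 1)[-1] == name), None)


root_matches = match_resources(
    {"lead/A.xyz": "rootA", "lead/B.xyz": "rootB"},
    {"copy/A.xyz": "rootA2", "copy/C.xyz": "rootC"},
    best_match=same_file_name,
    sub_match=lambda lead_root, integ_root: [],
)
```

In this example, the result contains one Regular Match for `A.xyz` and one Single Side Match each for `B.xyz` and `C.xyz`.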
Software element matching The SubMatchTraversing (Algorithm 2) specifies how submatches are recursively detected for a pair of already matched software elements. A depth-first traversal following the containment hierarchy of the Leading Copy’s software model is performed.
During each recursion, first, the child elements of the Integration Copy’s software element are used as matching candidates. Next, each child node of the Leading Copy’s software element is checked for similarity with the matching candidates. If a candidate is identified to be similar, a Regular Match element is created and stored in the result list of detected submatches. The matched candidate is removed from the list of candidates, and the next child element of the Leading Copy is processed. If no similar candidate could be found for a Leading Copy’s child element, a Single Side Match is created referencing this element only. When all child elements of the Leading Copy are checked, Single Side Match elements are created for each of the remaining Integration Copy’s child elements. The child elements are checked in the order they are stored in the software model, as it represents their occurrence in the real implementation.
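The recursive submatch detection can be sketched as shown below. It is a simplified illustration, not Algorithm 2 itself: the `Element` and `Match` classes are hypothetical, and the similarity check is replaced by a plain name comparison as a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a software element with ordered children."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class Match:
    """Regular Match (both sides set) or Single Side Match (one side None)."""
    leading: Optional[Element]
    integration: Optional[Element]
    submatches: list = field(default_factory=list)


def same_name(a, b):
    """Placeholder for the similarity check: compares names only."""
    return a.name == b.name


def sub_match_traversing(lead, integ, similar=same_name):
    """Detect submatches for two already matched elements, depth first."""
    candidates = list(integ.children)
    result = []
    for child in lead.children:  # order as stored in the model
        idx = next(
            (i for i, c in enumerate(candidates) if similar(child, c)), None
        )
        if idx is None:
            result.append(Match(child, None))  # Single Side Match
            continue
        partner = candidates.pop(idx)  # remove the matched candidate
        subs = sub_match_traversing(child, partner, similar)
        result.append(Match(child, partner, subs))  # Regular Match
    # Remaining integration children become Single Side Matches.
    result.extend(Match(None, c) for c in candidates)
    return result


lead_root = Element("A", [Element("f", [Element("x")]), Element("g")])
integ_root = Element("A", [Element("f", [Element("y")]), Element("h")])
submatches = sub_match_traversing(lead_root, integ_root)
```

For these two trees, `f` is matched on both sides and traversed further, while `g`, `h`, `x`, and `y` each end up in a Single Side Match.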
To decide on the elements’ similarity, an algorithm component named SimilarityCheck determines whether two elements represent the same software element, as further described in Section 5.3.1.2. Because the traversing is strictly hierarchical, the elements passed to the SimilarityCheck can be assumed to have matching locations. Here, location refers to the elements’ containing parent software elements.
The described algorithm for hierarchical match traversing is generic and can be applied to all software models with unique containment relationships. As mentioned above, the ScopeFilter and ElementTypeFilter provide adaptation points for technology-specific behavior. Furthermore, the BestMatchResource detection provides an adaptation point to consider renaming practices. Those three are straightforward checks of namespaces (e.g., packages), element types, and name mappings, respectively. In contrast, the SimilarityCheck used during the recursive traversing is more complex and depends on the type of software model under study. It is therefore explained further below.