We first present the underlying framework connecting generalization of voting fusion approaches and MS-STAPLE. Consider a target image with π voxels and intensity values π° β βπx1 with a set of π registered atlases, associated intensities π¨ β βπxπ , and registered labels π« β ππxπ . Let π·ππ be the decision of atlas π at voxel π. In the case of multiple sets of labeling protocols, β corresponds to an arbitrary set of labels, that is to say that the numerical label denotations between the atlases do not necessarily correspond to the same anatomical structure. Lastly, let π³ = {πΏ1, πΏ2, β¦ , πΏΟ}, where πΏπ corresponds to the number of labels in atlas class π and π corresponds to the total number of different labeling protocols used with a vector π β π°π x1 mapping each rater to its atlas class. The objective of label fusion is to estimate πΜ, a voxel discrete mapping of the target image. Figure 1 illustrates the rater models from the following sections.
2.1. Generalized Majority, Locally Weighted, and Non-Locally Weighted Vote
In standard majority vote (MV), the probability of label π at voxel π, πππ,π(π |π·), is empirically determined as the fraction of atlases that observe π at voxel π. We may generalize MV to include atlases (raters) of different protocols by:
ππΊππ,π(π |π·) =1
π β ππ(π |π·ππ, ππ)
π
(1)
26
where ππ(π |π·ππ, ππ) is discrete probability of the co-occurrence across labeling protocols of registered atlases (which can be empirically estimated as seen in Β§2.4). Henceforth, for convenience, we simplify the co-occurrence probability to be spatially invariant and drop the subscript. Generalized majority vote (GMV) is thus πΜ = argmaxπΊππ,π
π ππΊππ,π(π |π·). Similarly, locally weighted vote [84] can be framed around the co-occurrence probability as:
ππΏππ,π(π |π·, π, π΄, πΌ) =1
πβ ππ(π |π·ππ, ππ, π΄ππ, πΌπ)
π
=1
πβ π(π΄ππ|πΌπ)π(π |π·ππ, ππ)
π
(2)
assuming marginal independence of image intensity values and observed labels, where π is a partition function normalizing distribution to a valid probability distribution and π(π΄ππ|πΌπ) is the likelihood of observing the intensity value of atlas π at voxel π. Hence, πΜ =πΏππ,π argmax
π ππΏππ,π(π |π·π, π, π΄, πΌ).
Moreover, non-locally weighted voting (NLWV) [96, 97] can be reframed with πππΏππ,π(π |π·, π, π΄, πΌ) = 1
πβ βπ β²βπΏ ππ,π(π β²|π·, π, π΄, πΌ)π(π |π β², ππ)
π ππ with π β² as the latent
correspondence and ππ,π(π β²|π·, π, π΄, πΌ) as the likelihood of atlas π observing π β² at π. Note that the latent corresponding label, π β², occupies the role of the atlas labels in the NLWV model, and πππΏππ,πΜ = argmax
π πππΏππ,π(π |π·π, π, π΄, πΌ).
Note that these formulations reduce to their classical definitions if all of the atlases are of the target class (i.e., co-occurrence matrix are 1:1) or the co-occurrence probabilities of non-target classes are uniform (i.e., other label protocols are uninformative and, hence, ignored).
27 2.2. Multi-Set STAPLE (MS-STAPLE)
STAPLE label fusion maintains a confusion matrix π for each atlas [91]. Each confusion matrix entry πππ β²π β‘ π(π·ππ = π β²|ππ = π ) presents a discrete probability distribution where π β²
corresponds to the label observed and π corresponds to a latent true segmentation. Consider segmentation with π labeling protocols, each with its own confusion matrix such that the likelihood of observing a label is π(π·ππ = π β²|ππ = π, {ππ}) where π β² is the observed later by rater π at voxel π, π is a set of true labels of size 1xπ observed at π, and {ππ} is a set of π confusion matrices for atlas π where ππππ β²π corresponds to the likelihood that label π β² observed by rater π given that label π of set π is the true label. Note that πππ is a possibly non-square matrix where πππ is of size πΏππxπΏπ. This is the core of MS-STAPLE.
To estimate the data likelihood, we follow [93] to capture dependence between protocols through a geometric mean:
π(π·ππ = π β²|ππ = π, {ππ}) = (β π(π·ππ = π β²|πππ = ππ, πππ)
πππ
)
πΌππ
= (β ππππ β²ππ
πππ
)
πΌππ
(3)
where πππ maintains βπ β²βπΏ (βπβπππππ β²ππ)πππ = 1
ππ , thus maintaining a valid probability distribution. Note that [93] captured joint information across specific hierarchical protocols, while here we are using it to normalize across relationships that must be estimated assuming conditional independence between sets of labels.
28
Given this model, we can apply expectation maximization (EM). Briefly, let π β ββπβππΏπxπ where πππ(π)β‘ π(ππ = π|π·, {π}(π)) is the probability that the true label set observed at voxel π during iteration π is π. Using a Bayesian expansion and the assumed conditional independence between atlases,
πππ(π) = π(ππ = π) βπππ π (π·ππ = π β²|ππ = π, {ππ(π)})
β π(ππβ² π = πβ²) βπππ π (π·ππ = π β²|ππ = πβ², {ππ(π)}) (4) where π(ππ = π) is a voxelwise a priori distribution of the underlying segmentation. The denominator corresponds to a partition function normalizing πΎ to a valid probability distribution.
Substituting (3) in (4) yields the MS-STAPLE E-Step,
πππ(π) =
π(ππ = π) β (β ππππ β²ππ (π) πππ )πππ
πππ
β π(ππ = πβ²) β (β ππππ β²π
πβ² (π) πππ )πππβ²
πππ
πβ² (5)
To estimate the performance parameters, maximize the expected value of the conditional log-likelihood function. We follow the traditional M-Step expansion:
πππ(π+1) = argmax
πππ
β πΈ[ln(π(π·ππ|ππ, {ππ})| π·, {ππ}]
π
= argmax
πππ
β β β πππ(π)ln (β ππππ β²ππ (π)
πππ
)
πΌππ(π)
π π:π·ππ=π β² π β²
= argmax
πππ
β β β πππ(π)Ξ±jl(k)β ln (ππππ β²ππ (π) )
πππ π
π:π·ππ=π β² π β²
(6)
29
where π: π·ππ = π β² corresponds to the voxels where π·ππ = π β². To constrain each row of the confusion matrix to be a valid probability distribution (β ππ β² πππ β²π = 1), we differentiate by each element and use a Lagrange Multiplier (π):
0 = π
πππππ β²π [β β β πππ(π)πΌππ(π)β ln (ππππ β²ππ (π) )
πππ π
π:π·ππ=π β² π β²
+ π β ππππ β²π (π) π β²
]
0 = β β (πππ(π)πππ(π) ππππ β²π )
π:ππ=π π:π·ππ=π β²
+ π
βπππππ β²π (π+1)
= ( β β πππ(π)πππ(π)
π:ππ=π π:π·ππ=π β²
)
π(π+1)πππ β²π = (βπ:π·ππ=π β²βπ:ππ=π πππ(π)πππ(π)) (β βπ π:ππ=π πππ(π)πππ(π))
(7)
where π: ππ = π corresponds to label sets where the voxel of atlas set π is π .
2.3. Simplified MS-STAPLE (SMS-STAPLE) and Non-Local-SMS-STAPLE
Empirically, we have found the fully parameterized MS-STAPLE model (Β§2.2) less numerically stable than one would desire. Here, we present a simplified model to improve stability with limited data. In place of {ππ} (with βππΏπππ±ππΏπ degrees of freedom), consider πΜπ, a πΏππΓ πΏπ‘ confusion matrix with π‘ to be the index of the target label. Each element of πΜππ β²π corresponds to the probability rater π observes label π β² β πΏππ given that the true label is π β πΏπ‘. Note that each atlas has confusion matrix with the number of rows dependent on the labeling protocol, but the
30
number columns matching the target protocol. The degree of freedom of πΜ is βππΏπππΏπ‘, a possibly dramatic reduction from the Β§2.2 model by assuming conditional independence.
With Simplified Multi-Set STAPLE model (i.e., SMS-STAPLE), the EM update equations are found to be:
ππ π(π)= π(ππ = π ) β π πΜππ β²π (π)
β π(ππ = π β²β²) β πΜππ β²π β²β²
π (π)
π β²β² (8)
for the E-Step, and
πΜππ β²π (π+1)
=(βπ:π·ππ=π β²ππ π(π))
(β ππ π π(π)) (9)
for the M-Step.
Following [97], we derive the EM update equations to incorporate non-local correspondence in the SMS-STAPLE model (herein SMS-Non-Local STAPLE) as:
ππ π(π)= π(ππ = π ) β β πΜππ β²π (π)πππβ²π
πβ²ββ΅π(π) π
β π(ππ = π β²β²) β β πΜππ β²π β²β²
(π) πππβ²π
πβ²ββ΅π(π) π
π β²β² (10)
for the E-Step where β΅π(π) is the spatial neighborhood around voxel π and πππβ²π is the likelihood of correspondence between atlas π at voxel πβ² and the target at voxel π. The M-Step follows as:
πΜππ β²π (π+1)
=
β (βπβ²ββ΅π(π):π· πππβ²π
πβ²π=π β² ) ππ π(π)
π
β ππ π π(π) (11)
31
2.4. Modeling Co-Occurrence Probability and Initialization
For the voting approaches (Β§2.1), the co-occurrence probability definitions were calculated empirically as
π(π |π β², π) =βπ:ππ=πββ:πβ=π‘βπ:π·ππ=π β²πΏ(π·πβ, π )
βπ:ππ=πβ πΏ(π·π ππ, π β²)
(12) where π is the latent label, π β² is the observed label, and π is the index of the label set from which π β² is drawn. Note that this model is conditionally independent of the true label. For the MS- STAPLE approaches (Β§2.2) were initialized by:
π(0)πππ β²π = ππ
πππ β²π
(0) =βπ:ππ=πββ:πβ=π‘βπ:π·ππ=π β²πΏ(π·πβ, π )
βπ:ππ=π‘β πΏ(π·π ππ, π )
(13)
For the SMS-STAPLE approaches (Β§2.3), initial confusion matrixes were:
πππ β²π (0) = ππ
ππ β²π
(0) =βπ:ππ=πββ:πβ=π‘βπ:π·ππ=π β²πΏ(π·πβ, π )
βπ:ππ=π‘β πΏ(π·π ππ, π )
(14) Convergence of EM was detected when the average change in πΜ from π to π + 1 was less than π = 10β6, specifically:
1
πβ β β πππ (πΜππ β²π (π+1)
β πΜππ β²π (π))
π βπΏπ‘ π β²βπΏππ
π
(155)
where π is the total number of elements in all of the confusion matrices (i.e. β πΏπ ππxπΏπ‘).