

• Random-walk, GNN, auto-encoder, and proximity-based methods are limited to capturing only k-hop local contexts for nodes.

• None of the SSL methods incorporates all the important priors from the classic SSL assumptions, especially the cluster assumption.

• Structure-aware models are all unsupervised in nature and hence limited by link density and ground-truth network connectivity patterns.

3.7 Proposed Framework: Unified Semi-Supervised Non-Negative Matrix Factorization (USS-NMF)

[Figure 3.5 schematic: the proximity matrix S, the label matrix Y, and the cluster membership matrix H are factorized into the context embedding M, the label embedding Q, the cluster embedding C, and the shared node embedding U.]

Fig. 3.5 Unified Semi-Supervised Non-Negative Matrix Factorization (USS-NMF). N: number of nodes in the graph |V|; m: size of the low-dimensional embedding space; q: number of classes; k: number of clusters. Please refer to Table 2.1 for more details.

In our first contribution, we formulate the clusterability assumption of SSL as a cluster invariance property. We propose a semi-supervised node embedding framework, USS-NMF, that provides a means to enforce this property to learn high-quality, clusterable representations of nodes. The proposed framework also incorporates other essential learning principles of SSL. Figure 3.5 illustrates the USS-NMF framework.

3.7.1 Unified Semi-Supervised NMF (USS-NMF)

We incrementally build the proposed framework here.

3.7.1.1 Encoding local invariance aka network structure

This is a fundamental component of any network embedding model. It allows for learning locally invariant node representations, i.e., connected nodes will have similar representations.

We obtain locally invariant representations by factorizing a proximity matrix that encodes the similarity between the nodes. Herein, we consider the Pointwise Mutual Information matrix S, defined as the average transition probability within a window of size t, in a setup identical to [58]. We consider t = 2, i.e., we take only the first-order and second-order average transition probabilities between a pair of nodes into account [refer to Section 2.1.3 for details]. Since S is positive semi-definite, as the connection weights are non-negative, unlike the formulations in [58, 28] we factorize the proximity matrix S [refer to Section 2.1.2 for details on NMF algorithms] into two non-negative basis matrices of dimension m: the node representation matrix U ∈ R^{m×N} and the context representation matrix M ∈ R^{m×N}, with the objective (loss) function below,

\[
O_{\text{network}} = \min_{M,\,U} \; \lVert S - U^{T} M \rVert^{2} \; : \; M \ge 0,\ U \ge 0 \tag{3.1}
\]
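To make this concrete, the following is a minimal sketch (not the thesis implementation) that builds an average-transition proximity matrix for a toy graph with window size t = 2 and evaluates the reconstruction term of Eqn. 3.1; the adjacency matrix A, the helper names, and the random factors are illustrative assumptions.

```python
import numpy as np

def proximity_matrix(A, t=2):
    """Average transition-probability matrix over window sizes 1..t (cf. Section 2.1.3)."""
    P = A / A.sum(axis=1, keepdims=True)          # 1-step random-walk transition matrix
    return sum(np.linalg.matrix_power(P, i) for i in range(1, t + 1)) / t

def network_loss(S, U, M):
    """O_network = ||S - U^T M||^2 with M, U >= 0 (Eqn. 3.1)."""
    return np.linalg.norm(S - U.T @ M) ** 2

# Toy usage: N = 4 nodes, embedding size m = 2, non-negative random factors.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
S = proximity_matrix(A, t=2)
U, M = rng.random((2, 4)), rng.random((2, 4))
print(network_loss(S, U, M))
```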
3.7.1.2 Encoding supervision knowledge

In order to learn semi-supervised representations, we jointly factorize the label matrix Y along with S. We define the label matrix factorization term in Eqn. 3.2, where W ∈ R^{q×N} is a weight-penalty matrix that zeroes out all label information of the test instances. Specifically, W_i^T equals the all-zeros vector 0_q if the corresponding Y_i^T is unknown, and the all-ones vector 1_q otherwise.

⊙ denotes the Hadamard (element-wise) product. Thus, element-wise multiplication of the label factorization residual (Y − QU) with the penalty matrix W results in zero entries in all columns pertaining to the test node instances. This way, we only incorporate the training-label factorization into our objective function. Q ∈ R^{q×m} is the label basis matrix, and U ∈ R^{m×N} is the common node embedding matrix. The supervision component is defined as follows,

\[
O_{\text{label}} = \min_{Q,\,U} \; \lVert W \odot (Y - QU) \rVert^{2} \; : \; Q \ge 0,\ U \ge 0 \tag{3.2}
\]
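As an illustrative, hypothetical sketch of Eqn. 3.2, the snippet below constructs the penalty matrix W from a train/test split and evaluates the masked factorization loss; the variable names and the toy data are assumptions, not part of the model definition.

```python
import numpy as np

def label_penalty_matrix(train_mask, q):
    """W in R^{q x N}: column i is the all-ones vector for training nodes, all-zeros otherwise."""
    return np.tile(np.asarray(train_mask, dtype=float), (q, 1))

def label_loss(Y, Q, U, W):
    """O_label = ||W ⊙ (Y - QU)||^2 with Q, U >= 0 (Eqn. 3.2)."""
    return np.linalg.norm(W * (Y - Q @ U)) ** 2

# Toy usage: q = 3 classes, N = 4 nodes, m = 2; the last node is an (unlabeled) test node.
rng = np.random.default_rng(0)
Y = np.eye(3)[:, [0, 1, 2, 0]]                    # one-hot labels stored column-wise, shape (3, 4)
W = label_penalty_matrix([1, 1, 1, 0], q=3)
Q, U = rng.random((3, 2)), rng.random((2, 4))
print(label_loss(Y, Q, U, W))
```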


3.7.1.3 Encoding local neighborhood invariance in label space

This component is the classical Laplacian regularizer for label smoothing, LS(S, QU) [refer to Section 2.1.5]. The label smoothing regularizer constrains connected nodes to have similar labels, where QU is the matrix of predicted (reconstructed) labels as defined in Eqn. 3.3. The learning of the reconstructed labels QU is supervised by the network proximity matrix S as below.

\[
O_{LS} = LS(S, \hat{Y}) = LS(S, QU) = \operatorname{Tr}\!\left\{ (QU)\,\Delta(S)\,(QU)^{T} \right\} \tag{3.3}
\]
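A minimal sketch of this regularizer, assuming S is the proximity matrix from above and Δ(S) its unnormalized Laplacian, could look as follows; the helper names are illustrative.

```python
import numpy as np

def unnormalized_laplacian(S):
    """Δ(S) = D(S) - S, with D(S) the diagonal degree matrix of S."""
    return np.diag(S.sum(axis=1)) - S

def label_smoothing_loss(S, Q, U):
    """O_LS = Tr{(QU) Δ(S) (QU)^T} (Eqn. 3.3): connected nodes are pushed toward similar predicted labels."""
    Y_hat = Q @ U                                  # reconstructed labels, shape (q, N)
    return np.trace(Y_hat @ unnormalized_laplacian(S) @ Y_hat.T)
```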

3.7.1.4 Encoding semi-supervised cluster invariance

Here, we define the key component that allows for learning cluster-invariant node representations. Unlike models that enforce Laplacian regularization on the embedding space or the label space, we enforce constraints on the abstract space of clusters. Note that the clusters are not provided a priori. The proposed model learns clusters such that nodes with the same or similar labels are invariant to cluster assignments. This allows the model to learn representations with high label smoothness within the groups. This component primarily comprises two sub-components:

• Learning cluster assignment via orthogonality constraint: Let H ∈ R^{k×N} represent the cluster membership indicator matrix defined for k clusters. We obtain H by projecting the node embeddings U onto the cluster basis C, as H = CU. We can restrict the clusters to have different node assignments, and to overlap as little as possible, via block-diagonal constraints. Here, we resort to blocks of size one by enforcing an orthogonality constraint that encourages every cluster to be different from the others, i.e., we enforce HH^T = I_k, similar to [29]. We obtain cluster assignments as in Eqn. 3.4, where β and ζ are weighting hyper-parameters.

\[
O_{1} = \min_{H,\,C,\,U \ge 0} \; \beta \lVert H - CU \rVert^{2} + \zeta \lVert HH^{T} - I \rVert^{2} \tag{3.4}
\]

• Encoding global context with cluster invariance: We enforce this constraint by applying Laplacian regularization on H with a label-similarity-based proximity matrix E. We define the label similarity network over the training data of G as E = (W ⊙ Y)^T (W ⊙ Y) ∈ R^{N×N}, where Δ(E) = D(E) − E is the unnormalized Laplacian operator on E. With Δ(E), we define the cluster smoothing Laplacian regularizer LS(E, H). The label-based similarity matrix introduces new edges between nodes with similar labels, which may be far apart or not even connected in the original network S. This allows the clusters to encode a global context.

\[
O_{2} = LS(E, H) = \operatorname{Tr}\!\left\{ H\,\Delta(E)\,H^{T} \right\} \tag{3.5}
\]

Putting it all together: O_group = O_1 + φ O_2, where φ is a weighting multiplier. This constraint enables cluster-invariant representations that enforce similar cluster assignments for nodes with the same labels. There is a circular enforcement between H and U: U learns from H, and H implicitly learns from U, while H explicitly learns from the cluster regularization term and the non-overlapping (orthogonality) constraint. Therefore, H pushes two nodes with the same labels and similar neighborhood structures into the same cluster, thereby leading them toward similar embeddings. A small code sketch of these group terms is given below.
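The following is an illustrative sketch of Eqns. 3.4 and 3.5 that assembles O_group = O_1 + φ O_2; E is constructed exactly as defined above, while the function names and weight values are assumptions.

```python
import numpy as np

def cluster_assignment_loss(H, C, U, beta, zeta):
    """O_1 = β ||H - CU||^2 + ζ ||HH^T - I||^2 (Eqn. 3.4)."""
    k = H.shape[0]
    return (beta * np.linalg.norm(H - C @ U) ** 2
            + zeta * np.linalg.norm(H @ H.T - np.eye(k)) ** 2)

def cluster_smoothing_loss(H, W, Y):
    """O_2 = Tr{H Δ(E) H^T} with E = (W ⊙ Y)^T (W ⊙ Y) (Eqn. 3.5)."""
    E = (W * Y).T @ (W * Y)                        # label-similarity network over training nodes
    laplacian = np.diag(E.sum(axis=1)) - E         # Δ(E) = D(E) - E
    return np.trace(H @ laplacian @ H.T)

def group_loss(H, C, U, W, Y, beta, zeta, phi):
    """O_group = O_1 + φ O_2."""
    return cluster_assignment_loss(H, C, U, beta, zeta) + phi * cluster_smoothing_loss(H, W, Y)
```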

3.7.1.5 USS-NMF Model

In USS-NMF, the node representations U are learned by jointly factorizing the local neighborhood proximity matrix S, the label matrix Y, and the inferred cluster assignment matrix H, and they are indirectly influenced by smoothing on the label and cluster spaces. The joint objective is given below,

\[
O = \alpha\, O_{\text{network}} + \theta\, O_{\text{label}} + \delta\, O_{LS} + O_{\text{group}} + \lambda\, L_{2\mathrm{reg}} \tag{3.6}
\]
where α, β, θ, ζ, φ, δ, λ are hyper-parameters controlling the importance of the respective terms in Equations 3.7–3.11. Since the joint non-negativity-constrained objective is not convex, we iteratively solve the convex sub-problem for each of the factors M, U, C, Q, and H individually. Next, we derive multiplicative update rules for these factors following the methodology described in [53].
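Assuming the component losses sketched in the previous subsections (network_loss, label_loss, label_smoothing_loss, group_loss), the joint objective of Eqn. 3.6 could be assembled as in the hypothetical snippet below; the L2 regularizer and the hyper-parameter dictionary are illustrative assumptions.

```python
import numpy as np

def l2_reg(*factors):
    """Frobenius-norm regularizer over all factor matrices."""
    return sum(np.linalg.norm(F) ** 2 for F in factors)

def uss_nmf_objective(S, Y, W, M, U, C, Q, H, hp):
    """O = α O_network + θ O_label + δ O_LS + O_group + λ L2reg (Eqn. 3.6);
    `hp` holds the hyper-parameters α, β, θ, ζ, φ, δ, λ."""
    return (hp["alpha"] * network_loss(S, U, M)
            + hp["theta"] * label_loss(Y, Q, U, W)
            + hp["delta"] * label_smoothing_loss(S, Q, U)
            + group_loss(H, C, U, W, Y, hp["beta"], hp["zeta"], hp["phi"])
            + hp["lambda"] * l2_reg(M, U, C, Q, H))
```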

3.7.1.6 Derivation of multiplicative update rules

Let ψ_1, ψ_2, ψ_3, ψ_4, ψ_5 be the Lagrange multipliers for the non-negativity constraints on the factor matrices M, U, C, Q, H, respectively. USS-NMF's loss function in Eqn. 3.6 can be expanded and rewritten with the Lagrange multipliers as follows:

\[
\begin{aligned}
\mathcal{L} ={}& \alpha\operatorname{Tr}\!\big[ SS^{T} - 2SM^{T}U + U^{T}MM^{T}U \big]
+ \beta\operatorname{Tr}\!\big[ HH^{T} - 2HU^{T}C^{T} + CUU^{T}C^{T} \big] \\
&+ \theta\operatorname{Tr}\!\big[ W \odot \{ YY^{T} - 2YU^{T}Q^{T} + QUU^{T}Q^{T} \} \big]
+ \zeta\operatorname{Tr}\!\big[ HH^{T}HH^{T} - 2HH^{T} + I \big] \\
&+ \varphi\operatorname{Tr}\!\big\{ H\,\Delta(E)\,H^{T} \big\}
+ \delta\operatorname{Tr}\!\big\{ (QU)\,\Delta(S)\,(QU)^{T} \big\} \\
&+ \lambda\operatorname{Tr}\!\big( MM^{T} + QQ^{T} + CC^{T} + UU^{T} + HH^{T} \big)
+ \operatorname{Tr}\!\big[ \psi_{1}M^{T} + \psi_{2}U^{T} + \psi_{3}C^{T} + \psi_{4}Q^{T} + \psi_{5}H^{T} \big]
\end{aligned}
\]
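For orientation only, the snippet below recalls the classic multiplicative updates for the plain network term O_network in isolation (standard NMF); the actual USS-NMF update rules, which must also account for the label, smoothing, cluster, and regularization terms above, are the subject of the derivation in this subsection and are not reproduced here.

```python
import numpy as np

def nmf_multiplicative_updates(S, m, n_iter=200, eps=1e-12, seed=0):
    """Classic multiplicative updates for min ||S - U^T M||^2 with M, U >= 0.
    Each additional USS-NMF term contributes further factors to the numerators/denominators."""
    N = S.shape[0]
    rng = np.random.default_rng(seed)
    U, M = rng.random((m, N)), rng.random((m, N))
    for _ in range(n_iter):
        M *= (U @ S) / (U @ U.T @ M + eps)         # M <- M ⊙ (U S) / (U U^T M)
        U *= (M @ S.T) / (M @ M.T @ U + eps)       # U <- U ⊙ (M S^T) / (M M^T U)
    return U, M
```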
