ConceptNet - A Programming Language for Parallel Graph Algorithms

from its input edges, and adds ⌈D(v)⌉ to its output edges.

Section 4.5 compares the performance of GRAPAL against a sequential implementation in Figure 4.3. In both the sequential and the GRAPAL implementations, most of the work is edge relaxation, with one edge relaxation per active edge per iteration. Compared to the sequential implementation, the GRAPAL implementation has extra overhead for node iteration and barrier synchronization. For node iteration, the GRAPAL implementation visits every node on each iteration, while the sequential implementation visits only the active nodes, those that were updated on the previous iteration. The GRAPAL implementation must also perform a barrier synchronization on each iteration, or graph-step. Figure 4.3 sorts graphs from smallest to largest and shows that the largest graphs get greater speedups due to lower relative overhead.
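To make the "only active nodes" behaviour of a sequential implementation concrete, the following is a minimal sketch of a worklist Bellman-Ford in C. It is not the implementation measured in Figure 4.3; the `Edge` layout, the `first_edge` index array and the function name `bellman_ford_fifo` are assumptions made for illustration, and the sketch assumes integer weights with no reachable negative-weight cycle.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical edge record; edges are grouped by their source node. */
typedef struct { int from, to, weight; } Edge;

/* Worklist Bellman-Ford: only nodes whose distance changed are revisited.
 * first_edge has n_nodes + 1 entries; the out-edges of node v are
 * edges[first_edge[v]] .. edges[first_edge[v + 1] - 1].
 * Returns the number of edge relaxations attempted, or -1 if allocation fails. */
long bellman_ford_fifo(int n_nodes, const Edge *edges, const int *first_edge,
                       int source, int *dist) {
    assert(n_nodes > 0 && source >= 0 && source < n_nodes);
    int *queue = malloc((size_t)n_nodes * sizeof *queue);   /* circular FIFO */
    bool *queued = calloc((size_t)n_nodes, sizeof *queued); /* node is in FIFO */
    if (!queue || !queued) { free(queue); free(queued); return -1; }

    for (int v = 0; v < n_nodes; v++) dist[v] = INT_MAX;
    dist[source] = 0;

    int head = 0, count = 1;
    long relaxations = 0;
    queue[0] = source;
    queued[source] = true;

    while (count > 0) {
        int u = queue[head];                 /* pop the oldest active node */
        head = (head + 1) % n_nodes;
        count--;
        queued[u] = false;

        for (int e = first_edge[u]; e < first_edge[u + 1]; e++) {
            int v = edges[e].to;
            relaxations++;                   /* one relaxation per active edge */
            if (dist[u] + edges[e].weight < dist[v]) {
                dist[v] = dist[u] + edges[e].weight;
                if (!queued[v]) {            /* v becomes active */
                    queue[(head + count) % n_nodes] = v;
                    count++;
                    queued[v] = true;
                }
            }
        }
    }
    free(queue);
    free(queued);
    return relaxations;
}
```

Because a node enters the FIFO at most once at a time, a queue of `n_nodes` slots is sufficient. There is no per-iteration pass over all nodes and no barrier, which is the overhead difference the comparison above describes.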

activation. The nextact function is used to combine activations by the binary reduce and by the activation state update.
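As an illustration of why one function can serve both the reduce and the state update, here is a small real-valued sketch of the formula that nextact uses in Figure 4.1, act1 + act2 - act1 * act2. The names `nextact_real` and `combine_activations` are hypothetical, and double precision stands in for the fixed-point type used by the GRAPAL program.

```c
#include <assert.h>

/* Real-valued counterpart of nextact from Figure 4.1 (hypothetical name).
 * For inputs in [0,1] the exact result is 1 - (1 - act1) * (1 - act2),
 * so it also lies in [0,1]; the clamp only guards against rounding. */
double nextact_real(double act1, double act2) {
    assert(act1 >= 0.0 && act1 <= 1.0);
    assert(act2 >= 0.0 && act2 <= 1.0);
    double r = act1 + act2 - act1 * act2;
    return r > 1.0 ? 1.0 : r;
}

/* Fold n incoming activations left to right, starting from 0,
 * which is the identity element of nextact_real. */
double combine_activations(const double *acts, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total = nextact_real(total, acts[i]);
    }
    return total;
}
```

In exact arithmetic the operation is commutative and associative, so the order in which a reduce tree pairs up messages does not change the result. For example, combining 0.25 and 0.5 gives 0.625 in either order.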

All activations, weights and discounts are in the range [0,1]. In order to use FPGA logic efficiently we use fixed-point arithmetic, representing numbers with 1 integer bit and 8 fractional bits. GRAPAL currently does not have a fixed-point type, so we use the type unsigned<9> and define mult_fixed_point for multiplication.
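The fixed-point format can be sketched in C as follows. This mirrors the helper functions of Figure 4.1 under a few assumptions: a `uint16_t` stands in for unsigned<9>, the type name `fixed9` is invented here, and `to_fixed_point` (which the controller calls but the source does not list) is a hypothetical version that rounds to the nearest 1/256.

```c
#include <assert.h>
#include <stdint.h>

enum { FRAC_BITS = 8, ONE_FIXED = 1 << FRAC_BITS };  /* 1.0 is stored as 256 */

typedef uint16_t fixed9;  /* stand-in for unsigned<9>; valid values are 0..256 */

fixed9 one_fixed_point(void) {
    return (fixed9)ONE_FIXED;
}

/* Hypothetical conversion: rounds x in [0,1] to the nearest multiple of 1/256. */
fixed9 to_fixed_point(float x) {
    assert(x >= 0.0f && x <= 1.0f);
    return (fixed9)(x * ONE_FIXED + 0.5f);
}

/* The product of two values <= 256 is at most 65536, which fits in 17 bits;
 * shifting right by 8 drops the extra fractional bits (truncation). */
fixed9 mult_fixed_point(fixed9 x, fixed9 y) {
    assert(x <= ONE_FIXED && y <= ONE_FIXED);
    return (fixed9)(((uint32_t)x * y) >> FRAC_BITS);
}

/* act1 + act2 - act1 * act2, which never exceeds 256 for inputs in 0..256. */
fixed9 nextact(fixed9 act1, fixed9 act2) {
    return (fixed9)(act1 + act2 - mult_fixed_point(act1, act2));
}
```

With this encoding 0.5 is 128, so `mult_fixed_point(128, 128)` yields 64 (0.25) and `nextact(128, 128)` yields 192 (0.75). The single integer bit exists only so that the value 1.0 itself is representable.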

The sequential controller iterates spreading activation by first broadcasting to nodes with bcast_start_spreading_activation() and then issuing step() commands.

Before iterating, the query procedure (spreading_activation) must first set the source nodes and set the edge weights. Source nodes are set to have an initial activation of 1 with set_source and other nodes are set to 0 with clear_sources. Edges are weighted by their relation types by set_edge_weights, which is first broadcast to nodes then forwarded to successor edges. After each initializing broadcast command, a step() command is issued to perform the graph-step that makes the changes to node and edge state.

To test ConceptNet, we used a small version of the ConceptNet semantic network (cnet small), which has 15,000 nodes and 27,000 edges. Our tests run spreading activation for 8 iterations.

Figure 4.3 shows that the speedup of the GRAPAL implementation of ConceptNet over the sequential implementation is 7 times per chip. The relation between the GRAPAL implementation and the sequential implementation of ConceptNet is analogous to the two implementations of Bellman-Ford: For both implementations, most work is for operations on active edges. The sequential implementation keeps a FIFO of active nodes so edges are sparsely active and work is performed for only the active edges. There is overhead for the GRAPAL implementation due to iterating over all nodes, rather than just active nodes, and due to the cost of the barrier synchronization. Larger graphs generally have less relative overhead due to barrier synchronization. The ConceptNet graph is larger than the Bellman-Ford graphs so the ConceptNet speedup is better than the Bellman-Ford speedup.

global Glob {
  out Node nodes;

  // query initialization
  bcast clear_sources() nodes.clear_sources;
  bcast set_source(int<20>) nodes.set_source;
  bcast set_edge_weights(boolean, int<6>, unsigned<9>) nodes.set_edge_weights;

  // query computation
  bcast start_spreading_activation() nodes.start_spreading_activation;
}

node Node {
  out Edge edges;
  boolean is_source;
  int<20> idx;
  unsigned<9> discount, act;

  // query initialization
  send clear_sources() {
    is_source = false;
    act = 0;
  }
  send set_source(int<20> set_idx) {
    if (idx == set_idx) is_source = true;
  }
  send set_edge_weights(boolean all_types, int<6> type, unsigned<9> weight) {
    edges.set_weight(all_types, type, weight);
  }

  // query computation
  send start_spreading_activation() {
    if (is_source) {
      edges.prop(one_fixed_point());
      act = one_fixed_point();
    }
  }
  reduce tree update(unsigned<9> act1, unsigned<9> act2) {
    return nextact(act1, act2);
  }
  send update(unsigned<9> more_act) {
    edges.prop(mult_fixed_point(more_act, discount));
    act = nextact(act, more_act);
  }
}

edge Edge {
  out Node to;
  int<6> type;
  unsigned<9> weight;

  // query initialization
  fwd set_weight(boolean all_types, int<6> set_type, unsigned<9> new_weight) {
    if (all_types || type == set_type) weight = new_weight;
  }

  // query computation
  fwd prop(unsigned<9> act) {
    to.update(mult_fixed_point(act, weight));
  }
}

unsigned<9> nextact(unsigned<9> act1, unsigned<9> act2) {
  return act1 + act2 - mult_fixed_point(act1, act2);
}

unsigned<9> one_fixed_point() {
  return ((unsigned<9>) 1) << 8;
}

unsigned<9> mult_fixed_point(unsigned<9> x, unsigned<9> y) {
  return (((unsigned<17>) x) * y) >> 8;
}

Figure 4.1: ConceptNet’s spreading activation in GRAPAL
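To help read Figure 4.1, the following is a rough sequential model in C of what the node and edge methods compute, one graph-step at a time. It is only a sketch and not part of GRAPAL or its runtime: the `Graph` struct, the fixed `MAX_NODES` bound, the flat edge array, and the use of double instead of unsigned<9> are all assumptions made to keep the example short. Here `start_spreading_activation` stands for the broadcast together with the graph-step that processes it, and each later call to `graph_step` models one further step() of the controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { MAX_NODES = 16 };                     /* arbitrary bound for the sketch */

typedef struct { int from, to; double weight; } Edge;

typedef struct {
    int n_nodes, n_edges;
    const Edge *edges;
    double discount[MAX_NODES];
    double act[MAX_NODES];
    double inbox[MAX_NODES];                 /* reduced messages awaiting update */
    bool has_msg[MAX_NODES];
} Graph;

static double nextact(double a, double b) { return a + b - a * b; }

/* Models edges.prop(value) on node v: each out-edge scales by its weight
 * and the destination folds arriving messages together with nextact. */
static void send_prop(const Graph *g, int v, double value,
                      double *inbox, bool *has_msg) {
    for (int e = 0; e < g->n_edges; e++) {
        if (g->edges[e].from != v) continue;
        int to = g->edges[e].to;
        double msg = value * g->edges[e].weight;
        inbox[to] = has_msg[to] ? nextact(inbox[to], msg) : msg;
        has_msg[to] = true;
    }
}

/* Clears all activations, then lets every source node send 1 to its edges. */
void start_spreading_activation(Graph *g, const bool *is_source) {
    assert(g->n_nodes <= MAX_NODES);
    for (int v = 0; v < g->n_nodes; v++) {
        g->act[v] = 0.0;
        g->inbox[v] = 0.0;
        g->has_msg[v] = false;
    }
    for (int v = 0; v < g->n_nodes; v++) {
        if (!is_source[v]) continue;
        send_prop(g, v, 1.0, g->inbox, g->has_msg);
        g->act[v] = 1.0;
    }
}

/* One graph-step: every node holding a message runs its update method. */
void graph_step(Graph *g) {
    double next_inbox[MAX_NODES] = { 0 };
    bool next_has_msg[MAX_NODES] = { false };
    for (int v = 0; v < g->n_nodes; v++) {
        if (!g->has_msg[v]) continue;
        double more_act = g->inbox[v];
        send_prop(g, v, more_act * g->discount[v], next_inbox, next_has_msg);
        g->act[v] = nextact(g->act[v], more_act);
    }
    memcpy(g->inbox, next_inbox, sizeof next_inbox);
    memcpy(g->has_msg, next_has_msg, sizeof next_has_msg);
}
```

Messages produced during a step are written to a separate buffer and only become visible in the next step, which models the barrier between graph-steps. The loop in `graph_step` scans all nodes rather than a list of active ones, which corresponds to the node-iteration overhead discussed for the GRAPAL implementation.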

void spreading_activation(int n_source_nodes, int* source_node_idxs,
                          float default_weight, int n_rel_weights,
                          int* rel_weight_types, float* rel_weights,
                          int n_iterations) {
  int i;

  // initialize source nodes
  bcast_clear_sources();
  step();
  for (i = 0; i < n_source_nodes; i++) {
    bcast_set_source(source_node_idxs[i]);
    step();
  }

  // initialize relation weights: default for all types first, then per-type overrides
  bcast_set_edge_weights(true, -1, to_fixed_point(default_weight));
  step();
  for (i = 0; i < n_rel_weights; i++) {
    bcast_set_edge_weights(false, rel_weight_types[i], to_fixed_point(rel_weights[i]));
    step();
  }

  // run spreading activation iterations
  bcast_start_spreading_activation();
  for (i = 0; i < n_iterations; i++) {
    step();
  }
}

Figure 4.2: Sequential controller for ConceptNet’s spreading activation in C
