Homogeneous, Concurrent Architecture

Introduction

Homogeneous Machines

It is clear that the communication capability of the network must be high to support the execution of interesting problems. The definition of a homogeneous machine makes the accessibility of the state of the machine a function of distance.

Related Efforts

Sequential languages, by definition, are unable to take advantage of the concurrency available in a homogeneous machine. The Apiary machine and the actor programming model are aimed at a general class of artificial intelligence problems.

Scope and Outline

Computations occur in the leaves of the hierarchy, and the concurrent components of the computation are created and destroyed at the leaves, expanding and shrinking the logic graph of the computation. The actor model of programming has many similarities with object-oriented programming, and the Apiary of Hewitt [Hewitt80] is centered around these concepts.

Conclusion

Examples of the SELECT statement can be seen in the Gaussian elimination example in a subsequent section. The size of the crossbar switch is 1+log2N by 1+log2N, where N is the number of processing nodes in the system.
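The crossbar dimension stated above can be checked with a short calculation. This is a minimal sketch, assuming N is a power of two; the function name `crossbar_ports` is hypothetical and not taken from the thesis.

```python
def crossbar_ports(num_nodes: int) -> int:
    """Ports per side of the crossbar, 1 + log2(N), for N a power of two."""
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes is assumed to be a power of two")
    # For a power of two, bit_length() - 1 equals log2(N) exactly.
    return 1 + (num_nodes.bit_length() - 1)


if __name__ == "__main__":
    for n in (2, 64, 1024):
        side = crossbar_ports(n)
        print(f"N = {n}: crossbar is {side} by {side}")
```

The switch grows only logarithmically with the system: going from 64 to 1024 nodes raises the crossbar from 7 by 7 to 11 by 11.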

Concurrent, Object-Oriented Programming

Introduction

Many of the concepts of object-oriented programming have appeared in other languages, such as the Actor System [Clinger81]. A twist is applied to the torus in each dimension to increase the duty cycle of individual processors.

Overview of Simula

Class attributes are called using the name of a reference variable followed by the attribute name separated by a dot. Code scope in a subclass includes the corresponding block level in the superclass.

When a virtual attribute of an object is invoked, the attribute definition of the lowest subclass to which the object belongs is used. The FIFOs or queues attribute ensures mutual exclusion between the various attributes of the object.
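The virtual attribute rule can be illustrated outside Simula. The sketch below uses Python, where every method behaves like a Simula virtual attribute; the class names are hypothetical and chosen only for illustration.

```python
class Shape:
    def describe(self) -> str:
        # Plays the role of a virtual attribute declared in the superclass.
        return "shape"

    def report(self) -> str:
        # Code written at the superclass level still reaches the
        # definition in the lowest subclass of the actual object.
        return "I am a " + self.describe()


class Polygon(Shape):
    def describe(self) -> str:
        return "polygon"


class Square(Polygon):
    def describe(self) -> str:
        return "square"


class Triangle(Polygon):
    # No redefinition here, so the nearest definition above (Polygon) applies.
    pass


if __name__ == "__main__":
    print(Square().report())
    print(Triangle().report())
```

A `Square` uses its own definition, while a `Triangle`, which defines none, falls back to the definition in `Polygon`, the lowest class in its chain that supplies one.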

Concurrent Programming Examples

The volume, index content, locality, and rate of message traffic on the system all affect the number of tbs. The memories of processing nodes should not directly limit the number of objects that can exist in the system at the same time.

Comparison with CSP

Conclusions

The probability of deadlock in other networks is a function of the size, queue, topology, and message traffic in the system. That is, the number of connections to be made to each node in the network is a function of the number of nodes.

Garbage Collection

Introduction

The problem of garbage collection has been an interesting problem for many years among the implementers of various languages such as Algol 68 [Van Wijngaarden69], Simula 67 [Birtwhistle73] and LISP [McCarthy60] that allow for dynamic allocation of data structures. Garbage collection has been a minor part of operating systems for some time, but gained new importance with the implementation of Hydra on C.mmp [Wulf72,80]. Moreover, the problem of garbage collection in pure LISP is somewhat more limited than the more general case of Simula [Arnborg72], insofar as LISP objects are of fixed size and LISP data structures can be of a limited topology.

However, the interactions between the system's processors introduce new problems, such as their synchronization when they need to perform tasks such as garbage collection.

The Object-Oriented Environment

A Description of the Algorithm

All the garbage collection tasks are told to uncheck their objects and reference variables. Some of the message sequences in the above loop can be concatenated into single messages, but are separated for clarity. One important part of the algorithm cannot be represented as part of a message sending routine.

Thus, only the sense of the tag bits needs to be changed for the system to be considered cleared.
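The idea of changing the sense of the tag bits instead of resetting each one can be sketched as follows. This is an illustrative model only, not the thesis implementation: the class `MarkHeap` and its method names are hypothetical, and the sketch assumes that every object surviving a collection is marked when the sense is flipped.

```python
class MarkHeap:
    """Toy heap whose mark bits are interpreted relative to a global sense."""

    def __init__(self) -> None:
        self.marked_sense = True  # tag value that currently means "marked"
        self.tags = {}            # object id -> tag bit

    def allocate(self, obj_id: str) -> None:
        # New objects start unmarked under the current sense.
        self.tags[obj_id] = not self.marked_sense

    def mark(self, obj_id: str) -> None:
        self.tags[obj_id] = self.marked_sense

    def is_marked(self, obj_id: str) -> bool:
        return self.tags[obj_id] == self.marked_sense

    def sweep(self) -> list:
        # Remove every object left unmarked and return the collected ids.
        dead = [o for o in self.tags if not self.is_marked(o)]
        for o in dead:
            del self.tags[o]
        return dead

    def clear_all(self) -> None:
        # Constant time: reinterpret the tags instead of rewriting them.
        self.marked_sense = not self.marked_sense


if __name__ == "__main__":
    heap = MarkHeap()
    heap.allocate("a")
    heap.allocate("b")
    heap.mark("a")
    print(heap.sweep())         # objects collected in this pass
    heap.clear_all()
    print(heap.is_marked("a"))  # survivor now reads as unmarked
```

After the sweep only marked objects remain, so flipping the sense leaves all of them unmarked for the next cycle without visiting any object.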

Proof

Objects marked and subsequently trashed will not be collected in the next collection phase. Messages sent in a processor that is not in the mark phase may contain unmarked pointers. In the process of marking objects, "ReceivedMark" messages are sent to objects referenced in other processors.

To track completion, the "StartInterval" and "EndInterval" messages are used to delineate a time period shared by all the processors in the system.

Simulation Results

After the initial set of objects is created in the simulated system, a random set of 40% of the total set of objects is set to be "executable". The objects on the list are left in place in the system and are simply noted for subsequent reference. After the collection pass, the system is stopped again and the objects collected by the algorithm are compared to those noted in the list.

The "notch" in the lifetime histogram at 2 seconds is due to the cycle time of the collection process.

Performance Analysis

If N is defined as the total number of objects, then O(N) is an upper bound for the complexity of the second pass. Assuming the worst case where no object is garbage, the total complexity for both passes is O(2N + Npt). Further, if the average number of pointers in an object is M, then we have O(N(2 + M)). For now, it is possible to specify a number that is the average number of such scans required for each complete cycle of the garbage collection task.
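The step from the first bound to the second can be made concrete with a small count. This is a sketch under two assumptions that are not spelled out in the text: Npt is read as the total number of pointers in the system, and each object visit or pointer visit is counted as one unit of work. The function names are hypothetical.

```python
def collection_steps(num_objects: int, total_pointers: int) -> int:
    """Work behind the 2N + Npt bound: two visits per object, one per pointer."""
    return 2 * num_objects + total_pointers


def collection_steps_avg(num_objects: int, avg_pointers: float) -> float:
    """Same count written as N(2 + M), using Npt = N * M."""
    return num_objects * (2 + avg_pointers)


if __name__ == "__main__":
    n, m = 1000, 3
    print(collection_steps(n, n * m))
    print(collection_steps_avg(n, m))
```

With N = 1000 objects and an average of M = 3 pointers each, both forms give the same count, which shows that the second bound is only a substitution of Npt = N * M into the first.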

Again, the time and bandwidth used to send such messages will have a detrimental effect on system performance.

Scaling of the Algorithm

Each link can then be associated with that part of the address that differs between the nodes it connects. With traffic 3, most messages can be routed to two links in a TREE regardless of its size. Again, the Boolean N-cube variants show increasing performance as the system gets larger.
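The association between links and address bits in a Boolean N-cube can be sketched in a few lines. This example assumes that node addresses are integers whose binary digits are the cube coordinates, and it routes by correcting the differing bits from the lowest dimension upward; that ordering is an assumption made for illustration and is not claimed to be the routing rule used in the simulations.

```python
def differing_dimensions(src: int, dst: int) -> list:
    """Dimensions (bit positions) in which two node addresses differ."""
    diff = src ^ dst
    return [d for d in range(diff.bit_length()) if (diff >> d) & 1]


def dimension_ordered_route(src: int, dst: int) -> list:
    """Nodes visited when each differing bit is corrected in turn.

    Each hop crosses the link associated with one address bit, so the
    path length equals the number of differing bits.
    """
    path = [src]
    node = src
    for d in differing_dimensions(src, dst):
        node ^= 1 << d  # cross the link for dimension d
        path.append(node)
    return path


if __name__ == "__main__":
    route = dimension_ordered_route(0b000, 0b101)
    print([format(n, "03b") for n in route])
```

The route from 000 to 101 crosses the dimension 0 link and then the dimension 2 link, two hops for two differing bits.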

In a virtual object environment, the size of available mass storage devices will determine the maximum number of objects that can be supported on the system.

Interconnection Issues

Introduction

The instruction set and internal architecture of the processor/memory node are of a high enough level to allow compilers to be written with some ease. That is, increasing the size of the system should not slow down the message traffic between neighboring nodes. Nodes would thus have a fixed set of three links; however, expanding the system requires inserting nodes into all existing rings, as well as adding vertices.

This massive system reconnection makes the cube-connected cycles topology undesirable from the start.

Interconnection Topologies and Queuing Models

The tree structure used here places all the processing nodes at the leaves of the tree. The branching ratio of the tree has a great effect on the communication properties of the tree. This distance is the same as the longest path found in a Boolean N-cube of the same size.

One additional channel must be provided in each dimension and between each row and column of nodes to accommodate the wraparound connections.

Deadlock

The situation is defined in such a way that packets in the queues are never destined for the next node and thus cannot be consumed. When the last link in a path is reserved, all links in the path are COMPLETED. Because none of the data in the message is stored on the network, breaking paths does not result in data loss.

The link data rates and message traffic used are the same as those used in the queued networks.

A Distance-Independent Measure of Locality

If the neighborhood size were the same as the number of nodes in the system, traffic in the system could be said to be uniform. The neighborhood is ideally the first a-nodes in the ordered list, where a must be a measure of location. The inverse of the geometric distribution is used to find which position in the list of nodes is represented by this probability:
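A minimal sketch of this inverse-transform step, assuming the standard geometric distribution on positions 1, 2, 3, ... with success probability p, for which P(K <= k) = 1 - (1 - p)^k. The function names, the parameter p, and the choice to clamp positions beyond the end of the list are assumptions made for illustration.

```python
import math
import random


def geometric_position(u: float, p: float) -> int:
    """Smallest position k >= 1 with 1 - (1 - p)**k >= u.

    Solving for k gives k = ceil(ln(1 - u) / ln(1 - p)).
    """
    if not (0.0 <= u < 1.0) or not (0.0 < p < 1.0):
        raise ValueError("require 0 <= u < 1 and 0 < p < 1")
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))


def pick_destination(ordered_nodes: list, p: float, rng: random.Random):
    """Draw a destination from a list ordered from nearest to farthest."""
    k = geometric_position(rng.random(), p)
    # Assumption: positions past the end of the list map to the last node.
    return ordered_nodes[min(k, len(ordered_nodes)) - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    nodes = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
    print([pick_destination(nodes, 0.5, rng) for _ in range(10)])
```

A larger p concentrates the draws on the first few entries of the ordered list, which corresponds to stronger locality; a smaller p spreads them toward uniform traffic.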

For each of the topologies in question, the following relations hold, describing how many nodes R can be accessed by traveling exactly l communication links, where N is the total number of nodes in the network:
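For the Boolean N-cube case only, a hedged sketch of such a relation: reading "accessed by traveling exactly l links" as "at shortest-path distance l", the nodes at distance l from a given node are those whose addresses differ from it in exactly l bits, so their number is the binomial coefficient C(log2 N, l). The function names are hypothetical, and the brute-force count is included only to check the closed form.

```python
from math import comb


def ncube_nodes_at_distance(num_nodes: int, links: int) -> int:
    """Nodes exactly `links` hops from any given node of a Boolean N-cube."""
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes is assumed to be a power of two")
    dims = num_nodes.bit_length() - 1  # log2(N) address bits
    return comb(dims, links)


def brute_force_count(num_nodes: int, links: int) -> int:
    """Count addresses at Hamming distance `links` from node 0."""
    return sum(1 for node in range(num_nodes) if bin(node).count("1") == links)


if __name__ == "__main__":
    n = 16
    for l in range(5):
        print(l, ncube_nodes_at_distance(n, l), brute_force_count(n, l))
```

Summed over all l from 0 to log2 N, the counts add up to N, so every node is accounted for exactly once.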

Simulation Results

The statistical measurements and histograms are typical of the data produced by the simulation programs. One of the most important characteristics of interconnection strategies is their performance as the network grows. The ARRAY and RING connections steadily lose performance as the system grows, making them unusable.

The ECUBE version of the Boolean N-cube has been observed to be only slightly less powerful than NCUBE.

Wirability of the Boolean N-cube

Each object can have associated with it a quantity for each port of the processing node. The object can then be moved to the neighboring node in that direction, placing it closer to the source of the message traffic. The difference in the volume of the two streams will be the garbage objects left in the node and never referenced.

To operate the disk drive, a specialized processor can form part of the node that communicates with the object memory. A full-scale implementation of the system presented in this thesis will undoubtedly raise new questions. A cross compiler must be developed to compile concurrent Simula as presented in Chapter 2 for execution in the processors of the test vehicle.

A Localized, Virtual Object Environment

Introduction

Maintaining Locality Among Object References

In the object-oriented environment, the communication topology changes when new objects are created, as old objects are removed and pointers are exchanged between objects. If the quantity is continuously updated to reflect the amount of message traffic to the object in each direction, it can be determined whether the object should be moved. This task can be performed by the communication processor or by special purpose hardware to avoid unnecessary load on the object processor.

So if x0 both exceeds the threshold and is the largest xi, it indicates that the object should be moved out of the processor it is currently in and moved to a nearby processor to take advantage of concurrency.
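The decision rule can be sketched as a small function. This is an illustration under stated assumptions rather than the thesis mechanism: x[0] is taken to be the quantity for traffic arising within the object's own processor, x[1], x[2], ... are taken to be the quantities for the node's ports, and the function name and return convention are hypothetical.

```python
from typing import Optional, Sequence


def migration_decision(x: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the dominant traffic quantity, or None if the object stays.

    Returns 0 when local traffic dominates (move the object out to a
    nearby processor), i >= 1 when port i dominates (move the object
    toward that port), and None when nothing exceeds the threshold.
    """
    if not x:
        raise ValueError("at least one traffic quantity is required")
    # Ties are resolved in favor of the lowest index.
    best = max(range(len(x)), key=lambda i: x[i])
    if x[best] <= threshold:
        return None
    return best


if __name__ == "__main__":
    print(migration_decision([1.0, 9.0, 2.0], threshold=5.0))  # toward port 1
    print(migration_decision([9.0, 1.0, 2.0], threshold=5.0))  # out of this processor
    print(migration_decision([1.0, 2.0, 3.0], threshold=5.0))  # stay in place
```

Requiring the winning quantity to exceed a threshold keeps objects from moving back and forth in response to small fluctuations in traffic.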

Providing a Virtual Object Space

Changes in control constants and process details allow it to adapt to a variety of facility environments. Transferring objects in this way will cause them to be considered resident on the disk node. Disk scanning in the implementation of the garbage collection algorithm can be partially transferred to this processor.

To maintain a level of performance consistent with other nodes in the system, nodes with disks will require space for a larger object table to accommodate those objects on disk.

Locating Objects in the Network

Berkling, "Reduction Languages for Reduction Machines", Proceedings of the 2nd Symposium on Computer Architecture, pp.

Davis, "The Architecture and System Method of DDM1: A Recursively Structured Data Driven Machine", Proceedings of the 5th Symposium on Computer Architecture, pp. 210-215, April 1978.

Patterson, "X-Tree: A Tree Structured Multiprocessor Computer Architecture", Proceedings of the 5th Symposium on Computer Architecture, pp.

Bashkow, "A large-scale homogeneous, fully distributed parallel machine I", Proceedings of the 4th Symposium on Computer Architecture, pp.

Conclusions and Summary