Carver was the origin, either directly or indirectly, of most of the ideas in this thesis. Ron Ayres wrote the ICL language that we used to compile the nMOS version of the chip. The approach allows the generation of rich musical sounds using models that are easy to control and have parameters that match many of the physical properties of musical instruments.
The generality of the approach to music synthesis is demonstrated by presenting some primitive mechanisms of sound generation. Instrument refinements are easily achieved by adjusting or rearranging the various functional components. An even bigger problem with the shortcut methods of the past is that they produce models whose internal parameters must be updated at a rate many times higher than the rate at which the corresponding properties change in real musical instruments.
In my thesis, I present a solution to the problem of generating realistic musical sounds. Details of the architecture implementation are presented in Chapter 4, VLSI Implementation, along with the introduction of new CMOS circuit techniques.
Modeling Musical Instruments
In the flute, the player's mouth and lips play the role of the exhaust slit. The clarinet player applies a constant source of pressure to the outside of the mouthpiece. There is a pressure wave inside the clarinet body with a wavelength proportional to the length of the body.
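As a rough point of reference (an illustration, not a measurement from this work): a cylindrical bore that is effectively closed at the mouthpiece and open at the far end acts as a quarter-wave resonator, so the fundamental wavelength is about four times the acoustic length L and the fundamental frequency is roughly f ≈ c/(4L). With c ≈ 343 m/s and L ≈ 0.6 m this gives f ≈ 143 Hz, close to the lowest notes of the clarinet.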
It is interesting to look in detail at the jet flow in and out of the pipe. At the jet end of the pipe, Q+ is reflected as Q- with some change due to the presence of the beam. We observe that (1) the amount of noise injected into the pipe is proportional to the flow in the pipe, and (2) the amplitude of the signal generally grows in proportion to the amount of flow in the pipe.
The nozzle composite function, F(q), is found and replaces the nozzle calculation in Figure 1.5. Also important to the production of a musical tone is the placement of the stroke and the period of impact.
Computing Sound
Each UPE provides a sample of delay, so the signal at the output of the input processing section is delayed by one sample for each UPE it passes through. In the final form of the difference equation, the input variable x appears as x(n-2) instead of x(n), as in equation 5. At each step in the calculation, the product B × M is summed with the result of the previous step.
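A behavioral sketch of this style of evaluation (Python is used here purely for illustration, and the coefficient names b1 and b2 are mine): each step is a multiply-add of the form A + B × M, and chaining the steps yields the second-order recursion with the input delayed by two samples.

    def mac(a, b, m):
        # one processing-element step: add input A to the product B x M
        return a + b * m

    def resonator_step(x_nm2, y_nm1, y_nm2, b1, b2):
        # y[n] = x[n-2] + b1*y[n-1] + b2*y[n-2], built from chained multiply-adds
        acc = mac(x_nm2, y_nm1, b1)   # x[n-2] + b1*y[n-1]
        acc = mac(acc, y_nm2, b2)     # ... + b2*y[n-2]
        return acc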
The linear interpolation function of the UPEs can be used for mixing signals. The coefficients of each resonator are chosen to model a particular mode of the musical instrument. For sufficiently large values of R (low damping), the zeros have little effect on the system response.
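One common way to choose such coefficients, assuming the resonator is realized as a two-pole section with poles at radius R and angle θ = 2πf0/fs (a standard construction, sketched here with variable names of my own choosing):

    import math

    def resonator_coeffs(f0, fs, R):
        # two-pole section y[n] = x[n] + b1*y[n-1] + b2*y[n-2]
        # poles at R*exp(+/- j*theta); R near 1 gives low damping
        theta = 2.0 * math.pi * f0 / fs
        b1 = 2.0 * R * math.cos(theta)
        b2 = -R * R
        return b1, b2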
In this figure, the output of the attack resonator is fed to the input of the noise modulation section. The output of the noise modulation section is used to drive a parallel connection of second-order sections used as resonators. The input to or output of each resonator must be adjusted to compensate for the implicit gain of the resonator.
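A behavioral sketch of the parallel arrangement, under the same two-pole assumption as above; here the compensation is done by evaluating each section's gain at its own center frequency and dividing it out, which is one reasonable choice for illustration rather than necessarily the normalization used in the hardware.

    import cmath

    def peak_gain(b1, b2, theta):
        # |H(e^{j*theta})| for H(z) = 1 / (1 - b1*z^-1 - b2*z^-2)
        z = cmath.exp(1j * theta)
        return abs(1.0 / (1.0 - b1 / z - b2 / (z * z)))

    def resonator_bank(x, sections):
        # sections: list of (b1, b2, theta); output is the sum of all sections
        states = [[0.0, 0.0] for _ in sections]              # y[n-1], y[n-2]
        gains = [1.0 / peak_gain(b1, b2, th) for (b1, b2, th) in sections]
        out = []
        for xn in x:
            total = 0.0
            for (b1, b2, _), st, g in zip(sections, states, gains):
                y = g * xn + b1 * st[0] + b2 * st[1]         # compensated input
                st[1], st[0] = st[0], y
                total += y
            out.append(total)
        return out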
Considering only the first nozzle function, the system is a pipe open at both ends. Noise is added to the signal in an amount proportional to the amplitude of the signal. Typically, the loop gain is controlled by the gain of the nonlinear element G.
If G is just large enough, the system oscillates with a pure tone as it operates in the nearly linear region of the nonlinear element. Intuitively, we might think that the system should oscillate at the center frequency of the resonator. A detailed view of the composite model is shown in the form of a computational graph for our computing machine in Figure 2.23.
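A toy version of such a loop (illustrative only; the delay length, the tanh nonlinearity standing in for G, and the noise scaling are all choices of mine): a delay line modeling the pipe, a mild loss at the reflection, a saturating element whose small-signal gain is G, and noise injected in proportion to the circulating flow.

    import math, random

    def pipe_loop(n_samples, delay_len=100, G=1.02, noise_frac=0.02):
        # delay line (the pipe) in a feedback loop through a saturating element
        line = [0.0] * delay_len
        line[0] = 0.01                   # small kick to start the oscillation
        idx, prev, out = 0, 0.0, []
        for _ in range(n_samples):
            y = line[idx]                # signal emerging from the pipe
            lp = 0.5 * (y + prev)        # mild loss/lowpass at the reflection
            prev = y
            x = math.tanh(G * lp)        # nonlinear element sets the loop gain
            x += noise_frac * abs(x) * random.uniform(-1.0, 1.0)  # flow noise
            line[idx] = x                # re-inject into the pipe
            idx = (idx + 1) % delay_len
            out.append(y)
        return out

With G just above the oscillation threshold, the distortion needed to limit the amplitude is small, which is the sense in which the loop runs in the nearly linear region of the nonlinear element.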
A VLSI Architecture
The output noise level was calculated and, from it, the number of bits required to represent such a level. The amplitude of the limit cycle in a second-order system is only a function of the damping coefficient R. The order of the nodes in the graph has been perturbed for a better fit.
The structure of the update buffer comprises two RAM structures superimposed on each other (Figure 3.9). Under control of the host computer, a transfer signal copies the contents of the first RAM to the second. The structure and function of the IPE are evident from a study of longhand multiplication.
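A behavioral sketch of such a double-buffered update scheme (written for illustration; the names are not from the hardware description): the host writes freely into one RAM, and a pulse of the transfer signal copies it into the RAM the processing elements actually read, so a coefficient set changes atomically.

    class UpdateBuffer:
        # 'staging' is written by the host; 'active' is read by the PEs
        def __init__(self, size):
            self.staging = [0] * size
            self.active = [0] * size

        def host_write(self, addr, value):
            self.staging[addr] = value        # host may write at any time

        def transfer(self):
            # pulse of the transfer signal: copy staging into active
            self.active = list(self.staging)

        def pe_read(self, addr):
            return self.active[addr]          # PEs always see a consistent set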
A sequence of four zeros is output and the high four bits of the result are generated. The A input need not be 0; when it is not, A is added to the higher N bits of the result Y. During the first half of the word cycle, the higher-order output bits from the previous operation leave the array, B enters the array and is transferred to the storage register, 0 is input to input A, and M is sign-extended for the previous operation.
During the second half of the word cycle, B is ignored, M is broadcast to the array, and the stored bits of B are used to form the partial products. The delay from the time the input enters the IPE until the high 32 bits of the result exit the IPE is 64 bit periods, that is, a delay of one word time. We enforce this equality by forcing any bits to the left of the binary point to zero.
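The shift-and-add view can be written out directly; a small sketch (mine, for illustration) of an N-bit longhand multiplication that also adds an input A to the high half of the double-length product, in the spirit of the Y = A + B × M operation described above:

    def multiply_add(a, b, m, n_bits=32):
        # longhand product of two n_bits two's-complement operands, with the
        # addend A entering at the higher N bits of the double-length result
        acc = 0
        for i in range(n_bits):
            bit = (b >> i) & 1
            if i == n_bits - 1:
                acc -= bit * (m << i)   # sign bit of B carries negative weight
            else:
                acc += bit * (m << i)   # one partial product per bit of B
        return acc + (a << n_bits)      # A added to the high half of Y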
As the number of instruments and the number of chips grow, the ratio of the number of coefficients provided by the host to the cycle time of the host memory system increases to a point where the host cannot keep up in real time. In our simple system configuration, latency is the more demanding of the two requirements and is determined by the time it takes the host to generate coefficients and transfer them to the processor chips. In the case of the piano, most coefficients remain constant and do not need to be rewritten.
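For a rough feel, with purely hypothetical numbers: if each instrument needed on the order of a hundred coefficients refreshed every few milliseconds, ten instruments would already require a few hundred thousand coefficient writes per second; at a microsecond or so per host memory cycle, a substantial fraction of the host's time would go to the transfers alone, before any of the values are even computed.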
VLSI Implementation
The state of the flip-flop is reversed by connecting the top rail to ground via SF1 and pulsing φ. The time during which the circuit must dynamically store its previous state on the capacitances of the internal flip-flop nodes is only the time that the clock is high. Reliable operation of the general, or set-reset, form of the flip-flop depends on a gradually falling clock.
This effect is clearly visible in Figure 4.5 as a downward bump in the trace representing the bottom rail during the falling edge of the clock. Two p-channel load devices are added, each connected from one input node of the flip-flop to Vdd, as shown in Figure 4.6. In an alternative modification, the internal nodes of the flip-flop are connected to Vdd via two p-channel transistors.
The gates of the two new transistors are connected to a voltage (Vb) that places the transistors near threshold; the transistors supply a small amount of current (on the order of 10⁻⁶ A) to the input nodes through subthreshold conduction. In addition to eliminating the circuit's dependence on a falling clock edge, the two load devices aid in the static operation of the flip-flop. Previously, the circuit dynamically stored its state on the internal node capacitances of the flip-flop when the clock was high and no SF was turned on.
Thus, there are output transitions on both the rising and falling edges of the clock signal. The bottom edge of the connection matrix shown in Figure 4.12 connects to the inputs and outputs of each IPE and to the outputs of the update buffer. The physical layout of the pass gate at each intersection in the connection matrix can have dramatic performance consequences.
The load line takes the place of the clock input to the first FF of each stage. Most of the surface is occupied by the processing elements and the connection matrix. The CMOS design for the IPE was a simplification and refinement of the nMOS UPE design.
Other Architectures
A shift register can be simulated using a RAM by sequentially reading and writing bits of memory. In this section, I discuss issues involved in the VLSI implementation of the serial-serial architecture. In this study, as in the implementation of the serial-parallel architecture, I chose standard CMOS technology and static circuits.
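A small sketch of the idea (mine, not taken from the implementation): a RAM plus a wrapping address counter behaves like a shift register whose length equals the RAM size, with one read followed by one write per shift.

    class RamShiftRegister:
        # simulate an n-stage shift register with a RAM and a circular pointer
        def __init__(self, n):
            self.ram = [0] * n
            self.ptr = 0

        def shift(self, value_in):
            value_out = self.ram[self.ptr]   # read the oldest entry...
            self.ram[self.ptr] = value_in    # ...and overwrite it with the newest
            self.ptr = (self.ptr + 1) % len(self.ram)
            return value_out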
The node layout area can be used to plan the layout of the entire chip. The width of the wafer is dominated by the width of the cross cells for the switch. The addresses of the operands and the results for each cycle are stored in another memory, the control memory.
The partial products of the results are iteratively summed over space; therefore, the delay through the structure increases linearly with the number of bits involved. The basic operation of the system is the same as for the simple system shown in Figure 5.5, with some important improvements. Each line in the graph illustrates the activity of one of the memory banks or of the arithmetic unit.
In the second half of the cycle, the arithmetic operations are completed while the second argument is fetched. We make the same assumptions about technology as for the serial-serial architecture implementation in Section A Serial-Serial Architecture. Therefore, a quarter of the cycles are 1.5 T seconds long, and the rest are T seconds long.
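Taken at face value, the average cycle is then 0.25 × 1.5T + 0.75 × T = 1.125T seconds.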
One column of PE wafers fills the chip surface in the vertical dimension and about half of the horizontal dimension. At a full-speed clock of 54.4 MHz, the total power consumption of the chip is …. The results of the assessments from the previous three sections are shown in Tables 5.1 and 5.2.
Let Q be the number of cycles until the impulse response of the system reaches 1/e of its initial value. As with the gain, the phase at resonance in equation 1 is not constant.
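To connect this definition to the damping coefficient R used earlier (a standard relation, not a quotation from the text): if the impulse-response envelope of a two-pole section falls off as R^n, then R^Q = 1/e gives Q = -1/ln R, which is approximately 1/(1 - R) when R is close to 1; low damping therefore corresponds to a long ring time.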