
3. Data Processing Procedures

3.1. Nature of Data Processing

Many scientific advances are based on accurate and long-term observations. For example, planetary motion was not understood until the seventeenth century despite the fact that observations were made for thousands of years. We owe this understanding to several pioneers. In particular, Tycho Brahe recognized the need for and made accurate, continuous observations for 21 years; and Johannes Kepler spent 25 years deriving the three laws of planetary motion from Brahe's data (Berry, 1898). The period of sidereal revolution for the terrestrial planets is of the order of 1 year, but observational data an order of magnitude greater in duration were required to derive the laws of planetary motion. Because disastrous earthquakes in a given region recur in tens or hundreds of years, it could be argued by analogy that many years of accurate observations might be needed before reliable earthquake predictions could be made.

A present strategy in earthquake prediction is to carry out intensive monitoring of seismically active areas with microearthquake networks and other geophysical instruments. By studying microearthquakes, which occur more frequently than the larger events, we hope to understand the earthquake-generating process in a shorter time. Intensive monitoring produces vast amounts of data, which makes processing tedious and difficult to keep up with. Therefore, we must consider the nature of these data in order to devise an effective data processing scheme.

In a manner similar to that of Anonymous (1979), data associated with microearthquake networks may be classified as follows:

(1) Level 0: Instrument location, characteristics, and operational details.

(2) Level 1: Raw observational data, i.e., continuous signals from seismometers.

(3) Level 2: Earthquake waveform data, i.e., observational data containing seismic events.

(4) Level 3: Earthquake phase data, such as P- and S-arrival times, maximum amplitude and period, first motion, signal duration, etc.

(5) Level 4: Event lists containing origin time, epicenter coordinates, focal depth, magnitude, etc.

(6) Level 5: Scientific reports describing seismicity, focal mechanism, etc.
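
To make this hierarchy concrete, the following minimal Python sketch (our illustration only; the class and member names are hypothetical and do not come from any network software) encodes the six levels:

    from enum import IntEnum

    class DataLevel(IntEnum):
        """Data levels for microearthquake network data, after the list above."""
        INSTRUMENT = 0   # station locations, characteristics, operational details
        RAW        = 1   # continuous signals from seismometers
        WAVEFORM   = 2   # observational data containing seismic events
        PHASE      = 3   # P- and S-arrivals, amplitude, period, first motion, duration
        EVENT_LIST = 4   # origin time, epicenter, focal depth, magnitude
        REPORT     = 5   # scientific reports on seismicity, focal mechanism, etc.

    # Processing moves data up the hierarchy, e.g., from WAVEFORM to PHASE.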

In order to discuss the amounts of data in levels 0-4, we use the USGS Central California Microearthquake Network as an example. Level 0 data for this network consists of the locations, characteristics, and operational details for about 250 stations; this information is about 2 x 10^6 bits/year.

Level 1 data consists of about 10^14 bits/year of continuous signals from the seismometers at the stations. These data are recorded on analog magnetic tapes (4 tapes/day), and most of them are also recorded on 16-mm microfilms (13 rolls/day). The analog magnetic tapes are used because each can record roughly 300 times more data than a standard 800 BPI digital magnetic tape. But even four analog magnetic tapes per day are too many to keep on a permanent basis, so we save only the earthquake waveform data by dubbing them to another analog tape. The dubbed waveform data for this network (level 2) consists of approximately 5 x 10^12 bits/year in analog form (about 75 analog tapes per year). These analog waveform data can be further condensed if we digitize only the relevant portions of the analog waveforms from the stations that recorded the earthquakes adequately.

That amount of digital data is about 5 x 10^10 bits/year and requires some 500 reels of 800 BPI tapes. Earthquake phase data (level 3) are prepared from the dubbed waveform data and/or from the 16-mm microfilms, and consist of about 10^7 bits/year. Finally, at level 4, the event lists consist of about 5 x 10^6 bits/year and are derived from the phase data using an earthquake location program.
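
These annual volumes can be checked by simple arithmetic. The Python sketch below is our order-of-magnitude illustration; the per-reel capacity of a 2400-ft, 800 BPI, 9-track digital tape (about 1.8 x 10^8 bits) is an assumed nominal figure, not a value given in the text:

    # Order-of-magnitude check of the annual data volumes quoted above.
    DIGITAL_REEL_BITS = 1.8e8                   # assumed: 2400-ft, 800 BPI, 9-track reel
    ANALOG_TAPE_BITS = 300 * DIGITAL_REEL_BITS  # each analog tape holds ~300x more

    level1_bits = 4 * 365 * ANALOG_TAPE_BITS    # 4 analog tapes/day -> ~8e13 (~10^14)
    level2_bits = 75 * ANALOG_TAPE_BITS         # ~75 dubbed tapes/year -> ~4e12 (~5 x 10^12)
    digitized_bits = 500 * DIGITAL_REEL_BITS    # ~500 digital reels/year -> ~9e10 (~5 x 10^10)

    print(f"level 1 (raw signals): {level1_bits:.0e} bits/year")
    print(f"level 2 (waveforms):   {level2_bits:.0e} bits/year")
    print(f"digitized waveforms:   {digitized_bits:.0e} bits/year")

Each result agrees to within a factor of about two with the figures quoted above, which is as close as an order-of-magnitude budget can be expected to come.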

Since data processing is not an end in itself, any effective procedure must be considered with respect to its users. Because of the volume of microearthquake data, a considerable amount of processing and interpretation is required before the data become useful for research. In the following discussion, we consider the human and technological factors that limit our ability to process and comprehend large amounts of data. Although some readers may not agree with our order-of-magnitude approach, we believe that practical limits do exist and that in some instances we are close to reaching or exceeding them.

The basic unit of information is 1 bit (either 0 or 1). A letter in the English alphabet is commonly represented by 8 bits, or 1 byte. A number usually takes from 10 to 60 bits, depending on the precision required. A page of a scientific paper contains about 2 x 10^4 bits of information, whereas a color picture can require up to 10^8 bits/frame to display. Thus a picture is generally much higher in information density than a page of words or numbers. For a human being or a data processing device, the limiting factors are data capacity, data rate or execution time, and access delay time. Although human beings may have a large data capacity and quick recall time, they are limited to a reading speed of less than 1000 words/min, or 10^3 bits/sec, and a computing speed of less than one arithmetic operation per second. On the other hand, a large computer may have a data capacity of 10^12 bits, a data rate of 10^7 bits/sec, and an execution speed of 10^7 instructions per second.

Data processing requires at least several computer instructions for each data sample, and there are about 3 x 10^7 seconds in a year. Hence, it is clear that a large computer cannot process more than about 10^13 bits/year of data at present, nor can a human being read more than about 10^10 bits/year of information.
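
The same throughput argument can be written out explicitly. In this sketch (again our illustration), we take "several" instructions per sample to be about ten and treat each sample as roughly one bit; both are assumed figures chosen to reproduce the order of magnitude quoted above:

    SECONDS_PER_YEAR = 3e7

    # A large computer: ~10^7 instructions/sec, ~10 instructions per data sample
    # (assumed), with each sample taken as roughly one bit.
    instructions_per_year = 1e7 * SECONDS_PER_YEAR   # ~3e14 instructions/year
    computer_bits = instructions_per_year / 10       # ~3e13 -> ~10^13 bits/year

    # A human being: reading at less than 10^3 bits/sec.
    human_bits = 1e3 * SECONDS_PER_YEAR              # ~3e10 -> ~10^10 bits/year

    print(f"computer: ~{computer_bits:.0e} bits/year processed")
    print(f"human:    ~{human_bits:.0e} bits/year read")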