Objective Performance Testing and Quality Assurance of Medical Ultrasound Equipment
Johan M. Thijssen, Gert Weijers, Chris L. de Korte.
Clinical Physics Laboratory, Children's Hospital Radboud University Nijmegen Medical Center
Nijmegen, The Netherlands [email protected]
Abstract—The goal of this study was to develop a test protocol that contains the minimum set of performance measurements for predicting the clinical performance of ultrasound equipment and that is based on objective assessments by computerized image analysis. The post-processing look-up-table (LUT) is measured and linearized. The elevational focus (slice thickness) of the transducer is estimated and the in plane transmit focus is positioned at the same depth. The developed tests are: echo level dynamic range (dB), contrast resolution (i.e., “gamma” of display, #gray levels/dB) and –sensitivity, overall system sensitivity, lateral sensitivity profile, dead zone, spatial resolution, and geometric conformity of display. The concept of a computational observer is used to define the lesion signal-to-noise ratio, SNRL (or Mahalanobis distance), as a measure for contrast sensitivity. The whole performance measurement protocol has been implemented in software. Reports are generated that contain all the information about the measurements and results, such as graphs, images and numbers. The software package may be viewed and a run-time version downloaded at the website:
http://www.qa4us.eu.
Keywords—computational observer, contrast resolution, contrast sensitivity, image quality, objective assessment, performance testing, quality assurance, spatial resolution, QA4US
I. INTRODUCTION
The first efforts to develop performance assessment methods for medical ultrasound equipment started already more than 30 years ago [1]. Progress has been slow for various reasons: evolution of equipment features and performance is still very fast and some ambiguity seems to be present within committees in defining a clear end point: should it be a technical standard, or a quality assessment from the user's point of view. Another limitation of the methods involved in quality standards so far has been the involvement of subjective assessments in case of image quality. This limitation also holds for a recent AAPM report [2], although it has evidently been based on the end point of quality of use, i.e. imaging performance, and for the paper by Dudley [3]; see also [4].
Although, a recent paper [5] is based on objective assessment, a serious limitation is the use of gray level rather than relative echo level (in dB), as was proposed by the authors [6], [7] and that is incorporated also in new IEC standards being developed.
Therefore, the dB scale made available in this paper for the estimation of most of the quality measures: overall dynamic
range, contrast resolution, contrast sensitivity, spatial resolution and overall system sensitivity.
In this paper, the approach is followed that the performance measurements are carried out on echographic images which are digitally acquired in a computer and recalibrated by using the postprocessing look-up-table (LUT) and by measuring the gray level of scattering objects with a known contrast in a tissue mimicking phantom. This approach allows for a transfer of gray levels to relative echo levels (dB) and for an objective assessment of image quality by using the concept of a
“computational observer” [8]. Image quality is assessed by measuring the lesion-signal-to-noise ratio (SNRL), see [9], which is to be considered as a “contrast sensitivity” measure.
Other measures related to contrast are the overall “echo level dynamic range” and the “contrast resolution” and the spatial resolution. The depth of the azimuth focus was chosen to coincide with the elevation focus because it can be expected that with this setting the performance of a transducer is optimal over the depth of focus range. Furthermore, for a minimum set of performance measures this single focus is considered to be sufficient.
The validity of the approach of using recalibrated echo levels (dB) was investigated by using simultaneously made radio frequency signal acquisitions as a reference. The methods are further illustrated by experiments regarding an assessment of image quality in relation to the choice of ultrasound frequency/bandwidth of the system.
II. TEST OBJECTS
The performance measurements were carried out by using
“tissue mimicking” phantoms made of urethane rubber base material (ATS Labs Inc., Bridgeport CT 06608, U.S .A.). The imaging performance testing was carried out by using the ATS model 539 phantom for low frequency transducers, and ATS model 550 phantom for high frequency transducers. The main objects within the phantom to be used in the assessment scheme are thin wires in a cross-like pattern and cylindrical objects of known (“nominal”) scattering contrast as compared to the surrounding material (Fig. 1). The nominal contrast values of these latter, “solid lesions” with respect to the background gray level are specified by the manufacturer in decibels (dB), for the frequency range 2.25-7.5 MHz (ATS model 539) and ≥7.5 MHz (ATS model 550).
Figure 1. Scheme of tissue mimicking test phantom containing various objects, base material urethane rubber (Model 539; ATS Inc., Bridgeport CT 06608, USA.).
The measurement of the position and width of the
“elevation” focus was made with the “slice thickness” phantom (ATS model 538N).
III. EQUIPMENT AND EXPERIMENTS A. Equipment
The experiments were performed with a SONOS 7500 (Philips Medical Systems, Andover, USA). A linear array transducer (3-11 MHz) was used for most of the measurements for illustrating this paper.
This system is provided with a custom designed raw data output of scan lines in radio frequency format (16 bits rf data). Further outputs available are a standard analog video (PAL) format and a digital (DICOM) format output. The measurements taken from the analog video output were acquired by using a frame grabber (USB VideoBus II, Belkin Inc., Compton, CA 90220, U.S.A.). In that case the image resolution was (352x288) pixels, and the gray level range (0-255). When the frame grabber was used during acquisition of test images, the user had the opportunity to adjust the equipment settings in case of image gray level under- or overflow was observed.
The compression was set at 60 dB and the gain was adjusted such that the imaged objects of the used phantom did not reach maximum gray levels. The transmit focus was placed at the depth of the elevation focus (see further on for definition).
The TGC was adjusted to obtain a practically constant gray level over depth in a homogeneous part of the phantom. The phantom used in the reported experiments was a model 550 (ATS Labs Inc.) small parts/breast phantom.
B. Software
The acquisition and analysis software package
“QA4US®” was written in MATLAB (release 14; The Mathworks Inc., Natick MA, USA) and a runtime version is available at the website of the Clinical Physics Laboratory (http://www.umcn.nl/clinicalphysicslab; http://www.qa4us.eu).
C. Experiments
The first experiment was concerned with the comparison of contrast data acquired with the rf-interface vs.
those digitized from the video output. The rf data were processed by the log(Hilbert transform), i.e., complex demodulation subroutine (Matlab) followed by taking a logarithm. The compression factor to be applied to the processed rf-data was established from a comparison between the obtained straight-line fit of dB echo level vs. nominal contrast and the same plot obtained from the DICOM video data. The same procedure was followed for the estimation of the gain factor.
A second experiment was performed to illustrate the importance of the contrast resolution (i.e., the number of gray levels per dB echo level, in the paper called the “gamma”), the contrast sensitivity measure (i.e., lesion-signal-to-noise ratio, SNRL), and the spatial resolution measurements for the assessment of imaging quality of medical ultrasound equipment.
The measurements were performed with three different transducers: a linear array transducer (L11-3; center frequency 7 MHz) and two phased array transducers (S8 and S12; center frequency 5.5 and 8 MHz, respectively). The images were optimized for each transducer separately (TGC, gain). The preprocessing, i.e., gray level dynamic range, was kept at 60 dB.
IV. ASSESSMENT OF IMAGING QUALITY
A. Standard equipment settings
1) Monitor: Although not part of the present paper, the TV monitor of the equipment should be adjusted until the gray wedge in the image is optimally displayed by using the
"intensity" and "contrast" controls.
2) Transmission output level: The transmitted output level should be noted for future reference by using either the displayed "thermal index"(TI), or "mechanical index" (MI) value (AIUM/NEMA, 1992) as the output magnitude indicator.
3) Preprocessing controls: The dynamic range, i.e., the compression of gray levels by the logarithmic amplifier of the equipment, should be fixed and typed-in by the user in the software protocol.
4) Time-gain-compensation: The time-gain- compensation (TGC) in most equipment is preset by the manufacturer such that it is anticipating tissue-like. If no preset is incorporated, the user has to adjust the TGC in a later phase of the protocol by changing the TGC slide potentiometers until the image obtained from an object-less zone in the phantom is seen as a homogeneous average gray level over a maximum attainable
Figure 2. Echographic image obtained from scattering
"lesions" and measurement discs inserted in them.
depth. The overall gain is displayed on the screen of most medical equipment, either as a number (e.g., 0-100, or in decibel (dB).
5) Transducer frequency: Another part of the preprocessing is concerned with the transducer: focal depth (to be set further on in the protocol) and frequency characteristics.
So a choice has to be made about the part of the total available bandwidth that is used in the chosen clinical application.
6) Postprocessing (LUT: The postprocessing comprises the "look-up table" (LUT) used to select the coding of echo levels (i.e., after log compression and digitization to 8 bits) that are stored in the scan converter before displaying at the screen. For performance testing, a linear LUT has to be chosen because only then a purely logarithmic relation exists between the echo level and the display voltage, and, therefore, also the image gray level. This relation is needed for some of the performance measurements. The LUT that corresponds to the “gray map”, or “mode”, selected by the user, or being part of the chosen clinical application, is estimated from the displayed gray level wedge. In the developed software, the estimation of the LUT is performed by interactively inserting a rectangular Region-of-Interest (ROI) box in the image. Any deviation of the gray levels vs. position relation from a linear curve can visually be noted and the eventually present linear LUT can be selected. If not present, the software calculates a third order regression line and generates a correction table to recalibrate the gray levels to a linear LUT.
B. Performance measurements
1) Slice thickness/elevation focus: In general, array transducers have a fixed focus cylindrical lens at their surface to narrow the transmitted sound field. This focusing action is in the direction perpendicular to the scanning plane, hence the term
“elevation focusing”, or “slice thickness reduction”. A special phantom is used to measure the depth where the elevation focus is located (ATS, model 538N). A manual shifting of the transducer over the phantom surface yields a shifting, horizontal bar image with varying width being displayed that can be used to obtain the slice thickness vs. depth. An image has to be stored at five, or more, positions of the transducer. Each image is analyzed after interactive placement of a rectangular box over the bar. The gray levels within the box are laterally averaged and then plotted vs. depth. By using a gamma of 4 (it is assumed here that the image contains 4 gray levels per dB echo level), the gray level vs. depth plot is transformed into a dB vs. depth plot and for each stored image the -6 dB width is estimated by the software (taking into account also the 45o angle). Finally, by using the data of all the stored images, a plot of the -6 dB bar width vs. depth is produced and a parabola is fitted to the data points. The coordinates at the minimum of the parabola indicate the depth position of the elevation focus and the -6 dB width at the focus.
2) Contrast Resolution and echo level Dynamic Range:
The contrast resolution is represented by the “gamma” of the system. When estimating the total span of decibels over the full
range of gray levels (i.e., from zero to 255, corresponding to the 8 bits of the scan converter), the displayed echo level “dynamic range” is obtained. This range comprises both the dynamic range of the preprocessing (prior to the AD-converter of the scan converter) and of the (linear) post processing LUT.
The measurements proceed then with scanning of the contrast objects (cylinders) of the phantom (Fig. 2). Each object has to be scanned five times at slightly different positions of the transducer, and also a homogeneous region at the same depth should be scanned and processed five times. A circle is interactively fitted into the displayed discs and the mean of the gray levels within the top semi-circle is estimated for each object. This size of the circle is maintained during the interactive analysis procedure implemented in the software. The ensemble mean and standard deviation of the four most consistent measurements are plotted for each disc against the nominal contrast value in dB, as shown in Fig. 3. The estimated contrast resolution (i.e., gamma) and the overall dynamic range are then estimated. The gamma is used to estimate dB echo levels from measured gray levels in several image quality characteristics, which are discussed in the following.
3) Image contrast sensitivity: The contrast sensitivity is defined as: the smallest echo level contrast, with respect to the surrounding scattering medium, of a lesion of a certain size that can be detected with a certain probability (e.g., 75%). The purpose of the measurements in this paper is to characterize just the imaging system. In practice it is useful to assess the
"lesion signal-to-noise ratio", SNRL [9],[10],[11],[12], or Mahalanobis Distance (MD), as was reported before [6].
This SNRL can be considered as the performance measure of a "computational" observer [8] carrying out a detection experiment under "SKE" conditions (i.e., "signal- known-exactly" [13]).
The SNRL data vs. absolute values of nominal contrasts are fitted to a single straight line (shown vs. real dBs in Fig. 4) and the SNRL value at the graph corresponding with +/- 3dB echo contrast is obtained and used as the “contrast
Figure 3. Plot of mean gray level of lesions (average of five measurements and standard deviation; left abscissa) vs. nominal contrast values of scattering lesions. Linear regression yields contrast resolution ("gamma" = # gray levels/dB) and echo level “dynamic range” (# dB for full range (0-255)).
Figure 4. Plot of “lesion signal-to-noise ratio” (SNRL, or Mahalanobis Distance, MD) vs. nominal contrast values of scattering lesions. Value at 3 dB is taken as the measure of “contrast sensitivity”
sensitivity” characteristic, for the chosen application of the equipment tested.
4) System sensitivity/Penetration depth: The overall imaging sensitivity of the transducer and equipment combination can be assessed by estimating the maximum depth at which echoes from a scattering medium can be imaged. The transmission pulse (output) level and the overall gain do not influence the outcome of this method if saturation of the gray values is avoided at one extreme, or if not too low gray levels are chosen at the other. Because of the linear LUT (i.e., log compression preserved), with these precautions the sensitivity curve will be vertically shifted, i.e., remain undisturbed, when transmit level or overall gain would be changed.
Given an optimized TGC setting, the software estimates the mean echo level over a user defined depth range.
For this purpose a rectangular ROI is overlaid on the central region of the image, spanning the depth range in which
"speckle" resulting from the scattering medium can be seen. The laterally averaged gray level profile vs. depth is then calculated.
The mean echo level in the range where these echoes are still approximately constant (i.e. from a few cm below the transducer till a few cm beyond the focus) is estimated. The depth at which this mean level has dropped by 6 dB is then estimated.
5) Lateral sensitivity profile (linear array): The linear array transducer integrity can be tested by analyzing the gray level averaged over depth vs. the lateral position in the image from a homogeneous part of the phantom. The image acquired for the system sensitivity measurement is optimally suited for this purpose. The software takes the moving average of the depth averaged echo level obtained for two adjacent image lines. A graph is generated showing this mean echo level vs.
lateral position.
6) Spatial resolution (2D in-plane point-spread- function, IP-PSF): The spatial resolution is defined as the axial, or lateral, full-width-at-half-maximum (FWHM), i.e., -6dB width, of the image of a small object at the installed, optimal, focus (i.e., the coinciding elevation and azimuth foci). For this purpose, the vertical column of wires is scanned and the image of the wire nearest to the azimuth focus is interactively selected by placing a measurement box around it and then analyzed by the software. The acquisition of the wires is repeated an additional four times in order to be able to estimate the precision of the PSF measurement. In fact, the scanning of wires in the phantom results in the assessment of the “line- spread-function” at the chosen depth. The developed software estimates the position of the maximum gray level present in the measurement ROI. Then, by using the gamma (see section:
Contrast Resolution), the gray levels are recalculated to dB scale [6]. For the axial resolution measurement, a parabola is fitted to the three highest gray levels. The reason for taking only these three levels is the observation that often the slope of the trailing part of the curve is low or disturbed by a secondary peak, so, a trade-off had to be made between precision and accuracy. In the lateral direction, a parabola is fitted to the gray level points above the -20 dB level with respect to the 0 dB level maximum (Fig. 5) The software then uses the parabolas to obtain the FWHM (i.e., the -6dB width) and the FWTM (i.e., the “full-width–at-tenth-maximum”, or -20 dB width) in axial and lateral directions.
Figure 5. Spatial resolution curves with parabolic fit
Figure 6. Comparison of gray level encoding and contrast resolution for video image data vs. rf data. Ensemble mean gray level of contrast discs vs. nominal echo levels (dB). Lower data and linear regression line: log(AM demodulated rf) data; lower-but-one data and regression: data from lower curve after compression by a factor of 2; upper-but one data and regression: data from lower-but one curve after additional gain factor of 12 dB; upper curve: data and regression from simultaneously measured DICOM data.
V. EXPERIMENTALRESULTS
The results of the contrast measurements made with simultaneous video- and rf-acquisitions are discussed with Fig.
6. In this figure, the lowest data points and fitted straight line are the results from the rf data after the log(Hilbert) processing. It can be concluded that the straight line fits the data points closely.
The slope of the line corresponds to a dynamic range of 117 dB.
The compression chosen for the video processing within the equipment was 60 dB, so a compression factor of 2 was applied and the lower-but-one data and line fit was obtained. It may be noted that the slope of the new line fit is practically identical to the one obtained for the DICOM video data obtained after linearization of the postprocessing LUT (upper points and line fit). After an additional processing of the rf data with a gain factor of 4 (i.e., upward shift of the number of gray levels of data points equivalent to 12 dB), the upper-but-one data points and line fit were obtained. A close correspondence between the video and rf derived data points is present.
The effect of the choice of a transducer on the quantitative set of performance parameters is illustrated by Table 1. It may be noted that changing the center frequency should not result in an increasing contrast resolution, i.e., the gamma should remain constant. This was not the case for one transducer. The effect of frequency on the contrast sensitivity is complex because both the axial and lateral resolution have an impact.
Also the contrast and the pixel SNR are incorporated in the SNRL [8], but these latter factors should remain constant at different frequencies if an adequate phantom is used. It should be remarked, however, that Browne [14] observed a non-linear with frequency attenuation coefficient in urethane rubber.
The linear array transducer (L11-3) yielded a superior contrast sensitivity, as well as axial resolution as compared to the other transducers (Table 1). Therefore, the system performance in detecting lesions in the phantom is considerably better using this transducer.
TABLE I. QUALITY ASSESSMENT OF THREE DIFFERENT TRANSDUCERS.
FHWM [mm]
Transducer
type Central frequency
[MHz]
SNRL
At 3 dB Axial Lateral S8 5.5 2.1 0.34±0.01 0.79±0.06 S12 8.5 2.9 0.30±0.04 0.64±0.07
L11-3 7 6.0 0.17±0.01 0.72±0.04
SNRL : lesion signal-to-noise ratio; FWHM: full-width-at-half-maximum, i.e., the in-plane spatial resolution
VI. CONCLUSIONS
The performance testing as described in this paper can be carried out with a minimum number of test objects. The image measurements, the software-estimated quantitative data and the test report are directly shown on the PC monitor. The data are compiled in a complete test report and all this information is stored in an easily retrievable file. The authors for these reasons expect wide acceptance of the developed methods, at least in the larger hospitals. The Matlab software written by the authors is well documented, and an extensive User’s Guide (Handbook) is made available. An executable version of the original Matlab Source code is available and it can be downloaded at the website of the authors. A full paper on this topic by the authors is in press [15]
The authors anticipate that the standardization activities of national and international organizations will continue. However, as long as no a priori distinction is made between technical (performance) testing, e.g. of bandwidth, pulse shape, beam profile, noise figure, etc., and performance of use (image quality, Doppler velocity) it may take much time to produce useful standards. In the meantime, new equipment
developments are still becoming available faster than standards.
This paper will hopefully contribute to a wider introduction of performance testing of medical ultrasound equipment in hospitals, which, at least in the Netherlands, is becoming part of the equipment quality assurance protocol for the institutional certification.
REFERENCES
[1] American Institute of Ultrasound in Medicine, Standard for Real-Time Display of Thermal and Mechanical Acoustic output Indices on Diagnostic Ultrasound Equipment, AIUM/NEMA, Rockville, 1992.
[2] Goodsitt MM, Carson PL, Witt S, Hykes DL, Kofler JM. Real time B- mode ultrasound quality control est procedures. Med Phys 1998;25:1385-406.
[3] Dudley NJ, Griffith K, Houldsworth G, Holloway M, Dunn MA. A review of two alternative ultrasound quality assurance programs. Eur J Ultrasound 2001;12:233-45.
[4] Gibson NM, Dudley NJ, Griffith K. A computerised quality control testing system for B-mode ultrasound. Ultrasound Med Biol 2001;27:1697-711.
[5] Browne JE, Watson AJ, Gibson NM, Dudley NJ, Elliott AT. Objective measurements of image quality. Ultrasound Med Biol. 2004;30:229-37.
[6] Thijssen JM, Wijk MC van, Cuypers MHM. Performance testing of medical ultrasound equipment. Eur J Ultrasound. 2002; 15: 151-64.
[7] Wijk MC van, Thijssen JM. Performance testing of medical ultrasound equipment: fundamental and harmonic modes. Ultrasonics, 2002; 40:
585-91.
[8] Lopez H, Loew MH, Goodenough DJ. Objective Analysis of Ultrasound Images by Use of a Computational Observer. IEEE Trans Med Imag 1992;11:488-95.
[9] Smith SW, Wagner RF, Sandrik JM, Lopez H. Low contrast detectability and contrast/detail analysis in medical ultrasound. IEEE Trans Sonics Ultrasonics 1983;30:164-73.
[10] Wagner RF, Insana MF, Brown DG. Unified approach to the detection and classification of speckle texture in diagnostic ultrasound, Opt. Eng.
1986;25:738-42.
[11] Thijssen JM, Oosterveld BJ. Performance of echographic equipment and potentials for tissue characterization. In: Proceedings NATO-ASI Mathematics and Computer Science in Medical Imaging. Viergever MA and Todd-Prokopek A, eds. Springer, Berlin, 1988. Pp. 455-68.
[12] Valckx FMJ, Thijssen JM, van Geemen AJ, Rotteveel JJ, Mullaart R.
Calibrated Parametric Medical Ultrasound Imaging. Ultrasonic Imag 2000; 22: 57-72.
[13] Swets JA, Pickett RM. Evaluation of Diagnostic Systems, Academic Press, New York, 1982.
[14] Browne JE, Ramnarine KV, Watson AJ, Hoskins PR. Assessment of the acoustic properties of common tissue-mimicking test phantoms.
Ultrasound Med Biol 2003;29:1053-60.
[15] Thijssen JM, Weijers G, de Korte CL: Objective Performance Testing and Quality Assurance of Medical Ultrasound Equipment. Ultrasound Med Biol 2006; in press.