Chapter 3

FRAMEWORK

3.1. Theoretical Considerations

3.1.1. Underwater Image Formation

Underwater image capture is strongly affected by light propagation in the water medium. The physics of underwater light lowers the expected quality of capture and of processing results in an underwater visual sensor system.

Figure 3.1. Visualization of Underwater Light Propagation (Leonard, Arnold-Bos, & Alfalou, 2010)

Complex factors from the water medium and the visual sensor itself contribute to the degraded quality of an underwater image. From a light physics perspective, the Jaffe–McGlamery underwater imaging model (Jaffe, 1990; McGlamery, 1980), visualized in the figure above, accounts for the direct component $E_d$, the attenuated light reflected directly from the subject; the backscatter $E_b$, the indirect attenuated light reflected toward the sensor by non-proximal turbid water particles; and the forward scatter $E_f$, the indirect attenuated light scattered by turbid water particles on its path to the visual sensor. According to the model, the total irradiance $E_T$, i.e., the total light received by the camera, is expressed as:


$E_T = E_d + E_f + E_b$ (3.1)

Studies offer different approaches toward the estimation of the direct component and the backscatter. Forward scatter is assumed to have negligible effects when the scene or subject is near the visual sensor. The state of the art, which is the basis for most underwater image quality improvement methods, especially restoration, is the atmospheric scattering model:

$I(x) = J(x)\,t(x) + B\,(1 - t(x))$ (3.2)

where $I(x)$ is the degraded image, $J(x)$ is the scene radiance or undegraded image, $t(x)$ is the transmission, i.e., the portion of light unaffected by the turbid water particles, and $B$ is the background or ambient light. Restoration applies deconvolution, informed by physical knowledge of light propagation in different water media, to estimate the model parameters and recover the scene radiance.
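As an illustration, the following NumPy sketch evaluates Eq. 3.2 as a forward model; the array shapes and the toy values of J, t, and B are assumptions for demonstration only, not parameters from this study.

```python
import numpy as np

def degrade(J, t, B):
    """Simulate underwater degradation with Eq. 3.2:
    I(x) = J(x) * t(x) + B * (1 - t(x)).
    J: scene radiance in [0, 1], shape (H, W, 3)
    t: transmission map in [0, 1], shape (H, W)
    B: ambient light, one value per color channel."""
    t = np.atleast_3d(t)                        # broadcast over channels
    return J * t + np.asarray(B) * (1.0 - t)

# Toy usage: under weak transmission, a uniform scene drifts toward the
# greenish ambient light B, mimicking an underwater color cast.
J = np.full((4, 4, 3), 0.8)
t = np.full((4, 4), 0.3)
I = degrade(J, t, B=[0.1, 0.6, 0.4])
```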

3.1.2. Motion JPEG

The encoding scheme that turns captured underwater visual data into an accessible, standardized format is one factor of the Raspberry Pi-based system that limits the rapid capture of high-resolution visual data. The Raspberry Pi does not have adequate video compression capability to encode .h264 videos at resolutions greater than 2 MP (1080p). H.264, or Advanced Video Coding (AVC), carried in files with the .h264 extension, is a video compression standard that achieves double the coding efficiency, or half the bit rate, of earlier MPEG standards, meeting the demand for higher compression rates in various network-based, real-time applications. The lower bit rates imply lighter computation during recording and compression. To work around this limitation, the computing resources of the Raspberry Pi-based system enable the autonomous streaming of 8 MP video at 10 frames per second (fps) in MJPEG format.

The visual data is translated into the .mjpeg format. Motion JPEG (M-JPEG) is a video encoding format that compresses each of its frames as a .jpeg image, using the popular lossy compression technique from digital photography.

3.1.3. Hypertext Transfer Protocol (HTTP)

The usage of HTTP as a video streaming link for the Raspberry Pi-based camera allows efficient provision of captured underwater data to different processes (scheduled capturing and real-time streaming) in the processor. HTTP is a component of the Internet Protocol suite that facilitates data transfer over a network using a client–server model. In the local setup, a series of .mjpeg frames is provided by the Raspberry Pi-based camera (the HTTP server) for as long as the processor (the HTTP client) keeps issuing requests. The HTTP server encapsulates the captured visual data into a deliverable format, and the HTTP client assembles the transferred visual data into a seamless video stream. Proxies may exist between the HTTP client and HTTP server to locally process external requests from each other.
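A minimal client-side sketch of this exchange, assuming OpenCV's FFmpeg backend can open MJPEG-over-HTTP streams; the stream URL is a hypothetical placeholder whose host, port, and path depend on the deployed HTTP server.

```python
import cv2

# Hypothetical stream address for the Raspberry Pi's HTTP server.
STREAM_URL = "http://raspberrypi.local:8000/stream.mjpg"

cap = cv2.VideoCapture(STREAM_URL)    # OpenCV issues the HTTP requests
while cap.isOpened():
    ok, frame = cap.read()            # one decoded JPEG frame per read
    if not ok:
        break                         # stream ended or request failed
    cv2.imshow("Underwater stream", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```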

3.1.4. Power over Ethernet

The distance between the hermetically sealed Raspberry Pi-based visual sensor and the server computer, with its corresponding network devices housed in a neighboring data center, spans several meters. An interface that provides medium-distance data connection and power to these devices is vital for the feasibility of this system.

The underwater visual sensor is compliant with Power over Ethernet, or PoE (IEEE 802.3af), a standard that incorporates a DC power line into the twisted-pair Ethernet physical link. The visual sensor uses a PoE splitter that converts the 44-56 V, 300 mA supply from a PoE-compliant switch into a 5 V, 2.4 A (maximum) power source for the Raspberry Pi and its modules. PoE also provides a data link of up to 1000 Mbps between network devices for bidirectional data transfer.

3.1.5. SSH Tunneling

SSH tunneling is a method of transporting data from one device to another through an encrypted SSH tunnel. An SSH tunnel provides a secure connection between two geographically distant devices without exposing the network details of the local device to be accessed. These details, when publicly known, may compromise the security of the local network.

The appropriate mode of SSH tunneling in the proposed system is remote port forwarding, wherein a remote device accesses the resources of a local one. Setting up SSH tunneling requires opening a port on a publicly visible SSH server and establishing an SSH tunnel that relays connection requests from the remote device to the local device on the network; because the relay passes through an encrypted tunnel, it can bypass firewalls. SSH can be used when other protocols cannot cater to remote connections. The SSH server may assume a domain name, instead of the server's public IP address, to identify the local network on the internet. The establishment of SSH tunneling through remote port forwarding is visualized in the figure below.

Figure 3.2. Visualization of Remote Port Forwarding Establishment Processes
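As a minimal sketch of the process in the figure, the tunnel can be launched from Python with the standard ssh client's -N (no remote command) and -R (remote forward) options; the hostnames and port numbers here are hypothetical placeholders.

```python
import subprocess

# Ask the public SSH server to listen on its port 8080 and forward every
# connection through the encrypted tunnel to port 8000 on this device.
# (Whether outside clients may reach that port also depends on the
# server's GatewayPorts setting.)
tunnel = subprocess.Popen([
    "ssh", "-N",
    "-R", "8080:localhost:8000",
    "user@public-server.example.org",
])

# ... remote device connects to public-server.example.org:8080 ...

tunnel.terminate()   # close the tunnel when remote access is done
```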

3.1.6. Image Statistics

An underwater image's global and local statistics provide information, especially about the degradations that hinder its visual quality. Notably, the interpretation of these statistics reveals the characteristics and extent of the degradations within an underwater image. The statistics that are meaningful in characterizing the local data are as follows:

A color histogram is a 2-dimensional statistic that presents the number of occurrences for each intensity level per color channel; the domain is all possible intensity levels, and the range is the number of pixels within an image. The histogram function H is expressed as:

$H_{c \in \{R,G,B\}}(k) = J_k$ (3.3)

where $J_k$ is the number of occurrences of intensity level $k$, $k = 0, 1, \ldots, K - 1$, and $K$ is the number of intensity levels of a digital image ($K = 256$). A histogram is a reduced representation of an image in which some information, especially the interaction of pixel intensities, is lost. However, this statistic visualizes the distribution of intensities, e.g., whether an underwater image has intensities concentrated at darker levels or within a small range of levels. This visualization aids in developing novel methods, or applying state-of-the-art ones, that redistribute the pixel intensities across the intensity levels, as an image of desirable quality should have high contrast, manifested by increased visibility across most intensity levels (A. C. Bovik, 2009).
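A short NumPy sketch of Eq. 3.3, assuming an 8-bit RGB image array; the random image is a stand-in for a real underwater frame.

```python
import numpy as np

def color_histograms(img):
    """Per-channel histogram H_c(k) of Eq. 3.3: occurrences of each of
    the K = 256 intensity levels. img is an 8-bit array (H, W, 3)."""
    return {c: np.bincount(img[..., i].ravel(), minlength=256)
            for i, c in enumerate("RGB")}

img = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
hists = color_histograms(img)
assert hists["G"].sum() == 120 * 160   # the range sums to the pixel count
```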

One of the singular values that summarize all pixel intensities within a color channel is the mean intensity of the channel, calculated as:

$\mu_{c \in \{R,G,B\}} = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} I_c(x, y)$ (3.4)

where $M \times N$ is the number of pixels in the image.

As the mean includes all pixel intensities in its calculation, this measure can be swayed by greater occurrences at certain intensity levels. Such statistical information supports the observations from a color histogram; e.g., a histogram with a higher occurrence of dark intensities yields a lower mean intensity.
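Eq. 3.4 reduces to a per-channel average; a minimal sketch under the same 8-bit RGB layout as above.

```python
import numpy as np

def channel_means(img):
    """Mean intensity per color channel, Eq. 3.4:
    mu_c = (1 / (M * N)) * sum over all pixels of I_c(x, y)."""
    return img.reshape(-1, 3).mean(axis=0)   # one mean per R, G, B

img = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
mu_r, mu_g, mu_b = channel_means(img)
```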

3.1.7. Assumptions

The utilization of statistics provides unbiased and objective observations from underwater images. In contrast, reliance on subjective human visual perception for generating observations about an underwater image can be unreliable due to the human visual system's ability to adapt to changes in brightness and color (A. C. Bovik, 2009). Nevertheless, the observations from the statistics of an underwater image allow the formulation of sound principles, or assumptions, for developing and applying specific underwater image enhancement methods to locally acquired underwater images.

Current hybrid enhancement methods (Ancuti et al., 2018; Azmi et al., 2019; Lee et al., 2020; Mohd Azmi et al., 2019) provide the following principles based on similar observations across extensive public datasets, usually from marine and coastal setups; a sketch that checks these assumptions against channel statistics follows the list:

• Prominence of green intensities: The greenish appearance of an underwater image can be explained as the stronger preservation of green intensities relative to red and blue intensities. Light propagation in a water medium exhibits selective attenuation: the longer red wavelengths are absorbed faster than the shorter green and blue wavelengths (D. Huang, Wang, Song, Sequeira, & Mavromatis, 2018), leading to significant attenuation of red intensities. However, the degradation of the red channel is not always observed in underwater images. The blue channel can also be significantly attenuated because the absorption of blue wavelengths by some organic matter in the water is greater than the absorption of red by all water particles and organic matter (H. Lu et al., 2016). This assumption expects the histograms of the color channels to reveal a greater inclination of green intensities toward higher intensity levels and a relatively higher mean intensity of green in comparison to red and blue.


• Gray World Assumption (GWA): This color constancy assumption by Buchsbaum (1980) posits that the means of all color channels are similar prior to attenuation. In this regard, several Gray World Assumption-based color balancing techniques assign the green channel as a reference channel and derive compensations for the other color channels from the disparity between the mean green intensity and their mean intensities. As an illustration of referencing the green channel, balancing the red-green opponent color pair (of color opponent theory) entails appending values proportional to the green intensities to the red intensities. Similarly, following trichromacy (representing color in RGB space), the blue intensities should also receive values proportional to the green intensities. Furthermore, these compensations may be related to von Kries hypothesis-based methods (Abdul Ghani, Aris, & Muhd Zain, 2016; Ghani & Isa, 2014), wherein numerical factors are applied across all color intensities to achieve color constancy (Ramanath & Drew, 2014). This assumption expects significant differences between the mean intensities of the color channels.

• Reduced contrast: Underwater images have low contrast: the spread of intensities in all color channels is reduced. The reduced contrast of an underwater image is also attributed to the presence of particles and organic matter in turbid water (H. Lu et al., 2016). This assumption expects the histograms of the color channels to reveal occurrences concentrated within a limited range of intensities. In contrast, an image with good contrast utilizes most intensity levels to provide better details.
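The sketch below turns the three assumptions into heuristic checks on channel statistics; the percentile-based contrast test and its threshold are illustrative choices of this sketch, not values prescribed by the cited studies.

```python
import numpy as np

def check_assumptions(img, span_fraction=0.8):
    """Heuristic checks of the three assumptions on an 8-bit RGB image."""
    mu = img.reshape(-1, 3).mean(axis=0)           # per-channel means
    # Prominence of green: the green mean exceeds red and blue means.
    green_dominant = mu[1] > mu[0] and mu[1] > mu[2]
    # Gray World Assumption: a large disparity between channel means
    # indicates a color cast that compensation should correct.
    mean_disparity = float(mu.max() - mu.min())
    # Reduced contrast: most intensities packed into a narrow range.
    lo, hi = np.percentile(img, [1, 99])
    low_contrast = (hi - lo) < span_fraction * 255
    return green_dominant, mean_disparity, low_contrast
```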

3.1.8. Underwater Image Enhancement

Enhancement is independent of manipulations of the presented underwater image formation model. Instead, it depends on the visual appeal or the usefulness to succeeding analyses (Z.-M. Lu & Guo, 2017); enhancement methods improve the visual quality and/or the information content of an image (Haldar, 2018). These enhancement processes can be further classified by the nature of their operation: (1) point operations, direct nonlinear modifications of pixel intensities that are independent of relationships with neighboring pixels; (2) spatial operations, direct (mostly linear) modifications of pixel intensities that account for relationships with neighboring pixel intensities; and (3) frequency operations, modifications to the image in the frequency domain (e.g., modifications of the Fourier-transformed 2-dimensional image).

Usually, these operations on underwater images are motivated by the observed degradations in color and contrast. Enhancement performs color correction since underwater images contain a greenish color cast due to light wavelength attenuation (the combined effects of absorption, forward scatter, and backscatter) in a water medium. Enhancement also improves contrast since objects of interest (in this case, fishes) are similar in color to the background, and both objects and background are poorly illuminated. The color corrections and contrast improvements of underwater image enhancement are differentiated by processing scale: globalized, i.e., accounting for all pixel intensities, or localized, i.e., sampling a small yet informative portion of pixel intensities. Several enhancement methods unify traditional and/or recent methods into a single hybrid framework, as underwater images always exhibit combinations of degradations.

3.1.9. Color Balance and Fusion

Current underwater image enhancement frameworks are hybrids of color correction and contrast enhancement of different operations. Color balance and fusion is one of the current state-of-the-art hybrid methods that address the greenish color cast and dehazing.


Figure 3.3. Overview of Color Balance and Fusion Method

Color balancing corrects the color representation of an underwater image degraded by the light absorption and scattering properties of particles in the water medium. This method accounts for both the degradation of red wavelengths at depth and/or the degradation of blue wavelengths by organic matter in turbid water through gain factors derived from the green intensities; the G channel is the reference channel. The compensation of other color channels through a reference channel is inspired by the von Kries hypothesis, a chromatic adaptation method that emphasizes adjustments imposed on the information captured by the visual sensor (Chong, Zickler, & Gortler, 2007). A viable interpretation of this hypothesis is that a gain can be applied to each channel of a color image to preserve a reference white point. The compensations are expressed as:

$I_{c,\mathrm{comp}} = I_c + \alpha_c \, (\mu_G - \mu_c)(1 - I_c) \, I_G, \quad c \in \{R, B\}$ (3.5)

where $I_{c,\mathrm{comp}}$ is the result of compensation at color channel $c$ of dimensions $(x, y)$ pixels, $I_c$ and $I_G$ are the color channel to be compensated and the green color channel, respectively, $\mu_G$ and $\mu_c$ are the mean intensities of the green channel and the compensated channel, respectively, and $\alpha_c$ ($\alpha_R$, $\alpha_B$) is a user-defined parameter that controls the strength of the compensation (usually set to 1). The result of underwater white balance is further corrected with the Gray World algorithm, which removes the illuminant color cast to achieve similar means for all color channels (Buchsbaum, 1980).
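A minimal implementation sketch of the compensation in Eq. 3.5, assuming channels normalized to [0, 1].

```python
import numpy as np

def compensate_channel(I_c, I_g, alpha=1.0):
    """Compensate a degraded channel from the green reference, Eq. 3.5:
    I_c' = I_c + alpha * (mu_G - mu_c) * (1 - I_c) * I_G."""
    mu_c, mu_g = I_c.mean(), I_g.mean()
    return I_c + alpha * (mu_g - mu_c) * (1.0 - I_c) * I_g

# Usage: compensate red and blue against the green reference channel.
rgb = np.random.rand(120, 160, 3)        # stand-in for a normalized image
R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
R_comp = compensate_channel(R, G)        # red-green opponent pair
B_comp = compensate_channel(B, G)        # blue, following trichromacy
```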

Generally, brighter intensities are enhanced well by any underwater white balance method. Fusion-based contrast correction is therefore applied to the results of color balance to also enhance the darker intensities and the details across all intensities. This method superimposes the weighted effects of gamma correction and edge sharpening applied to the white-balanced image. One fusion input is the result of gamma correction, which achieves a larger difference between the darker and brighter intensities through the power-law equation:


$I_{\gamma} = 255 \times \left( \frac{I}{255} \right)^{\gamma}$ (3.6)

The nonlinear shifting by gamma correction is controlled by the parameter 𝛾.

Another fusion input enhances edge details through modified unsharp masking; this input compensates for the loss of detail introduced by the global contrast enhancement of gamma correction. This method subtracts the estimated blur from the input underwater image to form a detail mask, which is scaled and added back:

$I_s = I + \beta \, (I - G * I)$ (3.7)

where the Gaussian filter-based blur is estimated from the input image as:

$G(x, y) = \frac{1}{2 \pi \sigma^2} \, e^{-\frac{x^2 + y^2}{2 \sigma^2}}$ (3.8)

The term $G * I$ denotes the convolution of the Gaussian kernel with the input image, and the sharpening strength is regulated by the parameter $\beta$. This method increases both the genuine details and the artifacts, i.e., unnecessarily enhanced details, of an underwater image.
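A sketch of the two fusion inputs as reconstructed in Eqs. 3.6 to 3.8, using OpenCV's Gaussian blur; the γ, β, and σ values are illustrative defaults rather than tuned parameters.

```python
import numpy as np
import cv2

def gamma_input(img, gamma=2.0):
    """Fusion input 1 (Eq. 3.6): power-law stretch of an 8-bit image."""
    return (255.0 * (img / 255.0) ** gamma).astype(np.uint8)

def sharpen_input(img, beta=1.0, sigma=3.0):
    """Fusion input 2 (Eqs. 3.7-3.8): add back the detail mask, the
    difference between the image and its Gaussian-blurred version."""
    img_f = img.astype(np.float32)
    blur = cv2.GaussianBlur(img_f, (0, 0), sigma)   # G * I
    return np.clip(img_f + beta * (img_f - blur), 0, 255).astype(np.uint8)

wb = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)  # stand-in
inp1, inp2 = gamma_input(wb), sharpen_input(wb)
```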

The desirable effects of each fusion input, i.e., the pronounced difference between darker and brighter intensities and the improved edge details, are combined through a fusion methodology that gives importance to higher local image quality and higher object saliency at different scales. The importance of each fusion input is represented as a weight map; an image detail with a higher weight has higher importance. A weight map is convolved with its fusion input, and the results for both fusion inputs are superimposed as the final version of the enhanced image. The weight map applied to each fusion input is derived from a normalized combination of the Laplacian contrast weight ($W_L$), saliency weight ($W_S$), and saturation weight ($W_{Sat}$).


The Laplacian contrast weight is the result of applying the absolute value of a Laplacian filter to an image. A Laplacian filter computes the second derivative, or the rate of change, of the intensities on the luminance channel; significant changes in the intensities indicate the presence of an edge. This weight map gives relevance to edge and texture details.

The saliency weight counteracts unnecessary edge and texture details that appear as artifacts. It quantifies the perception of an object relative to the background. The method of Achanta, Hemami, Estrada, & Susstrunk (2009) is utilized to generate the saliency weight, which is derived from the Euclidean distance between the mean intensities of an image and its Gaussian-blurred version in the Lab color space:

$W_S = \left\| \mu_c - (G * I)_c \right\|, \quad c \in \{L, a, b\}$ (3.9)

The said method efficiently extracts full-resolution saliency maps that emphasize salient regions while successfully disregarding artifacts. This weight map gives relevance to areas of higher luminance.

The saturation weight map prioritizes highly saturated regions by accounting for the differences between the luminance channel and the R, G, B color channels, as expressed below:

$W_{Sat} = \sqrt{\frac{1}{3} \left[ (R - L)^2 + (G - L)^2 + (B - L)^2 \right]}$ (3.10)

The three sets of weights derived from both fusion inputs are consolidated to form a normalized weight:

$W = W_L + W_S + W_{Sat}$ (3.11)


$\overline{W}_k = \frac{W_k + \delta}{\sum_{k=1}^{K} W_k + K \delta}$ (3.12)

where $K$ is the number of fusion inputs (in this case, 2) and $\delta$ is a regularization term that ensures every input contributes to the output; it is set to 0.1.
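A simplified sketch of the weight maps and their normalization (Eqs. 3.9 to 3.12); the Lab conversion, blur scale, and scaling of L are assumptions of this sketch rather than the exact choices of Ancuti et al. (2018).

```python
import numpy as np
import cv2

def weight_map(img_bgr):
    """Combined weight W = W_L + W_S + W_Sat (Eqs. 3.9-3.11) for one
    fusion input, given an 8-bit BGR image."""
    img = img_bgr.astype(np.float32) / 255.0
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)
    L = lab[..., 0] / 100.0                      # luminance scaled to [0, 1]
    w_lap = np.abs(cv2.Laplacian(L, cv2.CV_32F))              # W_L
    blur = cv2.GaussianBlur(lab, (0, 0), 3.0)
    w_sal = np.linalg.norm(lab.mean((0, 1)) - blur, axis=-1)  # W_S, Eq. 3.9
    B, G, R = img[..., 0], img[..., 1], img[..., 2]
    w_sat = np.sqrt(((R - L)**2 + (G - L)**2 + (B - L)**2) / 3)  # Eq. 3.10
    return w_lap + w_sal + w_sat

def normalize_weights(weights, delta=0.1):
    """Eq. 3.12: (W_k + delta) / (sum of all W_k + K * delta)."""
    K, total = len(weights), sum(weights)
    return [(w + delta) / (total + K * delta) for w in weights]
```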

Multiscale fusion at different scales, or levels, is performed to incorporate the weights into each fusion input. This fusion method minimizes the abrupt transitions in image intensities introduced by the weight maps through a series of Gaussian-based filterings of the fusion inputs and their corresponding weight maps.

Each fusion input is decomposed into a Gaussian pyramid, whose levels, or scales, result from successive bottom-up applications of a (Gaussian filter-based) low-pass filter followed by downsampling by a factor of 2; multiscale fusion is then performed on the Laplacian pyramid representation, or decomposition (Burt & Adelson, 1983). The Laplacian pyramid is derived from the successive top-down differences between Gaussian pyramid levels, upsampled by the same factor as the reductions. This process is visualized in the figure below.

Figure 3.4. Multiscale decomposition based on Laplacian pyramid. Level 0 in Gaussian pyramid denotes the original image.

Convolutions between the Laplacian pyramid decompositions of the fusion inputs and their corresponding normalized weights in Gaussian pyramid representation are performed. Then, the convolutions of the fusion inputs are consolidated at each level to form the final output of the color balance and fusion method at the bottom level of a Gaussian pyramid representation; this process is the reconstruction of the image. The default number of levels is set to 10, an optimal value that balances the resulting visual quality against processing time.
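A compact sketch of the multiscale fusion described above, built on OpenCV's pyrDown/pyrUp; the level count and random test data are illustrative (the method's default is 10 levels).

```python
import numpy as np
import cv2

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))     # low-pass, then halve size
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=gp[i].shape[1::-1])
          for i in range(levels - 1)]
    return lp + [gp[-1]]                     # keep the coarsest level

def fuse(inputs, weights, levels=4):
    """Blend Laplacian pyramids of float32 fusion inputs with Gaussian
    pyramids of their normalized weights, then reconstruct bottom-up."""
    fused = None
    for img, w in zip(inputs, weights):
        lp = laplacian_pyramid(img, levels)
        wp = gaussian_pyramid(np.dstack([w] * 3), levels)
        blend = [l * g for l, g in zip(lp, wp)]
        fused = blend if fused is None else [a + b for a, b in zip(fused, blend)]
    out = fused[-1]
    for lvl in reversed(fused[:-1]):         # upsample and add details back
        out = cv2.pyrUp(out, dstsize=lvl.shape[1::-1]) + lvl
    return out

a = np.random.rand(96, 128, 3).astype(np.float32)   # stand-in inputs
b = np.random.rand(96, 128, 3).astype(np.float32)
w1 = np.random.rand(96, 128).astype(np.float32)
result = fuse([a, b], [w1, 1.0 - w1], levels=4)
```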

3.1.10. Underwater Image Evaluation

Assessment of visual quality and information content of an underwater image is usually performed by utilizing established subjective and/or objective evaluations.

Subjective evaluations involve naïve or expert human participants rating images based on their perception of visual quality or information content. The assigned ratings can be numerical (e.g., a score from 1 to 5, where a higher score means higher quality) or categorical (e.g., descriptive words: Bad, Average, Good). A subjective evaluation may utilize both reference and test images (double-stimulus) or only test images (single-stimulus) during rating assignment. This evaluation methodology is the most reliable, provided that standards which explicitly define the testing venue, display materials, rating systems, and visual acuity of the human participants, e.g., the ITU-R BT.500 standard, are complied with. Producing outstanding, unbiased results from subjective evaluations requires substantial time, effort, and finances (P. Guo, He, Liu, Zeng, & Liu, 2021; Wu, Yuan, & Cheng, 2020).

On the other hand, an objective evaluation does not require the resources consumed by subjective evaluations. This evaluation strategy emulates human visual perception (the basis of subjective evaluation) through quantitative metrics developed from image statistics. These metrics provide an abstraction over the accounted image statistics to describe an aspect of an image's visual quality and/or information content. For instance, the Peak Signal-to-Noise Ratio (PSNR), a metric based on the difference between the intensities of a reference and a test image, is interpreted as the amount of error accumulated by the test image during the enhancement process. Such metrics must be verified with sound observations from image statistics and validated with subjective evaluations during their development and implementation to ensure the reliability of objective evaluation methodologies. As these metrics generalize human visual perception, some degraded scenes may not be accounted for, causing inconsistencies in quantification. Sometimes, objective evaluation is further augmented with subjective evaluation to strengthen the observations used to improve images.

3.1.11. Quality Assessment Metrics

Objective evaluations use either one or both types of metrics: full-reference metrics and no-reference metrics. Full-reference image quality assessment metrics require both the reference (noise-free) image and the improved image (produced by an enhancement method) in their calculation. Meanwhile, no-reference (blind) image quality assessment requires only the improved images. For the objective evaluation of underwater visual data, full-reference metrics are discouraged because of the shortage of pristine, noise-free underwater images. Conversely, no-reference metrics are recommended (M. Yang, Du, et al., 2019).

The presented no-reference quantitative metrics quantify an underwater image's information content and edge details. Information entropy (or Shannon entropy) is a measurement of the uncertainty of an image. This metric, intended for a 2-dimensional intensity image, is extended to a color image and expressed as:

$H_{c \in \{R,G,B\}} = -\sum_{i=0}^{L-1} p(i) \log_2 p(i)$ (3.13)

where $H_c$ is the information entropy of the image at color channel $c$ (R, G, B), $p(i)$ is the probability distribution of the image at intensity level $i$, and $L$ is the total number of intensity levels (for a digital image represented as an 8-bit integer, $L = 2^8 = 256$).

The H of a color image is the root-mean-square of H for all color channels:


$H = \sqrt{\frac{H_R^2 + H_G^2 + H_B^2}{3}}$ (3.14)

Information entropy quantifies the amount of information in an image; accordingly, the more information an image presents, the higher the quality of the underwater image. Notably, Mangeruga, Bruno, et al. (2018) report that information entropy is the quantitative metric most related to the subjective evaluations of an expert panel.
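A direct NumPy sketch of Eqs. 3.13 and 3.14 for an 8-bit RGB image.

```python
import numpy as np

def channel_entropy(channel):
    """Shannon entropy of one 8-bit channel, Eq. 3.13, with
    L = 2**8 = 256 intensity levels."""
    p = np.bincount(channel.ravel(), minlength=256) / channel.size
    p = p[p > 0]                   # empty bins contribute 0 * log 0 := 0
    return float(-(p * np.log2(p)).sum())

def entropy_rms(img):
    """Eq. 3.14: root-mean-square of the per-channel entropies."""
    h = [channel_entropy(img[..., i]) for i in range(3)]
    return float(np.sqrt(np.mean(np.square(h))))
```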

On the other hand, the average gradient is an elaborate measure of contrast and edge details:

$AG_c = \frac{1}{(M-1)(N-1)} \sum_{x=1}^{M-1} \sum_{y=1}^{N-1} \sqrt{\frac{\left( I_c(x+1, y) - I_c(x, y) \right)^2 + \left( I_c(x, y+1) - I_c(x, y) \right)^2}{2}}$ (3.15)

where $AG_c$ is the average gradient per color channel $c$, and $M$ and $N$ are the dimensions of the color channel. The summation gathers the magnitudes of the gradients at all points of the color channel, and the sum is divided by the number of contributing pixels, hence "average". The gradient marks the points in a field at which values change rapidly; rapid changes between adjacent pixel intensities signify the presence of edge details or discontinuities within an image, and averaging the gradient magnitudes measures the overall amount of contrast. $AG$ is the mean of $AG_c$ over all color channels:

$AG = \frac{AG_R + AG_G + AG_B}{3}$ (3.16)

A higher average gradient indicates clearer and sharper image details (Dong, Yang, Wu, Xiao, & Xu, 2015).
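A sketch of Eqs. 3.15 and 3.16 as reconstructed above, using first-order differences as the discrete gradient.

```python
import numpy as np

def average_gradient(img):
    """Average gradient over the three channels (Eqs. 3.15-3.16) for an
    image array of shape (M, N, 3)."""
    img = img.astype(np.float64)
    dx = np.diff(img, axis=1)[:-1, :, :]     # horizontal differences
    dy = np.diff(img, axis=0)[:, :-1, :]     # vertical differences
    mag = np.sqrt((dx**2 + dy**2) / 2)       # gradient magnitude per pixel
    per_channel = mag.mean(axis=(0, 1))      # AG_c for c in R, G, B
    return float(per_channel.mean())         # AG, Eq. 3.16
```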

3.2. Conceptual Framework

The presented theories and elements are incorporated into the underwater camera system framework shown below.


Figure 3.5. Underwater Visual Sensor System Block Diagram

The study presents a natural light-assisted underwater camera system with two modes of operation for local and remote access to underwater visual data: (1) scheduled capture, which automatically captures and manages underwater video frames (as image files) from underwater scenes of inland freshwater aquaculture, and (2) real-time streaming, which steadily provides a real-time streaming link. These modes of operation should be automated and continuous.

Once the data has been accessed, further analysis and processing are performed on the gathered data. The total statistics (histograms, mean intensities) are generated from the local underwater image datasets. The interpretation of these statistics should determine whether the gathered underwater images are appropriate for applying color balance and fusion, a state-of-the-art hybrid underwater image enhancement framework. The motivation for applying the said framework to an underwater image is formed by the series of assumptions about the statistics of the concerned underwater images.
