Exploring accurate invariants on polar harmonic Fourier moments in polar coordinates

for robust image watermarking

Mingze He, Hongxia Wang, Fei Zhang, and Yuyuan Xiang

Abstract—In moment-based watermarking schemes, the accuracy of the moments is crucial for constructing robust watermarking schemes, since robustness relies heavily on a proper representation of the moments. Despite this importance, current theoretical research on accuracy in watermarking techniques is very limited. To this end, we propose a novel robust image watermarking scheme based on accurate polar harmonic Fourier moments (PHFMs). Specifically, an accurate PHFMs computation based on polar pixel tiling with nearest neighbor interpolation (PPTN) is designed. This computation is general and is used by both the embedder and the extractor. This design eliminates geometric and numerical integration errors and also avoids the distortion interaction caused by watermarks. In addition, an improved quantization strategy is applied to the embedding process, yielding satisfactory imperceptibility. The watermark is extracted without the host image. The experimental results show the excellent robustness of the proposed watermarking scheme against common image processing attacks, geometric attacks, and several kinds of compound attacks. The proposed scheme is superior to state-of-the-art image watermarking schemes.

Index Terms—Image watermarking, Polar harmonic Fourier moments, Robustness, Polar pixel tiling, Geometric attacks.

I. INTRODUCTION

Digital image watermarking is an important task in the field of information hiding. It is considered an attractive strategy for tracking origins and protecting copyrights, and it can limit digital image piracy to a certain extent [1], [2]. Robust watermarking schemes [3] embed invisible watermark signals (copyright information, authentication information, device codes, etc.) into the digital host image. Unlike zero-watermarking schemes [4], robust watermarking schemes do not require a trusted third-party certification authority (TA) and do not necessitate a large amount of storage space, making them one of the most widely used image watermarking schemes. This is not surprising, as operational costs are essential to business decisions. Therefore, a common practice in digital watermarking technology is to embed the watermarks directly in the host image without auxiliary information, which is also called blind watermarking.

This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 62272331 and 61972269 and Sichuan Science and Technology Program under Grant 2022YFG0320.

M. He, H. Wang, F. Zhang, and Y. Xiang are with the School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Corresponding author: Hongxia Wang.

The proposed watermarking scheme based on polar harmonic Fourier moments is a blind watermarking algorithm, which can be used for active forensics to combat piracy on social networks and to protect creators' legitimate rights and interests. We discuss the excellent performance of the proposed watermarking scheme in three aspects: Accuracy - the image descriptors used in the proposed scheme represent the image content with satisfactory accuracy; Imperceptibility - the watermarked image generated by the proposed scheme cannot be perceived by the human visual system; Robustness - the watermark signal embedded by the proposed scheme is unaffected by geometric or signal perturbations that may be introduced by an adversary.

Digital image-based creations and apps continue to emerge, providing ever-increasing convenience. Along with this, illegal acts such as piracy and infringement are becoming increasingly prevalent, damaging the intellectual property rights and legitimate income of digital image owners and dampening the creative enthusiasm of digital image creators. For this reason, adequate copyright protection of digital images is a significant research issue.

Watermarking schemes based on deep learning (e.g., convolutional neural networks (CNNs)) have shown impressive performance. However, two inherent properties leave them a long way from practical active forensics (e.g., digital watermarking) [5]. The first inherent property is that the robustness of these schemes is unsatisfactory [6], decreasing significantly under geometric attacks and adversarial examples [7]. The second is that their decision-making mechanism is challenging to understand [8], and this black-box nature reduces the credibility of the extracted watermarks.

Watermarking schemes based on hand-crafted features are a hot research topic due to their strong robustness and interpretable theoretical analysis. Designing a robust watermarking scheme resistant to geometric attacks has long been challenging. Currently, a usual approach is to embed watermark information into geometric invariants (e.g., the Fourier–Mellin transform [9], histogram shapes [10], orthogonal moments [11], etc.) to obtain geometric invariance. Among them, orthogonal moments perform better in overcoming the semantic gap due to the independence of their basis functions and their geometric invariance. The current common practice for improving the robustness of orthogonal moment-based watermarking algorithms is to substitute better orthogonal moments, which is effective but not ingenious. This practice does not consider the effect of


the accuracy of the orthogonal moments on the watermarking scheme and ignores the correlation between accuracy and watermarking robustness. Almost all existing watermarking schemes based on orthogonal moments suffer from low accuracy (i.e., high errors), especially geometric and numerical integration errors (see Section III-C for details).

Motivated by the above facts, a robust image watermarking scheme based on accurate polar harmonic Fourier moments (PHFMs) is designed to track origins and protect copyright. In this paper, the accurate computation based on polar pixel tiling with nearest neighbor interpolation (PPTN) is used to replace the most commonly used orthogonal moment computation method, the zero-order approximation (ZOA), to obtain high accuracy. To the best of our knowledge, this is among the earliest works to eliminate numerical integration and geometric errors in orthogonal moment-based robust watermarking schemes; the effect of moment accuracy on robustness has rarely been explored clearly before. The key contributions of our work are three-fold:

• We first derive an accurate numerical computation of PHFMs by using the polar pixel tiling (PPT) technique, which effectively eliminates integration and geometric errors to obtain highly accurate PHFMs. Ablation experiments demonstrate that the robustness of the watermarking scheme is improved by exploiting the exact invariants. In addition, this accurate computation is generic for most orthogonal moments.

• We propose the PPTN computation method, which is designed specifically for watermarking schemes. Popular high-precision interpolation spreads the distortion caused by the embedded watermark; nearest neighbor interpolation is introduced to reduce the effect of watermark distortion on the moment accuracy and to ensure the moment accuracy in the extractor. The proposed PPTN method provides a guideline for the design of orthogonal moment-based watermarking.

• We give an improved quantization strategy to compensate for the omission in the quantization strategy of [12]. The resulting imperceptibility of watermarked images is superior to the previous quantization method at the same embedding strength and embedding capacity.

We validate the robustness of the proposed scheme using various image attacks, including common image processing attacks, geometric attacks, and compound attacks; the experiments against these attacks are extensive and comprehensive. The experimental results show that the proposed scheme outperforms other state-of-the-art watermarking schemes in terms of robustness and imperceptibility.

II. RELATED WORKS

Digital watermarking involves at least two players: the embedder and the extractor. The watermarked image generated by the embedder is usually distorted during transmission to the extractor, where more distortion implies a more substantial degree of damage to the watermark. In particular, geometric distortions destroy the synchronization between the embedder and extractor, significantly decreasing robustness. Many researchers have proposed to resist geometric distortions by exploiting geometric invariants to address this issue.

Orthogonal moment-based watermarking. The performance of these watermarking schemes is determined by the orthogonal moments. There are three main types of orthogonal moments: the Jacobi polynomial type (e.g., Zernike moments (ZMs) [13]), the eigenfunction type (e.g., Bessel-Fourier moments (BFMs) [14]), and the harmonic function type (e.g., radial harmonic Fourier moments (RHFMs) [15] and polar harmonic Fourier moments (PHFMs) [16]).

Alghoniemy et al. [17] first proposed a robust image watermarking scheme based on image moments and verified the feasibility of image moments in watermarking algorithms. Inspired by this strategy, ZM-based robust watermarking schemes [18]–[22] were proposed one after another. In [23], Xin et al. proposed an accurate computation method for ZMs based on polar pixel tiling. Yang et al. [24] proposed a novel scheme using RHFMs. Based on RHFMs, Wang et al. [25] introduced a fast computation method using the FFT and designed a robust watermarking scheme resistant to complex desynchronization attacks. Ma et al. [12] made an effort toward the accurate computation of orthogonal moments and proposed an image watermarking algorithm based on Gaussian numerical integration (GNI). Unfortunately, GNI-based computational methods can only mitigate geometric and numerical integration errors and have extremely high computational complexity. Hu and Xiang [26] presented a robust reversible watermarking (RRW) strategy based on low-order ZMs. Tang et al. [27] proposed a new RRW scheme by improving [26]. The essence of RRW is to trade imperceptibility for the reversible property, and robustness is not improved. Recent works introduce trinion/quaternion/octonion orthogonal moments [28]–[30] and fractional-order orthogonal moments [31]–[33]. However, these algorithms are limited by the accuracy of the orthogonal moments.

Non-orthogonal moment-based watermarking. There are two types of watermarking schemes: spatial-domain watermarking and transform-domain watermarking. To resist geometric distortions, watermarking schemes based on transform domains (e.g., the Fourier–Mellin transform [34], the discrete wavelet transform (DWT) [35], the discrete cosine transform (DCT) [36], and the dual-tree complex wavelet transform (DT CWT) [37]) are gradually becoming the most popular. Kang et al. [34] proposed a near-uniform log-polar mapping (ULPM) and embedded the watermark in discrete Fourier transform coefficients to resist rotation and scaling attacks. Huang et al. [36] gave a spread-spectrum scheme with adaptive embedding strength (SSAES) to guarantee imperceptibility. Huan et al. [37] designed a watermarking scheme using the DT CWT to resist rotation attacks not exceeding 20 degrees. The latest watermarking scheme [38], based on the autoconvolution function, obtains more comprehensive robustness in the spatial domain, but its mid- and high-frequency watermarks are not robust to blurring attacks and Gaussian noise.

Deep learning-based watermarking. Inspired by CNNs, deep neural networks have been introduced for image watermarking [39]–[43], termed deep image watermarking. HiDDeN [39] is a classical end-to-end robust watermarking network that introduces a noise layer to simulate the image distortion in the communication channel. ReDMark was proposed in [40] for watermarking; specifically, a deep diffusion watermarking network is trained end-to-end to conduct blind robust watermarking. In MBRS [41], Jia et al. used the SE block [44] to learn better features in the embedding and extraction stages, and the proposed scheme performs well in terms of JPEG robustness and image quality. Although similar neural network-based watermarking schemes [42], [43] have been proposed subsequently, as mentioned in Section I, their two inherent properties dictate that such schemes have a long way to go in practical application.

III. POLAR HARMONIC FOURIER MOMENTS (PHFMS)

For completeness, we briefly review the foundations of PHFMs (see [16] for a full description). PHFMs are a set of continuous orthogonal moments and an improvement of RHFMs.

A. Definition of PHFMs

Assume that $f(r,\theta)$ is the central region of the host image. A PHFM of order $n$ ($|n| \geq 0$) and repetition $m$ ($|m| \geq 0$) is defined as follows:

$$P_{nm} = \frac{2}{\pi}\int_{0}^{2\pi}\!\!\int_{0}^{1} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta, \quad (1)$$

where $\overline{(\cdot)}$ denotes the complex conjugate, and the basis function $H_{nm}(r,\theta)$, consisting of $T_n(r)$ and $\exp(jm\theta)$, is defined by the following equation:

$$H_{nm}(r,\theta) = T_n(r)\exp(jm\theta), \quad (2)$$

where $\exp(jm\theta)$ is the Fourier factor in the angular direction and the radial basis function $T_n(r)$ is given by Eq. (3):

$$T_n(r) = \begin{cases} 1/\sqrt{2}, & n = 0 \\ \sin[(n+1)\pi r^2], & n \text{ odd} \\ \cos(n\pi r^2), & n \text{ even.} \end{cases} \quad (3)$$

The orthogonality properties of $\exp(jm\theta)$ and $H_{nm}(r,\theta)$ over the unit circle are as follows:

$$\int_{0}^{2\pi} \exp(jm\theta)\,\overline{\exp(jm'\theta)}\,\mathrm{d}\theta = 2\pi\,\delta_{mm'}, \quad (4)$$

$$\int_{0}^{2\pi}\!\!\int_{0}^{1} H_{nm}(r,\theta)\,\overline{H_{kl}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta = \frac{\pi}{2}\,\delta_{nk}\,\delta_{ml}, \quad (5)$$

where $\delta$ is the Kronecker delta.

Based on the theory of orthogonal complete functions, the host image $f(r,\theta)$ can be approximately reconstructed from a finite sequence of PHFMs. For known PHFMs with maximum order $n_{\max}$ and maximum repetition $m_{\max}$, the reconstructed image $\hat{f}(r,\theta)$ is given below:

$$\hat{f}(r,\theta) = \mathrm{Re}(P_{nm}, H_{nm}(r,\theta)) = \sum_{n=0}^{n_{\max}}\sum_{m=-m_{\max}}^{m_{\max}} P_{nm}\,H_{nm}(r,\theta). \quad (6)$$
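For concreteness, the definitions above translate almost directly into code. The following minimal numpy sketch (function and variable names are ours, not the paper's) implements the radial basis of Eq. (3), the basis function of Eq. (2), and the reconstruction of Eq. (6), with the moment matrix assumed to be indexed as `P[n, m + m_max]`:

```python
import numpy as np

def radial_basis(n, r):
    """Radial basis T_n(r) of PHFMs, Eq. (3)."""
    if n == 0:
        return np.full_like(r, 1.0 / np.sqrt(2.0))
    if n % 2 == 1:                            # n odd
        return np.sin((n + 1) * np.pi * r ** 2)
    return np.cos(n * np.pi * r ** 2)         # n even

def basis(n, m, r, theta):
    """Basis function H_nm(r, theta) = T_n(r) * exp(j*m*theta), Eq. (2)."""
    return radial_basis(n, r) * np.exp(1j * m * theta)

def reconstruct(P, r, theta, n_max, m_max):
    """Approximate f(r, theta) from the PHFM matrix P[n, m + m_max], Eq. (6)."""
    f_hat = np.zeros_like(r, dtype=complex)
    for n in range(n_max + 1):
        for m in range(-m_max, m_max + 1):
            f_hat += P[n, m + m_max] * basis(n, m, r, theta)
    return f_hat.real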

Fig. 1. Computation of orthogonal moments in the inscribed circle of the image.

B. The ZOA-based PHFMs Computation Method

The most commonly used approximate computation for PHFMs is the zero-order approximation (ZOA) method [16]. For an $N \times N$ image $f(x,y)$, we map the inscribed circle of the digital image onto the unit disk according to the following formula:

$$x_q = \frac{2q - N - 1}{N}, \quad y_p = \frac{2p - N - 1}{N}, \quad (p, q = 1, 2, \ldots, N). \quad (7)$$

Since PHFMs are defined by the continuous integral in Eq. (1), their computational formula cannot be applied directly to discrete image functions $f(x,y)$. The ZOA-based PHFMs computation method converts the continuous integral into a discrete summation:

$$P_{nm} \approx \bar{P}_{nm} = \frac{2}{\pi}\sum_{p=1}^{N}\sum_{q=1}^{N} \overline{H_{nm}(r_{p,q},\theta_{p,q})}\,f(r_{p,q},\theta_{p,q})\,\Delta x\,\Delta y, \quad (8)$$

where $r_{p,q} = \sqrt{x_q^2 + y_p^2}$, $\theta_{p,q} = \arctan(y_p/x_q)$, and $\Delta x = \Delta y = \frac{2}{N}$.

The ZOA method described above suffers from geometric and numerical integration errors. Most existing moment-based watermarking schemes use this ZOA computation method, which compromises the excellent invariance of orthogonal moments and reduces the robustness of the watermarking scheme.
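Under the same conventions as the previous snippet (and reusing its `basis` function), a sketch of the ZOA computation of Eqs. (7)-(8) might look as follows; as a practical substitute for the arctan in the text, `arctan2` is used to resolve the quadrant:

```python
def phfm_zoa(img, n_max, m_max):
    """ZOA-based PHFM computation, Eqs. (7)-(8): sum over square pixels
    whose centers fall inside the unit disk."""
    N = img.shape[0]
    idx = np.arange(1, N + 1)
    x = (2 * idx - N - 1) / N               # Eq. (7)
    X, Y = np.meshgrid(x, x)                # X follows q (columns), Y follows p (rows)
    r = np.sqrt(X ** 2 + Y ** 2)
    theta = np.arctan2(Y, X)                # quadrant-aware arctan(y/x)
    inside = (r <= 1.0).astype(float)       # discard pixels centered outside D
    dxdy = (2.0 / N) ** 2                   # Delta x * Delta y
    P = np.zeros((n_max + 1, 2 * m_max + 1), dtype=complex)
    for n in range(n_max + 1):
        for m in range(-m_max, m_max + 1):
            kern = np.conj(basis(n, m, r, theta)) * img * inside
            P[n, m + m_max] = (2.0 / np.pi) * kern.sum() * dxdy
    return P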

C. Error Analysis of PHFMs

The ZOA computation method is generally subject to geometric and numerical integration errors, as shown in Fig. 1. The orange circle represents the computational area defined by the orthogonal moments, and the white square grid is the actual computational area. These computational errors can even dominate watermarking scheme performance [5]. We aim to meet the computational accuracy requirements of watermarking tasks by reducing these errors. Next, we define the two types of errors.

Let $D = \{(r,\theta) : r \in [0,1], \theta \in [0,2\pi)\}$ be the unit circle and $\rho_{p,q}$ be the $(p,q)$ pixel at $(r_{p,q},\theta_{p,q})$. Let $\Theta$ be the intersection region of $D$ with the union of pixels whose centers and corners lie outside the circle. Going back to Eq. (1), $P_{nm}$ can be decomposed as follows:

$$P_{nm} = \frac{2}{\pi}\sum_{(r_{p,q},\theta_{p,q})\in D}\iint_{\rho_{p,q}} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta + \iint_{\Theta} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta. \quad (9)$$

In the ZOA method, a pixel is included in the computation of the PHFMs if its center falls within the boundary of the unit disk $D$; otherwise, the pixel is discarded. Thus an error $E_t$ arises between $\bar{P}_{nm}$ and $P_{nm}$:

$$\begin{aligned} E_t = \bar{P}_{nm} - P_{nm} ={}& \frac{2}{\pi}\sum_{p=1}^{N}\sum_{q=1}^{N} \overline{H_{nm}(r_{p,q},\theta_{p,q})}\,f(r_{p,q},\theta_{p,q})\,\Delta x\,\Delta y \\ &- \frac{2}{\pi}\sum_{(r_{p,q},\theta_{p,q})\in D}\iint_{\rho_{p,q}} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta \\ &- \iint_{\Theta} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta. \end{aligned} \quad (10)$$

We define the sum of the first and second terms as the numerical integration error $E_n$, and the third term as the geometric error $E_g$:

$$E_g = -\iint_{\Theta} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta, \quad (11)$$

$$E_n = \frac{2}{\pi}\sum_{p=1}^{N}\sum_{q=1}^{N} \overline{H_{nm}(r_{p,q},\theta_{p,q})}\,f(r_{p,q},\theta_{p,q})\,\Delta x\,\Delta y - \frac{2}{\pi}\sum_{(r_{p,q},\theta_{p,q})\in D}\iint_{\rho_{p,q}} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta. \quad (12)$$

The numerical integration error $E_n$ is caused by solving the double integral numerically with an approximation algorithm. The computational area defined by the orthogonal moments is not equal to the actual computational area, which leads to the geometric error $E_g$. Existing watermarking algorithms based on orthogonal moments use the ZOA algorithm to find the numerical solution, so the error $E_t = E_g + E_n$ arises.

IV. PROPOSED PPTN-BASED PHFMS COMPUTATION METHOD

To eliminate the losses caused by geometric and numerical integration errors, a novel PHFMs computation approach based on polar pixel tiling with nearest neighbor interpolation (PPTN) is proposed in this section.

A. The PPTN-based PHFMs Computation Method

The ZOA-based computation method described in Section III-B derives PHFMs on a square region, which inevitably suffers from geometric error, as shown in Fig. 1. Therefore, we use sector pixels instead of square pixels to avoid geometric error, as shown in Fig. 2.

Fig. 2. Illustration of proposed PPTN-based PHFMs computation method.

To derive PHFMs on the sector region, we rewrite the definition of PHFMs in Eq. (1) as follows:

$$\hat{P}_{nm} = \frac{2}{\pi}\sum_{u=1}^{U}\sum_{v=1}^{(2u-1)V} \tilde{f}(r_{uv},\theta_{uv})\,h_{nm}(r_{uv},\theta_{uv}), \quad (13)$$

where $U$ denotes the number of ring-shaped sections into which the unit disk is divided uniformly along the radial direction, and $V$ is the number of sectors contained in the innermost ring-shaped section. $\tilde{f}(r_{uv},\theta_{uv})$ denotes the pixel value of the non-square pixel $\Omega_{uv}$ at $(r_{uv},\theta_{uv})$, as shown in Fig. 3. The non-square pixels $\Omega_{uv}$ are non-overlapping and subject to the following constraints:

$$\bigcup_{(u,v)} \Omega_{uv} = D \quad \text{for} \quad D = \{(r_{uv},\theta_{uv}) : (r_{uv}\sin\theta_{uv})^2 + (r_{uv}\cos\theta_{uv})^2 \leq 1\}, \quad (14)$$

and

$$\Omega_{uv} \cap \Omega_{u'v'} = \emptyset \quad \text{for all} \quad (u,v) \neq (u',v'). \quad (15)$$

uv∩Ωuv =∅ for∀(u, v)̸= (u, v). (15) According to the summation formula of arithmetic progres- sion, it is easy to know that the unit disk is divided intoV U2 parts, as shown in Fig. 2. At this stage, the geometric error has been eliminated. Now we move on to the numerical integration error.

The numerical integration error is due to using an approximate algorithm to obtain the numerical solution, as described in Section III-C. To this end, we eliminate the numerical integration error by deriving the analytical solution of $h_{nm}(r_{uv},\theta_{uv})$, i.e., by applying the Newton-Leibniz formula directly.

To facilitate the subsequent discussion, we expand Eq. (13) into the following form:

$$\hat{P}_{nm} = \frac{2}{\pi}\sum_{u=1}^{U}\sum_{v=1}^{(2u-1)V} \tilde{f}(r_{uv},\theta_{uv}) \iint_{\Omega_{uv}} T_n(r)\exp(-jm\theta)\,r\,\mathrm{d}r\,\mathrm{d}\theta. \quad (16)$$

The integral over the sector $\Omega_{uv}$ in Eq. (16) can be expressed as Eq. (17).

Fig. 3. Illustration of the polar pixel $\Omega_{uv}$ at $(r_{uv},\theta_{uv})$.

$$\iint_{\Omega_{uv}} T_n(r)\exp(-jm\theta)\,r\,\mathrm{d}r\,\mathrm{d}\theta = \int_{r_{uv}^{(s)}}^{r_{uv}^{(e)}} T_n(r)\,r\,\mathrm{d}r \int_{\theta_{uv}^{(s)}}^{\theta_{uv}^{(e)}} \exp(-jm\theta)\,\mathrm{d}\theta, \quad (17)$$

where $r_{uv}^{(s)}$ and $r_{uv}^{(e)}$ are the starting and ending radii of $\Omega_{uv}$, respectively, and, as shown in Fig. 3, $\theta_{uv}^{(s)}$ and $\theta_{uv}^{(e)}$ are the starting and ending angles of $\Omega_{uv}$, respectively.

To avoid numerical integration error, we first retrieve the analytical solution of the PHFMs basis function by using the Newton-Leibniz formula. The analytical solutions of the radial basis function and the angle factor are given by Eqs. (18) and (19), respectively:

$$\int_{r_{uv}^{(s)}}^{r_{uv}^{(e)}} T_n(r)\,r\,\mathrm{d}r = \begin{cases} \dfrac{1}{2\sqrt{2}}\left(r_{uv}^{(e)\,2} - r_{uv}^{(s)\,2}\right), & n = 0 \\[2mm] \dfrac{\cos\!\left[\pi(n+1)\,r_{uv}^{(s)\,2}\right] - \cos\!\left[\pi(n+1)\,r_{uv}^{(e)\,2}\right]}{2\pi(n+1)}, & n \text{ odd} \\[2mm] \dfrac{\sin\!\left(\pi n\,r_{uv}^{(e)\,2}\right) - \sin\!\left(\pi n\,r_{uv}^{(s)\,2}\right)}{2\pi n}, & n \text{ even,} \end{cases} \quad (18)$$

$$\int_{\theta_{uv}^{(s)}}^{\theta_{uv}^{(e)}} \exp(-jm\theta)\,\mathrm{d}\theta = \begin{cases} \dfrac{j}{m}\left[e^{-jm\theta_{uv}^{(e)}} - e^{-jm\theta_{uv}^{(s)}}\right], & m \neq 0 \\[2mm] \theta_{uv}^{(e)} - \theta_{uv}^{(s)}, & m = 0. \end{cases} \quad (19)$$

The above theoretical derivation avoids these two types of errors but introduces an interpolation error $E_i$:

$$E_i = \hat{P}_{nm} - P_{nm} = \frac{2}{\pi}\sum_{u=1}^{U}\sum_{v=1}^{(2u-1)V} \tilde{f}(r_{uv},\theta_{uv}) \iint_{\Omega_{uv}} T_n(r)\exp(-jm\theta)\,r\,\mathrm{d}r\,\mathrm{d}\theta - \frac{2}{\pi}\iint_{D} f(r,\theta)\,\overline{H_{nm}(r,\theta)}\,r\,\mathrm{d}r\,\mathrm{d}\theta. \quad (20)$$

The reason is that a two-dimensional digital image usually consists of a set of square pixels in the computer storage structure. By comparing Fig. 1 and Fig. 2, we can confirm that the locations of the sector pixels on the polar coordinate plane do not match the locations of these square pixels. For this reason, we use an interpolation algorithm to obtain pixel values on the polar coordinate plane. We adopt the nearest neighbor interpolation method instead of the common bicubic interpolation method in our watermarking scheme. The reason is that the watermark embedder modifies the host image, and bicubic interpolation predicts each polar pixel from its $4 \times 4$ neighborhood of square pixels, resulting in distortion interactions caused by the watermark in the watermark extractor. Therefore, nearest neighbor interpolation is more suitable for the proposed watermarking scheme (or rather, for all orthogonal moment-based watermarking schemes). We validate this opinion by ablation experiments in Section VI-E.

The proposed PPTN method is generic and applicable to other continuous orthogonal moments. The key to applying PPTN lies in the derivation of analytical solutions, as demonstrated by Eqs. (18) and (19). By the mathematical theory of antiderivatives, every continuous function has a primitive function; therefore, PPTN is in principle applicable to all continuous orthogonal moments, including the three types mentioned in Section II: Jacobi polynomial-based, eigenfunction-based, and harmonic function-based moments. To support follow-up studies, we attempted to derive analytical solutions for each type individually. In practice, we find that analytical solutions for orthogonal moments based on Jacobi polynomials and harmonic functions are easily derivable. However, deriving the analytical solution for the eigenfunction-based moments (BFMs) is challenging due to the extraordinarily complex construction of their basis function. Consequently, PPTN is applicable to all continuous orthogonal moments except BFMs.

B. Stability of PHFMs

In this subsection, we discuss the geometric invariance of PHFMs obtained after PPTN computation.

1) Rotation invariance: Suppose $f'(r,\theta) = f(r,\theta-\alpha)$ is the image obtained by rotating $f(r,\theta)$ by $\alpha$ degrees. $\hat{P}_{nm}(f)$ and $\hat{P}_{nm}(f')$ denote the PHFMs of $f(r,\theta)$ and $f'(r,\theta)$ calculated by Eq. (13), respectively, and they satisfy $\hat{P}_{nm}(f') = \hat{P}_{nm}(f)\exp(-jm\alpha)$. Taking the absolute value of this equation gives $|\hat{P}_{nm}(f')| = |\hat{P}_{nm}(f)|$; hence the magnitudes of PHFMs are rotation invariant.

2) Scale invariance: To calculate the PHFMs, the image $f(r,\theta)$ is mapped into the unit circle $(r\sin\theta)^2 + (r\cos\theta)^2 \leq 1$. The same image function is obtained for scaled images after normalization, so the magnitudes of PHFMs are scale invariant.

3) Flipping invariance: Suppose $f_{hf}(r,\theta)$ and $f_{vf}(r,\theta)$ represent $f(r,\theta)$ flipped horizontally and vertically, respectively. From the definition of PHFMs, the magnitudes of the flipped images satisfy $|\hat{P}_{nm}(f_{hf})| = |\hat{P}_{nm}(f_{vf})| = |\hat{P}_{n,-m}(f)|$. The magnitudes of PHFMs are symmetric with respect to $m = 0$, i.e., $|\hat{P}_{n,-m}(f)| = |\hat{P}_{nm}(f)|$. Therefore the above relationship can be written as $|\hat{P}_{nm}(f_{hf})| = |\hat{P}_{nm}(f_{vf})| = |\hat{P}_{nm}(f)|$, which means that the magnitudes of PHFMs are flipping invariant.

4) Invariance to length-width ratio changing: The invariance of PHFMs to length-width ratio (LWR) changes depends heavily on scale invariance. An $M \times N$ image is resized to $(M+N)/2 \times (M+N)/2$, and the interpolation algorithm can fix the image distortion. The scale invariance of PHFMs combined with the interpolation algorithm can effectively resist LWR-changing attacks, as illustrated by the sketch below.
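The squaring step can be sketched as follows. The paper relies on a library resize (typically bicubic); this dependency-free version uses nearest-neighbor sampling for illustration only:

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize; a stand-in for the library call
    (usually bicubic) that the scheme actually uses."""
    h, w = img.shape
    rows = (np.arange(out_h) * h // out_h).astype(int)
    cols = (np.arange(out_w) * w // out_w).astype(int)
    return img[rows[:, None], cols]

def square_image(img):
    """Resize an M x N image to (M+N)/2 x (M+N)/2 so that the scale
    invariance of PHFMs can absorb a length-width ratio change."""
    m, n = img.shape
    s = (m + n) // 2
    return resize_nearest(img, s, s)
```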

V. PROPOSED WATERMARKING SCHEME

In this section, we propose a novel robust image watermarking scheme based on accurate PHFMs to resist common image processing operations and geometric attacks. The proposed


watermarking scheme consists of two main phases: watermark embedding and watermark extraction. The first phase generates the watermarked image using the PPTN-based PHFMs and the improved quantization strategy. The second phase extracts the watermark information from the target image and verifies its copyright legitimacy.

A. Watermark Embedding

The basic procedure of watermark embedding is as follows: first, the host image is mapped to the polar coordinate plane, and accurate PHFM invariants are obtained from the host image using the proposed PPTN computation method; next, magnitudes of the invariants are randomly selected for embedding using the improved quantization strategy and a private key; finally, the watermarked image is constructed using the quantized PHFM invariants. Assume $F = \{f(x,y), 0 \leq x < N_l, 0 \leq y < N_w\}$ is a grayscale host image, with $N = \min(N_l, N_w)$ denoting the length of its shorter side. Let $I = \{f(x,y), 0 \leq x, y < N\}$ be the central region of the host image and $W = \{w(i), 1 \leq i \leq L_w\}$ be a binary random watermark. If a color image is used as input, we convert it to YCbCr space and take the luminance channel Y as the host image $I$. The procedure of watermark embedding is described in detail as follows.

Step 1. Calculating accurate PHFMs of the host image using the PPTN method. Before calculating the PHFMs, the host image $I$ is first mapped into the polar coordinate plane. The PHFM matrix $\hat{P}_{nm}$ of the polar image, of size $(n_{\max}+1)(2m_{\max}+1)$, is obtained by the PPTN method of Section IV. Since $\hat{P}_{nm}$ is a complex matrix, we take $|\hat{P}_{nm}|$ as the candidate values to ensure that the binary watermark information can be embedded. From the definition of PHFMs, we know that $\hat{P}_{nm}$ is symmetric with respect to $m = 0$. Hence, we only choose $|\hat{P}_{nm}|$ with $m > 0$ as the set $S$ for watermark embedding.

Step 2. Encrypting watermark information. To improve the security of the watermark, we encrypt the watermark information using the tent chaotic map. First, a one-dimensional chaotic sequence $T = \{t(i), 1 \leq i \leq L_w\}$ is generated from the given initial seed $K_1$, as described in Eq. (21):

$$x_{l+1} = T(x_l) = \begin{cases} x_l/\alpha, & x_l \in [0,\alpha) \\ (1-x_l)/(1-\alpha), & x_l \in [\alpha, 1], \end{cases} \quad (21)$$

where $\alpha \in (0,1)$ and $K_1$ must be different from $\alpha$. Next, the chaotic sequence and the watermark sequence are XORed to obtain the encrypted watermark $\tilde{W} = \{\tilde{w}(i), 1 \leq i \leq L_w\}$, as in Eq. (22):

$$\tilde{W} = \mathrm{XOR}(W, T). \quad (22)$$
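A sketch of this encryption step follows. The paper XORs a binary watermark with the real-valued chaotic sequence, so some binarization is implied; thresholding at 0.5 is our assumption, as is the default value of alpha:

```python
def tent_sequence(seed, alpha, length):
    """Tent chaotic map of Eq. (21), iterated from initial seed K1 in [0, 1]."""
    x, seq = seed, []
    for _ in range(length):
        x = x / alpha if x < alpha else (1.0 - x) / (1.0 - alpha)
        seq.append(x)
    return np.array(seq)

def encrypt_watermark(w, seed, alpha=0.6):
    """Eq. (22): XOR the watermark with a binarized chaotic sequence.
    The 0.5 threshold is an assumption; the paper leaves it implicit."""
    t = (tent_sequence(seed, alpha, len(w)) > 0.5).astype(int)
    return np.bitwise_xor(np.asarray(w, dtype=int), t)
```

Since XOR is involutive, applying the same function with the same $K_1$ and $\alpha$ recovers the watermark, which is what Step 3 of the extraction phase does.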

Step 3. Embedding the watermark sequence. $Z = \{z(i), 1 \leq i \leq L_w\}$ are chosen at random from the set $S$ using key $K_2$ to improve the security of the watermarking scheme. Then, we quantize the encrypted watermark bits $\tilde{W}$ into $Z$. To the best of our knowledge, the most advanced watermarking scheme based on PHFMs [12] embeds the watermark as follows:

$$z_w(i) = \begin{cases} (q(i) + 3/2)\Delta, & \mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1 \\ (q(i) + 1/2)\Delta, & \mathrm{mod}(q(i) + \tilde{w}(i), 2) = 0, \end{cases} \quad (23)$$

where $q(i) = \lfloor z(i)/\Delta \rfloor$, $\lfloor\cdot\rfloor$ is the floor function, and $\Delta$ denotes the quantization step, which establishes a balance between robustness and imperceptibility of the watermark. $z_w(i)$ is the watermarked version of $z(i)$.

Since the magnitude is non-negative, the quantized magnitude in Eq. (23) is always greater than the original magnitude. To distinguish between the two cases ($\tilde{w}(i) = 1$ and $\tilde{w}(i) = 0$), a large modification distance is used when $\mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1$, which leaves room for improvement. In other words, the quantization strategy in Eq. (23) does not minimize the modification distance.

To solve this problem, we give an improved quantization strategy that reduces the modification distance, as shown in Eq. (24):

$$z_w(i) = \begin{cases} (q(i) + 3/2)\Delta, & z(i) < \Delta/2 \text{ and } \mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1 \\ (q(i) - 1/2)\Delta, & z(i) \geq \Delta/2 \text{ and } \mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1 \\ (q(i) + 1/2)\Delta, & \mathrm{mod}(q(i) + \tilde{w}(i), 2) = 0. \end{cases} \quad (24)$$

A simple example is shown in Fig. 4. In Eq. (23), a watermark bit $\tilde{w}(i)$ is embedded by quantizing $z(i)$ with a modification distance of $3\Delta/2 - \gamma$ under the condition $z(i) \geq \Delta/2$ and $\mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1$. In our Eq. (24), the same condition requires a modification distance of only $\Delta/2 + \gamma$. The following relation is easily obtained:

$$\Delta/2 + \gamma < \Delta < 3\Delta/2 - \gamma. \quad (25)$$

As a result, the total change is smaller (i.e., imperceptibility is better) with our quantization strategy at the same embedding strength. We also verified this by the ablation experiments of Section VI-E.
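The two quantizers can be contrasted in a few lines. This is a sketch of Eqs. (23) and (24) under our own naming, not the authors' code:

```python
def quantize_bit_old(z, w, delta):
    """Quantizer of [12], Eq. (23): always rounds upward."""
    q = int(np.floor(z / delta))
    if (q + w) % 2 == 1:
        return (q + 1.5) * delta
    return (q + 0.5) * delta

def quantize_bit_new(z, w, delta):
    """Improved quantizer, Eq. (24): moves down to the previous valid
    zone whenever z >= delta/2, shrinking the modification distance."""
    q = int(np.floor(z / delta))
    if (q + w) % 2 == 1:
        return (q + 1.5) * delta if z < delta / 2 else (q - 0.5) * delta
    return (q + 0.5) * delta
```

For example, with $\Delta = 0.3$ and $z(i) = 0.35$ (so $q(i) = 1$, $\gamma = 0.05$) under $\mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1$, Eq. (23) moves the magnitude to 0.75 (distance $3\Delta/2 - \gamma = 0.40$), whereas Eq. (24) moves it to 0.15 (distance $\Delta/2 + \gamma = 0.20$), consistent with Eq. (25).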

Fig. 4. The difference between the quantization method of [12] and our quantization method for $\mathrm{mod}(q(i) + \tilde{w}(i), 2) = 1$: (a) the quantization process of PHFM magnitudes by the method of [12]; (b) the quantization process of PHFM magnitudes by the improved quantization method.

To ensure that the watermarked PHFMs remain conjugate, we modify the magnitudes at the corresponding positions with $m < 0$ to obtain the watermarked magnitudes $|\hat{P}_{nm}^w|$. The final watermarked PHFMs are obtained by the following relation:

$$\hat{P}_{nm}^w = \frac{|\hat{P}_{nm}^w|}{|\hat{P}_{nm}|} \times \hat{P}_{nm}. \quad (26)$$

Step 4. Obtaining the watermarked image. Eq. (6) states that the image can be constructed from a finite number of PHFMs. Accordingly, the reconstructed image $\hat{I}$ and the reconstructed image with the watermark $\hat{I}_w$ are expressed as follows, respectively:

$$\hat{I} = \sum_{n=0}^{n_{\max}}\sum_{m=-m_{\max}}^{m_{\max}} \hat{P}_{nm}\,H_{nm}(r,\theta), \quad (27)$$

and

$$\hat{I}_w = \sum_{n=0}^{n_{\max}}\sum_{m=-m_{\max}}^{m_{\max}} \hat{P}_{nm}^w\,H_{nm}(r,\theta). \quad (28)$$

The residual between $\hat{I}$ and $\hat{I}_w$, representing the watermark, is added to the host image $I$ in the spatial domain to generate the watermarked image $I_w$:

$$I_w = I + \hat{I}_w - \hat{I}. \quad (29)$$

For correct extraction of the watermark, we suggest setting the quantization step $\Delta$ within the range $[0.07, 1.1]$. This range is based on the common storage of image pixels using 8 bits: $\Delta < 0.07$ mostly results in a fractional watermark residual ($\hat{I}_w - \hat{I}$), which can easily be discarded, while $\Delta > 1.1$ indicates excessive embedding strength, leading to pixel overflow.
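Steps 3-4 can be summarized by the following sketch of Eqs. (26)-(29); `mag_w` is a hypothetical container holding the quantized magnitudes for $m > 0$, and `reconstruct` is the function from the sketch in Section III:

```python
def apply_watermarked_magnitudes(P, mag_w, n_max, m_max):
    """Eq. (26): scale each selected moment so its magnitude equals the
    quantized value while its phase is unchanged; mirror the change onto
    m < 0 so the watermarked PHFMs stay conjugate (real residual image)."""
    P_w = P.copy()
    for n in range(n_max + 1):
        for m in range(1, m_max + 1):
            ratio = mag_w[n][m] / abs(P[n, m + m_max])  # |P_w| / |P|
            P_w[n, m + m_max] = ratio * P[n, m + m_max]
            P_w[n, -m + m_max] = np.conj(P_w[n, m + m_max])
    return P_w

def embed(img, P, P_w, r, theta, n_max, m_max):
    """Eqs. (27)-(29): the residual between the two reconstructions
    carries the watermark and is added to the host in the spatial domain."""
    I_hat = reconstruct(P, r, theta, n_max, m_max)      # Eq. (27)
    I_hat_w = reconstruct(P_w, r, theta, n_max, m_max)  # Eq. (28)
    return img + (I_hat_w - I_hat)                      # Eq. (29)
```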

B. Watermark Extraction

The proposed watermarking scheme is a blind watermarking scheme, which means that the watermark information is extracted without the host image. Given the central region of a watermarked image $I' = \{f'(x,y), 0 \leq x < M_l, 0 \leq y < M_w\}$, regardless of whether this image has been attacked, we can directly extract the watermark information from $I'$. The detailed process of watermark extraction is described below.

Step 1. Calculating accurate PHFMs of the watermarked image using the PPTN method. To resist the LWR-changing attack, we first resize the given watermarked image $I'$ into a square image of size $(M_l+M_w)/2 \times (M_l+M_w)/2$. We call a public package to implement the resizing, usually with bicubic interpolation as the resizing parameter. Then, the image $I'$ is mapped into the polar coordinate plane, and the $(n_{\max}+1)(2m_{\max}+1)$-size $\hat{P}'_{nm}$ of the polar image is obtained by the PPTN method. As in the embedding process, we only choose $|\hat{P}'_{nm}|$ with $m > 0$ as the set $S'$ for watermark extraction.

Step 2. Extracting the watermark sequence. $Z' = \{z'(i), 1 \leq i \leq L_w\}$ are chosen at random from the set $S'$ using key $K_2$. Using the same quantization step as the embedder, the watermark bits can be extracted by

$$\tilde{w}'(i) = \begin{cases} 1, & \mathrm{mod}(q'(i), 2) = 1 \\ 0, & \mathrm{mod}(q'(i), 2) = 0, \end{cases} \quad (30)$$

and

$$q'(i) = \lfloor z'(i)/\Delta \rfloor, \quad (31)$$

where $\tilde{W}' = \{\tilde{w}'(i), 1 \leq i \leq L_w\}$ represents the extracted watermark sequence.

Step 3. Recovering the watermark. An XOR operation on the chaotic sequence $T$ and $\tilde{W}'$ generates the one-dimensional watermark information $W' = \{w'(i), 1 \leq i \leq L_w\}$.
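The extractor side, Eqs. (30)-(31) together with the XOR recovery of Step 3, reduces to a few lines. This sketch reuses `tent_sequence` from Section V-A; the 0.5 binarization threshold is again our assumption:

```python
def extract_watermark(mags, seed, alpha, delta):
    """Parity-decode each selected magnitude (Eqs. (30)-(31)), then undo
    the tent-map XOR of Eq. (22). XOR is its own inverse."""
    bits = np.array([int(np.floor(z / delta)) % 2 for z in mags])
    t = (tent_sequence(seed, alpha, len(bits)) > 0.5).astype(int)
    return np.bitwise_xor(bits, t)
```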

VI. EXPERIMENT AND ANALYSIS

In this section, we evaluate the performance of the proposed PPTN-based PHFMs computation method and watermarking scheme, covering experiments on accuracy, imperceptibility, and robustness. We describe the experimental details in Section VI-A. In Section VI-B, we discuss the accuracy improvement achieved by the PPTN method. Then, we compare the imperceptibility and robustness of the proposed image watermarking scheme with other state-of-the-art (SOTA) watermarking schemes, including watermarking methods based on orthogonal moments, non-orthogonal moments, and deep learning, in Sections VI-C and VI-D, respectively. Following that, we investigate the impact of PPTN on robustness and the impact of the quantization strategies on imperceptibility in the ablation studies of Section VI-E. Finally, we discuss the computational cost.


Fig. 5. Comparison of host images and watermarked images: (a) The host images. (b) The watermarked images generated by the proposed scheme.

A. Implementation Details

Dataset: The 108 test host images consist of 8 color images selected from the USC-SIPI image database [45] and 100 grayscale images selected from the BOSSBase database [46]. Fig. 5(a) presents some of the host images, each with a resolution of 256×256.

Metrics: We use four widely used metrics to evaluate performance: the mean square reconstruction error (MSRE) in Eq. (32), the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM) [47], and the bit error rate (BER). The MSRE is the most commonly used measure of image reconstruction accuracy; a smaller MSRE indicates better reconstruction performance. The BER is defined as the ratio of the number of incorrectly extracted bits to the watermark length. The higher the PSNR or SSIM value, the better the imperceptibility; the lower the BER, the better the robustness.

$$\mathrm{MSRE}(f,\hat{f}) = \frac{\sum_{x=1}^{N}\sum_{y=1}^{N}\left|f(x,y)-\hat{f}(x,y)\right|^2}{\sum_{x=1}^{N}\sum_{y=1}^{N}f^2(x,y)}, \quad (32)$$

where $f(x,y)$ represents the host image with an $N \times N$ resolution and $\hat{f}(x,y)$ represents the reconstructed image.
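As a reference implementation of the two scheme-specific metrics (a sketch; the names are ours):

```python
def msre(f, f_hat):
    """Mean square reconstruction error, Eq. (32)."""
    f = np.asarray(f, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return np.sum((f - f_hat) ** 2) / np.sum(f ** 2)

def ber(w, w_ext):
    """Bit error rate: wrongly extracted bits over the watermark length."""
    w, w_ext = np.asarray(w), np.asarray(w_ext)
    return float(np.mean(w != w_ext))
```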

Comparative Schemes: The proposed scheme is compared with seven SOTA watermarking schemes: five schemes based on hand-crafted features (three orthogonal moment-based schemes [12], [21], [25] and two non-orthogonal moment-based schemes [36], [38]) and two schemes based on deep learning [40], [41]. These watermarking schemes are advanced and robust.

Parameters: The important parameters of the proposed scheme are $n_{\max} = m_{\max} = 25$, $\Delta = 0.3$, $V = 4$, and $U = N/2$. For the hand-crafted feature-based schemes [12], [25], the parameters $n_{\max}$, $m_{\max}$, and $\Delta$ are consistent with those of the proposed scheme. Similarly, the parameters $n_{\max} = m_{\max} = 25$ in [21] are the same as in our scheme. For [36] and [38], the key parameters, namely the PSNR threshold and the size of the spread-spectrum matrix, are set to 43.95 and 4, respectively. For the deep learning-based schemes [40], [41], their open-source code is used. For the ReDMark method [40], we test on the Multi-Attack-Trained-Network model weights provided by the authors. The MBRS model [41] is trained on 10,000 images and evaluated on a 5,000-image test set; these images are from the COCO dataset [48]. The batch size and number of epochs are 4 and 100, respectively. For fair evaluation, the binary random watermark is the same for all schemes, and its length $L_w$ is set to 64 bits. To fairly evaluate the robustness of the proposed scheme against the other schemes, we conduct comparative experiments at a fixed embedding capacity (64 bits) and PSNR, as reported in Table I.

Platforms: All experiments were performed on a PC with a 3.8 GHz Intel Core i7 CPU, an NVIDIA RTX 2080Ti GPU, and 32 GB RAM. The StirMark 4.0 benchmark [49], [50] is utilized to simulate JPEG compression, filtering, rotation attacks, affine transforms, etc.

B. Discussion on Accuracy

In orthogonal moments, computational errors can even dominate watermarking scheme performance [5]. Satisfactory accuracy (i.e., low errors) is the cornerstone of building a watermarking scheme, but accuracy has long been neglected. The image reconstruction performance of orthogonal moments is the best evidence of their accuracy [51]. We believe that better reconstruction performance implies stronger robustness in watermarking schemes. In the subsequent experiments, we again emphasize the effect of accuracy on the robustness and embedding capacity of the watermarking scheme.

In this subsection, reconstructed images and errors are used to evaluate accuracy. The 108 test images of size 64 × 64 from the dataset are used to verify the superiority of the PPTN-based PHFMs computation method. For these experiments, the maximum order is set to $n_{\max} = 1, 2, \ldots, 50$, and the maximum repetition is set to $m_{\max} = n_{\max}$. Fig. 6 reveals that the Lena image reconstructed using the ZOA method continues to deteriorate in the edge region as $n_{\max}$ increases. In contrast, the proposed PPTN method consistently improves the reconstruction quality of the image.

We use the MSRE to quantitatively evaluate the errors. By substituting Eqs. (10) and (20) into Eq. (6), the errors generated by the ZOA method [16] and the proposed PPTN method are defined as $Error_t$ and $Error_i$, respectively. We define $diff$ as the error function that compares the negative effect of the geometric and numerical integration errors ($Error_t$) with that of the interpolation error ($Error_i$); if $diff > 0$, then $Error_t$ has a greater negative impact than $Error_i$:

$$diff = Error_t - Error_i, \quad (33)$$

TABLE I
COMPARISON OF IMPERCEPTIBILITY AMONG DIFFERENT WATERMARKING SCHEMES. THE BEST AND SECOND BEST RESULTS ARE HIGHLIGHTED IN BOLD AND UNDERLINED, RESPECTIVELY

                     Moments                  Non-moments      Deep-learning
            [21]     [25]     [12]     [36]     [38]     [40]     [41]     Proposed
PSNR (dB)   43.44    38.00    44.03    43.96    38.78    41.41    39.75    44.20
SSIM        0.9959   0.9891   0.9918   0.9824   0.9700   0.9814   0.9841   0.9921

Fig. 6. Comparison of reconstructed images of Lena ($n_{\max} = 5, 10, 20, 30, 40, 50$): (a) the ZOA-based PHFMs method; (b) the proposed PPTN-based PHFMs method.

Fig. 7. Comparison of average MSRE between the ZOA and PPTN computation methods.

and

$$\begin{aligned} Error_t &= \mathrm{MSRE}\big(f,\ \mathrm{Re}((P_{nm}+E_g+E_n),\ H_{nm}(r,\theta))\big), \\ Error_i &= \mathrm{MSRE}\big(f,\ \mathrm{Re}((P_{nm}+E_i),\ H_{nm}(r,\theta))\big). \end{aligned} \quad (34)$$

Fig. 7 shows the average errors over the 108 sets of experiments, where the red line indicates the error function $diff$. When $n_{\max} \leq 12$, $diff \approx 0$, indicating that $Error_t$ and $Error_i$ perform almost identically. For $12 < n_{\max} \leq 29$, $diff$ starts to increase slightly, and beyond that it increases dramatically, which indicates that the negative impact of the interpolation error is significantly smaller than that of the geometric and numerical integration errors. Furthermore, the $Error_i$ generated by the PPTN computation method decreases continuously. In other words, the proposed PPTN method outperforms the most popular ZOA method.

In the ZOA-based PHFMs computation method, the MSRE value starts to increase after $n_{\max} > 19$. The reason is that the geometric and numerical integration errors increase with $n_{\max}$, and beyond some critical value (e.g., 19), the total reconstruction error caused by these two types of errors exceeds the gain in reconstruction quality from increasing $n_{\max}$ [52].

Fig. 8. Comparison of rate-distortion curves for different watermarking schemes.

C. Discussion on Imperceptibility

We use the 108 test images to evaluate the average imperceptibility of the proposed scheme. For an effective comparison and convincing experimental results, we give the rate-distortion curves of the different watermarking schemes in Fig. 8. The horizontal coordinate indicates the watermark embedding rate (i.e., $L_w/64$), and the vertical coordinate indicates the distortion (i.e., PSNR). The PSNR of the schemes [12], [21], [25] decreases with increasing embedding rate. In [36], the PSNR is constrained by a PSNR threshold. In [38], the PSNR is constant due to two conditions: $\sqrt{L_w} \in \mathbb{N}$, and the spread watermark is processed in the spatial domain. The embedding rate is fixed for the trained models [40], [41]. Overall, the performance of the proposed scheme is similar to [12], and its PSNR is consistently the highest among all schemes at different embedding rates. It should be pointed out that the proposed scheme has significant advantages over [12] in terms of computational efficiency, storage, and robustness. Regarding computational efficiency, the GNI process used in [12] incurs a substantial time cost of 3480.9751 s. Regarding storage, our scheme does not store additional data, whereas 848 MB of GNI data must be saved when images are batch-embedded. More specifically, our scheme takes 1.4426 s to embed each image, while the scheme of [12] takes 3480.9751 s + 1.8669 s to embed one image; even assuming the GNI data is pre-saved and only loaded when used, the embedding time for one image is 7.3837 s + 1.8669 s. Regarding robustness, our scheme demonstrates significant superiority over [12], as shown in Tables II-IV.

Fig. 5(b) presents some of the watermarked images with $L_w = 64$. The average PSNR and SSIM of the 108 watermarked images are 44.2033 dB and 0.9921, respectively.

TABLE II
AVERAGE BERS (%) UNDER COMMON IMAGE PROCESSING ATTACKS. THE BEST AND SECOND BEST RESULTS ARE HIGHLIGHTED IN BOLD AND UNDERLINED, RESPECTIVELY

                                       Moments                Non-moments    Deep-learning
Attacks    Parameters                  [21]    [25]    [12]   [36]    [38]   [40]    [41]    Proposed
JPEG       90                          0.71    9.14    0.06   0       3.07   5.43    0       0
JPEG       70                          0.72    9.14    0.13   24.35   8.70   13.35   0       0.06
JPEG       50                          0.75    9.14    0.19   49.65   28.15  23.39   0.07    0.06
JPEG       40                          0.78    9.14    0.13   50.56   40.84  28.33   0.38    0.07
JPEG       30                          0.98    9.14    0.45   50.20   48.68  33.17   3.59    0.36
JPEG       20                          2.13    9.19    2.91   48.93   50.00  40.65   21.70   2.55
Noise      Gauss 0.0001                0.71    9.16    0.14   0       2.29   3.30    0       0.09
Noise      Gauss 0.001                 1.35    9.51    0.62   0       48.50  5.24    4.93    0.39
Noise      Salt & Pepper 0.001         0.78    9.14    0.09   0       1.13   2.58    0       0.01
Noise      Salt & Pepper 0.005         2.53    9.27    0.94   0       1.71   2.45    0       0.87
Blurring   Gaussian filtering 3 × 3    0.71    9.14    0.03   0       1.68   7.96    0       0
Blurring   Gaussian filtering 5 × 5    0.71    9.14    0.03   0       1.69   7.96    0       0
Blurring   Median filtering 3 × 3      4.76    10.59   2.91   13.87   7.36   30.44   9.77    2.26
Blurring   Median filtering 5 × 5      14.32   15.44   24.06  43.53   47.66  55.95   38.80   21.90

Table I shows the average imperceptibility of the watermarked images generated by the different watermarking schemes. It can be clearly seen that our scheme obtains the highest PSNR and the second-highest SSIM under the same experimental setup and process. Empirically, PSNR > 40 dB indicates good perceptual quality [36], which also demonstrates the good imperceptibility of our scheme.

For subsequent fair comparisons, we discuss robustness under the imperceptibility settings given in Table I.

D. Discussion on Robustness

In this section, we evaluate the robustness of the proposed scheme under different attacks, including common image processing attacks, geometric attacks, and compound attacks.

1) Common Image Processing Attacks:

JPEG compression: The watermarked images are compressed with JPEG at quality factors (QF) from 20 to 90.

Noise attacks: The watermarked images are corrupted by Gaussian noise with variance from 0.0001 to 0.001 and by Salt & Pepper noise with density from 0.001 to 0.005, respectively.

Blurring attacks: Gaussian and median filters with 3 × 3 and 5 × 5 masks are applied to the watermarked images, respectively.

The average BERs for all compared methods are given in Table II. As can be seen, it is challenging to maintain the robustness of a watermarking scheme under common image attacks.

Fig. 9 illustrates some of the attacked images. The scheme [36] exhibits additive-noise invariance compared with the other methods. However, it cannot resist high-intensity JPEG compression, and its robustness is relatively unstable under median filtering. As for the scheme [41], its model handles low-intensity JPEG compression and additive-noise operations; similar to [36], this deep learning-based scheme is not resistant to severe JPEG compression and median filtering attacks. Moreover, the watermarking schemes [12], [21], [25], [38], [40] exhibit moderate robustness against common image processing attacks. In contrast, the proposed scheme exhibits stronger robustness to JPEG compression and median filtering than the other methods. For the noise attacks, our scheme achieves the second-best results, and this error is acceptable. Overall, the proposed scheme performs well under common image processing attacks; specifically, its performance is similar to the scheme [41] and better than the remaining schemes.

Fig. 9. Illustration of single image attacks: (a) median filtering 5×5, (b) rotation 90°, (c) scaling 0.5, (d) LWR 0.75×1.25, (e) vertical flipping, (f) affine transform (X-shearing), (g) corner cropping 1/8, (h) patch cropping P1.

2) Geometric Attacks:

Rotation attacks: The watermarked images are rotated by specific angles, θ = 5°, 15°, 30°, 45°, 60°, and 90°.

Scaling attacks: The width and height of the watermarked images are scaled proportionally with scaling factors λ = 0.5, 0.75, 1.5, and 2.

Length-width ratio (LWR) changing attacks: The LWR attack scales the image with two different scaling factors: 1.0 × 0.75, 1.25 × 1.0, and 0.75 × 1.25. The height of the watermarked image is scaled by the first factor, and the width is scaled by the second factor.

TABLE III
AVERAGE BERS (%) UNDER GEOMETRIC ATTACKS. THE BEST AND SECOND BEST RESULTS ARE HIGHLIGHTED IN BOLD AND UNDERLINED, RESPECTIVELY

                           Moments               Non-moments    Deep-learning
Attacks    Parameters      [21]    [25]    [12]  [36]    [38]   [40]    [41]    Proposed
Rotation   5°              0.74    9.79    0.17  48.87   4.93   51.52   47.16   0.09
Rotation   15°             0.78    10.19   0.04  46.27   3.73   49.93   46.64   0.03
Rotation   30°             0.90    9.38    0.04  49.38   9.17   49.42   45.63   0
Rotation   45°             1.55    10.40   0.26  49.97   50.26  47.25   47.25   0.29
Rotation   60°             1.03    9.36    0.06  51.01   9.74   45.69   47.67   0.03
Rotation   90°             0.71    9.14    0.03  48.06   0.61   53.82   45.33   0
Scaling    0.5             4.73    11.63   3.57  N/A     8.55   N/A     N/A     1.50
Scaling    0.75            0.88    9.26    1.30  N/A     2.91   N/A     N/A     0.01
Scaling    1.5             0.71    9.22    1.24  0       1.23   N/A     N/A     0.36
Scaling    2               0.71    9.32    1.26  0       1.45   N/A     N/A     0.71
LWR        1.0 × 0.75      0.95    9.14    1.78  37.47   2.26   N/A     N/A     0.04
LWR        1.25 × 1.0      0.74    9.14    1.10  0       0.80   N/A     N/A     0.09
LWR        0.75 × 1.25     0.71    9.14    1.29  28.75   1.17   N/A     N/A     0
Flipping   Horizontal      0.71    9.14    0.03  0       1.11   53.70   46.56   0
Flipping   Vertical        0.71    9.14    0.03  0       0.56   53.10   48.76   0
Affine     X-Shearing      46.31   12.50   7.02  2.05    10.76  N/A     N/A     4.82
Affine     Y-Shearing      47.35   12.51   6.74  4.92    7.80   N/A     N/A     4.79
Cropping   Corner 1/16     0.71    9.14    0.03  0       1.55   2.98    0       0
Cropping   Corner 1/8      0.71    9.14    0.03  0       1.09   3.08    0       0
Cropping   Patch P1        11.13   16.58   6.22  0       1.14   2.95    0       6.15
Cropping   Patch P2        0.71    9.14    0.03  0       1.24   2.95    0       0
Cropping   Center 1/16     1.72    12.73   1.39  0       1.36   2.89    0       1.30
Cropping   Center 1/8      8.75    18.98   6.96  0       1.42   2.94    0       6.69
Cropping   Center 1/4      28.81   31.83   21.86 0       1.94   4.18    0       21.35


Flipping attacks: Flipping swaps the positions of the top and bottom rows or of the left and right columns, which changes the positions of the pixel values of the watermarked image.

Affine transforms: The 2D affine transformations are implemented by StirMark. The transform matrices for X-shearing and Y-shearing are [1 0 0; 0.05 1 0] and [1 0.05 0; 0 1 0], respectively. For this attack, the BER can be minimized by perspective correction [2] before watermark extraction.

Cropping attacks: Cropping usually removes part of the image content but retains the main semantic information. Three kinds of cropping are used. First, top-left corner cropping at two strengths (1/16, 1/8) is used to measure robustness. Second, two random positions (P1 = (120, 170), P2 = (200, 236)) are patched by cropping in a 16×16 white square. Finally, we use three strengths (1/16, 1/8, 1/4) to crop the watermarked images in the center, over areas of size 16×16, 32×32, and 64×64, respectively.

The average BERs of the compared methods under geometric attacks are given in Table III. Thanks to the orthogonality and completeness of the basis functions, the moment-based comparison schemes [12], [21], [25] and the proposed scheme are robust to rotation attacks. In [38], watermark extraction consists of watermark synchronization and estimation and is thus also robust to rotation attacks. Scaling and LWR attacks change the size of the watermarked image. Due to the limitation of blind watermarking, the extractor operates without the original image size, so some schemes are not applicable; for these, we report the average BER as N/A (not applicable). Regarding flipping attacks, the deep learning-based watermarking schemes have difficulty maintaining robustness. It can also be seen that the average BER is affected by the cropping location. For patch cropping, the maximum BER of our scheme is about 6.5% at any position. For center cropping 1/16, our proposed scheme demonstrates strong robustness, surpassed only by [36] and [41]. For high-intensity center cropping (1/8, 1/4), our scheme is more robust than the moment-based approaches but falls short of the deep learning-based and non-moment approaches; the reason is that the PHFMs used in our scheme are global descriptors. In conclusion, the proposed scheme is robust to geometric attacks and somewhat robust to high-intensity center cropping.

3) Compound Attacks:

We select representative single attacks from Tables II and III: JPEG compression with QF 40, Gaussian noise with variance 0.001, median filtering with a 3×3 mask, rotation 30°, scaling 0.75, LWR with a scaling factor of 0.75 × 1.25, horizontal flipping, and corner cropping 1/8. We use a combination of two single attacks to generate each compound attack, and the experimental results are presented in Table IV.

TABLE IV
AVERAGE BERS (%) UNDER COMPOUND ATTACKS. THE BEST AND SECOND BEST RESULTS ARE HIGHLIGHTED IN BOLD AND UNDERLINED, RESPECTIVELY

                      Moments               Non-moments    Deep-learning
Attacks               [21]    [25]    [12]  [36]    [38]   [40]    [41]    Proposed
JPEG + Noise          2.04    9.58    2.36  50.67   50.00  25.56   16.55   0.90
JPEG + Blurring       3.17    9.69    3.91  51.10   46.57  40.68   18.52   2.68
Noise + Blurring      4.70    10.59   3.20  14.12   7.80   30.41   9.78    2.27
Rotation + Scaling    1.59    9.58    1.69  N/A     20.18  N/A     N/A     0.14
Rotation + LWR        1.17    9.39    1.63  51.10   13.17  N/A     N/A     0.06
Rotation + Cropping   0.90    9.38    1.43  49.38   9.64   52.46   45.49   0.00
Scaling + LWR         0.87    9.26    1.30  N/A     6.60   N/A     N/A     0.01
Scaling + Blurring    0.88    9.26    1.30  N/A     2.17   N/A     N/A     0.01
Scaling + Cropping    0.88    9.26    1.30  N/A     3.98   N/A     N/A     0.01
LWR + Blurring        0.71    9.14    1.29  28.75   1.66   N/A     N/A     0
LWR + Cropping        0.71    9.14    1.29  28.65   1.69   N/A     N/A     0
Blurring + Rotation   0.90    9.38    1.43  49.38   6.99   43.14   46.73   0
Blurring + Cropping   0.71    9.14    1.30  0       1.61   N/A     N/A     0
JPEG + Rotation       1.33    9.43    1.66  51.22   49.39  49.83   46.21   0.16
JPEG + Scaling        1.16    9.30    1.49  N/A     41.93  N/A     N/A     0.16
JPEG + LWR            0.80    9.14    1.52  51.00   40.19  N/A     N/A     0.07
JPEG + Blurring       0.78    9.14    1.49  50.56   41.78  54.62   47.06   0.07
JPEG + Cropping       0.78    9.14    1.49  50.75   40.97  29.21   0.42    0.07
Noise + Rotation      2.34    9.75    2.39  48.78   50.00  46.70   46.95   0.80
Noise + Scaling       1.81    9.64    2.11  N/A     38.08  N/A     N/A     0.65
Noise + LWR           1.42    9.53    1.98  35.24   36.05  N/A     N/A     0.43
Noise + Blurring      1.35    9.51    1.98  0       48.28  49.64   46.85   0.39
Noise + Cropping      1.35    9.51    1.98  0       49.44  5.64    5.43    0.39
Blurring + Rotation   5.47    10.92   4.11  52.03   25.49  50.81   45.36   2.99
Blurring + Scaling    5.63    10.95   3.67  N/A     9.33   N/A     N/A     2.95
Blurring + LWR        4.70    10.55   3.40  51.56   6.67   N/A     N/A     2.43
Blurring + Flipping   4.76    10.59   3.24  13.87   7.62   55.50   44.91   2.26
Blurring + Cropping   4.76    10.59   3.24  13.79   7.90   31.84   10.01   2.26

For brevity, we omit the attack parameters in Table IV. It is clear that the average BERs of the proposed scheme are very low; it is stronger than the other schemes in terms of robustness against compound attacks. This is not surprising, because the proposed scheme has more comprehensive robustness than the other schemes under single attacks (common image processing attacks and geometric attacks).

E. Ablation Studies

In this section, we present ablation experiments on the dataset to explore the influence of the different components of our proposed watermarking method. As shown in Table V, combining the two proposed modules gives our watermarking method the best robustness. Specifically, the improved quantization strategy (Eq. (24)) effectively enhances the imperceptibility of the watermarked images, and the PPTN module significantly improves robustness. It is worth noting that the interpolation algorithm in the PPTN module causes a slight decrease in PSNR; however, the quantization strategy provides a more significant gain in PSNR that compensates for this slight loss.

In Section IV-A, we mentioned that using bicubic interpolation results in watermark distortion interactions. To verify this point, three interpolation algorithms are compared, as shown in Table VI. The bicubic and bilinear interpolation algorithms predict each pixel on the polar coordinate plane from its 4×4 and 2×2 neighborhood regions, respectively. From Table VI, it can be seen that the larger the prediction region, the lower the robustness; nearest neighbor interpolation is the best choice to preserve the orthogonality of PHFMs and thus maintain robustness. This experimental evidence supports our theoretical expectation, verifying the usefulness of nearest neighbor interpolation for the proposed PPTN-based PHFMs method.

F. Computational Cost

This subsection compares the computational cost before and after using the PPTN method. As shown in Table VII,
