
Semantic Offloading for DNN Inference

In this section, we first envision a representative architecture for a semantic offloading system for efficient DNN inference involving the two endpoints. We then explain how this architecture could support single and multiple DNN inferences.

Key Requirements

We conceptualize what an ideal semantic offloading system for efficient DNN inference between the two endpoints needs to do in Figure 23. The essential requirement for such a system is that the DNN inference should satisfy the target performance of an application, such as DNN inference accuracy and latency, as much as possible, even in a dynamically changing network environment. This is why the essential information extractor at the client should adaptively control the volume of transmission to guarantee timely delivery of the DNN input, exploiting the trade-off between compression performance and DNN performance while satisfying the application's requirements. Depending on the required DNN inference accuracy and the given bandwidth (e.g., the up-to-date bandwidth or the bandwidth estimated by [75, 76]), the extractor would know in advance whether the extraction would be sufficient to make the offloaded inference work. If the requirement is infeasible to achieve, the extractor may drop the request, as in admission control. Once extraction is done and the result is delivered to the server, the information decoder of the application transforms the received information into a data format that the DNN understands, and the DNN makes its inference. Finally, the inference result is sent back to the client.
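
To make the extractor-side control concrete, the following is a minimal Python sketch, not part of the described system, of the rate selection and admission control above. It assumes a hypothetical table `accuracy_at_size` mapping each available compressed size (in bytes) to the DNN accuracy the extractor achieves at that size; the values used in the example are made up for illustration.

```python
def select_compressed_size(bandwidth_bps, latency_budget_s, min_accuracy,
                           accuracy_at_size, server_compute_s=0.0):
    """Pick the largest compressed size that fits the latency budget, then
    check whether its accuracy satisfies the application requirement.
    Returns the chosen size in bytes, or None to drop the request
    (admission control)."""
    # Bytes that can be transmitted within the remaining latency budget.
    budget_bytes = max(0.0, latency_budget_s - server_compute_s) * bandwidth_bps / 8
    feasible = [c for c in accuracy_at_size if c <= budget_bytes]
    if not feasible:
        return None                    # nothing fits: drop, as in admission control
    best = max(feasible)               # largest feasible size gives the best accuracy
    if accuracy_at_size[best] < min_accuracy:
        return None                    # even the best feasible size misses the accuracy target
    return best

# Example: 2 Mbps link, 100 ms budget, illustrative accuracy table.
sizes = {4096: 0.68, 8192: 0.72, 16384: 0.75, 32768: 0.77}
print(select_compressed_size(2e6, 0.1, 0.70, sizes))   # -> 16384
```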

The key to making timely DNN inferences is to design an extractor that makes the extracted information as small as possible at a given accuracy level so as to meet the latency requirement. The smaller the extracted information, the higher the chance of meeting both the DNN accuracy and latency requirements.

For example, offloaded AR services using an edge cloud, or anomaly detection in underdeveloped regions, suffer from relatively low network bandwidth. Such problems, however, could be addressed by a powerful information extractor that extracts only the information essential for DNN inference. GRACE [1] considers essential information from a color perspective and proposes to discard the color information of the image, since DNNs are indeed less sensitive to colors. However, what can be discarded depends on the DNN: for some DNNs there is more to discard than just the colors, while others, such as depth estimation, need color information to perform well (§3.7). Thus, to realize ideal semantic offloading for DNN inference, we focus on designing the information extractor to extract only the information relevant to a single DNN or to multiple DNNs in an application.

Figure 24: Overview of our DNN-centric joint training method minimizing the DNN inference loss, represented by DL(·). D_{w_D} and A_{w_A} denote the DNN for inference and the autoencoder, w_D and w_A are their trainable parameters, and y is the ground truth.

DNN-centric Joint Training

For the information extractor to maximally discard unnecessary information from the input data, it needs to know which features a DNN, or a set of DNNs, needs for inference. However, handcrafting the information extractor for each DNN is not feasible, since what to keep as essential information entirely depends on which DNN (e.g., classification, semantic segmentation, object detection) is run. Therefore, we train the information extractor in the form of an autoencoder in a direction that eliminates as much information unnecessary for DNN inference as possible. In particular, we train our autoencoder jointly with the DNNs to minimize the loss function of the DNNs, which differs from using an autoencoder for conventional image reconstruction. For joint training, we connect the DNNs to an autoencoder such that the loss from the DNNs can flow backward into the autoencoder (Figure 24). Since merely connecting an independently trained autoencoder to a DNN does not perform well, even with the well-known refinement process using a few samples, we propose a DNN-centric joint training method. Our joint training method optimizes the neural network parameters of the autoencoder and the DNN together, instead of optimizing them separately and sequentially as in the following equation:

$$
w_D = \arg\min_{w_D} DL\big(y, D_{w_D}(x)\big), \qquad
w_A = \arg\min_{w_A} DL\big(y, D_{w_D}(A_{w_A}(x))\big) + \alpha \cdot H(z), \tag{4}
$$

where w_D and w_A denote the sets of trainable parameters of the DNN D(·) and the autoencoder A(·), respectively, DL(·) denotes the loss function of the DNN, x is the input, y is the ground truth, H(z) is the entropy of the latent representation z, and α is a weighting coefficient.
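
To make the two steps of Eq. (4) concrete, here is a minimal PyTorch-style sketch (our illustration, not the system's code) of the sequential scheme: the DNN is first trained on raw inputs, and the autoencoder is then trained against the frozen DNN with an entropy regularizer. `dnn`, `autoencoder`, `loader`, and `dnn_loss` (DL) are assumed to exist; `entropy_model` is a hypothetical estimator of H(z).

```python
import torch

def train_dnn_first(dnn, loader, dnn_loss, lr=1e-4):
    # Step 1: w_D = argmin_{w_D} DL(y, D_{w_D}(x)) on raw inputs x.
    opt = torch.optim.Adam(dnn.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        dnn_loss(dnn(x), y).backward()
        opt.step()

def train_autoencoder_second(autoencoder, dnn, loader, dnn_loss, entropy_model,
                             alpha=0.01, lr=1e-4):
    # Step 2: w_A = argmin_{w_A} DL(y, D_{w_D}(A_{w_A}(x))) + alpha * H(z),
    # with the already trained DNN parameters w_D kept frozen.
    for p in dnn.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(autoencoder.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        z = autoencoder.encode(x)        # latent representation z (assumed method)
        loss = dnn_loss(dnn(autoencoder.decode(z)), y) + alpha * entropy_model(z)
        loss.backward()
        opt.step()
```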

Figure 25: Overview of our DNN-centric joint training method extended for multiple DNNs. ML denotes the loss function for multi-task learning.

To overcome the sub-optimality of separate and sequential training [77], our DNN-centric joint training trains the autoencoder and the DNN application jointly with the DNN loss function at a given fixed latent representation size c, as in Figure 24, which can be expressed by the following equation:

$$
w_A, w_D = \arg\min_{w_A, w_D} DL\big(y, D_{w_D}(A_{w_A}(x))\big), \quad \text{such that } S(z) = c, \tag{5}
$$

where D_{w_D}(A_{w_A}(x)) is the predicted result ŷ, S(z) denotes the size of z, and c is the target compressed size.

Our joint training method optimizes the autoencoder and the DNN simultaneously to maximize DNN performance at the target compressed size. Furthermore, this joint training has another benefit compared to conventional autoencoder training. Instead of minimizing the weighted sum of the DNN loss function and the entropy function of z, it explicitly sets the compressed size of the extracted information, S(z), as a constraint and maximizes the usefulness of the extracted information (i.e., minimizes the DNN loss). This means that our extractor can meet the required transmission size by choosing the appropriate extractor, rather than just extracting information in a best-effort manner. Realizing this bandwidth-adaptability for neural-network-based encoding in offloaded DNN inference has not been studied before. We believe this adaptability is a vital feature for implementing a practical semantic communication system that can guarantee latency in its network transmission. The concept of exploiting the target compressed size is depicted in Figure 24.
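
As a concrete illustration, a minimal PyTorch-style sketch of the joint objective in Eq. (5) is given below. It is our own sketch under the assumption that the encoder's bottleneck is sized so that S(z) = c holds by construction, with `encoder`, `decoder`, and `dnn` standing in for the actual architectures.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Autoencoder A_{w_A} chained with a task DNN D_{w_D}; the DNN loss
    backpropagates through both, as in Eq. (5)."""
    def __init__(self, encoder, decoder, dnn):
        super().__init__()
        self.encoder, self.decoder, self.dnn = encoder, decoder, dnn

    def forward(self, x):
        z = self.encoder(x)       # latent whose size is fixed to c by the
                                  # encoder's bottleneck, so S(z) = c holds
                                  # by construction rather than by a penalty
        x_hat = self.decoder(z)   # reconstruction produced on the server side
        return self.dnn(x_hat)    # ŷ = D_{w_D}(A_{w_A}(x))

def train_jointly(model, loader, dnn_loss, lr=1e-4):
    # One optimizer over w_A and w_D: both are updated by the same DNN loss.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        loss = dnn_loss(model(x), y)
        loss.backward()           # gradients flow from the DNN back into the autoencoder
        opt.step()
```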

Extension for Multiple DNNs

Our DNN-centric joint training explained so far can also be applied to multiple DNNs, as shown in Figure 25. Similar to multi-task learning, which consists of shared and task-specific layers, the DNNs are connected to the autoencoder in parallel. This way, we train them all together with the loss function used in multi-task learning [78, 79], as in the following equation:

$$
ML(Y, \hat{Y}) = \sum_{i=1}^{N} \beta_i \cdot DL_i(y_i, \hat{y}_i), \tag{6}
$$

where DL_i is the loss function for the i-th DNN, β_i is the weight of its loss, Y is the set of ground truths y_i, and Ŷ is the set of predictions ŷ_i.
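
A minimal sketch of Eq. (6), assuming one shared autoencoder feeding N task DNNs in parallel as in Figure 25 (function and variable names are ours, for illustration only):

```python
def forward_multi(autoencoder, dnns, x):
    """Shared autoencoder feeding N task DNNs in parallel (Figure 25)."""
    x_hat = autoencoder(x)                  # shared reconstruction from the latent z
    return [dnn(x_hat) for dnn in dnns]     # one prediction per task DNN

def multi_task_loss(preds, targets, dnn_losses, betas):
    """ML(Y, Ŷ) = Σ_{i=1}^{N} β_i · DL_i(y_i, ŷ_i)  (Eq. 6)."""
    return sum(beta * dl(pred, y)
               for beta, dl, pred, y in zip(betas, dnn_losses, preds, targets))
```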


Figure 26: System architecture of N-epitomizer with three phases: Offline, Initialization, and Runtime. The four steps in §3.6 are denoted by numbers.

As depicted in Figure 25, the information extractor in the form of the shared autoencoder is trained with the multi-task loss ML, and each DNN is trained with its own DNN loss function. For instance, consider two DNNs, for segmentation and depth estimation, trained with the categorical cross-entropy loss (Loss_CE) and the mean relative absolute error loss (Loss_Rel), respectively, expressed by the following equation:

$$
Loss_{CE}(y, p) = -\sum_{c=1}^{M} y_c \log p_c, \qquad
Loss_{Rel}(y, \hat{y}) = \frac{1}{L} \sum_{j=1}^{L} \frac{|y_j - \hat{y}_j|}{y_j}, \tag{7}
$$

where M is the number of classes, y_c is a binary indicator of whether class c is the correct classification, p_c is the predicted probability of class c, and L is the number of pixels of an input image.

For training the essential information extractor for two DNNs, the DNN for segmentation is trained to predict the class with higher confidence through Loss_CE. Likewise, the DNN for depth estimation is trained so that the predicted depth is closer to the true depth by Loss_Rel. At the same time, the extractor is trained with the ML, that is, β_CE·Loss_CE + β_Rel·Loss_Rel, to maintain the performance of both DNNs at a given target compressed size, resulting in extracting the union of the essential information of the DNNs for segmentation and depth estimation.
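
In code, the two losses of Eq. (7) and their weighted combination into ML for the extractor could be sketched as follows. This is a PyTorch illustration with assumed tensor shapes and illustrative weights β_CE and β_Rel, not the system's implementation; a small `eps` is added to avoid division by zero.

```python
import torch
import torch.nn.functional as F

def seg_loss(logits, target):
    # Loss_CE: categorical cross-entropy over M classes, applied per pixel.
    # `logits` has shape (N, M, H, W); `target` holds class indices of shape (N, H, W).
    return F.cross_entropy(logits, target)

def depth_loss(pred, target, eps=1e-6):
    # Loss_Rel: mean relative absolute error over the L pixels of the image.
    return torch.mean(torch.abs(target - pred) / (target + eps))

def extractor_loss(seg_logits, seg_target, depth_pred, depth_target,
                   beta_ce=1.0, beta_rel=1.0):
    # ML = β_CE·Loss_CE + β_Rel·Loss_Rel, used to train the shared extractor;
    # each DNN head is additionally trained with its own loss only.
    return beta_ce * seg_loss(seg_logits, seg_target) + \
           beta_rel * depth_loss(depth_pred, depth_target)
```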

Therefore, the essential information extractor trained by this procedure for multiple DNNs reduces the transmission volume by sending duplicated essential information only once, rather than redundantly sending the essential information of each DNN separately. If properly trained, our information extractor sends the union of the essential information of the DNNs: its size would be close to the sum of the individual essential information for DNNs with disjoint needs, and close to the maximum of them for DNNs performing highly similar operations. We confirm this insight experimentally in §3.7.
