Codes for Modern Applications

Current Research Topics in the Code and Signal Design Group

P. Vijay Kumar

Indian Institute of Science, Bangalore, India

Nov. 16, 2018

1/28

(2)

Codes for Modern Applications: Examples

• Codes for Streaming Data
• Codes for the Distributed Storage of ‘Big-Data’
• Coding for Coded Computation
• Codes for Private Information Retrieval
• Polar Codes
• Codes for Block Chains

(3)

Codes for Streaming

[Figure: application areas for streaming codes: Next-Gen Vehicles, Telemedicine, Live Streaming, Big Data Messaging Systems.]

(4)

Problem Statement

• A continuous stream of message packets s[0], s[1], . . . is to be encoded and sent over an erasure channel.
• Each coded packet is to be decoded with a delay of at most T.

[Figure: a streaming server sending numbered packet streams across a network; blue rectangles indicate erased packets.]

(5)

Sliding-Window Channel Model

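Within any sliding window of width W, the channel introduces either
(a) at most A random packet erasures, or else
(b) a single packet erasure burst of length at most B.

• Example with A = 2, B = 3, W = 4:

[Figure: timeline of coded packets x[t], x[t + 1], ..., x[t + 8], showing one window of width W = 4 that contains a burst of B = 3 erasures and another window of width W = 4 that contains 2 isolated erasures (labelled N = 2 in the figure).]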
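The channel model above can be checked mechanically. The following is a minimal sketch, assuming an erasure pattern is given as a list of 0/1 flags (1 = erased); the function names `window_ok` and `admissible` are illustrative and do not come from the slides or from [1].

```python
def window_ok(window, A, B):
    """Check one window; each entry is truthy if that packet is erased."""
    erased = [i for i, e in enumerate(window) if e]
    if len(erased) <= A:          # case (a): at most A erasures, any positions
        return True
    # case (b): every erasure in the window belongs to one burst of length <= B
    span = erased[-1] - erased[0] + 1
    return span == len(erased) and span <= B


def admissible(pattern, W, A, B):
    """True if every width-W sliding window of `pattern` is allowed."""
    if len(pattern) <= W:
        return window_ok(pattern, A, B)
    return all(window_ok(pattern[s:s + W], A, B)
               for s in range(len(pattern) - W + 1))


if __name__ == "__main__":
    A, B, W = 2, 3, 4
    print(admissible([0, 1, 1, 1, 0, 0, 0, 0], W, A, B))  # one burst of 3
    print(admissible([1, 0, 0, 1, 0, 0, 1, 0], W, A, B))  # isolated erasures
    print(admissible([1, 0, 1, 1, 0, 0, 0, 0], W, A, B))  # 3 erasures, not a single burst
```

With A = 2, B = 3, W = 4, a window holding three erasures is accepted only when those three erasures are consecutive.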

[1] A. Badr, P. Patil, A. Khisti, W. Tan, and J. G. Apostolopoulos, “Layered Constructions for Low-Delay Streaming Codes,” IEEE Trans. Inf. Theory, 2017.

(6)

Coding Approach: Redundancy within a Packet and Diagonal Embedding

[Figure: at time i, the encoder takes message packets s[0], s[1], ..., s[i] and outputs the coded packet x[i] of n symbols, made up of the k-symbol message packet s[i] and the (n − k)-symbol parity packet p[i].]

[2] E. Martinian and C. W. Sundberg, “Burst erasure correction codes with low decoding delay,” IEEE Trans. Inf. Theory, 2004.

(7)

Rate Bound [1] and an Optimal Code [3]

Rate $R \le \dfrac{W - A}{W + (B - A)}$.

• $A = 2$, $B = 4$, $T = 10$, $W = 11$, $R = 9/13$.
• $(c_0, c_1, \ldots, c_9, c_{12})$ is an $[11, 9]$ MDS code.
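As a quick check of the arithmetic in the example, the bound can be evaluated exactly. This is only a sketch of the formula as stated on the slide; the function name `rate_upper_bound` is an illustrative choice, and the example uses the slide's values, where W = T + 1.

```python
from fractions import Fraction


def rate_upper_bound(A, B, W):
    """Upper bound (W - A) / (W + (B - A)) on the rate, as an exact fraction."""
    return Fraction(W - A, W + (B - A))


if __name__ == "__main__":
    # Slide example: A = 2, B = 4, W = 11
    print(rate_upper_bound(A=2, B=4, W=11))
```

Using `Fraction` keeps the result in lowest terms, so it can be compared directly with the rate quoted on the slide.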

[Figure: message packets and parity packets laid out against time.]

[3] M. Nikhil Krishnan and P. Vijay Kumar, “Rate-Optimal Streaming Codes for Channels with Burst and Isolated Erasures,” IEEE ISIT, 2018.

(8)

Distributed Storage Setting

• Data pertaining to a single file is distributed across storage nodes.
• Nodes are inexpensive storage devices that may be (a) prone to failure, (b) down for maintenance, or (c) unavailable because they are busy serving other demands.

(10)

Distributed Storage Setting

• The need arises for efficient repair of a failed node.
• Focus on
  (a) repair bandwidth: the amount of data downloaded
  (b) repair degree: the number of helper nodes contacted

(The amount of data stored can be very, very large ⇒ “Big Data”.)

(11)

Just How Big is Big Data?

• Pictures from two different data centers.

(12)

A Recently Completed Large Data Center

The NSA Data Center in Utah.

• Estimated to store between 3 and 12 exabytes!

Gigabyte → Terabyte → Petabyte → Exabyte = one billion GB!

(13)

• Completed at an estimated cost of $1.5 billion.
• Another $2 billion for hardware, software, and maintenance.
• 65 MW of power, costing about $40 million per year.
• Uses 1.7 million gallons of water per day.

(14)

Facebook’s Code: Not Efficient at Handling Node Repair

[Figure: data blocks 1 to 10 and parity blocks P1 to P4, stored across Node 1 to Node 14.]

• [14, 10] MDS code
• Has the “any 10 out of 14” property
• Used in Facebook data centers

D. Borthakur, R. Schmit, R. Vadali, S. Chen, and P. Kling, “HDFS RAID,” Tech talk, Yahoo Developer Network, Nov. 2010.
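The “any 10 out of 14” property, and the repair cost it implies, can be illustrated with a toy Reed-Solomon-style code. This is a minimal sketch under stated assumptions: the prime field GF(257), the evaluation points 0 to 13, and the names `lagrange_eval`, `encode`, and `decode` are all illustrative choices, not the code deployed in any data center. It needs Python 3.8 or later for `pow(x, -1, P)`.

```python
P = 257  # prime field size (an assumption for illustration)


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through `points` (pairs (xj, yj)), working modulo P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: node i < k stores data[i]; nodes k..n-1 store
    further evaluations of the same degree-(k-1) polynomial."""
    k = len(data)
    pts = list(enumerate(data))
    return list(data) + [lagrange_eval(pts, x) for x in range(k, n)]


def decode(survivors, k):
    """Recover the k data symbols from any k surviving (node, symbol) pairs."""
    pts = list(survivors.items())[:k]
    return [lagrange_eval(pts, x) for x in range(k)]


if __name__ == "__main__":
    data = [7, 42, 0, 255, 19, 3, 100, 200, 1, 256]   # symbols in 0..256
    code = encode(data, 14)
    survivors = {i: s for i, s in enumerate(code) if i not in (0, 3, 7, 12)}
    print(decode(survivors, 10) == data)
```

Note that rebuilding even a single lost symbol along this route reads k = 10 surviving symbols, which is the repair inefficiency the slide title refers to.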

(15)

Two New Branches of Coding Theory

Regenerating Codes

Codes with Locality

• Regenerating codes reduce repair bandwidth
• Codes with locality reduce repair degree

• A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network Coding for Distributed Storage Systems,” IEEE Trans. Inform. Th., Sep. 2010.
• P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the Locality of Codeword Symbols,” IEEE Trans. Inf. Theory, Nov. 2012.

Image: http://www.colorluna.com

(16)

Regenerating Codes - Formal Definition

Parameters: $((n, k, d), (\alpha, \beta), B, \mathbb{F}_q)$

[Figure: a data collector connected to k of the n storage nodes, and a replacement node 1′ connected to d helper nodes.]

• Data to be recovered by connecting to any k of the n nodes.
• Nodes to be repaired by connecting to any d nodes, downloading β symbols from each node ($d\beta \ll$ file size $B$).
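To see how much smaller dβ can be than the file size B, the sketch below evaluates one operating point. The slide does not say which point is meant; the code assumes the minimum-storage regenerating (MSR) point from Dimakis et al., where α = B/k and β = α/(d − k + 1). The function names and the parameters (k = 10, d = 13, B = 40 units) are illustrative.

```python
from fractions import Fraction


def msr_point(B, k, d):
    """Per-node storage alpha and per-helper download beta at the MSR point
    (assumed operating point: alpha = B/k, beta = alpha/(d - k + 1))."""
    alpha = Fraction(B, k)
    beta = alpha / (d - k + 1)
    return alpha, beta


def repair_download(B, k, d):
    """Total repair bandwidth d * beta at the MSR point."""
    _, beta = msr_point(B, k, d)
    return d * beta


if __name__ == "__main__":
    B, k, d = 40, 10, 13          # illustrative numbers, not from the slides
    alpha, beta = msr_point(B, k, d)
    print("alpha =", alpha, "beta =", beta)
    print("MSR repair download:", repair_download(B, k, d), "of a file of size", B)
```

For comparison, repairing a node by first reconstructing the whole file downloads B units, that is, k times the node size α.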

(17)

An Example Construction of a Regenerating Code:

The Clay Code

1. M. Vajha, V. Ramkumar, B. Puranik, G. Kini, E. Lobo, B. Sasidharan, P. V. Kumar, A. Barg, M. Ye, S. Narayanamurthy, S. Hussain, and Siddhartha Nandi, “Clay Codes: Moulding MDS Codes to Yield an MSR Code,” presented at the 16th USENIX Conference on File and Storage Technologies (FAST), Feb. 12-15, 2018, Oakland, CA.


(18)

Transforming an MDS Code to Yield a (4, 2) Clay Code

1. Start with a single-layer (4, 2) MDS code. [Figure: four nodes at (x, y) positions (0,0), (0,1), (1,0), (1,1), labelled as data and parity.]
2. Layer four such units, z = 0, 1, 2, 3.
3. Index each layer z using two bits: z = (0,0), (0,1), (1,0), (1,1).
4. Identify paired sub-chunks U and U*.
5. Pairwise Forward Transform (PFT): a simple, linear 2×2 transformation A relates each pair (U, U*) to its coupled pair (C, C*).
6. Pairwise transformation of symbols yields the Clay code.
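The pairwise step can be sketched as an invertible 2×2 map on a pair of symbols. The slide does not give the matrix A or the field, so everything concrete here is an assumption made for illustration: the prime field GF(257), the symmetric matrix [[1, γ], [γ, 1]] with γ = 3 (chosen so that γ² ≠ 1), and the function names. It needs Python 3.8 or later for `pow(x, -1, P)`.

```python
P = 257      # prime field size (assumption)
GAMMA = 3    # hypothetical coupling coefficient; must satisfy GAMMA**2 != 1 mod P


def forward_transform(u, u_star):
    """Map an uncoupled pair (U, U*) to a coupled pair (C, C*)
    using the assumed matrix [[1, GAMMA], [GAMMA, 1]] over GF(P)."""
    c = (u + GAMMA * u_star) % P
    c_star = (GAMMA * u + u_star) % P
    return c, c_star


def inverse_transform(c, c_star):
    """Undo forward_transform by applying the inverse of the 2x2 matrix."""
    inv_det = pow((1 - GAMMA * GAMMA) % P, -1, P)
    u = (c - GAMMA * c_star) * inv_det % P
    u_star = (c_star - GAMMA * c) * inv_det % P
    return u, u_star


if __name__ == "__main__":
    pair = (12, 200)
    coupled = forward_transform(*pair)
    print(coupled, inverse_transform(*coupled) == pair)
```

Because the matrix is invertible, either member of a coupled pair together with one uncoupled symbol determines the rest, which is the kind of relation a pairwise coupling can exploit during repair.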

(19)

Coded Computation


(20)

Coded MapReduce with r = 2

Q = 3 functions to be computed using K = 3 servers, N = 6 file fragments.

• Map function is run on each fragment by two servers.
• Only 3 units of intermediate computations need to be sent.
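The count of 3 transmissions can be reproduced with a small simulation. The slide does not show the fragment placement or the coded messages, so the sketch below assumes a placement and XOR pairing of the kind commonly used to illustrate coded MapReduce; `files_at`, `broadcast_pairs`, and the rule that server k reduces function k are all assumptions, and the intermediate values are random 16-bit stand-ins.

```python
import random

# Hypothetical placement: each of the N = 6 fragments is mapped by r = 2 servers.
files_at = {1: {1, 2, 3, 4}, 2: {3, 4, 5, 6}, 3: {5, 6, 1, 2}}

# v[(q, n)] stands for the intermediate value of function q on fragment n.
rng = random.Random(0)
v = {(q, n): rng.getrandbits(16) for q in (1, 2, 3) for n in range(1, 7)}


def needed(k):
    """Intermediate values server k needs for function k but cannot compute."""
    return [(k, n) for n in range(1, 7) if n not in files_at[k]]


# Each server broadcasts one XOR of two values it can compute locally,
# one wanted by each of the other two servers.
broadcast_pairs = {1: ((2, 1), (3, 3)), 2: ((1, 5), (3, 4)), 3: ((1, 6), (2, 2))}


def shuffle():
    """Run the coded shuffle; return the broadcasts and what each server recovers."""
    messages = {s: v[a] ^ v[b] for s, (a, b) in broadcast_pairs.items()}
    recovered = {k: {} for k in files_at}
    for k in files_at:
        for s, (a, b) in broadcast_pairs.items():
            if s == k:
                continue
            wanted, known = (a, b) if a[0] == k else (b, a)
            # Server k can compute `known` itself because it holds that fragment.
            assert known[1] in files_at[k]
            recovered[k][wanted] = messages[s] ^ v[known]
    return messages, recovered


if __name__ == "__main__":
    messages, recovered = shuffle()
    uncoded = sum(len(needed(k)) for k in files_at)
    print("uncoded transmissions:", uncoded, "coded transmissions:", len(messages))
```

Each server is missing two intermediate values, so an uncoded shuffle sends six units, while the three XOR broadcasts deliver all six.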

(21)

PIR Codes


(22)

PIR: Replicated Servers

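• An example PIR protocol with 2 replicated servers:

[Figure: Alice sends query $q_1 = u$ to Server 1 and query $q_2 = u + e_i$ to Server 2. Each server stores $x$; Server 1 answers $a_1 = u^T x$ and Server 2 answers $a_2 = (u + e_i)^T x$.]

• $a_1 + a_2 = e_i^T x = x_i$
• For replicated servers, storage overhead $= \tau \ge 2$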
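The two-server protocol above can be sketched over GF(2), where addition is XOR. This is a minimal illustration, assuming the database x is a list of bits and the two servers do not collude; the function names are illustrative.

```python
import secrets


def dot2(a, b):
    """Inner product of two bit vectors over GF(2)."""
    return sum(p & q for p, q in zip(a, b)) % 2


def make_queries(n, i):
    """Return (q1, q2) = (u, u + e_i) for a uniformly random bit vector u."""
    u = [secrets.randbelow(2) for _ in range(n)]
    q2 = u.copy()
    q2[i] ^= 1                     # add the unit vector e_i over GF(2)
    return u, q2


def retrieve(x, i):
    """Alice's view: query both servers and combine their one-bit answers."""
    q1, q2 = make_queries(len(x), i)
    a1 = dot2(q1, x)               # computed by Server 1
    a2 = dot2(q2, x)               # computed by Server 2
    return a1 ^ a2                 # a1 + a2 = e_i^T x = x_i


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    print([retrieve(x, i) for i in range(len(x))] == x)
```

Each server on its own sees a uniformly random vector, so neither learns the index i.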

[1] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval,” Journal of the ACM, 45, 1998.

(23)

PIR Code

• The basic idea here is to reduce storage overhead by avoiding replication.
• The code that is used is one that can simulate the presence of τ replicated servers.
• It does this by ensuring that each piece of data can be recreated in τ different ways by connecting to τ disjoint sets of servers.
• This calls for a new class of code, termed a ‘PIR code’ by Fazeli et al.

[1] A. Fazeli, A. Vardy, and E. Yaakobi, “Codes for distributed PIR with low storage overhead,” ISIT 2015.

(24)

PIR Code

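$x = [\, x_1 \; x_2 \; x_3 \; x_4 \,]^T$, with $x_5 = x_1 + x_2 + x_3 + x_4$. Storage overhead = 1.25.

[Figure: Alice wants $x_{1,i}$. Servers 1 to 5 store $x_1, \ldots, x_5$. Alice sends $u$ to Server 1, which answers $u^T x_1$, and sends $u + e_i$ to Servers 2 to 5, which answer $(u + e_i)^T x_2$, $(u + e_i)^T x_3$, $(u + e_i)^T x_4$, $(u + e_i)^T x_5$.]

$(u + e_i)^T x_2 + (u + e_i)^T x_3 + (u + e_i)^T x_4 + (u + e_i)^T x_5 = (u + e_i)^T x_1$

• We now have $u^T x_1$ and $(u + e_i)^T x_1$.
• Open problems
  • Lower bound on $n$ given $k$ and $\tau \ge 5$.
  • Optimal PIR code constructions for $\tau \ge 5$.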
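The five-server example can be sketched in the same way as the replicated case, again over GF(2). This is an illustration under assumptions: the four data pieces are equal-length bit lists, the servers do not collude, and the function names are made up for the sketch.

```python
import secrets


def dot2(a, b):
    """Inner product of two bit vectors over GF(2)."""
    return sum(p & q for p, q in zip(a, b)) % 2


def xor_vectors(*vectors):
    """Component-wise sum over GF(2) of equal-length bit vectors."""
    return [sum(bits) % 2 for bits in zip(*vectors)]


def retrieve_bit_of_x1(x1, x2, x3, x4, i):
    """Retrieve bit i of x1 from 5 servers storing x1, x2, x3, x4 and the parity x5."""
    x5 = xor_vectors(x1, x2, x3, x4)          # stored on Server 5
    u = [secrets.randbelow(2) for _ in range(len(x1))]
    u_plus_ei = u.copy()
    u_plus_ei[i] ^= 1
    a1 = dot2(u, x1)                          # Server 1 receives u
    # Servers 2..5 receive u + e_i; their answers sum to (u + e_i)^T x1.
    others = [dot2(u_plus_ei, xs) for xs in (x2, x3, x4, x5)]
    simulated = sum(others) % 2
    return a1 ^ simulated                     # u^T x1 + (u + e_i)^T x1 = x1[i]


if __name__ == "__main__":
    x1, x2 = [1, 0, 1, 1, 0, 1], [0, 0, 1, 0, 1, 1]
    x3, x4 = [1, 1, 1, 0, 0, 0], [0, 1, 0, 1, 0, 1]
    print([retrieve_bit_of_x1(x1, x2, x3, x4, i) for i in range(6)] == x1)
```

Servers 2 to 5 together play the role of the second replicated server, at a storage cost of 5/4 = 1.25 instead of 2.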

[1] A. Fazeli, A. Vardy, and E. Yaakobi, “Codes for distributed PIR with low storage overhead,” ISIT 2015.

[2] M. Vajha, V. Ramkumar, and P. V. Kumar, “Binary, Shortened Projective Reed Muller Codes for Coded PIR,” ISIT 2017.

(25)

Other Topics

1. Polar codes

2. Codes for improved operation of block chains


(26)

Recent Publications

1. Balaji S B, M. Nikhil Krishnan, Myna Vajha, Vinayak Ramkumar, Birenjith Sasidharan, P. Vijay Kumar, “Erasure coding for distributed storage: an overview,” Sci China Inf Sci, vol. 61, October 2018, pp. 1-45.

2. Guang Gong, Tor Helleseth, and P. Vijay Kumar, “Solomon W. Golomb - Mathematician, Engineer, and Pioneer,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2844-2857, 2018.

3. M. N. Krishnan, B. Puranik, P. V. Kumar, I. Tamo, and A. Barg, “Exploiting Locality for Improved Decoding of Binary Cyclic Codes,” in IEEE Transactions on Communications, vol. 66, no. 6, pp. 2346-2358, June 2018.

4. K. V. Rashmi, N. B. Shah, K. Ramchandran, and P. V. Kumar, “Information-Theoretically Secure Erasure Codes for Distributed Storage,” in IEEE Trans. Inform. Th., vol. 64, no. 3, pp. 1621-1646, March 2018.

5. M. Nikhil Krishnan, Anantha Narayanan R., and P. Vijay Kumar, “Codes with Combined Locality and Regeneration Having Optimal Rate, $d_{\min}$ and Linear Field Size,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.

6. S. B. Balaji and P. Vijay Kumar, “A Tight Lower Bound on the Sub-Packetization Level of Optimal-Access MSR and MDS Codes,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.

(27)

Recent Publications

7. M. Vajha, S. B. Balaji, P. V. Kumar, “Explicit MSR Codes with Optimal Access, Optimal Sub-Packetization and Small Field Size for $d = k + 1, k + 2, k + 3$,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.

8. M. Nikhil Krishnan and P. Vijay Kumar, “Rate-Optimal Streaming Codes for Channels with Burst and Isolated Erasures,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.

9. S. B. Balaji, Ganesh Kini, and P. Vijay Kumar, “A Rate-Optimal Construction of Codes with Sequential Recovery with Low Block Length,” presented at the 24th National Conference on Communications (NCC 2018), IIT Hyderabad, February 25-28, 2018.

10. Vinayak Ramkumar, Myna Vajha, and P. Vijay Kumar, “Determining the Generalized Hamming Weight Hierarchy of the Binary Projective Reed-Muller Code,” presented at the 24th National Conference on Communications (NCC 2018), IIT Hyderabad, February 25-28, 2018.

11. M. Vajha, V. Ramkumar, B. Puranik, G. Kini, E. Lobo, B. Sasidharan, P. V. Kumar, A. Barg, M. Ye, S. Narayanamurthy, S. Hussain, and Siddhartha Nandi, “Clay Codes: Moulding MDS Codes to Yield an MSR Code,” presented at the 16th USENIX Conference on File and Storage Technologies (FAST), Feb. 12-15, 2018, Oakland, CA.

(28)

Thanks!
