Codes for Modern Applications
Current Research Topics in the Code and Signal Design Group
P. Vijay Kumar
Indian Institute of Science, Bangalore, India
Nov. 16, 2018
Codes for Modern Applications: Examples
• Codes for Streaming Data
• Codes for the Distributed Storage of ‘Big Data’
• Codes for Coded Computation
• Codes for Private Information Retrieval
• Polar Codes
• Codes for Block Chains
Codes for Streaming
[Figure: streaming applications, including next-gen vehicles, telemedicine, live streaming, big data, and messaging systems.]
Problem Statement
• A continuous stream of message packets s[0], s[1], ... is to be encoded and sent over a packet erasure channel.
• Each message packet must be decoded with a delay of at most T, i.e., s[t] must be recovered by time t + T.
[Figure: a streaming server sends packets 1, 2, 3, ... over a network to several receivers, each of which sees a different pattern of erased packets (blue rectangles indicate erased packets).]
Sliding-Window Channel Model
Within any sliding window of width W, the channel introduces either (a) at most A random (isolated) packet erasures, or (b) a single packet erasure burst of length at most B.
• Example with A = 2, B = 3, W = 4:
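[Figure: a packet stream x[t], x[t+1], ..., x[t+8], with one window of width W = 4 containing an erasure burst (B = 3) and another window of width W = 4 containing isolated erasures (A = 2).]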
[1] A. Badr, P. Patil, A. Khisti, W. Tan, and J. G. Apostolopoulos, “Layered Constructions for Low-Delay Streaming Codes,” IEEE Trans. Inf. Theory, 2017.
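The sliding-window channel model can be made concrete with a small admissibility check. This is a minimal sketch, assuming an erasure pattern is given as a list of booleans (True = erased); the function name `is_admissible` is hypothetical. A window passes if it holds at most A erasures in any positions, or if all of its erasures fall within B consecutive packets.

```python
from typing import List

def is_admissible(erased: List[bool], A: int, B: int, W: int) -> bool:
    """Check an erasure pattern against the sliding-window (A, B, W) model.

    Every window of W consecutive packets must contain either at most A
    erasures (any positions) or a single burst spanning at most B packets.
    """
    n = len(erased)
    # If the pattern is shorter than W, the single (partial) window is checked.
    for start in range(max(1, n - W + 1)):
        hits = [i for i in range(start, min(start + W, n)) if erased[i]]
        if not hits:
            continue
        few_random = len(hits) <= A
        single_burst = hits[-1] - hits[0] + 1 <= B
        if not (few_random or single_burst):
            return False
    return True

# Example with A = 2, B = 3, W = 4.
burst = [False, False, True, True, True, False, False, False]
isolated = [True, False, True, False, False, False, False, False]
too_many = [True, False, True, True, False, False, False, False]
print(is_admissible(burst, 2, 3, 4))     # True: one burst of length 3
print(is_admissible(isolated, 2, 3, 4))  # True: two isolated erasures
print(is_admissible(too_many, 2, 3, 4))  # False: 3 erasures spread over 4 slots
```

A streaming code designed for (A, B, W) only has to handle patterns for which this check returns True.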
Coding Approach: Redundancy within a Packet and Diagonal Embedding
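[Figure: at time i, the encoder takes message packets s[0], s[1], ..., s[i] and outputs the coded packet x[i] of n symbols: the k message symbols s[i] followed by n − k parity symbols p[i]. Codewords of a block code (labelled E) are embedded along the diagonals of the packet stream.]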
[2] E. Martinian and C. W. Sundberg, “Burst erasure correction codes with low decoding delay,” IEEE Trans. Inf. Theory, 2004.
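To illustrate redundancy within a packet combined with diagonal embedding, here is a minimal sketch that embeds a [k+1, k] single-parity-check (XOR) code along the diagonals of the stream. The choice of code and the helper names `diag_encode` and `recover_packet` are assumptions for illustration, not the constructions of [1] or [3]. Packet x[t] carries s[t] plus one parity symbol p[t]. Because the n = k + 1 symbols of each diagonal lie in distinct packets, a single erased packet (a burst of length n − k = 1) costs each diagonal only one symbol, which its parity restores within delay n − 1.

```python
from typing import List, Optional

def diag_encode(msgs: List[List[int]], k: int) -> List[List[int]]:
    """Diagonally embed a [k+1, k] single-parity-check (XOR) code.

    msgs[t] holds the k message symbols s[t]; packet x[t] = s[t] + [p[t]].
    The parity p[t] closes the diagonal that started at time d = t - k:
        s_0[d], s_1[d+1], ..., s_{k-1}[d+k-1], p[d+k].
    Message symbols at negative times are treated as 0.
    """
    def s(time: int, j: int) -> int:
        return msgs[time][j] if time >= 0 else 0

    packets = []
    for t in range(len(msgs)):
        d = t - k
        parity = 0
        for j in range(k):
            parity ^= s(d + j, j)  # all these times are < t, so encoding is causal
        packets.append(list(msgs[t]) + [parity])
    return packets

def recover_packet(packets: List[Optional[List[int]]], e: int, k: int) -> List[int]:
    """Recover s[e] when packet x[e] alone is erased (packets[e] may be None).

    Symbol j of x[e] lies on the diagonal starting at d = e - j; the other k
    symbols of that diagonal sit in other packets, so their XOR restores it.
    Requires the diagonal to end inside the stream: e - j + k < len(packets).
    """
    n = k + 1

    def x(time: int, pos: int) -> int:
        return packets[time][pos] if time >= 0 else 0

    recovered = []
    for j in range(k):
        d = e - j
        val = 0
        for pos in range(n):
            if d + pos != e:
                val ^= x(d + pos, pos)
        recovered.append(val)
    return recovered

k = 3
msgs = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
stream = diag_encode(msgs + [[0] * k] * k, k)  # k all-zero packets flush the last diagonals
received = list(stream)
received[2] = None                             # erase x[2]
print(recover_packet(received, 2, k))          # [1, 1, 0]
```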
Rate Bound [1] and an Optimal Code [3]
• Rate $R \le \frac{W-A}{W+B-A}$.
• Example: $A = 2$, $B = 4$, $T = 10$, $W = 11$ gives $R = 9/13$.
• $(c_0, c_1, \ldots, c_9, c_{12})$ is an $[11, 9]$ MDS code.
[Figure: message packets and parity packets of the optimal code, arranged along the time axis.]
[3] M. Nikhil Krishnan and P. Vijay Kumar, “Rate-Optimal Streaming Codes for Channels with Burst and Isolated Erasures,” IEEE ISIT, 2018.
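The rate bound is simple arithmetic; this sketch evaluates it with exact fractions to reproduce the example above. The function name and the parameter-range check (0 ≤ A ≤ B < W) are assumptions added for illustration.

```python
from fractions import Fraction

def rate_upper_bound(A: int, B: int, W: int) -> Fraction:
    """Upper bound R <= (W - A) / (W + B - A) for the sliding-window (A, B, W) channel."""
    if not 0 <= A <= B < W:  # assumed parameter range for this sketch
        raise ValueError("expected 0 <= A <= B < W")
    return Fraction(W - A, W + B - A)

T = 10
W = T + 1                               # W = 11 when T = 10, as in the example
print(rate_upper_bound(A=2, B=4, W=W))  # 9/13
```

With A = 0 (bursts only) the bound becomes W/(W + B); allowing A isolated erasures removes A from both numerator and denominator.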
Distributed Storage Setting
• Data pertaining to a single file is distributed across storage nodes.
• Nodes are inexpensive storage devices that may be (a) prone to failure, (b) down for maintenance, or (c) unavailable because they are busy serving other demands.
• This creates the need for efficient repair of a failed node.
• The focus is on
(a) repair bandwidth: the amount of data downloaded during repair, and
(b) repair degree: the number of helper nodes contacted.
(The amount of data stored can be very large ⇒ “Big Data”.)
Just How Big Is Big Data?
[Pictures from two different data centers.]
A Recently Completed Large Data Center
The NSA Data Center in Utah.
• Estimated to store between 3 and 12 exabytes!
Gigabyte → Terabyte → Petabyte → Exabyte = one billion GB!
• Completed at an estimated cost of $1.5 billion.
• Another $2 billion for hardware, software, and maintenance.
• Consumes 65 MW of power, costing about $40 million per year.
• Uses 1.7 million gallons of water per day.
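A quick back-of-the-envelope check of the figures above, assuming decimal SI units and a continuous 65 MW draw all year; the implied electricity price is derived from the quoted numbers, not a reported figure.

```python
# Decimal SI units: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18

gb_per_eb = EB // GB                     # one exabyte in gigabytes
low_gb, high_gb = 3 * gb_per_eb, 12 * gb_per_eb

power_kw = 65_000                        # 65 MW
hours_per_year = 24 * 365
energy_kwh = power_kw * hours_per_year   # assumes the full 65 MW is drawn all year
price_per_kwh = 40e6 / energy_kwh        # implied by the ~$40 million/year figure

print(gb_per_eb)                # 1000000000
print(energy_kwh)               # 569400000
print(round(price_per_kwh, 3))  # 0.07
```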
Facebook’s Code: Not Efficient at Handling Node Repair
[Figure: ten data blocks (1 to 10) and four parity blocks (P1 to P4) stored on fourteen nodes, Node 1 to Node 14.]
• [14, 10] MDS code
• Has the “any 10 out of 14” property: the data can be recovered from any 10 of the 14 nodes
• Used in Facebook data centers
D. Borthakur, R. Schmit, R. Vadali, S. Chen, and P. Kling, “HDFS RAID,” Tech talk, Yahoo Developer Network, Nov. 2010.
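The “any 10 out of 14” property, and why repair is expensive, can be seen in a small sketch. It assumes a systematic Reed-Solomon-style construction over the prime field GF(257): node i stores f(i+1), where f is the unique polynomial of degree below 10 with f(1), ..., f(10) equal to the data. The field, evaluation points, and helper names are illustrative assumptions, not the code actually deployed in HDFS-RAID.

```python
from typing import List, Tuple

P = 257  # illustrative prime field GF(257); symbols are ints in [0, 257)

def lagrange_at(xs: List[int], ys: List[int], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data: List[int], n: int) -> List[int]:
    """Systematic [n, k] MDS code: nodes 0..k-1 hold the data, the rest hold parities."""
    k = len(data)
    xs = list(range(1, k + 1))
    return list(data) + [lagrange_at(xs, data, x) for x in range(k + 1, n + 1)]

def decode(available: List[Tuple[int, int]], k: int) -> List[int]:
    """Recover the k data symbols from any k surviving (node_index, symbol) pairs."""
    pts = available[:k]
    xs = [i + 1 for i, _ in pts]
    ys = [y for _, y in pts]
    return [lagrange_at(xs, ys, x) for x in range(1, k + 1)]

data = list(range(10, 20))                  # ten data symbols
codeword = encode(data, 14)                 # stored on nodes 0..13
survivors = [(i, codeword[i]) for i in (0, 2, 4, 5, 7, 9, 10, 11, 12, 13)]
print(decode(survivors, 10) == data)        # True
# Repairing node 3 this way downloads 10 symbols: as much as the whole file.
xs = [i + 1 for i, _ in survivors]
ys = [y for _, y in survivors]
print(lagrange_at(xs, ys, 4) == codeword[3])  # True
```

The last two lines show the repair inefficiency in the slide title: rebuilding one node's single symbol needs k = 10 downloaded symbols.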
Two New Branches of Coding Theory
Regenerating Codes
Codes with Locality
• Regenerating codes reduce repair bandwidth.
• Codes with locality reduce repair degree.
• A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network Coding for Distributed Storage Systems,” IEEE Trans. Inform. Th., Sep. 2010.
• P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the Locality of Codeword Symbols,” IEEE Trans. Inf. Theory, Nov. 2012.
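To see how locality lowers repair degree, here is a minimal sketch with a hypothetical layout: ten data symbols split into two local groups of r = 5, each with one XOR local parity. A lost symbol is rebuilt from the other 5 symbols of its group instead of 10 nodes. A practical code with locality (such as those studied by Gopalan et al.) would also carry global parities for erasure tolerance; only the local part is shown here.

```python
from functools import reduce
from operator import xor
from typing import List

def lrc_encode(data: List[int], r: int) -> List[List[int]]:
    """Split data into local groups of size r and append one XOR parity per group."""
    groups = [data[i:i + r] for i in range(0, len(data), r)]
    return [g + [reduce(xor, g, 0)] for g in groups]

def local_repair(group: List[int], lost: int) -> int:
    """Rebuild symbol `lost` of a local group from the other r symbols of that group."""
    return reduce(xor, (v for i, v in enumerate(group) if i != lost), 0)

data = [3, 14, 15, 92, 65, 35, 89, 79, 32, 38]  # ten data symbols (bytes)
groups = lrc_encode(data, r=5)                   # two groups: 5 data + 1 local parity each
print(local_repair(groups[0], 2) == data[2])     # True, contacting 5 nodes instead of 10
```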
Regenerating Codes - Formal Definition
Parameters: $((n, k, d), (\alpha, \beta), B, \mathbb{F}_q)$, where each node stores $\alpha$ symbols over the field $\mathbb{F}_q$.
[Figure: a data collector connects to k of the n nodes to recover the file; a replacement node 1′ connects to d helper nodes to repair failed node 1.]
• Data is to be recovered by connecting to any $k$ of the $n$ nodes.
• A failed node is to be repaired by connecting to any $d$ nodes and downloading $\beta$ symbols from each; the repair bandwidth satisfies $d\beta \ll B$, the file size.
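As a worked example, assume the code operates at the minimum-storage (MSR) point, where α = B/k and β = α/(d − k + 1); these are the standard MSR values from the regenerating-codes literature rather than something stated on this slide. For (n, k, d) = (14, 10, 13), repair then downloads dβ = 13/40 of the file, versus the whole file for a conventional MDS repair.

```python
from fractions import Fraction

def msr_point(B: int, k: int, d: int):
    """Return (alpha, beta, repair bandwidth d*beta) at the MSR point."""
    alpha = Fraction(B, k)        # each node stores 1/k of the file
    beta = alpha / (d - k + 1)    # symbols downloaded from each helper
    return alpha, beta, d * beta

alpha, beta, gamma = msr_point(B=1_000, k=10, d=13)
print(alpha, beta, gamma)  # 100 25 325
print(gamma / 1_000)       # 13/40
```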
An Example Construction of a Regenerating Code: The Clay Code
1. M. Vajha, V. Ramkumar, B. Puranik, G. Kini, E. Lobo, B. Sasidharan, P. V. Kumar, A. Barg, M. Ye, S. Narayanamurthy, S. Hussain, and Siddhartha Nandi, “Clay Codes: Moulding MDS Codes to Yield an MSR Code,” presented at the 16th USENIX Conference on File and Storage Technologies (FAST), Feb. 12-15, 2018, Oakland, CA.
Transforming an MDS Code to Yield a (4, 2) Clay Code
1. Start with a single-layer (4, 2) MDS code on nodes indexed (x, y) = (0,0), (0,1), (1,0), (1,1): two data nodes and two parity nodes.
2. Layer four such units, z = 0, 1, 2, 3.
3. Index each layer z using two bits: z = (0,0), (0,1), (1,0), (1,1).
4. Identify paired sub-chunks U and U*.
5. Apply the Pairwise Forward Transform (PFT): a simple 2 × 2 linear transformation A relating the pair (U, U*) to the pair (C, C*).
6. This pairwise transformation of symbols yields the Clay code.
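The pairing and the 2 × 2 transform can be sketched for the (4, 2) case. The pairing rule below (sub-chunk (x, y, z) is unpaired when z[y] = x, and otherwise is paired with node (z[y], y) in the layer where z[y] is replaced by x) and the matrix A = [[1, γ], [γ, 1]] are assumptions for illustration; the field GF(257) and γ = 2 are arbitrary choices, and the Clay-code paper should be consulted for the exact construction and the direction of the transform.

```python
from itertools import product
from typing import Optional, Tuple

P = 257     # illustrative prime field GF(257)
GAMMA = 2   # hypothetical coupling coefficient; needs GAMMA**2 != 1 (mod P)

Sub = Tuple[int, int, Tuple[int, int]]   # (x, y, z): node (x, y), layer z = (z0, z1)

def pair_of(x: int, y: int, z: Tuple[int, int]) -> Optional[Sub]:
    """Partner of sub-chunk (x, y, z) under the assumed rule, or None if unpaired."""
    if z[y] == x:
        return None
    zp = list(z)
    zp[y] = x
    return (z[y], y, (zp[0], zp[1]))

def transform(u: int, u_star: int) -> Tuple[int, int]:
    """2 x 2 linear map (u, u*) -> A (u, u*) with A = [[1, GAMMA], [GAMMA, 1]] over GF(P)."""
    return (u + GAMMA * u_star) % P, (GAMMA * u + u_star) % P

def inverse_transform(c: int, c_star: int) -> Tuple[int, int]:
    """Undo transform(); A is invertible because det A = 1 - GAMMA**2 != 0 (mod P)."""
    inv_det = pow((1 - GAMMA * GAMMA) % P, P - 2, P)
    return (c - GAMMA * c_star) * inv_det % P, (c_star - GAMMA * c) * inv_det % P

nodes = list(product((0, 1), repeat=2))
layers = list(product((0, 1), repeat=2))
unpaired = [(x, y, z) for (x, y) in nodes for z in layers if pair_of(x, y, z) is None]
print(len(unpaired))                        # 8 of the 16 sub-chunks are unpaired
print(pair_of(0, 0, (1, 0)))                # (1, 0, (0, 0))
print(inverse_transform(*transform(5, 7)))  # (5, 7)
```

Invertibility of A is what lets the transform be undone pairwise, so the layered MDS structure is not lost.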
Coded Computation
Coded MapReduce with r = 2
Q = 3 functions are to be computed using K = 3 servers and N = 6 file fragments.
• The Map function is run on each fragment by two servers (r = 2).
• Only 3 units of intermediate computation need to be sent.
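The example can be replayed in a short simulation. It assumes each pair of servers maps two fragments, server k reduces function k, intermediate values are single bytes, and each server multicasts the XOR of two values, one needed by each of the other two servers (the coded-shuffling idea behind Coded MapReduce). Without coding, each server would need 2 unicast values, 6 transmissions in total; the XORs reduce this to 3. All names are hypothetical.

```python
import random
from itertools import combinations

K = 3                                     # servers; server k reduces function k (Q = 3)
pairs = list(combinations(range(K), 2))   # (0, 1), (0, 2), (1, 2)
# Each pair of servers maps two fragments: N = 6 fragments, each mapped r = 2 times.
frags = {pair: [2 * p, 2 * p + 1] for p, pair in enumerate(pairs)}
N = 2 * len(pairs)

# Hypothetical intermediate values v[q][f] for function q on fragment f (one byte each).
rng = random.Random(0)
v = [[rng.randrange(256) for _ in range(N)] for _ in range(K)]

def mapped_by(k):
    """Fragments that server k processes in the Map phase."""
    return {f for pair, fs in frags.items() if k in pair for f in fs}

def shuffle():
    """Each server s multicasts one XOR: a value for server i plus a value for server j."""
    msgs = []
    for s in range(K):
        i, j = [t for t in range(K) if t != s]
        # i lacks the fragments of pair {s, j}; j lacks those of pair {s, i}.
        # The lower-indexed member of a pair sends fragment 0, the other sends fragment 1.
        fi = frags[tuple(sorted((s, j)))][0 if s < j else 1]
        fj = frags[tuple(sorted((s, i)))][0 if s < i else 1]
        msgs.append((i, fi, j, fj, v[i][fi] ^ v[j][fj]))
    return msgs

def decode(msgs):
    """Each receiver cancels the term it already computed during its own Map phase."""
    got = {}
    for i, fi, j, fj, payload in msgs:
        assert fj in mapped_by(i) and fi in mapped_by(j)  # side information is available
        got[(i, fi)] = payload ^ v[j][fj]
        got[(j, fj)] = payload ^ v[i][fi]
    return got

msgs = shuffle()
got = decode(msgs)
print(len(msgs))  # 3 coded transmissions
```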
PIR Codes
PIR: Replicated Servers
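• An example PIR protocol with 2 replicated servers, each storing the full data vector $x$:
Alice sends the query $q_1 = u$ to Server 1 and $q_2 = u + e_i$ to Server 2, where $u$ is a uniformly random vector and $e_i$ is the $i$-th unit vector.
Server 1 returns $a_1 = u^T x$; Server 2 returns $a_2 = (u + e_i)^T x$.
• Alice computes $a_1 + a_2 = e_i^T x = x_i$ (binary arithmetic).
• For replicated servers, the storage overhead is $\tau \ge 2$.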
[1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, “Private information retrieval,” Journal of the ACM, 45, 1998.
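The two-server protocol is short enough to simulate directly. This sketch assumes a binary data vector and arithmetic over GF(2); the helper names are hypothetical. Each server alone sees a uniformly random query, so neither learns i.

```python
import random
from typing import List

def dot2(a: List[int], b: List[int]) -> int:
    """Inner product over GF(2)."""
    return sum(x & y for x, y in zip(a, b)) % 2

def pir_two_servers(x: List[int], i: int, rng: random.Random) -> int:
    """Alice retrieves bit x[i] from two servers that each store all of x."""
    u = [rng.randrange(2) for _ in x]             # uniformly random query
    q1 = u
    q2 = [b ^ (j == i) for j, b in enumerate(u)]  # u + e_i over GF(2)
    a1 = dot2(q1, x)                              # Server 1's answer: u^T x
    a2 = dot2(q2, x)                              # Server 2's answer: (u + e_i)^T x
    return a1 ^ a2                                # = e_i^T x = x_i

rng = random.Random(1)
x = [1, 0, 1, 1, 0, 0, 1, 0]
print([pir_two_servers(x, i, rng) for i in range(len(x))])  # [1, 0, 1, 1, 0, 0, 1, 0]
```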
PIR Code
• The basic idea is to reduce storage overhead by avoiding replication.
• The code used is one that can simulate the presence of $\tau$ replicated servers.
• It does this by ensuring that each piece of data can be recreated in $\tau$ different ways, by connecting to $\tau$ disjoint sets of servers.
• This calls for a new class of codes, termed ‘PIR codes’ by Fazeli et al.
[1] A. Fazeli, A. Vardy, and E. Yaakobi, “Codes for distributed PIR with low storage overhead”, ISIT 2015.
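Example with 5 servers (binary arithmetic throughout):
• The data is split as $x = [x_1\ x_2\ x_3\ x_4]^T$. Servers 1 to 4 store $x_1, \ldots, x_4$, and Server 5 stores $x_5 = x_1 + x_2 + x_3 + x_4$, so the storage overhead is 5/4 = 1.25.
• Alice wants $x_{1i}$, the $i$-th symbol of $x_1$. She sends $u$ to Server 1 and $u + e_i$ to Servers 2 to 5, which return $u^T x_1$ and $(u+e_i)^T x_j$ for $j = 2, \ldots, 5$.
• Since $x_5 = x_1 + x_2 + x_3 + x_4$,
$(u+e_i)^T x_2 + (u+e_i)^T x_3 + (u+e_i)^T x_4 + (u+e_i)^T x_5 = (u+e_i)^T x_1$.
• We now have $u^T x_1$ and $(u+e_i)^T x_1$, whose sum is $e_i^T x_1 = x_{1i}$.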
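The five-server example can be simulated the same way. This sketch assumes binary data split into four equal parts and retrieves a bit of the first part only; the function name `pir_code_fetch` is hypothetical. Server 1 sees u and Servers 2 to 5 see u + e_i, each of which is uniformly random on its own.

```python
import random
from typing import List

def dot2(a: List[int], b: List[int]) -> int:
    """Inner product over GF(2)."""
    return sum(x & y for x, y in zip(a, b)) % 2

def pir_code_fetch(parts: List[List[int]], i: int, rng: random.Random) -> int:
    """Fetch bit i of x1 = parts[0] from 5 servers storing x1..x4 and x5 = x1+x2+x3+x4."""
    x5 = [a ^ b ^ c ^ d for a, b, c, d in zip(*parts)]
    servers = parts + [x5]                          # contents of Servers 1..5
    u = [rng.randrange(2) for _ in parts[0]]
    w = [b ^ (j == i) for j, b in enumerate(u)]     # u + e_i
    answers = [dot2(u, servers[0])] + [dot2(w, s) for s in servers[1:]]
    # (u+e_i)^T x2 + (u+e_i)^T x3 + (u+e_i)^T x4 + (u+e_i)^T x5 = (u+e_i)^T x1
    w_dot_x1 = answers[1] ^ answers[2] ^ answers[3] ^ answers[4]
    return answers[0] ^ w_dot_x1                    # u^T x1 + (u+e_i)^T x1 = x1[i]

rng = random.Random(7)
parts = [[rng.randrange(2) for _ in range(6)] for _ in range(4)]
print([pir_code_fetch(parts, i, rng) for i in range(6)] == parts[0])  # True
```

Storage is 5 parts for 4 parts of data (overhead 1.25), compared with overhead 2 for the replicated two-server scheme.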
Open problems
• Lower bound on $n$ given $k$ and $\tau \ge 5$.
• Optimal PIR code constructions for $\tau \ge 5$.
[1] A. Fazeli, A. Vardy, and E. Yaakobi, “Codes for distributed PIR with low storage overhead”, ISIT 2015.
[2] M. Vajha, V. Ramkumar, and P. V. Kumar, “Binary, Shortened Projective Reed Muller Codes for Coded PIR,” ISIT 2017.
Other Topics
1. Polar codes
2. Codes for improved operation of block chains
Recent Publications
1. Balaji S B, M. Nikhil Krishnan, Myna Vajha, Vinayak Ramkumar, Birenjith Sasidharan, P. Vijay Kumar, “Erasure coding for distributed storage: an overview,” Sci China Inf Sci, vol. 61, October 2018, pp. 1-45.
2. Guang Gong, Tor Helleseth, and P. Vijay Kumar, “Solomon W. Golomb - Mathematician, Engineer, and Pioneer,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2844-2857, 2018.
3. M. N. Krishnan, B. Puranik, P. V. Kumar, I. Tamo, and A. Barg, “Exploiting Locality for Improved Decoding of Binary Cyclic Codes,” IEEE Transactions on Communications, vol. 66, no. 6, pp. 2346-2358, June 2018.
4. K. V. Rashmi, N. B. Shah, K. Ramchandran, and P. V. Kumar, “Information-Theoretically Secure Erasure Codes for Distributed Storage,” IEEE Trans. Inform. Th., vol. 64, no. 3, pp. 1621-1646, March 2018.
5. M. Nikhil Krishnan, Anantha Narayanan R., and P. Vijay Kumar, “Codes with Combined Locality and Regeneration Having Optimal Rate, $d_{\min}$ and Linear Field Size,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.
6. S. B. Balaji and P. Vijay Kumar, “A Tight Lower Bound on the Sub-Packetization Level of Optimal-Access MSR and MDS Codes,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.
7. M. Vajha, S. B. Balaji, PV Kumar, “Explicit MSR Codes with Optimal Access, Optimal Sub-Packetization and Small Field Size for $d = k+1, k+2, k+3$,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.
8. M. Nikhil Krishnan and P. Vijay Kumar, “Rate-Optimal Streaming Codes for Channels with Burst and Isolated Erasures,” accepted for presentation at the IEEE International Symp. Inform. Theory (ISIT), June 18-22, Vail, Colorado, 2018.
9. S. B. Balaji, Ganesh Kini and P Vijay Kumar, “A Rate-Optimal Construction of Codes with Sequential Recovery with Low Block Length,” presented at the 24th National Conference on Communications (NCC 2018), IIT Hyderabad, February 25-28, 2018.
10. Vinayak Ramkumar, Myna Vajha and P Vijay Kumar, “Determining the Generalized Hamming Weight Hierarchy of the Binary Projective Reed-Muller Code,” presented at the 24th National Conference on Communications (NCC 2018), IIT Hyderabad, February 25-28, 2018.
11. M. Vajha, V. Ramkumar, B. Puranik, G. Kini, E. Lobo, B. Sasidharan, P. V. Kumar, A. Barg, M. Ye, S. Narayanamurthy, S. Hussain, and Siddhartha Nandi, “Clay Codes: Moulding MDS Codes to Yield an MSR Code,” presented at the 16th USENIX Conference on File and Storage Technologies (FAST), Feb. 12-15, 2018, Oakland, CA.