Alvaro Cassinelli*, Makoto Naruse*,** and Masatoshi Ishikawa*
Ishikawa-Hashimoto Lab., University of Tokyo*; PRESTO, JST**
Quad-tree image compression using
reconfigurable free-space optical interconnections
and pipelined parallel processors
PRESTO = Precursory Research for Embryonic Science and Technology; JST = Japan Science and Technology
Plan of the presentation
I. OCULAR architectures for computing
- Reconfigurable Single Stage (OCULAR-I)
- Reconfigurable Multi-stage (OCULAR-II)
II. OCULAR-II demonstration: Quad-tree compression
- Quad-tree compression algorithm
- Set-up and Demonstration
III. Conclusion and further work
I. OCULAR architectures for computing
I.1 Reconfigurable Single Stage (OCULAR-I)
[Figure: 2D array of data → photo-detector array → processing element array → VCSEL array → optical interconnections, closed by optical feedback]
I.2 Reconfigurable Multi-stage (OCULAR-II)
OCULAR = Optoelectronic Computer Using Laser Arrays with Reconfiguration
[Figure: 2D array of data → photo-detector / processing element array / VCSEL stages linked by optical interconnections → output]
Optical technology offers enhanced parallel communication primitives, of great benefit for network-based parallel computers.
I.1 Single-stage paradigm for parallel computing
Static: fixed interconnection (X, Y, and Z); …switches inside processors (local control).
Dynamic: reconfigurable interconnection (X, Y or Z); …switches outside processors (local or global/external control possible).
[Figure: processors P1 … Pn, each with mux, ALU, memory and control, linked by interconnections X, Y, Z; a controller in the dynamic case]
= distributed memory… Anyway, static networks can be redesigned as single-stage dynamic networks…
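The static/dynamic distinction can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not the OCULAR controller: eight processors, three link sets X, Y, Z taken (hypothetically) to be the three dimensions of a 3-cube, and an external controller that activates exactly one link set per cycle, so each processor needs a single I/O port. The names `LINK_SETS`, `dynamic_step` and `global_sum` are illustrative.

```python
N_DIMS = 3
N_PROCS = 1 << N_DIMS

# Hypothetical link sets: processor p is connected to p with bit d flipped.
LINK_SETS = {name: [p ^ (1 << d) for p in range(N_PROCS)]
             for d, name in enumerate("XYZ")}


def dynamic_step(values, link_name, op):
    """One SIMD cycle: every processor sends its value over the link set
    selected by the controller and combines it with what it receives."""
    perm = LINK_SETS[link_name]
    received = [None] * N_PROCS
    for src, dst in enumerate(perm):
        received[dst] = values[src]
    return [op(v, r) for v, r in zip(values, received)]


def global_sum(values):
    """Reduction over all processors: the controller cycles through X, Y, Z."""
    for name in "XYZ":
        values = dynamic_step(values, name, lambda a, b: a + b)
    return values


print(global_sum(list(range(N_PROCS))))  # every processor ends with 28
```

With one port per processor, running a different topology only means changing the sequence of link sets the controller selects, which is the reusability argument made for dynamic networks.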
I.1 Dynamic architecture vs. static
Static: in an n-degree static topology, each processor has n distinct optoelectronic I/O ports…
→ technologically challenging, non-reusable architecture, bad scalability.
Dynamic: …processors, switches and interconnections are located in distinct modules.
→ optimal use of electronics, optoelectronics and optics; scalability; hardware reusable in other topologies; possible introduction of multiple stages…
[Figure: processors P1 … Pn, switches and interconnections as separate modules, closed by a feedback loop]
I.1 OCULAR-I system architecture
Switches and interconnections: reconfigurable diffractive optics module → dynamic single-stage… optical architecture.
[Figure: elementary processor array (P1 … Pn), VCSEL array, photo-detector array and optical interconnection module (X, Y, Z), closed by optical feedback]
Processing Module
2D optoelectronic processing layer (PD-PE-VCSEL) + electronic mesh for rapid short-range communication between PEs.
[ SIMD Processor array ] 8x8 PEs (on FPGA). Each PE: ALU, registers, local memory (24 bits), mapped I/O (VCSEL, PD), links to its 4 neighbors.
[ Photo-detector array ] Si photo-detectors with integrated amplifier/threshold.
[ VCSEL array ] 850 nm VCSELs; modulation > 1 GHz (10-50 GHz possible).
Each array is attached to a PCB.
Reconfigurable interconnection module
Folded 4-f system, 14 x 25 x 6.2 cm [figure labels: laser diode, FT lens]
The CGH is generated by an optically addressable SLM, using a laser diode and a liquid crystal display coupled through a fiber optic plate.
- Space-invariant interconnections: good/bad?
- Free space: alignment issues?
- Multi-level CGH: good diffraction efficiency
- Reconfiguration (“switch”) frequency: ~100 Hz…
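The remark that multi-level CGHs give good diffraction efficiency can be quantified with a standard scalar-diffraction result: an ideal blazed phase grating quantized to M equal phase levels sends a fraction sinc²(1/M) = (sin(π/M)/(π/M))² of the light into its first order. The sketch below evaluates that formula; it assumes ideal fabrication and says nothing about the specific CGHs used in this work. For a two-level (0/π) grating, the value 4/π² ≈ 0.405 is the power in each of the symmetric ±1 orders.

```python
import math


def m_level_efficiency(levels):
    """First-order efficiency of an ideal blazed phase grating quantized to
    `levels` equal phase steps (scalar theory, no fabrication errors)."""
    x = math.pi / levels
    return (math.sin(x) / x) ** 2


for m in (2, 4, 8, 16):
    print(m, round(m_level_efficiency(m), 3))
```

Going from two to four phase levels roughly doubles the first-order efficiency (about 40% to about 81%).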
The module generates the interconnection pattern (X, Y, Z)… it is therefore responsible for both interconnection and switching.
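A space-invariant interconnection applies the same set of displacements to every source, each displacement corresponding to one diffraction order produced by the CGH. The sketch below models this between a VCSEL array and a photo-detector array with thresholded detection; the function name, the list-of-lists representation, the `threshold` parameter and the "light outside the array is lost" rule are assumptions made for illustration.

```python
def space_invariant_interconnect(src, offsets, threshold=1):
    """Model a space-invariant optical interconnection between two 2D arrays.

    src       -- 2D list of 0/1 VCSEL states (rows x cols)
    offsets   -- (drow, dcol) displacement of each diffraction order; the SAME
                 offsets apply to every source ("space-invariant")
    threshold -- number of incident beams a photo-detector needs to fire
    Beams landing outside the detector array are lost.
    """
    rows, cols = len(src), len(src[0])
    hits = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not src[r][c]:
                continue
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    hits[rr][cc] += 1
    return [[1 if h >= threshold else 0 for h in row] for row in hits]


# One VCSEL ON; zero order (0, 0) plus a hypothetical first order toward the upper-left.
src = [[0] * 4 for _ in range(4)]
src[1][1] = 1
for row in space_invariant_interconnect(src, [(0, 0), (-1, -1)]):
    print(row)
```

Raising `threshold` turns the photo-detector plane into an AND of incoming beams instead of an OR, which is one way the integrated threshold on the detectors can be used for comparisons.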
In this optical interconnection module, adjustable components are required to adapt the diffraction positions to the LD and PD planes. We have designed a zoom Fourier-transform lens as the adjustable component.
The focal length is adjustable from 360 mm to 440 mm by moving one of the lenses, as illustrated in the figure. This function is important for matching interconnection parameters, such as the pixel pitches of the VCSEL array, the PD array and the CGH, and for compensating for wavelength variation of the VCSEL array.
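Why a zoom lens helps can be seen from the paraxial Fourier-plane relation x = f·λ/Λ: a CGH with period Λ places its first order at distance x from the zero order in the back focal plane of a lens of focal length f. Matching x to the detector pitch, or keeping x fixed when λ drifts, therefore means adjusting f. This is a minimal sketch assuming ideal paraxial optics; the 400 mm nominal focal length, the 860 nm drift, the pitch and the grating period are hypothetical numbers, and only the 360 to 440 mm zoom range comes from the text.

```python
F_MIN, F_MAX = 0.360, 0.440  # zoom range quoted for the FT lens, in metres


def first_order_offset(focal_m, wavelength_m, period_m):
    """Paraxial first-order position in the back focal plane: x = f * lambda / period."""
    return focal_m * wavelength_m / period_m


def focal_for_pitch(pitch_m, wavelength_m, period_m):
    """Focal length that puts the first order exactly one detector pitch away."""
    return pitch_m * period_m / wavelength_m


def focal_for_wavelength(f_nominal_m, lambda_nominal_m, lambda_actual_m):
    """Keep the spot fixed when the VCSEL wavelength drifts (f * lambda constant)."""
    return f_nominal_m * lambda_nominal_m / lambda_actual_m


def within_zoom_range(focal_m):
    return F_MIN <= focal_m <= F_MAX


# Hypothetical case: 400 mm nominal at 850 nm, VCSELs actually emitting at 860 nm.
f_new = focal_for_wavelength(0.400, 850e-9, 860e-9)
print(f"{f_new * 1e3:.1f} mm, in range: {within_zoom_range(f_new)}")  # prints: 395.3 mm, in range: True
```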
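I.2 Multi-stage paradigm for parallel computing
The single-stage architecture can be “spanned” into a multi-stage architecture.
[Figure: single stage vs. multi-stages; processor arrays P1 … Pn separated by switch-and-interconnection stages S&I 1, S&I 2, …, S&I m]
The cost of multiplying the processors is paid back as…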
Simplicity & Speed – S & I does not need to be complex (shuffle-exchange networks).
Scalability / Reconfigurability – for different topologies.
Pipelining – possible.
Theoretical background – Multi-stage architectures have been studied for decades in networking applications…
[Figure: topologies: hypercube, mesh, cube cycle, shuffle/exchange, delta, Beneš, de Bruijn [computing]; tree [computing & networking]]
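To illustrate why S & I stages need not be complex, here is the textbook destination-tag routing through an Omega network, which is built from identical shuffle-exchange stages. This is a generic sketch of that classic network, not the OCULAR-II interconnection design; function names are hypothetical and switch conflicts (blocking) are not modelled.

```python
def perfect_shuffle(i, n_bits):
    """Perfect shuffle on n_bits-bit line addresses: rotate the address left by one bit."""
    msb = (i >> (n_bits - 1)) & 1
    return ((i << 1) & ((1 << n_bits) - 1)) | msb


def omega_route(src, dst, n_bits):
    """Destination-tag routing through log2(N) shuffle-exchange stages.

    Each stage applies the same fixed perfect shuffle, then a 2x2 exchange
    switch sets the lowest address bit to the next destination bit (MSB first).
    Returns the line position before the first stage and after each stage.
    """
    pos = src
    path = [pos]
    for stage in range(n_bits):
        pos = perfect_shuffle(pos, n_bits)
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit  # switch output: upper (0) or lower (1)
        path.append(pos)
    return path


print(omega_route(3, 6, 3))  # [3, 7, 7, 6]
```

After n stages all n destination bits have been written in order, so every source reaches its destination with only one fixed wiring pattern and 1-bit switch decisions.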
[Figure: OCULAR-II: optoelectronic processing modules (elementary processor array, VCSEL array, photo-detector array) alternating with optical interconnection modules; two-layer module]
II. Quad-tree compression on OCULAR-II
II.1 Quad-tree compression algorithm
II.2 Set-up and Demonstration
II.3 Discussion
[Figure: sender array (PE array + VCSELs) → interconnection module (SLM) → receiver array (photo detectors + PE array); electrical feedback through the host]
II.1 Principle of the quad-tree compression algorithm
[Figure: binary image with quadrants labeled A, B, C, D, and the corresponding tree, levels 0 to 3]
- This group of pixels is a level 2 leaf of address B.
- …a level 1 leaf of address DB.
- …this pixel is a level 0 leaf of address CDA.
- …this pixel is NOT a leaf.
Image as a tree: Image = (2, B) + (1, DB) + (0, CDA)
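The decomposition above can be written as a short recursive (top-down) sketch. It assumes a layout that the extracted text does not fix: A = top-left, B = top-right, C = bottom-left, D = bottom-right, with the first letter of an address naming the coarsest quadrant, and only ON pixels forming leaves. The test image is constructed to reproduce the slide's decomposition under that assumption; it is not the slide's actual image. This sequential version is only a reference; it is not how OCULAR-II computes the tree.

```python
def quadtree_leaves(img):
    """Return the (level, address) leaves of a 2^N x 2^N binary image.

    A leaf is a maximal aligned square block whose pixels are all ON.
    Assumed labels: A top-left, B top-right, C bottom-left, D bottom-right.
    """
    size = len(img)
    n = size.bit_length() - 1
    if size != 1 << n or any(len(row) != size for row in img):
        raise ValueError("image must be 2^N x 2^N")
    leaves = []

    def visit(r0, c0, level, address):
        side = 1 << level
        block = [img[r][c] for r in range(r0, r0 + side) for c in range(c0, c0 + side)]
        if all(block):
            leaves.append((level, address))      # uniform ON block: a leaf
        elif any(block):                          # mixed block: split into A, B, C, D
            half = side // 2
            for label, (dr, dc) in zip("ABCD", ((0, 0), (0, half), (half, 0), (half, half))):
                visit(r0 + dr, c0 + dc, level - 1, address + label)
        # all-OFF blocks produce no leaf

    visit(0, 0, n, "")
    return leaves


# Image built to give (2, B) + (1, DB) + (0, CDA) under the assumed labelling.
img = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4, 8):
        img[r][c] = 1
for r in (4, 5):
    for c in (6, 7):
        img[r][c] = 1
img[6][2] = 1
print(quadtree_leaves(img))  # [(2, 'B'), (0, 'CDA'), (1, 'DB')]
```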
II.1 Quad-tree compression on OCULAR-II architecture
• Initialization:
- Load a 2^N x 2^N image. ON pixels are set as lowest-level leaves in the local PE memories.
• From stage to stage (array n → array n+1):
1) Detecting upper leaves
- sequentially broadcast the leaves’ values to the corresponding upper PE;
- compare on the receiver side;
- update the leaf level of the upper-level PE if the corners turned out to be lower “false” leaves.
2) Cutting branches
- parallel broadcast signal for resetting false low-level leaves.
• End on last stage:
- Download data from the last array.
- Save data (level, address) from PEs which are still leaves.
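The stage-to-stage procedure can be simulated bottom-up, one level per stage. This is a minimal sketch, not the demonstrator's PE program: it assumes the same A/B/C/D layout as a top-left/top-right/bottom-left/bottom-right grid, lets the A (top-left) block of each 2x2 group stand for the upper PE, and performs the four comparisons of a group at once instead of by sequential broadcasts. The helper names and the `(level, row, col)` encoding are hypothetical.

```python
def to_address(level, r, c, n):
    """Quadrant address of block (r, c) of side 2**level, coarsest letter first."""
    return "".join("ABCD"[2 * ((r >> i) & 1) + ((c >> i) & 1)]
                   for i in range(n - level - 1, -1, -1))


def parallel_quadtree(img):
    """Level-by-level quad-tree compression of a 2^N x 2^N binary image."""
    size = len(img)
    n = size.bit_length() - 1
    # Initialization: every ON pixel is a level-0 leaf held by its own PE.
    leaf = {(0, r, c) for r in range(size) for c in range(size) if img[r][c]}
    for level in range(n):
        groups = size >> (level + 1)                # level+1 blocks per side
        for r in range(groups):
            for c in range(groups):
                corners = [(level, 2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
                if all(k in leaf for k in corners):  # detect upper leaf
                    leaf.difference_update(corners)  # cut the four lower branches
                    leaf.add((level + 1, r, c))      # upper PE now holds a level+1 leaf
    # End: report (level, address) for blocks that are still leaves.
    return sorted((lv, to_address(lv, r, c, n)) for lv, r, c in leaf)


img = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4, 8):
        img[r][c] = 1
for r in (4, 5):
    for c in (6, 7):
        img[r][c] = 1
img[6][2] = 1
print(parallel_quadtree(img))  # [(0, 'CDA'), (1, 'DB'), (2, 'B')]
```

Each pass of the outer loop corresponds to one stage (one processor array) in the multi-stage picture, which is why the work grows linearly with N rather than with the number of pixels.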
Remark: data from the receiver side is fed back electronically to the sender side through the host computer…
Example: interconnection for processing of level 1
1) Detecting leaves: …is A a level one leaf? (A: zero order; D: first order)
2) Conditional broadcast: …if so, A must update its leaf level and cut lower branches.
[Figure: quadrant A B / C D on the broadcasting array n and the computing array n+1; CCD image of the PD plane]
II.2 OCULAR-II demonstrator setup
• The demonstration is carried out on a two-layer OCULAR-II prototype. Multiple-layer processing is simulated thanks to electronic feedback between the first and second processor arrays.
• The interconnections for each level are time-multiplexed on the SLM module.
[Figure: CGHs and diffraction patterns for levels 0, 1 and 2; optical interconnection module, PE arrays 1 and 2, VCSEL array, PD array]
…quad-tree algorithm and hypercube network
Image 2^(n/2) x 2^(n/2) pixels large, processed by 2^n elementary processors arranged in an n-dimensional hypercube topology. [Figure: hypercube dimensions X, Y, W, Z]
Quad-tree on OCULAR-II: pairs of (6-dimensional) hypercube links are generated and multiplexed in time thanks to the SLM-based interconnection module… on level 1: X, Z; on level 2: Y, W.
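One way to see the embedded hypercube is to number the PEs by interleaving the row and column bits of their pixel coordinates (Morton order). This numbering is an assumption for illustration; the text does not say how OCULAR-II numbers its PEs, and which slide letter (X, Y, W, Z) corresponds to which bit is not recoverable from the text. Under this mapping, the horizontal and vertical quadrant partners at each level are direct neighbours along exactly one hypercube dimension, so each level uses one pair of dimensions.

```python
def pe_index(row, col, half_bits):
    """Hypothetical PE numbering: bit 2k is column bit k, bit 2k+1 is row bit k."""
    idx = 0
    for k in range(half_bits):
        idx |= ((col >> k) & 1) << (2 * k)
        idx |= ((row >> k) & 1) << (2 * k + 1)
    return idx


def level_dimensions(level):
    """Hypercube dimensions used when merging level `level` blocks into level+1."""
    return 2 * level, 2 * level + 1


HALF_BITS = 3                    # 8x8 image -> 64 PEs -> 6-dimensional hypercube
side = 1 << HALF_BITS
for level in range(HALF_BITS):
    col_dim, row_dim = level_dimensions(level)
    for row in range(side):
        for col in range(side):
            me = pe_index(row, col, HALF_BITS)
            # horizontal and vertical quadrant partners are direct hypercube neighbours
            assert me ^ pe_index(row, col ^ (1 << level), HALF_BITS) == 1 << col_dim
            assert me ^ pe_index(row ^ (1 << level), col, HALF_BITS) == 1 << row_dim
print("dimension pair per level:", [level_dimensions(l) for l in range(HALF_BITS)])
```

Under this numbering the diagonal (D) partner differs in two bits, i.e. it is two hypercube hops away.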
[Figure: interconnection module between the “sender” array (SIMD + VCSELs) and the “receiver” array (SIMD + PDs); CCD and CGH monitors; control and results on the host computer…]
Example: holograms required during level 1 processing.
1) Broadcast hologram (quadrant comparison)
2) Re-broadcast hologram (cutting branches)
[Figure: quadrant A B / C D showing computing and broadcasting PEs; potential leaf on level one; zero-order and first-order beams between A and D]
Level 0. Detecting upper leaves.
[Figure: quadrants A, B, C, D] …symbolic representation of the initial tree, containing 28 level 0 leaves (most of them false).
[Figure: level 0 quadrants and true level 0 leaves]
Detail of level 0 broadcasting: photo-detector chip surface as seen through the alignment CCD camera (receiver array, sender array); markers distinguish “D” corners with leaf bit ON from “D” corners with leaf bit OFF.
[slide not shown in main presentation]
In this demonstration we used two-level phase CGHs computed by simulated annealing (SA). Only the 1st order of diffraction is
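The text states that the two-level phase CGHs were computed by SA (simulated annealing). The sketch below is a generic, one-dimensional illustration of that idea, not the authors' design code: a 16-pixel 0/π phase hologram is optimised so that as much power as possible lands in two chosen diffraction orders, using a scalar DFT model of the far field. Pixel count, target orders, cooling schedule and all function names are assumptions.

```python
import cmath
import math
import random


def far_field_intensity(phases):
    """|DFT|^2 of a 1D phase-only hologram (scalar Fraunhofer approximation)."""
    n = len(phases)
    field = [cmath.exp(1j * p) for p in phases]
    return [abs(sum(field[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))) ** 2
            for k in range(n)]


def cost(phases, targets):
    """Lower is better: minus the fraction of power landing in the target orders."""
    inten = far_field_intensity(phases)
    return -sum(inten[k] for k in targets) / sum(inten)


def anneal_binary_cgh(n, targets, steps=2000, t0=0.05, seed=0):
    """Two-level (0 / pi) hologram design by simulated annealing.

    Each move flips one pixel; a worse move is accepted with probability
    exp(-delta / T), with T decreasing geometrically. Returns the best design.
    """
    rng = random.Random(seed)
    phases = [rng.choice((0.0, math.pi)) for _ in range(n)]
    current = cost(phases, targets)
    best, best_cost = phases[:], current
    for step in range(steps):
        temp = t0 * (0.001 ** (step / steps))
        i = rng.randrange(n)
        phases[i] = math.pi - phases[i]          # flip 0 <-> pi
        new = cost(phases, targets)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = phases[:], new
        else:
            phases[i] = math.pi - phases[i]      # reject: undo the flip
    return best, best_cost


best, best_cost = anneal_binary_cgh(16, targets=(1, 15))
print(f"fraction of power in orders +1 and -1: {-best_cost:.3f}")
```

Because a two-level (real, ±1) transmittance has a symmetric far field, a binary CGH cannot send light into the +1 order without sending the same amount into the -1 order; that is why the targets here are orders 1 and 15 (+1 and -1 modulo 16).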
Level 0. Cutting branches.
Level 1. Detecting upper leaves.
Level 1. Cutting branches.
Level 2. Detecting leaves and cutting branches.
[Figures: quadrants A, B, C, D at each step]
…symbolic representation of the encoded image as a minimal tree with seven leaves.
Also, one has to remember that our chips are only 8x8 pixels large. However, SLM reconfiguration limits operation to a maximum of about a hundred hertz...
II.3 Discussion
8x8 image (N=3): 28 pixels ON = 28 initial leaves… only seven final leaves.
Compression of a 2^N x 2^N pixel image takes about 5N clock cycles, i.e. O(N)...
The SIMD array, VCSELs and photo-detectors can run at more than 100 MHz… For a 1024x1024 image (N = 10), 5N = 50 cycles, so 100 MHz / 50 cycles = two million 1024x1024 images compressed per second!
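The throughput figure is simple arithmetic; the sketch below reproduces it, taking the 5N-cycle estimate and the 100 MHz clock at face value and ignoring the SLM reconfiguration limit. The function names are illustrative.

```python
def clock_cycles(n):
    """Cycle estimate from the discussion: about 5N cycles for a 2^N x 2^N image."""
    return 5 * n


def images_per_second(n, clock_hz):
    """Throughput if only the PE/VCSEL/PD clock limits the pipeline."""
    return clock_hz / clock_cycles(n)


print(clock_cycles(3))                # 8x8 demonstrator image: 15
print(images_per_second(10, 100e6))   # 1024x1024 images at 100 MHz: 2000000.0
```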
III. Conclusion and further work
III.1 Summary
- We have successfully tested the OCULAR-II multistage architecture with reconfigurable optical interconnections by implementing quad-tree compression of binary images (an example of an embedded hypercube).
- The optically addressed SLM-based interconnection module accounts for the strongest bandwidth limitation (about a hundred hertz).
- Electronic feedback through the host computer generates parasitic signals and synchronization problems!
- Alignment is not difficult, but may become a critical issue in “true” multistage architectures...
III.2 Further work: OCULAR-III
[ Research underway ]
- Alignment issues (between 2D arrays): dynamic alignment using actuators and control theory; pre-aligned connectors using fiber bundles.
- Design of an integrated (VLSI) optoelectronic layer (with switching…)
[ Future research directions ]
- Test of these “modular” architectures for building computing and networking MINs.
- Design of all-optical networks using the above paradigm.
[Figure: processor arrays and network interconnection modules joined by fiber bundles]
http://www.k2.t.u-tokyo.ac.jp/index-e.html
Concurrent multistage paradigm using