Updating Computer Science Education
Jacques Cohen
Brandeis University
Waltham, MA, USA
Topics
Preliminary remarks
Present state of affairs and concerns
Objectives of this talk
Trends (hardware, software, networks, others)
Illustrative examples
Present state of affairs and concerns
Huge increase in PC and internet usage.
Decreasing enrollment.
Possible Reasons
Previous high school preparation
Bubble burst (2000) + outsourcing
Widespread usage of computers by lay persons
Interest in interdisciplinary topics (e.g., biology, business, economics)
Public perception about: The Nature of Computer Science
Two main components: Theoretical and Experimental
Mathematics and Engineering
What characterizes CS is the notion of Algorithms
Emphasis on the discrete and logic
An interdisciplinary approach with other sciences may well revive the interest in the continuous (or use of qualitative reasoning)
Related fields
Sciences in general (scientific computing)
Management
Psychology (human interaction)
Business
Communications
Journalism
The role of Computer Science among other sciences
(How we are perceived by the other sciences)
In physics, chemistry, biology, nature is the ultimate umpire. Discovery is paramount
In math and engineering: aesthetics, ease of use, acceptance, permanence,
Uneasy dialogue with biologists
It is not unusual to hear from a physicist, chemist or biologist: “If computer scientists do not get involved in our field, we will do it ourselves!!”
It looks very likely that the biological sciences (including, of course, neuroscience) will dominate the 21st century
Differences in approaches
Most scientific and creative discoveries proceed in a bottom-up manner
Computer scientists are taught to emphasize top-down approaches
Polya’s “How to Solve It” often mentions: first specialize, then generalize.
Objectives
Provide a bird’s eye view of what is happening in CS education (USA) and attempt to make recommendations about possible directions. Hopefully, some of it will be applicable to European universities.
Premise
Changes ought to be gradual and depend on resources and time
First we have to observe current trends
Generality, Storage, Speed, Networks, others.
Trying to make sense of present directions.
Difficult and risky to foresee the future, e.g., PC (windows, mouse), internet, parallelism
Topics influencing computer science education.
Trends in hardware, software, networks.
Huge volume of data (terabytes and petabytes)
Statistical nature of data
Clustering, classification
Probability and Statistics become increasingly important
Trend towards generality
Need to know more about what is going on in related topics
A few examples:
Robotics and mechanical engineering
Hardware, electrical engineering, material science, nanotechnology
Multi-field visualization (e.g., medicine)
Nature of data structures
Sequences (strings), streams
Trees, DAGs, and Graphs
3D structures
Emphasis on discrete structures
Neglect of the continuous should be corrected (e.g., use of Matlab)
Trends on data growth
How Much Information Is There In the World?
The 20-terabyte size of the Library of Congress is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.
From Lesk
http://www.lesk.com/mlesk/ksg97/ksg.html
Library of Congress data (cont)
1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.
This makes the total size of the Library perhaps about
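The estimates quoted from Lesk can be added up directly. The short Python sketch below does only that arithmetic, using the figures as stated on these slides (including the 20 TB for printed books) and decimal units, which is an assumption made here for simplicity.

```python
# Size estimates quoted from Lesk on the slides, in terabytes (TB).
estimates_tb = {
    "printed books (20 million x 1 MB)": 20,
    "photographs (13 million x 1 MB)": 13,
    "maps (4 million, scanned)": 200,
    "movies (500,000 x 1 GB)": 500,
    "sound recordings (3.5 million audio CDs)": 2000,
}

total_tb = sum(estimates_tb.values())
total_pb = total_tb / 1000  # assumption: decimal units, 1 PB = 1000 TB

print(f"total: {total_tb} TB, about {total_pb:.1f} PB")
# prints "total: 2733 TB, about 2.7 PB"
```

On these figures the sound recordings alone account for roughly three quarters of the total, and printed text for less than one percent.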
How Much Information Is There In the World?
Lesk’s Conclusions
There will be enough disk space and tape storage in the world to store everything people write, say, perform or photograph.
For writing this is true already; for the others it is only a year or two away.
Lesk’s Conclusions (cont)
The challenge for librarians and computer scientists is to let us find the information we want in other people's work; and the challenge for the lawyers and economists is to arrange the payment structures so that we are encouraged to use the
The huge volume of data implies:
Linearity of algorithms is a must
Emphasis on pattern matching
Increased preprocessing
Different levels of memory transfer rates
Algorithmic incrementality (avoid redoing tasks)
Need for approximate algorithms (optimization)
Distributed computing
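As one illustration of linearity and incrementality, the sketch below keeps a running mean and variance over a stream of numbers using Welford's method, so each new item costs constant work and nothing already processed is recomputed. The class name `RunningStats` is hypothetical and chosen only for this example; it is not taken from the talk.

```python
class RunningStats:
    """Mean and population variance maintained incrementally (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def add(self, x):
        """Fold one new value into the statistics in O(1) time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far (0.0 if none)."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)          # one pass, no stored history
print(stats.mean, stats.variance)
```

The whole stream is read once and only three numbers are kept in memory, which is the kind of behavior that terabyte-scale data demands.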
The importance of pattern matching (searches) in a large number of items
Pattern matching has to be “tolerant” (approximate)
Find closest matches (dynamic programming, optimization)
Sequences
Pictures
3D structures (e.g. proteins)
Sound
Photos
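For sequences, "tolerant" matching by dynamic programming can be sketched as follows. This is a minimal illustration, assuming unit costs for substitution, insertion and deletion; it finds the substring of a text that is closest in edit distance to a pattern (the classical approximate string matching recurrence, where a match may start anywhere in the text). The function name is hypothetical.

```python
def best_approximate_match(pattern, text):
    """Return (distance, end): the smallest edit distance between `pattern`
    and any substring of `text`, and the exclusive index where that best
    substring ends."""
    m = len(pattern)
    # prev[i] = cost of matching pattern[:i] against a substring ending
    # at the previous text position; with no text read yet, that is i deletions.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(prev[i - 1] + cost,  # match or substitute
                          prev[i] + 1,         # extra character in the text
                          curr[i - 1] + 1)     # pattern character skipped
        if curr[m] < best[0]:
            best = (curr[m], j)
        prev = curr
    return best


distance, end = best_approximate_match("GATTACA", "CCGATTTACAGG")
print(distance)  # prints 1: the text contains GATTTACA, one extra T
```

The running time is proportional to the product of the two lengths, which is acceptable for short patterns but motivates the faster heuristics discussed later in the talk.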
Trends in computer cycles (speed)
Moore’s law appears to be applicable until at
Use of supercomputers (2006)
Researchers at Los Alamos National Laboratory have set a new world's record by performing the first million-atom computer simulation in biology. Using the "Q Machine" supercomputer, Los Alamos computer scientists have created a molecular simulation of the cell's protein-making structure, the ribosome. The project, simulating 2.64 million atoms in motion, is more than six times larger than any biological simulations performed to date.
Graphical visualization of the simulation of a ribosome at work
Network transmission speed (Lambda Rail Net)
Trends in Transmission Speed
The High Energy Physics team's demonstration achieved a peak throughput of 151 Gbps and an official mark of 131.6 Gbps, beating their previous mark for peak throughput of 101 Gbps by 50 percent.
Trends in Transmission Speed II
The new record data transfer speed is also equivalent to serving 10,000 MPEG2 HDTV movies simultaneously in real time, or transmitting all of the printed content of the Library of Congress in 10
Trend in Languages
Importance of scripting and string processing
XML, Java, C++; trend towards Python, Matlab, Mathematica
No ideal languages
No agreement on what the first
A recently proposed language (Fortress 2006)
Micro-Fortress Language (Sun, Guy Steele)
Meta-level approach to teaching
Learn 2 or 3 languages and assume that expertise in other languages can be acquired on the fly.
Hopefully, the same will occur in learning a topic in depth. Once in-depth research is taught using a particular area it can be extrapolated to other areas.
Increasing usage of canned programs or data banks
Typical examples: GraphViz, WordNet
Trends in Algorithmic Complexity
Overcoming the scare of NP problems (it happened before with undecidability)
3-SAT lessons
Mapping polynomial problems within NP
Optimization, approximate or random algorithms
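One reason NP-complete problems are less frightening in practice is that simple randomized local search often finds satisfying assignments for modest instances. The sketch below is a basic random-walk procedure for CNF formulas in the spirit of the random-walk algorithms of Papadimitriou and Schöning; it is not a method taken from the talk. Clauses are encoded as tuples of signed integers (an assumption borrowed from the common DIMACS convention), and the function names are hypothetical.

```python
import random


def satisfied(clause, assignment):
    """A clause holds if at least one literal agrees with the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def random_walk_sat(clauses, num_vars, max_flips=10_000, seed=0):
    """Try to satisfy a CNF formula by repeatedly flipping a random variable
    taken from a random unsatisfied clause. Returns a dict variable -> bool,
    or None if no satisfying assignment was found within max_flips."""
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not satisfied(c, assignment)]
        if not unsatisfied:
            return assignment
        var = abs(rng.choice(rng.choice(unsatisfied)))
        assignment[var] = not assignment[var]
    # Final check after the last flip.
    if all(satisfied(c, assignment) for c in clauses):
        return assignment
    return None


# (x1 or x2 or x3) and (not x1 or x2 or not x3) and ...
formula = [(1, 2, 3), (-1, 2, -3), (1, -2, 3), (-1, -2, -3)]
print(random_walk_sat(formula, num_vars=3))
```

A `None` result does not prove unsatisfiability; the procedure trades completeness for speed, which is the kind of compromise the slide alludes to.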
Three Examples
Example I
The lessons of BLAST (preprocessing, incrementality, approximation)
Example II
The importance of analyzing very large networks. (probability, sensors, sociological implications)
Example III
Time Series.
Example I (History of BLAST)
sequence alignment
Biologists matched sequences of nucleotides or amino acids empirically using Dot Matrices
Dot matrices
No exact matching
Alignment with Gaps
Dynamic Programming Approach
Dynamic Programming complexity O(n^2)
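The dynamic programming approach to alignment with gaps fills a table with one cell per pair of positions, which is where the quadratic cost for two sequences of similar length comes from. The sketch below computes a global alignment score in the Needleman-Wunsch style; the scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration, not parameters from the talk.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b.
    Time is proportional to len(a) * len(b); only two rows are kept."""
    # Row for the empty prefix of a: aligning b[:j] against nothing costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(alignment_score("ACGT", "AGT"))  # prints 2: three matches and one gap
```

Keeping only two rows yields the score in linear space; recovering the alignment itself would need the full table or a divide-and-conquer refinement.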
Two solutions with gaps
Complexity can be exponential
The BLAST approach
complexity is almost linear
Equivalent Dot Matrices would have the size 3 billion columns (human genome) and Z rows where Z is the size of the sequence being matched against a genome (
BLAST Tricks
Preprocessing
Compile the locations in a genome containing all possible “seeds” (combinations of 6 nucleotides or amino acids)
Hacking
Follow diagonals as much as possible (BLAST strategy)
Use dynamic programming as a last resort
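The two tricks on this slide, a precompiled seed index and extension along diagonals, can be sketched in a few lines. This is a simplified illustration, not the actual BLAST implementation: the extension here stops at the first mismatch, whereas BLAST tolerates mismatches using a scoring scheme. All function names are hypothetical.

```python
from collections import defaultdict


def build_seed_index(genome, k):
    """Preprocessing: map every length-k word to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def extend_along_diagonal(genome, query, g, q, k):
    """Grow an exact seed hit left and right while the characters keep matching.
    Returns (genome_start, query_start, length)."""
    start_g, start_q = g, q
    while start_g > 0 and start_q > 0 and genome[start_g - 1] == query[start_q - 1]:
        start_g -= 1
        start_q -= 1
    end_g, end_q = g + k, q + k
    while end_g < len(genome) and end_q < len(query) and genome[end_g] == query[end_q]:
        end_g += 1
        end_q += 1
    return start_g, start_q, end_g - start_g


def seed_and_extend(genome, query, k=6, index=None):
    """Look up each query word in the index and extend every hit.
    Returns the distinct extended hits, longest first."""
    if index is None:
        index = build_seed_index(genome, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.add(extend_along_diagonal(genome, query, g, q, k))
    return sorted(hits, key=lambda hit: -hit[2])


genome = "TTTTACGTACGGATCCTTTT"
query = "GGACGTACGGATCCAA"
print(seed_and_extend(genome, query))  # prints [(4, 2, 12)]
```

The index is built once per genome and reused for every query, so the per-query work depends mainly on the query length and the number of seed hits rather than on the size of the full dot matrix.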
Lots of approximations but a very successful outcome
No multiple solutions
BLAST may not find best matches
The notion of p-values becomes very important (probability of matches in random sequences)
Tuning of the BLAST algorithm parameters
Mixture of hacking and theory
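The idea of a p-value as "probability of matches in random sequences" can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden toy: it scores a match by the longest exact run shared by two sequences and estimates how often shuffled versions of the target do at least as well. BLAST itself relies on an analytical theory (Karlin-Altschul statistics) rather than on shuffling, and the function names here are hypothetical.

```python
import random


def longest_common_run(a, b):
    """Length of the longest substring shared by a and b (quadratic-time DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def empirical_p_value(query, target, trials=1000, seed=0):
    """Estimate the chance that a random sequence with the same letters as
    `target` matches `query` at least as well as `target` does.
    Returns (observed_score, estimated_p_value)."""
    rng = random.Random(seed)
    observed = longest_common_run(query, target)
    letters = list(target)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if longest_common_run(query, "".join(letters)) >= observed:
            at_least_as_good += 1
    # Add-one estimate so the reported p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (trials + 1)


score, p = empirical_p_value("GATTACA", "CCGATTACAGG", trials=200)
print(score, p)
```

A small estimated value suggests the match is unlikely to arise by chance; the resolution of the estimate is limited by the number of trials.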
Example II (Networks and Sociology)
Money travels (bills)
Probabilities P(time, distance)
Money travels
The entire process could be implemented using sensors.
Mimics spread of disease.
The impact of computing will go deeper into the sciences and spread more into the social sciences (Jon Kleinberg, 2006)
Example III (Time Series)
Illustrates data mining and how much CS can help other sciences
Slides from Dr Eamonn Keogh, University of California, Riverside, CA
Examples of time series
Time Series (cont 1)
Time Series (cont 2)
Time Series (cont 3)
Time Series (cont 4)
Time Series (cont 5)
Using Logic Programming in Multivariate Time Series (Sleep Apnea)
from G. Guimarães and L. Moniz Pereira
[Figure: multivariate sleep apnea recording over roughly four hours (time axis 0:00 to 4:00). Airflow states: no airflow without snoring; strong airflow with snoring; tacet. Ribcage and abdominal movement states: none without snoring; reduced without snoring; strong; tacet. Detected events: Event2, Event3, Event5, tacet.]
Back to curricula recommendations
Present status (USA) and suggested changes
Current recommended curricula
ACM, SIGCSE 2001 (USA)
1. Discrete Structures (43 core hours)
2. Programming Fundamentals (54 core hours)
3. Algorithms and Complexity (31 core hours)
4. Programming Languages (6 core hours)
5. Architecture and Organization (36 core hours)
6. Operating Systems (18 core hours)
7. Net-Centric Computing (15 core hours)
8. Human-Computer Interaction (6 core hours)
9. Graphics and Visual Computing (5 core hours)
10. Intelligent Systems (10 core hours)
11. Information Management (10 core hours)
12. Software Engineering (30 core hours)
13. Social and Professional Issues (16 core hours)
14. Computational Science (no core hours)
From
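To see how the core hours in this list are distributed, the figures can be totaled and turned into shares. The sketch below uses exactly the hours listed on the slide; the variable names are illustrative only.

```python
# Core hours per knowledge area, as listed on the slide.
core_hours = {
    "Discrete Structures": 43,
    "Programming Fundamentals": 54,
    "Algorithms and Complexity": 31,
    "Programming Languages": 6,
    "Architecture and Organization": 36,
    "Operating Systems": 18,
    "Net-Centric Computing": 15,
    "Human-Computer Interaction": 6,
    "Graphics and Visual Computing": 5,
    "Intelligent Systems": 10,
    "Information Management": 10,
    "Software Engineering": 30,
    "Social and Professional Issues": 16,
    "Computational Science": 0,
}

total_hours = sum(core_hours.values())
largest_area = max(core_hours, key=core_hours.get)

print(f"total core hours: {total_hours}")  # prints "total core hours: 280"
for area, hours in sorted(core_hours.items(), key=lambda item: -item[1]):
    print(f"{area:32s} {hours:3d}  {100 * hours / total_hours:5.1f}%")
```

On these figures, the four largest areas (programming fundamentals, discrete structures, architecture, and algorithms) make up well over half of the core, while computational science has no core hours at all.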
Changing Curricula
Two extremes
Increased Generality and Limited Depth
Limited Generality and Increased Depth
The two extremes in graphical form
[Figure: breadth (generality) versus depth]
The MIT pilot program for freshmen
At MIT there is a unified EECS department
Two choices for the first year course:
Robotics using probabilistic Bayesian approaches (CS)
Concrete suggestions I
Teaching is inextricably linked to research.
Time and resources govern curriculum changes.
Gradual changes are essential.
Avoid overlap of material among different required courses.
If possible introduce an elective course on Current trends in computer science.
Concrete suggestions II
When teaching algorithms stress the potential of:
Preprocessing
Incrementality
Parallelization
Approximations
Concrete suggestions III
Emphasize probability and statistics
Bayesian approaches
Hidden Markov Models
Random algorithms
Clustering and classification
Machine learning and Data
Finally, …
Encourage interdisciplinary work.
It will inspire new directions in computer science.
Thank you!!
Future of Computer Intensive Science in the U.S.
(Daniel Reed 2006)
Ten years – a geological epoch on the computing time scale.
Looking back, a decade brought the web and consumer email, digital cameras and music, broadband networking, multifunction cell phones, WiFi, HDTV, telematics, multiplayer games, electronic commerce and computational science.
It also brought spam, phishing, identity theft, software insecurity, outsourcing and globalization, information warfare and blurred work-life boundaries. What will a decade of technology advances bring in communications and collaboration, sensors and knowledge management, modeling and discovery, electronic commerce and digital entertainment, critical infrastructure management and security?
What will it mean for research and education?
Daniel A. Reed is the director of the Renaissance Computing Institute. He also is Chancellor's Eminent Professor and Vice-Chancellor for Information Technology at the University of North Carolina at Chapel Hill.
Cyberinfrastructure and Economic Curvature: Creating Curvature in a Flat World
(Sangtae Kim, Purdue, 2006)
Cyberinfrastructure is central to scientific advancement in the modern, data-intensive research environment. For example, the recent revolution in the life sciences, including the seminal achievement of sequencing the human genome on an accelerated time frame, was made possible by parallel advances in cyberinfrastructure for research in this data-intensive field.
But beyond the enablement of basic research, cyberinfrastructure is a driver for global economic growth despite the disruptive 'flattening' effect of IT in the developed economies. But even at the regional level, visionary cyber investments to create smart infrastructures will induce 'economic curvature', a gravitational pull to overcome the dispersive effects of the 'flat' world and the consequential acceleration in economic growth.
Miscellaneous I
Claytronics
Game theory (economics - psychology)
Other examples in bioinformatics
Beautiful interaction between sequence (strings) and structures
Reverse engineering
In biology Geography and Phenotype (external structural appearance) are of paramount importance
Miscellaneous II
Crossword puzzle using Google