The Importance of Software Engineering in Data Science 95
Chapter 11: Eithon Cadag, Principal Data Scientist at Ayasdi. Bridging the Chasm: From Bioinformatics to Data Science 102
Chapter 12: George Roumeliotis, Senior Data Scientist at Intuit.
DJ PATIL
You’re not putting in your time because of some mythical ten thousand hours thing (I don’t buy that argument at all; I think it’s false because it assumes linear serial learning rather than parallelized learning that accelerates). It’s hard to figure out how you can get involved in these things; they make it intentionally closed off.
HILARY MASON 18
PETE SKOMOROCH
MIKE DEWAR
Riley Newman paid his way through college at the University of Washington by being part of the US Coast Guard. After graduating with degrees in economics and international studies, Riley pursued graduate studies in the UK at the University of Cambridge, before he was called back to the US by the Coast Guard.
RILEY NEWMAN 50
What was your background before you began the Open Source Data Science Masters and before your role at Mattermark? Overcoming these challenges, Clare completed her Open Source Data Science Masters and found herself as a data scientist at Mattermark, a venture-backed data startup working with large datasets to help professional investors quantify and discover signals of potentially high-growth companies.
57CLARE CORTHELL
If you have a core set of competencies and understand how to “debug” problems and learn what you need to solve them, you can do damage. You can always talk through a problem with someone else, even if they’re not an expert.
You can start to discern the technical capacities that are required of you, mathematical foundations that are necessary, and so forth. There’s something magical about how much more communicative spoken English can be, especially when you can rewind and digest a concept for the second, third, or even fourth time.
The ability to evolve my own career with a self-designed curriculum begins to outline the immense cracks in the foundation of higher education*. It’s important to understand the behavior of the market and institutions with regard to your career.
If you want to get to the next level, wherever your next level may be, it’s possible to pave your own road that leads you there. Your data science Venn diagram has been widely shared and has really helped many people get an initial sense of what data science is. After a stint at IA Ventures as their Data Scientist in Residence, Drew joined Project Florida as Head of Data, where he uses data science to give individuals better insights into their health.
Instead, once you have a result, it’s about figuring out how to explain that result to people who are not necessarily technical or who are either making business decisions or making engineering decisions. It’s something that receives the least amount of thought, but turns out to be one of the most important things once you’re doing this in the wild. Even the people who have had success in data science up to this point have just been naturally good at it, whether they were blogging about it or giving good presentations.
DREW CONWAY
Or we recognize the problem but it’s not obvious how to find the relevant data that goes with it. This is a long-winded answer, but it’s much easier if you’re only thinking about minimizing error. It’s very early days; people still don’t know what we’re talking about when we say data science exactly so there’s a lot of opportunity.
Therefore it’s no surprise to me that the data science community is growing very, very quickly. People are moving here to do this work because in a sense it’s always been here, it’s just now that people. Likewise, if I wanted to go to a meetup in San Jose but I worked in the Mission district, it’s a pain in the ass.
KEVIN NOVAK 77
Basically, we don’t know that much about dark matter, but we can guess at things that it could possibly do. If it decays, the dark matter particle gets a kick, and it goes off in a random direction at a random speed. Galaxies are sitting at the bottom of a gravity well; they’re like bread crumbs in a big bowl of dark matter.
If the dark matter were spontaneously decaying and getting lots of extra energy, it could popcorn out, and totally change the profiles of galaxies in an essential way. We would look through the Hubble Space Telescope at the youngest galaxies in the universe and notice that they were not at all like the galaxies today. So, one of the questions was: Does that mesh well with our ideas of how our universe
Chris Moody started off his journey towards data science by peering off into distant galaxies, studying computational astrophysics at UC Santa Cruz as a graduate student.
CHRIS MOODY 85
I worked at the Stanford Linear Accelerator and the NASA Jet Propulsion Lab doing basic research in materials science and systems engineering. I spent two years at startups called Quid and Newsle and then joined Facebook four months ago as a software engineer with a bent towards machine learning and data science. It sounds like you have this background in mathematics and you mentioned that you wanted to do more machine learning.
Could you talk a little bit about how you picked data science versus the other things you could’ve done in industry? You go and interview for quant jobs and you
Erich sits at the intersection of data science and engineering, a role derived from his unique experiences across academia, quantitative analysis and software engineering. From Quid, he moved on to Facebook, where he currently works as a data-centric software engineer, combining his deep theoretical understanding of mathematics with good software engineering skills.
ERICH OWENS 96
I double degreed at the University of Washington in Business and Informatics; the latter is a specialized degree that focuses on data architecture and how people interact with data and information. I originally thought of Computer Science as a major, but took a few classes and realized I didn’t want to be coding all the time. At the time, the Lab’s stated focus was on ubiquitous computing; this means embedding computing into your environment or finding ways of using computing that are well integrated with the environment.
PlaceLab, the second project, asked “Can we utilize WiFi devices to triangulate position and provide location-specific information to a user?” Have you ever heard of
After dual degrees at the University of Washington followed by a PhD in Biomedical Informatics, Eithon came to data science through a focus on machine learning applied to biology. At the time of this interview, Eithon was a manager and principal data scientist at topological machine learning company Ayasdi, where he led analytical efforts in the healthcare and pharmaceutical space. In this interview, Eithon talks about his personal journey to data science mastery, and the joy of being insatiably curious.
EITHON CADAG 103
GEORGE ROUMELIOTIS
I did my undergrad in computer science and became interested in biological problems, so I transitioned to bioinformatics and did a PhD in (Computational) Genetics at Stanford. I took these out of interest and because I thought they would be applicable to what I was doing: sequencing, and essentially going through terabytes of DNA sequences to make sense of them. I took these courses because I thought they would be relevant to my research, but what I didn’t realize through the entire process was that I was basically doing data science in biology.
After graduating, she began a PhD in genetics at Stanford University, where she also dabbled in courses in computer science and machine learning. Diane’s background in genetics naturally prepared her to work with large volumes of data, leading her to realize that the work she engaged in at Stanford naturally belonged in the realm of data science. After graduating, Diane became part of the Insight Data Science Fellowship program where, as a Fellowship project, she built a recipe search site that used clustering to organize recipes by ingredients.
DIANE WU 124
JACE KOHLMEIER
JOE BLITZSTEIN
I felt like there were a lot of business analysts and middle managers in the enterprise world who were not familiar with “data science” as a practice and set of techniques. These folks still lived in a world of “business intelligence” or “business analytics” from a decade ago, and I wanted to bring them up to speed on current methods (for example, ensemble AI models built on transactional data, data mining in graphs, forecasting with error bounds). I wanted to get these enterprise readers up to speed, so I needed to find a language and teaching approach that they’d understand.
A lot of data science books that were being introduced at the time required learning both R and techniques at the same time. Instead, I wanted to write a book that introduced these concepts step-by-step with a tool the reader knew, and then, once they got it, slowly push them into a programming mode. He is also the author of the book “Data Smart,” which presents an overview of machine learning techniques, as explained through spreadsheets.
JOHN FOREMAN 152
During my junior year, I took exams for AP Political Science and Comparative Government without actually taking the classes. I did well on those exams, so I ended up doing the same for Art History, Economics, and Physics during my senior year. I was just completely enthralled with the beauty of mathematics, the same way a person would appreciate a beautiful painting or work of art.
Fascinated with the beauty of calculus at an early age, Josh Wills majored in pure math at Duke. His first introduction to statistics was in the final year of university, where, despite some misgivings about it being not nearly as interesting as hyperbolic partial differential equations, he actually fell in love with the discipline. Josh Wills is currently the Senior Director of Data Science at Cloudera, where, according to him, he “makes data into awesome.”
JOSH WILLS 170
BRADLEY VOYTEK
I am the CEO and Data Scientist of ttwick Inc, a data analytics startup with roots on Wall Street. I also obtained a specialization in systems analysis and programming from another Venezuelan institution, but it was the combination of engineering and programming that put me in the right frame of mind and gave me the skills to eventually evolve into a data scientist. Back in 1990, it was not easy to get enough data to analyze, and I used to spend a lot of hours in the computer lab with my newly issued email address and Internet access.
Luis trained as a civil engineer in Venezuela before arriving in the US for his MBA on a Fulbright in the early 90s. Though he aspired to join the World Bank, Luis found an alternative application of his data skills in the world of finance. He is the founder of and data scientist at ttwick, a search engine for social media content.
LUIS SANCHEZ 192