The Data Science Handbook

The Importance of Software Engineering in Data Science
Chapter 11: Eithon Cadag, Principal Data Scientist at Ayasdi
Bridging the Chasm: From Bioinformatics to Data Science
Chapter 12: George Roumeliotis, Senior Data Scientist at Intuit

DJ PATIL

You’re not putting in your time because of some mythical ten-thousand-hours thing (I don’t buy that argument at all; I think it’s false because it assumes linear, serial learning rather than parallelized learning that accelerates). It’s hard to figure out how you can get involved in these things; they make it intentionally closed off.

HILARY MASON

PETE SKOMOROCH

MIKE DEWAR

Riley Newman paid his way through college at the University of Washington by being part of the US Coast Guard. After graduating with degrees in economics and international studies, Riley pursued graduate studies in the UK at the University of Cambridge, before he was called back to the US by the Coast Guard.

RILEY NEWMAN

What was your background before you began the Open Source Data Science Masters and before your role at Mattermark?

Overcoming these challenges, Clare completed her Open Source Data Science Masters and found herself working as a data scientist at Mattermark, a venture-backed data startup that works with large datasets to help professional investors quantify and discover signals of potentially high-growth companies.

CLARE CORTHELL

If you have a core set of competencies and understand how to “debug” problems and learn what you need to solve them, you can do damage. You can always talk through a problem with someone else, even if they’re not an expert.

You can start to discern the technical capacities that are required of you, mathematical foundations that are necessary, and so forth. There’s something magical about how much more communicative spoken English can be, especially when you can rewind and digest a concept for the second, third, or even fourth time.

The ability to evolve my own career with a self-designed curriculum begins to outline the immense cracks in the foundation of higher education. It’s important to understand the behavior of the market and institutions with regard to your career.

If you want to get to the next level, wherever your next level may be, it’s possible to pave your own road that leads you there.

Your data science Venn diagram has been widely shared and has really helped many people get an initial sense of what data science is.

After a stint at IA Ventures as their Data Scientist in Residence, Drew joined Project Florida as Head of Data, where he uses data science to give individuals better insights into their health.

Instead, once you have a result, it’s about figuring out how to explain that result to people who are not necessarily technical, or who are making business or engineering decisions. It’s something that receives the least amount of thought, but it turns out to be one of the most important things once you’re doing this in the wild. Even the people who have had success in data science up to this point have simply been naturally good at it, whether they were blogging about it or giving good presentations.

DREW CONWAY

Or we recognize the problem, but it’s not obvious how to find the relevant data that goes with it. This is a long-winded answer, but it’s much easier if you’re only thinking about minimizing error. It’s very early days; people still don’t know exactly what we’re talking about when we say data science, so there’s a lot of opportunity.

Therefore, it’s no surprise to me that the data science community is growing very, very quickly. People are moving here to do this work because, in a sense, it’s always been here, it’s just now that people. Likewise, if I wanted to go to a meetup in San Jose but I worked in the Mission District, it would be a pain in the ass.

KEVIN NOVAK

Basically, we don’t know that much about dark matter, but we can guess at things that it could possibly do. If it decays, the dark matter particle gets a kick, and it goes off in a random direction at a random speed. Galaxies are sitting at the bottom of a gravity well; they’re like bread crumbs in a big bowl of dark matter.

If the dark matter were spontaneously decaying and getting lots of extra energy, it could popcorn out and totally change the profiles of galaxies in an essential way. We would look through the Hubble Space Telescope at the youngest galaxies in the universe and notice that they were not at all like the galaxies today. So, one of the questions was: Does that mesh well with our ideas of how our universe

Chris Moody started off his journey towards data science by peering off into distant galaxies, studying computational astrophysics at UC Santa Cruz as a graduate student.

CHRIS MOODY

I worked at the Stanford Linear Accelerator and the NASA Jet Propulsion Lab doing basic research in materials science and systems engineering. I spent two years at startups called Quid and Newsle and then joined Facebook four months ago as a software engineer with a bent towards machine learning and data science.

It sounds like you have this background in mathematics, and you mentioned that you wanted to do more machine learning.

Could you talk a little bit about how you picked data science versus the other things you could’ve done in industry? You go and interview for quant jobs and you

Erich sits at the intersection of data science and engineering, a role derived from his unique experiences across academia, quantitative analysis, and software engineering. From Quid, he moved on to Facebook, where he currently works as a data-centric software engineer, combining his deep theoretical understanding of mathematics with good software engineering skills.

ERICH OWENS

I double-degreed at the University of Washington in Business and Informatics; the latter is a specialized degree that focuses on data architecture and how people interact with data and information. I originally considered Computer Science as a major, but after taking a few classes I realized I didn’t want to be coding all the time. At the time, the Lab’s stated focus was on ubiquitous computing: embedding computing into your environment, or finding ways to use computing that are well integrated with the environment.

PlaceLab, the second project, asked: “Can we utilize WiFi devices to triangulate position and provide location-specific information to a user?” Have you ever heard of

After dual degrees at the University of Washington followed by a PhD in Biomedical Informatics, Eithon came to data science through a focus on machine learning applied to biology. At the time of this interview, Eithon was a manager and principal data scientist at Ayasdi, a topological machine learning company, where he led analytical efforts in the healthcare and pharmaceutical space. In this interview, Eithon talks about his personal journey to data science mastery and the joy of being insatiably curious.

EITHON CADAG

GEORGE ROUMELIOTIS

I did my undergrad in computer science and became interested in biological problems, so I transitioned to bioinformatics and did a PhD in (Computational) Genetics at Stanford. I took these out of interest and because I thought they would be applicable to what I was doing: sequencing, and essentially going through terabytes of DNA sequences to make sense of them. I took these courses because I thought they would be relevant to my research, but what I didn’t realize throughout the entire process was that I was basically doing data science in biology.

After graduating, she began a PhD in genetics at Stanford University, where she also dabbled in courses in computer science and machine learning. Diane’s background in genetics prepared her for working with large volumes of data, leading her to realize that the work she engaged in at Stanford naturally belonged in the realm of data science. After graduating, Diane joined the Insight Data Science Fellowship program where, as her Fellowship project, she built a recipe search site that used clustering to organize recipes by ingredients.
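
The interview does not say which clustering method the recipe site used. As a hedged illustration only, the sketch below assumes each recipe is reduced to a set of ingredient names and groups recipes by single-linkage clustering on Jaccard similarity (the share of ingredients two recipes have in common). The function names `jaccard` and `cluster_recipes`, the 0.3 threshold, and the sample recipes are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of ingredients two recipes have in common: len(a & b) / len(a | b)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_recipes(recipes: dict[str, set[str]], threshold: float = 0.3) -> list[list[str]]:
    """Hypothetical single-linkage clustering of recipes by ingredient overlap.

    Any two recipes whose Jaccard similarity is at least `threshold` are
    merged into the same cluster; merges are transitive and tracked with a
    union-find (disjoint-set) structure.
    """
    parent = {name: name for name in recipes}

    def find(x: str) -> str:
        # Walk up to the cluster's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair of recipes and merge the similar ones.
    for a, b in combinations(recipes, 2):
        if jaccard(recipes[a], recipes[b]) >= threshold:
            parent[find(a)] = find(b)

    # Group recipe names by their root; sort for stable, readable output.
    clusters: dict[str, list[str]] = {}
    for name in recipes:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical sample data: recipe name -> set of ingredients.
    sample = {
        "pancakes": {"flour", "egg", "milk", "sugar"},
        "crepes": {"flour", "egg", "milk", "butter"},
        "guacamole": {"avocado", "lime", "onion", "salt"},
        "salsa": {"tomato", "lime", "onion", "salt"},
    }
    print(cluster_recipes(sample))
    # → [['crepes', 'pancakes'], ['guacamole', 'salsa']]
```

Single linkage is used here because it needs no preset number of clusters; a real recipe corpus would likely need ingredient normalization and a tuned threshold, or a different method such as k-means on ingredient vectors.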

DIANE WU

JACE KOHLMEIER

JOE BLITZSTEIN

I felt like there were a lot of business analysts and middle managers in the enterprise world who were not familiar with “data science” as a practice and set of techniques. These folks still lived in a world of “business intelligence” or “business analytics” from a decade ago, and I wanted to bring them up to speed on current methods (for example, ensemble AI models built on transactional data, data mining in graphs, forecasting with error bounds). I wanted to get these enterprise readers up to speed, so I needed to find a language and teaching approach that they’d understand.

A lot of data science books being introduced at the time required learning both R and the techniques at the same time. Instead, I wanted to write a book that introduced these concepts step by step with a tool the reader already knew, and then, once they got it, slowly push them into a programming mode.

He is also the author of the book “Data Smart,” which presents an overview of machine learning techniques as explained through spreadsheets.

JOHN FOREMAN

During my junior year, I took exams for AP Political Science and Comparative Government without actually taking the classes. I did well on those exams, so I ended up doing the same for Art History, Economics, and Physics during my senior year. I was just completely enthralled with the beauty of mathematics, the same way a person would appreciate a beautiful painting or work of art.

Fascinated by the beauty of calculus at an early age, Josh Wills majored in pure math at Duke. His first introduction to statistics came in his final year of university, where, despite some misgivings that it would not be nearly as interesting as hyperbolic partial differential equations, he fell in love with the discipline. Josh Wills is currently the Senior Director of Data Science at Cloudera, where, according to him, he “makes data into awesome.”

JOSH WILLS

BRADLEY VOYTEK

I am the CEO and Data Scientist of ttwick Inc, a data analytics startup with roots on Wall Street. I also obtained a specialization in systems analysis and programming from another Venezuelan institution, but it was the combination of engineering and programming that put me in the right frame of mind and gave me the skills to eventually evolve into a data scientist. Back in 1990, it was not easy to get enough data to analyze, and I used to spend a lot of hours in the computer lab with my newly issued email address and Internet access.

Luis trained as a civil engineer in Venezuela before arriving in the US for his MBA on a Fulbright in the early 90s. Though he aspired to join the World Bank, Luis found an alternative application of his data skills in the world of finance. He is the founder of and data scientist at ttwick, a search engine for social media content.

LUIS SANCHEZ
