Test-Driven Infrastructure with Chef, Second Edition by Stephen Nelson-Smith
Copyright © 2014 Atalanta Systems LTD. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editors: Mike Loukides and Meghan Blanchette
Production Editor: Melanie Yarbrough
Proofreader: Elise Morrison
Indexer: WordCo Indexing Services
Cover Designer: Randy Comer
Interior Designer: David Futato
Illustrator: Rebecca Demarest
October 2013: Second Edition
Revision History for the Second Edition:
2013-10-10: First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449372200 for release details.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Test-Driven Infrastructure with Chef, the cover image of an edible-nest swiftlet, and related trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
ISBN: 978-1-449-37220-0
Table of Contents
Preface
1. The Philosophy of Test-Driven Infrastructure
Underpinning Philosophy
Infrastructure as Code
The Origins of Infrastructure as Code
The Principles of Infrastructure as Code
The Risks of Infrastructure as Code
Professionalism
2. An Introduction to Ruby
What Is Ruby?
Grammar and Vocabulary
Methods and Objects
Identifiers
More About Methods
Classes
Arrays
Conditional Logic
Hashes
Truthiness
Operators
Bundler
3. An Introduction to Chef
Exercise 1: Install Chef
Objectives
Directions
Discussion
Exercise 2: Install a User
Objectives
Directions
Worked Example
Discussion
Exercise 3: Install an IRC Client
Objectives
Directions
Worked Example
Discussion
Exercise 4: Install Git
Objectives
Directions
Worked Example
Discussion
4. Using Chef with Tools
Exercise 1: Ruby
Objectives
Directions
Worked Example
Discussion
Exercise 2: Virtualbox
Objectives
Directions
Worked Example
Discussion
Exercise 3: Vagrant
Objectives
Directions
Worked Example
Discussion
Conclusion
5. An Introduction to Test- and Behavior-Driven Development
The Principles of TDD and BDD
A Very Brief History of Agile Software Development
Test-Driven Development
Behavior-Driven Development
TDD and BDD with Ruby
RSpec: The Transition to BDD
Cucumber: Acceptance Testing for the Masses
6. A Test-Driven Infrastructure Framework
Test-Driven Infrastructure: A Conceptual Framework
Test-Driven Infrastructure Should Be Mainstream
Test-Driven Infrastructure Should Be Automated
Test-Driven Infrastructure Should Be Side-Effect Aware
Test-Driven Infrastructure Should Be Continuously Integrated
Test-Driven Infrastructure Should Be Outside In
Test-Driven Infrastructure Should Be Test-First
The Pillars of Test-Driven Infrastructure
Writing Tests
Running Tests
Provisioning Machines
Feedback of Results
7. Test-Driven Infrastructure: A Recommended Toolchain
Tool Selection
Unit Testing
Integration Testing
Acceptance Testing
Testing Workflow
Supporting Tools: Berkshelf
Overview
Getting Started
Example
Advantages and Disadvantages
Summary and Conclusion
Supporting Tools: Test Kitchen
Overview
Getting Started
Summary and Conclusion
Acceptance Testing: Cucumber and Leibniz
Overview
Getting Started
Example
Advantages and Disadvantages
Summary and Conclusion
Integration Testing: Test Kitchen with Serverspec and Bats
Introducing Bats
Templates
Integration Testing: Minitest Handler
Overview
Getting Started
Example
Advantages and Disadvantages
Summary and Conclusion
Unit Testing: Chefspec
Overview
Getting Started
Example
Advantages and Disadvantages
Summary and Conclusion
Static Analysis and Linting Tools
Overview
Getting Started
Example
Advantages and Disadvantages
Summary and Conclusion
To Conclude
8. Epilogue
A. Bibliography
Preface
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This icon signifies a tip, suggestion, or general note.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/test-driven-infra-chef.
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Acknowledgments
Writing the first edition of this book was an order of magnitude harder than I could ever have imagined. I think this is largely because alongside writing a book I was also writing software. Trying to do both things concurrently took up vast quantities of time, for which many people are owed a debt of gratitude for their patience and support. Writing the second edition, however, made the first one look like a walk in the park. Since the first edition there’s been a huge explosion in philosophies, technologies and enthusiastic participants in the field of TDI, all of which and whom are moving and developing fast. This has not only added massively to the amount there is to say on the subject but it has made it a real challenge to keep the book up to date.
So the gratitude is bigger than before too! Firstly, to my wonderful family, Helena, Corin, Wilfrid, Atalanta and Melisande (all of whom appear in the text)—you’ve been amazing, and I look forward to seeing you all a lot more. Helena, frankly, deserves to be credited as a co-author. She has proofed, edited, improved, and corrected for the best part of two years, and has devoted immeasurable hours to supporting me, both practically and emotionally. There is no way this book could have been written without her input—I cannot express how lucky I am to have her as my friend, colleague, and beloved. The list of Opscoders to thank is also longer, and is testament to the success of both the company and its product. My understanding would be naught were it not for the early support of Joshua Timberman, Seth Chisamore and Dan DeLeo. However, the second edition owes also a debt of thanks to Seth Vargo, Charles Johnson, Nathen Harvey, and Sean O’Meara. Further thanks to Chris Brown, on whose team I worked as an engineer for six months, giving me a deeper insight into the workings of Chef, and the depths of brilliance in the engineering team.
Inspirational friends, critics, reviewers and sounding boards include Aaron Peterson, Bryan Berry, Ian Chilton, Matthias Lafeldt, Torben Knerr and John Arundel. Special mention must go firstly to Lindsay Holmwood, who first got me thinking about the subject, and has continued to offer advice and companionship, and secondly to Fletcher Nichol, who has been a constant friend and advisor, and has endured countless hours of being subjected to pairing with me in Emacs and Tmux, on Solaris! It must also not be forgotten that without the early support of Trang and Brian, formerly of Sony Computer Entertainment Europe—the earliest adopters and enthusiastic advocates of my whole way of doing Infrastructure as Code—I doubt I would have achieved what I have achieved.
I’ve been fortunate beyond measure to work with a team of intelligent and understanding people at Atalanta Systems—all of whom have put up with my book-obsessed scattiness for the best part of two years—Kostya, Sergey, Yaroslav, Mike, Herman, and Annie…you’re all awesome!
Lastly, and perhaps most importantly—to my incredibly patient and supportive editor Meghan Blanchette—thank you a million times. I think you’ll agree it was worth the wait.
CHAPTER 1
The Philosophy of Test-Driven Infrastructure
When the first edition of this book was published in late summer 2011, there was broad skepticism in response to the idea of testing infrastructure code and only a handful of pioneers and practitioners.
Less than a year later at the inaugural #ChefConf, the Chef user conference, two of the plenary sessions and a four-hour hack session were devoted to testing. Later that year at the Chef Developer Summit, where people meet to discuss the state and direction of the Chef open source project, code testing and lifecycle practices and techniques emerged as top themes that featured in many heavily attended sessions—including one with nearly 100 core community members.
Infrastructure testing is a hugely topical subject now, with many excellent contributors furthering the state of the art. The tools and approaches that make up the infrastructure testing ecosystem have evolved significantly. It’s an area with a high rate of change and few established best practices, and it is easy to be overwhelmed at the amount to learn and bewildered at the range of tools available. This book is intended to be the companion for those new to the whole idea of infrastructure as code, as well as those who have been working within that paradigm and are now looking to fully embrace the need to prioritize testing.
Underpinning Philosophy
There are two fundamental philosophical points upon which this book is predicated:
1. Infrastructure can and should be treated as code.
2. Infrastructure developers should adhere to the same principles of professionalism as other software developers.
While there are a number of implications that follow from these assumptions, the primary one with which this book is concerned is that all infrastructure code must be thoroughly tested, and that the most effective way to develop infrastructure code is test-first, allowing the writing of the tests to drive and inform the development of the infrastructure code. However, before we get ahead of ourselves, let us consider our two axiomatic statements.
Infrastructure as Code
“When deploying and administering large infrastructures, it is still common to think in terms of individual machines rather than view an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures.”
— Steve Traugott and Joel Huddleston
“In today’s computer industry, we still typically install and maintain computers the way the automotive industry built cars in the early 1900s. An individual craftsman manually manipulates a machine into being, and manually maintains it afterwards.
The automotive industry discovered first mass production, then mass customization using standard tooling. The systems administration industry has a long way to go, but is getting there.”
— Steve Traugott and Joel Huddleston
These two statements came from the prophetic www.infrastructures.org at the very start of the last decade. More than 10 years later, a whole world of exciting developments has taken place: developments that have sparked a revolution, and given birth to a radical new approach to the process of designing, building, and maintaining the underlying IT systems that make web operations possible. At the heart of that revolution is a mentality and toolset that treats infrastructure as code.
The Origins of Infrastructure as Code
Infrastructure as code is an interesting phenomenon, particularly for anyone wanting to understand the evolution of ideas. It emerged over the last six or seven years in response to the juxtaposition of two pieces of disruptive technology—utility computing and second-generation web frameworks.
The ready availability of effectively infinite compute power at the touch of a button, combined with the emergence of a new generation of hugely productive web frameworks, brought into existence a new world of scaling problems that had previously only been witnessed by the largest systems integrators. The key year was 2006, which saw the launch of Amazon Web Services’ Elastic Compute Cloud (EC2), just a few months after the release of version 1.0 of Ruby on Rails the previous Christmas. This convergence meant that anyone with an idea for a dynamic website—an idea that delivered functionality or simply amusement to a rapidly growing Internet community—could go from a scribble on the back of a beermat to a household name within weeks.
Suddenly, very small developer-led companies found themselves facing issues that were previously tackled almost exclusively by large organizations with huge budgets, big teams, enterprise-class configuration management tools, and lots of time. The people responsible for these websites that had become huge almost overnight now had to answer questions such as how to scale databases, how to add many identical machines of a given type, and how to monitor and back up critical systems. Radically small teams needed to be able to manage infrastructures at scale and to compete in the same space as big enterprises, but with none of the big enterprise systems.
It was out of this environment that a new breed of configuration management tools emerged. Building on the shoulders of existing open source tools like CFEngine, Puppet was created in part to facilitate tackling these new problems.
Given the significance of 2006 in terms of the disruptive technologies we describe, it’s no coincidence that in early 2006 Luke Kanies published an article on “Next-Generation Configuration Management” in ;login: (the USENIX magazine), describing his Ruby-based system management tool, Puppet. Puppet provided a high-level domain-specific language (DSL) with primitive programmability, but the development of Chef (a tool influenced by Puppet, and released in January 2009) brought the power of a third-generation programming language to system administration. Such tools equipped tiny teams and developers with the kind of automation and control that until then had only been available to the big players and expensive in-house or proprietary software. Furthermore, being built on open source tools and released early to developer communities allowed these tools to evolve rapidly according to demand, and they swiftly became more powerful and less cumbersome than their commercial counterparts.
infrastructure with software best practices. We work with this code using the same tools as we would with any other modern software project. The code that models, builds, and manages the infrastructure is committed into source code management alongside the application code. We can then start to think about our infrastructure as redeployable from a code base, in which we are using the same kinds of software development methodologies that have developed over the last 20 years as the business of writing and delivering software has matured.
This approach brings with it a series of benefits that help the small, developer-led company solve some of the scalability and management problems that accompany rapid and overwhelming commercial success:
Repeatability
Because we’re building systems in a high-level programming language and committing our code, we start to become more confident that our systems are ordered and repeatable. With the same input, the same code should produce the same output. This means we can now be confident (and ensure on a regular basis) that what we believe will recreate our environment really will do that.
Automation
By utilizing mature tools for deploying applications, which are written in modern programming languages, the very act of abstracting out infrastructures brings us the benefits of automation.
Agility
The discipline of source code management and version control means we have the ability to roll forward or backward to a known state. Because we can redeploy entire systems, we are able to drastically reconfigure or change topology with ease, responding to defects and business-driven changes. In the event of a problem, we can go to the commit logs and identify what changed and who changed it. This is made all the easier because our infrastructure code is just text, and as such can be examined and compared using standard file comparison tools, such as diff.
Scalability
Repeatability and automation make it possible to grow our server fleet easily, especially when combined with the kind of rapid hardware provisioning that the cloud provides. Modular code design and reuse manage complexity as our applications grow in features, type, and quantity.
Reassurance
understanding of how the system hangs together. That is risky—this person is now able to hold the organization ransom, and should they leave or become ill, the company is endangered.
Disaster recovery
In the event of a catastrophe that wipes out the production systems, if our entire infrastructure has been broken down into modular components and described as code, recovery is as simple as provisioning new compute power, restoring from backup, and redeploying the infrastructure and application code. What may have been a business-ending event in the old paradigm of custom-built, partially automated infrastructure becomes a manageable outage with procedures we can test in advance.
Infrastructure as code is a powerful concept and approach that promises to help repair the split-brain phenomenon witnessed so frequently in organizations where developers and system administrators view each other as enemies, to the detriment of the common good. Through co-design of the infrastructure code that runs an application, we give operational responsibilities to developers. By focusing on design and the software lifecycle, we liberate system administrators to think at higher levels of abstraction. These new aspects of our professions help us succeed in building robust, scaled architectures. We open up a new way of working—a new way of cooperating—that is fundamental to the emerging DevOps movement.
The Principles of Infrastructure as Code
Having explored the origins and rationale for managing infrastructure as code, we now turn to the core principles we should put into practice to make it happen.
Adam Jacob, co-founder of Opscode and creator of Chef, says that there are two high-level steps:
1. Break the infrastructure down into independent, reusable, network-accessible services.
2. Integrate these services in such a way as to produce the functionality our infrastructure requires.
Adam further identifies 10 principles that describe what the characteristics of the reusable primitive components look like. His essay—Chapter 5 of Web Operations, ed. John Allspaw & Jesse Robbins (O’Reilly)—is essential reading, but I will summarize his principles here:
Modularity
Cooperation
Our design should discourage overlap of services and should encourage other people and services to use our service in a way that fosters continuous improvement of our design and implementation.
Composability
Our services should be like building blocks—we should be able to build complete, complex systems by integrating them.
Extensibility
Our services should be easy to modify, enhance, and improve in response to new demands.
Flexibility
We should build our services using tools that provide unlimited power to ensure we have the (theoretical) ability to solve even the most complicated problems.
Repeatability
With the same inputs, our services should produce the same results in the same way every time.
Declaration
We should specify our services in terms of what we want to do, not how we want to do it.
Abstraction
We should not worry about the details of the implementation, and think at the level of the component and its function.
Idempotence
Our services should be configured only when required; action should be taken only once.
Convergence
Our services should take responsibility for their own state being in line with policy; over time, the overall system will tend to correctness.
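The last three ideas in this list (declaration, idempotence, and convergence) are easier to see in code than in prose. What follows is a minimal sketch in plain Ruby, not Chef itself: the FileLine class, its converge method, and the sample configuration lines are all hypothetical names invented for illustration. The policy is declared as "this line should be present," and the object only acts when the current state differs from that policy.

```ruby
# Hypothetical illustration only: FileLine is not a Chef resource.
# It models a declared policy ("this line should be present") against
# an in-memory array standing in for a configuration file.
class FileLine
  attr_reader :actions_taken

  def initialize(lines, wanted)
    @lines = lines          # current state of the pretend file
    @wanted = wanted        # declared policy: what we want, not how
    @actions_taken = 0
  end

  # Compare state with policy and act only when they differ.
  # Returns true if a change was made, false if nothing was needed.
  def converge
    return false if @lines.include?(@wanted)

    @lines << @wanted
    @actions_taken += 1
    true
  end
end

config = ["PermitRootLogin no"]
resource = FileLine.new(config, "PasswordAuthentication no")

first  = resource.converge   # state differs from policy, so a change is made
second = resource.converge   # state already matches, so nothing happens

# Simulate drift: someone removes the line by hand.
config.delete("PasswordAuthentication no")
third = resource.converge    # the next run brings state back in line with policy
```

Running converge repeatedly is safe because the second run makes no change, and a later run repairs manual drift. Under the assumptions above, that is the sense in which the overall system tends to correctness over time.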
In practice, these principles should apply to every stage of the infrastructure development process—from low-level operations such as provisioning (cloud-based providers with a published API are a good example), backups, and DNS, up through high-level functions such as the process of writing the code that abstracts and implements the services we require.
This book concentrates on the task of writing infrastructure code that meets these principles in a predictable and reliable fashion. The key enabler in this context is a powerful, declarative configuration management system that enables engineers (I like the term
behavior, and characteristics of the infrastructure that they are designing, and when actually executed, results in that infrastructure coming to life.
The Risks of Infrastructure as Code
Although the potential benefits of infrastructure as code are hard to overstate, it must be pointed out that this approach is not without its dangers. Production infrastructures that handle high-traffic websites are hugely complicated. Consider, for example, the mix of technologies involved in a large content management system installation. We might easily have multiple caching strategies, a full-text indexer, a sharded database, and a load-balanced set of web servers. That is a significant number of moving parts for the infrastructure developer to manage and understand.
It should come as no surprise that the attempt to codify complex infrastructures is a challenging task. As I visit clients embracing the approaches outlined in this chapter, I see similar problems emerging as they start to put these ideas into practice:
• Sprawling masses of infrastructure code
• Duplication, contradiction, and a lack of clear understanding of what it all does
• Fear of change; a sense that we dare not meddle with the manifests or recipes because we’re not entirely certain how the system will behave
• Bespoke software that started off well-engineered and thoroughly tested, but is now littered with TODOs, FIXMEs, and quick hacks
• Despite the lofty goal of capturing the expertise required to understand an infrastructure in the code itself, a sense that the organization would be in trouble if one or two key people leave
• War stories of times when a seemingly trivial change in one corner of the system had catastrophic side effects elsewhere
There are six areas where we need to focus our attention to ensure that our infrastructure code is developed with the same degree of thoroughness and professionalism as our application code:
Design
Our infrastructure code should seek to be simple and iterative, and we should avoid feature creep.
Collective ownership
All members of the team should be involved in the design and writing of infrastructure code and, wherever possible, code should be written in pairs.
Code review
The team should be set up to pair frequently and to see regular notifications when changes are made.
Code standards
Infrastructure code should follow the same community standards as the Ruby world; when standards and patterns have grown up around the configuration management framework, the standards and patterns should be adhered to.
Refactoring
This should happen at the point of need as part of the iterative and collaborative process of developing infrastructure code; however, it’s difficult to do this without a safety net in the form of thorough test coverage of one’s code.
Testing
Systems should be in place to ensure that one’s code produces the environment needed and that any changes have not caused side effects that alter other aspects of the infrastructure.
I would argue that good practice in all six of these areas is a natural by-product of bringing development best practices to infrastructure code—in particular by embracing the idea of test-first programming. Good leadership can lead to rapid progress in the first five areas with very little investment in new technology. However, it is indisputable that the final area—that of testing infrastructure automation—is a difficult endeavor. As such, it is the subject of this book: a manifesto for bravely rethinking how we develop infrastructure code.
Professionalism
embarking upon or moving into a career involving infrastructure development absorb the hard lessons learned by the rest of the software industry over the previous few decades, avoid repeating these mistakes, and hold themselves accountable to the same level of professionalism.
Robert C. Martin, in Clean Code: A Handbook of Agile Software Craftsmanship (Prentice Hall), draws upon the Hippocratic oath as a metaphor for the standards of professionalism demanded within the software development industry: Primum non nocere—first do no harm. This is the foundational ethical principle that all medical students learn. The essence is that the cost of action must be considered. It may be wiser to take no action or not to take a specified action in the interests of not harming the patient. The analogy holds for the software developer. Before intervening to add a feature or to fix a bug, be confident that you aren’t making things worse. Robert C. Martin suggests that the kinds of harm a software developer can inflict can be classified as functional and structural.
By functional harm, we mean the introduction of bugs into the system. A software professional should strive to release bug-free software. This is a difficult goal for developer and medical practitioner alike; granted that software (and humans) are highly complicated systems, as professionals we must make it our mantra to “do no harm.” We won’t ever be able to eradicate mistakes, but we can accept responsibility for them, and we can ensure we learn from them and put mechanisms in place to avoid repeating them.
By structural harm we mean introducing inflexibility into our systems, making software harder to change. To put the concept positively, it must be possible to make changes without the cost of change being exorbitantly high.
I like this analogy. I think it can also be taken a little further. Of all medical professionals, the one I would most want to be certain was observing the Hippocratic oath would be a brain surgeon. The cost of error is almost infinitely higher when operating upon the brain than when, for example, operating on a minor organ, or performing orthopedic surgery. I think this applies to the subject of this book, too.
As infrastructure developers, the software we have written builds and runs the entire infrastructure on which our production systems, the applications, and ultimately the business, operate. The cost of a bug, or of introducing structural inflexibility to the underpinning infrastructure on which our business runs, is potentially even greater than that of a bug in the application code itself. An error in the infrastructure could lead to the entire system becoming compromised or could result in an outage rendering all dependent systems unavailable.
The only way we can be confident that our code works is to test it. Thoroughly. Test it under various conditions. Test the happy path, the sad path, and the bad path. The happy path represents the default scenario, in which there are no exceptional or error conditions. The sad path shows that things fail when they should. The bad path shows the system when fed absolute rubbish. In the case of infrastructure code, we want to verify that changes made for one platform don’t cause unexpected side effects on other platforms. The more we test, the more confident we are.
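To make the three paths concrete, here is a small sketch in plain Ruby. The parse_port and outcome helpers are hypothetical, invented purely for this illustration; they are not part of Chef or of any testing tool discussed later in the book. The idea is simply that one piece of validation logic gets exercised with a normal value, a value that should be refused, and complete nonsense.

```ruby
# Hypothetical helper, not part of Chef: validates a port setting
# before it would be written into a service configuration.
def parse_port(value)
  port = Integer(value)   # raises ArgumentError or TypeError on rubbish
  raise ArgumentError, "port out of range: #{port}" unless (1..65_535).cover?(port)

  port
end

# Wraps parse_port so each path yields a value we can inspect
# instead of an unhandled exception.
def outcome(value)
  parse_port(value)
rescue ArgumentError, TypeError => e
  "rejected (#{e.class})"
end

happy = outcome("8080")    # happy path: valid input, no error conditions
sad   = outcome("70000")   # sad path: a number, but one that should fail
bad   = outcome(nil)       # bad path: absolute rubbish

puts "happy: #{happy}"
puts "sad:   #{sad}"
puts "bad:   #{bad}"
```

In a real cookbook the same three questions would be asked of a recipe or a helper library rather than a toy function, but the habit is the same: decide in advance what should succeed, what should fail, and what should be refused outright.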
When it comes to protecting and guaranteeing the flexibility of our code, there’s one easy way to be confident of code flexibility. Flex it. We want our code to be easy to change. To be confident that it is easy to change, we need to make easy changes. If those easy changes prove to be difficult, we need to change the way the code works. We must be committed to regular refactoring and regular small improvements across the team. This might seem to be at odds with the principle of doing no harm. Surely the more changes we make, the more risk we are taking on. Paradoxically, this isn’t actually the case. It is far, far riskier to leave the code to stagnate with little or no attention.
As infrastructure developers, if we’re afraid to make changes to our code, that’s a big red flag. The biggest reason people are afraid to make changes is that they aren’t confi‐ dent that the code won’t break. That’s because they don’t have a test harness to protect them and catch the breaks. I like to think of refactoring as a little like walking along a curbstone. When you have six inches to fall, you won’t have any fear at all. If you had to walk along a beam, four inches in width, stretching between two thirty story buildings, I bet you’d be scared. You might be so scared that you wouldn’t even set out. The same is so with refactoring. When you have a fully tested code base, making changes is done with confidence and zeal. When you have no tests at all, making changes is avoided or undertaken with fear and dread.
The trouble is, testing takes time. Lots of testing takes lots of time. In the world of infrastructure code, testing takes even more time because sometimes the feedback loops are significantly longer than traditional test scenarios. This makes it imperative that we automate our testing. Testing, especially for complicated, disparate systems, is also dif‐ ficult. Writing good tests for code is hard to do. That makes it imperative for us to write code that is easy to test. The best way to do that is to write the tests first. We’ll discuss this in more depth later, but the essential and applicable takeaway is that consistent, automated, and quality testing of infrastructure code is mandatory for the DevOps professional.
At this stage it’s important to acknowledge and address an obvious objection. As infrastructure developers we are asked to make a call with respect to a risk/time ratio. If it delays a release by three weeks, but delivers 100% test coverage, is this the right approach, given our maxim “do no harm”?
right direction to be making the decision consciously. Consider what part of the “brain” we are about to cut in to, what functions it performs for the body corporeal or corporate, as it were, and where we draw our line will become clear.
I’ll summarize by making a bold philosophical statement that underpins the rest of this book:
Testing our infrastructure code, thoroughly and repeatably, is non-negotiable, and is an essential component of the infrastructure developer’s work.
CHAPTER 2
An Introduction to Ruby
Before we go any further, I’m going to spend a little time giving you a quick overview of the basics of the Ruby programming language. If you’re an expert, or even a novice Ruby developer, do feel free to skip this section. However, if you’ve never used Ruby, or rarely programmed at all, this should be a helpful introduction. The objective of this section is to make you feel comfortable looking at infrastructure code. The framework we’re focusing our attention on in this book, Chef, is both written in Ruby and fundamentally is Ruby. Don’t let that scare you; you really only need to know a few things to get started. I’ll also point you to some good resources to take your learning further. Later in the book, we’ll be doing more Ruby, but I will explain pretty much anything that isn’t explicitly covered in this section. Also, remember we were all once in the beginners’ seat. One of the great things about the Chef community is the extent to which it’s supportive and helpful. If you get stuck, hop onto IRC and ask for help.
What Is Ruby?
Let’s start right at the very beginning. What is Ruby? To quote from the very first Ruby book I ever read, the delightfully eccentric Why The Lucky Stiff’s (poignant) Guide to Ruby:
My conscience won’t let me call Ruby a computer language. That would imply that the language works primarily on the computer’s terms. That the language is designed to accommodate the computer, first and foremost. That therefore, we, the coders, are foreigners, seeking citizenship in the computer’s locale. It’s the computer’s language and we are translators for the world.
We can no longer truthfully call it a computer language. It is coderspeak. It is the language of our thoughts.
— http://bit.ly/1fieouZ
So, Ruby is a very powerful, very friendly language. If you like comparisons, I like to think of Ruby as being a kind of hybrid between LISP, Smalltalk, and Perl. I’ll explain why a bit later. You might already be familiar with a programming language—Perl, or Python, or perhaps C or Java. Maybe even BASIC or Pascal. As an important aside, if you consider yourself to be a system administrator, and don’t know any programming languages, let me reassure you—you already know heaps of languages. Chances are you’ll recognize this:
divert(-1)
divert(0)
VERSIONID(`@(#)sendmail.mc 8.7 (Linux) 3/5/96')
OSTYPE(`linux')
#
# Include support for the local and smtp mail transport protocols.
MAILER(`local')
MAILER(`smtp')
#
FEATURE(rbl)
FEATURE(access_db)
# end
Or possibly this:
Listen 80
<VirtualHost *:80>
    DocumentRoot /www/example1
    ServerName www.example.com
    # Other directives here
</VirtualHost>
<VirtualHost *:80>
    DocumentRoot /www/example2
    ServerName www.example.org
    # Other directives here
</VirtualHost>
What about this?
LOGGER=/usr/bin/logger
DUMP=/sbin/dump
# FSL="/dev/aacd0s1a /dev/aacd0s1g"
FSL="/usr /var"
NOW=$(date +"%a")
mk_auto_dump(){
    local fs=$1
    local level=$2
    local tape="$TAPE"
    local opts=""
    opts="-${level}uanL -f ${tape}"
    # run backup
    $DUMP ${opts} $fs
    if [ "$?" != "0" ];then
        $LOGGER "$DUMP $fs FAILED!"
        echo "*** DUMP COMMAND FAILED - $DUMP ${opts} $fs. ***"
    else
        $LOGGER "$DUMP $fs DONE!"
    fi
}
Or finally, this:
CC=g++
CFLAGS=-c -Wall
LDFLAGS=
SOURCES=main.cpp hello.cpp factorial.cpp
OBJECTS=$(SOURCES:.cpp=.o)
EXECUTABLE=hello

all: $(SOURCES) $(EXECUTABLE)

$(EXECUTABLE): $(OBJECTS)
	$(CC) $(LDFLAGS) $(OBJECTS) -o $@

.cpp.o:
	$(CC) $(CFLAGS) $< -o $@
If you’re anything like me, you’ll know what all four of these are right away. You might know exactly what they do. They almost certainly don’t scare you; you will recognize some of it, and you’d know where to go to find out more. My aim is to get you to the same point with Ruby. The thing is, a sysadmin knows a ton of languages; they just mostly suck quite badly. Thankfully, Ruby doesn’t suck at all—Ruby is awesome—it’s easy to use and highly capable.
Grammar and Vocabulary
All languages have grammar and vocabulary. Let’s cover the basic vocabulary and grammar of Ruby. One of the best ways to learn a language is to have a play about in a REPL, or read-eval-print loop.
The idea of a REPL originated in the world of LISP. Its implementation simply required that three functions be created and enclosed in an infinite loop function. Permit me some hand-waving, as this hides much deep complexity, but at the simplest level the three functions are:
read
Accept an expression from the user, parse it, and store it as a data structure in memory.
eval
Ingest the data structure and evaluate it. This translates to calling the function from the initial expression on each of the arguments provided.
print
Display the result of the evaluation.
We can actually write a Ruby REPL in one line of code:
$ ruby -e 'loop { p eval gets }'
1+1
2
puts "Hello"
Hello
nil
5.times { print 'Simple REPL' }
Simple REPLSimple REPLSimple REPLSimple REPLSimple REPL5
The first thing to note is that every expression has a return value, without exception. The result of the expression 1+1 is 2. The result of the expression puts "Hello" is not "Hello". The result of the expression is nil, which is Ruby’s way of expressing nothingness. I’m going to dive in right now, and set your expectations. Unlike null in languages such as Java or C, nil is not a special case that sits outside the normal rules. It’s just the same as everything else. In Ruby terms, it’s an object (more on this in a moment). For now, every expression has a return value, and in a REPL, we will always see this.
The functions in our basic REPL should be familiar: we have a loop, we have an eval, the p function prints the output of the eval, and gets reads from the keyboard. Obviously this is a ridiculously primitive REPL, and very brittle and unforgiving:
$ ruby -e 'loop { p eval gets }'
forgive me!
-e:1:in `eval': undefined method `me!' for main:Object (NoMethodError)
    from -e:1:in `eval'
    from -e:1:in `block in <main>'
    from -e:1:in `loop'
    from -e:1:in `<main>'
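One way to make the one-liner less brittle is to rescue errors instead of letting them end the session. The following is a minimal sketch, not a description of how irb itself works. The method names evaluate_line and forgiving_repl are invented for illustration, and the sketch reads from a StringIO so that it can run without a keyboard. It uses def, which is covered later in this chapter.

```ruby
require 'stringio'

# Evaluate one line of Ruby and return a printable version of the result.
# Errors are rescued so that a typo does not end the whole session.
# SyntaxError is listed explicitly because it is not a StandardError.
def evaluate_line(line)
  eval(line).inspect
rescue StandardError, SyntaxError => e
  "#{e.class}: #{e.message}"
end

# Read lines from input until it runs out, printing each result to output.
def forgiving_repl(input, output)
  while (line = input.gets)
    output.puts evaluate_line(line)
  end
end

# A canned session: one valid expression and one mistake.
session    = StringIO.new("1+1\nforgive me!\n")
transcript = StringIO.new
forgiving_repl(session, transcript)
puts transcript.string
```

Passing $stdin and $stdout in place of the two StringIO objects would give an interactive session. One limitation of this sketch is that local variables assigned on one line are not visible on the next, because each call to evaluate_line starts with a fresh set of locals.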
Thankfully Ruby ships with a REPL: Interactive Ruby, or irb. The Ruby REPL is launched by typing irb in a command shell. It also takes the handy command switch --simple-prompt:
$ irb
irb(main):001:0> exit
$ irb --simple-prompt
>>
Go ahead and try talking to irb:
>> hello
NameError: undefined local variable or method `hello' for main:Object
    from (irb):1
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
Methods and Objects
irb isn’t very communicative. I’d like to draw your attention to two words in the preceding output: method and Object. This introduces one of the most important things to understand about Ruby. Ruby is a pure object-oriented language. Object-oriented languages encourage the idea of modeling the world by designing programs around classes, such as Strings or Files, together with classes we define ourselves. These classes and class hierarchies reflect important general properties of individual nails, horseshoes, horses, kingdoms, or whatever else comes up naturally in the application we’re designing. We create instances of these classes, which we call objects, and work with them.
In object-oriented programming we think in terms of sending and receiving messages between objects. When these instances receive the messages, they need to know what to do with them. The definition of what to do with a message is called a method. I mentioned that Ruby is like Smalltalk; Smalltalk epitomizes this model. Smalltalk allows the programmer to send a message sqrt to an object 2 (called a receiver in Smalltalk), which is a member of the Integer class. To handle the message, Smalltalk finds the appropriate method to compute the required answer for receivers belonging to the Integer class. It produces the answer 1.41421, which is an instance of the Float class. Smalltalk is a 100% pure object-oriented language: absolutely everything in Smalltalk is an object, and every object can send and receive messages. Ruby is almost identical.
We can call methods in Ruby using “dot” syntax—e.g., some_object.my_method. In Ruby everything (pretty much everything) is an object. As such everything (literally everything) has methods, even nil. In Java or C, NULL holds a value to which no valid pointer will ever refer. That means that if you want to check if an object is nil, you compare it with NULL. Not so in Ruby! Let’s check in irb:
>> nil.nil?
=> true
>> nil.class
=> NilClass
OK, and what about NilClass?
>> NilClass.class
=> Class
Go ahead and try a few others—a number or a string (strings are encased in single or double quotes):
>> 37.class
=> Fixnum
>> "Thirty Seven".class
=> String
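The transcripts in this chapter were captured on Ruby 1.9.3. From Ruby 2.4 onward, Fixnum and Bignum were merged into Integer, so on a current interpreter 37.class reports Integer rather than Fixnum. As a small sketch, here are a handful of objects being asked for their class:

```ruby
# Every object can be asked for its class, including nil and classes themselves.
puts nil.class             # NilClass
puts 37.class              # Integer on Ruby 2.4 or later, Fixnum on 1.9.3
puts 3.7.class             # Float
puts "Thirty Seven".class  # String
puts :thirty_seven.class   # Symbol
puts [1, 2, 3].class       # Array
puts NilClass.class        # Class

# is_a? also considers parent classes, so this holds on old and new Rubies alike.
puts 37.is_a?(Integer)     # true
```

If you are following along on a newer Ruby, read Integer wherever the transcripts in this section say Fixnum.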
In our case, hello isn’t anything. We haven’t assigned it to anything, and it isn’t a keyword in the language, so Ruby doesn’t know what to do. Let’s start, then, with something that Ruby does know about: numbers. Have a go at using Ruby to establish the following:
1. What is 42 multiplied by 412?
2. How many hours are there in a week?
3. If I have 7 students, and they wrote 17,891 lines of code, how many did they write each, on average?
>> 42 * 412
=> 17304
>> 24 * 7
=> 168
>> 17891/7
=> 2555
Hang on, that last number doesn’t look right! What’s going on here? Let’s look at the classes of our numbers:
>> 17891.class
=> Fixnum
>> 7.class
=> Fixnum
Ruby only does integer division with Fixnum objects:
>> 2/3
=> 0
Thankfully Fixnum objects have methods to convert them to Floats, which means we can do floating point maths:
>> 2.to_f/3
=> 0.6666666666666666
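To make the difference concrete, here is a short sketch that works the students question three ways: integer division, the remainder, and floating-point division. The variable names are chosen purely for illustration.

```ruby
total_lines = 17891
students    = 7

puts total_lines / students      # 2555: integer division discards the remainder
puts total_lines % students      # 6: the remainder that was discarded

# Converting one operand to a Float makes the whole division floating point.
average = total_lines.to_f / students
puts average.round(2)            # 2555.86
```

Only one side of the division needs converting; Ruby promotes the other operand for us.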
Let’s try some algebra:
>> hours_per_day = 24
=> 24
>> days_per_week = 7
=> 7
>> hours_per_week = hours_per_day * days_per_week
=> 168
This introduces assignment and variables. Assignment is an operation that binds a local variable (on the left) to an object (on the right). We can see that hours_per_week is now an instance of class Fixnum:
>> hours_per_week.class
=> Fixnum
A variable is a placeholder. And it varies, hence the name:
>> drink = "Rooibos"
=> "Rooibos"
>> puts "Stephen likes " + drink
Stephen likes Rooibos
=> nil
>> drink = "Beetroot Juice"
=> "Beetroot Juice"
>> puts "Stephen likes " + drink
Stephen likes Beetroot Juice
=> nil
Identifiers
A variable is an example of a Ruby identifier. Wikipedia describes an identifier as follows:
An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique class of objects, where the “object” or class may be an idea, physical [countable] object (or class thereof), or physical [noncountable] substance (or class thereof).
There are four main kinds of identifiers in Ruby:
1. Variables
2. Constants
3. Keywords
4. Method names
Variables
Looking first at variables, there are actually four types of variables that you’ll encounter in Ruby:
1. Local variables
2. Instance variables
3. Class variables
4. Global variables
You’ll mostly interact with the first two. Local variables begin with a lowercase letter, or an underscore. They may contain only letters, underscores, and/or digits:
>> valid_variable = 9
=> 9
>> bogus-variable - 8
NameError: undefined local variable or method `bogus' for main:Object
    from (irb):34
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
>> number9 = "ok"
=> "ok"
>> 9numbers = "not ok"
SyntaxError: (irb):36: syntax error, unexpected tIDENTIFIER, expecting $end
9numbers = "not ok"
^
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
Instance variables store information for an individual instance. They always begin with the “@” sign, and then follow the same rules as local variables.
Class variables are more rarely seen. They store information at the level of the class, i.e., further up the hierarchy than an instance of an object. They begin with “@@”. Global variables begin with a “$”, and they don’t follow the same rules as local variables. You won’t need to use these very often. These can have cryptic-looking names such as:
$! # The exception object passed to #raise.
$@ # The stack backtrace generated by the last exception raised.
$& # Depends on $~. The string matched by the last successful match.
$` # Depends on $~. The string to the left of the last successful match.
$' # Depends on $~. The string to the right of the last successful match.
$+ # Depends on $~. The highest group matched by the last successful match.
$1 # Depends on $~. The Nth group of the last successful match. May be > 1.
$~ # The MatchData instance of the last match. Thread and scope local. MAGIC
The preceding global variables are taken from the excellent Ruby quick reference by Ryan Davis (creator and maintainer of Minitest). I recommend you bookmark it, or print it out.
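As a sketch of how the match-related globals behave, the following runs one regular expression match against a made-up version string and prints what each variable holds afterwards. The string and the pattern are examples only.

```ruby
# =~ returns the position of the match, or nil if there was none.
if "chef-11.6.0" =~ /(\d+)\.(\d+)/
  puts $~.class   # MatchData: the object the other variables are derived from
  puts $&         # 11.6   the text that matched
  puts $`         # chef-  everything to the left of the match
  puts $'         # .0     everything to the right of the match
  puts $1         # 11     the first group
  puts $2         # 6      the second group
  puts $+         # 6      the highest-numbered group that matched
end
```

The same information is available from the MatchData object under friendlier names, for example $~.pre_match and $~.post_match.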
On the subject of cryptic symbols, I mentioned that Ruby is akin to Perl. Ruby’s creator, Yukihiro Matsumoto (Matz), describes the history of Ruby in an interview with Bruce Stewart:
which had not released yet, was going to implement OO features, but it was not really what I wanted. I gave up on Perl as an object-oriented scripting language.
Then I came across Python. It was an interpretive, object-oriented language. But I didn’t feel like it was a “scripting” language. In addition, it was a hybrid language of procedural programming and object-oriented programming.
I wanted a scripting language that was more powerful than Perl, and more object-oriented than Python. That’s why I decided to design my own language.
— http://bit.ly/18FHd3p
He adds:
Ruby’s class library is an object-oriented reorganization of Perl functionality—plus some Smalltalk and Lisp stuff. I used too much I guess. I shouldn’t have inherited $_, $&, and the other, ugly style variables.
If you’re familiar with Perl, I commend to you: comparing Ruby and Perl.
Constants
Constants are like variables, only their value is supposed to remain unchanged. In actual fact, this isn’t enforced by Ruby—it just complains if you waver in your constancy:
>> MY_LOVE = "infinite"
=> "infinite"
>> MY_LOVE = "actually, rather unreliable"
(irb):38: warning: already initialized constant MY_LOVE
=> "actually, rather unreliable"
Constants begin with an uppercase letter. Conventionally they may simply be capitalized (Washington), be in all caps (SHOUTING), camelcase (StephenNelsonSmith), or capitalized snakecase (BOA_CONSTRICTOR).
Keywords
Keywords are built-in terms hardcoded into the language. You can find them listed at http://ruby-doc.org/docs/keywords/1.9/. Examples include end, false, unless, super, and break. Trying to use these as variables will result in errors:
>> super = "dooper"
SyntaxError: (irb):1: syntax error, unexpected '='
super = "dooper"
^
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
>> false = "hope"
SyntaxError: (irb):2: Can't assign to false
false = "hope"
^
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
>> unless = 63
^
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
Method names
Method names are the fourth and final kind of identifier. We’ve already seen one of these at play:
>> 7.to_f
=> 7.0
Method names adhere to the same naming constraints as local variables, with a few exceptions. They can end in “?”, “!”, or “=”, and it’s possible to define methods such as “[]” or “<=>”. This might sound like a recipe for confusion, but it’s very much by design. Methods are just part of the furniture; Ruby without methods would be like ice cream… without ice. Or cream.
More About Methods
We discussed the idea of objects and methods at the very start of this section. However, it bears repeating, as the object is the most fundamentally important concept in Ruby. When we send a message to an object, using the dot operator, we’re calling some code that the object has access to. Strings have some nice methods to illustrate this:
>> "STOP SHOUTING".downcase
=> "stop shouting"
>> "speak louder".upcase
=> "SPEAK LOUDER"
The pattern is: OBJECT dot METHOD. To the left of the dot we have the receiver and to the right, the method we’re calling, or the message we’re sending.
Methods can take arguments:
>> "Tennis,Elbow,Foot".split
=> ["Tennis,Elbow,Foot"]
>> "Tennis,Elbow,Foot".split(',')
=> ["Tennis", "Elbow", "Foot"]
The first attempted to split the string on white space but didn’t find any. The second split the string on the comma. The result of each method is an Array—more on arrays shortly.
I mentioned that methods may end in signs such as “?” or “!”. Here’s an example:
>> [1,2,3,4].include? 3
=> true
>> curse = "BOTHERATION!"
=> "BOTHERATION!"
>> curse.downcase
=> "botheration!"
>> curse
=> "BOTHERATION!"
>> curse.downcase!
=> "botheration!"
>> curse
=> "botheration!"
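By convention, a method ending in “?” answers a yes-or-no question, and a method ending in “!” is the more dangerous variant of a method, typically one that modifies the receiver in place. The sketch below contrasts downcase with downcase!. It also shows one surprise worth knowing: downcase! returns nil when there was nothing to change.

```ruby
curse = "BOTHERATION!".dup    # dup gives us a copy that is safe to modify

quiet = curse.downcase        # returns a new string
puts curse                    # BOTHERATION!  (the original is untouched)
puts quiet                    # botheration!

curse.downcase!               # modifies the receiver in place
puts curse                    # botheration!

result = curse.downcase!      # already lowercase, so nothing changes
puts result.inspect           # nil

puts curse.empty?             # false: a "?" method answers true or false
```

Because of that nil, it is usually safer not to rely on the return value of a “!” method, and to use the modified object itself instead.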
One final important idea connected with methods is the idea of method_missing. It is possible for an object to have the special method method_missing. In this case, if the object receives a message for which there is no corresponding method, rather than just throwing away the message and raising an error, Ruby can take the message and redirect it or use it in many powerful ways. Chef uses this functionality extensively to implement the language used to build infrastructure. This is an advanced topic, and I refer you to some of the classic texts, particularly Metaprogramming Ruby (The Pragmatic Programmers), if you wish to learn more.
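To give a flavor of the idea, here is a minimal sketch of an object that treats any message it doesn’t recognize as the name of a setting. The Settings class and the setting names are hypothetical, and this is an illustration only, not how Chef itself is implemented. It uses class and def, both of which are explained later in this chapter.

```ruby
class Settings
  def initialize
    @values = {}
  end

  # Called by Ruby whenever this object receives a message it has no method for.
  def method_missing(name, *args)
    if args.length == 1
      @values[name] = args.first      # e.g. settings.hostname "web01"
    elsif args.empty? && @values.key?(name)
      @values[name]                   # e.g. settings.hostname
    else
      super                           # fall back to the usual NoMethodError
    end
  end

  # Keep respond_to? consistent with what method_missing can answer.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

settings = Settings.new
settings.hostname "web01"
settings.port 8080
puts settings.hostname                # web01
puts settings.port                    # 8080
puts settings.respond_to?(:hostname)  # true

begin
  settings.colour                     # never set, so this still raises
rescue NoMethodError => e
  puts "No such setting: #{e.name}"
end
```

Calling super for anything the object can’t handle matters: without it, every typo would be silently swallowed instead of raising an error.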
We create methods using the def keyword:
>> def shout(something)
>>   puts something.upcase
>> end
=> nil
>> shout('i really like ruby')
I REALLY LIKE RUBY
=> nil
We created a method and specified that it take an argument called “something”. We then called the upcase method on the something. This worked fine, because the argument we passed was a string. Look what happens if we give bogus input:
>> shout(42)
NoMethodError: undefined method `upcase' for 42:Fixnum
    from (irb):7:in `shout'
    from (irb):10
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
>> shout('more', 'than', 'one', 'thing')
ArgumentError: wrong number of arguments (4 for 1)
    from (irb):6:in `shout'
    from (irb):11
    from /opt/rubies/1.9.3-p429/bin/irb:12:in `<main>'
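Both failures can be avoided if we want the method to be more forgiving. As a sketch, the version below (named polite_shout here to keep it distinct from shout) accepts any number of arguments using the splat operator, and converts each one with to_s before upcasing it. It returns the string rather than printing it, and it uses map and a block, which are introduced later in this chapter.

```ruby
# *things gathers however many arguments were passed into a single array.
def polite_shout(*things)
  things.map { |thing| thing.to_s.upcase }.join(" ")
end

puts polite_shout('i really like ruby')             # I REALLY LIKE RUBY
puts polite_shout(42)                               # 42
puts polite_shout('more', 'than', 'one', 'thing')   # MORE THAN ONE THING
```

Whether this leniency is desirable depends on the caller. Sometimes failing loudly on bogus input, as the original shout does, is the better behavior.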
>> self
=> main
>> self.class
=> Object
What’s all this about? We’re basically looking at the top level of Ruby. If we type methods, we see there are some methods available at the top level:
>> methods
=> [:to_s, :public, :private, :include, :context, :conf, :irb_quit, :exit,↵ :quit, :irb_print_working_workspace, :irb_cwws, :irb_pwws, :cwws, :pwws, ↵ :irb_current_working_binding, :irb_print_working_binding, :irb_cwb, :irb_pwb, ↵ :irb_chws, :irb_cws, :chws, :cws, :irb_change_binding, :irb_cb, :cb, ↵
:workspaces, :irb_bindings, :bindings, :irb_pushws, :pushws, ↵ :irb_push_binding, :irb_pushb, :pushb, :irb_popws, :popws, ↵
:irb_pop_binding, :irb_popb, :popb, :source, :jobs, :fg, :kill, :help, ↵ :irb_exit, :irb_context, :install_alias_method, ↵
:irb_current_working_workspace, :irb_change_workspace, :irb_workspaces, ↵ :irb_push_workspace, :irb_pop_workspace, :irb_load, :irb_require, :irb_source, ↵ :irb, :irb_jobs, :irb_fg, :irb_kill, :irb_help, :nil?, :===, :=~, :!~, ↵
:eql?, :hash, :<=>, :class, :singleton_class, :clone, :dup, ↵
:initialize_dup, :initialize_clone, :taint, :tainted?, :untaint, :untrust, ↵ :untrusted?, :trust, :freeze, :frozen?, :inspect, :methods, ↵
:singleton_methods, :protected_methods, :private_methods, :public_methods, ↵ :instance_variables, :instance_variable_get, :instance_variable_set, ↵ :instance_variable_defined?, :instance_of?, :kind_of?, :is_a?, ↵ :tap, :send, :public_send, :respond_to?, :respond_to_missing?, ↵ :extend, :display, :method, :public_method, :define_singleton_method, ↵ :object_id, :to_enum, :enum_for, :==, :equal?, :!, :!=, ↵
:instance_eval, :instance_exec, :__send__, :__id__]
These methods must be being called on something. We know that something is self. What is self? When we inspect it, we see:
>> self.inspect
=> "main"
>> self.class
=> Object
So self is an instance of Object that evaluates to the string main. By default it has no instance variables:
>> instance_variables
=> []
But we can add them:
>> @loathing = true
=> true
>> instance_variables
=> [:@loathing]
We can define methods, too:
>> def say_hello
>>   puts "hello"
>> end
=> nil
We can see that the method is now available to self.
>> self.methods.grep /hello/
=> [:say_hello]
Here we called the methods method, which we know returns an array, and then called the grep method on the array. The grep method takes a pattern to match. We specified /hello/, which matched only one method, so Ruby returned it. What we’ve effectively done is define a top-level method on Object. We can see this by inspecting the method:
>> method(:say_hello).owner
=> Object
Normally we would define methods in the context of a class, so let’s look at classes.
Classes
Classes describe the characteristics and behavior of objects. Simply put, a class is just a collection of properties with methods attached. We’re familiar with the idea of biological classification, a mechanism of grouping and categorizing organisms into genus or species. For example, the walnut tree and the pecan tree are both instances of the family Juglandaceae. In Ruby every object is an instance of precisely one class. We tend not to deal with classes as much as instances of classes. One particularly powerful feature of Ruby is its ability to give instances of a class some attributes or methods belonging to a different type of object. This idea, the Mixin, is seen fairly frequently, and we’ll cover it later.
Let’s see an example of creating a class:
>> class Pet
>> end
=> nil
The class doesn’t do anything interesting yet, but we can create an instance of it:
>> rupert = Pet.new
=> #<Pet:0x00000001d97b68>
>> rupert.class
=> Pet
$ emacs pet.rb
class Pet
  def initialize(name)
    @name = name
  end

  def name
    @name
  end
end

corins_pet = Pet.new("Rupert")
puts "The pet is called " + corins_pet.name
$ ruby pet.rb
The pet is called Rupert
So the constructor is the method initialize. We’ve said that it takes an argument, and we’re setting an instance variable to hold the state of the pet’s name. Later we have a method, name, which returns the value of the instance variable. Simple.
The trouble is, children are fickle. What they thought was a great name turns out to be a dreadful name a few days later. Unless we were going to be draconian, and insist that pet names be immutable, it might be nice to allow the child to rename the pet. Let’s add an instance method that will change the name:
$ emacs pet.rb
class Pet
  def initialize(name)
    @name = name
  end

  def name=(name)
    @name = name
  end

  def name
    @name
  end
end

pet = Pet.new("Rupert")
puts "The pet is called " + pet.name
puts "ALL CHANGE!"
pet.name = "Harry"
puts "The pet is now called " + pet.name
$ ruby pet.rb
The pet is called Rupert
ALL CHANGE!
The pet is now called Harry
Here’s another example of a method name with some odd-looking punctuation at the end. But this is how we implement a method that allows assignment. This class is looking a bit lengthy (and frankly, ugly) for such a featureless class. Thankfully Ruby provides some syntactic sugar, which provides the ability to get and set instance variables. Here’s how it works:
class Pet
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

pet = Pet.new("Rupert")
puts "The pet is called " + pet.name
puts "ALL CHANGE!"
pet.name = "Harry"
puts "The pet is now called " + pet.name
What’s actually going on here is that when the class block is evaluated, the attr_accessor method is run, which generates the methods we need. Ruby is particularly good at this kind of metaprogramming: code that writes code. In more advanced programming, it’s possible to override the default attr_accessor method and make it do what we want. Great is the power of Ruby. But why all the fuss? Why can’t we just peek into the object and see the instance variable? Remember, Ruby operates by sending and receiving messages, and methods are the way objects deal with the messages. The same is so for instance variables. We can’t access them without calling a method. It’s a design feature of Ruby.
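attr_accessor has two siblings, attr_reader and attr_writer, which generate only the getter or only the setter. A hand-written setter is still useful when an assignment needs checking. The sketch below assumes, purely for illustration, that a pet’s name must not be blank.

```ruby
class Pet
  attr_reader :name               # generates the name getter only

  def initialize(name)
    self.name = name              # go through the setter so the check applies
  end

  # A hand-written setter that tidies and validates the new name.
  def name=(new_name)
    cleaned = new_name.to_s.strip
    raise ArgumentError, "a pet needs a name" if cleaned.empty?
    @name = cleaned
  end
end

pet = Pet.new("  Rupert ")
puts pet.name                     # Rupert
pet.name = "Harry"
puts pet.name                     # Harry

begin
  pet.name = "   "
rescue ArgumentError => e
  puts "Rejected: #{e.message}"   # Rejected: a pet needs a name
end

# Even this back door is a message sent to the object, not direct access.
puts pet.instance_variable_get(:@name)   # Harry
```

Note the use of self.name = name inside initialize. Without the explicit self, Ruby would treat name = name as an assignment to a local variable and the setter would never run.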
Right, that’s enough of messages and classes for the time being. Let’s move on to look at some data structures.
Arrays
Arrays are indexed collections of objects, which keep them in a specific order:
>> children = ["Melisande", "Atalanta", "Wilfrid", "Corin"]
=> ["Melisande", "Atalanta", "Wilfrid", "Corin"]
>> children[0]
=> "Melisande"
>> children[2]
=> "Wilfrid"
The index starts at zero, and we can request the nth item by calling the “[]” method. This is very important to grasp. We’re sending messages again! We’re sending the [] message to the children array, with the argument “2”. The array knows how to handle the message and replies with the child at position 2 in the array. Arrays have convenient aliases:
>> children.first
=> "Melisande"
>> children.last
=> "Corin"
We can append to an array using the “<<” method. Suppose we adopted orphan Annie:
>> children << "Annie"
=> ["Melisande", "Atalanta", "Wilfrid", "Corin", "Annie"]
>> children.count
=> 5
Collections of objects can be iterated over. For example:
>> children.each { |child| puts "This child is #{child}" }
This child is Melisande
This child is Atalanta
This child is Wilfrid
This child is Corin
This child is Annie
=> ["Melisande", "Atalanta", "Wilfrid", "Corin", "Annie"]
This introduces two new pieces of Ruby syntax: the block and string interpolation. String interpolation is an alternative to the rather clumsy-looking use of the “+” operator. Ruby evaluates the expression inside #{} and inserts the result into the string.
>> dinner = "curry"
=> "curry"
>> puts "Stephen is going to eat #{dinner} for dinner"
Stephen is going to eat curry for dinner
=> nil
Of course the expression could be much more complex:
>> foods = ["chips", "curry", "soup", "cat sick"]
=> ["chips", "curry", "soup", "cat sick"]
>> 10.times { puts "Stephen will eat #{foods.sample} for dinner this evening." }
Stephen will eat chips for dinner this evening.
Stephen will eat curry for dinner this evening.
=> 10
Here we see another example of a block! The integer “10” has a method times, which takes a block as an argument.
Blocks allow a set of instructions to be grouped together and associated with a method. In essence, they’re a block of code that can be passed as an argument to a method. They’re a particular speciality of Ruby and are incredibly powerful. However, they’re also a bit tricky to understand at first.
For programmers new to Ruby, code blocks are generally the first sign that they have definitely departed Kansas. Part syntax, part method, and part object, the code block is one of the key features that gives the Ruby programming language its unique feel.
— Russ Olsen, Eloquent Ruby
Blocks are created by appending them to the end of a method. Ruby takes the content of the block and passes it to the method. Depending on the length of the block, Ruby convention is either:
• If one line, then place in curly braces {} (unless the code has a side effect, such as writing to a file, in which case the do … end form applies)
• If more than one line, then replace curly braces with do … end
The method definition itself has code to handle the contents of the block. For now it’s sufficient to understand that blocks are a kind of anonymous function, that is, a function that we define and call without ever binding it to an identifier. Ruby uses them a great deal to implement iterators.
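To make that concrete, here is a sketch of a method that handles a block itself. Inside a method, yield runs the block that was attached to the call, and block_given? reports whether there was one. The method name attempt is invented for this example.

```ruby
# Calls the attached block once per attempt, passing in the attempt number,
# and collects whatever the block returns.
def attempt(times)
  return [] unless block_given?   # nothing to run if no block was attached

  results = []
  1.upto(times) do |number|
    results << yield(number)      # yield hands control to the caller's block
  end
  results
end

# A one-line block in curly braces
labels = attempt(3) { |n| "attempt #{n}" }
puts labels.inspect               # ["attempt 1", "attempt 2", "attempt 3"]

# A multi-line block using do ... end
squares = attempt(4) do |n|
  square = n * n
  square
end
puts squares.inspect              # [1, 4, 9, 16]

puts attempt(2).inspect           # []  (no block was given)
```

The value of the last expression in the block becomes the return value of yield, which is how the method gets results back from code it knows nothing about.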
Although present in Smalltalk, I think that it’s when looking at blocks that we see most evidence of Lisp in Ruby. Lisp provides the lambda expression as a mechanism for creating a nameless or anonymous function, and passing it to another function. Lisp also has the concept of a closure, that is, an anonymous function that can refer to variables visible at the time it was defined. Referring again to a Matz interview, the creator of Ruby says:
…we can create a closure out of a block. A closure is a nameless function the way it is done in Lisp. You can pass around a nameless function object, the closure, to another method to customize the behavior of the method. As another example, if you have a sort method to sort an array or list, you can pass a block to define how to compare the elements. This is not iteration. This is not a loop. But it is using blocks … the first reason [for this implementation] is to respect the history of Lisp. Lisp provided real closures, and I wanted to follow that.
— Bill Venners
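Both ideas in that quotation can be sketched in a few lines. The first part below builds a closure with lambda: the returned function keeps access to a local variable after the method that created it has finished. The second part passes a block to sort to define how elements are compared. The names make_counter and words are invented for illustration.

```ruby
# make_counter returns a lambda that can still see the local variable
# count after make_counter itself has returned. That is a closure.
def make_counter
  count = 0
  lambda { count += 1 }
end

counter = make_counter
counter.call
counter.call
puts counter.call         # 3: the lambda remembers count between calls

other = make_counter      # a second counter has its own, separate count
puts other.call           # 1

# Customizing sort with a block that compares two elements by length.
words = ["pear", "fig", "banana"]
by_length = words.sort { |a, b| a.length <=> b.length }
puts by_length.inspect    # ["fig", "pear", "banana"]
```

The block given to sort must return a negative number, zero, or a positive number, which is exactly what the <=> operator produces.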
Ruby features a wide range of iterators for various purposes. One commonly used one is map. The map method takes a block, and produces a new array with the results of the block being applied, without changing the initial array:
>> children.map do |child|
?>   if child == "Annie"
>>     child + " the Orphan"
>>   else
?>     child + " Nelson-Smith"
>>   end
>> end
=> ["Melisande Nelson-Smith", "Atalanta Nelson-Smith", "Wilfrid Nelson-Smith", "Corin Nelson-Smith", "Annie the Orphan"]
>> children
=> ["Melisande", "Atalanta", "Wilfrid", "Corin", "Annie"]
The block arguments lie between the two pipe symbols. I find Why The Lucky Stiff’s description particularly apt:
The curly braces give the appearance of crab pincers that have snatched the code and are holding it together. When you see these two pincers, remember that the code inside has been pressed into a single unit…. I like to think of the pipe characters representing a tunnel. They give the appearance of a chute that the variables are sliding down. Variables are passed through this chute (or tunnel) into the block.
— WTLSPGTR
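map has several close relatives. As a sketch using the same kind of data, select keeps the elements for which the block returns true, reject drops them, and inject folds the whole collection down to a single value. None of them modifies the original array.

```ruby
children = ["Melisande", "Atalanta", "Wilfrid", "Corin", "Annie"]

# select keeps the elements for which the block is true; reject does the opposite.
long_names  = children.select { |child| child.length > 6 }
short_names = children.reject { |child| child.length > 6 }

# inject carries a running value (sum) from one element to the next.
total_letters = children.inject(0) { |sum, child| sum + child.length }

# map builds a new array from whatever the block returns.
initials = children.map { |child| child[0] }

puts long_names.inspect     # ["Melisande", "Atalanta", "Wilfrid"]
puts short_names.inspect    # ["Corin", "Annie"]
puts total_letters          # 34
puts initials.inspect       # ["M", "A", "W", "C", "A"]
puts children.length        # 5: the original array is unchanged
```

In each case the pipes name the values that slide down the chute into the block: one element at a time for select, reject, and map, and the running total plus one element for inject.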
Conditional logic
Ruby supports various control structures to manage the flow of data through a program. The most commonly used are those that fork based on decisions:
>> 10.times do
?>   grub = foods.sample
>>   if grub == "cat sick"
>>     puts "Stephen is not very hungry, for some reason."
>>   else
?>     puts "Stephen will eat #{grub} for dinner this evening."
>>   end
>> end
In addition to if and else, we also have elsif:
>> def editor_troll(editor)
>>   if editor == "emacs"
>>     puts "Best editor in the world!"
>>   elsif editor =~ /vi/
>>     puts "Be gone with you, you bearded weirdo!"
>>   else
>>     puts "*yawn* - sorry - were you talking to me?"
>>   end
>> end
=> nil
Be gone with you, you bearded weirdo!
=> nil
>> editor_troll("nano")
*yawn* - sorry - were you talking to me?
=> nil
>> editor_troll("vim")
Be gone with you, you bearded weirdo!
=> nil
>> editor_troll("textmate")
*yawn* - sorry - were you talking to me?
=> nil
A handy option is the unless keyword:
>> def mellow_opinion(editor)
>>   unless editor.length == 0
>>     puts "Cool, dude. I hear #{editor} is really nice."
>>   end
>> end
=> nil
>> mellow_opinion("emacs")
Cool, dude. I hear emacs is really nice.
=> nil
>> mellow_opinion("notepad")
Cool, dude. I hear notepad is really nice.
=> nil
>> mellow_opinion("")
=> nil
The final control structure you’ll come across is the case statement:
>> def seasonal_garment(season)
>>   case season
>>   when "winter"
>>     puts "Wooly jumper and hat!"
>>   when "spring"
>>     puts "Shorts and t-shirt!"
>>   when "autumn"
>>     puts "Hmm... English? Raincoat!"
>>   when "fall"
>>     puts "Bit like spring, really."
>>   end
>> end
>> seasonal_garment("winter")
Wooly jumper and hat!
=> nil
>> seasonal_garment("fall")
Bit like spring, really.
=> nil
>> seasonal_garment("autumn")
Hmm... English? Raincoat!
=> nil
Typically, the case statement is used if there are more than three options, as multiple elsif statements look a bit ugly, but it’s really just a matter of style.
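case compares values using the === operator, which is why a when clause can hold a range, a regular expression, or a class as well as a literal. Several alternatives can share one clause when separated by commas, and an else clause catches anything unmatched. The classify method below is a made-up sketch that shows these forms together.

```ruby
# case is an expression, so the matching branch becomes the method's return value.
def classify(value)
  case value
  when nil
    "nothing at all"
  when 0
    "zero"
  when 1..9                 # a range matches anything it covers
    "a single digit"
  when Integer              # a class matches its instances
    "some other integer"
  when "emacs", "vim"       # a comma-separated list of alternatives
    "an editor"
  when /\.rb\z/             # a regular expression matches strings
    "a Ruby file"
  else
    "something else"
  end
end

puts classify(7)            # a single digit
puts classify(42)           # some other integer
puts classify("vim")        # an editor
puts classify("pet.rb")     # a Ruby file
puts classify([])           # something else
```

The clauses are tried from top to bottom and the first match wins, so the more specific tests (0, then 1..9) need to come before the general one (Integer).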
Hashes
A hash is another sort of collection in Ruby. Variously called a dictionary or associative array in other languages, its defining feature is that the index, or key, can be something other than an integer, such as a string. Hashes are commonly used in Chef for key/value pairs:
>> wines = {}
=> {}
>> wines['red'] = ["Rioja", "Barolo", "Zinfandel"]
=> ["Rioja", "Barolo", "Zinfandel"]
>> wines['white'] = ["Chablis", "Riesling", "Sauvignon Blanc"]
=> ["Chablis", "Riesling", "Sauvignon Blanc"]
>> wines
=> {"red"=>["Rioja", "Barolo", "Zinfandel"], "white"=>["Chablis", "Riesling", "Sauvignon Blanc"]}
The great thing about hashes is they can be deeply nested. We can add, for example:
>> wines['sparkling'] = {"Cheap" => ["Asti Spumante", "Cava"], "Moderate" => ["Veuve Cliquot", "Bollinger NV"], "Expensive" => ["Krug", "Cristal"]}
=> {"Cheap"=>["Asti Spumante", "Cava"], "Moderate"=>["Veuve Cliquot", "Bollinger NV"], "Expensive"=>["Krug", "Cristal"]}
>> wines['sparkling']["Cheap"]
=> ["Asti Spumante", "Cava"]
>> wines['sparkling']["Expensive"]
=> ["Krug", "Cristal"]
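Hashes can be iterated with each, which passes the key and the value into the block together, and fetch offers a way to supply a default when a key is missing. The sketch below uses a cut-down version of the wine list. It also shows symbols used as keys, which you will see a great deal in Ruby code; the cellar hash is invented for the example.

```ruby
wines = {
  "red"   => ["Rioja", "Barolo"],
  "white" => ["Chablis", "Riesling"]
}

# each yields two block arguments for a hash: the key and its value.
wines.each do |colour, bottles|
  puts "#{colour}: #{bottles.join(', ')}"
end

puts wines.keys.inspect                # ["red", "white"]
puts wines.key?("red")                 # true
puts wines["rose"].inspect             # nil: a missing key is not an error
puts wines.fetch("rose", []).inspect   # []: fetch lets us choose the fallback

# Symbols make convenient keys.
cellar = { :location => "under the stairs", :bottles => 12 }
puts cellar[:bottles]                  # 12
```

Called with no default, fetch raises an error for a missing key, which is handy when a missing value should be treated as a mistake rather than quietly becoming nil.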