The other factor contributing to the spread of these kinds of features is the expertise of a new generation of users. These folks have grown up with technology and expect it to facilitate and mediate all their interactions with friends, colleagues, teachers, and coworkers. They move seamlessly from computer to mobile device and back, and they want their tools to move with them. They work with technology, they play in technology, they breathe this technology, and it is virtually invisible to them. In the years we have been living in the social web, tensions have emerged around privacy and around permanence versus ephemerality: some people are asking how these experiences can be archived for future generations, while others want guarantees that the words, ideas, and experiences they share have a half-life shorter than the speed at which they propagate across the internet. Deeper philosophical, ethical, and cultural questions are now part of the dialogue, and differing attitudes across the world and across demographics affect how these tools and experiences are developed and nurtured. The terms community, social media, and social networking all describe these kinds of tools and experiences. They are often used interchangeably, but they offer different views and facets of the same phenomenon.
Since 2008 I have run, among other things, a site that handles around 500 million page views per month and hundreds of transactions per second, and that sits in the Alexa Top 50 Sites for the US. I’ve learned how to scale for that level of traffic without incurring huge infrastructure and operating costs while still maintaining world-class availability. I do this with a small staff that also handles new features, and with just a handful of virtual machines.
But of course, Hadoop really shines when you have not one but tens, hundreds, or thousands of computers. If your data or computations are significant enough (and whose aren't these days?), you need more than one machine to do the number crunching. If you try to organize the work yourself, you will soon discover that you have to coordinate many computers, handle failures and retries, collect the results together, and so on. Enter Hadoop, which solves all of these problems for you. Now that you have a hammer, everything becomes a nail: people will often reformulate their problem in MapReduce terms rather than create a new custom computation platform.
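To make the "reformulate in MapReduce terms" idea concrete, here is a toy word count in plain Python. Nothing below touches a real Hadoop cluster; it merely simulates the map, shuffle, and reduce phases that Hadoop would distribute across machines, with all names illustrative.

```python
from collections import defaultdict

# Word count expressed in MapReduce terms. Hadoop would run map and reduce
# tasks in parallel across many machines; this single-process sketch just
# shows the shape of the programming model.

def map_phase(document):
    # Emit a (word, 1) pair for every word in an input split.
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts emitted for one word.
    return key, sum(values)

def word_count(documents):
    pairs = [p for doc in documents for p in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the quick brown fox", "the lazy dog"])
print(counts["the"])  # 2
```

On a real cluster, Hadoop also handles the data placement, retries, and failure recovery that this sketch glosses over, which is precisely why people bend their problems into this shape rather than build a custom platform.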
The purpose of atomicity is to solve precisely this issue—if something goes wrong during your writes, you don’t need to worry about a half-finished set of changes making your data inconsistent. The traditional approach of wrapping the two writes in a transaction works fine in databases that support it, but many of the new generation of databases (“NoSQL”) don’t, so you’re on your own. Also, if the denormalized information is stored in a different database—for example, if you keep your emails in a database but your unread counters in Redis—you lose the ability to tie the writes together into a single transaction. If one write succeeds and the other fails, you’re going to have a difficult time clearing up the inconsistency.
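As a sketch of the traditional "wrap the two writes in a transaction" approach, here is the email/unread-counter example using Python's standard sqlite3 module; the schema and names are illustrative assumptions, not from any real system. The connection's context manager commits both writes together or rolls both back.

```python
import sqlite3

# Illustrative schema: emails plus a denormalized unread counter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE counters (user_id INTEGER PRIMARY KEY, unread INTEGER);
    INSERT INTO counters VALUES (1, 0);
""")

def deliver_email(conn, user_id, body):
    # The context manager opens a transaction, commits on success,
    # and rolls back if any statement inside raises.
    with conn:
        conn.execute("INSERT INTO emails (body) VALUES (?)", (body,))
        conn.execute("UPDATE counters SET unread = unread + 1 WHERE user_id = ?",
                     (user_id,))

deliver_email(conn, 1, "hello")

# If the second write fails, the first is rolled back too:
# no half-finished state ever becomes visible.
try:
    with conn:
        conn.execute("INSERT INTO emails (body) VALUES (?)", ("orphan",))
        conn.execute("INSERT INTO counters VALUES (1, 99)")  # duplicate key
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM emails").fetchone()[0])  # 1
print(conn.execute("SELECT unread FROM counters").fetchone()[0])  # 1
```

In the split the text describes, with emails in one database and counters in Redis, no such shared transaction exists, which is exactly the problem the paragraph raises.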
In June 2011, the German government presented a new draft for the implementation of the Directive, but because Germany failed to transpose the Directive into national law within the specified time limit, the country now faces proceedings before the Court of Justice of the European Union for not fulfilling its treaty obligations. The constitutional courts of Romania and the Czech Republic have also declared the respective national data protection laws unconstitutional, arguing that the Directive’s implementation would infringe the protection of privacy as well as the right to informational self-determination. Moreover, it is uncertain whether a collection of communication data retained by Internet Service Providers (ISPs) really proves to be very effective, since users increasingly tend to move their communication for example into discussion
Big data can be described as both ordinary and arcane. The basic premise behind its genesis and utility is as simple as its name: efficient access to more—much more—data can transform how we understand and solve major problems for business and government. On the other hand, the field of big data has ushered in the arrival of new, complex tools that relatively few people understand or have even heard of. But is it worth learning them?
➤ Parquet: Parquet is a columnar storage format for Hadoop frameworks. It achieves nested name spaces in a columnar layout and is inspired by the Dremel paper (http://research.google.com/pubs/pub36632.html). This nesting feature is the main advantage of using Parquet compared to XML/JSON. In addition, it can efficiently store sparse data that contains a lot of null fields. The encoding/decoding specification is described at https://github.com/Parquet/parquet-mr/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper. Parquet uses Thrift as its serialization format. Ordinary column-oriented databases often require you to read columns from multiple machines, which increases the cost of I/O because of the network access involved. Parquet defines row groups and column chunks: a row group is a logical collection of rows made up of column chunks, each of which holds the data for a specific column. Since column chunks are guaranteed to be contiguous in a file, it is possible to reduce the cost of multiple read I/Os. A file contains some columns for each record, so it might not be necessary to read another file to fetch another column if the current Parquet file already contains it. Several implementations of Parquet are listed at https://github.com/apache/parquet-mr. You can use Parquet in MapReduce, Hive, or Pig without writing new code. Adopting Parquet can help you achieve good performance and reduce development cost at the same time.
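The row-group/column-chunk layout can be illustrated with a toy model in Python. This is not the real Parquet file format (which adds encodings, compression, and Thrift-described metadata); it is only a sketch of why reading one column avoids touching the other columns' data.

```python
# Toy model of Parquet's layout: a "file" is a sequence of row groups, and
# each row group stores one contiguous chunk per column. Reading a single
# column touches only that column's chunks.
ROW_GROUP_SIZE = 2  # rows per row group, illustrative

def write_row_groups(records, columns):
    groups = []
    for i in range(0, len(records), ROW_GROUP_SIZE):
        batch = records[i:i + ROW_GROUP_SIZE]
        # One column chunk per column, contiguous within the row group.
        groups.append({c: [r.get(c) for r in batch] for c in columns})
    return groups

def read_column(groups, column):
    # Concatenate the requested column's chunks from every row group,
    # never looking at the other columns' data.
    values = []
    for group in groups:
        values.extend(group[column])
    return values

records = [
    {"brand": "a", "price": 10},
    {"brand": "b", "price": 20},
    {"brand": "c", "price": None},  # sparse/null fields store cheaply
]
groups = write_row_groups(records, ["brand", "price"])
print(read_column(groups, "price"))  # [10, 20, None]
```

The I/O argument in the text is the same at real scale: a query for one column reads only that column's chunks, and contiguity within each row group keeps those reads sequential.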
CRC cards also don’t capture the interrelationship among classes. Although it is true that collaborations are noted, the nature of the collaboration is not modeled well. Looking at the CRC cards, you can’t tell whether classes aggregate one another, who creates whom, and so forth. CRC cards also don’t capture attributes, so it is difficult to go from CRC cards to code. Most important, CRC cards are static; although you can act out the interactions among the classes, the CRC cards themselves do not capture this information. In short, CRC cards are a good start, but you need to move the classes into the UML if you are to build a robust and complete model of your design. Although the transition into the UML is not terribly difficult, it is a one-way street. Once you move your classes into UML diagrams, there is no turning back; you set aside the CRC cards and don’t come back to them. It is simply too difficult to keep the two models synchronized with one another.
language processing (NLP) this way: “We faced two challenges worthy of mentioning: applying NLP to the world of fashion, and determining the ‘words that matter’ to create meaningful ‘fashion text corpora’, by brand. We solved the first one with our proprietary fashion style taxonomy, which provides us with a vector representation of the styles in the same n-dimensional space where we model the brand styles. The second challenge requires a constant evaluation of the text corpora representative of the brand, with an emphasis on significant style variations for any given brand at any point in time.” Today’s sweet spot is often found when machine learning, specifically natural language processing, is combined with human intellect. This machine-meets-human matchup extends deeper into the general industry culture, as well, particularly in the new generation of fashion startups—many of which were built with analytics in mind from the start.
This strategy is similar to the demand-shaving programs many utilities already have in place. They offer intelligent, programmable thermostats and installation to customers at little or no cost. In return, customers voluntarily allow the utilities to adjust indoor temperature higher or lower, depending on the season, to reduce overall demand during peak periods. The utilities benefit by not having to fire up their peaker plants as often, since these cost more to run than their base-load operations. Customers benefit by receiving new thermostats, which they can program to hold set temperatures at different times around the clock, curbing energy use and lowering their electric bills.
The approach felt wrong, like looking for a needle in a haystack, especially given the complex optimizations and behavior that the JVM displays. We tried another approach, and quickly confirmed that the application was maxing out CPU utilisation. As this is a known good use case for execution profilers (see Chapter 11 for the full details of when to use profilers), we spent ten minutes profiling the application. Sure enough, we found that the problem wasn’t in our code at all, but in a new infrastructure library we were using.
A little later (approximately two years ago) people started to really feel the pain of managing their applications at the VM layer. Even under the best circumstances it takes a brand new virtual machine at least a couple of minutes to spin up, get recognized by a load balancer, and begin handling traffic. That’s a lot faster than ordering and installing new hardware, but not quite as fast as we expect our systems to respond.
The content flat files are cached heavily, from 30 days up to a year. This is possible because every new file has a unique URL: even if the editors need to change an existing story, they output a new file with its own cache rules. The base page that reads in the flat files, on the other hand, is cached for only five minutes (see Figure 3-2).
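The policy just described might be sketched as a small header-selection function. The path convention and the 30-day value are assumptions for illustration (the text gives a 30-day-to-one-year range for content files); the five-minute base-page value comes from the text.

```python
# Sketch of the caching policy: versioned content files get long-lived
# caching because each revision has a unique URL, while the base page that
# assembles them is cached only briefly. The "/content/" layout is a
# hypothetical convention, not from the original site.
DAY = 86_400  # seconds

def cache_headers(path: str) -> dict:
    if "/content/" in path:
        # Content flat file: safe to cache for a long time, since a changed
        # story is published under a brand-new URL.
        return {"Cache-Control": f"public, max-age={30 * DAY}"}
    # Base page that reads in the flat files: five minutes only.
    return {"Cache-Control": "public, max-age=300"}
```

Routing cache lifetime by URL like this is what lets the long-lived files never go stale: invalidation happens by publishing a new URL, not by purging caches.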
observed, he introduced Michael as his design partner. Michael sat on a tall stool at a high table, and Aaron stood next to him. They shared a monitor, and each had a mouse and keyboard, much like traditional pair programming, except that rather than looking at code, they were looking at a design sketch. At Pivotal Labs, the roles of generating ideas versus synthesizing information are called, respectively, “Driving” and “Navigating.” In this configuration, the designer using the computer is driving the design session, while the other is navigating: consulting notes, testing ideas, checking edge cases. It was only at the end of the session we observed that Aaron explained that Michael was actually their client, embedded at Pivotal to guide the design, learn how to pair, and build his knowledge of the design system overall.
Prior to the arrival of Linux, Windows and a long list of commercial Unix variants had the lion’s share of the server operating system market. Even in the early days of Linux, the expectation was that enterprise customers would not switch to the fledgling open source operating system, even if it was “free.” Of course, as the Linux ecosystem grew and companies formed to offer enterprise-class, supported Linux distributions, the market share picture began to change rapidly. By the end of 2007, IDC reported that Linux had finally broken the US$2 billion barrier in a single quarter and had grown to represent 12.7% of all server revenue. By 2010 Linux’s share had grown to 17%, but the breakout moment arrived in 1Q 2012, when IDC reported that Linux had grabbed 20.7% of worldwide server revenue compared to 18.3% for Unix:
Fast-forwarding to the modern era, the introduction of the GNU project and the accompanying free software ideas from Richard Stallman in the 1980s, quickly followed by Linus Torvalds and the Linux operating system in 1991, were milestones for the movement. Combined with the increasing ease of network connectivity around the globe and mass communication via email, early primitive websites, and code repositories on FTP servers, they led to a huge influx of new participants in open source. Linux and the various GNU project components provided a free base layer for open source activities. All the tools necessary for participating in open source (compilers, editors, network clients, and additional scripting languages and utilities) were embedded in a single, freely accessible operating system environment, significantly lowering the bar to entry for any party with access to a basic personal computer.
I was working in London for a top publisher within their internal development studio. One day, a fighting game came through the door from an external development team, and it immediately grabbed everybody’s attention and imagination. The publisher was excited, marketing became ecstatic, and it was an awesome game – or so it seemed. It had one major flaw: the visual content went way beyond anything that could ever be publishable. It was so grotesque and violent that it would have exceeded the highest rating and put Silence of the Lambs to shame. So, after much excitement, money spent, and a stint at a trade show, the project was pulled and shelved because nobody would allow it to be sold to the general public. This is a perfect example of what can happen if you don’t define your target audience. The publisher considered a redesign that would replace all the artwork, but decided that the game would lose its USP (unique selling point) and would be too similar to many other fighting games.
Each and every one of us has experienced a story that completely absorbs us: a book we simply cannot put down but keep reading late into the night, or a movie or play that leaves us reflecting on what we’ve just seen for days to come. The question of what makes for a good story is what started the journey of this book. It’s the one I called and asked my dad about when I was preparing for my storytelling talk. I wanted to find out whether there was a magical recipe or formula to follow, besides a beginning, a middle, and an end, and whilst I wasn’t expecting a “Well, yes there is!”, the answers he gave, and the conversations and later research they sparked, turned out to be far more interesting than I could have imagined. From the art of dramaturgy and the three ingredients my dad said should be present in any good story, to just how much of what we experience on a day-to-day basis is transcended by storytelling: everywhere you look and everywhere you go there is a story to tell. Exactly what makes for a great one, and how this is connected to design and multi-device projects, is the story I aim to tell you through this book.