To illustrate the advantage of rapidly scaling infrastructure without committing to a purchase, consider Google’s earlier image detection efforts. The project was initially built on CPUs running on 1,000 computers, with a hardware cost of roughly one million dollars. Google subsequently redeployed the project on GPUs, which worked so much better that the model could run on just sixteen computers at a fraction of the cost (roughly $20,000).71 Most companies could not afford to make such a large hardware purchase for an experimental project, nor would they want to explain to their finance department the hundreds of computers purchased but no longer needed.
You’ll still need to install software on the cloud-based hardware (the operating system, middleware, storage, etc.), but if you want to move straight to running applications, you can use a Platform as a Service (PaaS) offering, which may be proprietary or may be an open-source product implemented and maintained by a service provider. In this way, your analytics programme can outsource both the hardware and the foundational software and work directly on your application.
You may have concerns regarding security in the cloud. For public clouds and Software as a Service (SaaS), security is in fact the biggest barrier for companies considering the cloud. In the Dell study cited above, 42 per cent of companies not yet using the cloud gave security as the reason, a far higher share than for any other reason. European companies often want to keep their data within Europe, particularly following Edward Snowden’s revelations and the resulting turmoil in Europe’s Safe Harbor Provisions.
In industries such as finance, where security, reliability and compliance are particularly critical, companies have traditionally opted to manage their own data centres to keep tighter control of security and reliability. However, cloud providers continue to alleviate the earlier security concerns in these sectors, and companies in the financial, pharmaceutical, and oil and gas sectors have started using cloud technologies.72
Some companies attest that running applications in the cloud leads to more secure applications, as it forces them to leave behind insecure legacy software.
Being more state of the art, applications built for the cloud are generally designed with high levels of control, with better monitoring capabilities and better overall security. The cloud providers give a degree of consistency that enhances security, and they are themselves very conscious of securing their assets.
Moving, cleaning and storing your data: data pipelines
You’ll need to architect your data pipeline, selecting data warehousing and middleware, such as messaging systems for transferring information in real time (e.g. Kafka or RabbitMQ).
Moving and cleaning data is generally the most time-consuming part of an analytics effort. You can purchase an ETL (extract, transform, load) tool to do much of the heavy lifting in data processing. It should also provide useful ancillary functionality, such as support for documentation. A good ETL tool can make it easy to add a new data source, pulling data not only from a traditional database but also from newer sources such as web analytics servers, social media, cloud-based NoSQL databases, etc.
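To make the extract, transform and load steps concrete, here is a minimal sketch using only the Python standard library. It is an illustration of the idea, not a substitute for a commercial ETL tool: the two in-memory sources, the field names and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the destination.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs standing in for two different data sources.
CSV_SOURCE = """customer_id,country,revenue
1, nl ,100.50
2,DE,
3,de,75
"""
JSON_SOURCE = '[{"customer_id": 4, "country": "FR", "revenue": "20.00"}]'


def extract():
    """Yield raw records (dicts) from both sources."""
    yield from csv.DictReader(io.StringIO(CSV_SOURCE))
    yield from json.loads(JSON_SOURCE)


def transform(record):
    """Clean one record; return None if it cannot be used."""
    revenue = str(record.get("revenue") or "").strip()
    if not revenue:
        return None  # drop rows with no revenue value
    return (
        int(record["customer_id"]),
        str(record["country"]).strip().upper(),  # normalize country codes
        float(revenue),
    )


def load(rows, conn):
    """Write cleaned rows to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, country TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run all three steps and return the number of rows loaded."""
    cleaned = [row for row in map(transform, extract()) if row is not None]
    load(cleaned, conn)
    return len(cleaned)
```

Adding a new data source in this sketch means adding one more `yield from` line to `extract()`, which is the kind of convenience a good ETL tool provides at much larger scale.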
You’ll also need to select and prepare the destination database(s). As we discussed earlier, there are hundreds of database solutions to choose from. If you’re deeply rooted in a vendor technology, you may want to continue within that vendor’s product ecosystem, or you may consider adding new technologies, running several systems in parallel or as separate projects. Migrating from a proprietary to an open-source database can bring significant cost savings. One company recently reported cutting its cost per terabyte in half. You’ll also need to invest significant effort in setting up the logical and physical structures of the database tables in a way that best fits your intended use.
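As a small illustration of fitting table structures to the intended use, the sketch below defines a logical structure (two related tables) and one physical design choice (an index on the column that reports filter on). The table names, columns and sample rows are hypothetical, and SQLite is used only because it ships with Python; the same reasoning applies to whichever database you select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical structure: customers and their orders.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    country     TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,   -- ISO 8601 date, e.g. 2017-03-01
    amount      REAL NOT NULL
);
-- Physical structure: index the column the intended reports filter on.
CREATE INDEX idx_orders_date ON orders(order_date);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "NL"), (2, "DE")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2017-03-01", 40.0),
        (2, 2, "2017-03-02", 60.0),
        (3, 1, "2017-04-01", 10.0),
    ],
)


def revenue_by_country(conn, start, end):
    """Total order amount per country for orders dated start..end inclusive."""
    return conn.execute(
        """SELECT c.country, SUM(o.amount)
           FROM orders o JOIN customers c USING (customer_id)
           WHERE o.order_date BETWEEN ? AND ?
           GROUP BY c.country
           ORDER BY c.country""",
        (start, end),
    ).fetchall()
```

If the intended use were different, for example looking up all orders for one customer, the index would more sensibly go on `customer_id`; the structure follows the questions you expect to ask.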
Choosing software
We should emphasize again that extensive libraries for analytics have been developed for the major programming languages, and you should start your analytics effort by working with what is already available. Popular languages and tools such as Python, R, SAS and SPSS already include extensive analytic libraries with large support communities. In Python, a developer can build a neural network with only a few lines of code by leveraging existing software packages such as Keras and TensorFlow.
Don’t expect to find off-the-shelf software that completely solves your analytic challenge, but existing software should give you a good head start, particularly if it integrates seamlessly with your data pipeline and automates data processing.
Keep in mind that solutions still need to be customized to your problem, and you’ll want to apply subject-matter expertise to engineer the model features that work best for your application. In addition, it is often the case that off-the-shelf solutions are simply implementing a common analytic model.
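Feature engineering of this kind can be sketched in a few lines. The example below turns one customer's raw purchase history into a handful of model inputs. The particular features chosen here (recency, purchase count, average order value, share of weekend purchases) are assumptions made for illustration; which features actually help depends on your application and on subject-matter knowledge.

```python
from datetime import date


def engineer_features(purchase_dates, amounts, as_of):
    """Turn a customer's raw purchase history into model-ready features.

    purchase_dates: list of datetime.date, one per purchase
    amounts:        list of floats, same length as purchase_dates
    as_of:          the date on which the features are computed
    """
    if not purchase_dates or len(purchase_dates) != len(amounts):
        raise ValueError("need matching, non-empty dates and amounts")
    # weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = sum(1 for d in purchase_dates if d.weekday() >= 5)
    return {
        "days_since_last_purchase": (as_of - max(purchase_dates)).days,
        "purchase_count": len(amounts),
        "avg_order_value": sum(amounts) / len(amounts),
        "weekend_share": weekend / len(purchase_dates),
    }


# Hypothetical customer with two purchases (a Saturday and a Monday).
features = engineer_features(
    [date(2017, 3, 4), date(2017, 3, 6)],
    [30.0, 50.0],
    as_of=date(2017, 3, 10),
)
```

An off-the-shelf model would typically see only the raw rows; deciding that recency or weekend behaviour matters is where your own expertise comes in.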
When purchasing analytic software, you should always ask yourself the standard questions that you would ask for any software purchase (cost, reliability, required training, etc.).
Keep in mind
Don’t expect to find an off-the-shelf solution that solves your problem without substantial additional effort.