Budget holders, who will oversee costs and have preferences for CapEx or OpEx.
Legal and privacy officers, who will have requirements related to data location, governance, fair use and accessibility.
IT teams, who will help you leverage technologies and skills already in your organization (Python, for example, is commonly used across IT teams). They will also have technical requirements that you must satisfy.
Business units, who will have requirements related to usability and delivery. Their input will certainly impact your choice of BI tools and could potentially impact any part of the big data technology stack, with requirements related to latency, accuracy, speed, concurrency, consistency, transparency or delivery.
You may need to set aside your agile mindset when choosing technology.
Some solutions, after an initial test period and a limited proof of concept, require a significant deployment decision. In such cases, conduct a thorough requirements analysis before making significant investments or deployment efforts.
To illustrate, assume you work in financial services, where security, reliability and compliance are critical. Companies in this industry have traditionally opted to manage their own data centres and keep complete control over security and reliability. They avoid early adoption of technologies, particularly open-source ones. Early versions of Spark and Kafka were not even an option, as they did not support SSL security protocols.
In financial services, you would have stringent requirements related to auditing, which is typically more difficult with open-source software.
Whereas most companies plan their systems assuming a certain degree of system failure, you would require extreme reliability from each system component.
If you were in financial services, your big data technology choices would thus be guided by the following principles:
You would not be an early adopter of new technologies.
You would choose the most reliable software, regardless of whether it is open-source.
You would maximize reliability by purchasing support.
You would be very cautious when deciding to use cloud-based servers.
2. Technology recommendations. You’ll find it’s often quite difficult to evaluate a technology. You’ll see product features listed on marketing material, but you need insights into usability, performance, reliability, and the spectrum of undocumented features that will determine the success or failure of the technology within your organization.
Start by gathering insights and experiences from within your own organization and professional network. If your organization holds a Gartner or Forrester subscription, you’ll want to set up analyst interviews and request relevant analyst papers. If you don’t have such a subscription, you can often speak with one of these analysts at a conference. Bear in mind that their expertise may be stronger in vendor technologies than in open-source ones.
Some independent thought leaders publish reviews and recommendations, but be aware they are often paid for their endorsements. Look also at online forums, including Slack channels, which provide a continuous stream of insights into technologies. These are frequented by some very knowledgeable practitioners, and user voting systems help keep quality high.
In fact, the developers of the technologies are themselves often active on such forums.
Take care when attempting to replicate solutions chosen by others. Small differences in requirements can lead to completely different technology choices. To illustrate, Spark is a widely referenced technology for streaming analytics, so you will see frequent mention of it online. But because Spark processes data in micro batches, it is generally not appropriate for solutions requiring a latency of under 500 milliseconds (½ second), and Apache Flink, a technology that originated in Germany, would probably be more appropriate for such applications.
3. Integration with existing technology. Consider how you’ll integrate your analytic solution internally as well as with your customers’ technologies. Try to choose solutions that are modular (and hence provide more versatility).
However, the pricing and usability benefits of packaged capabilities, combined with automated data transfer features, may make more coupled solutions attractive. Larger vendors tend to create solutions that span multiple applications, including basic analytics within a visualization tool (e.g. Tableau), machine learning within a cloud environment or a larger software suite (e.g. Microsoft, SAS or IBM), ETL and delivery solutions coupled with a data warehouse (Microsoft’s BI stack) or AI capabilities within a CRM system (Salesforce’s Einstein). For such applications, you’ll want to consider whether such an offering fits your requirements in a way that better optimizes data flow or minimizes incremental software costs.
Understand the technology platforms of your target B2B customers, which may lead you to develop integrations with or parallel solutions within those technologies or cloud environments.
4. Total cost of ownership. Many organizations see cost as a large barrier to using big data. In Dell’s 2015 survey, the top barriers to increasing the use of big data included the costs of IT infrastructure and the cost of outsourcing analysis or operations. Consider both direct and indirect costs, including licensing, hardware, training, installation and maintenance, system migration and third-party resources. Your IT department should already be familiar with this costing process, having performed similar analysis for existing technology.
These costs continue to fall, and if you’ve done your homework in preparing your business case, you should be able to choose the projects and solutions that will result in positive ROI.
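As a rough illustration of the costing exercise, the sketch below adds up one-off and recurring costs over a planning horizon. The cost categories follow the list above, but every figure, the three-year horizon and the names `CostItem` and `total_cost_of_ownership` are hypothetical, chosen only to show the arithmetic. Your IT department’s own costing process will be more detailed.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    one_off: float = 0.0  # paid once, at the start
    annual: float = 0.0   # paid every year of the horizon


def total_cost_of_ownership(items, years):
    """Sum one-off costs plus recurring costs over the planning horizon."""
    return sum(item.one_off + item.annual * years for item in items)


# Hypothetical figures, for illustration only.
items = [
    CostItem("licensing", annual=40_000),
    CostItem("hardware", one_off=120_000, annual=10_000),
    CostItem("training", one_off=15_000),
    CostItem("installation and maintenance", one_off=20_000, annual=25_000),
    CostItem("system migration", one_off=30_000),
    CostItem("third-party resources", annual=18_000),
]

tco_3y = total_cost_of_ownership(items, years=3)
print(f"Three-year total cost of ownership: {tco_3y:,.0f}")
```

Separating one-off from recurring costs makes it easier to see how the comparison between two technologies changes as you lengthen the horizon.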
5. Scalability. Consider how the technology can handle increases in data, replications, number of users and innovative data sources. Consider also how the licensing model scales. The license for BI tools may be manageable when deployed to a dozen users, but prohibitive at the point where you want to empower several hundred employees with its self-service capabilities. A lack of planning in this area can lead to some painful budgeting moments later.
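To make the licensing point concrete, here is a minimal sketch that assumes a simple per-user pricing model. The fee of 800 per user per year and the function `annual_license_cost` are hypothetical. Real vendor pricing often includes tiers, volume discounts or platform fees, so treat this only as a way to frame the question.

```python
def annual_license_cost(users, per_user_fee, platform_fee=0.0):
    """Annual cost under an assumed flat per-user license model."""
    return platform_fee + users * per_user_fee


# Hypothetical per-user fee, for illustration only.
PER_USER_FEE = 800

pilot_cost = annual_license_cost(users=12, per_user_fee=PER_USER_FEE)
rollout_cost = annual_license_cost(users=300, per_user_fee=PER_USER_FEE)

print(f"Pilot (12 users): {pilot_cost:,.0f} per year")
print(f"Rollout (300 users): {rollout_cost:,.0f} per year")
print(f"Increase: {rollout_cost / pilot_cost:.0f}x")
```

Running the same calculation with each vendor’s actual price list, before the pilot begins, shows whether a tool that is affordable for a dozen users stays affordable at full rollout.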
6. Extent of user base. If you choose a fringe technology, it will impact your ability to find external support as well as to hire and train internal staff to operate the technology. The broader the adoption of the technology, particularly within your geography and industry sector, the more likely you will be able to hire qualified staff. There will also be more support available, both from third parties and from online forums such as Stack Overflow and Slack groups. Similarly, a widely used, open-source technology is more likely to be kept up to date and to have bugs and usability issues quickly flagged and repaired.
7. Open source vs proprietary. If you use open-source technology, you’ll be able to quickly leverage the efforts of the wider community and save development time and licensing fees. As we mentioned above, your situation may dictate that you use proprietary technologies considered to be tried and true, and which come with strong service-level agreements.
8. Industry buzz. Recruiting talent within the big data and data science domains is very difficult. Using the newest software frameworks, databases, algorithms and libraries will increase your ability to recruit top talent.
9. Future vision of the technology. If your organization is an early technology adopter, you’ll want to give preference to technologies that are quick to integrate and adapt to the technology space around them. For example, we mentioned earlier how Python is often the first language supported by new big data technologies, but that many algorithms in academia are developed in R. In addition, early consumers of new data types will want to choose an ETL or BI tool known to quickly add new data sources.
Ask vendors about their forward-looking visions. One of Gartner’s two axes in their Magic Quadrant is ‘completeness of vision,’ which incorporates vendors’ product strategies.
10. Freedom to customize the technology. Will you be satisfied using the technology out of the box, or will you want to view and modify the code? If you are integrating the technology into a product for resale, check the licensing restrictions.
11. Risks involved with adopting a technology. Cutting-edge technologies will be less well tested, and hence higher risk. An outsourced as-a-Service offering brings additional reliance on third parties, and vendor solutions depend on vendor support.
Big data technologies are fascinating, and they are developing rapidly. But you can’t build a programme on technology alone. In the next chapter, I’ll talk about the most critical resource you’ll need, which is also the most difficult to secure: your analytics team.
Takeaways
You’ll need to make choices related to hardware, use of cloud, data transfer, analytic tools and data delivery (BI).
Companies are increasing use of cloud solutions, but some concerns remain.
As-a-Service offerings can free you to focus on your core differentiators.
Stakeholder requirements and preferences will play a crucial role in technology decisions, particularly for BI tooling.
Consider several important factors as you decide between competing technologies.
Ask yourself
What parts of your infrastructure and software could you replace with as-a-Service offerings to allow you to focus more on your core differentiators?
Are you experiencing integration difficulties from utilizing too many different technologies? What steps are you taking to assess these difficulties and to standardize your technology where necessary? Consider the tradeoff between costs and benefits for such a process.
Who in your company or professional network can provide you with broad, unbiased insights into available technologies? Consider what industry conferences might be helpful in this.
Consider your organization’s growth projections. How long will it be before the technologies you are using today are either unable to handle your data needs or become prohibitively expensive for the scale you’ll need to use them at?