Cloud Data Warehousing

arliyanti n

Academic year: 2023


In more recent times, alternative approaches have emerged, such as different forms of the data lake. To remain efficient and competitive, organizations must be able to harness the power of the vast amounts of data that are constantly being generated and perform complex analytics on that data.

Getting Up to Speed on Cloud Data

The Cloud: A key factor driving the evolution of the modern data warehouse is the cloud. The data warehouse is hosted on hardware installed in a data center managed by the vendor.

FIGURE 1-1: Traditional systems caused the cloud data warehouse to emerge.

Learning Why the Modern Data

With the majority of an organization's data now in the cloud, the natural place to integrate this data is also in the cloud. In this section, we focus on the key technologies that should be part of any modern data warehouse.

FIGURE 2-1: The modern data warehouse enables all data for all users.

The Criteria for Selecting a Modern

A modern data warehouse should eliminate the need for upfront design and modeling of rigid, traditional structures that would require transformation of semi-structured data before loading. A modern data warehouse should automatically integrate your semi-structured data, once confined to NoSQL systems, with the structured data that is part of a traditional corporate relational database. A modern data warehouse must be designed with leading technology, but built on inclusive and established standards (such as SQL), and must be compatible with other skills and tools commonly available in the industry, such as Spark and the Python and R languages.

Simplify its data pipeline without the need to design a new model for each new type of data loaded into its data warehouse. Furthermore, building a data warehouse that meets business requirements and takes full advantage of today's data volume and variety often comes at a high cost for any organization. A modern data warehouse must manage itself when it comes to ensuring system stability, resilience and availability.

The data warehouse must use a hierarchical key wrapping approach, which encrypts the encryption keys, as well as a robust key rotation process, which limits the number. In addition, the solution provider of a modern cloud data warehouse must perform periodic security tests, known as penetration testing, to proactively check for vulnerabilities. A modern data warehouse should reduce the overall complexity of the process to move data through the data pipeline faster.

On-Premises versus Cloud

Done right, a cloud data warehouse can be up and running in weeks or just a few months. In addition, an on-premises data warehouse requires specialized information technology (IT) personnel to deploy and maintain the system. A cloud data warehouse replaces the initial CapEx and ongoing costs of an on-premises system with simple OpEx usage-based pricing.

Conservatively speaking, the annual cost for a cloud data warehouse solution can be one-tenth that of a similar on-premises system. The capabilities of the data warehouse are identical to the same software deployed with on-premises hardware. Alternatively, any data warehouse solution built for the cloud must capitalize on the benefits of the cloud (see Figure 5-1).

In most cases, this non-relational data must be transformed before being loaded into a traditional on-premises or cloud data warehouse. A cloud data warehouse that requires manual reconfiguration requires careful planning and coordination with the vendor to scale the resources. Cloud data warehouse services vary in the extent to which the customer is responsible for availability and resiliency.
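The transformation step mentioned above can be sketched as follows. This is a minimal illustration, not any vendor's loading process: Python's built-in `sqlite3` stands in for a relational warehouse, and the event documents, the `events` table, and the `flatten_event` and `load_events` helpers are all hypothetical names chosen for the example.

```python
import json
import sqlite3

# Hypothetical semi-structured events; note the second one lacks "country"
# and the first one lacks "amount".
raw_events = [
    '{"user": {"id": 1, "country": "ID"}, "action": "login"}',
    '{"user": {"id": 2}, "action": "purchase", "amount": 19.5}',
]


def flatten_event(raw: str) -> tuple:
    """Map one JSON document onto the fixed columns of the target table."""
    doc = json.loads(raw)
    user = doc.get("user", {})
    # Missing fields become None, which sqlite3 stores as NULL.
    return (user.get("id"), user.get("country"), doc.get("action"), doc.get("amount"))


def load_events(conn: sqlite3.Connection, raws: list) -> None:
    """Create the rigid relational table and load the flattened rows."""
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, country TEXT, action TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [flatten_event(raw) for raw in raws],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load_events(conn, raw_events)
rows = conn.execute("SELECT user_id, action FROM events ORDER BY user_id").fetchall()
print(rows)
```

Every new field in the source documents would require a change to both the table definition and the flattening function, which is the kind of upfront modeling effort a warehouse with native semi-structured support aims to avoid.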

Check how many "9s" of availability the cloud data warehouse solution supports (99.9XX percent uptime). At a basic level, a cloud data warehouse solution built on legacy, on-premises technology still requires the customer to manage all of these aspects.
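To make the "9s" concrete, an uptime percentage can be converted into the amount of downtime it permits. The sketch below assumes a 365-day year and uses a hypothetical helper name, `allowed_downtime_minutes`; real service-level agreements may define the measurement period and exclusions differently.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_hours: float = 365 * 24) -> float:
    """Return the downtime budget, in minutes, implied by an uptime percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours * 60


# Compare three, four, and five nines over one year (assumed 365 days).
for pct in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Each additional 9 cuts the yearly downtime budget by a factor of ten, so the difference between two offerings that both advertise "99.9XX percent" can be substantial.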

FIGURE 5-1: How a cloud-optimized architecture streamlines performance.

Enabling Data Sharing

Traditional data sharing methods, such as File Transfer Protocol (FTP), cloud storage (Amazon S3, Box, Dropbox, and others), application programming interfaces (APIs), and email, require you to make a copy of the shared data and send it to your data consumers. New data sharing technology enables organizations to easily share pieces of their data, and receive shared data, in a secure and controlled way. A multitenant cloud-built data warehouse provides the ideal platform for a data sharing service because it enables authorized members of a cloud ecosystem to access live, read-only versions of the data.

Business Insights: Having more complete data improves collaboration and provides better business insights as data sharing becomes the norm. Data services: The company uses internal data sets to also provide customers with data augmentation services such as data modeling, data enrichment and data analytics. Data sharing: The company looks for ways to improve its data products by sourcing external data and offering its data products to a broader audience, usually through a data marketplace or data exchange.

With the right data sharing architecture, you can easily analyze more of your data to discover new products, services, and market opportunities. According to Sean Howard, senior vice president of product development at Environics, having a secure data sharing service provides a convenient data delivery mechanism and tremendous opportunity for increasing revenue. The secure data sharing service increases customer loyalty, reduces handling costs and eliminates unnecessary file transfers, while dramatically simplifying version control.

FIGURE 6-1: An efficient architecture for real-time data sharing.

Maximizing Options with a

In the following sections, we review the technologies that make a cross-cloud data warehouse possible. Work with a data warehouse vendor that has done the hard work to resolve the differences between cloud configurations and built its solution on a common code base that spans all clouds. Your data warehouse platform should enable cross-regional and cross-cloud replication without reducing the performance of operations on your primary data.

Ask your data warehouse vendor if they support direct access and recovery for databases of any size, in any cloud, in any region. Find out if your data warehouse provider replicates databases and keeps them in sync across cloud platforms and regions. Moving data and workloads between geographic regions and clouds is easier with a cross-cloud architecture.

Data portability simplifies regulatory compliance if your industry requires your data to remain within a specific country or region. As your business grows, you may want to locate your data processing operations in the regions you serve. Advanced replication technology makes it easy to share data across many regions and different vendor clouds without having to set up data pipelines, copy data, or resolve security discrepancies.

FIGURE 7-1: Global data replication ensures business continuity during outages.

Securing Your Data

A cloud data warehouse service must always authorize users, validate credentials, and grant users access only to data for which they are authorized. Access control must be applied to all database objects including tables, schemas, and any virtual extensions in the data warehouse. Single sign-on procedures and federated authentication make it easy for people to log into the data warehouse service directly from other sanctioned applications.
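The access-control principle described above (grant users access only to the objects they are authorized for, and deny everything else) can be illustrated with a small role-based model. This is a simplified sketch, not how any particular warehouse implements authorization; the `AccessPolicy` class, role names, and object names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    # Maps a role name to the set of (object, privilege) pairs it holds.
    grants: dict = field(default_factory=dict)
    # Maps a user name to the set of roles assigned to that user.
    user_roles: dict = field(default_factory=dict)

    def grant(self, role: str, obj: str, privilege: str) -> None:
        """Give a role one privilege on one database object."""
        self.grants.setdefault(role, set()).add((obj, privilege))

    def assign(self, user: str, role: str) -> None:
        """Assign a role to a user."""
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, obj: str, privilege: str) -> bool:
        """Deny by default; allow only if one of the user's roles holds the grant."""
        return any(
            (obj, privilege) in self.grants.get(role, set())
            for role in self.user_roles.get(user, set())
        )


policy = AccessPolicy()
policy.grant("analyst", "sales.orders", "SELECT")
policy.assign("dewi", "analyst")

print(policy.is_allowed("dewi", "sales.orders", "SELECT"))   # granted through the analyst role
print(policy.is_allowed("dewi", "sales.orders", "INSERT"))   # no such grant
print(policy.is_allowed("guest", "sales.orders", "SELECT"))  # user has no roles
```

The key design choice is the default: a user or object that the policy has never seen is refused rather than allowed.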

Your cloud data warehouse provider should not have access to unencrypted customer data unless you expressly grant that access. Some providers also offer dedicated virtual private networks (VPNs) and bridges from a customer's systems to the cloud data warehouse. These dedicated services ensure that the most sensitive components of your data warehouse are completely separate from those of other customers.

It's also about ensuring that your data warehouse provider can demonstrate that they have the required security procedures in place. Your cloud data storage vendor should have procedures in place to protect against accidental or intentional destruction. Security should be the foundation of a data warehouse service; you shouldn't need to do anything extra to protect your data.

FIGURE 8-1: Verify that all data traffic is encrypted and secure, and that your cloud providers hold all relevant certifications.

Minimizing Your Data Warehouse Costs

You also shouldn't pay for database cloning within your data warehouse for development and testing activities. You should be able to reference your data multiple times, not copy it, and therefore not have to pay extra for storage. Your cloud data warehouse should also allow you to store and query structured and semi-structured data such as JSON.

Finally, look for a vendor that offers multi-cloud capabilities, as that can save future costs if you migrate your data warehouse to another cloud storage environment. Compute resources are more expensive than storage resources, so your data warehouse service should allow you to scale each resource individually and make it easy to provision exactly the compute resources you need under a usage-based pricing model. The vendor should only bill you for the resources you use (down to the second) and automatically suspend the computing resources when you stop using them to avoid unnecessary costs.
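The effect of per-second billing with automatic suspension can be estimated with simple arithmetic. The sketch below is illustrative only: the hourly rate, the session lengths, and the function names are assumptions, and actual vendors may apply minimum charges or different billing units.

```python
def usage_based_cost(active_seconds_by_session: list, rate_per_hour: float) -> float:
    """Bill only for the seconds the cluster was running; suspended time costs nothing."""
    total_seconds = sum(active_seconds_by_session)
    return total_seconds * rate_per_hour / 3600


def always_on_cost(hours_provisioned: float, rate_per_hour: float) -> float:
    """Cost of keeping the same cluster running for the whole period."""
    return hours_provisioned * rate_per_hour


# Hypothetical workday: three query bursts on a cluster priced at 4.00 per hour.
sessions = [90, 600, 1800]  # seconds of actual compute use
rate = 4.0

print(f"Usage-based: {usage_based_cost(sessions, rate):.2f}")
print(f"Always on for 8 hours: {always_on_cost(8, rate):.2f}")
```

Under these assumed numbers the cluster is busy for well under an hour, so paying only for active seconds costs a small fraction of keeping it running all day.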

Flexible terms should also allow you to “right-size” your compute clusters according to each workload. If you need to test new machine learning modules, you can use a large cluster. Workloads won't slow down, or even grind to a halt, thanks to compute clusters dedicated to each workload.

Six Steps to Getting Started with

Align with existing skills, tools and processes: Which tools and skills will your team use for different cloud data warehouse options? How many users should be able to access the data warehouse but don't today due to resource constraints? Do you have focused expertise in data warehouse development and testing, or a DevOps team that makes it easy?

If you have a large and complex traditional data warehouse, migrate a small part of the system to make it easier to use the cloud data warehouse. Once you've determined your data warehouse needs and success criteria, you're ready to begin evaluating solutions. To increase productivity and performance, White Ops implemented a cloud data warehouse with SQL as the base language and delivered as a service.

Optimizes time to value so you can take advantage of your new data warehouse as soon as possible. If you choose a cloud data warehouse based on price, consider the TCO for a conventional data warehouse, which includes licensing costs (usually based on the number of users); hardware (servers, storage devices, networking); and data center costs (office space, electricity, administration, maintenance and ongoing management). Calculating the cost of cloud storage options is usually easier, but varies by vendor.
