The goal of high availability is to provide an uninterrupted user experience with zero data loss, but high availability has many different meanings, depending on who you ask. According to Microsoft’s SQL Server Books Online, “a high-availability solution masks the effects of a hardware or software failure and maintains the availability of applications so that the perceived downtime for users is minimized.” (For more information, see http://msdn.microsoft.com/en-us/library/bb522583.aspx.) Users will often say they need 100% availability, but what exactly does that mean? Does being 100% available mean data is available during business hours, Monday through Friday, or 24 hours a day, 7 days a week? High availability is about setting expectations and then living up to them. That’s why one of the most important things to do when dealing with high availability is to define those expectations in a Service Level Agreement (SLA) agreed on and signed by all parties involved.
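To make the expectation-setting concrete, a small calculation (purely illustrative; the function name and targets are hypothetical, not from any SLA template) shows what an availability percentage translates to in allowed downtime per year:

```python
# Illustrative only: translate an SLA availability percentage into an
# annual downtime budget, so the SLA states a concrete expectation.

MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year

def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by the stated availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% availability allows {downtime_budget_minutes(pct):.1f} minutes/year of downtime")
```

Seeing that "three nines" still permits roughly 526 minutes of downtime a year is often what moves the SLA conversation from "100% available" to numbers everyone can actually sign.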
Issues of invalid data generally occur only when you are creating geometry or geography instances from raw data imported into SQL Server 2012, or created programmatically through one of the builder classes. SQL Server will never return invalid geometries from one of its instance methods, so once you've got a set of clean, valid data in your database, you can generally use it without worrying too much about validity issues. Sometimes, problems apparently caused by the validity requirement are actually due to the wrong geometry type having been selected to represent an item of data. The most common example of this I see is when people use LineString geometries to represent routes using data gathered from GPS devices. The problem in such cases is that, as discussed previously, if any segment of a LineString geometry retraces itself, the LineString will be considered invalid.
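As a rough illustration of the retracing rule (this is a hypothetical sketch in Python, not how SQL Server validates geometries; it only catches a point sequence that immediately doubles back, the typical GPS out-and-back case):

```python
# Hypothetical sketch: flag a coordinate sequence that immediately
# doubles back over the previous segment (A -> B -> A), the pattern
# that makes a GPS-derived LineString invalid as described above.

def retraces_itself(points):
    """True if any point repeats the point two positions before it."""
    return any(points[i] == points[i - 2] for i in range(2, len(points)))

out_and_back = [(0, 0), (1, 0), (2, 0), (1, 0)]   # drives out, then returns
one_way      = [(0, 0), (1, 0), (2, 0)]

print(retraces_itself(out_and_back))  # True  -> would be an invalid LineString
print(retraces_itself(one_way))       # False
```

When a route genuinely does retrace itself, the fix is usually to choose a different geometry type (such as a MultiLineString of the separate legs) rather than to fight the validity rule.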
The first generation of self-service reporting in SSRS was a step toward the robust capabilities in the current product. Report Builder 1.0 was a basic tool introduced with SSRS 2005 that produced a simple but proprietary report with limited capabilities. It was a great tool for its time that allowed users to simply drag and drop data entities and fields from a semantic data model to produce simple reports. Today, the latest version of Report Builder creates reports that are entirely cross-compatible with SSDT and that can be enhanced with advanced features. Consider Report Builder 1.0 yesterday’s news. If you’re using it now, I strongly suggest making the transition to the newer tool set. The 2008 product version introduced Report Builder 2.0, a tool that is equally useful for business users and technical professionals. For user-focused designers, Report Builder 2.0 was simple and elegant. Incremental product improvements over the past few versions have made out-of-the-box report design even easier in Report Builder. Users can design their own queries or simply use data source and dataset objects that have been prepared for them by corporate IT so that they can drag and drop items or use simple design wizards to produce reports. In Report Builder, each report is managed as a single document that can be deployed directly to a folder on the report server or in the SharePoint document library. The version number has been dropped from the Report Builder name; now it is simply differentiated from previous versions by the version of SQL Server that installs it. Figure 1-5 shows the current version of Report Builder (installed with SQL Server 2012) with a map report in design view.
Data management needs have evolved from traditional relational storage to both relational and non-relational storage, and a modern information management platform needs to support all types of data. To deliver insight on any data, you need a platform that provides a complete set of capabilities for data management across relational, non-relational, and streaming data, while being able to seamlessly move data from one type to another and to monitor and manage all your data regardless of its type or structure. Apache Hadoop is the widely accepted Big Data tool; similarly, when it comes to RDBMSs, SQL Server 2012 is perhaps the most powerful in-memory and dynamic data storage and management system. This book enables the reader to bridge the gap between Hadoop and SQL Server; in other words, between the non-relational and relational data management worlds. The book specifically focuses on the data integration and visualization solutions that are available with the rich Business Intelligence suite of SQL Server and their seamless communication with Apache Hadoop and Hive.
If you have done some work in the world of extract, transform, and load (ETL) processes, then you’ve run into the proverbial crossroads of handling bad data. The test data is staged, but all attempts to retrieve a foreign key from a dimension table result in no matches for a number of rows. This is the crossroads of bad data. At this point, you have a finite set of options. You could create a set of hand-coded complex lookup functions using SQL Soundex, full-text searching, or distance-based word calculation formulas. This strategy is time-consuming to create and test, complicated to implement, and dependent on a given language, and it isn’t always consistent or reusable (not to mention that everyone after you will be scared to alter the code for fear of breaking it). You could just give up and divert the row for manual processing by subject matter experts (that’s a way to make some new friends). You could just add the new data to the lookup tables and retrieve the new keys. If you just add the data, the foreign key retrieval issue is solved, but you could be adding an entry into the dimension table that skews data-mining results downstream. This is what we like to call a lazy-add. This is a descriptive, not a technical, term. A lazy-add would import a misspelled job title like “prasedent” into the dimension table when there is already an entry of “president.” It was added, but it was lazy.
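As one lightweight taste of the distance-based word calculation approach mentioned above (an analogy using Python's standard library, not the hand-coded T-SQL or any SSIS component), a fuzzy matcher can pair the misspelling with its intended dimension entry:

```python
# Illustrative analogy: a distance-based match pairs the misspelled
# incoming value with the existing dimension entry, instead of a
# lazy-add creating a duplicate row.
from difflib import get_close_matches

dimension_job_titles = ["president", "vice president", "director"]

incoming = "prasedent"  # the misspelled value from the staged data
match = get_close_matches(incoming, dimension_job_titles, n=1, cutoff=0.6)
print(match)  # ['president']
```

Even this toy version shows the trade-off the paragraph describes: the match works, but the cutoff, the candidate list, and the behavior on ties all become code someone has to maintain and trust.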
Microsoft SQL Server is a complete suite of tools that includes a relational database management system (RDBMS), multidimensional online analytical processing (OLAP) and tabular database engines, a brokering service, a scheduling service (SQL Agent), and many other features. As discussed in Chapter 1, it has become extremely important these days to integrate data between different sources. The advantage that SQL Server brings is that it offers a powerful Business Intelligence (BI) stack, which provides rich features for data mining and interactive reporting. One of these BI components is an Extract, Transform, and Load (ETL) tool called SQL Server Integration Services (SSIS). ETL is a process to extract data, mostly from different types of systems; transform it into a structure that’s more appropriate for reporting and analysis; and finally load it into the database. SSIS, as an ETL tool, offers the ability to merge structured and unstructured data by importing Hive data into SQL Server and to apply powerful analytics on the integrated data. Throughout the rest of this chapter, you will get a basic lesson on how SSIS works and create a simple SSIS package to import data from Hive to SQL Server.
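To make the three ETL steps concrete before diving into SSIS itself, here is a deliberately tiny analogy in Python's standard library (not SSIS, not Hive; the data and table are invented for illustration):

```python
# Toy ETL analogy: extract rows from CSV text, transform them into a
# reporting-friendly shape, and load them into an in-memory SQLite table.
import csv
import io
import sqlite3

raw = "region,amount\neast,100\nwest,250\neast,50\n"

# Extract: parse the source data.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: normalize values and convert types for reporting.
rows = [(r["region"].upper(), int(r["amount"])) for r in rows]

# Load: insert into the destination database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", rows)

print(db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())  # [('EAST', 150), ('WEST', 250)]
```

An SSIS package performs the same three phases, but with graphical data flow components, connection managers, and far richer transformations in place of these few lines.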
In SQL Server 2008 R2, Microsoft invested heavily in Reporting Services. Compared to previous versions, reports were easier for end users to produce and richer to look at. Shared datasets were introduced, as was the report part gallery, both of which reduced the effort required to create a report through re-use of existing objects. In addition, maps, gauges, sparklines, data bars, and KPIs were introduced to make Reporting Services a much more competitive and visually attractive reporting tool. In this chapter, we will start by looking at the features that have been deprecated and then explore the landscape that includes Power View and SharePoint. You will find out about the exciting new Data Alerts and how your users will benefit. Finally, there is good news for those of you who render reports into Excel or Word format, as there have been improvements here too. So without further ado, let's get started.
Most front-end tools such as Excel use a PivotTable-like experience for querying Tabular models: Columns from different tables can be dragged onto the rows axis and columns axis of a pivot table so that the distinct values from these columns become the individual rows and columns of the pivot table, and measures display aggregated numeric values inside the table. The overall effect is something like a Group By query in SQL, but the definition of how the data aggregates up is predefined inside the measures and is not necessarily specified inside the query itself. To improve the user experience, it is also possible to define hierarchies on tables inside the Tabular model, which create multilevel, predefined drill paths. Perspectives can hide certain parts of a complex model, which can aid usability, and security roles can be used to deny specific users access to specific rows of data in tables. Perspectives should not be confused with security, however; even if an object is hidden in a perspective, it can still be queried, and perspectives themselves cannot be secured.
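The Group By effect described above can be sketched in a few lines (a hypothetical illustration with invented sample data, not DAX or any Tabular API): distinct values of the grouping column become the rows, and the predefined measure, here a simple sum, aggregates within each one.

```python
# Hypothetical sketch of the Group By effect: distinct values of one
# "axis" column become rows, and a measure aggregates inside each row.
from collections import defaultdict

fact_rows = [
    ("Bikes",   2011, 100),
    ("Bikes",   2012, 150),
    ("Helmets", 2011, 40),
]

# The "measure" (a sum here) is predefined; the query only chooses
# which column to group on, just as a pivot table does.
totals = defaultdict(int)
for category, year, amount in fact_rows:
    totals[category] += amount

print(dict(totals))  # {'Bikes': 250, 'Helmets': 40}
```

In a real Tabular model the aggregation logic can be far more elaborate than a sum, which is exactly why it lives in the measure definition rather than in the query the front-end tool generates.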
I’ve been working with Microsoft SQL Server since version 6.5 and was introduced to performance tuning and high-intensity database management in SQL Server 7 back in 2000. The environment at that time was a SQL Server 7 implementation clustered on a Compaq SAN and pulling in 1 to 4 gigabytes (GB) per day, which was considered a great deal for a SQL Server back then. Performance tuning incorporated what appeared to many at the time as voodoo. I found great success only through the guidance of great mentors while being technically trained in a mixed platform of Oracle and SQL Server. Performance tuning was quickly becoming second nature to me; it was something whose benefits and power I seemed to intuitively and logically comprehend. Even back then, many viewed SQL Server as the database platform anyone could install and configure, yet many soon came to realize that a “database is a database,” no matter what the platform is. This meant the obvious: the natural life of a database is growth and change. So, sooner or later, you were going to need a database administrator to manage it and tune all aspects of the complex environment.
Microsoft SQL Server 2012 is a vast subject. One part of the ecosystem of this powerful and comprehensive database, one which has evolved considerably over many years, is data integration, or ETL if you want to use another virtually synonymous term. Long gone are the days when BCP was the only available tool to load or export data. Even DTS is now a distant memory. Today the user is spoilt for choice when it comes to the plethora of tools and options available to get data into and out of the Microsoft RDBMS. This book is an attempt to shed some light on many of the ways in which data can be both loaded into SQL Server and sent from it into the outside world. I also try to give some ideas as to which techniques are the most appropriate to use when faced with various challenges and situations.
For example, if you are new to administering a SQL Server environment, SSIS provides you with the tools needed to perform several administrative tasks, including rebuilding indexes, updating statistics, and backing up databases, which make up the primary list of maintenance items that should be performed on any database. Without SSIS, as a new administrator you could spend a lot of time writing T-SQL just to get these activities running on a regular basis. But this is not the extent of the capabilities of SSIS for administrators. How often are you asked for an export of data to Microsoft Excel or to move data from one server to another? Using SSIS, you can quickly export or import data from various sources, including Excel, text files, Oracle, and DB2.
This book is designed to be the one resource you turn to whenever you have questions about SQL Server administration. To this end, the book zeroes in on daily administration procedures, frequently used tasks, documented examples, and options that are representative while not necessarily inclusive. One of the key goals is to keep content concise enough that the book is compact and easy to navigate, while also ensuring that the book contains as much information as possible. Instead of a 1,000-page tome or a 100-page quick reference, you get a valuable resource guide that can help you quickly and easily perform common tasks, solve problems, and implement advanced SQL Server technologies such as replication, distributed queries, and multiserver administration.
Building a platform is very different from building a solution. In fact, the goals are in many cases completely opposed. A platform is successful if developers and administrators have complete access to all aspects of the product. They need to be able to optimize, extend, restrict, embed, and replace parts of the product to meet their needs. This means that all of the APIs are available and documented, all formats are open and described, and every component is configurable or replaceable. While there are always restrictions due to the many tradeoffs in software design, this was the goal when building Reporting Services. Much like Windows, SQL Server, or Visual Studio, Reporting Services is designed to enable developers to build on a solid foundation and mold it to meet their business needs in significantly less time and with more functionality, but without losing the flexibility and power of building it themselves.
A multilateral agreement for trade in goods has existed since 1947 through an agreement known as the General Agreement on Tariffs and Trade (GATT), which spurred the growth of international trade throughout the world. It was only almost half a century later that trade in services was integrated into the multilateral trading system, through the General Agreement on Trade in Services (GATS), signed in 1994, which came into force on 1 January 1995. GATS was negotiated and concluded under the Uruguay Round of multilateral trade negotiations, which also resulted in the establishment of the World Trade Organisation (WTO). GATS lays the framework for international obligations and disciplines on regulating trade in services. It binds the commitment of WTO members to a certain degree of market opening in various services sectors and subsectors, as stipulated in their respective so-called schedules of commitments. It also defines standards of transparency (such as the obligation for WTO members to publish all measures falling under the agreement) and several other disciplines on good governance for the services sectors.
The functions OPENROWSET and OPENDATASOURCE are most commonly used to pull data into SQL Server to be manipulated. They can, however, also be used to push data to a remote SQL Server. OPENROWSET can be used not only to execute SELECT statements, but also to execute UPDATE, INSERT, and DELETE statements on external data sources. Performing data manipulation on remote data sources is less common and works only if the OLE DB provider supports this functionality. The SQLOLEDB provider supports all of these statements.