SQL Server is designed to work best on sets of data. By definition, sets of data are unordered; it is not until the final ORDER BY clause that the results of a query become ordered. Windowing functions allow your query to apply a function to only a subset of the rows your query returns. In doing so, they let you specify an order within your unordered data set before the final result is ordered. This allows processes that previously required self-joins, inefficient inequality operators, or non-set-based row-by-row processing to use set-based processing instead.
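As a sketch of the idea, the following query computes a per-customer running total in a single set-based statement, where older approaches needed a self-join with an inequality predicate. The dbo.Sales table and its columns are hypothetical, and the ROWS clause requires SQL Server 2012 or later:

```sql
-- Hypothetical table: dbo.Sales(CustomerID, OrderDate, Amount)
-- The OVER clause orders rows *within* each partition for the
-- calculation, independently of the query's final ORDER BY.
SELECT CustomerID,
       OrderDate,
       Amount,
       SUM(Amount) OVER (PARTITION BY CustomerID
                         ORDER BY OrderDate
                         ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM   dbo.Sales
ORDER BY CustomerID, OrderDate;
```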
As this chapter has tried to demonstrate, a wide variety of methods is available for taking XML source files and loading them into SQL Server. In some cases, the choice will depend on your objectives: if you want to load the file “as is,” without shredding the data into its component parts, then clearly OPENROWSET (BULK) could be the best solution. If, however, the source file is being used as a medium for data transfer, then you have a wider set of options available. If you are basing your ETL process around T-SQL, you could find that using SQL Server’s XQuery support is the way to go. If, on the other hand, you are more “SSIS-centric,” then the SSIS XML task can be an excellent solution in many cases. For really large source files, or where speed is of the essence, SQLXML Bulk Loader is possibly the only viable option.
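As a minimal illustration of the “as is” approach followed by T-SQL/XQuery shredding, consider the following sketch; the file path, staging table, and XML element names are all hypothetical:

```sql
-- Hypothetical staging table: dbo.XmlStaging(SourceFile, XmlData XML)
-- Load the whole file "as is" into a single XML value.
INSERT INTO dbo.XmlStaging (SourceFile, XmlData)
SELECT 'invoices.xml',
       CONVERT(XML, BulkColumn)
FROM   OPENROWSET(BULK 'C:\Data\invoices.xml', SINGLE_BLOB) AS src;

-- Alternatively, shred the document with XQuery into rows and columns.
SELECT inv.value('(@InvoiceID)[1]', 'INT')     AS InvoiceID,
       inv.value('(Total/text())[1]', 'MONEY') AS Total
FROM   dbo.XmlStaging
CROSS APPLY XmlData.nodes('/Invoices/Invoice') AS x(inv);
```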
However, the Achilles' heel of this ability has been that the trace could only be replayed from a single machine, which capped its scalability. This limits the usefulness of the tool, because it cannot realistically simulate heavy, mission-critical loads; in turn, you don't always receive meaningful results. Thankfully, this is no longer the case. In SQL Server 2012, Microsoft introduces Distributed Replay, which allows DBAs to run that load concurrently from up to 16 client machines. If you are thinking "only 16?", then consider 16 clients x 512 threads per client, or over 8,000 concurrent connections on the Enterprise Edition of SQL Server 2012. This is a very useful feature indeed, not only for load testing but also for evaluating applications before upgrading to SQL Server 2012; it can also prove beneficial prior to hardware and Windows upgrades. Because you can now simulate a load coming from multiple clients instead of just one, you can have more confidence that everything will work when you upgrade or roll out. If it does not, at least you will know before it affects your users, and you can investigate without impacting those concerned.
Similar to Analysis Services multidimensional cubes, tabular models can be used to aggregate large volumes of business data. OLAP cubes and models are also much easier for users to explore and browse using self-service reporting tools. BISM tabular models are based on the same in-memory aggregation technology as Microsoft’s PowerPivot add-in for Excel. PowerPivot models can be created with the PowerPivot add-in for Excel and published to SharePoint to be shared and hosted as report data sources. Excel may be used to browse these models using pivot tables and charts. In addition to ad hoc reporting and cube browsing, standard Reporting Services reports may be used to report on cube data using a special query language called Multidimensional Expressions (MDX). Self-service BI solutions put the power of data analysis in the hands of business users. Enabling effective analysis has required information technology groups, already stressed by resource constraints, to design enterprise BI solutions that require specialized skills and extensive planning. Recent innovations in self-service BI tools such as Microsoft PowerPivot, tabular models, and Power View have bridged this gap. Tabular semantic models can serve up and aggregate large volumes of business data for browsing and reporting, with the added benefit of being completely server-hosted and secure. The tabular model technology actually utilizes the Analysis Services storage engine. The entire model is loaded into server memory to aggregate and return results very quickly.
The first step in any database design project is to develop a naming standard that will be used during the design process. While naming standard development is definitely not a requirement, continuing without some standard could yield an unorganized database that may present challenges to developers when accessing the data. Inconsistent naming conventions often inhibit the development process indirectly. For a developer who is writing T-SQL to modify or retrieve data, naming standards provide clear paths to constructing T-SQL statements. For example, assume that you are designing a database that will store human resources data. You are asked to create a structure that houses information about individual employees, such as their name, address, phone number, and department. Assume that you have designed the database shown in Figure 5-1.
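One possible convention for the employee structure just described is shown below, purely as an illustration; the schema, names, and types are assumptions, not taken from the text:

```sql
-- Illustrative convention: singular PascalCase entity names,
-- primary key named <Table>ID, and foreign keys that repeat the
-- referenced primary key's name.
CREATE TABLE HumanResources.Employee (
    EmployeeID   INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Employee PRIMARY KEY,
    FirstName    NVARCHAR(50) NOT NULL,
    LastName     NVARCHAR(50) NOT NULL,
    PhoneNumber  VARCHAR(25)  NULL,
    DepartmentID INT NOT NULL
        CONSTRAINT FK_Employee_Department
        REFERENCES HumanResources.Department (DepartmentID)
);
```

With a convention like this, a developer writing T-SQL can predict key and constraint names without consulting the schema each time.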
SSIS tasks are the foundation of the Control Flow in SSIS. When you are on the Control Flow design surface in SSDT, the SSIS Toolbox is populated with a set of task components that can be snapped together to represent a workflow for your package. You’ll also learn later, in Chapter 17, how tasks can react to failures in the package in the Event Handler tab. A task is a discrete unit of work that can perform typical actions required by an ETL process, from moving a file and preparing data sources to sending e-mail confirmations when everything is complete. This is most evident in the fact that the Data Flow is tied to the Control Flow with a specific Data Flow Task. More advanced tasks enable you to perform actions like executing SQL commands, sending mail, running VB or C# code, and accessing web services. The SSIS Toolbox contains a large list of out-of-the-box tasks that you can use for ETL package development. Most of the tasks are covered in this chapter, with some in less detail because they are covered in other chapters. Two exceptions are the Looping and Sequence Containers, which are covered separately in Chapter 4. This chapter introduces you to most of the tasks you’ll be using on a frequent basis and provides some examples that demonstrate how to use them. This material is reinforced as you read through the rest of the book, because each of these tasks is used in at least one further example in subsequent chapters.
Most front-end tools such as Excel use a PivotTable-like experience for querying Tabular models: Columns from different tables can be dragged onto the rows axis and columns axis of a pivot table so that the distinct values from these columns become the individual rows and columns of the pivot table, and measures display aggregated numeric values inside the table. The overall effect is something like a Group By query in SQL, but the definition of how the data aggregates up is predefined inside the measures and is not necessarily specified inside the query itself. To improve the user experience, it is also possible to define hierarchies on tables inside the Tabular model, which create multilevel, predefined drill paths. Perspectives can hide certain parts of a complex model, which can aid usability, and security roles can be used to deny access to specific rows of data from tables to specific users. Perspectives should not be confused with security, however; even if an object is hidden in a perspective it can still be queried, and perspectives themselves cannot be secured.
Microsoft SQL Server is a complete suite of tools that includes a relational database engine, multidimensional OLAP and tabular database engines, as well as other services such as a broker service, a scheduling service (SQL Agent), and many more. As discussed in Chapter 1, Introduction to Big Data and Hadoop, it has become extremely important these days to integrate data between different sources. SQL Server also offers a powerful business intelligence stack, which provides rich features for data mining and interactive reporting. One of these BI components is an extract, transform, and load (ETL) tool called SQL Server Integration Services (SSIS). SSIS offers the ability to merge structured and unstructured data by importing Hive data into SQL Server and to apply powerful analytics on the integrated data. Throughout the rest of this chapter, we will build a basic understanding of how SSIS works and create a simple SSIS package to import data from Hive to SQL Server.
Disaster-recovery and business-continuity planning are important requirements for many organizations, both from a customer-service and a regulatory perspective. Data loss and system disasters can negatively impact an organization or even permanently shut it down; hence the need for disaster-recovery sites. Regrettably, many organizations still do not have a disaster-recovery site because of the high costs and maintenance challenges. To address these challenges, SQL Server 2012 supported secondary replicas on Windows Azure Virtual Machines. This meant that organizations could manage disaster recovery by building hybrid platforms using Microsoft’s cloud platform, known as Windows Azure. Organizations indicated that this opportunity addressed their disaster-recovery requirements, but it also created another issue: the configuration process for database administrators was manual and at times cumbersome. The product group responded to the feedback and has introduced the Add Azure Replica wizard in SQL Server 2014. The wizard automates and simplifies the deployment of on-premises replicas to Windows Azure. When configuring the replicas, a database administrator can choose any Windows Azure data center around the world; in practice, however, latency and regulatory considerations usually make the data center closest to your own the best location for the replicas.
SQL Server 2005 is a client-server database. Typically, the SQL Server 2005 database engine is installed on a server machine to which you connect anything from a few machines to many hundreds or thousands of client machines. A client-server architecture can handle large amounts of data better than a desktop database such as Microsoft Access. The SQL Server instance provides security, availability, and reliability features that are absent from databases such as Access. A client-server architecture can also reduce network traffic. The server side of a SQL Server installation is used for two broad categories of data processing: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP).
The most remarkable change in the industry over the past few years has been the opportunity, and the need, to exchange information over the Internet. Previous technologies simply didn't provide the means to access application components across the Internet. Component architectures such as COM, DCOM, and CORBA were designed to communicate across secured LAN and WAN systems, which required a substantial infrastructure investment. Connecting business trading partners and even regional sites was often cost prohibitive and logistically infeasible. Few options existed for reporting over the web. At best, a list or table filled with data could be viewed in custom-built, server-side web page solutions using ASP or CGI. Each page had to be carefully designed and scripted at the cost of dozens, or sometimes hundreds, of programming hours.
The development of technology makes things easier for people. As we know, technological development has an impact on every aspect of life, including education. Examinations have long been used, and are still used today, to test students' abilities. To make the process more efficient and faster, Aplikasi Ujian (an examination application) was created. The technologies used are C#, SQL Server 2008 R2, ERD, DFD, and UML.
Although you can use an insert operation to prepopulate a FILESTREAM field with a null value, an empty value, or a limited amount of inline data, a large amount of data is streamed more efficiently into a file that uses Win32 interfaces. Here, the Win32 interfaces work within the context of a SQL Server transaction, and you use the PathName intrinsic function to obtain the logical Universal Naming Convention (UNC) path of the BLOB file on the file system. You then use the OpenSqlFilestream application programming interface (API) to obtain a file handle and operate on the BLOB via the file system by using the following Win32 file streaming interfaces: ReadFile, WriteFile, TransmitFile, SetFilePointer, SetEndOfFile, and FlushFileBuffers. Close the handle by using CloseHandle. Because file operations are transactional, you cannot delete or rename FILESTREAM files through the file system.
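The T-SQL side of that workflow looks roughly like the following. The table and column names are hypothetical, but PathName() and GET_FILESTREAM_TRANSACTION_CONTEXT() are the functions a client calls before invoking OpenSqlFilestream:

```sql
BEGIN TRANSACTION;

-- Obtain the logical UNC path of the BLOB and the transaction
-- context; both are passed to OpenSqlFilestream on the client side.
SELECT Photo.PathName()                     AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM   dbo.EmployeePhoto
WHERE  EmployeeID = 1;

-- ...the client streams the BLOB via ReadFile/WriteFile and closes
-- the handle with CloseHandle before the transaction ends...

COMMIT TRANSACTION;
```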
The geometry_columns and spatial_ref_sys tables store metadata about any spatial data that has been imported by OGR2OGR into a destination database. If, at some point in the future, you were to use this same data as the source of another transformation (suppose you were to export the precincts dataset from SQL Server to KML), then OGR2OGR would refer to these tables for additional information used in the conversion. In essence, spatial_ref_sys is OGR2OGR's own version of SQL Server's sys.spatial_reference_systems table. However, such information is not essential and can always be re-created, so you can ignore these tables for now (or even delete them if you choose, although be aware that they will be re-created the next time you import any data).
Standard Edition is another edition licensed for production use, without all the features of Enterprise Edition, but it is built to provide ease of use and manageability. Standard Edition is run in environments where you have determined that the features provided only in Enterprise Edition are not needed to meet the current and future requirements of all applications running on the server. Let’s be honest: Asking for or receiving detailed requirements from management, customers, or clients probably will not happen all the time. (You will be lucky if you can make them out through the beer stains on the napkin.) Therefore, when it comes down to determining the edition that meets the bare-bones requirements you receive, you may have to go back to the requirements provider to ensure all the documentation is accurate and complete. Try asking the requirements provider a series of questions in different ways to help you determine what the real requirements are for the application. That way you will feel comfortable supporting the application on the edition of SQL Server chosen. Standard Edition is significantly cheaper than Enterprise Edition, so be wary of management wanting to install Standard Edition even though the application needs Enterprise Edition features.
I’ve been working with Microsoft SQL Server since version 6.5 and was introduced to performance tuning and high-intensity database management in SQL Server 7 back in 2000. The environment at that time was a SQL Server 7 implementation clustered on a Compaq SAN and pulling in 1 to 4 gigabytes (GB) per day, which was considered a great deal for SQL Server back then. Performance tuning at that time incorporated what appeared to many as voodoo. I found great success only through the guidance of great mentors while being technically trained in a mixed platform of Oracle and SQL Server. Performance tuning was quickly becoming second nature to me; it was something whose benefits and power I seemed to comprehend intuitively and logically. Even back then, many viewed SQL Server as the database platform anyone could install and configure, yet many soon came to realize that a “database is a database,” no matter what the platform. This meant the obvious: the natural life of a database is growth and change. So, sooner or later, you were going to need a database administrator to manage it and tune all aspects of the complex environment.