This section outlines the BI system implementation for XYZ consulting. Each subsection follows the corresponding process of the proposed framework outlined in the previous chapter. This section also discusses the specific technologies that were used in the implementation of the BI system.

The framework's initial phase is to assess the company's readiness for a BI system and to collect the BI system's requirements. The major aspects for determining a company's readiness are culture, individuals, management, and strategy, all of which are described in Section 2.2.1.

XYZ consulting comprises a diverse group of skilled experts who are already acquainted with data analytics. In addition, XYZ consulting collects over 1 billion data points every day from numerous mining shafts. Various project teams use this data when working on their initiatives. This is a positive indication that the employees of XYZ consulting are already accustomed to using data to complete their tasks.

Furthermore, following discussions with key stakeholders at XYZ consulting, the following functional requirements for the BI system were provided:

• The BI system must be able to integrate with the data already collected by XYZ consulting.

• Users must be able to generate custom interactive reports in the BI system.

• The BI system must make it simple to embed BI reports on XYZ consulting's web application, which its clients already utilise.

After the requirements have been identified, the next step is to build a data warehouse. XYZ consulting has already set up a data warehouse to store sensor data from numerous mine sites. This section will go over the implementation of the data warehouse in great depth.

The data warehouse is made up of data points generated by sensors and monitoring equipment on several mining sites. The sensors monitor various pieces of equipment for various initiatives; water pumps, compressed air systems, air networks, and fuel meter readings are examples. Most of this data is measured at a set interval, which yields sets of time-series data points.

It is evident from the preceding statement that the data that XYZ consulting is interested in consists of consumption measurements associated with mining activities. As a result, an individual consumption measurement can be viewed as the grain of XYZ consulting’s data warehouse.

A framework for implementing a scalable business intelligence system

Each measured value has a unique set of features, namely:

• The name of the entity being measured - e.g., Mine_A_Level_1_Water_Pump

• The interval at which this value is expected - e.g., every 30 minutes

• The unit of measurement for the value - e.g., Litres (L)

• The source of the value - e.g., CSV file

• An optional description of the value - e.g., Measures the amount of water that is consumed on Level 1 in Mine A

These features represent the dimensions of the measurements. The facts that are captured with each measurement include:

• The value of the actual measurement

• The time of the actual measurement

As a result, the data model illustrated in Figure 3-1 is created.

Figure 3-1 High level model for XYZ Consulting’s data warehouse

MongoDB13 was used to implement the physical database. MongoDB is a document-oriented database that stores data as documents of key-value pairs. This means that fields can differ between documents and that the data structure can change over time, which makes MongoDB well suited for scalability and flexibility.

The database is cloud hosted and uses a replica set architecture. The MongoDB Atlas interface is used to access the database. MongoDB supports efficient real-time data analysis, making it well suited to Big Data workloads.

Another advantage of MongoDB's scalability is that it can run across numerous servers, balancing the load and/or replicating data to keep the system operational in the event of hardware failure. Moreover, MongoDB is commonly utilised for high-volume data storage.

13 MongoDB: The application data platform. Source: https://www.mongodb.com/

As a result, the dimensions are saved as documents in a collection called 'Tags,' and the facts are stored in a collection called 'Values'.
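Assuming this layout, a single document in each collection might look as follows. This is an illustrative sketch only; the field names are assumptions, not XYZ consulting's actual schema.

```python
# Illustrative document shapes for the two collections. In MongoDB, each
# dict below would be stored via insert_one() on its collection.
tag_doc = {                                   # one document in 'Tags'
    "name": "Mine_A_Level_1_Water_Pump",      # entity being measured
    "intervalMinutes": 30,                    # expected measurement interval
    "unit": "L",                              # unit of measurement
    "source": "CSV file",                     # where the value comes from
    "description": "Water consumed on Level 1 in Mine A",  # optional
}

value_doc = {                                 # one document in 'Values'
    "tagName": "Mine_A_Level_1_Water_Pump",   # links the fact to its dimension
    "value": 1523.4,                          # the actual measurement
    "timestamp": "2021-06-01T08:00:00Z",      # the time of the measurement
}
```

A fact is joined to its dimension through the tag name, mirroring the dimension/fact split of Figure 3-1.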

The actual data is derived from several operational data source systems. These include XYZ consulting’s custom applications, such as manual data input systems and automated monitoring systems. As a result, diverse data formats from various source systems are collected.

As a result, an ETL tool was developed in C# using Microsoft Visual Studio14. The ETL tool is a simple console application that scans the files in a folder and populates the appropriate MongoDB collections.

Figure 3-2 displays the operations that occur for the data to arrive at XYZ consulting’s database.

The measurement is collected from the mine system by the specified source system. The data is then sent in a file from the source system to XYZ consulting’s server, where the ETL program runs. Each mine group has its own mailbox folder for incoming data files. When a file arrives in any of the mailbox folders, the ETL tool is launched and the files in those folders are processed.

14 Visual Studio tutorials | C#. Source: https://docs.microsoft.com/en-us/visualstudio/get-started/csharp/?view=vs-2019

Figure 3-2 ETL flow diagram (collect data → check for incoming data file → if a file is found: extract data → if the file is valid: transform and load data)
The tag name in the file is used by the ETL tool to identify the incoming data. The values are processed after the tag is validated to exist in the database. Another validation step confirms that the incoming values are on the correct time interval and are, in fact, numbers.

After the values have been validated, they are converted to documents that correspond to the value collection structure and loaded into the value collection.
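The validate-and-transform step described above can be sketched as a single function. The known-tags lookup, the field names, and the simplified minute-based interval check are assumptions for illustration, not the actual C# implementation.

```python
from datetime import datetime

def transform_row(tag_name, timestamp, raw_value, known_tags):
    """Validate one incoming measurement and convert it to a document for
    the 'Values' collection. Returns None if any validation step fails."""
    # 1. The tag must already exist in the 'Tags' collection.
    tag = known_tags.get(tag_name)
    if tag is None:
        return None
    # 2. The measurement must fall on the tag's expected interval
    #    (simplified check: the minute must be a multiple of the interval).
    if timestamp.minute % tag["intervalMinutes"] != 0 or timestamp.second != 0:
        return None
    # 3. The measurement must actually be a number.
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        return None
    return {"tagName": tag_name, "value": value, "timestamp": timestamp}

known = {"Mine_A_Level_1_Water_Pump": {"intervalMinutes": 30}}
doc = transform_row("Mine_A_Level_1_Water_Pump",
                    datetime(2021, 6, 1, 8, 30), "1523.4", known)
```

Rows that fail any check yield no document, so only clean, on-interval numeric values reach the 'Values' collection.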

Following the completion of the data warehouse deployment, the next step was to choose a BI tool. Four BI tools were considered after conducting market research: Microsoft PowerBI15, QlikView16, Tableau17, and Google’s Looker18.

As indicated in Figure 2-7, PowerBI, QlikView, and Tableau are all industry leaders, while the fourth platform, Looker, is considered a competitor in the market, bringing in innovation. All four tools offer cloud capabilities, security, reporting, and embedded analytics. Several factors were examined before choosing a BI tool. The first consideration was the financial cost.

Figure 3-3 depicts the annual cost in Rands of each tool's start-up development package. Other expenditures, such as data warehouse fees and additional functionality that may need to be linked with the tool to make it operate, are not included in these prices.

Figure 3-3 BI platform costs

15 What is Power BI? Source: https://powerbi.microsoft.com/en-us/what-is-power-bi/

16 QlikView. Source: https://www.qlik.com/us/products/qlikview

17 Meet the world's leading analytics platform. Source: https://www.tableau.com/

18 Looker. Source: https://looker.com/
