A framework for implementing a scalable business intelligence system 53 The tag name in the file is used by the ETL tool to identify the incoming data. The values are processed after the tag is validated to exist in the database. Another validation step is performed to confirm that the values coming in are on the correct time interval and are, in fact, numbers.
After the values have been validated, they are converted to documents that correspond to the value collection structure and loaded into the value collection.
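The validation and conversion steps described above can be sketched in Python as follows. This is only an illustrative sketch and not the ETL tool's actual implementation: the function name, the document field names (`tagId`, `timestamp`, `value`), and the choice to measure interval alignment from midnight are all assumptions.

```python
import math
from datetime import datetime, timedelta


def validate_reading(tag_name, timestamp, raw_value, known_tags, interval):
    """Return a value document, or None when any validation step fails.

    known_tags maps tag names to tag IDs (hypothetical structure).
    """
    # Step 1: the tag must already exist in the database.
    if tag_name not in known_tags:
        return None
    # Step 2: the timestamp must fall on the expected time interval.
    # Assumption: intervals are aligned to midnight of the same day.
    midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    if (timestamp - midnight) % interval != timedelta(0):
        return None
    # Step 3: the value must be a finite number.
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(value):
        return None
    # Step 4: convert to a document shaped for the value collection.
    return {"tagId": known_tags[tag_name], "timestamp": timestamp, "value": value}
```

Returning `None` for a rejected reading keeps the sketch short; a production ETL tool would more likely log the reason for each rejection.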
Following the completion of the data warehouse deployment, the next step was to choose a BI tool. Four BI tools were considered after conducting market research: Microsoft PowerBI15, QlikView16, Tableau17, and Google’s Looker18.
As indicated in Figure 2-7, PowerBI and two of the other platforms are industry leaders, whereas the remaining platform is considered a competitor that brings innovation to the market. All four tools offer cloud capabilities, security, reporting, and embedded analytics. Several factors were examined before choosing a BI tool. The first consideration was the financial cost.
Figure 3-3 depicts the various annual expenditures in Rands for each tool's start-up development package. Other expenditures, such as data warehouse fees and other functionalities that may need to be linked with the tool to make it operate, are not included in these prices.
Figure 3-3 BI platform costs
15 What is Power BI? Source: https://powerbi.microsoft.com/en-us/what-is-power-bi/
16 QlikView. Source: https://www.qlik.com/us/products/qlikview
17 Meet the world's leading analytics platform. Source: https://www.tableau.com/
18 Looker. Source: https://looker.com/
[Figure 3-3 chart: x-axis BI Tool (PowerBI, QlikView, Tableau, Looker); y-axis RSA Rands (Thousands), 0 to 600]
Another factor examined in picking the BI tool was the users' familiarity with the product at XYZ consulting, as well as their willingness to learn it. Of the four tools analysed, Microsoft PowerBI was therefore chosen as the most advantageous tool for XYZ consulting because of its cost and the personnel's familiarity with the product.
Following the selection of the BI tool, the framework's next step is to generate report templates.
This is required to optimise the report generation process. As shown in Figure 2-8, the initial step is to specify the user interactions. XYZ consulting’s several teams work on various initiatives. They would require access to the data in the Tag and Value collections to obtain data for their initiatives.
Because the Tag and Value collections include company-wide data, a report user also needs to supply parameters to filter the data down to what relates to their project. The list of parameters includes:
a) Data Folder
• Specifies the location of the development data. This is the data that the user will use to create reports.
b) Calendar Start Date
• Specifies the date from which the data will be pulled once the report is deployed, as well as the initial entry in the calendar table.
c) Group
• Specifies the client’s database from which the data will be retrieved once the report is launched. This is because each client has their own database in MongoDB.
d) List of Tag Ids
• Determines which tag's data is pulled into the report.
e) Billing Calendar
• Determines whether a custom billing date is added to the calendar.
f) Financial Month
• Specifies the first month of the fiscal year, allowing the calendar table to be changed accordingly.
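The six parameters listed above could be grouped and checked along the following lines. This is a hedged sketch only: the class name, field names, and validation rules are assumptions made for illustration, since the template itself captures these values through PowerBI's parameter dialog.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class ReportParameters:
    """Hypothetical container for the template parameters a) to f)."""
    data_folder: Path            # a) location of the development data
    calendar_start_date: date    # b) first entry in the calendar table
    group: str                   # c) client database name in MongoDB
    tag_ids: tuple               # d) tags whose data is pulled into the report
    billing_calendar: bool = False   # e) add a custom billing date
    financial_month: int = 1         # f) first month of the fiscal year

    def __post_init__(self):
        if not 1 <= self.financial_month <= 12:
            raise ValueError("financial_month must be between 1 and 12")
        if not self.tag_ids:
            raise ValueError("at least one tag ID is required")


def parse_tag_ids(text):
    """Split a comma-separated list of tag IDs, ignoring blanks."""
    return tuple(part.strip() for part in text.split(",") if part.strip())
```

For example, `ReportParameters(Path("export"), date(2021, 1, 1), "client_a", parse_tag_ids("12, 15,19"), financial_month=3)` describes a report whose fiscal year starts in March.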
The next step is to generate queries once the target tables and parameters have been defined.
The various queries will pull the required data into the report. PowerBI provides numerous techniques for generating data queries utilising M query, a mashup query language used to query a variety of data sources.
After experimenting with various techniques for importing data into PowerBI, Python data frames19 proved to be the preferable alternative. This was because they readily allow the user to create reports locally on their computer without constantly pulling data from the server; only when the user is ready to publish the report does the report download data from the server by utilising the PowerBI Gateway20.
19 Python | Pandas DataFrame. Source: https://www.geeksforgeeks.org/python-pandas-dataframe/
As a result, before a user can create a report, they must first install Python on their local computer, along with numerous Python packages such as pandas and matplotlib. An installation script was written to assist the user in installing the correct version of Python as well as the correct packages to optimise the installation procedure.
The various data tables, as illustrated in Figure 3-4, that are required in the report include the following:
• Tags - contains all the data from the Tag collection.
• Values - contains all the measurement data from the Value collection.
• Rolling Calendar - contains calendar entries such as financial year dates and, optionally, custom billing dates.
• Metadata - contains user-added custom metadata for each tag.
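To illustrate how the Rolling Calendar table might derive its financial year entries from the Calendar Start Date and Financial Month parameters, a minimal sketch is given below. The column names and the convention of labelling a fiscal year by the calendar year in which it starts are assumptions; the source does not specify how the calendar is calculated.

```python
from datetime import date, timedelta


def fiscal_position(day, first_month):
    """Return (fiscal_year_start, fiscal_month) for a calendar date.

    fiscal_month is 1 for the first month of the fiscal year.
    fiscal_year_start is the calendar year in which the fiscal year
    began (an assumed labelling convention).
    """
    fiscal_month = (day.month - first_month) % 12 + 1
    fiscal_year_start = day.year if day.month >= first_month else day.year - 1
    return fiscal_year_start, fiscal_month


def build_calendar(start, end, first_month):
    """Build one calendar row per day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        year, month = fiscal_position(day, first_month)
        rows.append({"date": day, "fiscal_year_start": year, "fiscal_month": month})
        day += timedelta(days=1)
    return rows
```

With a fiscal year that starts in March, `fiscal_position(date(2021, 2, 15), 3)` returns `(2020, 12)`, because February is the twelfth month of the fiscal year that began in March 2020.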
Following the definition of the queries, the following step is to apply formatting rules to the reports to streamline colours, fonts, and other stylistic attributes. PowerBI allows you to customise the colour palettes of your reports in a variety of ways. The most effective method is to export the theme file, apply the formatting rules, and then import the theme file back into the report template.
20 Connect to on-premises data sources with a Power BI gateway. Source:
https://powerbi.microsoft.com/en-us/gateway/
Figure 3-4 Template data model
This method is far more involved than changing the colours in the PowerBI desktop application, but it provides far more customisation. The colour pattern of the reports was chosen to match the colour scheme of XYZ consulting’s web application, in which the reports will be embedded. The template creation process is complete after the formatting guidelines have been finalised.
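Because a PowerBI report theme is a JSON file, the export, edit, and import cycle described above can be scripted. The sketch below edits an exported theme in Python using the `name`, `dataColors`, `background`, and `foreground` theme properties. The colour values and the function name are placeholders and do not represent XYZ consulting's actual palette.

```python
import json


def apply_brand_colours(theme, data_colours, background, foreground):
    """Return a copy of an exported theme with the brand colours applied."""
    updated = dict(theme)  # leave the exported theme untouched
    updated["dataColors"] = list(data_colours)
    updated["background"] = background
    updated["foreground"] = foreground
    return updated


# Hypothetical exported theme and placeholder brand colours.
exported = {"name": "Exported theme", "dataColors": ["#118DFF", "#12239E"]}
branded = apply_brand_colours(
    exported,
    data_colours=["#003B5C", "#F2A900", "#7A7A7A"],
    background="#FFFFFF",
    foreground="#1A1A1A",
)
theme_json = json.dumps(branded, indent=2)  # text to save and import again
```

Keeping the edit in a script means the same formatting rules can be reapplied whenever the template's theme is exported again.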
A report with the file extension .pbix in PowerBI can be exported as a template with the file extension .pbit, which essentially creates a replica of the report without the actual data. Users can then open this template, enter the necessary parameters, and load the report. This will then generate a new .pbix file that will perform the queries based on the parameters and populate the data in the report. To use the template for report creation, the user must export data files from the export data system, which XYZ consulting has previously developed for local development. The data files are compressed into a zip file.
The zip file contains two folders, namely, Tags and Values, which contain csv files with the relevant data in them. A text file containing a comma separated list of tag IDs is also included.
Lastly, if the template was included in the export, a .pbit file is included as well, which the user will make use of to create a report.
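A user or support script could inspect such an export before opening the template. The following sketch assumes the folder names Tags and Values from the description above; the tag list file name `tag_ids.txt` and the function name are hypothetical, since the source does not name them.

```python
import io
import zipfile


def summarise_export(zip_bytes, tag_list_name="tag_ids.txt"):
    """List the contents of an export zip held in memory.

    tag_list_name is a hypothetical file name for the text file that
    holds the comma-separated tag IDs.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        raw_ids = archive.read(tag_list_name).decode("utf-8")
    return {
        "tag_files": [n for n in names if n.startswith("Tags/") and n.endswith(".csv")],
        "value_files": [n for n in names if n.startswith("Values/") and n.endswith(".csv")],
        "tag_ids": [part.strip() for part in raw_ids.split(",") if part.strip()],
        # The template is only present when it was included in the export.
        "template": next((n for n in names if n.endswith(".pbit")), None),
    }
```

The returned tag IDs can be pasted directly into the List of Tag Ids parameter of the template.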
Figure 3-5 Template dialog for parameters
After exporting the data and receiving the template, the user must then open the template.
Once the template is opened, the parameters dialog will appear as shown in Figure 3-5. The user then enters the necessary parameters, as discussed previously.
After entering the relevant parameters, the user clicks the Load button to load the data into the report. Once the user has given permission for the data source scripts to run, a welcome page with a glimpse of the data and a link to the Power BI tutorials website is loaded, as depicted in Figure 3-6. At this point, the user can now proceed with customising the report to meet their needs.
Figure 3-6 BI report template
Once a report has been prepared, the user will need to publish it to share it with other users both internally and externally. This leads us to the development of the BI system interface, commonly known as the BI portal. This interface's objective is to enable users to collaborate and share their reports with other users.
The BI portal was built as a web application in Microsoft Visual Studio using ASP.NET21 and C#. To access the content stored on the PowerBI service, this application makes use of the PowerBI API22 libraries.
Three interfaces were built when developing the BI portal:
• An interface for publishing and reviewing new reports.
• An interface for managing existing reports.
• An interface for sharing reports with external users such as the mine employees.
PUBLISHING AND REVIEWING REPORTS
The first interface was designed to meet the most urgent need that arises once a user has finished creating a report, namely publishing and sharing the report with other users. The process needed to be streamlined so that users would not be confused about how to publish their reports.
As a result, a small team of business users and programmers was tasked with creating a publishing and reviewing workflow that would:
• Streamline the process of sharing reports.
• Provide criteria for what kind of reports can be published on the company website.
• Provide an audit system that allows you to quickly trace when a certain report was published.
Figure 3-7 depicts the process that was proposed. To begin, the user must request a review.
The user must supply information about the individual client for whom the report is being created, as well as the project team of which they are a member.
21 A framework for building web apps and services with .NET and C#. Source:
https://dotnet.microsoft.com/apps/aspnet
22 Power BI REST APIs. Source: https://docs.microsoft.com/en-us/rest/api/power-bi/
After providing those details, the user can proceed to upload their report. When a report is uploaded, a review request, also known as a ticket, is created.
The manager in charge of the client and the project team's supervisor are linked to this ticket as reviewers. Their responsibility is to critically evaluate the report, propose revisions if necessary, and, once satisfied, approve the report's publication. Once the report has received both approvals, it is added to the interface for managing existing reports, and the ticket can then be closed.
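The two-approval rule of this workflow can be expressed as a small state model. The sketch below is written in Python purely for illustration, whereas the portal itself was built with ASP.NET and C#. The class, method, and status names are hypothetical, and the rule that a change request resets earlier approvals is an assumption not stated in the source.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewTicket:
    """Hypothetical model of a review ticket with two required approvals."""
    report_name: str
    reviewers: frozenset          # client manager and team supervisor
    approvals: set = field(default_factory=set)
    status: str = "open"

    def _check_reviewer(self, reviewer):
        if reviewer not in self.reviewers:
            raise PermissionError(f"{reviewer} is not a reviewer on this ticket")

    def request_changes(self, reviewer):
        self._check_reviewer(reviewer)
        # Assumption: a change request invalidates earlier approvals.
        self.approvals.clear()
        self.status = "changes_requested"

    def approve(self, reviewer):
        self._check_reviewer(reviewer)
        self.approvals.add(reviewer)
        # The report may only be published once every reviewer has approved.
        self.status = "approved" if self.approvals == set(self.reviewers) else "open"
```

Recording who approved and when on the ticket is also what would provide the audit trail that the workflow calls for.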
Figure 3-8 depicts the user interface for opening a review ticket. The user chooses a client group from a list of options. The user then chooses a contract from the available contracts list.
Each project team oversees a certain contract.
Figure 3-8 User interface for opening a review ticket
[Flowchart nodes: Open Review Ticket; Assign the Reviewers; Accept? (Yes/No); Reject? (Yes/No); Apply Changes; Close the Ticket]
Figure 3-7 Publish and review process
The user must then pick the workspace to which the report will be published. If the wrong workspace is selected, this can be corrected during the review. After entering these details, the user can upload the .pbix file into the upload area and then click the submit button to begin a review request.
Figure 3-9 Example of report under review
When a ticket is opened, the reviewers are assigned automatically, and each receives an email notification. As shown in Figure 3-9, the reviewer can now see the report being examined and request modifications by leaving comments on the ticket. The ticket author can then adjust the report, re-upload it to the ticket, and tell the reviewers that the changes have been made.
MANAGING EXISTING REPORTS
Once a report is approved, it is added to the interface for managing existing reports. The objective of this interface is to collaborate and share reports with other users internally, as well as specify report settings such as who can update the report, when the data refresh interval is scheduled, and whether the report is visible to external clients.
Figure 3-10 depicts the interface for managing existing reports. The filter pane allows the user to switch between client groups. After the user has chosen a group, the list of accessible reports is displayed in a grid. The first column displays the status of the last data refresh for the report. The various colour markers are explained further below.
• Green indicates that the refresh was successful.
• Red indicates that there was an error.
• Blue indicates that the report is currently refreshing.
• Black indicates that the report has not been refreshed since it was published.
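The mapping from refresh status to marker colour can be captured in a small lookup table. The enumeration and its member names below are hypothetical and are not taken from the portal's source code or from the PowerBI API.

```python
from enum import Enum


class RefreshStatus(Enum):
    """Hypothetical refresh states shown in the first grid column."""
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    IN_PROGRESS = "in_progress"
    NEVER_REFRESHED = "never_refreshed"


# Colour markers as described in the list above.
STATUS_COLOURS = {
    RefreshStatus.SUCCEEDED: "green",
    RefreshStatus.FAILED: "red",
    RefreshStatus.IN_PROGRESS: "blue",
    RefreshStatus.NEVER_REFRESHED: "black",
}


def marker_colour(status):
    """Return the grid marker colour for a refresh status."""
    return STATUS_COLOURS[status]
```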
Figure 3-10 Interface for managing existing reports
The report's name is specified in the name column. The next refresh column provides the next timestamp at which the report's data is scheduled to be refreshed. Some reports are set to 'Manual,' which means they are not set up for automatic scheduled refresh. The last refresh column specifies when the report was last refreshed.
Figure 3-11 depicts the last column, which provides the different actions that can be performed on the report.
Figure 3-11 Different report actions
As shown in Figure 3-11, various actions that can be performed on the reports are:
A. Link Contributors
• This enables the owner to associate several authors with the report. Once linked, these contributors can also make changes to the report.
B. Edit Report Configuration
• This allows the user to change the report's owner and set up an automated data refresh schedule. It also allows the owner to delete the report.
C. Download and upload the .pbix file
• This allows users to download the report's .pbix file, and allows the owner and contributors to upload a new .pbix file. Uploading a new file creates a new ticket.
D. Preview the report
• This allows readers to view the report without having to download it.
E. Connect the report to the client
• This enables the report's owner and contributors to connect the report to the client web application, allowing external clients to access the report.
F. Refresh the data in the report
• This enables users to manually update the data in the report.
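As an illustration of what a manual refresh (action F) involves at the API level, the sketch below builds, but does not send, a request to the Power BI REST API endpoint for refreshing a dataset in a workspace. The portal itself uses the PowerBI API libraries from C#, so this Python version is an assumption-laden illustration: the function name, the identifiers, and the token are placeholders, and acquiring an access token is out of scope.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(group_id, dataset_id, access_token):
    """Build a POST request that would trigger a dataset refresh.

    group_id is the workspace ID and dataset_id is the ID of the
    dataset behind the report. All three arguments are placeholders.
    """
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is left out here so the sketch can be run without credentials.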
SHARING REPORTS WITH EXTERNAL CLIENTS
Once the report is visible in the management view, it can be linked to the web application that external clients can access. This is accomplished by pressing the button marked by E in Figure 3-11.
Figure 3-12 Connecting the report to the client view
Upon clicking this button, a configuration dialog opens, which the user uses to configure the report view on the client web application. The dialog is depicted in Figure 3-12.
The user is required to provide the main view heading. The user then specifies the name of the button that will be used to load the report, as well as the icon file that will be utilised by the button. The 'Show on MTB' checkbox then determines whether the report should be live.
Finally, the user can select 'Create new dashboard.'
Figure 3-13 depicts a client view linked to one of the reports.
Figure 3-13 Report linked to a live view for clients
When 'Create new dashboard' is clicked, a new tab is generated that allows the user to link the view with specified client groups. The clients can access the report after the user selects the clients who should be linked to the view.
The next stage was to get the system ready for production after establishing the BI interface.
This required the completion of several checks. The first check was the ETL verification. Using the Microsoft testing libraries in .NET, the main testing framework used at XYZ consulting, several unit and integration tests were written for the ETL system.
These tests covered all the expected incoming file types to ensure that changes to the ETL system do not have a negative impact on any of the file types. The tests also ensured that:
• The different source files are read in correctly.
• The correct tables are populated.
• Each value is written as expected.
• The files are cleaned up and archived afterwards.
The ETL system's source code is version controlled and stored on GitHub23. As a result, the tests were additionally automated by constructing GitHub build pipelines24 that would execute the tests whenever a new update was introduced to the repository.
Performance testing was the next set of tests. The major performance test was to see how much data could be loaded into the PowerBI reports and how long each load would take. When in development, the Tag and Value queries read data from csv files, and when in production, they read data straight from MongoDB.
The calendar query, on the other hand, builds and calculates the calendar. Thus, there was more room for improvement on this query. To evaluate the speed of the calendar query, a one-year calendar divided into one-minute intervals was produced. This would result in 525 600 separate calendar entries.
The queries were run on a Lenovo Legion laptop running Windows 10 with 16GB of memory and a 2.4GHz i5 CPU. The execution times of ten successive runs in Visual Studio Code were recorded for each method (apply, vectorisation, and Dask, as labelled in Figure 3-14).
Figure 3-14 depicts the average execution time.
Figure 3-14 Average execution time for calendar query
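The difference between the apply and vectorisation methods can be illustrated with pandas. The sketch below is not the calendar query that was benchmarked: the derived column (a fiscal month number), the function names, and the one-day sample size are assumptions chosen to keep the example small, and the Dask variant is omitted.

```python
import time

import pandas as pd


def fiscal_month_apply(calendar, first_month):
    """Row-by-row: call a Python function for every timestamp."""
    return calendar["timestamp"].apply(
        lambda ts: (ts.month - first_month) % 12 + 1
    )


def fiscal_month_vectorised(calendar, first_month):
    """Vectorised: operate on the whole column at once."""
    return (calendar["timestamp"].dt.month - first_month) % 12 + 1


def average_seconds(func, *args, runs=10):
    """Average wall-clock time of several successive runs."""
    started = time.perf_counter()
    for _ in range(runs):
        func(*args)
    return (time.perf_counter() - started) / runs


# One day at one-minute intervals (1 440 rows) keeps the sketch fast.
calendar = pd.DataFrame(
    {"timestamp": pd.date_range("2021-01-01", periods=1440, freq="min")}
)
```

Calling `average_seconds(fiscal_month_apply, calendar, 3)` and `average_seconds(fiscal_month_vectorised, calendar, 3)` gives comparable timings for the two methods. Setting `periods=525600` reproduces the size of the one-year calendar used in the test, at the cost of a longer run.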
23 What Exactly Is GitHub Anyway? Source: https://techcrunch.com/2012/07/14/what-exactly-is-github-anyway/
24 Creating a CI/CD pipeline using GitHub Actions. Source: https://medium.com/@michaelekpang/creating-a-ci-cd-pipeline-using-github-actions-b65bb248edfe
[Figure 3-14 chart: categories Apply, Vectorisation, Dask; data labels 11,23; 3,07; 0,185; axis 0 to 12]