
A Beginner's Guide to Data Transformation in Excel


Academic year: 2024


Table of Contents

Part 1 Power Query Basics
Power Query Editor
Overview
Updating the Source Location/Name
Deleting a Query Step
Transform Versus Add Column
Split Text by Delimiter
Automatic Data Typing
Rename a Column
Removing (Deleting) a Column
Perform Quick Mathematical Operations
Refreshing Source Data During Query Composition
Quick Insights on Data – Column Profiling, Data Quality & Distribution
Displaying a Complete List When Filtering
Displaying Column Profile Information
Displaying a Complete List for Selected Column
Column Distribution
Column Quality
Using Monospaced Fonts
Discover the Total Number of Rows
Formula Bar – Applied Steps & M Code
Renaming Applied Steps
Other Applied Steps Options
Adding Documentation to Applied Steps
Power Query Formula Bar
Advanced Editor
Close & Load Destinations
Working with Existing Queries
Creating a Data Connection
Changing the Default Load Destination
“Saving” Your Queries
Refresh Data and Refresh Options
Refresh Options

Part 2 Important Power Query Tips & Tricks
Uploading Data from Excel
Uploading from Tables
The Query Ribbon
Useful Table Settings
Uploading from Named Ranges
Handling Changes to Source
Correcting Changes in Location or Filename
Correcting Changes to Sheet Names
Correcting Changes to Table Names
Correcting Changes to Column Heading Names
Correcting Changes Using the Advanced Editor
Understanding Data Types
The Importance of Data Types
Data Types vs. Formatting (and NULL Formatting)
Power Query Examples
Invoking Automatic Data Types
Working with NULLs
Power Query Shortcuts
Common time-saving features
Error Handling Finding & Correcting Errors in Data
Errors that should be corrected at the source level
Determining which rows contain errors
Errors to correct in Power Query
Methods to Deal With Errors in Power Query
More Data Views – Duplicate or Reference Query?
How to Duplicate a Query
How to Reference a Query
Query Dependency Visualize Query Relations
Viewing Query Dependencies Graphically
Query Management Delete, Manage, Copy Queries & Backup Results
Delete a Query
Copy a Query to Another Workbook
Export a Query for Later Use
Organizing Queries

Part 3 Helpful Power Query Transformations
Text Transformations Format, Extract & More
Transformation #1: Split Text by Delimiter
Transformation #2: Extract Employee ID from Unique Code
Transformation #3: Separate Names and Format Casing
Merge Columns – What to Watch Out For
Selecting the Columns to Be Merged
Merging Without Extraneous Spaces
Capturing the First Letter Only In the Middle Name
Standardizing Middle Initial Casing & Style
Fill & Replace Values To Create a Proper Dataset
Repeat Cells to Adjacent Cells (Fill)
Replacing Data
Standardizing Casing
Replacing NULL Values
Sort Data – Including Multiple Levels
Simple Sorts
Multi-Level Sorts
Returning to an Original Sort Order
Remove Duplicates Including Multiple Columns
Remove Duplicates Based on a Single Column
Remove Duplicates Based on Multiple Columns
Number Transformations What to Watch Out For
Adding a Set Value to All Values in a Column
Rounding Values in a Column (with little-known feature)
Produce an Aggregation Using Two Columns
Adding Columns That Contain Nulls
Multiply a Percentage Against a Value
Preserving the Top N/Bottom N Rows Based on a Column
Working with Filter – AND & OR Conditions
Filtering by Specific Items
Filtering by Text That Exists
Creating a Compound Field Filter
Beware of Logic Errors
Filter Selection Trap
Predicting the Future
Change Type Trap (& Remove Columns Trap)
Change Type Trap – Problem
Change Type Trap – Solution
Adjusting Automatic Change Type in Power Query Settings
Remove Columns Trap – Problem
Remove Columns Trap – Solution

Part 4 Powerful Power Query Transformations
Column From Example to Extract Patterns Quickly
Formulas For Free
Solving the Problem with the Traditional Approach
Conditional Columns – in Power Query
If… Then… Else Statements in Power Query
Aggregating (Grouping) Data on Multiple Levels
Launching the Group By Feature
Using the Group By Feature
Group By – Basic View
Group By – Advanced View
Group By for All Rows
Determining the Average Salary per Department
Creating Nested Tables
Calculating the Salary Deviation
Unpivot Columns – Basics
Unpivot Example
Trimming the Unpivot Fat
Unpivot Columns How to Overcome Common Errors
Common Error #1: Adding Data Post-Data Typing
Examining the Unpivot Query
Correcting the Unpivot Logic Error
Common Error #2: Dealing with NULLs in the Source Data
Potential Problem for Replacing Nulls with Zeroes
Pivot Column – Create Readable Reports Fast
Creating a Pivot Report
Split Column The Problem That’s Easy to Miss
The Problem
The Solution
The Flaw In the Logic
The Fix for the Flaw
Split Column by Delimiter Rows Instead of Columns
Splitting the Colors Across Rows
Tweaking the List
“Future-Proofing” the Query

Part 5 Date and Time Transformations
Date and Time Transformations
Making the Complex Simple
Useful Date Features
Date Calculation Examples
Useful Time Features
Time Calculation Examples
Useful Duration Features
Duration Calculation Examples

Part 6 Custom Columns
Why Use Custom Columns
Your Imagination has No Limits
Custom Columns – Type Compatibility & Date Function
Data Type Compatibility
Data Type Conversions using Functions
Date Function
Other Intrinsic Functions

Part 7 Power Query Online Data Sources
Connecting to Different Sources
The Vast World of Connectors
Import Data from a Website
Connecting to a Website
Connecting to an Embedded Excel File or Text File on a Webpage
Updating Connection Credentials
Import Data from ODATA (Open Data Protocol)
Connecting to ODATA
Get Google Sheet Data with Power Query
Obtaining the Google Sheets URL
Connecting to the Google Sheets URL
Connect to Microsoft Exchange – Get Outlook Email
Extracting Content from Email Messages
Connect to SharePoint or OneDrive for Business
Extracting Content from Cloud-based Storage

Part 8 Combining & Appending Data
Why Append Data? The Difference Between Append and Merge
Appending Data
Creating the Appended Query
Merging Data
Creating the Merged Query

Part 9 Merge Options Join Kinds Explained
Overview of Merge Options Understanding Join Kinds
Advantages of Merges
Merge Options
Left vs. Right / First vs. Second
Understanding Join Kinds
Merge Based on Multiple Columns
Creating a Compound Key
Merge to Get Multiple Match Results
Dealing with One-to-Many Results
Retrieving the Match with the Latest Date
Merging Text Columns Dangers & Pitfalls
It’s the Little Things that Get Us
How to Use Fuzzy Match
When Close is Good Enough
Fuzzy Match Options
Using a Transformation Table

Part 10 When to use Power Pivot & Load to Data Model
When to Load to the Data Model
Out of Sight, but Not Out of Mind
Power Pivot and the Data Model
Adding Query Results to the Data Model
When to Use the Data Model
Is Power Query Needed to Load Data into the Data Model

Part 11 Understanding M Formula Language
M Language How M Thinks (let Expression & Values)
Viewing Code in the Advanced Editor
Expressions
The let Expression
Storing the Individual Step Results
The in Expression
Comments in M Code
Step Name Syntax
“M” Characteristics
Defining & Invoking Custom M Functions
Function Syntax
Multi-Step Functions
Saving a Function
Using a Function in a Query
Updating Function Logic
Embedded Functions Within Queries
Reference Guide for Standard M Functions
Learn About Power Query Formulas
Built-In Function Documentation
Function Context
Alternate Method for Help on a Specific Function
Tables, Lists, & Records – How to Reference Them in a Table
Lists
Creating a List from a Table
Referencing Column Headers in Lists
Records
Referencing Tables, Lists, Records, and Cells
Performing a Search on a Table
Returning a Value from a Searched for Record
Returning a Table from a Table
Brackets & Lookup Operators in M Code
3 Important Power Query Rules
Creating Lists, Tables, & Records – to use as Parameters or Testing of Functions
Creating a List from Scratch
Creating a Record from Scratch
Creating a Table from Scratch
Understanding the Each Keyword and the Purpose of the Underscore
Shorthand Notation
Underscore Notation/Reference
Using Power Query Parameters
Creating Managed Parameters
Update the Query to Use the Parameter
Using the Parameter
Speed Up Queries – Table.Buffer & How to Test Impact
Why Is This Behavior Not the Default?
Testing Performance On Large Data Sets
Counting the Rows in a Data Set
Query Folding Improve Performance for Relational Databases
What is Query Folding?
Viewing Query Folding Instructions
Breaking Query Folding
IF Then Statement – Lookup Operators to Get Value from Previous Row
Creating Row Against Row Comparisons
Error Handling in Power Query Bulk Replace Lookup with Try Otherwise
Testing the Waters
The Purpose of Try Otherwise

Part 12 Working with Lists & Table Function
Power Query Text Functions
Text.Contains, Text.Replace, etc.
Text.PositionOf

More Resources
Learn More
More Resources

Power Query Reference Book

This Power Query Reference Guide is accompanying documentation for my online Master Excel Power Query: Beginner to Advanced (including M) course on XelPlus.com/courses. Please do not reproduce or transmit in any form without permission.

We (XelPlus e.U.) have made every effort to ensure the accuracy of this manual. If you discover any discrepancies, please send us a quick email at [email protected].

Availability of Power Query

Power Query is natively integrated in several Microsoft products, including the following.

Microsoft Excel

Power Query enables Excel users to import data from a wide range of data sources into Excel for analytics and visualizations.

Starting with Excel 2016, Power Query capabilities are natively integrated and can be found under the “Get & Transform” section of the Data tab in the Excel Desktop ribbon.

Excel 2010 and 2013 users can also leverage Power Query by installing the Microsoft Power Query for Excel add-in.

Microsoft Power BI

Power Query enables data analysts and report authors to connect and transform data as part of creating Power BI reports using Power BI Desktop.

Microsoft SQL Server Data Tools for Visual Studio

Business Intelligence Developers can create Azure Analysis Services and SQL Server Analysis Services tabular models using SQL Server Data Tools for Visual Studio. Within this experience, users can leverage Power Query to access and reshape data as part of defining tabular models.

Common Data Service

Common Data Service lets you securely store and manage data that's used by business applications. Data within Common Data Service is stored within a set of entities. An entity is a set of records used to store data, similar to how a table stores data within a database.

Common Data Service includes a base set of standard entities that cover typical scenarios, but you can also create custom entities specific to your organization and populate them with data using Power Query. App makers can then use Power Apps to build rich applications using this data.


About Leila Gharani

Leila Gharani is a Microsoft MVP and a bestselling online course instructor. She runs XelPlus.com, a spreadsheet resource site that helps people gain the knowledge they need to create useful tools, solve problems, and get more done.

Her background includes a Master's in Economics and work as an economist, consultant, Oracle HFM accounting systems expert, and project manager.

The Accompanying Files

The examples shown in this manual are taken from the Excel files available inside the online course.

All files are available as a zip file.

Part 1

Power Query Basics

Power Query Editor

Overview

You can open the Power Query Editor from Excel’s Data tab by going to Data > Get Data > Launch Power Query Editor.

The editor also launches automatically when you import data into Power Query. Simply select the appropriate option from the Get & Transform Data group on the Ribbon.

Updating the Source Location/Name

1. Open the query in Power Query

2. Update the file location/name reference by clicking the gear icon next to the Source step and editing the location/name

The advantage of the gear icon is that you are presented with a user-friendly way to browse to the new location, thus avoiding typographic errors.

NOTE: Any time you see a gear icon next to a step, it means that the step has parameters that can be adjusted.

These adjustments are usually presented in user-friendly dialog boxes, eliminating the need to write complex formulas and statements.


Deleting a Query Step

Power Query does not possess an Undo feature. If you perform an undesirable transformation on the data, delete the step by clicking the X icon that appears to the left of the step name when you hover your cursor over the step.

Be aware of the following behaviors when deleting a query step:

• Deleting a step cannot be undone

• Deleting a step may “break” following steps that were dependent on the deleted step

Bulk Step Deletion

If you have a list of steps and you wish to delete all steps after a specific step, you can save time by right-clicking the first step to delete and selecting Delete Until End instead of deleting each step individually.

This deletes that step and every step after it in the step list.

Quicklist for Common Table Transformations

If you right-click the button located to the left of the first column header, you can gain access to the quick list for table transformations.

Many of the most common table transformation tools are in a single, easy to navigate list.

Transform Versus Add Column

The tabs labeled Transform and Add Column have many of the same features. Beginners are often unsure which version of a feature is the correct one to use.

Transform will replace the original data (column) with the transformed version of the data.

Add Column will create a new column to hold the results of the transformation.
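The difference can be sketched outside Power Query. The following is a minimal illustration in plain Python, not M code; the sample rows, the column names, and both helper functions are hypothetical and exist only to mirror the two behaviors described above.

```python
# Hypothetical sample rows; the column names are made up for illustration.
rows = [{"Name": "ann lee"}, {"Name": "bo chen"}]


def transform_column(table, column, func):
    """Mimic the Transform tab: overwrite the column with the result."""
    return [{**row, column: func(row[column])} for row in table]


def add_column(table, source, new_name, func):
    """Mimic the Add Column tab: keep the source column, append a new one."""
    return [{**row, new_name: func(row[source])} for row in table]


transformed = transform_column(rows, "Name", str.title)
added = add_column(rows, "Name", "Proper Name", str.title)

print(transformed)  # Name now holds the capitalized text
print(added)        # Name is unchanged; Proper Name holds the result
```

In the first result the original lowercase text is gone; in the second it is still available next to the new column.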


Split Text by Delimiter

You can access the text splitting feature in multiple ways:

• Home (tab) → Transform (group) → Split Column

• Transform (tab) → Text Column (group) → Split Column

• Right-click a column heading → Split Column

The Split Column feature has several variations by which the split will be performed. Many of these will be discussed in later sections.

The most widely used option is By Delimiter.

From here, you can select the delimiter and whether you want to split at every occurrence or just the first or last occurrence.
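To make the three split choices concrete, here is a small sketch in plain Python (not M). The sample value is hypothetical; the three calls only mirror the "every occurrence", "first occurrence", and "last occurrence" options described above.

```python
text = "2024-03-15-EU"  # hypothetical value containing the "-" delimiter three times

each_occurrence = text.split("-")            # split at every delimiter
first_only = text.split("-", maxsplit=1)     # split at the first delimiter only
last_only = text.rsplit("-", maxsplit=1)     # split at the last delimiter only

print(each_occurrence)
print(first_only)
print(last_only)
```

Splitting at every occurrence yields four parts here, while the other two options always yield two parts and differ only in where the cut falls.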

Automatic Data Typing

Power Query will automatically detect the data types when certain events occur:

• A row is promoted to a header row

• A column is split into multiple columns

If this step was not desired, you can click the X icon that appears to the left of the Changed Type step to delete the step.

If this automatic data typing feature is unwanted, you can deactivate the feature at the file level or the program level.

File-Level Deactivation

File → Options & Settings → Query Options → Current Workbook → Data Load → Type Detection → clear the “Detect column types and headers for unstructured sources” checkbox.

Program-Level Deactivation

File → Options & Settings → Query Options → Global → Data Load → Type Detection → select “Never detect column types and headers for unstructured sources”.

Rename a Column

Columns can be renamed in several ways:

• Transform (tab) → Any Column (group) → Rename

• Right-click a column heading and select Rename

• Double-click a column heading to enter rename mode

Removing (Deleting) a Column

Columns can be deleted (known as “vertical filtering”) in several ways:

• Home (tab) → Manage Columns (group) → Remove Columns

• Right-click a column heading and select Remove

• Click a column heading and press Delete on the keyboard

NOTE: If you have many columns to delete and only a few columns to preserve, it may be more efficient to select the columns you wish to keep and use the Remove Other Columns feature to delete the unwanted columns.

Choosing Columns to Keep and Columns to Delete

To select or deselect multiple columns, a selectable list of columns can be displayed by clicking Home (tab) → Manage Columns (group) → Choose Columns.

This is a superior way of removing unwanted columns, as opposed to selecting unwanted columns and deleting them.

Later, if you change your mind, you can click the gear icon and reselect a column to add it back into the data set.

PRO TIP: If you perform the same operation multiple times in a row, Power Query will consolidate all like steps into a single step in the Applied Steps list. This means, if you delete 5 columns then rename 5 columns, you will only see 2 steps in the Applied Steps list.

Conversely, if you were to delete a column, then rename a column and repeat this 4 more times, you will see 10 steps in the Applied Steps list.
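The arithmetic behind this tip can be checked with a short sketch in plain Python. It is a simplified model that assumes only consecutive operations of the same kind are merged; it is not Power Query's actual logic, and the function name and operation labels are hypothetical.

```python
from itertools import groupby


def applied_step_count(operations):
    """Count steps when each run of consecutive like operations shares one step."""
    return sum(1 for _kind, _run in groupby(operations))


batched = ["remove"] * 5 + ["rename"] * 5   # delete 5 columns, then rename 5
alternating = ["remove", "rename"] * 5      # delete, rename, repeated 5 times

print(applied_step_count(batched))      # 2
print(applied_step_count(alternating))  # 10
```

Grouping like operations together therefore keeps the Applied Steps list short and easier to read.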

Perform Quick Mathematical Operations

If you want to perform common mathematical operations such as multiplying the price by quantity or adding subtotal and tax, you can select the 2 columns of information (using the CTRL key) and select Add Column (tab) → From Number (group) → Standard → {operation}.

NOTE: Be mindful of the order in which columns are selected. This is important when performing operations such as division (ex: Total ÷ Quantity versus Quantity ÷ Total).
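A quick check in plain Python shows why the selection order matters for division. The two values are hypothetical figures from a single row.

```python
total, quantity = 120.0, 8  # hypothetical values from one row

total_first = total / quantity      # Total selected first: price per unit
quantity_first = quantity / total   # Quantity selected first: a different number

print(total_first)     # 15.0
print(quantity_first)
```

Addition and multiplication give the same result in either order, so the concern applies mainly to subtraction and division.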

Refreshing Source Data During Query Composition

If you believe the data may have changed since you opened the Power Query Editor, you can update the preview data by selecting Home (tab) → Query (group) → Refresh Preview.

Quick Insights on Data – Column Profiling, Data Quality & Distribution

To improve performance when connecting to data sources, Power Query downloads a maximum of 1,000 rows of data for the preview. (If you have fewer rows, you will receive the entire data set.)

This is what is known as “the first three inches of the fire hose”. Typically, it’s enough of a representation of the data to give you a good idea of the entire data set.

This works well for most data sets, but can cause issues with certain tools like filtering, when proposed lists are incomplete because the full data set has yet to be examined.

Displaying a Complete List When Filtering

If you display a filter dropdown list and there appear to be missing entries, those entries may not exist in the first 1,000 rows of the data. Power Query will inform you that there may be items in the data that exist beyond the initial 1,000 entries.

You can force Power Query to read the entire data set for that field by clicking Load More.

This will read the full data set and display a complete list of items for that column.
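The effect of the 1,000-row preview on a filter list can be sketched in plain Python. The column below is hypothetical: it is built so that one value first appears after row 1,000.

```python
PREVIEW_ROWS = 1000  # preview size described in the text

# Hypothetical column of 1,500 rows; "West" first appears after row 1,000.
region = ["North"] * 600 + ["South"] * 600 + ["West"] * 300

preview_values = sorted(set(region[:PREVIEW_ROWS]))  # what the dropdown shows at first
all_values = sorted(set(region))                     # what it shows after Load More

print(preview_values)  # ['North', 'South']
print(all_values)      # ['North', 'South', 'West']
```

The missing entry is not a data problem; it simply lies beyond the rows that were read for the preview.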


Displaying Column Profile Information

To display statistical information about a selected column, click View (tab) → Data Preview (group) → Column Profile.

When you select a column, statistical and distribution information is displayed for the first 1,000 rows of data.

This may provide inaccurate information regarding such things as…

• Count of rows

• Empty cells

• Min and max values

Displaying a Complete List for Selected Column

To display accurate statistics about the entire data set, force Power Query to load the entire data set by clicking in the lower-left corner where it says “Column profiling based on top 1000 rows” and changing the option to “Column profiling based on entire data set”.

NOTE: System performance may suffer when loading the entire data set. Consider profiling the entire data set to capture some quick insights, then switch back to the top 1,000 rows to improve performance.

Column Distribution

To view the number of distinct versus unique values in each column, click View (tab) → Data Preview (group) → Column Distribution.

This provides a miniature column chart at the top of each column to display the relationship between distinct and unique values.

NOTE: These mini-charts are based on the setting for “first 1,000” or “all records”.
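Distinct and unique are easy to confuse: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch in plain Python, using a hypothetical column:

```python
from collections import Counter

values = ["A", "B", "B", "C", "C", "C", "D"]  # hypothetical column
counts = Counter(values)

distinct = len(counts)                              # different values present
unique = sum(1 for n in counts.values() if n == 1)  # values appearing exactly once

print(distinct)  # 4  (A, B, C, D)
print(unique)    # 2  (A and D)
```

A column where distinct and unique are equal contains no repeated values at all.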

Column Quality

To display the percentage of valid data vs. errors vs. empty cells, click View (tab) → Data Preview (group) → Column Quality.

The column headings now display percentages of each category for the column.

NOTE: These percentages are based on the setting for “first 1,000” or “all records”.
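The three percentages can be reproduced by hand. The sketch below is plain Python with a hypothetical column, where `None` stands in for an empty cell and an exception object stands in for an error value; the function name is also hypothetical.

```python
# Hypothetical column: None = empty cell, an Exception instance = error value.
column = [10, 20, None, ValueError("bad"), 30, None, 40, 50, 60, 70]


def column_quality(cells):
    """Return the share of valid, error, and empty cells as whole percentages."""
    total = len(cells)
    errors = sum(isinstance(c, Exception) for c in cells)
    empty = sum(c is None for c in cells)
    valid = total - errors - empty
    shares = [("valid", valid), ("error", errors), ("empty", empty)]
    return {name: round(100 * n / total) for name, n in shares}


print(column_quality(column))  # {'valid': 70, 'error': 10, 'empty': 20}
```

If the same calculation were run on only part of the column, the percentages could differ, which is the point of the note above.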

Using Monospaced Fonts

If you wish each character to have the same width (no text kerning), select View (tab) → Data Preview (group) → Monospaced.

This may make the text easier to read, especially when working with columns containing large numbers.

Discover the Total Number of Rows

To quickly discover the number of rows in a data set, select Transform (tab) → Table (group) → Count Rows.

This will reduce the entire data set to a single value that represents the total number of rows.

This is often performed to satisfy a casual curiosity. The step is often deleted once the question has been answered.

Formula Bar – Applied Steps & M Code

Power Query does an admirable job of recording all our manipulations in the Applied Steps window.

However, the names provided to the steps are sometimes less than obvious.

Renaming Applied Steps

A better way to document your steps, and make actions more understandable, is to rename the steps.

To rename a step, either click a step name and press F2 or right-click a step name and select Rename.

Other Applied Steps Options

When you right-click a step name, there are other helpful options listed in the menu, such as:

• Delete / Delete Until End

• Move Up / Move Down

• Insert Step After

• Extract Previous (this one is VERY cool)


Adding Documentation to Applied Steps

Creating a descriptive name for a step helps make queries more understandable, but that only goes so far due to limited space.

If you need to document more detailed information about a step (e.g., the justification for performing the step or the mathematical formula behind it), right-click the step and select Properties.

In the Description field, write what makes the most sense for your documentation needs.

Power Query Formula Bar

To gain greater insight into what each step is performing, we can activate the Formula Bar by selecting View (tab) → Layout (group) → Formula Bar.

Similar to Excel’s Formula Bar, we can see the “behind the scenes” instructions for the selected step.

These instructions are written in the Power Query ‘M Language’ (Data Mashup Language).

This is especially useful for steps that do not have the gear icon. We can see in the Formula Bar what is occurring and potentially edit the instruction directly.

For steps that have more instructions than can be viewed on a single line of the Formula Bar, click the small down-arrow to the right of the Formula Bar to expand its height.

Advanced Editor

Although you can see each step individually in the Formula Bar, it’s difficult to remember what you saw in one step when comparing to another step.

The entire set of “M” code can be displayed by opening the Advanced Editor.

To open the Advanced Editor, select Home (tab) → Query (group) → Advanced Editor or View (tab) → Advanced (group) → Advanced Editor.

This will display the entire instruction set of “M” code for the query.

Close & Load Destinations

Working with Existing Queries

When you edit an existing query, the only available option is “Close & Load” which defaults to the existing destination.

If you need to change the destination, return to Excel, right-click the query in the Queries & Connections panel, and select “Load To…”.

This will open the Import Data dialog box where you can select a new destination for the query output.

Creating a Data Connection

A data connection is where you have the steps defined for processing the data, but the data is not loaded into Excel in any form. The data is “waiting to be called” when needed.

Connection Only queries are good for data that will be used for an intermediate step to aid in processing, but the results are not needed for the final output.

Connection Only queries greatly reduce file size. This is because the data remains at the source and a copy is not stored locally in the Excel file.

Changing the Default Load Destination

The default load destination for Power Query is to load to an Excel Table.

If you would rather the default location be a Connection Only query, or a table and/or the Data Model, select File (tab) → Options & Settings → Query Options → Global → Data Load → Default Query Load Settings.

NOTE: Selecting a custom load but NOT selecting a worksheet or Data Model will establish a Connection Only query.

“Saving” Your Queries

If you are in the middle of creating a query and you need to leave the query editor, but you don’t want to discard your work, nor are you ready to load the results into Excel, select “Close & Load to…” and establish a Connection Only query.

This will allow you to return later and resume work without performing a potentially incomplete data load.

Refresh Data and Refresh Options

When the source data changes, the query results must be refreshed.

There are several ways of performing a refresh operation:

1. Select Data (tab) → Queries & Connections (group) → Refresh All.

2. Right-click on a query results table and select Refresh.

3. Right-click a query in the Queries & Connections panel and select Refresh.

4. Click the Refresh icon in the upper-right corner of the query.

Refresh Options

Each query has a set of refresh options that will allow for a certain degree of automation.

Select Data (tab) → Queries & Connections (group) → Properties to open the Properties dialog box.

Some useful settings include:

• Enable background refresh – this allows you to continue working in Excel as the query is being refreshed. Disabling this feature is useful when the results of the query are needed for another operation (like a macro), and you don’t want the following step to run prematurely.

• Refresh every N minutes – this schedules a refresh at a defined interval (in minutes). The workbook needs to be open for this option to work.

• Refresh data when opening the file – self-explanatory.

Part 2

Important Power Query Tips & Tricks

Uploading Data from Excel

There are multiple ways to upload data from Excel into Power Query. Two of the most used ways are from tables and named ranges.

Uploading from Tables

Power Query requires that data coming from Excel be in the form of a proper Excel Table (when not working with Named Ranges). If the data is not already in the form of an Excel Table, Power Query will automatically “upgrade” the plain table to an Excel Table. Excel will assign a default name, such as “Table1”, to the table.

If you are creating reports for long-term use or reports that cull data from multiple sources, it’s a good idea to pre-convert the plain table to an Excel Table and give the table a better name before bringing the data into Power Query.

1. Select the data range and upgrade to an Excel Table. This can be done via:

• CTRL-T keyboard shortcut

• Home (tab) → Styles (group) → Format as Table

• Insert (tab) → Tables (group) → Table

2. Give the table a proper name, like “SalesTable”, by selecting Table Design (tab) → Properties (group) → Table Name.

3. If you wish to retain your original formatting (e.g., colors, fonts, borders), remove the automatically applied table style by selecting Table Design (tab) → Table Styles (group) → bottom scroll button → Clear.

Uploading the Data into Power Query

4. To send the newly upgraded data into Power Query, click in the data and select Data (tab) → Get & Transform Data (group) → From Table/Range.

5. Rename the query output to something like “SalesReport”. This will become the name of the output table on the Excel worksheet.

The Query Ribbon

Once we have loaded the data back into Excel, we have access to a new ribbon named Query.

The Query ribbon contains many useful features for working with the query as an object, such as:

• Refresh

• Edit

• Delete

• Duplicate / Reference

• Merge / Append

These features are also available by way of right-clicking on a query in the Queries & Connections panel.

Useful Table Settings

Two table settings that are worth exploring:

• Table Design (tab) → External Table Data (group) → Properties → Adjust column width

• Table Design (tab) → External Table Data (group) → Properties → Preserve cell formatting

Uploading from Named Ranges

If a data set has been given a name (i.e. a Named Range), Power Query will leave the original data “as-is” and not upgrade the data to an Excel Table.

The downside to this is that Named Ranges are not dynamic. If data is added to the originally defined Named Range, Power Query is unable to “see” the new data until the user manually updates the Named Range reference.

Solutions to this problem include:

• Define a range larger than your data to account for future data additions

• Create a dynamic named range by using formulas in the Name Manager to calculate the height/width of the data range
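The difference between a static and a dynamic range can be sketched in plain Python. This is only a model of the idea, not Excel's Name Manager: the worksheet column, the fixed height, and both functions are hypothetical, and the dynamic version assumes the data has no blank cells in the middle.

```python
# Hypothetical worksheet column; rows r4 and r5 were added after the range was defined.
sheet = ["Header", "r1", "r2", "r3", "r4", "r5"]

FIXED_HEIGHT = 4  # static Named Range: the header plus the 3 rows that existed then


def static_range(cells):
    """Always return the originally defined rows, whatever has been added since."""
    return cells[:FIXED_HEIGHT]


def dynamic_range(cells):
    """Recalculate the height from the non-empty cells each time the range is read."""
    height = sum(1 for c in cells if c not in (None, ""))
    return cells[:height]


print(static_range(sheet))   # stops at r3
print(dynamic_range(sheet))  # includes r4 and r5
```

The static version never sees the new rows, which is the behavior the dynamic definition is meant to avoid.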

NOTE: If you are using a Named Range to define your data, select the Named Range before sending it to Power Query. This prevents Power Query from selecting what it thinks is your data, which may not match your Named Range definition. This will also prevent Power Query from upgrading your data into an Excel Table.


Handling Changes to Source

When importing data into Excel using Power Query, a connection to the source material is created. This way, when the source material changes, you can click “refresh” and the same source is read to import any additions, deletions, or changes into the report.

The connection is recorded as an absolute address; in other words, the full drive\path\filename is recorded. If any element of that address changes, the query will break and be unable to perform a refresh operation.

Other things that can “break” your query include:

• Changes to heading names

• Changes to table names

• Changes to sheet names

Correcting Changes in Location or Filename

If a file’s name or location changes and an update is performed on the Power Query output, we are presented with an error message.

To correct a query that refers to a file by location and name:

1. Open the query in Power Query

2. Click the Go To Error button to be taken to the offending step, or select the Source step in the Applied Steps list

3. Update the file location/name reference by either

• Editing the reference in the Formula Bar

• Clicking the gear icon next to the Source step and editing the location/name

The advantage of the gear icon is that you are presented with a user-friendly way to browse to the new location, thus avoiding typographic errors.

Correcting Changes to Sheet Names

If a sheet’s name changes and an update is performed on the Power Query output, we are presented with an error message.

To correct a query step that refers to a sheet name:

1. Open the query in Power Query

2. Click the Edit Settings button to be taken to the offending step, or select the Navigation step in the Applied Steps list

3. Update the sheet name reference by either

• Editing the sheet name reference in the Formula Bar

• Clicking the gear icon next to the Navigation step and pointing to the new sheet name

NOTE: Remember that Power Query is case-sensitive. Ensure you type the names as they appear in the source material.

Correcting Changes to Table Names

If a table’s name changes and an update is performed on the Power Query output, we are presented with an error message.

To correct a query step that refers to a table name:

1. Open the query in Power Query

2. Click the Go To Error button to be taken to the offending step, or select the Source step in the Applied Steps list

3. Update the incorrect table name to the new table name by editing the reference in the Formula Bar

Correcting Changes to Column Heading Names

If a table’s heading name changes and an update is performed on the Power Query output, we are presented with an error message.

To correct a query step that refers to a heading name:

1. Open the query in Power Query

We are presented with the following error message.

Unfortunately, we do not have any buttons to lead us to the offending step, nor any gear icon to open a user-friendly correction window. This is why having the Formula Bar visible is so important.

2. Update the entry in the Formula Bar to display the correct column heading name.

NOTE: Remember that Power Query is case-sensitive. Ensure you type the names as they appear in the source material.

Correcting Changes Using the Advanced Editor

If you have many changes to perform and you feel confident in your typing accuracy, you can open the Advanced Editor and perform many of the updates in a single operation.

Open the Advanced Editor in Power Query by selecting Home (tab) -> Query (group) -> Advanced Editor.

Make the necessary changes in the editor and click Done when finished.

IMPORTANT: Remember that Power Query does not possess an Undo feature. Consider creating a backup of the file/query before editing. This way, if you make a mistake during the direct editing of the query, you can restore your query to its original state.

Understanding Data Types

The Importance of Data Types

It is very important to define your fields (columns) with the proper data types. This ensures the following:

• Data can be properly interpreted/understood

• Data can be manipulated properly

• Data cannot be manipulated improperly

• Proper level of numeric precision is maintained

Data Type | Description and Use
Decimal Number | Maximum 15 digits regardless of decimal placement.
Currency | 4 digits to the right of the decimal; maximum 19 digits regardless of decimal placement. Anything after the 4th decimal place is rounded.
Whole Number | Integer value (no decimal precision). Maximum 19 digits. Any fractions are rounded.
Percentage | Displayed as percentages in Power Query; displayed as decimals in a workbook.
Date/Time | Date and time in a single column. Stored as a decimal type.
Date | Dates from 1900 to 9999 are supported. Stored as whole numbers.
Time | Time only. Stored as decimal type.
Date/Time/Timezone | UTC Date/Time
Duration | Length of time shown as days, hours, minutes, and seconds. Stored as decimal type.
Text | Text and numbers. Maximum of 268,435,456 characters (roughly 268 million).
True/False | Boolean value.
Binary | A sequence of bytes (e.g. when loading files from a folder).
Using Locale… | Important if importing data from sources that have different regional settings.

Data Types vs. Formatting (and NULL Formatting)

The fundamental principles behind storing versus viewing numbers in Excel and Power Query are:

In Excel

• numbers are calculated as stored, not as displayed

• the formatting of numbers does not affect the precision of numbers

In Power Query

• formatting numbers changes the stored value

• once a number is formatted to a lesser accuracy, the original level of accuracy cannot be restored without deleting prior query steps

Excel Examples

If we take a value in Excel that has 2-decimal place precision and we format the value as a whole number, the number displayed will be rounded but the underlying value remains unchanged.

Power Query Examples

Reducing Decimal Precision

If we were to load into Power Query a column of values with 2-decimal place precision and we then set the data type to Whole Number, each value is rounded to a whole number and its fractional part is discarded.

If we examine the number more closely, we can see that the change in data type has altered the value.

This precision cannot be restored once the alteration has taken place. It would be necessary to delete the step that applied the change to regain the original precision.
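The difference between formatting a value and changing its type can be sketched outside Excel. The snippet below is only a Python analogy, not Power Query code: the value and variable names are hypothetical, and Python's built-in round (which rounds halves to the nearest even number) stands in for the Whole Number conversion, whose exact rounding rule this guide does not state.

```python
# Analogy only: Python stand-ins for "format for display" vs "change the type".
stored = 1234.56  # hypothetical value with 2-decimal precision

# Excel-style formatting: only the displayed text changes.
displayed = f"{stored:.0f}"   # what the cell would show
after_formatting = stored     # the underlying value is untouched

# Power Query-style type change: the value itself is replaced.
after_type_change = round(stored)

print(displayed, after_formatting, after_type_change)
```

After the type change, nothing in the resulting value records that it was once 1234.56, which is why the only way back is to remove the step that made the change.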

Increasing Decimal Precision

If you import whole numbers into Power Query and later add data that has fractional precision, you will lose the fractions.

This is because the original numbers are data typed as Whole Numbers. Future values will be converted to Whole Numbers, thus losing their fractional precision.

If you believe that you may encounter fractions in your data, it’s a good idea to set the data type to Decimal as a way of “future-proofing” your query.
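As a rough illustration of why the column's data type matters for future rows, the following Python sketch applies one converter to every value in a column, the way a type-change step would on each refresh. The function name, the sample numbers, and the use of round as the whole-number converter are all assumptions made for the example.

```python
def apply_column_type(values, converter):
    """Apply one type converter to every value, as a type-change step would."""
    return [converter(v) for v in values]

original_rows = [10, 25, 40]           # hypothetical first load: whole numbers only
later_rows = original_rows + [12.75]   # a later refresh brings a fractional value

typed_as_whole = apply_column_type(later_rows, round)    # fraction is lost
typed_as_decimal = apply_column_type(later_rows, float)  # fraction survives

print(typed_as_whole)
print(typed_as_decimal)
```

The decimal-typed column costs nothing while the data is all whole numbers, and it keeps 12.75 intact when it eventually arrives.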

Currency-based Values

When values are data typed as Currency in Power Query, the results are displayed in the table with 2-decimal place precision, but the underlying values are stored and calculated at 4-decimal place precision.
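A small Python sketch can make the stored-versus-displayed distinction concrete. It uses the standard decimal module; the helper names, the sample price, and the choice of half-even rounding are assumptions for illustration, not a description of Power Query's internals.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def to_currency(value: str) -> Decimal:
    """Keep four decimal places, as the Currency type is described as doing."""
    return Decimal(value).quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)

def display(value: Decimal) -> str:
    """Show two decimal places without altering the stored value."""
    return f"{value:.2f}"

price = to_currency("19.98768")   # hypothetical source value
total = price * 3                 # calculated from the stored 4-decimal value

print(price, display(price), display(total))
```

The displayed price is 19.99, yet three of them total 59.96 rather than 59.97, because the arithmetic runs on the stored 19.9877.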

Invoking Automatic Data Types

If you have removed the originally established data types and want to let Power Query “figure it out”, select a column, several columns using CTRL, or all columns using CTRL-A, and click Transform (tab) -> Any Column (group) -> Detect Data Type.

Working with NULLs

If a cell is empty when brought into Power Query, the corresponding cell will be displayed with a NULL indicator.

This is an indication that the cell is truly empty; it contains no spaces or hidden characters.
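The distinction between a truly empty cell and one that only looks empty can be sketched in Python, where None plays the role of null. The function and the sample values below are hypothetical and serve only to show the three cases side by side.

```python
def describe_cell(value):
    """Classify a cell the way the null indicator helps us do by eye."""
    if value is None:
        return "null"             # truly empty
    if value == "":
        return "empty text"       # a text value with zero characters
    if isinstance(value, str) and value.strip() == "":
        return "whitespace only"  # looks empty but holds spaces
    return "has content"

cells = [None, "", "   ", "North"]
labels = [describe_cell(c) for c in cells]
print(labels)
```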

Power Query Shortcuts

As anyone who has worked with Excel, or any other application, will tell you, learning time-saving shortcuts is one of the fastest ways to shorten your workflow.

Below are some of the more useful Power Query shortcuts. This is by no means a comprehensive list, but it’s a great place to start for reducing work.

Shortcut | Task
F2 | Edit the name of a column or query
Arrow keys (L & R) | Navigate left or right through columns
CTRL key | Select multiple, non-contiguous columns
Shift key | Select contiguous columns
CTRL-A | Select ALL columns
CTRL-Space | Select the entire column of a selected cell
ALT (while opening Excel) | Open a second, unrelated instance of Excel

Common time-saving features

Selecting / Deselecting Columns

To select or deselect multiple columns, a selectable list of columns can be displayed by clicking Home (tab) -> Manage Columns (group) -> Choose Columns.

This is a superior way of removing unwanted columns, as opposed to selecting unwanted columns and deleting them.

Later, if you change your mind, you can click the gear icon and reselect a column to add it back into the data set.

Detecting Data Types

It’s not uncommon to delete the automatically applied type detection step from the Applied Steps list in Power Query.

When it comes time to perform data type detection, you can select the column(s) you want to data type and click Transform (tab) -> Any Column (group) -> Detect Data Type.

Error Handling – Finding & Correcting Errors in Data

We’ve seen how to handle common query-wide errors, such as:

• when the source file’s location or name changes

• when the sheet name changes

• when the table name changes

Now we will demonstrate common techniques for handling errors within columns of data.

Errors that should be corrected at the source level

Certain errors should be corrected in the data before loading into Power Query. Common issues include:

• A mixture of date formats from differing regions

• Column name inconsistencies when appending multiple files in a folder

Determining which rows contain errors

If your query returns a message stating that the results contain errors, click the error count to generate a detailed list of data rows that contain errors.

Power Query creates a new query that extracts the complete record for all rows of the results query that contain errors. Each row is prefaced with a row number indicating the record position in the source data.

If you click to the right of the error message, Power Query displays a detailed explanation of the error.

Errors to correct in Power Query

Many errors occur in Power Query due to an incorrectly applied data type.

For example, suppose a column contains sales figures, but certain records contain text such as “No Sale”.

When the column has a Currency data type applied to it, the cells holding text will display an error due to a data type mismatch.

Methods to Deal With Errors in Power Query

Remove Rows with Errors

If the rows containing errors are not needed, you can delete them by selecting the column containing errors and clicking Home (tab) -> Reduce Rows (group) -> Remove Rows -> Remove Errors.

Replace Errors with Meaningful Data

You can replace the errors with meaningful data, such as a default date, zeroes, nulls, etc.

Select the column containing errors and click Transform (tab) -> Any Column (group) -> Replace Values -> Replace Errors.

In the Replace Errors dialog box, enter the value you wish to replace the errors with, such as 0 (zero).

The results are as follows.
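The two approaches, removing error rows and replacing errors, can be compared with a short Python sketch. This is an analogy rather than Power Query code: the ERROR marker, the to_currency helper, and the sample sales values are all hypothetical.

```python
from decimal import Decimal, InvalidOperation

ERROR = object()  # hypothetical marker standing in for a cell-level error

def to_currency(raw):
    """Try to convert a raw cell to a number; flag it as an error if that fails."""
    try:
        return Decimal(str(raw))
    except InvalidOperation:
        return ERROR

sales = ["120.50", "No Sale", "89.99"]   # hypothetical column contents
typed = [to_currency(v) for v in sales]

# Option 1: remove the rows that contain errors.
rows_without_errors = [v for v in typed if v is not ERROR]

# Option 2: replace each error with a meaningful default such as 0.
errors_replaced = [Decimal("0") if v is ERROR else v for v in typed]

print(rows_without_errors)
print(errors_replaced)
```

Removing rows changes the row count, while replacing errors keeps every record. Which one is appropriate depends on whether a “No Sale” record should still be counted.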

More Data Views – Duplicate or Reference Query?

When working with Power Query, you are likely to encounter two familiar scenarios:

1. Creating multiple query outputs based on the same data set.

2. Large queries that you wish to break down into sections for ease of understanding, maintenance, and reusability.

Duplicating a Query

Creates a second copy of your existing query where the actions and results are independent of the original query. This is like copying a file. The two queries have no connection between them; altering one does not affect the other.

Referencing a Query

Creates a new query that is dependent on the original query. The output of the original query becomes the input for the dependent query.
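One way to picture the difference is to treat a query as a list of steps. In the Python sketch below (an analogy with hypothetical names and data, not Power Query code), the duplicate gets its own copy of the steps, while the reference starts from whatever the original currently outputs.

```python
def run(steps, data):
    """Run each step in order, feeding the result of one into the next."""
    for step in steps:
        data = step(data)
    return data

source = [3, 1, 2]
original_steps = [sorted]               # hypothetical "Sorted Rows" step

# Duplicate: an independent copy of the original's steps.
duplicate_steps = list(original_steps)

# Reference: takes the original query's output as its input.
def reference(data):
    return [x * 10 for x in run(original_steps, data)]

# Later, the original query gains a hypothetical "Kept First Rows" step.
original_steps.append(lambda rows: rows[:2])

duplicate_result = run(duplicate_steps, source)
reference_result = reference(source)
print(duplicate_result, reference_result)
```

The duplicate still returns all three rows because it never sees the later change, whereas the reference picks the change up automatically.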

How to Duplicate a Query

There are several ways you can duplicate a query, depending on where you are at the moment.

If you are in Excel:

• From the Queries & Connections panel, right-click a query and select Duplicate.

• Select a query from the Queries & Connections panel, then select Query (tab) -> Reuse (group) -> Duplicate.

If you are in Power Query:

• From the Queries list, right-click a query and select Duplicate.

• Select a query from the Queries list, then select Home (tab) -> Query (group) -> Manage -> Duplicate.

How to Reference a Query

There are several ways you can reference a query, depending on where you are at the moment.

If you are in Excel:

• From the Queries & Connections panel, right-click a query and select Reference.

• Select a query from the Queries & Connections panel, then select Query (tab) -> Reuse (group) -> Reference.

If you are in Power Query:

• From the Queries list, right-click a query and select Reference.

• Select a query from the Queries list, then select Home (tab) -> Query (group) -> Manage -> Reference.

Query Dependency – Visualize Query Relations

Viewing Query Dependencies Graphically

As seen in the previous section, queries can reference other queries.

When a query uses the output of another query as its input, a dependency is created. The query that receives the input is dependent on the query that produces the output.

A query that has dependent queries cannot be deleted without first deleting the dependent queries.

If the query that produces the output were to be deleted, the dependent queries would no longer have a source from which to draw data.

To visualize these dependencies in Power Query, select View (tab) -> Dependencies (group) -> Query Dependencies.

The following Query Dependencies dialog box will graphically display the queries, the relationship hierarchy, and their load destinations.
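The deletion rule described above can be modelled as a small dependency map. The Python sketch below is only an illustration; the query names are hypothetical and the functions are not part of Power Query.

```python
# Each query maps to the queries it reads from (all names are hypothetical).
sources = {
    "RawSales": [],
    "CleanSales": ["RawSales"],
    "SalesByRegion": ["CleanSales"],
    "SalesByMonth": ["CleanSales"],
}

def dependents_of(query):
    """List the queries that use the given query as their input."""
    return sorted(name for name, inputs in sources.items() if query in inputs)

def can_delete(query):
    """A query can be deleted only when nothing depends on it."""
    return not dependents_of(query)

print(dependents_of("CleanSales"))
print(can_delete("SalesByRegion"), can_delete("RawSales"))
```

Here CleanSales could not be deleted until both SalesByRegion and SalesByMonth are gone, which mirrors the hierarchy the dialog box draws.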

Query Management – Delete, Manage, Copy Queries & Backup Results

Delete a Query

If you are in Excel:

• From the Queries & Connections panel, right-click a query and select Delete or press the Delete key on the keyboard.

• Select a query from the Queries & Connections panel, then select Query (tab) -> Edit (group) -> Delete.

If you are in Power Query:

• From the Queries list, right-click a query and select Delete.

• Select a query from the Queries list, then select Home (tab) -> Query (group) -> Manage -> Delete.

NOTE: If you delete a query, the output of the query remains in the workbook.

Deleting All Queries at Once

Suppose you want to send the query output results to a client but you don’t want them to see how the data was collected and processed. If you have several queries and you wish to delete all queries in a workbook, perform the following steps:

1. Consider saving the file under a different name. This way you will retain a version of the file with the queries.

2. In the version of the file to clean, select File (tab) -> Info -> Check for Issues -> Inspect Document.

3. In the Document Inspector dialog box, click Inspect.

4. Scroll down to the category labeled Custom XML Data and click Remove All.

5. Click Close.

All the queries will have been removed while leaving the query results intact.

Copy a Query to Another Workbook

When you want to copy an entire query to another workbook, it’s a simple matter of Copy/Paste.

1. From the Queries & Connections panel, right-click a query and select Copy.

2. Start a new workbook or open an existing workbook.

3. Open the queries list by selecting Data (tab) -> Queries & Connections (group) -> Queries & Connections.

4. From the Queries & Connections panel, right-click in an empty part of the panel and select Paste.

Export a Query for Later Use

You can export a query to a file that you can import at a later date, or send it to someone for them to import into their workbook.

1. From the Queries & Connections panel, right-click a query and select Export Connection File.

2. Save the query as an .ODC (Office Data Connection) file.

NOTE: The .ODC file is an eXtensible Markup Language (XML) file that can be viewed in any text editor.

3. Start a new workbook or open an existing workbook.

4. Select Data (tab) -> Get & Transform Data (group) -> Existing Connections.

5. In the Existing Connections dialog box, select the applicable query from the list.

NOTE: If the query is being transferred to a different computer, you can click “Browse for more…” and manually locate the .ODC file.

Organizing Queries

If you are working in a workbook with many queries (perhaps dozens), you can organize your queries into Query Groups based on similar purpose, function, or phase.

Create a Query Group

1. From the Queries & Connections panel, click a query then hold the CTRL key and select one or more other queries.

2. Right-click one of the selected queries and click Move to Group.

3. You can select one of the existing groups or create a new group by selecting “New Group…”. If you are creating a new group, give the group a name and consider adding some explanation in the Description field as to the purpose of the queries in this group.

The groups can be expanded or collapsed to give the queries list a cleaner appearance.

Remove a Query Group

To remove a query group, right-click the query group and click Ungroup.

All ungrouped queries are placed in the “Other Queries” group.

Part 3 Helpful Power Query Transformations

Text Transformations – Format, Extract & More

Transformation #1: Split Text by Delimiter

Our first transformation is to separate the Department and the Position into 2 columns.

Because the Department and Position are separated by “space – forward slash – space” characters, we can leverage these characters as a delimiter to assist in the separation process.

1. Click in the table and select Data (tab) -> Get & Transform Data (group) -> From Table/Range.

2. Rename the query to “ProperData”.

3. Select the Department/Position column and click Home (tab) -> Transform (group) -> Split Column -> By Delimiter.

4. In the Split Column by Delimiter dialog box, select Custom and use a forward slash (/) as the delimiter.

5. Click OK.

NOTES:

• Once the split has been performed, another “Changed Type” step is added. This is to reassess the newly split data for possible new data types. We could delete this step as it is not creating anything we didn’t already possess.

• Technically, we could have used “ / ” (space, forward slash, space) as the custom delimiter. This would have allowed us to eliminate the upcoming “Trim” step.

Removing the Leading & Trailing Spaces

If you select to the right of one of the Department names, you will be presented with a preview window at the bottom of the screen.

If you then click to the right of the data in the preview window you will notice that the cursor does not rest directly next to the last letter; there is a trailing space.

We can also discover that there is a leading space before the Position in the second column.

Trimming the Data

To remove the trailing and leading spaces, select the Department column, hold CTRL and select the Position column. Select Transform (tab) -> Text Column (group) -> Format -> Trim.

I prefer using the “space – forward slash – space” approach as it reduces the query by a step.
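The effect of the two delimiter choices can be shown with plain string operations. The Python sketch below is only an analogy for the Split Column and Trim steps, and the sample value is hypothetical.

```python
value = "Finance / Analyst"   # hypothetical Department/Position entry

# Splitting on "/" alone leaves the surrounding spaces behind...
split_on_slash = value.split("/")

# ...so a separate trim step is needed to clean both columns.
trimmed = [part.strip() for part in split_on_slash]

# Splitting on " / " removes the spaces as part of the split itself.
split_on_padded = value.split(" / ")

print(split_on_slash, trimmed, split_on_padded)
```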

NOTE: We will deal with the column headings at the end. For now, we can leave them a bit messy.

Transformation #2: Extract Employee ID from Unique Code

Our next step involves the extraction of the Employee ID from within the Unique Code column. Notice that the format is “last name – employee ID – first name”.

Select the Unique Code column and click Transform (tab) -> Text Column (group) -> Split Column -> By Non-Digit to Digit.

This separates the Last Name from the Employee ID/First Name.

Next, select the column with Employee ID and First Name and click Transform (tab) -> Text Column (group) -> Split Column -> By Digit to Non-Digit.

This separates the Employee ID from the First Name.

Getting rid of the unneeded bits

As we only need the Employee ID, we will remove the newly separated First Name and Last Name columns.

Click the column of Last Names, then press CTRL and click the column of First Names.

Press the Delete key to remove the selected columns.

Formatting the Employee ID

Remember, one of the requirements was to format the Employee IDs with an “E-” prefix.

Select the column that contains Employee IDs and click Transform (tab) -> Text Column (group) -> Format -> Add Prefix.

In the Prefix dialog box, enter a Value of “E-” and click OK.

We now have our properly formatted Employee IDs.
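The same sequence (split where letters turn into digits, split where digits turn back into letters, discard the names, add the prefix) can be sketched in Python. The function name and the sample code “Smith10482Anna” are hypothetical, and the sketch assumes each code has exactly one run of digits between two runs of letters.

```python
import re

def employee_id(unique_code: str) -> str:
    """Pull the digits out of 'last name + ID + first name' and add an E- prefix."""
    # Non-digit to digit: everything before the first digit is the last name.
    first_digit = re.search(r"\d", unique_code).start()
    rest = unique_code[first_digit:]

    # Digit to non-digit: everything from the first non-digit on is the first name.
    first_letter = re.search(r"\D", rest).start()
    digits = rest[:first_letter]

    # Add the required prefix.
    return "E-" + digits

print(employee_id("Smith10482Anna"))
```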

Transformation #3: Separate Names and Format Casing

Our final set of transformations is to separate the contents of the Full Name column into a First Name column and Last Name column whilst ignoring any middle name information.

There is a wealth of options sitting within the Extract feature in Power Query.

We don’t want to transform what we have into a new set of data; we want to leave the original Full Name column while adding additional columns for First Name and Last Name.

For this operation, we will use the version of Extract located on the Add Column ribbon.

Extracting First Names

Select the Full Name column and click Add Column (tab) -> From Text (group) -> Extract -> Text Before Delimiter.

In the Text Before Delimiter dialog box, enter a space in the Delimiter field and click OK.

We are presented with a new column that contains all text up to the first space in the Full Name text.

Extracting Last Names

Select the Full Name column and click Add Column (tab) -> From Text (group) -> Extract -> Text After Delimiter.

In the Text After Delimiter dialog box, enter a space in the Delimiter field. Expand the Advanced Options and set the Scan for the Delimiter option to “From the end of the input” and click OK.

Starting from the end and “looking” backward is necessary because of the existence of middle names in some of our records.

If we began our search for a space from left-to-right, we would stop before a middle name and extract the middle and last names.

We are presented with a new column that contains all text after the last space in the Full Name text.

Format the Names with Proper Casing

The next step is to format the newly added First Name and Last Name columns so that the first letter is upper-case while the remaining letters are lower-case.

Select the First Name and Last Name columns then click Transform (tab) -> Text Column (group) -> Format -> Capitalize Each Word.

We now have our properly formatted names.
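The logic of this transformation (text before the first space, text after the last space, then proper casing) can be sketched in Python. The function and the sample names are hypothetical, and Python's str.title is used as a rough stand-in for Capitalize Each Word; it may treat apostrophes and hyphens differently.

```python
def split_name(full_name: str):
    """Return (first name, last name), ignoring any middle names."""
    first = full_name.split(" ", 1)[0]    # text before the first space
    last = full_name.rsplit(" ", 1)[-1]   # text after the last space
    return first.title(), last.title()

print(split_name("mary ann JONES"))
print(split_name("john SMITH"))
```

Scanning from the end for the last name is what keeps “ann” out of the result in the first example.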

Finishing Touches

• We no longer require the original column of Full Names, so we can select the Full Name column and press Delete.

• Rename the column headings with more meaningful names.

Sending the Results to Excel

To send the transformation results back to Excel as a finished table, select Home (tab) -> Close (group) -> Close & Load (lower part of the button) -> Close & Load to…

You can load the results to a table on a new sheet, or an existing sheet.

The results are as follows.

All three requirements have been satisfied and we didn’t have to write a single formula to get the job done.

Merge Columns – What to Watch Out For

Two very important first steps will govern your success with the Merge Columns feature:

• The order in which you select the columns to be merged.

• The tab you invoke the Merge Columns feature from: Transform or Add Column.

Selecting the Columns to Be Merged

The order in which you select the columns determines the merge order.

This is especially useful because if your data is not in the proper order, you are not required to establish the order before the merge. You can select the columns based on the order you want the final result, then perform the merge. The Merge Columns feature will accomplish two tasks in a single step.

Use the CTRL key to select multiple columns.

Launch the Merge Columns feature in one of the following manners:

• To replace the existing content, select Transform (tab) -> Text Column (group) -> Merge Columns or right-click the selected columns and select Merge Columns.

• To create additional content, select Add Column (tab) -> From Text (group) -> Merge Columns.

In the Merge Columns dialog box, select the separator (delimiter) you wish to use to separate each column’s data within the result.

In the New Column Name field, type the name you wish to use for the heading of the new column.

Potential Issue

Because we were including middle names in the above example, names that do not contain a middle name are padded with two spaces between the first and last name.

Merging Without Extraneous Spaces

The key to getting the proper result (names without extra spaces) is to invoke the Merge Columns step from the Add Column tab. This will produce a merged result without extra spaces. In the worst case, you may need to add an extra step to delete the original columns if you were intending to replace them.
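The difference in results can be modelled in Python. The sketch below only imitates the behaviour described in this section: it assumes a missing middle name arrives as null (None here), and both function names and the sample names are hypothetical.

```python
def merge_like_transform(parts, sep=" "):
    """Treat a missing value as empty text, so its separator is still written."""
    return sep.join("" if p is None else p for p in parts)

def merge_like_add_column(parts, sep=" "):
    """Skip missing values entirely, so no extra separator appears."""
    return sep.join(p for p in parts if p is not None)

# The list order plays the role of the column selection order.
with_middle = ["Ana", "Maria", "Lopez"]
without_middle = ["Ana", None, "Lopez"]

print(repr(merge_like_transform(without_middle)))
print(repr(merge_like_add_column(without_middle)))
print(repr(merge_like_add_column(with_middle)))
```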

Capturing the First Letter Only In the Middle Name

Suppose you only wanted to keep the first letter of the middle name. Before the merge operation:

1. Select the Middle Name column.

2. Select Transform (tab) -> Text Column (group) -> Extract -> First Characters.

3. In the Extract First Characters dialog box, enter a 1 in the Count field and click OK.

Standardizing Middle Initial Casing & Style

If you want to “future-proof” your query to account for times when data may arrive in lower-case formatting, perform the following step:

1. Select the Middle Name column.

2. Select Transform (tab) -> Text Column (group) -> Format -> UPPERCASE.

3. If you wish to have a period following the Middle Name, select the Middle Name column.

4. Select Transform (tab) -> Text Column (group) -> Format -> Add Suffix.

5. In the Suffix dialog box, enter a “.” (period – no quotes) in the Value field and click OK.
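Taken together, the three operations on the Middle Name column (keep the first character, upper-case it, add a period) amount to the following Python sketch. The function name is hypothetical, and leaving a missing middle name untouched is an assumption of this example; the guide does not say how the Add Suffix step treats empty cells.

```python
def middle_initial(middle_name):
    """Reduce a middle name to an upper-case initial followed by a period."""
    if not middle_name:          # assumption: leave missing or empty values as they are
        return middle_name
    first_character = middle_name[:1]         # Extract -> First Characters (count of 1)
    upper_cased = first_character.upper()     # Format -> UPPERCASE
    return upper_cased + "."                  # Format -> Add Suffix

print(middle_initial("quincy"), middle_initial("Ann"), middle_initial(None))
```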
