Table of Contents
Part 1
Power Query Basics
Power Query Editor
Overview
Updating the Source Location/Name
Deleting a Query Step
Transform Versus Add Column
Split Text by Delimiter
Automatic Data Typing
Rename a Column
Removing (Deleting) a Column
Perform Quick Mathematical Operations
Refreshing Source Data During Query Composition
Quick Insights on Data – Column Profiling, Data Quality & Distribution
Displaying a Complete List When Filtering
Displaying Column Profile Information
Displaying a Complete List for Selected Column
Column Distribution
Column Quality
Using Monospaced Fonts
Discover the Total Number of Rows
Formula Bar – Applied Steps & M Code
Renaming Applied Steps
Other Applied Steps Options
Adding Documentation to Applied Steps
Power Query Formula Bar
Advanced Editor
Close & Load Destinations
Working with Existing Queries
Creating a Data Connection
Changing the Default Load Destination
“Saving” Your Queries
Refresh Data and Refresh Options
Refresh Options
Part 2
Important Power Query Tips & Tricks
Uploading Data from Excel
Uploading from Tables
The Query Ribbon
Useful Table Settings
Uploading from Named Ranges
Handling Changes to Source
Correcting Changes in Location or Filename
Correcting Changes to Sheet Names
Correcting Changes to Table Names
Correcting Changes to Column Heading Names
Correcting Changes Using the Advanced Editor
Understanding Data Types
The Importance of Data Types
Data Types vs. Formatting (and NULL Formatting)
Power Query Examples
Invoking Automatic Data Types
Working with NULLs
Power Query Shortcuts
Common time-saving features
Error Handling – Finding & Correcting Errors in Data
Errors that should be corrected at the source level
Determining which rows contain errors
Errors to correct in Power Query
Methods to Deal With Errors in Power Query
More Data Views – Duplicate or Reference Query?
How to Duplicate a Query
How to Reference a Query
Query Dependency – Visualize Query Relations
Viewing Query Dependencies Graphically
Query Management – Delete, Manage, Copy Queries & Backup Results
Delete a Query
Copy a Query to Another Workbook
Export a Query for Later Use
Organizing Queries
Part 3
Helpful Power Query Transformations
Text Transformations – Format, Extract & More
Transformation #1: Split Text by Delimiter
Transformation #2: Extract Employee ID from Unique Code
Transformation #3: Separate Names and Format Casing
Merge Columns – What to Watch Out For
Selecting the Columns to Be Merged
Merging Without Extraneous Spaces
Capturing the First Letter Only In the Middle Name
Standardizing Middle Initial Casing & Style
Fill & Replace Values – To Create a Proper Dataset
Repeat Cells to Adjacent Cells (Fill)
Replacing Data
Standardizing Casing
Replacing NULL Values
Sort Data – Including Multiple Levels
Simple Sorts
Multi-Level Sorts
Returning to an Original Sort Order
Remove Duplicates – Including Multiple Columns
Remove Duplicates Based on a Single Column
Remove Duplicates Based on Multiple Columns
Number Transformations – What to Watch Out For
Adding a Set Value to All Values in a Column
Rounding Values in a Column (with little-known feature)
Produce an Aggregation Using Two Columns
Adding Columns That Contain Nulls
Multiply a Percentage Against a Value
Preserving the Top N/Bottom N Rows Based on a Column
Working with Filter – AND & OR Conditions
Filtering by Specific Items
Filtering by Text That Exists
Creating a Compound Field Filter
Beware of Logic Errors
Filter Selection Trap
Predicting the Future
Change Type Trap – (& Remove Columns Trap)
Change Type Trap – Problem
Change Type Trap – Solution
Adjusting Automatic Change Type in Power Query Settings
Remove Columns Trap – Problem
Remove Columns Trap – Solution
Part 4
Powerful Power Query Transformations
Column From Example – to Extract Patterns Quickly
Formulas For Free
Solving the Problem with the Traditional Approach
Conditional Columns – in Power Query
If… Then… Else Statements in Power Query
Aggregating (Grouping) Data – on Multiple Levels
Launching the Group By Feature
Using the Group By Feature
Group By – Basic View
Group By – Advanced View
Group By for All Rows
Determining the Average Salary per Department
Creating Nested Tables
Calculating the Salary Deviation
Unpivot Columns – Basics
Unpivot Example
Trimming the Unpivot Fat
Unpivot Columns – How to Overcome Common Errors
Common Error #1: Adding Data Post-Data Typing
Examining the Unpivot Query
Correcting the Unpivot Logic Error
Common Error #2: Dealing with NULLs in the Source Data
Potential Problem for Replacing Nulls with Zeroes
Pivot Column – Create Readable Reports Fast
Creating a Pivot Report
Split Column – The Problem That’s Easy to Miss
The Problem
The Solution
The Flaw In the Logic
The Fix for the Flaw
Split Column by Delimiter – Rows Instead of Columns
Splitting the Colors Across Rows
Tweaking the List
“Future-Proofing” the Query
Part 5
Date and Time Transformations
Date and Time Transformations
Making the Complex Simple
Useful Date Features
Date Calculation Examples
Useful Time Features
Time Calculation Examples
Useful Duration Features
Duration Calculation Examples
Part 6
Custom Columns
Why Use Custom Columns
Your Imagination has No Limits
Custom Columns – Type Compatibility & Date Function
Data Type Compatibility
Data Type Conversions using Functions
Date Function
Other Intrinsic Functions
Part 7
Power Query Online Data Sources
Connecting to Different Sources
The Vast World of Connectors
Import Data from a Website
Connecting to a Website
Connecting to an Embedded Excel File or Text File on a Webpage
Updating Connection Credentials
Import Data from ODATA (Open Data Protocol)
Connecting to ODATA
Get Google Sheet Data with Power Query
Obtaining the Google Sheets URL
Connecting to the Google Sheets URL
Connect to Microsoft Exchange – Get Outlook Email
Extracting Content from Email Messages
Connect to SharePoint or OneDrive for Business
Extracting Content from Cloud-based Storage
Part 8
Combining & Appending Data
Why Append Data? The Difference Between Append and Merge
Appending Data
Creating the Appended Query
Merging Data
Creating the Merged Query
Part 9
Merge Options Join Kinds Explained
Overview of Merge Options – Understanding Join Kinds
Advantages of Merges
Merge Options
Left vs. Right / First vs. Second
Understanding Join Kinds
Merge Based on Multiple Columns
Creating a Compound Key
Merge to Get Multiple Match Results
Dealing with One-to-Many Results
Retrieving the Match with the Latest Date
Merging Text Columns – Dangers & Pitfalls
It’s the Little Things that Get Us
How to Use Fuzzy Match
When Close is Good Enough
Fuzzy Match Options
Using a Transformation Table
Part 10
When to use Power Pivot & Load to Data Model
When to Load to the Data Model
Out of Sight, but Not Out of Mind
Power Pivot and the Data Model
Adding Query Results to the Data Model
When to Use the Data Model
Is Power Query Needed to Load Data into the Data Model
Part 11
Understanding M Formula Language
M Language – How M Thinks (let Expression & Values)
Viewing Code in the Advanced Editor
Expressions
The let Expression
Storing the Individual Step Results
The in Expression
Comments in M Code
Step Name Syntax
“M” Characteristics
Defining & Invoking Custom M Functions
Function Syntax
Multi-Step Functions
Saving a Function
Using a Function in a Query
Updating Function Logic
Embedded Functions Within Queries
Reference Guide for Standard M Functions
Learn About Power Query Formulas
Built-In Function Documentation
Function Context
Alternate Method for Help on a Specific Function
Tables, Lists, & Records – How to Reference Them in a Table
Lists
Creating a List from a Table
Referencing Column Headers in Lists
Records
Referencing Tables, Lists, Records, and Cells
Performing a Search on a Table
Returning a Value from a Searched for Record
Returning a Table from a Table
Brackets & Lookup Operators in M Code
3 Important Power Query Rules
Creating Lists, Tables, & Records – to use as Parameters or Testing of Functions
Creating a List from Scratch
Creating a Record from Scratch
Creating a Table from Scratch
Understanding the Each Keyword – and the Purpose of the Underscore
Shorthand Notation
Underscore Notation/Reference
Using Power Query Parameters
Creating Managed Parameters
Update the Query to Use the Parameter
Using the Parameter
Speed Up Queries – Table.Buffer & How to Test Impact
Why Is This Behavior Not the Default?
Testing Performance On Large Data Sets
Counting the Rows in a Data Set
Query Folding – Improve Performance for Relational Databases
What is Query Folding?
Viewing Query Folding Instructions
Breaking Query Folding
IF Then Statement – Lookup Operators to Get Value from Previous Row
Creating Row Against Row Comparisons
Error Handling in Power Query – Bulk Replace Lookup with Try Otherwise
Testing the Waters
The Purpose of Try Otherwise
Part 12
Working with Lists & Table Function
Power Query Text Functions
Text.Contains, Text.Replace, etc.
Text.PositionOf
More Resources
Learn More
More Resources
Power Query Reference Book
This Power Query Reference Guide is accompanying documentation for my online Master Excel Power Query: Beginner to Advanced (including M) course on XelPlus.com/courses. Please do not reproduce or transmit in any form without permission.
We (XelPlus e.U.) have made every effort to ensure the accuracy of this manual. If you discover any discrepancies, please send us a quick email to [email protected].
Availability of Power Query
Power Query is natively integrated in several Microsoft products, including the following.
Microsoft Excel
Power Query enables Excel users to import data from a wide range of data sources into Excel for analytics and visualizations.
Starting with Excel 2016, Power Query capabilities are natively integrated and can be found under the “Get & Transform” section of the Data tab in the Excel Desktop ribbon.
Excel 2010 and 2013 users can also leverage Power Query by installing the Microsoft Power Query for Excel add-in.
Microsoft Power BI
Power Query enables data analysts and report authors to connect and transform data as part of creating Power BI reports using Power BI Desktop.
Microsoft SQL Server Data Tools for Visual Studio
Business Intelligence Developers can create Azure Analysis Services and SQL Server Analysis Services tabular models using SQL Server Data Tools for Visual Studio. Within this experience, users can leverage Power Query to access and reshape data as part of defining tabular models.
Common Data Service
Common Data Service lets you securely store and manage data that's used by business applications. Data within Common Data Service is stored within a set of entities. An entity is a set of records used to store data, similar to how a table stores data within a database.
Common Data Service includes a base set of standard entities that cover typical scenarios, but you can also create custom entities specific to your organization and populate them with data using Power Query. App makers can then use Power Apps to build rich applications using this data.
About Leila Gharani
Leila Gharani is a Microsoft MVP and a bestselling online course instructor. She runs XelPlus.com, a spreadsheet resource site that helps people gain the knowledge they need to create useful tools, solve problems, and get more done.
Her background includes a Master's in Economics and work as an economist, consultant, Oracle HFM accounting systems expert, and project manager.
The Accompanying Files
The examples shown in this manual are taken from the Excel files available inside the online course.
All files are available as a zip file.
Part 1
Power Query Basics
Power Query Editor
Overview
You can open the Power Query Editor from Excel’s Data tab by selecting Data (tab) → Get Data → Launch Power Query Editor.
The editor also launches automatically when you import data into Power Query. Simply select the appropriate option from the Get & Transform Data group on the ribbon.
Updating the Source Location/Name
1. Open the query in Power Query
2. Update the file location/name reference by clicking the gear icon next to the Source step and editing the location/name
The advantage of the gear icon is that you are presented with a user-friendly way to browse to the new location, thus avoiding typographic errors.
NOTE: Any time you see a gear icon next to a step, it means that the step has parameters that can be adjusted.
These adjustments are usually presented in user-friendly dialog boxes, eliminating the need to write complex formulas and statements.
Deleting a Query Step
Power Query does not possess an Undo feature. If you perform an undesirable transformation on the data, delete the step by clicking the X icon that appears to the left of the step name when you hover your cursor over the step.
Be aware of the following behaviors when deleting a query step:
• Deleting a step cannot be undone
• Deleting a step may “break” following steps that were dependent on the deleted step
Bulk Step Deletion
If you have a list of steps and you wish to delete all steps after a specific step, instead of deleting each step individually, you can save time by right-clicking the first step to delete and selecting Delete Until End.
This deletes that step and every step after it in the step list.
Quicklist for Common Table Transformations
If you right-click the button located to the left of the first column header, you can gain access to the quick list for table transformations.
Many of the most common table transformation tools are in a single, easy to navigate list.
Transform Versus Add Column
The tabs labeled Transform and Add Column have many of the same features. Beginners are often unsure which version of a feature to use.
• Transform will replace the original data (column) with the transformed version of the data.
• Add Column will create a new column to hold the results of the transformation.
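The difference between the two tabs can be illustrated outside Power Query. The Python sketch below is only an analogy, not Power Query's implementation; the sample rows, the column names, and both helper functions are hypothetical.

```python
# Hypothetical sample rows; the column names are illustrative only.
rows = [{"Name": "ann lee"}, {"Name": "bob ray"}]


def transform_column(table, column, func):
    """Mimic the Transform tab: overwrite the column with the result."""
    return [{**row, column: func(row[column])} for row in table]


def add_column(table, new_column, source, func):
    """Mimic the Add Column tab: keep the source column and append a new one."""
    return [{**row, new_column: func(row[source])} for row in table]


transformed = transform_column(rows, "Name", str.title)
added = add_column(rows, "Name Proper", "Name", str.title)

print(transformed[0])  # the original lowercase text is gone
print(added[0])        # the original text and the new column both exist
```

After the transform, only the capitalized name remains. After the add, the row holds both the original value and the new result.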
Split Text by Delimiter
You can access the text splitting feature in multiple ways:
• Home (tab) → Transform (group) → Split Column
• Transform (tab) → Text Column (group) → Split Column
• Right-click a column heading → Split Column
The Split Column feature has several variations by which the split will be performed. Many of these will be discussed in later sections.
The most widely used option is By Delimiter.
From here, you can select the delimiter and whether you want to split at every occurrence or just the first or last occurrence.
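To make the three choices concrete, here is a small Python sketch of splitting at every occurrence, at the first occurrence only, or at the last occurrence only. It illustrates the logic rather than Power Query's own code; the function name, the mode labels, and the sample value are assumptions made for the example.

```python
def split_by_delimiter(text, delimiter, mode="each"):
    """Split text at every delimiter, or only at the first or last one."""
    if mode == "each":
        return text.split(delimiter)
    if mode == "first":
        return text.split(delimiter, 1)   # split once, scanning from the left
    if mode == "last":
        return text.rsplit(delimiter, 1)  # split once, scanning from the right
    raise ValueError(f"unknown mode: {mode}")


code = "US-NY-0042"  # hypothetical value containing two delimiters

every = split_by_delimiter(code, "-", "each")
first = split_by_delimiter(code, "-", "first")
last = split_by_delimiter(code, "-", "last")

print(every)
print(first)
print(last)
```

Splitting at every occurrence yields three parts, while the other two modes always yield at most two parts.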
Automatic Data Typing
Power Query will automatically detect the data types when certain events occur:
• A row is promoted to a header row
• A column is split into multiple columns
If this step was not desired, you can delete it by clicking the X icon that appears to the left of the Changed Type step.
If this automatic data typing feature is unwanted, you can deactivate the feature at the file level or the program level.
File-Level Deactivation
File → Options & Settings → Query Options → Current Workbook → Data Load → Type Detection → clear the “Detect column types and headers for unstructured sources” checkbox.
Program-Level Deactivation
File → Options & Settings → Query Options → Global → Data Load → Type Detection → select “Never detect column types and headers for unstructured sources”.
Rename a Column
Columns can be renamed in several ways:
• Transform (tab) → Any Column (group) → Rename
• Right-click a column heading and select Rename
• Double-click a column heading to enter rename mode
Removing (Deleting) a Column
Columns can be deleted (known as “vertical filtering”) in several ways:
• Home (tab) → Manage Columns (group) → Remove Columns
• Right-click a column heading and select Remove
• Click a column heading and press Delete on the keyboard
NOTE: If you have many columns to delete and only a few columns to preserve, it may be more efficient to select the columns you wish to keep and use the Remove Other Columns feature to delete the unwanted columns.
Choosing Columns to Keep and Columns to Delete
To select or deselect multiple columns, a selectable list of columns can be displayed by clicking Home (tab) → Manage Columns (group) → Choose Columns.
This is a superior way of removing unwanted columns, as opposed to selecting unwanted columns and deleting them.
Later, if you change your mind, you can click the gear icon and reselect a column to add it back into the data set.
PRO TIP: If you perform the same operation multiple times in a row, Power Query will consolidate all like steps into a single step in the Applied Steps list. This means, if you delete 5 columns then rename 5 columns, you will only see 2 steps in the Applied Steps list.
Conversely, if you were to delete a column, then rename a column and repeat this 4 more times, you will see 10 steps in the Applied Steps list.
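The consolidation behavior described in this tip can be modeled as collapsing runs of adjacent, like actions. The Python sketch below is a simplified model of that description, not Power Query's internal logic; the action labels are hypothetical.

```python
from itertools import groupby


def consolidate(actions):
    """Collapse each run of adjacent identical action types into one step."""
    return [kind for kind, _ in groupby(actions)]


# Five deletions followed by five renames.
batched = ["remove"] * 5 + ["rename"] * 5
# Delete, rename, delete, rename, ... (five pairs).
alternating = ["remove", "rename"] * 5

batched_steps = consolidate(batched)
alternating_steps = consolidate(alternating)

print(len(batched_steps))      # number of steps when like actions are adjacent
print(len(alternating_steps))  # number of steps when actions alternate
```

Grouping like operations together keeps the Applied Steps list short and easier to read.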
Perform Quick Mathematical Operations
If you want to perform common mathematical operations such as multiplying the price by quantity or adding subtotal and tax, you can select the 2 columns of information (using the CTRL key) and select Add Column (tab) → From Number (group) → Standard → {operation}.
NOTE: Be mindful of the order in which columns are selected. This is important when performing operations such as division (ex: Total ÷ Quantity versus Quantity ÷ Total).
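A quick arithmetic check shows why selection order matters for division. In this Python sketch the first argument stands for the column selected first; the values and the function name are hypothetical.

```python
def divide(first_selected, second_selected):
    """Divide the first selected column's value by the second's."""
    return first_selected / second_selected


total, quantity = 120.0, 4.0  # hypothetical values from one row

unit_price = divide(total, quantity)  # Total selected first
reversed_result = divide(quantity, total)  # Quantity selected first

print(unit_price)
print(reversed_result)
```

Multiplication and addition give the same result in either order, so the selection order only matters for operations such as division and subtraction.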
Refreshing Source Data During Query Composition
If you believe the data may have changed since you opened the Power Query Editor, you can update the preview data by selecting Home (tab) → Query (group) → Refresh Preview.
Quick Insights on Data – Column Profiling, Data Quality & Distribution
In an attempt to improve performance when connecting to data sources, Power Query downloads a maximum of 1,000 rows of data (if you have fewer rows, you will receive the entire data set).
This is what is known as “the first three inches of the fire hose”. Typically, it’s enough of a representation of the data to give you a good idea of the entire data set.
This works well for most data sets, but can cause issues with certain tools like filtering, when proposed lists are incomplete because the full data set has yet to be examined.
Displaying a Complete List When Filtering
If you display a filter dropdown list and there appear to be missing entries, those entries may not exist in the first 1,000 rows of the data. Power Query will inform you that there may be items in the data that exist beyond the initial 1,000 entries.
You can force Power Query to read the entire data set for that field by clicking Load More.
This will read the full data set and display a complete list of items for that column.
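The effect of the 1,000-row preview on filter lists can be reproduced with a small Python sketch. The column contents below are invented for illustration; only the 1,000-row limit comes from the text above.

```python
PREVIEW_ROWS = 1000

# Hypothetical column of 1,500 rows where "West" first appears after row 1,000.
region = ["East"] * 600 + ["North"] * 400 + ["West"] * 500

# What a filter list built from the preview would offer.
preview_values = sorted(set(region[:PREVIEW_ROWS]))
# What the list shows once the whole column has been read.
full_values = sorted(set(region))

print(preview_values)
print(full_values)
```

The value "West" is missing from the preview list because none of the first 1,000 rows contain it, which is the situation Load More resolves.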
Displaying Column Profile Information
To display statistical information about a selected column, click View (tab) → Data Preview (group) → Column Profile.
When you select a column, statistical and distribution information is displayed for the first 1,000 rows of data.
This may provide inaccurate information regarding such things as…
• Count of rows
• Empty cells
• Min and max values
Displaying a Complete List for Selected Column
To display accurate statistics about the entire data set, force Power Query to load the entire data set by clicking in the lower-left corner where it says “Column profiling based on top 1000 rows” and changing the option to “Column profiling based on entire data set”.
NOTE: System performance may suffer when loading the entire data set. Consider loading all data to capture some quick insights, then switch back to the first 1,000 rows to improve performance.
Column Distribution
To view the number of distinct versus unique values in each column, click View (tab) → Data Preview (group) → Column Distribution.
This provides a miniature column chart at the top of each column to display the relationship between distinct and unique values.
NOTE: These mini-charts are based on the setting for “first 1,000” or “all records”.
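The two terms are easy to confuse: distinct counts how many different values a column holds, while unique counts only the values that appear exactly once. The Python sketch below shows the distinction on a hypothetical column; it illustrates the definitions and is not how Power Query computes them internally.

```python
from collections import Counter

# Hypothetical column values.
values = ["A", "B", "B", "C", "C", "C", "D"]

counts = Counter(values)

# Distinct: every different value is counted once.
distinct_count = len(counts)
# Unique: only values that occur exactly one time.
unique_count = sum(1 for n in counts.values() if n == 1)

print(distinct_count)
print(unique_count)
```

Here the column has four distinct values (A, B, C, D) but only two unique values (A and D).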
Column Quality
To display the percentage of valid data vs. errors vs. empty cells, click View (tab) → Data Preview (group) → Column Quality.
The column headings now display percentages of each category for the column.
NOTE: These percentages are based on the setting for “first 1,000” or “all records”.
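As a rough illustration of how the three percentages relate, the Python sketch below classifies each cell as valid, error, or empty and reports the share of each. The CellError marker class, the function name, and the sample data are hypothetical stand-ins.

```python
class CellError:
    """Hypothetical marker standing in for a cell that holds an error."""


def column_quality(cells):
    """Return the percentage of valid, error, and empty cells."""
    total = len(cells)
    errors = sum(1 for c in cells if isinstance(c, CellError))
    empty = sum(1 for c in cells if c is None or c == "")
    valid = total - errors - empty
    return {
        "valid": round(100 * valid / total),
        "error": round(100 * errors / total),
        "empty": round(100 * empty / total),
    }


# Ten hypothetical cells: seven values, two empty cells, one error.
cells = [10, 20, None, CellError(), 30, 40, 50, None, 60, 70]
quality = column_quality(cells)
print(quality)
```

Because every cell falls into exactly one category, the three percentages account for the whole column.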
Using Monospaced Fonts
If you wish each character to have the same width (no text kerning), select View (tab) → Data Preview (group) → Monospaced.
This may make the text easier to read, especially when working with columns containing large numbers.
Discover the Total Number of Rows
To quickly discover the number of rows in a data set, select Transform (tab) → Table (group) → Count Rows.
This will reduce the entire data set to a single value that represents the total number of rows.
This is often performed to satisfy a casual curiosity. The step is often deleted once the question has been answered.
Formula Bar – Applied Steps & M Code
Power Query does an admirable job of recording all our manipulations in the Applied Steps window.
However, the names provided to the steps are sometimes less than obvious.
Renaming Applied Steps
A better way to document your steps, and make actions more understandable, is to rename the steps.
To rename a step, either click a step name and press F2 or right-click a step name and select Rename.
Other Applied Steps Options
When you right-click a step name, there are other helpful options listed in the menu, such as:
• Delete / Delete Until End
• Move Up / Move Down
• Insert Step After
• Extract Previous (this one is VERY cool)
Adding Documentation to Applied Steps
Creating a descriptive name for a step helps make queries more understandable, but that only goes so far due to limited space.
If you need to document more detailed information about a step (e.g., the justification for performing the step or the mathematical formula behind it), right-click the step and select Properties.
In the Description field, write what makes the most sense for your documentation needs.
Power Query Formula Bar
To gain greater insight into what each step is performing, we can activate the Formula Bar by selecting View (tab) → Layout (group) → Formula Bar.
Similar to Excel’s Formula Bar, we can see the “behind the scenes” instructions for the selected step.
These instructions are written in the Power Query ‘M Language’ (Data Mashup Language).
This is especially useful for steps that do not have the gear icon. We can see in the Formula Bar what is occurring and potentially perform a direct-edit on the instruction.
For steps that have more instructions than can be viewed on a single line of the Formula Bar, click the small down-arrow to the right of the Formula Bar to expand the height of the Formula Bar.
Advanced Editor
Although you can see each step individually in the Formula Bar, it’s difficult to remember what you saw in one step when comparing to another step.
The entire set of “M” code can be displayed by opening the Advanced Editor.
To open the Advanced Editor, select Home (tab) → Query (group) → Advanced Editor or View (tab) → Advanced (group) → Advanced Editor.
This will display the entire instruction set of “M” code for the query.
Close & Load Destinations
Working with Existing Queries
When you edit an existing query, the only available option is “Close & Load” which defaults to the existing destination.
If you need to change the destination, return to Excel, right-click the query in the Queries & Connections panel, and select “Load To…”
This will open the Import Data dialog box where you can select a new destination for the query output.
Creating a Data Connection
A data connection is where you have the steps defined for processing the data, but the data is not loaded into Excel in any form. The data is “waiting to be called” when needed.
Connection Only queries are good for data that will be used for an intermediate step to aid in processing, but the results are not needed for the final output.
Connection Only queries greatly reduce file size. This is because the data remains at the source and a copy is not stored locally in the Excel file.
Changing the Default Load Destination
The default load destination for Power Query is to load to an Excel Table.
If you would rather the default location be to a Connection Only query, or a table and/or the Data Model, select File (tab) → Options & Settings → Query Options → Global → Data Load → Default Query Load Settings.
Note: Selecting a custom load but NOT selecting a worksheet or Data Model will establish a Connection Only query.
“Saving” Your Queries
If you are in the middle of creating a query and you need to leave the query editor, but you don’t want to discard your work, nor are you ready to load the results into Excel, select “Close & Load to…” and establish a Connection Only query.
This will allow you to return later and resume work without performing a potentially incomplete data load.
Refresh Data and Refresh Options
When the source data changes, the query results must be refreshed.
There are several ways of performing a refresh operation:
1. Select Data (tab) → Queries & Connections (group) → Refresh All.
2. Right-click on a query results table and select Refresh.
3. Right-click a query in the Queries & Connections panel and select Refresh.
4. Click the Refresh icon in the upper-right corner of the query.
Refresh Options
Each query has a set of refresh options that will allow for a certain degree of automation.
Select Data (tab) → Queries & Connections (group) → Properties to open the Properties dialog box.
Some useful settings include:
• Enable background refresh – this allows you to continue working in Excel as the query is being refreshed. Disabling this feature is useful when the results of the query are needed for another operation (like a macro), and you don’t want the following step to run prematurely.
• Refresh every N minutes – this schedules a refresh at a defined interval (in minutes). The workbook needs to be open for this option to work.
• Refresh data when opening the file – self-explanatory.
Part 2
Important Power Query Tips & Tricks
Uploading Data from Excel
There are multiple ways to upload data from Excel into Power Query. Two of the most used ways are from tables and named ranges.
Uploading from Tables
Data brought in from an Excel worksheet must be in the form of a proper Excel Table (when not working with Named Ranges). If the data is not already an Excel Table, Power Query will automatically “upgrade” the plain range to an Excel Table. Excel will assign a default name, such as “Table1”, to the table.
If you are creating reports for long-term use or reports that cull data from multiple sources, it’s a good idea to pre-convert the plain table to an Excel Table and give the table a better name before bringing the data into Power Query.
1. Select the data range and upgrade to an Excel Table. This can be done via:
• CTRL-T keyboard shortcut
• Home (tab) → Styles (group) → Format as Table
• Insert (tab) → Tables (group) → Table
2. Give the table a proper name, like “SalesTable”, by selecting Table Design (tab) → Properties (group) → Table Name.
3. If you wish to retain your original table art (e.g., colors, fonts, borders), remove the automatically applied table layout by selecting Table Design (tab) → Table Styles (group) → bottom scroll button → Clear.
Uploading the Data into Power Query
4. To send the newly upgraded data into Power Query, click in the data and select Data (tab) → Get & Transform Data (group) → From Table/Range.
5. Rename the query output to something like “SalesReport”. This will become the name of the output table on the Excel worksheet.
The Query Ribbon
Once we have loaded the data back into Excel, we have access to a new ribbon named Query.
The Query ribbon contains many useful features for working with the query as an object, such as:
• Refresh
• Edit
• Delete
• Duplicate / Reference
• Merge / Append
These features are also available by way of right-clicking on a query in the Queries & Connections panel.
Useful Table Settings
Two table settings that are worth exploring:
• Table Design (tab) → External Table Data (group) → Properties → Adjust column width.
• Table Design (tab) → External Table Data (group) → Properties → Preserve cell formatting.
Uploading from Named Ranges
If a data set has been given a name (i.e., a Named Range), Power Query will leave the original data “as-is” and not upgrade the data to an Excel Table.
The downside to this is that Named Ranges are not dynamic. If data is added to the originally defined Named Range, Power Query is unable to “see” the new data until the user manually updates the Named Range reference.
Solutions to this problem include:
• Define a range larger than your data to account for future data addition
• Create a dynamic named range by using formulas in the Name Manager to calculate the height/width of the data range
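The limitation and the first workaround can be pictured with a fixed slice over a growing list. This Python sketch is only an analogy for how a static range reference behaves; the sheet contents and the range sizes are hypothetical.

```python
# Hypothetical worksheet rows (header plus two data rows).
sheet = [["Item", "Qty"], ["A", 1], ["B", 2]]

# A static named range: its boundaries are fixed when it is defined.
exact_range = slice(0, 3)
# The first workaround: a range defined larger than the current data.
padded_range = slice(0, 10)

# A new row is added to the sheet after the ranges were defined.
sheet.append(["C", 3])

seen_by_exact = sheet[exact_range]
seen_by_padded = sheet[padded_range]

print(len(seen_by_exact))   # the new row is not included
print(len(seen_by_padded))  # the new row is included
```

In Excel, an oversized range typically also picks up the blank rows below the data, so some cleanup of empty rows may be needed in the query; the list analogy does not show that side effect.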
NOTE: If you are using a Named Range to define your data, select the Named Range before sending it to Power Query. This prevents Power Query from selecting what it thinks is your data, which may not match your Named Range definition. This will also prevent Power Query from upgrading your data into an Excel Table.
Handling Changes to Source
When importing data into Excel using Power Query, a connection to the source material is created. This way, when the source material changes, you can click “refresh” and the same source is read to import any additions, deletions, or changes into the report.
The connection is recorded as an absolute address; in other words, the full drive\path\filename is recorded. If any element of that address changes, the query will break and be unable to perform a refresh operation.
Other things that can “break” your query include:
• Changes to heading names
• Changes to table names
• Changes to sheet names
Correcting Changes in Location or Filename
If a file’s name or location changes and an update is performed on the Power Query output, we are presented with an error message.
To correct a query that refers to a file by location and name:
1. Open the query in Power Query
2. Click the Go To Error button to be taken to the offending step, or select the Source step in the Applied Steps list
3. Update the file location/name reference by either
• Editing the reference in the Formula Bar
• Clicking the gear icon next to the Source step and editing the location/name
The advantage of the gear icon is that you are presented with a user-friendly way to browse to the new location, thus avoiding typographic errors.
Correcting Changes to Sheet Names
If a sheet’s name changes and an update is performed on the Power Query output, we are presented with an error message.
To correct a query step that refers to a sheet name:
1. Open the query in Power Query
2. Click the Edit Settings button to be taken to the offending step, or select the Navigation step in the Applied Steps list
3. Update the sheet name reference by either
• Editing the sheet name reference in the Formula Bar
• Clicking the gear icon next to the Navigation step and pointing to the new sheet name
NOTE: Remember that Power Query is case-sensitive. Ensure you type the names as they appear in the source material.
Correcting Changes to Table Names
If a table’s name changes and an update is performed on the Power Query output, we are presented with an error message.
To correct a query step that refers to a table name:
1. Open the query in Power Query
2. Click the Go To Error button to be taken to the offending step, or select the Source step in the Applied Steps list
3. Update the incorrect table name to the new table name by editing the reference in the Formula Bar
Correcting Changes to Column Heading Names
If a table’s heading name changes and an update is performed on the Power Query output, we are presented with an error message.
To correct a query step that refers to a heading name:
1. Open the query in Power Query
We are presented with the following error message.
Unfortunately, we do not have any buttons to lead us to the offending step, nor any gear icon to open a user-friendly correction window. This is why having the Formula Bar visible is so important.
2. Update the entry in the Formula Bar to display the correct column heading name.
NOTE: Remember that Power Query is case-sensitive. Ensure you type the names as they appear in the source material.
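The case-sensitivity point can be illustrated with a minimal Python sketch. This is not Power Query M code; the column names and the rename_column helper are hypothetical, and the only claim is that the heading lookup is an exact, case-sensitive match.

```python
def rename_column(columns, old_name, new_name):
    """Return a new list of headings with old_name replaced by new_name.

    The lookup is an exact, case-sensitive match, mirroring how a query step
    fails when the heading it refers to no longer exists under that name.
    """
    if old_name not in columns:
        raise KeyError(f"The column '{old_name}' of the table wasn't found.")
    return [new_name if name == old_name else name for name in columns]


# Hypothetical headings as they appear in the source material.
source_columns = ["Region", "Sales Amount", "Order Date"]

# An exact match succeeds.
renamed = rename_column(source_columns, "Sales Amount", "Sales")

# A heading typed with different casing is treated as a different column.
try:
    rename_column(source_columns, "sales amount", "Sales")
    case_mismatch_failed = False
except KeyError:
    case_mismatch_failed = True
```

Here the second call fails only because of casing, which is the same situation the NOTE warns about when retyping a heading in the Formula Bar.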
Correcting Changes Using the Advanced Editor
If you have many changes to perform and you feel confident in your typing accuracy, you can open the Advanced Editor and perform many of the updates in a single operation.
Open the Advanced Editor in Power Query by selecting Home (tab) → Query (group) → Advanced Editor.
Make the necessary changes in the editor and click Done when finished.
IMPORTANT: Remember that Power Query does not possess an Undo feature. Consider creating a backup of the file/query before editing. This way, if you make a mistake during the direct editing of the query, you can restore your query to its original state.
UNDERSTANDING DATA TYPES
The Importance of Data Types
It is very important to define your fields (columns) with the proper data types. This ensures the following:
• Data can be properly interpreted/understood
• Data can be manipulated properly
• Data cannot be manipulated improperly
• Proper level of numeric precision is maintained
Data Type            Description and Use
Decimal Number       Maximum 15 digits regardless of decimal placement.
Currency             4 digits to the right of the decimal; maximum 19 digits regardless of decimal placement. Anything after the 4th decimal place is rounded.
Whole Number         Integer value (no decimal precision). Maximum 19 digits. Any fractions are rounded.
Percentage           Displayed as percentages in Power Query; displayed as decimals in a workbook.
Date/Time            Date & Time in a single column. Stored as a decimal type.
Date                 Dates from 1900 to 9999 are supported. Stored as whole numbers.
Time                 Time only. Stored as decimal type.
Date/Time/Timezone   UTC Date/Time
Duration             Length of time shown as days, hours, minutes, and seconds. Stored as decimal type.
Text                 Text and numbers. Maximum of approximately 268 million characters.
True/False           Boolean value.
Binary               A sequence of bytes (e.g. when loading files from a folder).
Using Locale…        Important if importing data from sources that have different regional settings.
DATA TYPES VS. FORMATTING (AND NULL FORMATTING)
The fundamental principles behind storing versus viewing numbers in Excel and Power Query are:
In Excel
• numbers are calculated as stored, not as displayed
• the formatting of numbers does not affect the precision of numbers
In Power Query
• formatting numbers changes the stored value
• once a number is formatted to a lesser accuracy, the original level of accuracy cannot be restored without deleting prior query steps
Excel Examples
If we take a value in Excel that has 2-decimal place precision and we format the value as a whole number, the number displayed will be rounded but the underlying value remains unchanged.
Power Query Examples
Reducing Decimal Precision
If we were to load into Power Query a column of values with 2-decimal place precision and we then set the data type to Whole Number, the fractional side of the number is discarded and the whole number side is rounded.
If we examine the number more closely, we can see that the change in data type has altered the value.
This precision cannot be restored once the alteration has taken place. It would be necessary to delete the step that applied the change to regain the original precision.
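The loss of precision described above can be sketched in Python. This is an illustration, not Power Query M code: the sample values are hypothetical, and Python's built-in round() is assumed to stand in for the Whole Number conversion.

```python
# Hypothetical values with 2-decimal place precision.
original = [12.34, 7.89, 100.25]

# Converting to a whole number rounds each value; the fraction is not kept anywhere.
as_whole = [round(value) for value in original]

# Converting back to a decimal type does not bring the fractions back.
back_to_decimal = [float(value) for value in as_whole]

# Totals drift once the precision has been discarded.
total_before = round(sum(original), 2)
total_after = sum(as_whole)
```

The only way to recover the values in back_to_decimal is to go back to original, which corresponds to deleting the step that applied the Whole Number type.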
Increasing Decimal Precision
If you import whole numbers into Power Query and later add data that has fractional precision, you will lose the fractions.
This is because the original numbers are data typed as Whole Numbers. Future values will be converted to Whole Numbers, thus losing their fractional precision.
If you believe that you may encounter fractions in your data, it’s a good idea to set the data type to Decimal as a way of “future-proofing” your query.
Currency-based Values
When values are data typed as Currency in Power Query, the results are displayed in the table with 2-decimal place precision, but the underlying values are stored and calculated at 4-decimal place precision.
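The difference between the stored and displayed Currency value can be sketched in Python with the decimal module. This is an illustration rather than M code; the as_currency and display helpers, the sample value, and the choice of rounding mode are assumptions made for the sketch.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def as_currency(text):
    """Store a value at 4-decimal place precision (rounding mode is an assumption)."""
    return Decimal(text).quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)


def display(value):
    """Show a stored value with 2 decimal places, as the preview grid would."""
    return f"{value:.2f}"


# Hypothetical unit price with more than 4 decimal places.
stored = as_currency("19.98761")
shown = display(stored)

# Calculations use the stored 4-decimal value, not the 2-decimal display.
extended = stored * 1000
```

Multiplying the displayed 19.99 by 1000 would give 19990, while the stored value gives 19987.6, which is why the distinction matters in calculations.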
Invoking Automatic Data Types
If you have removed the originally established data types and want to let Power Query “figure it out”, select a column, separate columns using CTRL, or all columns using CTRL-A, and click Transform (tab) → Any Column (group) → Detect Data Type.
Working with NULLs
If a cell is empty when brought into Power Query, the corresponding cell will be displayed with a NULL indicator.
This is an indication that the cell is truly empty; it contains no spaces or hidden characters.
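The distinction between a truly empty cell and a cell that only looks empty can be sketched in Python, with None standing in for a null. The sample cells are hypothetical.

```python
# None stands in for a null; the other "blank-looking" cells hold real text.
cells = [None, "", " ", "Widget"]

# Only the first cell is truly empty.
is_null = [cell is None for cell in cells]

# Cells that look blank on screen but are not null.
looks_blank_but_not_null = [
    cell for cell in cells if cell is not None and cell.strip() == ""
]
```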
POWER QUERY SHORTCUTS
As anyone who has worked with Excel, or any other application, will tell you, learning time-saving shortcuts is one of the fastest ways to shorten the duration of your workflow.
Below are some of the more useful Power Query shortcuts. This is by no means a comprehensive list, but it’s a great place to start for reducing work.
Shortcut Task
F2 Edit the name of a column or query
Arrow keys (L & R) Navigate left or right through columns
CTRL key Select multiple, non-contiguous columns
Shift key Select contiguous columns
CTRL-A Select ALL columns
CTRL-Space Select the entire column of a selected cell
ALT (while opening Excel) Open a second, unrelated instance of Excel
Common time-saving features
Selecting / Deselecting Columns
To select or deselect multiple columns, a selectable list of columns can be displayed by clicking Home (tab) → Manage Columns (group) → Choose Columns.
This is a superior way of removing unwanted columns, as opposed to selecting unwanted columns and deleting them.
Later, if you change your mind, you can click the gear icon and reselect a column to add it back into the data set.
Detecting Data Types
It’s not uncommon to delete the automatically applied type detection step from the Applied Steps list in Power Query.
When it comes time to perform data type detection, you can select the column(s) you want to data type and click Transform (tab) → Any Column (group) → Detect Data Type.
ERROR HANDLING – FINDING & CORRECTING ERRORS IN DATA
We’ve seen how to handle common query-wide errors, such as:
• when the source file’s location or name changes
• when the sheet name changes
• when the table name changes
Now we will demonstrate common techniques for handling errors within columns of data.
Errors that should be corrected at the source level
Certain errors should be corrected in the data before loading into Power Query. Common issues include:
• A mixture of date formats from differing regions
• Column name inconsistencies when appending multiple files in a folder
Determining which rows contain errors
If your query returns a message stating that the results contain errors, click the error count to generate a detailed list of data rows that contain errors.
Power Query creates a new query that extracts the complete record for all rows of the results query that contain errors. Each row is prefaced with a row number indicating the record position in the source data.
If you click to the right of the error message, Power Query displays a detailed explanation of the error.
Errors to correct in Power Query
Many errors occur in Power Query due to an incorrectly applied data type.
For example, suppose a column contains sales figures, but certain records contain messages such as “No Sale”.
When the column has a Currency data type applied to it, the cells holding text will display an error due to the data type mismatch.
Methods to Deal With Errors in Power Query
Remove Rows with Errors
If the rows containing errors in a column are not needed, you can delete the error rows by selecting the column containing errors and clicking Home (tab) → Reduce Rows (group) → Remove Rows → Remove Errors.
Replace Errors with Meaningful Data
You can replace the errors with meaningful data, such as a default date, zeroes, nulls, etc.
Select the column containing errors and click Transform (tab) → Any Column (group) → Replace Values → Replace Errors.
In the Replace Errors dialog box, enter the value you wish to replace the errors with, such as 0 (zero).
The results are as follows.
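The two methods above (removing error rows and replacing errors) can be sketched in Python. This is an illustration of the logic, not M code; the CellError class, the convert helper, and the sample sales values are hypothetical.

```python
class CellError:
    """Stand-in for a cell that shows an error after a data type is applied."""

    def __init__(self, reason):
        self.reason = reason


def convert(value):
    """Try to apply a numeric type; text such as 'No Sale' becomes an error."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return CellError(f"Cannot convert {value!r} to a number")


# Hypothetical sales column containing text messages among the numbers.
raw_sales = ["150.00", "No Sale", "89.50", "No Sale", "42"]
typed = [convert(value) for value in raw_sales]

# Method 1: remove the rows that contain errors.
rows_without_errors = [v for v in typed if not isinstance(v, CellError)]

# Method 2: keep every row and replace each error with 0.
errors_replaced = [0.0 if isinstance(v, CellError) else v for v in typed]
```

Removing rows changes the row count, while replacing errors keeps it; which one is appropriate depends on whether the “No Sale” records still need to appear in the output.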
MORE DATA VIEWS – DUPLICATE OR REFERENCE QUERY?
When working with Power Query, you are likely to encounter two familiar scenarios:
1. Creating multiple query outputs based on the same data set.
2. Large queries that you wish to break down into sections for ease of understanding, maintenance, and reusability.
Duplicating a Query
Creates a second copy of your existing query where the actions and results are independent of the original query. This is like copying a file. The two queries have no connection between them; altering one does not affect the other.
Referencing a Query
Creates a new query that is dependent on the original query. The output of the original query becomes the input for the dependent query.
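The difference between the two can be sketched in Python, treating a query as a list of steps applied in order. This is a conceptual illustration only; the run helper, the step functions, and the sample data are hypothetical.

```python
def run(steps, data):
    """Apply each step in order, the way Applied Steps build on one another."""
    for step in steps:
        data = step(data)
    return data


source = [" north ", " south ", " east "]

# The original query has a single step that trims the text.
original_steps = [lambda rows: [row.strip() for row in rows]]

# Duplicate: an independent copy of the steps as they exist right now.
duplicate_steps = list(original_steps)


# Reference: a new query whose input is the output of the original query.
def reference_query():
    return [row.upper() for row in run(original_steps, source)]


# Later, the original query gains a filtering step.
original_steps.append(lambda rows: [row for row in rows if row != "south"])

duplicate_result = run(duplicate_steps, source)
reference_result = reference_query()
```

After the original query is changed, duplicate_result still contains all three regions, while reference_result reflects the new filter because it reads the original query's output.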
How to Duplicate a Query
There are several ways you can duplicate a query, depending on where you are at the moment.
If you are in Excel:
• From the Queries & Connections panel, right-click a query and select Duplicate.
• Select a query from the Queries & Connections panel, then select Query (tab) → Reuse (group) → Duplicate.
If you are in Power Query:
• From the Queries list, right-click a query and select Duplicate.
• Select a query from the Queries list, then select Home (tab) → Query (group) → Manage → Duplicate.
How to Reference a Query
There are several ways you can reference a query, depending on where you are at the moment.
If you are in Excel:
• From the Queries & Connections panel, right-click a query and select Reference.
• Select a query from the Queries & Connections panel, then select Query (tab) → Reuse (group) → Reference.
If you are in Power Query:
• From the Queries list, right-click a query and select Reference.
• Select a query from the Queries list, then select Home (tab) → Query (group) → Manage → Reference.
QUERY DEPENDENCY – VISUALIZE QUERY RELATIONS
Viewing Query Dependencies Graphically
As seen in the previous section, queries can reference other queries.
When a query uses the output of another query as its input, a dependency is created. The referencing query is dependent on the query that supplies its input.
A query that has dependent queries cannot be deleted without first deleting the dependent queries.
If the source query were to be deleted, the dependent queries would no longer have a source from which to draw data.
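The deletion rule can be sketched in Python as a small dependency map. The query names and the helper functions are hypothetical and serve only to illustrate the rule.

```python
# Each query maps to the queries it uses as input (hypothetical names).
dependencies = {
    "SalesRaw": [],
    "SalesClean": ["SalesRaw"],
    "SalesByRegion": ["SalesClean"],
}


def dependents_of(name):
    """Return the queries that read the named query's output."""
    return sorted(q for q, inputs in dependencies.items() if name in inputs)


def can_delete(name):
    """A query can be deleted only when no other query depends on it."""
    return not dependents_of(name)


raw_dependents = dependents_of("SalesRaw")
raw_deletable = can_delete("SalesRaw")
leaf_deletable = can_delete("SalesByRegion")
```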
To visualize these dependencies in Power Query, select View (tab) → Dependencies (group) → Query Dependencies.
The following Query Dependencies dialog box will graphically display the queries, the relationship hierarchy, and their load destinations.
QUERY MANAGEMENT – DELETE, MANAGE, COPY QUERIES & BACKUP RESULTS
Delete a Query
If you are in Excel:
• From the Queries & Connections panel, right-click a query and select Delete or press the Delete key on the keyboard.
• Select a query from the Queries & Connections panel, then select Query (tab) → Edit (group) → Delete.
If you are in Power Query:
• From the Queries list, right-click a query and select Delete.
• Select a query from the Queries list, then select Home (tab) → Query (group) → Manage → Delete.
NOTE: If you delete a query, the output of the query remains in the workbook.
Deleting All Queries at Once
Suppose you want to send the query output results to a client but you don’t want them to see how the data was collected and processed. If you have several queries and you wish to delete all queries in a workbook, perform the following steps:
1. Consider saving the file under a different name. This way you will retain a version of the file with the queries.
2. In the version of the file to clean, select File (tab) → Info → Check for Issues → Inspect Document.
3. In the Document Inspector dialog box, click Inspect.
4. Scroll down to the category labeled Custom XML Data and click Remove All.
5. Click Close.
All the queries will have been removed while leaving the query results intact.
Copy a Query to Another Workbook
When you want to copy an entire query to another workbook, it’s a simple matter of Copy/Paste.
1. From the Queries & Connections panel, right-click a query and select Copy.
2. Start a new workbook or open an existing workbook.
3. Open the queries list by selecting Data (tab) → Queries & Connections (group) → Queries & Connections.
4. From the Queries & Connections panel, right-click in an empty part of the panel and select Paste.
Export a Query for Later Use
You can export a query to a file that you can import at a later date, or send it to someone for them to import into their workbook.
1. From the Queries & Connections panel, right-click a query and select Export Connection File.
2. Save the query as an .ODC (Office Data Connection) file.
NOTE: The .ODC file is an eXtensible Markup Language (XML) file that can be viewed in any text editor.
3. Start a new workbook or open an existing workbook.
4. Select Data (tab) → Get & Transform Data (group) → Existing Connections.
5. In the Existing Connections dialog box, select the applicable query from the list.
NOTE: If the query is being transferred to a different computer, you can click “Browse for more…” and manually locate the .ODC file.
Organizing Queries
If you are working in a workbook with many queries (perhaps dozens), you can organize your queries into Query Groups based on similar purpose, function, or phase.
Create a Query Group
1. From the Queries & Connections panel, click a query then hold the CTRL key and select one or more other queries.
2. Right-click one of the selected queries and click Move to Group.
3. You can select one of the existing groups or create a new group by selecting “New Group…” If you are creating a new group, give the group a name and consider adding in the Description field some explanation as to the purpose of the queries in this group.
The results can be expanded or collapsed to create a cleaner appearance to the queries list.
Remove a Query Group
To remove a query group, right-click the query group and click Ungroup.
All ungrouped queries are placed in the “Other Queries” group.
Part 3
Helpful Power Query Transformations
TEXT TRANSFORMATIONS – FORMAT, EXTRACT & MORE
Transformation #1: Split Text by Delimiter
Our first transformation is to separate the Department and the Position into 2 columns.
Because the Department and Position are separated by “space – forward slash – space” characters, we can leverage these characters as a delimiter to assist in the separation process.
1. Click in the table and select Data (tab) -> Get & Transform (group) -> From Table/Range.
2. Rename the query to “ProperData”.
3. Select the Department/Position column and click Home (tab) -> Transform (group) -> Split Column -> By Delimiter.
4. In the Split Column by Delimiter dialog box, select Custom and use a forward slash (/) as the delimiter.
5. Click OK.
NOTES:
• Once the split has been performed, another “Changed Type” step is added. This is to reassess the newly split data for possible new data types. We could delete this step as it is not creating anything we didn’t already possess.
• Technically, we could have used a “ / ” (space, forward slash, space) as the custom delimiter. This would have allowed us to eliminate the upcoming “Trim” step.
Removing the Leading & Trailing Spaces
If you select to the right of one of the Department names, you will be presented with a preview window at the bottom of the screen.
If you then click to the right of the data in the preview window you will notice that the cursor does not rest directly next to the last letter; there is a trailing space.
We can also discover that there is a leading space before the Position in the second column.
Trimming the Data
To remove the trailing and leading spaces, select the Department column, hold CTRL and select the Position column. Select Transform (tab) -> Text Column (group) -> Format -> Trim.
I prefer using the “space – forward slash – space” approach as it reduces the query by a step.
NOTE: We will deal with the column headings at the end. For now, we can leave them a bit messy.
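The two delimiter choices discussed above can be compared in a short Python sketch. This is an illustration, not M code, and the sample Department/Position value is hypothetical.

```python
# Hypothetical value from the Department/Position column.
value = "Finance / Accountant"

# Splitting on "/" alone leaves a trailing and a leading space behind.
split_on_slash = value.split("/")

# A separate trim step is then needed to clean both columns.
trimmed = [part.strip() for part in split_on_slash]

# Splitting on " / " (space, forward slash, space) needs no trim step.
split_on_spaced_slash = value.split(" / ")
```

Both routes end with the same two values; the second one simply reaches them in one step instead of two.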
Transformation #2: Extract Employee ID from Unique Code
Our next step involves the extraction of the Employee ID from within the Unique Code column. Notice that the format is “last name – employee ID – first name”.
Select the Unique Code column and click Transform (tab) -> Text Column (group) -> Split Column -> By Non-Digit to Digit.
This separates the Last Name from the Employee ID/First Name.
Next, select the column with Employee ID and First Name and click Transform (tab) -> Text Column (group) -> Split Column -> By Digit to Non-Digit.
This separates the Employee ID from the First Name.
Getting rid of the unneeded bits
As we only need the Employee ID, we will remove the newly separated First Name and Last Name columns.
Click the column of Last Names, then press CTRL and click the column of First Names.
Press the Delete key to remove the selected columns.
Formatting the Employee ID
Remember, one of the requirements was to format the Employee IDs with an “E-” prefix.
Select the column that contains Employee IDs and click Transform (tab) -> Text Column (group) -> Format -> Add Prefix.
In the Prefix dialog box, enter a Value of “E-” and click OK.
We now have our properly formatted Employee IDs.
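The sequence of splits and the prefix step can be sketched in Python using regular expressions. This is an illustration of the logic rather than M code; the sample Unique Code value and the helper names are hypothetical, and the code assumes the letters and digits are adjacent with no separator between them.

```python
import re


def split_non_digit_to_digit(text):
    """Split wherever a non-digit character is followed by a digit."""
    return re.split(r"(?<=\D)(?=\d)", text)


def split_digit_to_non_digit(text):
    """Split wherever a digit is followed by a non-digit character."""
    return re.split(r"(?<=\d)(?=\D)", text)


# Hypothetical Unique Code: last name, employee ID, first name.
unique_code = "Smith10482John"

# First split: Last Name | Employee ID + First Name.
last_name, id_and_first = split_non_digit_to_digit(unique_code)

# Second split: Employee ID | First Name.
employee_id, first_name = split_digit_to_non_digit(id_and_first)

# Keep only the Employee ID and add the required prefix.
formatted_id = "E-" + employee_id
```

Splitting on zero-width matches with re.split requires Python 3.7 or later.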
Transformation #3: Separate Names and Format Casing
Our final set of transformations is to separate the contents of the Full Name column into a First Name column and Last Name column whilst ignoring any middle name information.
There is a wealth of options sitting within the Extract feature in Power Query.
We don’t want to transform what we have into a new set of data; we want to leave the original Full Name column while adding additional columns for First Name and Last Name.
For this operation, we will use the version of Extract located on the Add Column ribbon.
Extracting First Names
Select the Full Name column and click Add Column (tab) -> From Text (group) -> Extract -> Text Before Delimiter.
In the Text Before Delimiter dialog box, enter a space in the Delimiter field and click OK.
We are presented with a new column that contains all text up to the first space in the Full Name text.
Extracting Last Names
Select the Full Name column and click Add Column (tab) -> From Text (group) -> Extract -> Text After Delimiter.
In the Text After Delimiter dialog box, enter a space in the Delimiter field. Expand the Advanced Options and set the Scan for the Delimiter option to “From the end of the input” and click OK.
Starting from the end and “looking” backward is necessary because of the existence of middle names in some of our records.
If we began our search for a space from left-to-right, we would stop before a middle name and extract the middle and last names.
We are presented with a new column that contains all text after the last space in the Full Name text.
Format the Names with Proper Casing
The next step is to format the newly added First Name and Last Name columns so that the first letter is upper-case while the remaining letters are lower-case.
Select the First Name and Last Name columns then click Transform (tab) -> Text Column (group) -> Format -> Capitalize Each Word.
We now have our properly formatted names.
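The extraction and casing steps above can be sketched in Python. This is an illustration, not M code; the helper names and sample names are hypothetical, str.title() is assumed to stand in for Capitalize Each Word, and the behavior when no delimiter is present is not modeled.

```python
def text_before_first(text, delimiter):
    """Return everything before the first occurrence of the delimiter."""
    return text.split(delimiter, 1)[0]


def text_after_last(text, delimiter):
    """Return everything after the last occurrence (scanning from the end)."""
    return text.rsplit(delimiter, 1)[-1]


def text_after_first(text, delimiter):
    """Return everything after the first occurrence (scanning from the start)."""
    return text.split(delimiter, 1)[1]


# Hypothetical Full Name values, one with a middle name.
full_name = "mary ann lee"

first_name = text_before_first(full_name, " ").title()
last_name = text_after_last(full_name, " ").title()

# Scanning from the start would pick up the middle name as well.
middle_and_last = text_after_first(full_name, " ")

# Casing is normalized regardless of how the source arrives.
upper_case_example = text_after_last("JOHN SMITH", " ").title()
```

The middle_and_last value shows why the scan direction must be set to “From the end of the input” when some records contain middle names.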
Finishing Touches
• We no longer require the original column of Full Names, so we can select the Full Name column and press Delete.
• Rename the column headings with more meaningful names.
Sending the Results to Excel
To send the transformation results back to Excel as a finished table, select Home (tab) → Close (group) → Close & Load (lower part of the button) → Close & Load to…
You can load the results to a table on a new sheet, or an existing sheet.
The results are as follows.
All three requirements have been satisfied and we didn’t have to write a single formula to get the job done.
MERGE COLUMNS – WHAT TO WATCH OUT FOR
Two very important first steps will govern your success with the Merge Columns feature:
• The order by which you select the columns to be merged.
• The tab from which you invoke the Merge Columns feature: Transform or Add Column.
Selecting the Columns to Be Merged
The order in which you select the columns determines the merge order.
This is especially useful because if your data is not in the proper order, you are not required to establish the order before the merge. You can select the columns based on the order you want the final result, then perform the merge. The Merge Columns feature will accomplish two tasks in a single step.
Use the CTRL key to select multiple columns.
Launch the Merge Columns feature in one of the following manners:
• To replace the existing content, select Transform (tab) → Text Column (group) → Merge Columns or right-click the selected columns and select Merge Columns.
• To create additional content, select Add Column (tab) → From Text (group) → Merge Columns.
In the Merge Columns dialog box, select the separator (delimiter) you wish to use to separate each column’s data within the result.
In the New Column Name field, type the name you wish to use for the heading of the new column.
Potential Issue
Because we were including middle names in the above example, names that do not contain a middle name are padded with two spaces between the first and last name.
Merging Without Extraneous Spaces
The key to getting the proper result (names without extra spaces) is to invoke the Merge Columns step from the Add Column tab. This will produce a merged result without extra spaces. In the worst case, you may need to add an extra step to delete the original columns if you were intending to replace them.
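The difference between the two outcomes can be sketched in Python. This only illustrates the results described above (a merge that keeps blank parts versus one that skips them); it is not M code, and the helper names and sample names are hypothetical.

```python
def merge_keep_blanks(parts, separator=" "):
    """Join every part, treating a missing value as empty text."""
    return separator.join("" if part is None else part for part in parts)


def merge_skip_blanks(parts, separator=" "):
    """Join only the parts that actually contain text."""
    return separator.join(part for part in parts if part)


# Hypothetical records: one without a middle name, one with.
no_middle = ("Ana", None, "Lopez")
with_middle = ("Ana", "Maria", "Lopez")

padded = merge_keep_blanks(no_middle)
clean = merge_skip_blanks(no_middle)
unchanged = merge_skip_blanks(with_middle)
```

The padded result contains two spaces between the first and last name, which is the issue described under Potential Issue; the clean result does not.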
Capturing the First Letter Only In the Middle Name
Suppose you only wanted to keep the first letter of the middle name. Before the merge operation:
1. Select the Middle Name column.
2. Select Transform (tab) → Text Column (group) → Extract → First Characters.
3. In the Extract First Characters dialog box, enter a 1 in the Count field and click OK.
Standardizing Middle Initial Casing & Style
If you want to “future-proof” your query to account for times when data may arrive in lower-case formatting, perform the following steps:
1. Select the Middle Name column.
2. Select Transform (tab) → Text Column (group) → Format → UPPERCASE.
3. If you wish to have a period following the Middle Name, select the Middle Name column.
4. Select Transform (tab) → Text Column (group) → Format → Add Suffix.
5. In the Suffix dialog box, enter a “.” (period – no quotes) in the Value field and click OK.
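The middle-initial steps (first character, upper-case, period suffix) can be sketched in Python. This is an illustration, not M code; the middle_initial helper is hypothetical, and leaving missing middle names untouched is an assumption of this sketch rather than a statement about how Power Query treats them.

```python
def middle_initial(middle_name):
    """Keep the first letter, upper-case it, and add a trailing period.

    Assumption: a missing or empty middle name is returned unchanged.
    """
    if not middle_name:
        return middle_name
    return middle_name[:1].upper() + "."


# Hypothetical middle names arriving in mixed casing.
from_lower = middle_initial("marie")
from_upper = middle_initial("JAMES")
from_missing = middle_initial(None)
```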