Solutions for Power Query Challenge 6

2019-08-01
Ken Puls
Categories: Excel, Power BI, Power Query

This morning I logged in to check the solutions for Power Query Challenge 6 that were submitted and... Wow! There were a bunch, and some cool variety there. So now it's time to show you all what I came up with here.

What was Power Query Challenge 6?

The full description and source data can be found in yesterday's post, but in short it was to convert this:

Data table with multiple data points per cell

To this:

Data shown in First Normal Form (1NF)

So how do we do it?

Two Solutions for Power Query Challenge 6

Wait, two solutions? Why?

As it turns out, I put together two for this one. My first attempt was cooked up to solve the issue on short notice. Then I built another that seems a bit more elegant. So I'll show them both, although quickly.

Solution 1 - Splitting Individual Columns and Merging Back Together

The first of my solutions for Power Query Challenge 6 is actually quite similar to what Ali posted in the solution thread. It basically follows this method:

Step 1:
- Create a Data query that connects to the source data, and load it as connection only
Step 2:
- Create 3 new queries for ItemID, Quantity and Price which
  - Reference the data query
  - Keep the InvoiceID column and the other relevant column
  - Split the relevant column by delimiter, ensuring it splits to rows (not columns as it defaults to)
  - Add an Index column
Step 3:
- Reference one of the Step 2 tables, and merge the other two tables to it, based on matching the Index column in each

So when complete the query chain looks like this:

And returns the table we're after:

The only real trick to this one is that - when you are building the Price query - the Price column will pick the decimal as the delimiter, so you have to force it to a line feed. So building the Price query would go through the following steps:

Right click the Data query --> Reference
Select the InvoiceID and Price columns --> Right click --> Remove Other Columns
Right click the Price column --> Split column --> By Delimiter
- Clear the decimal from the Custom area
- Click the arrow to open the Advanced area
- Change the selection to split to Rows
- Check "Split using special characters"
- Choose to insert a Line Feed character
- Click OK
Set the Data Types
Go to Add Column --> Add Index Columns

Resulting in this:

The ItemID and Quantity queries follow the same steps, except that Power Query now correctly identifies the Line Feed as the character to split on.

Solution 2 - Group and Split

While the first solution to Power Query Challenge 6 worked, it left me less than satisfied as it took a bunch of queries. While effective, it didn't feel elegant. So I cooked up another solution that uses Grouping. It's actually quite similar to the first solution that Bill Szysz posted.

The basic method is as follows:

Connect to the data
Right click the InvoiceID column --> UnPivot Other Columns
Right click the Value column --> Split Column --> By Delimiter --> OK

Following the steps above gets us to this state:

To unwind this, we group it:

Go to Transform --> Group By
- Group By InvoiceID, Attribute
- Aggregate a "Data" column using the All Rows operation

At this point, we need to number these rows, so I just busted out the pattern to do that from our Power Query Recipe cards (recipe 50.125).

Go to Add Column --> Custom
- Column Name: Custom
- Formula: =Table.AddIndexColumn( [Data], "Row", 1, 1)
Right click the Custom column --> Remove Other Columns
Expand all fields from the Custom column

Leaving us with this data:

The next step is to Pivot it:

Select the Attribute column --> Transform --> Pivot Column
- Choose Value for the Values
- Expand the Advanced arrow
- Change the Aggregation to "Don't Aggregate"
- Click OK
Select the "Row" column and Remove it. (Yes, it was needed to unpivot this correctly, but now adds no value.)
Set the data types
Load it to the desired destination

At this point, the query (again) looks perfect:

Now, I must admit, this felt far more professional and left me feeling good about it.

Which Solution to Power Query Challenge 6 is Better?

Naturally, solution 2 is better. It takes less queries, and looks way cooler. Right? Not so fast...

The real question is in the performance. And for this one I thought I'd test it. But I needed more data. I expanded the set to 11,000 rows and then used a tool we're working on to time the refreshes. Privacy was left on, and all times shown are in seconds:

Solution 1: 1.43, 1.48, 1.11, 1.27 Avg ~1.32 seconds
Solution 2: 2.77, 2.65, 2.63, 2.68 Avg ~2.68 seconds

I'll be honest, this surprised me. So I went back and added the GroupKind.Local parameter into the Grouped Rows step, like this (as that often speeds things up):

Table.Group(#"Changed Type1", {"InvoiceID", "Attribute"}, {{"Data", each _, type table [InvoiceID=number, Attribute=text, Value=number]}}, GroupKind.Local)

The revised timing for Solution 2 now gave me this:

Solution 2A: 2.54, 2.49, 2.56, 2.61. Avg ~2.55 seconds

So while the local grouping did have a small impact, the message became pretty clear here. Splitting this into smaller chunks was actually way more efficient than building a more elegant "all in one" solution!

My solution (including 5,000 rows of the data), can be found in the solution thread here.

Tags: challenge, first normal form, grouping, merging data, nested data, power query, solution, splitting data, table

4 thoughts on “Solutions for Power Query Challenge 6”

MF on August 3, 2019 at 1:47 am said:

Hi Ken,
Interesting findings on solution 2. Thanks for sharing.
I've a video for solution 1 (not exactly the same, but 99%).
https://youtu.be/w9O-OfH_AIM
Hope you or your readers like it.
Cheers,
Dirk on August 6, 2019 at 10:51 pm said:

Is the fill up or fill down function in power query not easier.
Ken Puls on August 9, 2019 at 8:40 am said:

Hi dirk,

It's not about filling up or down for the first column. There's no empty cells there to fill into. The issue arises from splitting the final three columns into separate data points, which is tricky.
Mehdi HAMMADI on September 2, 2019 at 8:16 am said:

Thank you for sharing the two solutions, I came naturally with the 1st one, tried a more elegant one but with no result. I guess my brain is still in vacation 🙂

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Solutions for Power Query Challenge 6

What was Power Query Challenge 6?

Two Solutions for Power Query Challenge 6

Solution 1 - Splitting Individual Columns and Merging Back Together

Solution 2 - Group and Split

Which Solution to Power Query Challenge 6 is Better?

Share:

4 thoughts on “Solutions for Power Query Challenge 6”

Leave a Reply

Latest Posts

Proving Excel’s Box and Whisker Chart Calculations

Building a Box and Whisker Plot in Excel

Reading Excel’s Box and Whisker Chart

Convert Julian Dates to Gregorian Dates with Power Query

Get Album and Track Data from Spotify

Connect to Spotify using Power Query