I got an email from a friend today who was using some complicated logic to replace specific records in a table with records from another table. His query was running pretty slow, so he reached out for a little help. In this post I'll show how to replace records via joins in Power Query; a much easier (and what should be a faster) solution to his issue.
Data Background
The data footprint that was sent to me looked something like this:
And the desired output is shown below:
So basically, we want to take the record for Unit002 from the Override table and replace the Unit002 value in the Original Data table.
At first glance, this looks hard. And my friend cooked up something pretty complicated to make this work. Funny thing is (and believe me… I've had this happen to me as recently as last week…) when you put another pair of eyes on it, you suddenly realize it's much easier than you first saw.
In this case we can actually solve this very easily by using a couple of Power Query's different Join types!
Laying the Groundwork
If you want to follow along, grab the sample workbook here. You'll notice that we have taken the following actions already:
- Select any cell in the Original Data table
- Create a New Query –> From Table
- Go to Home –> Close & Load To… –> Connection Only
- Select any cell in the Override With table
- Create a New Query –> From Table
- Go to Home –> Close & Load To… –> Connection Only
Which leaves us with the following queries in the Workbook Queries pane:
We are now set to replace the records.
Replace Records Via Joins in Power Query
This actually takes a Merge and an Append in order to complete the job. So let's start at the merge.
- Right click the "Original" query –> Reference
This creates a pointer to the data in the "Original" query, showing all four rows of data in the table. The challenge here is that we only want the rows which are NOT being replaced. The secret to getting those? An Anti-Join!
- Go to Home –> Combine –> Merge Queries
- Choose the Override query
- Select the Unit column on both the top and bottom queries
- Change the Join Kind to "Left Anti (rows only in first)"
- Click OK
At this point, you'll have 3 rows left, as shown below:
Why only 3 rows? Because the Left Anti Join only returns the rows which don't match what is in the other table. So where Unit002 exists in the second table, it cause it to pull everything EXCEPT Unit002 from the left table. (For more on using Anti-Joins in Power Query, see this blog post.)
Joining tables does create a new column however, even if it is full of null values (as this one is.) Since we don't need it, let's just delete that column:
- Right click the NewColumn column –> Remove
Now we just need to add the record(s) from the Override table to this list. That's fairly easy:
- Go to Home –> Combine –> Append
- Choose the Override table
- Right click the Unit column –> Sort –> Ascending (this step is optional, and done for readability only.)
And you're done! 5 steps (after the connection only queries were created), 100% user interface drive, and should perform quite quickly.
6 thoughts on “Replace Records Via Joins in Power Query”
Nice trick, Ken 🙂 Very usefull.
But does you know that we can do this in this way?
let
Source = Table.Combine({Override,Original}),
#"Remove duplicates" = Table.Distinct(Source, {"Unit"})
in
#"Remove duplicates"
I want only to say "we can" and not "this is a better way".
Truly, I do not know which is better or faster 🙂
Cheers
Great point, Bill. Many ways to skin this cat! Matt Allington also mentioned to me (via email) that we should consider doing two anti joins. The issue he raised is what would happen if the "replacement' table contained a record that did not exist in the original table. Both my and your solutions would add that record. By using an anti join the other way... we could avoid that.
This is not a big problem 🙂
The only we have to do is preparing the "Override" table.
I mean something like this below.
let
OrgUnit = Original[Unit],
Source = Excel.CurrentWorkbook(){[Name="Org_Override"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Unit", type text}, {"Organization", type text}, {"Department", type text}, {"Team", type text}}),
ToRemoveFromOverride = List.Buffer(List.Intersect({#"Changed Type"[Unit], OrgUnit})),
OverrideReady = Table.SelectRows(#"Changed Type", each List.Contains(ToRemoveFromOverride,[Unit]) )
in
OverrideReady
Cheers 🙂
Good techniques in both post and comments. Such scenario takes place from time to time. Thanks Ken and Bill.
Can we perform Cartesian product of table joins using power query?
Hey Abhay, there is a post going live 2016-05-11 to cover this topic. You can find it here: https://excelguru.ca/2016/05/11/cartesian-product-joins-for-the-excel-person/