While I was at the PASS BA conference in San Jose, CA last week, I got an email from a reader asking if Power Query could create a Cartesian Product join.
Now I’m an Excel guy, and I’d never heard this term before. Fortunately, I got the email while I was sitting around the table with a few of my geeky friends, many of whom came from the database world. This was cool as their answer wasn’t “What is that?”, it was “Why would you want to?” (As it turns out, there are some VERY good uses for this technique.)
Regardless, mine is not to wonder why, but rather to see if things can be done. And, as you might expect, we can absolutely create a Cartesian Product (or Cartesian Square)using Power Query. And actually, it’s REALLY easy when you know how.
Cartesian Product for the Excel person
So what the heck is a Cartesian Product anyway? (Besides being really hard to spell!)
Picture you have two lists:
- Automobile make
- Paint colours
Plainly the two are not related in any classic kind of term. (How do you match red to Dodge Ram?) But assume that we can paint any vehicle we have any colour of paint we have. Of course, that’s done at the factory, so we need to make a product list that shows all of our possible combinations: Dodge Ram – Red, Dodge Ram – Blue, Dodge Ram – Black, etc…
So basically, for each row on the Vehicle Make table, we need to assign every colour that exists in the Paint Colours table.
If you’d like to read more about this join/math, there is a good article on Wikipedia explain it.
Creating a Cartesian Product in Power Query
To illustrate this, I’m actually going to use a deck of cards, as shown in the Wikipedia article I referenced above. So we have two tables, as shown below:
(That’s table “Cards” on the left and table “Suits” on the right)
And now, what we want to do is create a join so that we get each suit assigned to each card. (We could do this the other way around too, or we could just sort it after. Either way gets us to the same place in the end.)
Setting up the Tables
So the first step is to set up the tables. To do this we simply pulled each table into Power Query and set up the query as a connection only query.
- Click in the appropriate table
- Create a New Query –> From Table
- Right click the only column –> Change Type –> Text
- Home –> Close & Load –> Close & Load To… –> Only Create Connection
We should now have two queries in our workbook that are pointers to the underlying data:
Now, let’s set up a new query that references the cards query:
- Workbook Queries Pane –> right click Cards –> Reference
- Rename the query to “52 Card Deck”
Awesome, we’ve now got a simple query all ready to go:
Creating the Cartesian Product
The trick now is to create the Cartesian Square. You’d think this would take some weird Voodoo magic, but it’s actually SUPER simple… just different than normal. We can’t fall back on the whole “merge tables” experience, as we’d need to pick matching values between the columns… and their aren’t any. For this reason, none of the join types I discuss in either of these articles will work:
- Merge Tables using Outer Joins in Power Query
- Merge Tables using Inner and Anti Joins in Power Query
So how do we do it? Like this:
- Add Column –> Add Custom Column
- In the formula area, enter “Suits” (with no quotes)
Did you see what we did there? We asked Power Query to provide the Suits table for each row of our cards table. The result is a table of tables which – when you click in the whitespace beside the Table keyword – you can see contains our suits:
The final step is to click that little expand icon on the top right of the Custom column (clear that default prefix checkbox as you do) to expand those records. And the result is a completed table where each card has all four suits.
Not bad… no need to write any funky formulas, fill up or anything. 🙂
Sample Workbook
If you’d like to download the sample workbook, you can find it here.
(Now I just hope that when I want to find this article that I can remember how to spell Cartesian correctly!)
20 thoughts on “Cartesian Product Joins (for the Excel person)”
Doh! That is sooo much easier than what I have been doing all this time: create a column of ones in both tables, and then merging on that column... this eliminates 3 or 4 steps!
Also known as a cross join. These can be handy for setting up filter combinations independent of observation data.
Now I know how Chris Webb picked his blog name. Cross Join us WAYYYYYY better than Cartesian Product!
Great and simple...
@Eric: If it gives you any comfort: Your "clumsy" version might perform much faster in some cases: https://blog.crossjoin.co.uk/2014/06/04/join-conditions-in-power-query-part-2-events-in-progress-performance-and-query-folding/
🙂
In Excel when you just pull columns from 2 different tables - the default is a "Cartesian" product (in the Query editor)
This is true not just for Excel but virtually any database - Access/ SQL Server etc
I am surprised that a Cartesian product is not a default option in PQ
I mentioned this to the PQ team you replied that it has been "bubbled up for consideration in future product improvements" - no definitive dates as of now
Simple but great. Thanks!
Awesome!! Thank you so much, saved me hours!
Easier way:
https://chandoo.org/wp/2016/10/04/generate-all-combinations-from-two-lists-excel/
I've never been a big fan of the term "easier". If you know the data model then maybe, but I really think it comes down to what you're comfortable with. It's a different way, for sure, and is also easy. Won't dispute that, but I don't really think one trumps the other.
Excel 2016 is (or will be) under the Get Data button on the Get & Transform group. It’s still Power Query. Just a quick note that everything you can do in 2013 we can still do in 2016, but you can ALSO do if you download Power Query. Don’t avoid doing that just for the sake of avoiding the download. It’s an amazing tool.
Ken - here's a way to develop a 'Jaccard' fuzzy match using Cartesian product
let
Table1 = #table(type table [value = text],
{ {"spruce"}, {"cedar"}, {"birch"}, {"pine"}, {"poplar"}, {"aspen"},
{"baobab"}, {"larch"}, {"willow"} }),
Table2 = #table(type table [word = text],
{ {"sprce"}, {"cidar"}, {"pane"}, {"aspern"}, {"babab"} }),
AddCol = Table.AddColumn(Table2, "Custom", each Table1),
Expand = Table.ExpandTableColumn(AddCol, "Custom", {"value"}, {"value"}),
FuzzyCalc = Table.AddColumn(Expand, "Prct", each
2 * List.Count( List.Intersect(
{ Text.ToList([word]), Text.ToList([value]) } ) )
/ (List.Count(Text.ToList([word])) + List.Count(Text.ToList([value])))),
SortRows = Table.Sort(FuzzyCalc,{{"word", Order.Ascending}, {"Prct", Order.Descending}}),
AddIndex = Table.AddIndexColumn(SortRows, "Index", 1, 1),
RemoveDupls = Table.Distinct(AddIndex, {"word"})
in
RemoveDupls
That's really interesting Drew. I wonder how the performance is? I'm going to have to have a play with this...
This is already in the running for the coolest EXCEL power query function I've learned in 2018. The link to wikipedia article on cartesian products is just icing on the cake.
After spending almost an hour on this problem and then finally finding your article, my brain is currently flooded in dopamine.
/drool
Never thought the blog would make someone drool, but... cool! (Glad you found it useful!)
Thank you very much for sharing this. In my quest to overcome this, I part processed my data in PowerQuery and then went to Access. This will save a ton of time and actually enable automation.
Glad you found it useful. We actually feature this in our new Power Query recipe cards too (available at https://powerquery.training/shop)
Video on a compare of three Fuzzy matching tools
https://www.youtube.com/watch?v=Qi13Mt2-Pmk&feature=youtu.be
Hi Ken,
Thanks for this great post.
Can it be extended to more than 2 tables? For example, I have many tables (the number can vary every time) and get them with Excel.CurrentWorkbook function and I need to multiplay all of them. Is there a way this to be done?
Thanks
Hi Kristina,
I can’t see why not. Even if you just run the pattern for the first two tables the do it again for the next, it should work fine.