A client contacted me today asking how to merge files with different column headers in Power Query. The issue she's facing is that some of the files in her folder have a column called "customer", where others have a column called "ship to/customer". Plainly there has been a specification change somewhere down the line, but it's causing issues in the combination - an issue that would affect either Excel or Power BI.
What happens when we try to merge files with different column headers?
In order to replicate this issue, I created two very simple CSV files as shown here:
I dropped these into a folder called "Test" and then
- Created a new query From File --> From Folder
- Renamed the query to FilesList (making a query that I can use to easily sort/filter the list of files later)
- Right clicked the query in the Queries pane --> Reference
- Renamed this query to Transactions
- Clicked the Combine Binaries button
At this point I was presented with the following window:
The only thing I really want to point out here is that I choose the Example file which has the column name that I do NOT want. (I want to rename "ship to/customer", so it's important that it show up here.)
I then clicked OK, and was presented with this:
Err.. wait… what happened to my customer column?
Why is the Customer column missing?
To understand this, we need to look at the steps in the Transaction query:
If you were to click on the "Invoked Custom Function1" step, you'd see that it adds a new column to the Transaction query. The first table shows 3 columns where the first column is "ship to/customer". The second table also shows 3 columns, but in this the first is "customer". So all is working so far.
But then, when you get to the the "Expanded Table Column1" step of the Transactions query, it expands to show only the "ship to/customer" column. Why? It's because of the following M code generated by Power Query:
= Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Transactions", Table.ColumnNames(#"Transform File from Transactions"(#"Sample File")))
What this means in English is that it reads the columns from the table in the first sample file. That's not super helpful.
Now we could work on trying to enumerate all headers, but that would be a pain, as the code is complicated and still leaves us in a place where we would need to combine both columns anyway. Let's fix this by dealing with it at the source.
How to merge files with different column headers properly
Step 1: Prepare the Transactions query:
Delete the Changed Type step at the end of Transactions query. This is because it is setting the "ship to/customer" column to text, and by the time we're done, that column will be called "customer". If we leave the step as is, it will cause an error.
Step 2: Modify the Transform Sample query:
Next we need to select the Transform Sample query:
Now, what we want to do is rename that "ship to/customer" column to make it "customer". So let's do that:
- Right click "ship to/customer" --> Rename --> "customer"
The problem here though, is that when we apply this to our other files, THIS will cause an error. Why? They don't have a "ship to/customer" column to rename. So we need to wrap this in an error handler.
To do this, we need to adjust the formula that was just created to wrap it in a "try/otherwise" clause. This is essentially equivalent to Excel's IFERROR() formula. If it works, it will return the result. If not, it returns an alternate item, which we will set to be the previous step in the query. In other words "Try to rename this column. But if it fails, give me the original table.
The keys here are to
- Insert the try and otherwise in the correct location (remember they are case sensitive)
- Get the right syntax for the previous step name (remember to wrap it in #" " if the step name has a space in it.)
In this case, it should look like this:
Step 3: Revel in your success:
You got it. At this point, returning to your Transactions query should leave you pretty pleased, as we've plainly been able to successfully merge files with different column headers into the table that we actually want:
The only thing left to do is set the data types, and we're done.
17 thoughts on “Merge Files With Different Column Headers”
Huh, never would have thought to return a table with "try...otherwise". Very slick technique.
That's much more elegant than any solution I've used before, thanks Ken!
@ Ken - I use the below approach
1) demote headers
2) Remove top 1 row
3) All columns are now Column 1 , Column 2 etc
4) Merge this to an Table that has only the headers (typically maintained locally or in a separate XL file.
This only requires the Columns in the source file to be in the same order.
Each file the headers could be slightly different
Hey Sam,
The benefit of your approach is that you don't care how often the column names change. The drawback is that the columns must be in the same order.
The benefit of mine is that I don't care about the order. The drawback, of course, is that if the column names keep changing, then mine breaks.
The right approach really comes down to trying to figure out how your data structure is likely to change.
I had this EXACT scenario today. My solution, not as elegant was:
1) Start with file of column header I did not want to keep
2) Remove the automatic "promote headers" in the transform query
3) create a new column. If first column = "name I don't want" then "name I do want" else first column. Now I have a new column that is what I want. Delete the first column, promote headers, done.
Thanks Ken this is really helpful
That works too, Ed. I wanted to avoid this route mainly because I didn't feel that adding a new conditional column was necessary, but again also because in this scenario you must have all the columns in the same order. But again, it really depends on the way the shape of your data changes.
I think there is an even easier solution to this problem.
There is an optional argument in the Table.RenameColumns function:
Table.RenameColumns(table as table, renames as list, optional missingField as nullable number)
If you use "MissingField.Ignore" in the optional MissingField argument, it will ignore any renames where the column does not exist. This way you don't have to wrap the function in any sort of logical.
Table.RenameColumns(#"Promoted Headers", {{"ship to/customer", "customer"}}, MissingField.Ignore)
Very cool!
Maybe this formula will do the job too?
Table.RenameColumns(#"Promoted Headers", {Table.ColumnNames(#"Promoted Headers"){0}, "customer"})
I often get an error when changing data type right after Promoted Headers and this handy formula saves me all the time!
Hey, btw I really wanted to attend your PQ training course but since I live in Japan, that's a long way to go...any chance of video session?
Hey Marshall,
Thanks for this. For online training we do offer our Power Query workshop at http://powerquery.training/course and we also provided the course at https://powerpivotpro.com/comprehensive-power-query-online-video-training-course/
Hope one of those may help!
Thank you for sharing this method of merging files.
Is it possible to do this with tables that are similarly formatted, where the number of columns varies, as each table may have different dates as headers, and the number of of date column headers vary? Ultimately, is there a way to promote all first rows from the files within the folder and append the tables where the consolidated table will include all headers with Power Query?
Hey Chris,
Maybe, but to be honest, I don't think I'd approach things that way. I would be more tempted to use the Combine Binaries experience, unpivot the date columns in the Sample Transform, then re-pivot them in the Master Query afterwards (or more likely via a Pivot Table).
marshal this is what I was looking for!
Thank you!
Pingback: Chris Webb's BI Blog: Making Sure All Columns Appear When You Combine Data From Multiple Files In Power BI/Power Query M Chris Webb's BI Blog
Is there any solution to combining multiple files with different table headers, where for example the 1st 10 columns are identical, but then I can have between 1 or 30 columns.
So in my case I have variable column count, rather than the same content, but different column names, but same content. This is kind of a matrix data, where each sheet is a different scope, and I'm trying to merge the scope
I guess the question is really how you want it to work. If the first 10 columns always show the same data, and the next 1-30 vary, what do you want to happen when you stack them? Personally I would probably combine the files as standard, but just remove the Promoted Column Headers step in the Sample Transform. That should just stack them all, and then you could rename columns afterwards as necessary. Does that sound like it would work?