Combine Excel Files

If you've wanted to use Power Query to combine Excel files with a single click - like you could for TXT and CSV files - the feature is finally here*.  The Combine Binaries feature has been re-written (as I discussed yesterday), and it now allows for easy combination of Excel files.

* This new feature in the O365 Fast Insider preview today and in Power BI Desktop's November update.  I'm expecting to see this show up in the January Power Query update for Excel 2010/2013.

Just a quick caveat, I'm going to cover the items specific to the Combine Excel Files experience here, and not dive deep into the methods of how to modify and adjust the created queries.  For that reason, if you haven't read yesterday's post, I highly recommend doing that first.

The Classic Combine Excel Files Experience

For anyone who has tried to combine Excel files in the past, you'll be familiar with this.  You create a new query to pull From File --> From Folder and navigate to the folder full of Excel files.  Then you hopefully click the Combine Binaries button:

image

And you get this:

image

Obviously that's not so helpful.  The answer to deal with this was to go to Add Column --> Add Custom Column and use the formula =Excel.Workbook([Content]) to convert the data into tables. Then some filtering and more user interface driven work was required in order to get your data.  The experience was a bit painful, and certainly not for beginner users.

The New Combine Excel Files Experience

To start with, I'm going to take two Excel files with sales transactions in them and combine them easily using the new combine Excel Files experience.  Here's the file characteristics:

  • Each file has a small set of data (2-3 rows)
  • The data is stored on Sheet 1 in a table
  • Both files are stored in the same folder

To get the data, I'm going to do the following:

  • Get Data using a new query "From File --> From Folder"
  • Browse to and select the folder
  • Click Edit at the preview window

As you can see, we've got a couple of Excel files here:

SNAGHTML9cbff3d

So I'll click the Combine Binaries button (at the top right of the Content column.)

And where this would have triggered an error in the past, it now kicks out a preview:

image

And what happens next depends on what you select (which is why they are numbered above.)

Combine Excel Files - Method 1

For the first kick at this, I'll select the Sample Binary Parameter 1 folder icon (indicated by the number 1 in the Combine Binaries preview.

Nothing will ever show in the preview window, but upon selecting the folder I can click OK, which will result in this:

image

As I showed in yesterday's post on the new Combine Binaries Experience, we now get a bunch of new queries and a few steps along the way.  The end result of this query, however, is a listing of all the worksheets, tables and ranges in each workbook.  This is the view that will allow you to go back and granularly pick out what you need and transform it.  In other words, this is kind of the detailed hard core view which was the equivalent of writing the custom columns that we used to have to do.

Because this is so similar to the classic method, I'm not going to do much more with this.  The real point was to expose that selecting the folder in the preview window will bring you to this setup.

Combine Excel Files - Method 2

Method 2 revolves around selecting the table in the preview window; in this case the Sales table.  When we select that branch in the preview window we WILL get a preview of the data:

image

And when we click OK, we actually get the data combined nicely:

image

As discussed in the previous post, if we wanted to modify and/or change:

  • The Source columns (Source.Name or others): We modify the Removed Other Columns1 step in this query.
  • The data before it is imported and combined: We modify the Transform Sample on the left side.

Now this section is MUCH easier than what we used to have to do!

Combine Excel Files - Method 3

But what if your data is not in an official Excel Table?  What if it's just data in a worksheet range?  Well, then you select the worksheet icon instead:

image

And the results are virtually identical to the previous method:

image

Why does that work?  It works because the Transform Sample is smart enough to automatically promote the first row to headers, so it doesn't actually need the table.  On the other hand, if that data wasn't in the first row, you may need to go back to the Transform Sample and tweak it to do what you need (like remove top rows, promote rows to headers, delete columns, filter, etc.)

Caveats When Trying to Combine Excel Files

This experience will work well for many things, but as always there are some caveats.

Single Object Only

The default experience here is to combine based on the object you select.  In other words, if you select Sheet 1, it will combine Sheet 1 from each file.  It won't combine all sheets in the file based on the Sheet 1 code.  If you want to do that, you need to go back to Method 1 above, filter to the objects you want, and deal with them using classic import methods.  (Unless you try to get real techy and build the function then repurpose it to use in that table - something I have not done yet.)

Preview Based On First Item In the List

The preview and import templates are based on the first file in the list.  You can see how that affects things when I throw a new file into my structure that has different worksheet and table names:

image

While the two Sales workbooks have Sheet1 in them, this one doesn't, making it impossible to use this function to combine the first worksheet in each file. (The template would be based on Dec and would return errors for the other two files.)

If the order is important, you'll need to sort the file list first to get the correct file to the top before you trigger the Combine Binaries function.

For the record, I have sent an email to the Power Query team suggesting that it would be nice to get the option to pick the file here which the template should be based upon.  That would make this much easier to work through, I think.

Inconsistent Columns Are Bad News

Let's say that you have two files with tables in them who have different column names (or quantities).   The transformations generated will actually deal with this correctly, resulting in a column of tables which have different headers.  All good so far, but when the main query gets to the last step, it expands the column of tables based on the headers for the table in the first row only.  This is actually a standard thing, so not a surprise, I just want to make sure you don't think this is a holy grail that will solve the differing column issue.  (We have more work to do in that case.)

Overall Thoughts

At the end of the day, I have to say that this is a pretty welcome addition.  I'm still not a fan of the names of the generated queries, and I would add something to change the template file, but I think this is going to make it a LOT easier for people to import and transform Excel files than it has been in the past.

Share:

Facebook
Twitter
LinkedIn

9 thoughts on “Combine Excel Files

  1. Hi Ken,
    great post. Same as with your book. Very Helpful. I'm still at the beginning of a very steep lurning curve.
    And that's my point. Learning to combine the "painful" way as you mentioned in your post, helped me to understand what the program is doing behind the scene.
    The new combine automation led me to some issues anyway (either due to being still a noob or having missed somewhat) ...

    1. Calling the combine function by clicking the right hand button enables to switch between the files contained in the folder. But ... (using the most May '17 BI desktop version) the shown files below do NOT change.

    2. Pushing the yellow Combine / edit button led to a later Invoke error. First example file got loaded. Remaining files with error status. Final load error also. In M code of example file and function call location of example file got hard coded. Further files could not be called therefore. Changing to '= Source{0}[Data]' solved this issue.

    Might be helpful for other beginners, although during these hours I learned a lot about this new functionality and M code again.
    Regards and thx again for your efforts
    Thomas

  2. Hi Ken,
    The problem with combining xlsx files from a folder is when the sheet name is different. For example, North.xlsx has a sheet name called "north" and West.xlsx has a sheet name called "west" then this wont work. Can you show in a blog how to handle this situation?

    This blog post is great if all the data in all the files is in "Sheet 1" but when the sheet name is different for each file, this throws off errors.

    Thanks for your help!
    John

  3. Hey John,

    Page 47 of M is for Data Monkey shows the classic method for extracting data from Excel files. This will allow you to expand each worksheet rather than named ones.

  4. Hi Ken,

    Yesterday I installed 64 bit excel 2016 version, and I am facing huge problem to combine excel files from folder. I constantly received an error:
    DataFormat.Error: The input couldn't be recognized as a valid Excel document.
    Details:
    Binary

    In power BI it works perfectly..... in excel 32 bit version it was working in same way as in Power BI

    Any idea?

  5. Hi Natalia,

    I'm wondering if you might be missing the 64 bit version of the 2010 AccessDatabaseEngine drivers that Power Query needs. (Yes, for Excel 2016 files.) Seems strange that it would work for Power BI and not Excel though.

    First thing you might try is doing a full online repair of Microsoft Office (you can kick this off from the Control Panel --> Programs and Features by selecting office, then clicking Change

    Second thing to try would be installing the 64 bit version of the Access driver here: https://www.microsoft.com/en-ca/download/details.aspx?id=13255

  6. Hi Ken,

    Thank you for taking the time to write these tutorials, they are very helpful.
    A question for you. If I want to combine multiple .xlsx files into one (the number and names of columns in each file are identical), each .xlsx file has two sheets but i only need data from one sheet, and each .xlsx file has hidden rows which i do not want to include in my final merged document. FInally, the results need to start off at the 6th row in my final merged document as opposed to the first row.

    I followed your tutorial and the resulting file is too large to generate, and the data always starts at row 1. Any guidance would be much appreciated 🙂

  7. Hi Mike,

    On the Home tab, you'll find a command to "Remove Rows" and "Remove Top Rows". That should allow you to eliminate the top 6 rows. It would need to be done in the sample transform. If that doesn't work, then you need to find some other methods to get rid of the rows. Often we can trigger an error by converting to a date/numeric data type, then removing rows with errors. This is something we show how to accomplish in our https://academy.powerquery.training/

  8. Thank you Mike. I was able to get rid of the top rows, my goal now is to merge all of the xlsx files into a "template" file where the data would start at a specific row number.
    Is it possible to pull only filtered data from the excel files that i am merging?
    Also, the link you provided gives me a 404 error.

  9. If you're using the sample transform, then yes, you can make that happen. (For reference, I've updated the link, thanks for letting me know!)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts