In last week’s post we looked at how to combine multiple files together using Power Query. This week we’re going to stay within the same workbook, and combine multiple worksheets using Power Query.
Background Scenario
Let’s consider a case where the user has been creating a transactional history in an Excel file. It is all structured as per the image below, but resides across multiple worksheets; one for each month:
As you can see, they’ve carefully named each sheet with the month and year. But unfortunately, they haven’t formatted any of the data using Excel tables.
Now the file lands in our hands (you can download a copy here if you’d like to follow along,) and we’d like to turn this into one consolidated table so that we can do some analysis on it.
Accessing Worksheets in Power Query
Naturally we’re going to reach to Power Query to do this, but how do we get started? We could just go and format the data on each worksheet as a table, but what if there were hundreds? That would take way too much work!
But so far we’ve only seen how to pull Tables, Named Ranges or files into Power Query. How do we get at the worksheets?
Basically, we’re going to start with two lines of code:
- Go to Power Query –> From Other Sources –> Blank Query
- View –> Advanced Editor
You’ll now see the following blank query:
let
Source = ""
in
Source
What we need to do is replace the second line (Source = “”) with the following two lines of code:
FullFilePath = "D:\Temp\Combine Worksheets.xlsx",
Source = Excel.Workbook(File.Contents(FullFilePath))
Of course, you’ll want to update the path to the full file path for where the file is saved on your system.
Once you click Done, you should see the following:
Cool! We’ve got a list of all the worksheets in the file!
Consolidating the Worksheets
The next step is to prep the fields we want to preserve as we combine the worksheets. Obviously the Name and Item columns are redundant, so let’s do a bit of cleanup here.
- Remove the Kind column
- Select the Name column –> Transform –> Data Type –> Date
- Select the Name column –> Transform –> Date –> Month –> End of Month
- Rename the Name column to “Date”
At this point, the query should look like so:
Next we’ll click the little double headed arrow to the top right of the data column to expand our records, and commit to expanding all the columns offered:
Hmm… well that’s a bit irritating. It looks like we’re going to need to promote the top row to headers, but that means we’re going to overwrite the Date column header in column 1. Oh well, nothing to be done about it now, so:
- Transform –> Use First Row As Headers –> Use First Row As Headers
- Rename Column1 (the header won’t accept 1/31/2008 as a column name) to “Date” again
- Rename the Jan 2008 column (far right) to “Original Worksheet”
Final Cleanup
We’re almost done, but let’s just do a bit of final cleanup here. As we set the data types correctly, let’s also make sure that we remove any errors that might come up from invalid data types.
- Select the Date column
- Home –> Remove Errors
- Set Account and Dept to Text
- Set Amount to Decimal Number
- Select the Amount column
- Home –> Remove Errors
- Set Original Worksheet to Text
Rename the query to “Consolidated”, and load it to a worksheet.
Something Odd
Before you do anything else, Save the File.
To be fair, our query has enough safe guards in it that we don’t actually have to do this, but I always like to play it safe. Let’s review the completed query…
Edit the Consolidated query, and step into the Source line step. Check out that preview pane:
Interesting… two more objects! This makes sense, as we created a new table and worksheet when we retrieved this into a worksheet. We need to filter those out.
Getting rid of the table is easy:
- Select the drop down arrow on the Kind column
- Uncheck “Table”, then confirm when asked if you’d like to insert a step
Select the next couple of steps as well, and take a look at the output as you do.
Aha! When you hit the “ChangedType” step, something useful happens… we generate an error:
Let’s remove that error from the Name column.
- Select the Name column –> Home –> Remove Errors
And we’re done. We’ve managed to successfully combine all the data worksheets in our file into one big table!
Some Thoughts
This method creates a bit of a loop in that I’m essentially having to reach outside Excel to open a copy of the workbook to pull the sheet listing in. And it causes issues for us, since Power Query only reads from the last save point of the external file we’re connecting to (in this case this very workbook.) I’d way rather have an Excel.CurrentWorkbook() style method to read from inside the file, but unfortunately that method won’t let you read your worksheets.
It would also be super handy to have an Excel.CurrentWorkbookPath() method. Hard coding the path here is a real challenge if you move the file. I’ve asked Microsoft for this, but if you think it is a good idea as well, please leave a comment on the post. (They’ll only count one vote from me, but they’ll count yours if you leave it here!)
24 thoughts on “Combine Multiple Worksheets Using Power Query”
Another great post Ken!
Chris Webb gives us one option to solve the hard coding issue:
http://feedproxy.google.com/~r/wordpress/Cpjz/~3/XYuUulSPc0Y
Chris's stuff is always good. I've got an alternate spin on it as well that I may post in future. 🙂
Pingback: Combining Data From Multiple Excel Workbooks With Power Query–The Easy/Complete/Power BI Ready Way! | The Power User
Pingback: Excel Roundup 20141124 « Contextures Blog
Hi Ken,
Great job !!!
I suppose i found third method to get data from current workbook (i have not seen this method in the net)
Of course it has some limitation but i can imagine situation when it will be helpful.
Below is a link for my video (YT) about this method.
https://www.youtube.com/watch?v=MhaT51MEY1M
You can check what we see in current workbook using Excel.CurrentWorkbook() and Excel.Workbook(File.Contents(pathToTheFile).
Thanks for your great job 🙂
Nice Bill! Watching right now and found the Filter part quite interesting. Hadn't actually seen that before.
Hi Ken, wondered if I could draw on your help
I have a situation where I merge two power queries to form one new one
For the merged query I would like the ability to have an additional column with a flag to denote keep record or disregard record. This would need to be maintained using some kind of 'maintenance table' on a separate worksheet
I basically get lists of store IDs from two separate data sources and where there is a mismatch I need the ability to disregard or keep a record for onward analysis. A keep record scenario would be a valid exception like a new store ID appearing in one data source but not the other) or disregard this record
Due to the frequency of the reporting process there is not time to have the underlying data changed so everything is in sync
I too think that an Excel.CurrentWorkbookPath() method would be fantastic!
any idea excel gurus how i separate in one column the below from each other. the email address needs to stand alone and i have 1,000s like this. help!
Amanda Mankane
so the name amanda makane is in one column and the email address minus the < sign is separated?
Don't see an email address, but assume it's at the end, with a space separating it.
To do this using Power Query, you'd pull the data in to Power Query, select the column, go to Transform-->Split Column-->By Delimiter, choose Space as your delimiter, and choose the option to split "At the right-most delimiter". Click Close and Load and you're done.
Hope that helps!
Here's a trick for getting at the sheets of the current workbook without hardcoding the full path. Set up a Defined Name of FullName:=SUBSTITUTE(MID(CELL("filename",A1),1,FIND("]",CELL("filename",A1))-1),"[",""). Get at from PQ with Source = Excel.Workbook(File.Contents(Excel.CurrentWorkbook(){[Name="FullName"]}[Content]{0}[Column1])).
Nice, Mike. I tend to use a similar approach that pulls from an Excel table. 🙂
Pingback: Power BI, Power Query, and a passion for awesome | clouded365
Hi Ken,
Thank you for a really helpful article!
I have a question: Is there any way to include the name of the workbook (like we're including the names of the sheets)?
I have a great amount of workbooks each containing 5 sheets (one for each region), and the workbooks are named after specific months.
By following this guide and your guide on how to 'Combine Multiple Excel Workbooks in Power Query' (https://excelguru.ca/2015/02/25/combine-multiple-excel-workbooks-in-power-query/), I have succeeded in creating a function that combines all my workbooks and all the sheets - BUT I really want a column that states the month (which is the name of the workbook). Can you help me? I'm pretty new to Power Query..
Regards,
Mette
To get the workbook name you'll actually need to use the ability to combine external workbooks, not from CurrentWorkbook() as this function assumes you're in the same workbook.
To make this work you'll need to start from Get Files --> From Folder and then add a custom column fore Excel.Workbook([Content]). You can then remove the columns you don't want, but preserve the workbook name. Just a heads up though... the data is pulled against the most recently saved copy of the workbook.
For more info: https://excelguru.ca/2015/02/25/combine-multiple-excel-workbooks-in-power-query/
When I do this in Excel 2016, I get a row of headers for each sheet. When I promote first row to headers, that only affects the top set (the headers for one sheet); the other sets of headers are still there as rows of data.
If the data is numeric, I can set the data type to whatever appropriate numeric type, which turns the headers into errors, and then remove errors. But what if the data is text? How do I get rid of those extra sets of headers?
If you know how many you have, you can just use the Remove Top Rows command to get rid of the correct amount. Otherwise, is there a term that you can filter on? (I.e. does every row start with the same text, allowing you to filter it out?)
Remove Top Rows only removes the headers from the first table. I don't see a better way than filtering on the header names. But what if the header names change?
Follow-up question: I see that this lines up the columns by position among the columns used, not by header name. Without going to a table or defined name, is it possible to make them line up by header name? As far as I can tell, I'd probably need a separate query for each sheet, then a query on those queries. And manually adding a query for each new sheet that is added.
Also FWIW, I stumbled across away to do this without any typing in the Query Editor: Choose Get Data > From File > From Workbook. Select the workbook and click Import. In the Navigator, pick any sheet and click Edit. In the Applied Steps section of the Query Settings windowpane of the Query Editor, delete all steps but the first one.
Hey Greg,
I think you're seeing a mix of old and new stuff going on here. (This post was published over 2 years ago, and the software has changed significantly in this area.) Have a read of this and see if it helps at all: https://excelguru.ca/2016/12/22/combine-excel-files/
Hi Ken,
using yor example i can only pull all sheets from the workbook, Can you give me a tip how to pull specific selected sheets in one query
Thanks in advance and btw a great blog
Hi David,
You can't exactly pull "selected" sheets, it kind of works the other way around. You connect to pull "all" sheets, then use the drop down filter arrow to de-select the ones you don't want before combining.
Thank you for the help. I have learnt new thing today
Is there an alternate way to combine data from multiple sheets doing a power query and sharing workbook?
Trying to have my team enter some stats on their respective sheets and combine all data for reporting purposes.
Your help is much appreciated.
Hi Jorge,
Yes, this post is quite old. Try having a read of this one: https://excelguru.ca/2016/12/22/combine-excel-files/