This post illustrates a cool technique that I learned at the MVP summit last week, allowing us to use Power Query to merge multiple files with properties from the file in the output. A specific example of where this is useful is where you have several files with transactional data, saved with the month as the file name, but no date records in the file itself. What we’d want to do is merge all the contents, but also inject the filename into the records as well.
The funny thing about this technique is that it’s eluded me for a long time, mainly because I way over thought the methods needed to pull it off. Once you know how, it’s actually ridiculously simple, and gives us some huge flexibility to do other things. Let’s take a look at how it works.
If you’d like to download the files I’m using for this example, you can get them here. You'll find that there are 3 files in total: Jan 2008.csv, Feb 2008.csv and Mar 2008.csv.
Step 1: Create your query for one file
The first thing we need to do is connect to the Jan 2008.csv file and pull in it’s contents. So let’s do that:
- Power Query –> From File –> From CSV
- Browse to the Jan 2008.csv file and import it
- Rename the “Sum of Amount” column to “Amount”
Perfect, we now have a basic query:
Notice here how the January file has no dates in it? That’s not good, so that’s what we’re trying to fix.
Step 2: Turn the query into a function
At this point we need to do a little code modification. Let’s go into the Advanced editor:
- View –> Advanced Editor
We need to do two things to the M code here:
- Insert a line at the top that reads: (filepath) =>
- Replace the file path in the Source step (including the quotes) with: filepath
At that point our M will read as follows:
We can now:
- Click Done
- Rename our Query to something like fnGetFileContents
- Save and close the query
Power Query doesn’t do a lot for us yet, just giving us this in the Workbook Queries pane:
Step 3: List all files in the folder
Now we’re ready to make something happen. Let’s create a new query…
- Power Query –> From File –> From Folder
- Browse to the folder that holds the CSV files
- Remove all columns except the Name and Folder Path
Wait, what? I removed the column with the binary content? The column that holds the details of the files? You bet I did! You should now have a nice, concise list of files like this:
Next, we need to add a column to pull in our content, via our function. So go to:
- Add Column –> Add Custom Column
- Enter the following formula: =fnGetFileContents([Folder Path]&[Name])
Remember it is case sensitive, but when you get it right, some magic happens. We get a bunch of “Table” objects in our new column… and those Table objects hold the contents of the files!
I know you’re eager to expand them, but let’s finish prepping the rest of the data first.
Step 4: Prep the rest of the data you want on the table rows
Ultimately, what I want to do here is convert the file name into the last day of the month. In addition, I don’t need the Folder Path any more. So let’s take care of business here:
- Remove the “Folder Path” column
- Select the “Name” column –> Transform –> Replace Values –> “.csv” with nothing
- Select the “Name” column –> Transform –> Date Type –> Date
- Select the “Name” column –> Transform –> Date –> Month –> End of Month
And we’ve now got a pretty table with our dates all ready to go:
Step 5: Expand the table
The cool thing here is that, when we expand the table, each row of the table will inherit the appropriate value in the first column. (So all rows of the table in row 1 will inherit 2/29/2008 as their date.)
- Click the little icon to the top right of the Custom column
- Click OK (leaving the default of expanding all columns)
- Rename each of the resulting columns to remove the Custom. prefix
And that’s it! You can save it and close it, and it’s good to go.
A little bit of thanks
I want to throw a shout out to Miguel Llopis and Faisal Mohamood from the Power Query team for demonstrating this technique for the MVP’s at the summit. I’ve been chasing this for months, and for some reason tried to make it way more complicated than it needs to be.
What’s the next step?
The next logical step is to extend this to working with Excel files, consolidating multiple Excel files together; something we can’t do through the UI right now. Watch this space, as that technique is coming soon!
16 thoughts on “Merge Multiple Files With Properties”
Hi Ken,
Great post!
Here is a way to do this same this with Excel files:
http://datapigtechnologies.com/blog/index.php/using-power-query-to-combine-data-from-multiple-excel-files-into-one-table
Thanks Mike! I'm going to a follow up in the next few weeks on merging Excel files. Might look a bit different than yours, as the technology has matured a bit.
I have a question about power query. I uploaded 2 tables in an excel spreadsheet using the power query tool. While doing that, I created a column name "concatenate", which basically it is an ID based in a combination of the other columns. Both tables are uploaded in the excel file using the "load to data model" option. The problem is when I used the merge option to join both tables. For a reason I dont understand, the merge option does not identify all the matching rows, just some of them. I thought that maybe the columns I am using as ID are not the same, but when using the vlookup function it works perfectly fine. What could be the reason? Thank you.
Hi Sara,
I shot you an email. If you could email me a copy of your file I could certainly take a look at it and work out what's happening.
re. Power Query for MErging Excel Files
Ken Puls on November 13, 2014 at 8:32 am said:
Thanks Mike! I'm going to a follow up in the next few weeks on merging Excel files. Might look a bit different than yours, as the technology has matured a bit.
Hi Ken,
Did you publish the above for merging Excel files having the same structure?
Thanks,
-Jeff
You know, I don't think I ever did, actually. But Miguel posted a big thread on one technique here: http://www.powerpivotpro.com/2015/01/power-query-for-excel-combine-multiple-files-of-different-file-types/
Thank you very much. You saved me tons of time
Great! Amazing! Thank you!
I was so sad about Power Query is unable to correctly combine binaries when working with .xls files. And was almost goint to write a function in VBA. But you saved me a lot of time and turned the water into wine.
Hello , how can i use power query for import 30 file microsoft access from a Folder ( and the quantity of file will addition every month to folder) ? because i see that above is for csv file.
__thanks for help__
Same steps as here, just connect to your Access database at the beginning instead of the CSV file.
Increíble aporte.
He ahorrado muchas horas de trabajo y búsqueda en la web con la técnica que enseñas aquí.
Estoy sumamente agradecido.
¡Gracias, y contento de que lo haya encontrado útil!
Hi Ken,
Thanks for this awesome technique. I have your book and I found it there as well.
Can I use the same technique with text files imported from emails?
I connected to my email, then pulled all the emails from a certain person that sends daily text reports, then I got to convert all those attachments in a working format. Everything worked great, but I need to keep the email subject and the date.
Thanks for your help.
Hi Ken,
I just updated to the new version 2.42.4611.421 and Power Query handled it beautifully. No issues.
After I got the Binary column, I added a custom column (Csv.Document) and it worked. Before the new version, it was giving me an error.
Thanks,
Pablo
Pingback: Combine Multiple Excel Workbooks in Power Query
Pingback: Combine Multiple Worksheets Using Power Query