How to Reference other Power Query queries

One of the things I really like to do with Power Query is shape data into optimized tables. In order to accomplish that goal, I’ve begun using Power Query to source data over Power Pivot’s built in methods. But in order to build things the way I want, I need an easy way to reference other power query queries.

Why would I go to the effort of feeding through Power Query first? I’m no SQL ninja, and I find Power Query allows me to easily re-shape data in ways that would be hard with my SQL knowledge. I can leverage this new tool to optimize my tables and build Power Pivot solutions that require less tricky and funky DAX measures to compensate for less than ideal data structure. (I’d rather have easy to understand relationships and simple DAX measures!)

Methodology

My methodology generally goes something like this:

  • Load a base table into a Power Query. I then set it to only create a connection. Let’s call this my Base Connection.
  • Next I’ll create as many queries as I need to re-shape the data in the Base Connection into the forms I need, then load those into the data model.

It’s that second part that is key. I need to be able to reference other Power Query queries (namely my Base Connection) so that I could prune/trim/re-shape the data.

Reference other Power Query queries - The Old Way

Until recently, I would create my Base Connection, then I’d do the following to create the new query to reference that one.

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and choose Reference

The problem was this… my intention was to reference and customize my query. Instead, it immediately loads it into a worksheet. I have to wait for that to finish before I can edit the new query and customize it the way I want.

Reference other Power Query queries - The New Way

I learned a new method last week from one of the Power Query team members which is much better (thanks Miguel!). I included it in my last post, but I thought this was worth calling out on its own.

Instead of following the method above, this time we will:

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and Edit

Now we’re taken into the Power Query window. On the left side we can see a collapsed “Queries” pane. When you expand that, you get a list of all Power Queries in the workbook.

image_thumb13[1]

  • Right click the Base Connection query and choose “Reference”

We now have a new query in the editor that we can edit, without loading it into a worksheet first. Faster, and more in line with my goals.

The other thing I like about this method is that it immediately gives me access to that queries pane. Why is that important? Because I can drill through the other queries and get at their M code without having to close the window and go back to Excel first. So if I have some funky M code I need to re-use, it makes it way easier to review it and copy it.

Share:

Facebook
Twitter
LinkedIn

29 thoughts on “How to Reference other Power Query queries

  1. Pingback: Excel Roundup 20150126 « Contextures Blog

  2. Hello Ken

    Thank you for your post. With these new method what I see is that every reference reads to the original source and not the final layout of my base query, and thats make that the time for data refresh grown in an exponential mode for every reference that I have. Is that correct?

    Is there another way to use the outcome of a queryas source for another query?

    Regards

  3. Hi Ivan,

    I'm not sure I fully understand here... Assume that we have query 1 and add a query called "2" that references query 1. Are you asking why, when you refresh Query 2, Query 1 refreshes? Are you looking to make Query 1 static, so that it doesn't update unless you specifically refresh it?

    If that's the case, then no... when you use one query to reference another it creates a dependency chain. So when you refresh the query, it will reach back to the original data source and refresh that. If that is a database or file, it will reach out to make sure the data is current. If it's another query, it will refresh that query to make sure it is current, which will then reach back to it's sources.

    I haven't found that structuring queries to read from each other adds any significant time to the refresh process, but then to be fair I haven't set up any tests to measure it either.

    Did that answer your question, or did I misinterpret?

  4. It would be great to use this ability to reference all Power Queries in a workbook to do search in all of them easily. Even better would be a Find and Replace feature, to update variable names etc.

    Has anybody done this yet? Would love to use this on workbook that 10 + Power Queries with 20+ steps each.

    Thanks!

  5. I'm not sure, to be honest. I think I've seen something, but iirc it was more about using a crash report to scan the details.

  6. I have a data sheet where the generated date shows up in a single cell up on the top and then the data table itself follows.

    I created a query to just pick up the generated data but now I want to use that date within a formula of a new column in the 2nd query (the one that pulls/transforms the data table itself). How can I do that?

    Or how can I add the Generated data as a new column within the data table itself? That way I can use this generated date column and create the new column referencing this.

  7. Ken, I agree with the aproach of using PQ to shape the data to help simplify the DAX relationships and measures. Can this be done in separate workbooks? In other words, can I import query-generated tables or connections, or data model tables from one workbook into another? I have many, many data tables that requiring a great deal of shaping. I'd like to import the results of all the shaping to another workbook for later processing either in PQ or DAX.

  8. Hey Bob,

    You can't connect to the data model in another workbook, but you can connect to Excel tables in another workbook. So if you wanted to use one workbook to run your main queries, then land those in tables, you could pull those into your solution.

    To be fair though, reaching against that workbook as a source won't force it to refresh. That may be the goal here though, meaning you only need to refresh from the original source when absolutely needed.

  9. Hi Ken, thanks. I was in your first PQ course ; and your and Miguel's book has been invaluable. My challenge is that I have a workbook with many tables each having well over 1m rows. Your assumption is correct; I don't need to refresh often.

  10. Yeah, that would be a pain, for sure. I guess my thought is that I'd probably try and build a solution that pulled the data in and cleaned/processed it with Power Query before uploading it into a staging database. It would take some VBA knowledge, but if built a prototype in the past that did exactly this. I have logged the "processed files" to a database table to ensure that they would not be refreshed again in future.

    Kind of a pain to build, but probably the better solution for you, as you can then refresh from the database as needed.

  11. Ken, thanks. What I did was in Workbook A I used DAX as query language (EVALUATE and SUMMARIZE) to extract from the data model into an Excel table (goal is to not violate the 1M rows limit for each Excel table). Then, in Workbook B I used PQ import and apend to reassemble, and then load into another data model.

  12. Hi Ken, great post. Thanks.

    I am trying restrict the rows in a Projects table (8000 rows) to only those ProjectNumbers that are present on Timesheets (100k rows) within the 12-month moving time period defined for my Timesheets query.

    I create a new ProjectList query which references the ProjectNumber collumn in the Timesheets query and removes duplicates to create a distinct list of the ProjectNumber values present.

    let
    Source = Timesheet,
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Project Number"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns")
    in
    #"Removed Duplicates"

    My query is on refresh, this ProjectList query reloads the entire 100k rows of the Timesheets query from SQL. Is there any way to reference the Timesheets query in such a way that it does not go back to SQL? Would Table.Buffer help 'somewhere'?

    Kind regards,
    Brian

  13. Hi Brian,

    Unfortunately the second query can't take advantage of any query folding against the SQL database, only the first query in the chain can. If I read this right, you've done this:
    Query 1: Connects to the raw table
    Query 2: References Query 1 and filters the data

    Basically, this means that no steps from query 2 will be rolled up and sent to the database, causing you to load all 100k rows

    Now if you set it up to do this:
    Query 1: Connect to the database, remove columns, remove duplicates
    Query 2: References Query 1 for further operations

    The the entire contents of Query 1 could be passed back to the SQL database in order to process it way more efficiently, returning just the 8 k rows.

  14. Hi Ken,

    The problem I have with this approach is that if I do some filtering of the data, then the referenced query gets affected too. This is inconvenient because sometimes I want to handle that query as a total different one, but having the flexibility of only "refreshing" one time, one data source. For example, I can bring the data as a connection and make another query as a reference. Everything fine here. Let's say I do a remove duplicate on the country column and obtain 10 countries. However, if I apply some filter on the first (original) query, and the countries in that scope are limited, then my other query will be affected and now I would see less countries.

    The other way of doing it is by duplicating the query, but the problem is that then the file path is hard coded and then I might need to first remember to update the path and then it takes manual effort to do this.

    Any idea?

    Thanks

    Fernando

  15. Hi Ken,

    Well I guess I don't NEED to modify the base query... I can just create more references and modify those without touching the original base query.

    Disregard!

    Thanks anyways

    Fernando

  16. Is it possible to reference a specific step in another query instead of the entire query?
    If I have 10 steps in a query, but I want to create a reference to the 5th step and have that step returned to my new query, is that possible?

  17. Hi John, not that I'm aware of. But why not right click the next step and choose "Extract Previous". That will give you a query that you can load as connection only, and then reference it using just the query name?

  18. Hi Ken,

    I've transformed a data query as a 'Connection Only' because I want to minimize the file size.

    Is there a way to reference this 'Connection Only' query from another excel workbook (i.e. not worksheet)? I want it to be reference so that any update I made to the base query I can refresh my reference query.

    The only plausible way to do this is to 'Load To' the 'Connection Only' query as a table first. Then from a different excel workbook, I would extract the query From File -> From Excel and open the 1st excel file. However, loading the base query to a table cause my first excel file size to increase tremendously which I am trying to avoid.

    Regards,
    Ayfiq

  19. No, sorry Ayfiq. Why not just set up the query as connection only in the other workbook to pull from the original? It won't increase the file size at all, and will be available for you where you need it.

  20. I have this problem with multiple calls to the main sql query from reference queries.

    I can not find in Excel, the option to turn off parallel load option, this seems to be a tip but seems to only exist in Power BI Desktop. ??

    Anyhow the intention of this is to prevent multiple calls to the base query (sql query + with some transforms), by caching it.

    I manually cache it by loaded to the worksheet.
    I then create an Excel named reference, (select say columns at top) and name it, then Get Data from Range (this is essentially the cache). I then change my reference queries to source their data from this new query.

    This made it super fast, without all the mumbo jumbo.
    Finally.

    I suggest Microsoft build a workflow and refresh schedule into Power Query for Excel (and Power BI etc), and allow freezing of base queries or something. It's simply too complicated for the business user to delve into all this, esp when its a tool made for business users. Hope this solution helps some people, though keep in mind, for very very large datasets, this may not be a solution, as you won't be able to load all that data you want cached into a workbook if its too large. How large? I don't know but I do know some people can not load large sets of data to their worksheet.

  21. Ken,

    I was hoping to reference a specific step of a prior query in the chain. Since that seems not to be possible, thanks for the mention of what Extract Previous does. Saved me some time.

    Your blog and book are greatly appreciated.

    Matthew

  22. Hi Matthew,

    You can certainly reference previous steps in the query chain. Try this... create a new query, record any other action so that you get 2 steps. Now click the little fx button on the formula bar and type in =Source. You should be looking at whatever was evaluated during the Source step.

  23. You are sooooo coool, i have spend the last day finding a solution, and cannot believe it is this simple...thank you 🙂

  24. Pingback: Aggregate Rows in Power Query | Computergaga

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts