The Each Keyword in Power Query

This post is a guest article from Marcus Croucher, a finance professional based in Auckland, New Zealand. Marcus instantly fell in love with Power Query after seeing how it can easily transform data in ways Excel finds difficult, and how it can automate repetitive workflows.

I have been using Power Query in a professional capacity for a number of years, but have never fully understood exactly how the each keyword works in Power Query. I did some research around two years ago, but the documentation at the time was quite sparse (and still isn't great) and I did not have enough knowledge about wider programming to fully understand it.

In the meantime, I was looking for a way to use Power Query-like technology on OSX. I ended up learning a lot of Python, which has several libraries that have similar functionality to Power Query (albeit without the amazing graphical interface of Power Query.) Two notable examples are pandas (very popular data analysis library), and petl (a more light-weight and easy to use data processing toolkit).

This general programming knowledge gave me the background to understand some of the underlying concepts behind each, which I will now proceed to attempt to convey to you, the general intermediate to advanced Excel user, so that you can understand and wield the each keyword with confidence. I will use Python as a parallel and will link to some Python articles that expand on the underlying concepts I am trying to explain.

A Deep Dive into How the Each Keyword Works

In this article, I assume that you are familiar with Power Query and how to use it on a practical level. I also assume that you have some (limited) experience with the advanced editor and have seen the underlying M code, and understand the underlying data structures and how to access them.

  • Table
  • List (columns) – notated as {item one, item two, etc.} and accessed by [column header]
  • Record (rows) – notated as [category1: data1, category2: data2, etc] and accessed by {row number}

I suggest going ahead and pasting the code snippets into a blank query in Power Query to see for yourself what is really going on.

When Might You Use the Each Keyword?

The each keyword in Power Query is used by quite a few of the built-in operations. For example, filtering a table based on a certain column features each here:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filter_to_score_5 = Table.SelectRows(sample_table,

each ([Score] = 5))

in

filter_to_score_5

Figure 1

Using the Each keyword to filter a table based on a certain column

The purpose is quite clear: we want to keep each row where the value in the Score column is equal to 5. However, how does this work? [Score] looks like it is referring to a full column – how can we do a full column equal to a single value? Can I access values from the row above or below the row we want to keep/discard?

Another example of the each keyword in Power Query is creating custom columns. Here, I create a custom column which is the combination of the index and the name columns:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

add_ordered_name_column = Table.AddColumn(

sample_table, "ordered_name",
each Number.ToText([Index]) & " " & [Person])

in

add_ordered_name_column

Figure 2

Using the Each keyword to create a custom column which is the combination of the index and the name columns.

Similar questions apply. It is easy to follow the logic, but how it works and the limitations behind it are somewhat of a mystery.

So What is the Each Keyword in Power Query?

The current documentation has this to say about the keyword:

Each Keyword

The each keyword is used to easily create simple functions. “each ...” is syntactic sugar for a function signature that takes the _ parameter “(_) => ...”

 

Each is useful when combined with the lookup operator, which is applied by default to _ For example, each [CustomerID] is the same as each [CustomerID], which is the same as () => _[CustomerID]

Still not very clear (unless you have a background in functional programming languages). It would be really nice to get a full understanding, as the each keyword in Power Query is used in a number of places (filtering, custom columns, etc.) and understanding it would give us an understanding of what we can and can't do with it.

Turns out, you need to understand three things to understand each:

  1. Functions as first class objects
  2. "_" as a temporary variable (and the associated shortcuts within M language)
  3. Anonymous functions (each)

Functions as First Class Objects

If you are used to using Excel, you are used to functions (e.g., =SUM() ) being a bit magical. We cannot touch or modify them, and they are supplied fully formed by Excel. You can create custom functions with VBA, but these are far and few between, and still seem like the lessor cousin to the in-built functions.

In Power Query, functions can be thought of just another "object" – or just another type of data. This means they can be:

  • Assigned to a variable and/or renamed.
    • Just like we can do something like variable = 5, in Power Query, so we can do something like variable = function.
  • Fairly easily created
  • Used as a parameter to another function (!)

To understand this, we need to distinguish between calling a function (using it in our code) and referring to it (to name it or to use it within another function). Similar to other programming languages, to call a function we use the parentheses at the end like:

function()

If we want to refer to a function we just omit the parentheses like:

function

Let's demonstrate the renaming/reassigning of functions. First I take one of the supplied functions which takes a list (i.e., column) and calculates the sum. Next, I build a sample table, and then take a sum of one of the columns using the function that I had defined at the beginning.

let

sum_column = List.Sum,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

sum = sum_column(sample_table[Score])

in

sum

Figure 3

Calling a function within another function.

It works!

Let's create a basic function

In Power Query, the syntax to create a function is:

(variable) => body of function

The body of the function is like any other query that returns a value, but instead of being delivered to the main Power Query interface it is delivered to whatever called it.

We also want to assign a name to the function, so let's expand the above template. Here, we create a simple function that takes a number and multiplies it by two.

multiply_by_two = (number) =>

let

output = number * 2

in

output

Looking good, but now we want to use this snippet in a full query. Let's build a (trivial) query that uses this function to multiply a variable by two and output the result:

let

multiply_by_two = (number) =>

let

output = number * 2

in

output,

x = 5,
y = multiply_by_two(x)

in

y

Figure 4

Building a simple query that uses the function we created.

Functions as inputs to other functions

Now that we have explored how functions are the same as any other data type or variable, and we have demonstrated how we can create our own functions, let's look at functions that take other functions as inputs. One example from the official documentation is the filtering function, Table.SelectRows.

About

Returns a table containing only the rows that match a condition.

 

Table.SelectRows(table as table, condition as function) as table

So the function expects a table (makes sense), and a function! How does this work? According to the documentation, the condition is "the condition to match".

It's not very well documented, but it turns out that this function expects a function to be supplied. It then applies that function to each row (record) of the table, and expects a true or false response from the function. It then uses this response to decide whether to keep or discard the row.

To recap, a record is a data type representing a row. We can access the items from a record by supplying the column name as follows: record[column name].

Let's create a function which we can then supply to Table.SelectRows on our sample data. Note – this is our case study example which I will develop throughout this article.

let

filterer_score_two_plus = (record) =>

let

value = record[Score],
result = value >= 2

in

result,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filtered = Table.SelectRows(

sample_table,
filterer_score_two_plus)

in

filtered

Figure 5

Creating the sample_table function.

Figure 6

Using the Table.SelectRows function to filter the table created by the sample_table function.

What have I done here? First I have created a function which takes a record, extracts the value in the Score column and returns true if it is greater or equal to two. I then construct my sample table and apply the Table.SelectRows function on it, supplying my recently constructed function as the second input. As we can see, the output as expected provides a table with all rows with scores greater or equal to two.

Now, only if there was a quicker and easier way to create such functions, as it looks like we might have to build these one-use functions quite a lot...

"_" as a Temporary Variable

The use of _ as a throw-away variable is common across several programming languages, Python included (see point 4 here). Usually, _ is used to name things that are not going to be used again and so it is not worth using up another name.

Here I write a query creating a table, assigning it to a variable called _. Power Query has no problem whatsoever using _ in this way.

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

_

For Python, using _ is just a convention, but it appears that Power Query has expanded the functionality here. Let's say we just want the column of names from the above table. Usually we can do this by selecting the column by name using [column_name] as the selector.

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

_[Person]

It turns out, we can omit the _ in this statement as Power Query will infer that if we just put [Person], the table we are referring to is the one called _. The example below works just as well as the one above:

let

_ = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"})

in

[Person]

Figure 7

If we omit the _ variable in this statement and just use [Person], Power Query infers that the table we are referring to is the one called _.

I wouldn't recommend this as general practice, as it is not well documented or understood behaviour and explicit is usually better than implicit. However, it does provide nice-looking code when used with the each keyword.

Note that this technique only works for column selections [column_name] rather than row selections {row_number}, as Power Query will interpret {row_number} as a new list.

Applying the _ Variable

With this concept in place, let's revise our filtering query defined above:

let

filterer_score_two_plus = (_) =>

let

result = [Score] >= 2

in

result,

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

 

filtered = Table.SelectRows(sample_table,

filterer_score_two_plus)

in

filtered

The shortest we can actually get this is pretty close to our final stage. Let's put the function definition right into the Table.SelectRows function, and get rid of the let and the in (only really needed if there are multiple steps in the calculation):

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

 

filtered = Table.SelectRows(sample_table,

(_) => [Score] >= 2)

in

filtered

This is already pretty tight and clean. However, the (_) => is pretty scary if you're not familiar with function definitions, and the function definition symbol "=>" is fairly similar to our greater than symbol ">=".

Anonymous Functions

As we saw above, we end up creating one-off functions to supply to other functions quite frequently in Power Query. It seems silly to go through all of the syntax of creating and naming a function if it won't be used again. There is a concept called anonymous functions, which are functions that are defined but not named. They are used as soon as they are created. In Python, these are known as lambda functions.

We can actually use the each keyword in Power Query to define the function. (Yes, we are finally at the each keyword itself!) Each just minimizes the syntax for creating functions, by providing a default input variable name, "_", and removing the need for the => or anything else. So:

(_) =>

let

result = [Score] >= 2

in

result

 

can become:

each [Score] >= 2

You can still name this if you like (filterer = each [Score] >= 2), but using the each keyword in Power Query is much more useful if we use it inline. So we come to our final query, which should look fairly familiar to intermediate Power Query users:

let

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

filtered = Table.SelectRows(sample_table,

each [Score] >= 2)

in

filtered

 

So What have We Discovered?

Now we have demystified the magic of the each keyword in Power Query, but what have we discovered?

  • Each itself doesn't actually do that much, slightly simplifying our function definitions.
  • Once you understand the concept of supplying functions as inputs to functions, everything becomes a lot clearer.
  • The missing piece of the puzzle comes from an understanding of the special _ variable name, which enables us to take a shortcut when selecting columns from tables or records (we can use [column name] instead of _[column name]).
  • The use of each hinges on the behaviour of the underlying function. Because we know that Table.SelectRows calls the supplied function on each record of a table and expects a true/false response, we can construct a function that works as we expect it to. Unfortunately, this is not very well documented.

How can we use this information? I can think of a few different ways:

  • If we have a complicated add column or filtering step to do, we can separate out the underlying logic into a separate function which can have multiple steps. This removes the complexity from the main body of the code and abstracts it away. It can be easier to read filter_to_current_year_red_cars rather than try to interpret all of the various equivalence statements as you read through the code.
  • Now that we understand the context that is delivered (only the current record/row), we can construct functions that can do more interesting things (these usually require an index column). For example, we can filter a table based on a second table, or add a new column that subtracts the current row from the previous row.

Here's an example that:

  1. Abstracts the logic to a helper function, and
  2. Adds a column based on the difference for each row from the previous row.

let

row_difference = each

if [raw_index] = 0
then [Score]
else [Score] - add_index[Score]{[raw_index] - 1},

sample_table = Table.FromColumns({

{"Fred", "Mary", "Phil", "Sue"},
{1, 2, 3, 4},
{5, 2, 5, 1}},
{"Person", "Index", "Score"}),

add_index = Table.AddIndexColumn(sample_table,

"raw_index", 0, 1),

add_difference_column = Table.AddColumn(

add_index, "difference",

row_difference)

in

add_difference_column

Figure 8

Adding a column based on the difference for each row from the previous row

So does this help your understanding of the each keyword in Power Query? Has this given you some new ideas on how to structure your queries? Please let me know if you have any questions!

Share:

Facebook
Twitter
LinkedIn

24 thoughts on “The Each Keyword in Power Query

  1. Fascinating - thanks for sharing

    One question

    In the row_difference function, it is not clear to me why add_index table needs to be referenced

    else [Score] - add_index[Score]{[raw_index] - 1},

    The add_index table was passed in, and [Score] seems to reference it easily for the current record, but when trying to pull from the previous record, the table name needs to be fully referenced. Is there an obvious reason for that?

    Thanks again

  2. Thank you for the post.
    AFAIK the comma should be removed from

    (_) =>

    let

    result = [Score] >= 2

    in

    result,

  3. Pingback: Power Query and a new Power BI Desktop update | Guy in a Cube

  4. Hi Charlie!

    The "else [Score] - add_index[Score]{[raw_index] - 1}," fragment comes from the insight that our worker function is only passed one row at a time, so the "_" variable that [Score] implicitly references is only one record.

    If our worker function wants to access other data, it has to reach out of the local scope of the function (what was passed to it, i.e., "_"), and grab a variable on the global scope. This is the add_index table.

    So there is no way for us to say, just take the row above. Instead we have to say, take this full table, take this column of that table, and let's grab a value from that column based on our index (which *is* part of the information that the function is provided).

    I think some of the confusion comes from the [square bracket] annotation. We are used to using this to select a column from a table, such as table[column], but it also can be used to select a value from a row/record, like row[column]. In the code fragment above, the first [Score] is a row[column] lookup, and the add_index[Score] is the table[column] lookup.

    Hope that helps!
    Marcus

  5. Thanks Marcus, I noticed that, "else [Score] - sample_table[Score]{[raw_index] - 1}" works equally well to "else [Score] - add_index[Score]{[raw_index] - 1}". For me, sample_table makes sense based on your explanation of table[Column]{index} lookup, I'm confused as to why add_index is treated as a table?

  6. Hi Amit

    Interesting question! Why is add_index treated as a table?

    Each step in a query is (most of the time) a new table. This is why you can click through the steps in a query and see a series of tables – and all of these tables have names that let us reference them at any time after they are created.

    So in our query above, we have "add_index = Table.AddIndexColumn(sample_table, "raw_index", 0, 1)", which creates a new table based on the sample_table table, but with an extra column, and assigns this new table the name "add_index".

    By referencing add_index[Score], we are just trying to get the full column Score. If you work through the query, you can see that both sample_table and the add_index table have this column, which is why either approach works.

    Why did I reference add_index instead of sample_table? Probably because it is the most recent, and easier to find if you're looking through the code. It is also guaranteed to have exactly the same data in the Score column – there are no intermediate steps.

    There is a case for referencing the source data though for clarity, and the name sample_table is probably easier to understand than add_index. Choose the one that is clearer for you (I have no idea whether there is a difference in performance).

    Another question is, if Power Query is creating all of these intermediate tables each time you refresh a query, isn't that very wasteful? All the data would need to be copied over at each step in my query! The answer would be yes, but I strongly suspect that there is some optimisation that saves Power Query from physically copying over each step during run-time.

    Hope that answers your question!

  7. Great article!
    One thing that has always been confusing to me is using trying to access values from another column within an "each" function while using Table.TransformColumns(). That could be a nice addition to the article.

  8. Hi fVoss

    Yes that is confusing! I have had a look at the documentation here: https://msdn.microsoft.com/en-us/library/mt260818.aspx.

    As mentioned in the article, the documentation does help, but it would be really useful to understand exactly what the Table.TransformColumns function gives to our helper function to work with. In this case, it appears that our helper function is only given the contents of its cell - nothing more.

    We can verify this by feeding in a function that reflects exactly what was given to it: each _. Let's try this on our sample table:

    = Table.TransformColumns(sample_table, {}, each _)

    This takes a table, then applies our helper function "each _" to the entire table (the third argument is the default function, and as we have provided an empty list in the second argument, our helper function is applied to the whole table).

    What we see is that the table remains the same. Each cell has only itself to work with - there is no outside context. Unfortunately, this means that we have no way to reference outside columns.

    With Table.AddColumn, our helper function is given an entire record (try adding a custom column and just putting _ in there). We are able to use this to create context by creating the index column.

    As far as I can tell, if we want to combine information from multiple columns, we have to do this by adding a new column rather than modifying a column in-place.

    Cheers!

  9. This is really interesting and explains a few things I didn't know (e.g. referring to a function) OR did know but in a slightly different way ... I will have to re-read it to absorb all of the information!

    One small point, a function can be multi-line in which case it requires the let-in but if it is single-line then there is no need for it, so it becomes
    function = (parameters) => result,

    I use PQ for extracting data from and building data for the same Wiki so my scope is quite limited. I just posted a script online (for users to be able to extract a page) which might include some interesting usage ... including using a record construct to do a multi-line if-else statement.

    http://rr3.wikia.com/wiki/User:QuickNick/Career3

    It's probably not the most elegant code so feedback is welcome!

    P.S. Still looking for PQ-related work if anybody's hiring!!!

  10. Hi Nick

    Some crazy complex code you have there! Looks like it does the trick and using the numbering system definitely helps with readability.

    I saw your comment regarding throw-away variable names (unnecessary instances) and my advice would be to lean into them – yes it does seem wasteful that Power Query builds all of the intermediate tables and we have all of these variable names that we won't use again, but it does make the code much easier to follow, as the function names are close to the arguments. See below for some details on this. Pity Power Query doesn't have pipeline syntax.

    Great clarification on the whole one line vs. multiple line function definition thing. The whole "let in" concept is probably worth an article in itself. My understanding is that these keywords are an artefact of Power Query being inspired/derived from F#, a *functional" (function based) programming language with a heavy math focus.

    A math based example would be "let x = 5, y = 6 in x + y". We are interested in the "x + y" part, but the necessary definitions are in the x = 5, y = 6 part.

    This is a bit of a mind bender if you are used to *procedural* languages, like how many people use Visual Basic. We are used to the step 1, step 2, step 3 type programming. However, Power Query works backwards, starting at the "in" statement and working backwards through the "let" statement to find how to arrive at the "in statement". It doesn't matter what order you put your code in, as Power Query will work backwards to figure things out.

    Really. Try this. It doesn't make sense initially, but it works. However, you do lose your ability to work through the steps in the query editor.

    let

    add_difference_column = Table.AddColumn(add_index, "difference",
    row_difference),

    sample_table = Table.FromColumns({
    {"Fred", "Mary", "Phil", "Sue"},
    {1, 2, 3, 4},
    {5, 2, 5, 1}},
    {"Person", "Index", "Score"}),

    add_index = Table.AddIndexColumn(sample_table, "raw_index", 0, 1),

    row_difference = each
    if [raw_index] = 0
    then [Score]
    else [Score] - add_index[Score]{[raw_index] - 1}

    in
    add_difference_column

    My understanding is that using let and in allows us to provide complex definitions. A function or a query is a one-step statement (one expression), but let and in allows us to split this up into smaller stages. You could just have one massive nested function, but that sacrifices readability (and wouldn't allow us to access Power Query's awesome step-based interface).

    As you say, since a function definition expects an expression, we can either give it one directly, or use let and in to form a more complex expression with multiple steps.

    This article really helped me understand the concept:

    https://blog.crossjoin.co.uk/2016/05/22/understanding-let-expressions-in-m-for-power-bi-and-power-query/

    Good luck with the job search!

    Cheers
    Marcus

  11. Glad to hear you enjoyed the post, Amit, though full credit goes to Marcus Croucher for preparing the article - I just posted it to the blog. 😉
    Thank you for reading!

  12. This article is cool, Marcus. I've found that the Microsoft Power Query for Excel Formula Language Specification available from https://msdn.microsoft.com/en-us/library/mt211003.aspx has been useful reading to help me to get to grips with some of the stuff you cover.

    On the underscore, it has this to say:

    9.7 Simplified declarations
    The each-expression is a syntactic shortand for declaring untyped functions taking a single formal parameter named _ (underscore). Simplified declarations are commonly used to improve the readability of higher-order function invocation. For example, the following pairs of declarations are semantically equivalent:

    each _ + 1 (_) => _ + 1
    each [A] (_) => _[A]

    Table.SelectRows( aTable, each [Weight] > 12 )
    Table.SelectRows( aTable, (_) => _[Weight] > 12 )

    So "each" is syntactic sugar.

    And this document also points out that "let" is syntactic sugar over an implicit record expression. i.e. these are equivalent:

    [
    x = 1 + 1,
    y = 2 + 2,
    z = y + 1,
    result = x + y + z
    ][result]

    let
    x = 1 + 1,
    y = 2 + 2,
    z = y + 1
    in
    x + y + z

  13. Hi Jeff

    Good point on the language specification – I had forgotten all about this. Last time I looked at it years ago I found it really hard to understand anything, but it does have some really useful information.

    To be fair, the main documentation does also mention the each keyword syntactic sugar – the confusing thing for me was how you can omit the "_" in most situations.

    Good find on digging out the let specification: let as a record syntax totally blew my mind.

    Cheers
    Marcus

  14. Thanks for the write up. Is it possible to concatenate two each if functions, then returning the output string back to the previous step?

    So,

    #"Added Conditional Column" = Table.AddColumn(#"Moved Prefix", "Custom", each
    if (Text.Contains([Group Membership], "Japanese Language") then "Japanese, " else null) &
    if (Text.Contains([Group Membership], "English Language") then "English, " else null)
    )
    in

  15. Yes, it is. You just need to make sure your parenthesis are all in the correct place. Put both if statements inside the Custom Column dialog, but wrap them each inside parenthesis like this:

    =(if Text.Contains([Group Membership], "Japanese Language") then "Japanese, " else null)
    & (if Text.Contains([Group Membership], "English Language") then "English, " else null)

  16. Hi Farshid.

    Yes, you absolutely can use the underscore to reference the current row of the table, and it works the way described in the article linked.

    The more question that I try to answer is where exactly the "_" comes from – it's the input variable for the mini-function created by the "each" keyword.

    Hope that makes sense!

    Cheers – Marcus

  17. you managed to teach a senior the principle of anonymous function and 'each' keyword
    for some reason I was not able to find it elsewhere, both webpages or YT, all sounded double dutch to me
    much appreciated sir

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts