Calculate Start and End Dates

I got an email from a reader this morning who asked how to calculate start and end dates for a given employee when they have had multiple terms of employment. Since it's been a while since we've had a technical post on the blog, I thought that this would be a good one to cover.

The Challenge

In this case (which you can download here) we are given the table shown below on the left, and we need to create the table shown on the right:

image

As you can see, John's start date needs to be listed as Jan 1, 2013 and his end date needs to be listed based on the last date he worked here; Oct 31, 2016.

How to Calculate Start and End Dates using Power Query

My first thought was "we'll need a custom function to do this", but as it turns out, there is a MUCH easier way to accomplish this, and it's 100% user interface driven as well. I'm virtually certain that the performance will also be much better over larger data sets as well (although I haven't specifically tested this.)

Let's take a look:

Step 1: Connect to the data

This is pretty easy, just select the table and use Power Query to connect to the data:

  • Excel 2010/2013:  Power Query --> From Table
  • Excel 2016: Data --> From Table/Range

We'll be launched into Power Query and will be looking at our short little table:

image

Step 2: Calculate Start and End Dates via Grouping

The trick here is to actually use Power Query's Grouping feature to calculate the start and end dates.  To do this:

  • Go to Transform --> Group By

The dialog will open and is already offering to group by Name, which is what we need.  Now we just need to select the grouping levels. The first is going to be our Start Date, so we'll rename it as such and change to calculate a Min of the From column:

image

The effect here is that this will provide the lowest value from the "From" column for each employee. A perfect start.

Next, we need to add a new grouping level to get the End Date.  To do that:

  • Click Add Aggregation
  • Configure the new column as follows:
    • New column name: End Date
    • Operation: Max
    • Column: To

It should look as follows:

image

And believe it or not, you're done!

image

How this works

The key here is that the grouping dialog in Power Query works for all records in the group.  This is really important, as the first column has no bearing on subsequent columns… if it did, we'd get the max for the first record, which is not at all what we'd be looking for. Instead, the Group By will restrict to find all records for John, then will pull the Min and Max out of the remaining three rows, returning those as the values.

The other thing that is worth noting here is that the order of the source data is irrelevant. We could have provided either of these options and the answers would still have been calculated correctly:

SNAGHTML19ec2a56

Final Notes

It's also worth mentioning that this technique to calculate start and end dates will also work in both Excel and in Power BI, as the feature set is identical between the two products.

Sometimes things that look hard, are actually really easy when we have the right tools in our hands, and this happens to be one of those situations.  Smile

Share:

Facebook
Twitter
LinkedIn

8 thoughts on “Calculate Start and End Dates

  1. "Sometimes things that look hard, are actually really easy when we have the right tools in our hands"
    Can't agree more with one little addition to it "if you know how". 🙂

    For those who do not have Power Query, Power BI or Excel 2016, you may do it with a regular Pivot Table.

    Cheers,

  2. This is a very nice and easy way to pull whichever the start or the end day of a certain record. I was using Table.Distinct(Table.Sort(Source, {"Date",Order.Ascending}) to pull the oldest day(or month) but only to find the result was not as exactly as what I expected because it seems power query sometimes doesn't give me the correct answer I could get compared to using Excel's delete duplicate function. That is, the order of unique records from PQ often turns out to be wrong so I had to use Table.Buffer to preserve as-is before Table.Distinct() so I can have the right outcome.
    I'd very much like to know which method is more efficient and save more time. I'm not using SQL server but only Excel on our company's server so the question should be apart from Query folding.

  3. Hey Marshal,

    Honestly, I'm not sure which would be more efficient. I'd try a test both ways and see which runs faster. 🙂

  4. Pingback: #Excel Super Links #11 – shared by David Hager | Excel For You

  5. Hello from Barcelona!
    I have a report that display dates in a numerical format.
    1 column with Month Nº (1,2,3...etc) and next to it, 1 column with Years Nº (2016,2017...).
    I would like to add a column and build a regular format for these dates, because I have to match the dates to another report that actually has regular End of Month Date format (31/01/2017;28/02/2017...etc)
    How can I build, through PQ, a "standard" date format from Month + Year Nº, adding the last day of the month Nº, to get it ready to match with the other report?
    I have your book "M is for Monkey" but cannot find it in...:-( I am stuck !!
    (...crossing fingers you have a trick to save my WE).
    Thank you very much! Christelle.

  6. Hi Cristelle,

    What I would do is build a custom column that does something like =Text.From([Year Column])&"-"&Text.From([Month Column])&"-1"
    That should give you a date in a yyyy-m-d format (you may need to switch things around to go y-d-m depending on your date format)
    Convert it to date
    Select the column and go to Transform --> Date --> Month --> End of Month

    Hope that helps!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts