Using Power Query with Email

At our last MVP summit, I sat in on a session on PowerQuery. Our presenter, who is pretty passionate about it, at one point asked us “Who has wanted to use PowerQuery on their email inbox?” And honestly, I have to admit, I was kind of “meh” to the idea. After all, email is a distraction, and not something I generally like to go back and study. Why would I care how many emails someone sent me, or what time they do? Not interesting.

But oh wait… after fooling around with it when I was bored one day, I came across something pretty damn cool. PowerQuery can do much more than just count emails… it can parse the contents! (Okay, that’s uber-geek for “it can read and examine the contents of the email.”) Why is that important? It’s because PowerQuery is all about sourcing and reshaping data.

We can build this…

Check this out… this chart is created, in Excel, showing the affiliate income that I’ve earned from selling Charley Kyd’s dashboard kits from Exceluser.com.

image

Yeah, okay, it’s not a huge amount of money, but that’s not the point.

… From This…

The point is this:  Charley sends an automated email each time there is a sale, of which the contents are always in the same structure:

image

And here’s how…

Step 1: Create a folder and a rule in Outlook

The first thing I did was set up a rule in Outlook to funnel all emails like the one above into the “Affiliates’'\Exceluser.com” folder.  While it’s not entirely necessary – I could target the inbox itself – this makes the updates run a bit faster as they don’t have to examine ALL of my email.  It’s also good housekeeping.  That folder will now be the ENTIRE source for the chart above.  Even better, I didn’t even have to read the emails myself!

Step 2: Open Excel and create a new Power Query

To do this, you must be running the December 2013 update at a minimum.  (Earlier versions don’t support Exchange as a data source.)  To set it up:

POWER QUERY –> From Other Sources –> From Microsoft Exchange

If you haven’t used an Exchange connection, you’ll be prompted for your credentials, which it will then save for you.  Unfortunately only one Exchange account is supported at a time right now, so if you need to change it go to POWER QUERY –> Data Source Settings to modify your credentials.

Now, for me it pulled up a Navigator task pane on the right with a list of Mail, Calendar, People, Tasks and Meeting Requests.  I personally wish it would take you into Power Query, as this is just one more step to slow me down.  I have to select Mail, then click Edit Query (at the bottom) to go and start building my query.  Once I do so, then I see this:

image

Wow… that’s a lot of columns, but not really where I want to be.  Let’s filter this list down a bit…

I clicked the dropdown arrow beside “Folder Path”, and then clicked the “Load More” button, since there is little in the list:

SNAGHTML38a48c

Perfect, now I uncheck Select All, check \Affiliates\Exceluser.com\ and I’m set… my list will filter to only those emails from Exceluser.com.

Step 3: Removing Irrelevant Columns

There’s a lot of them, so I’m going to get rid of them.  I select and remove each of the following:

"Subject", "Sender", "DisplayTo", "DisplayCc", "ToRecipients", "CcRecipients", "BccRecipients", "DateTimeReceived", "Importance", "Categories", "IsRead", "HasAttachments", "Attachments", "Preview", "Attributes", "Id"

That leaves me with 3 columns in all:  “Folder Path”,”DateTimeSent” and “Body”:

SNAGHTML3d45e9

Step 4: Expand the Email Body

Now here’s where cool starts to get visible… the Body field contains a bunch of entries called “Record”, which are shown in green.  That means they are clickable and, if you do click it, will show you the entire body of the selected email.  But even better is that I can expand those into the table itself.  That funny little double headed arrow next to “Body” does just that.  When I click it, I can see all the items in the record which, in this case includes both the text and html bodies of the email:

image

I’d rather play with plain text than HTML, so I cleared the HTMLBody flag and click OK, resulting in the following:

image

So the body has expanded, and if I click on a specific record it shows the entire body in the window at the bottom.  Very cool, and I can see that this email shows earnings of $12.91.  Even better, as I click through the other records, Charley’s emails are identical, with only the value changing.  This is VERY good.

Step 5: Strip out the Value

Now the above is good, but it contains a bunch of hard returns and stuff.  I want to get rid of those.  So I select the Body.TextBody.1 column then go to Transform—>Text Transforms—>Clean.  That removes all the hard returns, resulting in this:

SNAGHTML46cabd

Now I’m going to do a find and replace to remove all characters before the value.  Since it’s always the same, I got to Replace values and I can replace "Hi Ken Puls,This email is to notify you that you have made a sale.You have earned $" with nothing.

The column will now start with the value, but I need to remove everything after the value.  Easiest way for me was to:

  • Replace " in commission for this sale." (notice the leading space) with a pipe: "|"
  • Choose Split Column –> By Delimiter –> Custom –> |

I then made sure that the Body.TextBody.1.1 column was selected and choose to set the Data Type to Number.

Cool, I’ve got my values now!

Step 6: Final cleanup

Finally, I did a bit of cleanup…

  • I removed the Body.TextBody.1.2 column as well as the Folder Path columns, since I don’t need either any more
  • I renamed the Body.TextBody.1.1 column to Commission
  • I renamed the table (on the right) to Commissions

And here it is:

SNAGHTML512197

And finally I clicked OK to load the table to the worksheet.

Step 7: Create the Report

And this part is probably the easiest… I created a simple Pivot:

  • DateTimeSent on rows
  • Group dates by Months and Years
  • Years on Columns
  • Commissions on Values

And then the PivotChart where I removed the Field Buttons, vertical axis and gridlines, and added data labels.  Pretty straight forward if you’re used to working with PivotTables or charts at all.

Final Thoughts

This tool is truly awesome.  All I need to do to get an update on my affiliate status is to open the file and refresh the data.  It will reach into my email folder and do the rest.  Simply amazing, really.

And while the setup may look like a lot of steps, this file took me less than 30 minutes to build, from setting up the rule in Outlook to pulling the data into PowerQuery through cleaning it and reporting on it.  Granted, I’ve been playing around with similar data lately, but still, it was quick from start to finish.

So… “Meh, it’s just email”?  Not any more.  🙂

Share:

Facebook
Twitter
LinkedIn

84 thoughts on “Using Power Query with Email

  1. Pingback: Power Query for Excel – on the rise | Microsoft Excel and Access Experts Blog

  2. Hey Ken - great post. Glad to know that you found value in the Exchange connectivity in Power Query.

    Looking forward to more posts about your experience with Power Query.

  3. Is there a way to deal with the email attachments from within PowerQuery? E.g. to load the content of uniformed attachments into the same table.

  4. Yes, there is. When you connect to PowerQuery and pull in the email, you get an "attachments' column. Providing the files are uniform (no necessarily named the same), and the data is in a structure that PQ can interpret, you should be able to expand them to get the content.

  5. Great post. I found power query is very helpful specially when MSFT added more data source.

    I am working on POC to create some reports pulling a mail box from the Exchange, I was able to get all the columns in the power query sheet.

    The requirement of the POC is a bit challenging. There are about 15 custom attributes were added to the mail item in Exchange. I don't see the custom attributes in the attribute column in the query sheet. I am able to see the out of the box attributes i.e. ConversationIndex, ConversationTopic, etc...

    Please suggest.

    Thanks,
    Michael Hanna

  6. Great !!!

    Thank's to your post I could done a great imporvment for my customers.

    I've created some "team sites" for my customers within Office 365, to provide some dashboards and data analysis. But the IT skills are some time really poor and the way to share data was not so easy.

    So far I've also created a "team site mailbox" with a specific mail adress, so that my customers can forward their data through emails with Excel attchements. Then I've created a POWER QUERY connexion to that mailbox to directly read the data from attachments. So I can refresh the dashboards with the new data directly forwarded by mail.

    The method is far more easy for my customers, and less time consuming for me !

    Regards.

  7. Any ideas how to troubleshoot why my PowerQuery is not reading all the records in my inbox. I have over 500 records in my inbox (its new) but my query only returns 130 records. This is really cool, but would be better if it were complete.

  8. Mike, just to be clear, are you talking about the output result, or what you see in the Power Query window? The PQ window is simply a preview of your data, but the end result should comb through it all...

  9. Ken,
    The connection to Exchange is useful, however, I am also interested in connecting to data in Outlook Data Files, which do not seem to be available from the PQ Exchange connection. In the past I have pulled this data into Excel using VBA, connecting directly to the Oulook App and the data files (not limited to Inbox, etc). Do you know if this is possible using Power Query?

    How about connecting to "additional mailboxes" associated with exchange via the Advanced Options dialogue in Outlook?

  10. Hey Alex,

    Good questions there. Not sure on the Data Files to be honest, but I thought you could connect to different mailboxes now... May be wrong there as well though.

  11. Hi Ken,

    I'm using the Exchange connection to the Calendar of Outlook. I have successfully done it and removed and filtered columns.

    The problem that I'm having is that it is not refreshing. For example, after analyzing the meetings that I had on my calendar, I went back and deleted some of them, but PowerQuery doesn't reflect that change after clicking all the refresh buttons.

    Any idea?

    Im using Excel 2010. Power Query Version: 2.23.4035.242

    Kind regards,

    Fernando

  12. Hi Ken,

    I was able to find a solution. The problem seems to be the order on which I was filtering the columns. For example, I was filter by date "2016" and then by the columns which has the DisplayTo by a name. For some reason, it didn't update. But when I changed by first filtering by the "DisplayTo" column and then the date, it worked fine.

    I don't quite understand why it would behave like that.

    Regards,
    Fernando

  13. Sorry Fernando, I just realized I never answered this. Honestly, my answer would have been to step through the query and see if you'd accidentally filtered something out. I'm not sure why this change would have affected it either, but I'm glad you got it sorted!

  14. hi there, thanks for this blog post, do you know if you can pull from a group address?

  15. Hi Vanessa,

    If you have the credentials (email & password), then yes, it shouldn't be an issue. Having said this, we can only provide one set of credentials at a time, so if you also want to build intelligence against your own email account at the same time, then this would be an issue.

  16. Is there an easy way to schedule it to run at a given time with a given frequency? E.g. pull incremental data every night at 11:59PM.

  17. Hey Guys,

    how can I get all appointments from all of my organizational Exchange calendars? I want to create a Dashboard to monitor all Events from all calendars - organization wide. Is there any "best practise" to collect all calendars from everyone? Thank you!

  18. Hi Ken,
    Thanks for yet another great post! I'm experimenting with connecting PQ to generic mailboxes. So far so good but I'm wondering if there is a way to identify who was the "real" sender of an email instead of having generic mailbox address as a sender?

  19. Hi Ken, thanks for yet another great post! I'm experimenting with connecting Power Query to generic mailboxes. So far so good but I am wondering if it is possible to extract the 'real' sender of a message rather then getting generic mailbox address for all sent items.

  20. I'm not sure I follow. If you're using a general mailbox, isn't the email going from that email? The only way I can think of to extract the real sender would be to try and parse the signature lines or something.

  21. Hi Ken. Brilliant. One thing do you have any idea where the Flag column is as can't find it anywhere. Found absolutely everything else!

    Mike

  22. Hi Mike, maybe try looking for a "follow up" or something similar? (I can't connect to my exchange mailbox at the moment to test.)

  23. Not that I'm aware of. If it doesn't show up in the meta data of email addresses that you see when querying the inbox, then you're out of luck.

  24. Hi Ken,
    Your suggest query is good for plain text in text body. However if it contains a table, a attachment, all information extract from becomes a list of data. How can I extract this table properly?

    Item# Our Job # Contact Phone
    1 5005355917 Mr Yeung 85263344233
    It looks like
    Item#
    Our Job #
    Contact
    Phone
    1
    5005355917
    Mr Yeung
    85263344233

    Thanks,
    WaiChung

  25. Hi WaiChung, you are correct. Tables don't render well in emails. In text form, they show up as a stacked table, in HTML... I'm not sure as in my tests I can't get much out of HTML bodies at all.

    The solution I can see here is to pull the data in as text, remove the irrelevant rows from the top and bottom, then run a pattern like this across the data: https://excelguru.ca/2015/01/07/transpose-stacked-tables/

    Hope that helps!

  26. Hi Ken, in my Calendar Outlook i have the Category which i defined the Status of the Task: "OK" , "NOK". But i cannot see the Category in the Data Query in Power BI. Do you know how to call it out?
    thank you in advance.
    Bao

  27. Hey Bao, isn't there a Categories column? In my tests (in Excel anyway) this column exists and contains the Categories.

  28. Hi Ken, if I get automated email every day with table information, would you suggest to convert this email to text and then do a power query or can I do a power query directly without converting outlook file to text format? Thank you!

  29. You should be able to do this without converting outlook file into a text format. So long at the email server is Exchange, you should be able to query it directly.

  30. I have a bank confirmation email coming to my inbox with table (in a text format) in the middle of the body of the email. On top and bottom it is one paragraph of general information that I do not need.
    I connected to exchange however do not see the possibility of stripping the top and bottom paragraphs and combining tables. I tried splitting columns by space, it looks bit "ugly". Not sure what anyone can recommend? Thanks

  31. Hello. Ken, thank you for this information and the instructions.

    This is exactly what I need, and a lot lot easier than the way I have been trying do it during the past week, by building a VBA script (without knowing VBA).

    The only part I don't like about using Power Query is that, it seems if I delete any of the emails from the folder I set up in Outlook, those will be deleted in the spreadsheet, too (which makes perfect sense).

    Power Query can be used to get a "snapshot" of data, like your commissions, but I think you'd want to have that information saved and stable. I haven't tried to figure out a way to keep the spreadsheet "frozen for previous entries" - have you?

    Thanks again.

  32. I saw the question in the comments here and I've been googling this for over an hour now and at last I found it myself....
    in case anyone else is also wondering where are the follow-up flag related properties in the data set (which are very useful when a team is collaborating in a group mailbox), expand "Attributes" then further expand "ExtendedProperties" and take "FlagRequest", "TaskComplete" and "TaskDateCompleted"

  33. Hi Ken,

    Using Excel 2016, but I am unable to access Exchange (not listed as a data source), probably due to not having a ProPlus account. Is there a way around this, short of downloading a csv file?

    Thanks!

  34. Not really, no. The Exchange license is considered a business connector, so a license change would be required. I'm not sure if Exchange falls under the business SKUs or only the Pro Plus license though...

  35. Hi Ken,

    I have created a query for emails from exchange but struggle to... 'scale it' (?).

    What I have:
    step one - connect to a shared mailbox on exchange and filter out emails from last week (this leaves around 1400 emails) or last day (round 350 emails).
    I leave only columns with email body, email ID and email send date.
    Next I have a function that searches query from step one; uses email ID as input; drills down through email with the specific ID and searches for specific text (using inner.join feature).
    Then I have a Second Query; source of Second Query is First Query; Second Query has custom column that calls in The Function.

    I am trying to load the results of Second Query into Power Pivot table.

    Unfortunately it takes forever...
    I am under impression that each time I'm trying to refresh the second query - the first query tries to import all of the data from exchange with hundreds of thousands of lines...

    Do you know if there is any way to 'lock' (?) the First Query, so that once it's refreshed The Function and Second Query only work on the limited number of lines (1400; 350) instead of trying to work its way through everything what is available on the exchange server?

    Regards,
    Marek

  36. Hi Marek,

    Unfortunately I don't, and I've always found reading from Exchange to be slow.

    Out of curiosity, what about doing something slightly different? What about leveraging Flow to react to incoming email and write key bits of the email into a file? Then connect Power Query to that file instead. I'd bet that it would be a lot faster.

  37. Hi Ken,

    Thanks for the answer - I was not aware of MS Flow... I'll look into it.
    I have found a workaround that 'sorta' works..
    I have a Power Query file (1st) that checks for last weeks mails and downloads them into a table (date sent, body of email, id).
    And a separate file that connects to the 1st file and invokes the function.
    The 2nd Query works okeish.
    The 1st one too (but only if I refresh the query outside of business hours).

    I'll have a deeper look at the MS Flow

    Thank you again for your answer.

  38. Hi Ken. Any chance we can extract and save attachments (pdf, word, etc.) directly from the exchange query?

  39. Extract, yes. Save... Power Query doesn't save, but you can load the data to a table and save the contents in the workbook.

  40. Ken,
    I have some messages generated by an application that contain some text and tables. When I connect to exchange through Power Query, only the first couple of lines are visible of the body. If I do "text" then I only get one line of text. If I do "HTML" I get the first couple of lines but can't seem to be able to load the whole message to get the info. Are there some additional settings that would help me get the whole message?

  41. Are you certain, Max? I've seen the Text Body expansion look like only the first row, but it's just due to the text wrapping. When you select a cell, the whole thing usually shows in the preview window at the bottom...

  42. Ken,
    I have an email coming in like shown in the snippet below and I would like to parse the text to be able to create a new row in a table in Excel that has matching columns. Is this something I can do with Power Query? I'm running Office 365 Business Premium with Excel 2016. I looked for the Exchange connection that you mentioned but I don't see it. Is it possible that is due to my license or something? I also tried writing a Flow to achieve the same thing but my flow keeps getting hung up (inconsistently...the "Floors or Suite Numbers" section is the stopping point most of the time) on a couple of pieces of text. It will just stop parsing and run all the remaining text together...hence why I was thinking of cutting out the middle man and using Power Query only! Thoughts? Any help is greatly appreciated!

    Project Address
    980 N Michigan
    Chicago, IL

    Bill To If Different
    100 N Main
    Itasca, IL

    Floors or Suite Numbers
    18th Floor

    Project Size and Complexity
    Intermediate...20-50 stations, some ancillary

    Major Systems Line
    Allsteel Further

    Designer Assigned
    Needs Assignment

  43. Hi Bridget,

    Yes, it's a version issue. The "enterprise" connectors don't get included with Office 365 Business Premium. You'd need to move to ProPlus to get Exchange.

  44. Ken, I could really use some help with a perplexing problem I’m having. I’m able to get all the data I need from the emails plus attachment files with one exception. All the columns from the .xls attachments are good except for 3 columns that contain dollar amounts. Those come across as blank. I don’t know why this is happening. I experimented by saving the .xls attachments and using getdata on the saved files in the folder — same result. The only way I could make getdata import the amounts was if I opened each .xls attachment and saved it. I’m not sure why the save action helped, but it’s an inefficient and time-consuming workaround. I am soooo hoping there’s a solution for this. If anyone knows, I figure it’s you :-).

  45. Hi Janice,

    Honestly, I'm not totally sure without seeing it. On the step that originally imports that data, is the data there? I'm wondering if a change type is killing it and blanking it out, or if it happens earlier?

  46. The columns are General format in the email attachment. When I import the attachment contents the date and text columns load fine but the 3 amount columns load as text and are empty/blank. I tried Change Type after importing to Decimal Number, Currency, etc. and the results display “null”.

  47. I thought it so kind of you to respond to my question that I decided I would check to see if you’ve authored any books. Figured I would purchase one as a token of my gratitude. Then I realized I already had a kindle book on the topic and guess what? It’s M is for (Data) Monkey.

  48. Arghhhhh! Just saving the attachments doesn’t help. Same results when I getdata from folder. Only if I open and SAVE the saved attachment will getdata import the number values. On my initial attempts to getdata from Outlook attachments using Combine Files the query editor shows the amount columns imported as type any. Changing type in editor to number did not help. On my most recent attempt, getdata from the twice-saved files in a folder the query editor shows the amount columns as either type number or Int64.Type. It still only “saw” and imported the amounts from the subset of files I had twice-saved. The amount columns in the other files in the same source folder imported as nulls.

  49. So... when you look at the sample transform, on the navigation step, are the values blank already? If so, I'm not sure how you could work around this. If not, then they are getting eating in a transform later. If the latter is the case, then force it to text so you can watch what happens along the way.

  50. Shoot. I'm not sure what to say here Janice. No change you can get the users to send them as xlsx instead of xls? The issue is with the old style of Excel file, and I'm not sure how to work around it short of re-saving as xlsx.

  51. I think the problem I described in my 7/31-8/1 posts might be due to the presence of CHAR(160), the non-breaking-space character. In Excel I use the SUBSTITUTE function to replace Char(160) with “”. I’ve searched high & low, have not been able to identify a similar function in GetData/M query language.

  52. Hi Janice, if you try right clicking the column --> Transform --> Clean, that should give rid of all non-printable characters, including non-breaking spaces.

  53. Hi! I want to know if there's any way to get automatic replies to show in the Transform window? I need to get the email address of senders of a few hundred automatic replies. I can't seem to find it in any of the attributes that come up.

  54. Hi Mae,

    No, rules are not exposed. Only Calendar, Mail, Meeting Requests, People and Tasks.

    I would assume that this won't change, partly because it pulls from Exchange, and a great deal of rules are set up client-side only.

  55. I have a different sort of requirement on the same lines!
    How to get the email attachment which has static subject line?
    For example: Laundry report - Week 1. In this case week number is dynamic... How can I create a dynamic filter for this scenario?

  56. Hello Mr. Ken,

    What if the needed to be extracted data is a table in the body of email, how this will work ?

    I tried it but it shows all table content in one cell with formatting

    how to extract the table with the same structure of the table

  57. This can be tricky, Khaled. It really depends on what shows up. If it's an HTML email, it may contain table tags which you can interpret and use (honestly, it's been a long time since I've tried to use PQ to read from emails). If it's plain text... you're going to have to use Power Query to break it up manually.

  58. Hi Ken,

    Is it possible to get insights about count of emails received from suppliers in last one month on shared mailbox?

    What would be the query

  59. In order to connect to an exchange account (mail, calendar, contacts), you have to sign in to the account via Power Query (you can't access any other mailboxes that you may rights to.) So as long as you can sign in to it with the mailbox's account user name and password, yes.

  60. Hi Ken,
    thank you for your detailed explanation
    I have a question
    what about emails which have HTML table with the same headers,can I combine them using power query?
    I tried but the html table display as a set of words when I expand the column of html text.
    Regards

  61. Hi Osama,

    When you expand the Body of an email, you'll get the choice between the TextBody or HtmlBody fields. I would recommend looking at both to see which is easier to parse. Unfortunately in Excel - you'll need to do that manually, removing tags and splitting by other tags. It's a bit painful, but can be done.

  62. Hi Ken,

    I am trying to get all of the email addresses the email was sent to (to, cc, bcc) in one results cell. On the fields where there were more than one recipient, PQ is creating another line in the query results so each recipient is listed on it's own row. I need all of the addresses in one field, just like the original email. How can I get all of the addresses to be merged together in one cell?

    Thanks,
    Christy

  63. Hi Christy,

    If you just need to combine the emails into a single cell, add a custom column and use a formula like this:
    Text.Combine(
    [CcRecipients][Address],
    ";")

  64. Hi Ken,

    Is it possible to extract the email's flag status like the due date and flag completed date?

    Thank you in advanced.

    Regards,
    Alex

  65. Alex, I don't believe that data is captured, no. The only possibly relevant field I see (in the Attributes) is "ReminderDueBy", which doesn't seem like it is right for what you need. You can see all the available properties by connecting to your mail and expanding the Attributes column.

  66. Hey Ken,
    Great work!
    I want to use HtmlBody instead of TextBody and my HtmlBody contains many tables.
    Could you please guide me how to get Tables from the HtmlBody?

    Thanks,
    Thiru

  67. Okay, so this is a bit trickier, but here is how I would do it, assuming that the tables have consistent columns (number and names):

    1) Filter your emails to your desired records
    2) Remove any excess columns, but make sure you still have the Body column showing - BUT DO NOT EXTRACT THE HtmlBody COLUMN!!! (for some reason, expanding the column contents drops the important bits.)
    3) Instead, Extract the HtmlBody from the Body column via a Custom Column. I created a new custom column called "HtmlBody" using the following formula: =[Body][HtmlBody]
    5) Extract the tables from the new HtmlBody column by adding a new Custom Column. The formula you'll use looks like this:
    Table.PromoteHeaders(
    Html.Table(
    [HtmlBody],
    {
    {"Column1", "td:nth-of-type(1)"},
    {"Column2", "td:nth-of-type(2)"}
    },
    [RowSelector="tr"]
    )
    )

    If you've followed along with me steps, the only thing you need to do is make sure you have a list for each column your tables contain. Just don't forget that you need the complete record up to the nth-of-type(x)

    4) Remove the Body and HtmlBody columns
    5) Expand all records from the column and it should just append the data in all the tables together.

    I hope this helps!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts