Years ago I published an article on my site called Five Very Useful Functions For Working With Text. This post is alternative version, but with a twist. I bring you 5 Very Useful Text Formulas - Power Query Edition.
The article is geared to explaining five functions specific to working with Text in Excel, and are a set of the most under-utilized functions in Excel (in my opinion). It even lets you try them out live in the web page without even having to open Excel at all.
Now, over the past few months I’ve been working with Power Query, and one of the things that’s been driving me a little crazy is that the formula names in Power Query and not the same as they are in Excel. I personally think this is a bit of a mistake, and that the formula names in Power Query should have been a bit more congruent with standard Excel formulas (Power Pivot’s DAX functions are similarly named to Excel, so why not Power Query’s M language?)
Some Critical Background
Before you bang your head against the wall, there are two things that are really different between Excel formulas and Power Query formulas:
Case Sensitivity:
While Excel formulas are not case sensitive, Power Query formulas are. If the Power Query formula signature says “Text.Start” then “TEXT.START” or “text.start” will NOT work for you.
Base 1 vs Base 0
Excel formulas are what we refer to as “Base 1”. This means that you count starting at 1. Power Query, on the other hand starts counting at 0, not 1. The implications of this are that it is very easy to write your formula referring to a number that is out by 1. To see the effects of this, check the section below on the alternate for the MID function equivalent.
Excel –> Power Query Formula Equivalents
Let’s take a look at how the five functions I illustrated in that original example differ from Excel to Power Query…
LEFT
To get the left x characters, we basically replace LEFT with Text.Start:
Syntax | Example | Result | |
Excel | =LEFT(text,num_chars) | =LEFT(“Excel”,2) | Ex |
Power Query | =Text.Start(text,num_chars) | =Text.Start(“Excel”,2) | Ex |
Easy enough once you recognize it, although I would have preferred that the formula name was consistent.
RIGHT
To get the right x characters we have a similar situation. The function name needs to change from RIGHT to Text.End:
Syntax | Example | Result | |
Excel | =RIGHT(text,num_chars) | =RIGHT(“Excel”,2) | el |
Power Query | =Text.End(text, num_chars) | =Text.End(“Excel”,2) | el |
Okay, so we’re getting the hang of this now… Just change the function name and the rest work the same, right? Um, no.
MID
This one gets a bit weird. First we replace MID with Text.Range. Okay, no problem there. But look at the results when we pass the same parameters:
Syntax | Example | Result | |
Excel | =MID(text,start,num_chars) | =MID(“Excel”,2,2) | xc |
Power Query | =Text.Range(text, start,num_chars) | =Text.Range(“Excel”,2,2) | ce |
They differ a little, don’t they? The issue comes down to that base 0 vs base 1 thing I mentioned above. Where Excel’s formula language counts the word with E being character 1, Power Query considers that character 0. So in this case, when we tell Power Query to start returning text at character 2, it pulls back c (E is 0, x is 1, c is 2). Interestingly though, the last parameter needs to be 2 to pull back 2 characters.
LEN
Getting the length of a text string in Power Query is actually a bit more intuitive than Excel’s native function, only because the function name isn’t trimmed off. Text.Length is what we need instead of LEN.
Syntax | Example | Result | |
Excel | =LEN(text) | =LEN(“Excel”) | 5 |
Power Query | =Text.Length(text) | =Text.Length(“Excel”) | 5 |
Notice that the result for this does return five characters, as you’d expect. So this plainly works as a 1 based result in both Excel and Power Query.
FIND
And finally we come to the FIND function. This one is again a bit confusing. We’ve got 3 things to consider here:
- The function name changes from FIND to Text.PositionOf
- The parameters for the text we want to find and the text we want to search in get flipped around!
- The result is 0 based, not 1 based
Syntax | Example | Result | |
Excel | =FIND(find_text,within_text) | =FIND(“xc”,“Excel”) | 2 |
Power Query | =Text.PositionOf(text, find_text) | =Text.PositionOf(“Excel”,”xc”) | 1 |
So in the case of FIND, we put the “xc” first, and “Excel” second. But in the Power Query version, it’s completely opposite. And look at that result… in Excel the x is treated as the 2nd character. In Power Query it is too, but because it starts counting at 0 we get a 1.
Some Thoughts
I find that even after using Power Query for a while now, I still have to look up the formula names from the Power Query formula categories page, both to find the formula name, and also the syntax. Part of this is due to the fact that there is no auto-complete/syntax help in the Power Query engine (I’m sure it will come eventually), but part is also that my instinct is to type the Excel function name first. And then, when I do get it right I’m constantly getting tripped up by the base 0 base 1 conversion.
As this is a tool aimed at Excel users, I am a bit disappointed in the formula naming convention. I could get used to pre-pending Text. to all of my functions, but I really wish the rest was similar to Excel. Maybe one day the PQ team can give us duplicate handles into the same back end function so that we can write stuff like this IN ADDITION to what already exists (don’t deprecate, just give us alternate routes):
- =Text.Right(text,characters)
Or how about:
- =Text.Mid(text,start,characters)
That would be good too, especially if the signature could be tweaked to require a one based parameter for the start character. That would be consistent with what the Excel pro would expect due to their experience with the MID function.
And how about this:
- =Text.Find(find_text,within_text)
How much easier would that be to learn if the function not only accepted parameters in the same order as the Excel function, but returned a 1 based result rather than a zero based result (again, similar to the way the current FIND function returns.)
Despite what I’m suggesting here, it does need to be recognized that there is a way to do the job, which is critical. I just think it could be better designed in order to make it easier for the seasoned Excel pro to learn because they’d be able to port their existing knowledge without having to learn a totally new syntax.
49 thoughts on “5 Very Useful Text Formulas – Power Query Edition”
Pingback: String functions in Power Query | MS Excel | Power Pivot | DAX
Pingback: Excel Roundup 20140825 « Contextures Blog
Pingback: Power Query – The IF function | The Ken Puls (Excelguru) Blog
Pingback: Power Query – Multi Condition Logic | The Ken Puls (Excelguru) Blog
Thanks for the update on commands, very useful!
Having issues adding a new column to a query, to return domain address details for email addresses - usually performed using:
OLD:
=MID(CELL,FIND("@",CELL),40) query
using the new format as below returns error
NEW
=Text.Range([Primary email address],Text.PositionOf([Primary email address],"@"),40)
* I use 40 as that is typically sufficient to get to end fo domain address (i.e. the @domain.com address)
AM i missing something?
Hi Greg,
Unfortunately, the Power Query Text.Range function can't be fed a final parameter that is larger than the number of characters, unlike MID. Give this a go, as it determines the number of characters for the final param:
=Text.Range([Primary email address],Text.PositionOf([Primary email address],"@")+1,Text.Length([Primary email address])-Text.PositionOf([Primary email address],"@")-1)
Hi Ken -
Solid post. Thanks for that! I do have a question though. Do you know if it's possible to use the FIND formula to generate a location within a text to split into two columns?
Example -
I have the following information:
Chatsworth, CAFindlay College Prep
Chino Hills, CAChino Hills High School
Katy, TXCypress Lakes High School
I want to split at after the state abbreviation so that I have two columns, state and school. To find the number of characters to move (from the left) to get to that split point, I added a Custom column with following formula: Text.PositionOf([HOMETOWN],",")+4
... and it seems to work fine but now I am stuck. Do you know if I can use that information to split columns at my desired point?
Is there another way to approach this issue?
Many thanks in advance!
Hi Fred,
Why not just use the "Split Column --> by Delimiter" and feed it a customer delimiter of ", "? Then you could split it again using "Split Column --> By Number of Characters" to get the first two. (Just remember to split by the "left most" in both cases.
Ken -
I think that would work for almost all of the instances but how would you suggest I account for the following International examples:
Melbourne, AUSMontverde Academy
New South Wales, AUSBrewster Academy
Vaughan, Ont., CANHuntington Prep
Both of these require more characters with your "Split Column --> By Number of Characters" option than the rows with US locations and the CAN option would require a step to account for the extra comma.
My goal is to find a solution that would work for all available scenarios in my data set.
Thanks again
Hi Ken...and Fred 🙂
Problem is not as simple as we think... but try this code. I assume that there are no more than 10 words in row. If more, you can change the proper lines (2) of code.
let
Source = Excel.CurrentWorkbook(){[Name="tblSource"]}[Content],
AddIndex = Table.AddIndexColumn(Source, "Indeks", 0, 1),
CustomCol = Table.AddColumn(AddIndex, "Custom", each Text.Remove([SourceData],{".",","})),
SplitColBySpace = Table.SplitColumn(CustomCol,"Custom",Splitter.SplitTextByDelimiter(" "),{"1", "2", "3", "4", "5", "6", "7", "8", "9", "10"}),
ChType = Table.TransformColumnTypes(SplitColBySpace,{{"1", type text}, {"2", type text}, {"3", type text}, {"4", type text}, {"5", type text}, {"6", type text}, {"7", type text}, {"8", type text}, {"9", type text}, {"10", type text}}),
RemCol = Table.RemoveColumns(ChType,{"SourceData"}),
UnPivotOthCol = Table.UnpivotOtherColumns(RemCol, {"Indeks"}, "Atrybut", "Warto??"),
DuplCol = Table.DuplicateColumn(UnPivotOthCol, "Warto??", "Copy of Warto??"),
ToUpper = Table.TransformColumns(DuplCol,{{"Copy of Warto??", Text.Upper}}),
LIntersect = Table.AddColumn(ToUpper, "Custom", each List.Intersect({Text.ToList([Warto??]),Text.ToList([Copy of Warto??])})),
LCount = Table.AddColumn(LIntersect, "Custom.1", each List.Count([Custom])),
ColCountry = Table.AddColumn(LCount, "Custom.2", each if [Custom.1]=1 then null else Text.Start([Warto??],[Custom.1]-1)),
RemOtherCol = Table.SelectColumns(ColCountry,{"Custom.2", "Indeks"}),
Country = Table.SelectRows(RemOtherCol, each ([Custom.2] null)),
MerTab = Table.NestedJoin(AddIndex,{"Indeks"},Country,{"Indeks"},"New"),
ExpNew = Table.ExpandTableColumn(MerTab, "New", {"Custom.2"}, {"Country"}),
AddColRepText = Table.AddColumn(ExpNew, "Part1", each Text.Replace([SourceData]," "&[Country],"&&")),
SplitCol = Table.SplitColumn(AddColRepText,"Part1",Splitter.SplitTextByDelimiter(",&&"),{"City", "School"}),
RemColumns = Table.SelectColumns(SplitCol,{"Country", "City", "School"})
in
RemColumns
Regards 🙂
oops... sorry... you need to replace every "Warto??" to "Value"
Fantastic, thanks for posting this Bill!
Thank you very much. Very helpful!
I just happend to see a copy-paste error in the RIGHT examples above:
Power Query =Text.End(text, num_chars) = Text.***Start***(“Excel”,2) el
obviously it should be
Power Query =Text.End(text, num_chars) =Text.***End***(“Excel”,2) el
Thanks again!
Thanks Anders. All fixed. 🙂
Thanks a lot! It is very helpful
I'm stuck with splitting one column into two. The column contains values such as "2.8kg", "2L", "600 g", "10.35 g". I need to extract the numbers incl decimals into one column and the letter (or text) into another. I can do it in Excel through a macro and a nested formula but would like to do this in Power Query.
@Shyla, I'm looking for something simlar. I have cells containing a lot of text. I need to extract a string that always starts with PRB and the following 6 digits. If that string is not found in the cells a null value applies.
Meant to link to the article posted on this subject: https://excelguru.ca/2015/11/19/keep-only-numbers-in-power-query/
Hi Wubbe,
Are you saying that you want a specific 6 digits, or any 6 digits? Maybe have a look at the article I linked to for Syla's solution. (Just posted the link as I missed cross referencing it before.)
I'm stuck with trimming last two characters from a column. Tried using Text.TrimEnd function but it gives errors. Please assist.
To use Text.TrimEnd, you need to provide the list of characters you want to trim. That could get rather complicated. My suggestion would be to pull the left x characters where x is the length of the text string minus 2:
=Text.Start([Column1],Text.Length([Column1])-2)
You'll need to update Column1 to your value, of course.
Pingback: OMG a magic formula – real facepalm moment – BI Tools enabling CSI
Text.Length(“Excel”) = 5 if counting is starting from 0 in PQ then it should be giving 4 only and not 5.
Note same thing for Text.Start(“Excel”,2) = Ex so again if counting start from 0 in PQ then this should give Exc and not Ex.
Even if I used Extract from ribbon it gives me identical results as normal Excel functions.
I believe problem is with only Text.Range function. Could you please clarify if I am missing anything here?
Hi Abhay,
No, it's definitely not just the Text.Range function. Text.PositionOf is another culprit, and there are more as well.
Hello,
Is there a way to feed regular Excel formula in an added column? I.e. I need to do some Vlookups in other sheets after I clean up data with Power query.
Hey there,
No, unfortunately you can't build an Excel formula inside Power Query then push it to Excel. It will only land inside the table as text.
You can, however, land the data in the table, then add the VLOOKUP in a new column. If you are going to do this, however, you should absolutely read this post: https://excelguru.ca/2016/08/23/fix-excel-formulas-dont-update-in-power-query-tables/
Thanks Ken for the tip!
Hi Ken, apologies if I am posting this improperly. May be hard to believe, but this is the first time I have posted publically in 25 years. At any rate... I have read this discussion on adapting from the Excel LEFT and FIND syntax to their DAX equivalents. I am somewhat confused on how I would combine these together. As an example below, I have an ip address in a worksheet cell A1 (10.80.0.50) for which I wish to extract the first 3 octets which will return a value of 10.80.0. I can do this in Excel using the following syntax =LEFT(A1,FIND("#",SUBSTITUTE(A1,".","#",3))-1). I cannot see how I can achieve same in DAX. Any and all help appreciated.
Hey, no worries, and welcome to Excelguru!
So this post is about M, not DAX. (DAX is the Power Pivot language, M is the Power Query language.)
I'll be honest (for either DAX or M), the better place to post is in our help forum at http://www.excelguru.ca/forums You'll find the people there very helpful and kind. One thing the experts will want to know though, is if you are intending this to be a Power Pivot formula or a Power Query formula. 🙂
Hi
can you help me to find if a long text contains a smaller text and not contain another small text i Power Query?
example:
if my long text is "ABCDEFGHIGKLMN" and i want to test if this text contanins "ABC" and not "ZXY"?
I have some text and i would like find whether it contains a full word.
e.g. I have attested my test results.
Now in above text i want to find whether it contained "test". Result should not consider "attested" while checking for "test". It seems that filter > contains or other M functions are not capable of doing this.
Any suggestions
Actually, I think this is a bit easier than you're thinking. You're not trying to find "test", you're trying to find " test " as a unique word with spaces around it, or possible " test." So if you adjust your pattern, you should be able to pick out what you need.
@ Ibrahim, easiest way is to write up two tests: one to see if it includes ABC and another to see if it includes ZXY, then add a conditional column to test what happens in each combination. You could absolutely roll that into one statement, but it might be easier to set it up individually first.
Great post Ken, comforting to know those handy Excel text functions translate.
Ken this was just the crosswalk I needed for my query. I love these in Excel, even more now in M! Thanks!!!
Ken, after using this new skill in M, I ran on an error when trying to parse out a customer number within ().
The cell is: CUSTOMER ABD (1234567)
When I use the text position of ( and ), I get an error:
Expression.Error: The 'count' argument is out of range.
However if I replace the second search for ")" with 7, it works fine.
Here is my code:
= Table.AddColumn(#"Removed Other Columns", "Cust#",
each Text.Range([customer],
Text.Position.Of([customer],"("),
Text.Position.Of([customer],")")
))
I appreciate you taking a look.
Mark
Hi Mark,
So the challenge you have is that you are returning the count of the ) character (21), where you need the difference between the ( and ) characters (7). So you need to subtract the position of the ( from the ) for the final value. In addition, then you'll have to shift the opening and closing counts by one in either direction due to Power Query's base 0 nature, returning this:
=Text.Range(
[customer],
Text.PositionOf([customer],"(") + 1,
Text.PositionOf([customer],")")
- Text.PositionOf([customer],"(") -1
)
Personally though, I'd use this one instead. 😉
=Text.BetweenDelimiters([customer], "(", ")"),
Totally makes sense! Thanks Ken! And yes I'll also now use Text.BetweenDelimiters, thanks again! Mark
Thanks so much for sharing this information! I'm an amateur developer just getting into some of the new methods Microsoft is moving to. Why could they not just throw out a pop up that says "Hey, this language is case sensitive."? Seems like that would've been pretty basic and easy to do. Thanks again!
Ugh. Error if the thing you're trying to extract digits from is a number, and not text.
Meaning you have to wrap the field in a Text.From as well e.g:
= Text.End(Text.From([Year]),2))
Why did they have to make this so hard, Ken? What was wrong with RIGHT and LEFT? Why don't they put some type coercion under the hood? I'm very grumpy about this, because it means that when you're learning PowerBI, you've got three things to learn:
1. What's not finished in the UI (such as multi column sorting)
2. Completely different language "a" with no intellisense (PowerQuery)
3. Easy to play but hard to master language "b" (DAX).
While PBI is a great tool, could they have made it harder to master if they had tried?
Hi Ken,
I just bought your book, M is for (DATA) Monkey and I'm going through it nicely. However, I have a project that needs some attention and I cannot seem to mimic my Excel formula that deals with MID and MOD.
I am working on creating a lookup table that has UPCs as unique values. I have to transform the UPCs on my lookup table so I can tie my data tables together. Right now I have to connect and export as table without connecting to the model, add a couple of columns to get the check digit (the formula I need in M), and then I concatenate and load that table to the model.
Here is the code I want to execute in M that calcualtes the UPC check digit.
=MOD(10-MOD((MID([UPC],12,1)+MID([UPC],10,1)+MID([UPC],8,1)+MID([UPC],6,1)+MID([UPC],4,1)+MID([UPC],2,1))*3+MID([UPC],11,1)+MID([UPC],9,1)+MID([UPC],7,1)+MID([UPC],5,1)+MID([UPC],3,1)+MID([UPC],1,1),10),10)
Once I can get that to work I will combine columns for a full 12 digit UPC.
Thank you
Hi Steve, I believe you need the formulas Number.Mod() and Number.From(Text.Middle()). But to be honest, you'll probably find it a lot easier if you can just split the column by 1 character (creating 10 columns), and change the data type to numbers. At that point you could add a custom column with Number.Mod() and build your calculation without having to worry about turning each value position into a number along the way.
Pingback: OMG a magic formula – real facepalm moment | Business Intelligence for Continual Service Improvement
Pingback: The IF Function In Power QueryThe Excelguru Blog
Pingback: ETL in Power Query - Adatis
Pingback: Multi Condition Logic In Power Query
Pingback: Separate Values and Text in Power Query - Part 2
Pingback: Power Query's Round Function
Pingback: Creating a Table Naming Conflict in Power Query