My last blog post was interesting in that I got a few emails about it. Both Imke Feldman and Bill Szysz sent me better methods, and a blog commenter asked for a slightly different version. For this post, I’m going to adapt Imke’s technique to show how we can Keep Only Numbers from a string of text (removing all other characters).
Other Posts on the Subject
Each of these posts will be a targeted to a specific scenario with its own idiosyncrasies. Which you need depends on your specific situation, of course:
- My original post to split off measurements leaving only the numbers (this will only work if there are no numbers in the measurement)
- The method in this post (which will remove all numbers – or text – in the input)
- Bill Szysz’s method to split off measurements (coming in a future post, but better than my original as it doesn’t break when measurements also include numbers)
In this Post:
In this post, we are going to keep only numbers in our data set. Actually, we’ll also keep spaces and decimals the first time around, but we could easily modify the function to clear those too. So for our first go, we’ll convert the data in the left column below, to show as displayed in the right column:
Of course, I started by just pulling the data into Power Query via the From Table command.
How to Keep Only Numbers
Looking at this from a logic point of view, what we want to accomplish is to remove any character that is not a number. Ideally, we would like to use a function like this in a custom column in order to do so:
The first parameter should be pretty easy, we could just feed in the [Quantity] column, but how would we provide all the characters to the last parameter?
Here’s the cool part… removeChars is an “any” datatype… that means we’re not restricted to a single character, we can actually provide a list. So all we need to do is find a way to create a list of the characters to remove.
This is where Imke’s email to me was really helpful. She had a step similar to the following in her code:
CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_))
So what does this do? It actually creates a list of non-contiguous numbers (33-45, 47, 58-126), then transforms each value in the list into it’s alphanumeric equivalent. A partial set of the results is shown here:
For reference, character 32 is a space, 46 is a period, and 49-57 are the values from 0 through 9 – facts that you can discover by changing the values inside the lists.
In order to use this, I just popped into the Advanced Editor, and pasted the line above right between the “let” and “Source=…” lines. (Don’t forget to add a comma at the end.) And with a nice list of values contained the the CharsToRemove step, we can now create the custom column from the Source step:
- Add Columns –> Add Custom Column
- Name: Result
- Formula: =Text.Remove([Quantity],CharsToRemove)
And it loads up nicely:
Now, keep in mind here that the purposed of this is to strip all characters except the numbers. In the case of things like m2 and m3 in this data set, we’re left with a the final value, but that is exactly what the query is designed to do.
The final M code for this solution is:
let
CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_)),
Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity],CharsToRemove))
in
#"Added Custom"
Keeping Only Numbers
What if we wanted to also remove any spaces and decimals? Easy enough, just add those values to the original list in the CharsToRemove step as follows:
CharsToRemove = List.Transform({32,46,33..45,47,58..126}, each Character.FromNumber(_))
And the result:
Removing Numbers Only
Now let’s keep the text and remove the numeric characters from 0-9 only. To do this we modify the original list values again:
CharsToRemove = List.Transform({48..57}, each Character.FromNumber(_))
And the result:
End Result
This is pretty neat. Once we recognize which character code represents each character, we can build a list of those characters to remove, and take care of them all in one shot. To put it all together, here is a look at the different views all shown in one table:
You can also download the completed file here.
18 thoughts on “Keep Only Numbers in Power Query”
Really cool!
Thanks
Thank you!
Cool approach to the problem. A tiny comment about this. If we know what we want to remove then we should use a list contains only these characters ( because of efficiency). Shorter list - better performance. So for example, if we want to remove letters only, we can use construction like this below
CharsToRemove = {"a".."z"}&Text.ToList("ąćęłńśóźż")
and then the last step looks like this
#"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove(Text.Lower([Quantity]), CharsToRemove)).
This is more efficient than using similar construction
CharsToRemove = {"a".."z","A".."Z"}&Text.ToList("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")
and
#"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity], CharsToRemove))
But like I said on the beginning, Imke's approach you've shown in this post was very cool 🙂
Thanks 🙂
Old good post, saved me a lot of time. Thanks!
Very Helpful. Thanks
Glad you found it helpful, Stan. Thanks for reading!
Pingback: Removing letters and special characters in Power BI query editor
Hello.
I am very new to this and I have a similar issue but in reverse. What if you had something like this " AttributeType: FILE NAME;AttributeValue: PLAZ_20180301_0100.150;RM: 7 records skipped for PLAZ_20180301" and you needed to removed extract "RM: 7 records skipped for" and place in its own column how would you do that ?
To be honest, I'd just split the column by the right most ; in this case. That should give you the value you're looking for.
Hi Ken,
I'm looking into a similair issue where I want to create 'plain item numbers' so everything from 0-9, a-z and A-Z are valid characters and anything else is not...
So for me it would make sence to specify what I do want over what I don't, let's call it "CharsToKeep" could that be done in a similair fasion?
Sure Melissa, how about this: Text.Select([YourColumn],{"A".."z","0".."9"})
I'll look into that - thanks Ken!
reaching out for help for similar issue. I need to extract 98526988 from cell with text "P01 Acc PO 98526988 CPE - MCKINSTRY CO LLC". 8 series numeric string can occur at any place in the cell across column and does not have fixed start point. Please help what formula should I use
That's a tough one @powerBILearner... what happens when you run it through Columns From Example?
I have an approach - not elegant though.
1. Read the data into Power Query, make a copy of the data column.
2. Split the text words in the string by "space" into rows, not columns
3. Add a column with string length
4. Filter out all items with string length 8
5. Change the remaining items to type Whole Number. Any items containing text will be errors.
6. Filter our errors using [Remove Rows][Remove Errors]
6. The remaining items are your serial numbers
I tested by creating multiple strings with the segments moved around. Added extra spaces all over the place. This worked ok for me.
Minor correction to the code: using the first snippet removes the everything but the numbers.
Thanks for the great article 🙂
Thank you so much for this extremely useful tip. I used a modified version to clean a phone numbers columns like this:
1) Create a "CharsToRemove" custom column using
Text.Remove([Phone Numbers on File],List.Transform({48..57}, each Character.FromNumber(_)))
formula to capture any characters other than numbers 0 to 9.
2) Create another custom column CleanPhone by removing any characters captured in "CharsToRemove" using the M formula
Text.Remove([Phone Numbers on File],Text.ToList([CharsToRemove]))
Pingback: Separate Values and Text in Power Query - Part 2