A couple of weeks back, I received a comment on my Force Power Query to Import as a Text File blog post. Rudi mentioned that he’d really like to see a method that would let us Treat Consecutive Delimiters as One; something that is actually pretty easy to find in Excel’s old Import as Text file feature.
…the option for “Treat consecutive delimiters as one” should be made available too. The text import eventually imported but the info was all out of sync due to the tabbed nature of the data – names longer and shorter cause other data to not line up.
So let’s take a look at this issue today. You can access and download the sample file I’ll be working with from this link.
The issue at hand
The contents of the text file, when viewed in Notepad, are fairly simple:
But the challenge shows up when we go to import it into Power Query. Let’s do that now:
- Power Query –> From File –> From Text
- Browse to where you stored the file and double click it to Open it up
The first indication that we have an issue is that we’ve got a single column, the second is that it’s a bit scattered:
Well no problem, it’s a Tab delimited file, so we’ll just split it by Tabs, and all should be good, right?
- Home –> Split Column –> By Delimiter
- Choose Tab and click OK
Uh oh… What happened here?
The issue is that we’ve got multiple delimiters acting between fields here. They look like one, but they’re not. Between Date and Vendor, for example, there are two quotes. But between the Date values and the Vendor values only one Tab is needed to line it up in the text document. The result is this crazy mess.
Approaches to the Issue
I can see three potential routes to deal with this problem:
- We could replace all instances of 2 Tab’s with a single Tab. We’d need to do that a few times to ensure that we don’t turn 4 Tabs into 2 and leave it though.
- We could write a function to try and handle this. I’m sure that this can be done, although I decided not to go that route this time.
- We could split the columns using the method that I’m going to outline below.
Before we start, let’s kill off the last two steps in the “Applied Steps” section, and get back to the raw import that we had at the very beginning. (Delete all steps except the “Source” step in the Applied Steps window.)
How to Treat Consecutive Delimiters As One
The method to do this is a three step process:
- Split by the Left Most delimiter (only)
- Trim the new column
- Repeat until all columns are separated
So let’s do it:
- Select Column1
- Split Column –> By Delimiter –> Tab –> At the left-most delimiter
Good, but see that blank space before Vendor? That’s one of those delimiters that we don’t want. But check what happens when you trim it…
- Select Column1.2
- Transform—>Format –> Trim
And look at that… it’s gone!
Let’s try this again:
- Select Column1.2
- Split Column –> By Delimiter –> Tab –> At the left-most delimiter
And trim the resulting column again:
- Select Column1.2.2
- Transform—>Format –> Trim
And we’re good. Now to just do a final bit of cleanup:
- Transform –> Use First Row as Headers
- Name the Query
And we’re finished:
Easy as Pie?
In this case, yes it was. Easy, if a bit painful. But what about spaces? If this file was space delimited I couldn’t have done this, as my Vendors all have spaces in them too. So then what?
I’d modify my procedure just a bit:
- Replace all instances of double spaces with the | character
- Split by the left most | character
- Replace all | characters with spaces
- Trim the new column
- Repeat until finished
Final Thought
I haven’t received too much in the way of data like this, as most of my systems dump data into properly delimited text or csv files, but I can certainly see where this is an issue for people. So I totally agree that it would be nice if there were an easier way to treat consecutive delimiters as one in Power Query.
I do like the versatility of the choices we have currently, but adding this as another option that works in combination with the existing ones would be fantastic.
Power Query team: here’s my suggestion…
🙂
6 thoughts on “Treat Consecutive Delimiters As One”
The problem with Trim is it also removes the excess spaces before and after the text not in between.
To treat successive white spaces as one you can use
sCols= Table.SplitColumn(mTrimCol,"mCols",Splitter.SplitTextByWhitespace(),{"C1", "C2","C3"})
You can then merge this with another Query that has 2 Columns Column Codes like C1, C2, C3 etc and then "Proper" Column Names like Data, Client , Amount etc
Oops...the first line should read
"The problem with Trim is it ONLY removes the excess spaces before and after the text not in between.."
Hi Sam,
Yes, that difference between the Excel TRIM function and Power Query's TRIM function is irritating. In this case though, I actually find it doesn't hurt me that much.
I tried to split using the SplitTextByWhitespace method, and ended up with garbage, as it splits on both the tabs (consecutive), as well as the spaces. So not ideal there. Then I tried:
-Replace " " with "|"
-SplitTextByWhitespace()
-Replace "|" with " "
That seems to work well. Pity that the SplitTextByWhitespace() method isn't exposed through the UI!
Ken,
re: #1 of three potential routes
I've used the following code to cleanup user entered text from Input and Text boxes. Maybe it has some applicabiltiy with Power Query (whatever that it)...
'---
'Converts adjacent separators in strText into a single instance
'and removes extra spaces between characters in strText.
'Passing strings ByVal allows Variants to be used.
'Error returns original text with any modifications prior to error.
Function VerifyText(ByVal strText As String, ByVal strChars As String) As String
'James Cone - Portland, Oregon USA
On Error GoTo DontVerify
Dim strPair As String
Dim strCopy As String
Dim lngLength As Long
Dim WSF As Excel.WorksheetFunction
Set WSF = Excel.WorksheetFunction
lngLength = VBA.Len(strChars)
'InStr would return 1 if strChars length was zero.
If lngLength < 1 Then Err.Raise 76543
strCopy = VBA.Trim(strText)
strPair = strChars & strChars
'Remove pairs
Do While VBA.InStr(1, strCopy, strPair, vbBinaryCompare)
strCopy = WSF.Substitute(strCopy, strPair, strChars)
Loop
'Remove from each end
If Left$(strCopy, lngLength) = strChars Then
strCopy = Right$(strCopy, Len(strCopy) - lngLength)
End If
If Right$(strCopy, lngLength) = strChars Then
strCopy = Left$(strCopy, Len(strCopy) - lngLength)
End If
VerifyText = strCopy
Set WSF = Nothing
Exit Function
DontVerify:
VerifyText = strText
End Function
Hi Jim,
It wouldn't have any applicability to Power Query, per se, (Power Query's language is so completely different to VBA) but if you fell back to a VBA solution without Power Query it could be useful. Thanks for posting!
Ken,
TX for this followup article. MUCH appreciated!!
Besides the info you gave to me through or correspondence, this step-by-step will be another useful reference for me until I get this process under the belt. The article is great; well structured and clear...and as always, love the screen clips as I am a visual learner.
Cheers!