Treat Consecutive Delimiters As One

A couple of weeks back, I received a comment on my “Force Power Query to Import as a Text File” blog post.  Rudi mentioned that he’d really like to see a method that would let us Treat Consecutive Delimiters as One; something that is actually pretty easy to find in Excel’s old Import as Text file feature.

…the option for “Treat consecutive delimiters as one” should be made available too. The text import eventually imported but the info was all out of sync due to the tabbed nature of the data – names longer and shorter cause other data to not line up.

So let’s take a look at this issue today.  You can access and download the sample file I’ll be working with from this link.

The issue at hand

The contents of the text file, when viewed in Notepad, are fairly simple:

image

But the challenge shows up when we go to import it into Power Query.  Let’s do that now:

  • Power Query –> From File –> From Text
  • Browse to where you stored the file and double click it to Open it up

The first indication that we have an issue is that we’ve got a single column, the second is that it’s a bit scattered:

image

Well no problem, it’s a Tab delimited file, so we’ll just split it by Tabs, and all should be good, right?

  • Home –> Split Column –> By Delimiter
  • Choose Tab and click OK

image

Uh oh…  What happened here?

The issue is that we’ve got multiple delimiters acting between fields here.  They look like one, but they’re not.  Between Date and Vendor, for example, there are two quotes.  But between the Date values and the Vendor values only one Tab is needed to line it up in the text document.  The result is this crazy mess.

Approaches to the Issue

I can see three potential routes to deal with this problem:

  1. We could replace all instances of 2 Tab’s with a single Tab.  We’d need to do that a few times to ensure that we don’t turn 4 Tabs into 2 and leave it though.
  2. We could write a function to try and handle this.  I’m sure that this can be done, although I decided not to go that route this time.
  3. We could split the columns using the method that I’m going to outline below.

Before we start, let’s kill off the last two steps in the “Applied Steps” section, and get back to the raw import that we had at the very beginning.  (Delete all steps except the “Source” step in the Applied Steps window.)

How to Treat Consecutive Delimiters As One

The method to do this is a three step process:

  1. Split by the Left Most delimiter (only)
  2. Trim the new column
  3. Repeat until all columns are separated

So let’s do it:

  • Select Column1
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

image

Good, but see that blank space before Vendor?  That’s one of those delimiters that we don’t want.  But check what happens when you trim it…

  • Select Column1.2
  • Transform—>Format –> Trim

And look at that… it’s gone!

image

Let’s try this again:

  • Select Column1.2
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

image

And trim the resulting column again:

  • Select Column1.2.2
  • Transform—>Format –> Trim

And we’re good.  Now to just do a final bit of cleanup:

  • Transform –> Use First Row as Headers
  • Name the Query

And we’re finished:

image

Easy as Pie?

In this case, yes it was.  Easy, if a bit painful.  But what about spaces?  If this file was space delimited I couldn’t have done this, as my Vendors all have spaces in them too.  So then what?

I’d modify my procedure just a bit:

  1. Replace all instances of double spaces with the | character
  2. Split by the left most | character
  3. Replace all | characters with spaces
  4. Trim the new column
  5. Repeat until finished

Final Thought

I haven’t received too much in the way of data like this, as most of my systems dump data into properly delimited text or csv files, but I can certainly see where this is an issue for people.  So I totally agree that it would be nice if there were an easier way to treat consecutive delimiters as one in Power Query.

I do like the versatility of the choices we have currently, but adding this as another option that works in combination with the existing ones would be fantastic.

Power Query team: here’s my suggestion…

image

🙂

Share:

Facebook
Twitter
LinkedIn

6 thoughts on “Treat Consecutive Delimiters As One

  1. The problem with Trim is it also removes the excess spaces before and after the text not in between.
    To treat successive white spaces as one you can use

    sCols= Table.SplitColumn(mTrimCol,"mCols",Splitter.SplitTextByWhitespace(),{"C1", "C2","C3"})

    You can then merge this with another Query that has 2 Columns Column Codes like C1, C2, C3 etc and then "Proper" Column Names like Data, Client , Amount etc

  2. Oops...the first line should read
    "The problem with Trim is it ONLY removes the excess spaces before and after the text not in between.."

  3. Hi Sam,

    Yes, that difference between the Excel TRIM function and Power Query's TRIM function is irritating. In this case though, I actually find it doesn't hurt me that much.

    I tried to split using the SplitTextByWhitespace method, and ended up with garbage, as it splits on both the tabs (consecutive), as well as the spaces. So not ideal there. Then I tried:
    -Replace " " with "|"
    -SplitTextByWhitespace()
    -Replace "|" with " "

    That seems to work well. Pity that the SplitTextByWhitespace() method isn't exposed through the UI!

  4. Ken,
    re: #1 of three potential routes
    I've used the following code to cleanup user entered text from Input and Text boxes. Maybe it has some applicabiltiy with Power Query (whatever that it)...
    '---
    'Converts adjacent separators in strText into a single instance
    'and removes extra spaces between characters in strText.
    'Passing strings ByVal allows Variants to be used.
    'Error returns original text with any modifications prior to error.

    Function VerifyText(ByVal strText As String, ByVal strChars As String) As String
    'James Cone - Portland, Oregon USA
    On Error GoTo DontVerify
    Dim strPair As String
    Dim strCopy As String
    Dim lngLength As Long
    Dim WSF As Excel.WorksheetFunction

    Set WSF = Excel.WorksheetFunction
    lngLength = VBA.Len(strChars)

    'InStr would return 1 if strChars length was zero.
    If lngLength < 1 Then Err.Raise 76543
    strCopy = VBA.Trim(strText)
    strPair = strChars & strChars

    'Remove pairs
    Do While VBA.InStr(1, strCopy, strPair, vbBinaryCompare)
    strCopy = WSF.Substitute(strCopy, strPair, strChars)
    Loop
    'Remove from each end
    If Left$(strCopy, lngLength) = strChars Then
    strCopy = Right$(strCopy, Len(strCopy) - lngLength)
    End If
    If Right$(strCopy, lngLength) = strChars Then
    strCopy = Left$(strCopy, Len(strCopy) - lngLength)
    End If

    VerifyText = strCopy
    Set WSF = Nothing
    Exit Function

    DontVerify:
    VerifyText = strText
    End Function

  5. Hi Jim,

    It wouldn't have any applicability to Power Query, per se, (Power Query's language is so completely different to VBA) but if you fell back to a VBA solution without Power Query it could be useful. Thanks for posting!

  6. Ken,

    TX for this followup article. MUCH appreciated!!

    Besides the info you gave to me through or correspondence, this step-by-step will be another useful reference for me until I get this process under the belt. The article is great; well structured and clear...and as always, love the screen clips as I am a visual learner.

    Cheers!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts