Adding null to values returns null

Today a user brought up something in the forums I've seen before; adding null to values returns null.  Simply put, they wanted to add the values in two columns together, but didn't get the results they expected.

The issue: Adding null to values returns null

Have a look at the following data:

image

Never mind that it's pivoted, pretend that you want to use Power Query or Power BI Desktop to sum across the columns and put the total in each row. The problem, of course, is that we can't sum across columns, as Power Query doesn't do that.  So our first temptation is to reach to a custom column and use the formula:

=[ProductA] + [ProductB] + [ProductC]

But notice how on the second and third rows we get null instead of the totals we'd expect:

image

The issue is obviously the null, which totally screws up the math.  (This still surprises me, as Excel ignores null values, which is what I believe should happen here.)

My original suggestion (was poor)

My original thought (and recommendation) was to select the columns and go to Transform --> Replace Values and replace null with 0.  It seems to work in this case.

Having said that, there's a potential issue here.  Null and zero are not the same things at all.  A null is a effectively an empty data point, where 0 is a value. In a Power Pivot or Power BI model that could lead to reporting zero dollars of sales for a day versus reporting an empty set.  Are those different?  To me they are… empty means we weren't even open, where 0 means that we were and sold nothing.  Maybe it's subtle, but it's a real difference.

Another concern comes up when we are dealing with columns that have real values in them.  Simply put:

  • Average(null,null,null,1,2,3,4,5) = 3
  • Average(0,0,0,1,2,3,4,5) = 1.875

So while my answer worked mathematically for a SUM, it was in truth not a good one.  And, in fact, the poster replied as said that he couldn't replace nulls with zero, as this compromised his data.

The evolution of solution

Bill Sysyz dropped in and posted a bit of M code that would sum all records on the row, which look like this:

= Table.AddColumn(YourPreviousStep, "Sum", each List.Sum(Record.ToList(_)))

I think this is really cool, but it also had an issue in that it summed ALL records in the row.  But there are also columns that include text too.  Now we can adjust this to only pick up certain columns, but it involves writing some code that provides each column name in the List.Sum() function.

That kicked off a discussion between Bill and I about complexity, which ended up with my stumbling into a solution that worked.  And it's simple…

Adding null to values and returning values

So here's all we need to do:

  • Select the columns you want to sum (hold down CTRL to select non-contiguous columns)
  • Go to Add Column --> Standard --> Add

I did exactly that for the ProductA, ProductB and ProductC columns from the data above, and here's the results (compared with the custom column method):

image

And just for reference, here is the formula that was generated:

= Table.AddColumn(#"Added Custom", "Sum", each List.Sum({[ProductA], [ProductB], [ProductC]}), Int64.Type)

So pretty much what Bill provided with specific hard coded columns, and 100% user interface driven, just the way it should be!

Final thoughts

The amusing part to me about this is that I have no idea how long this has worked.  I stumbled on this solution during as I was typing "you should be able to just do this… " and it worked!  I've been working with this tool so long that I sometimes miss that some of the old gaps got filled in.  At any rate, it works, it's awesome, and hopefully it helps someone.  Smile

Share:

Facebook
Twitter
LinkedIn

17 thoughts on “Adding null to values returns null

  1. I was actually thinking about that as soon as I read your introduction to the problem. I know it's been around for quite a while, as I've used it for all kinds of things, and it works wonders when you need to show your work in a pivot table drilldown when a measure won't display. Very handy as well after doing some unpivoting to get a total column.

  2. For sure Matthew. I've always reached straight to a custom column due to habit, but I'm sure it's been in place for a while now.

  3. This is cool. Snipped.

    So what is the opposite of this - i.e. sum all columns on this row *except* for NamedColumnA, NamedColumnB, etc?

  4. Great question, Ed. You're thinking to automatically sum in any new columns added to the data set?

    I'd probably do it like this (assuming that "Source" holds the full table and "Inserted Sum" which sums the columns)
    -Select the Source step and create a new step by clicking the fx button on the formula bar
    -Remove the columns I wanted to NOT sum and rename that step as something like "SumColumns"
    -Edit the previously created Inserted Sum step formula to point to Source instead of SumColumns
    -Update the Inserted Sum step to use Table.ColumnNames(SumColumns) instead of the hard coded column names:

    = Table.AddColumn(#"Added Custom", "Sum", each List.Sum({[ProductA], [ProductB], [ProductC]}), Int64.Type)
    becomes
    = Table.AddColumn(#"Added Custom", "Sum", each List.Sum(Table.ColumnNames(SumColumns)), Int64.Type)

    Haven't tested it, but I believe that should work.

  5. I have encountered a separate issue with nulls when trying to create a numeric column. When using Access as a data source a field with nulls cannot be used to create a numeric column. A binary will be returned instead of numeric values and the table will not load correctly.
    Table.AddColumn(_Sales, "Custom", each if [Customer] = "John" then 1 else if [Customer] = "Bob" then 2 else null)

    If I return text rather than number it works fine.
    Table.AddColumn(_Sales, "Custom", each if [Customer] = "John" then "1" else if [Customer] = "Bob" then "2" else "")

    Also, if I choose a different data source, such as Excel, then adding a numeric column does work with null values.

    Here is my sample data:

    ID Sales Customer
    1 25 John
    2 32 Bob
    3 null null
    4 58 John
    5 0 Bob

  6. That's odd, and I can't explain it. What I would probably try to do is run your first using 0 instead of null, then try replacing all 0's in the column with null afterwards.

    Very odd though... the data source shouldn't have any influence over how your formula is evaluated...

  7. Thanks Ken. Actually there is no need to replace 0s with nulls. The text version of the formula returns an empty string for the nulls which converts to null when the data type is changed to integer. I just find it weird that this extra step is necessary when using an Access data source as opposed to Excel (have not tested with SQL yet).

  8. Returning to Bill's solution:
    Table.AddColumn(prevstep, "SumAll", each List.Sum(Record.ToList(_))) is ok when you have dynamic number of data columns but requires number type in each. But what to do when you have some other text format columns - for example "Product Name"? We might "limit" record fields:
    step = Table.AddColumn(prevstep, "SumAll", each List.Sum(List.RemoveItems(Record.ToList(_), {[Produnt Name]}))). Funny thing is that you can add next column (average) and add new exception (you don't want to evaluate average with SumAll in it!)
    nextstep = Table.AddColumn(prevstep, "AverageAll", each List.Average(List.RemoveItems(Record.ToList(_), {[Produnt Name], [SumAll]})))

    and so on... each aggregation requires adding new item to remove.
    Bill's idea is great!

  9. Bill's comments are always great. My only point on it was that it would be nice to be able to do this through the UI without having to write the code is all. 🙂

  10. Strangely if just add Product A and Product B column using Add Column > Standard > Add it was not working. But when i selected any "three" columns it was working. Not sure if it is bug or default feature.

  11. Ha, well look at that... you're right Abhay

    The reason is that when you only select two columns, the formula that PQ generates is actually [Column1] + [Column2], which takes us back to the original issue of 1 + null = null. When you select 3 or more columns PQ changes to use List.Sum({[Column1],[Column2],[Column3]}) formula.

    What that means is that you can work around the issue by creating a custom column using =List.Sum({[Column1],[Column2]}) and it should work. Go figure that the behaviour would be different!

  12. I just pinged MS on this, and the reason that they try to use the + syntax first is that List.Sum() won't fold back to the server in many cases.

    So for basic cases they erred on the side of performance. I'll leave it up to you to decide if that was the right choice or not, but it tells me something pretty important... if you can aggregate 4 columns using the + indicator, you'll get a query that can fold back to the server. If you use the default of List.Sum(), you won't, which could have big performance impacts. But then you need a way to deal with errors. Fun stuff!

  13. Thank you for this article as it resolves the common issue that most of the Power BI developers encounters. The catch here is that at least 3 columns should be involved for the auto formation of proper query with List.Sum(....). Thank you again!!

  14. I intend to do subtraction (CY - LY) with two columns and having the similar problem? selecting columns and subtraction is not working?

  15. Hi Muhammad, for help with your question, we ask that you post on the Excelguru Help Forums - the forum specifically for Power Query can be found here: https://www.excelguru.ca/forums/forumdisplay.php?19-Power-Query-(Get-amp-Transform)-amp-M-code. Please note that you must first set up a free user account in order to post on the forum. Information about setting up an account and using the forums can be found in our Forum Guide here: https://www.excelguru.ca/content.php?299-Forum-Guide. Hope this helps!

  16. doesn't adding NULL to a value return NULL in most if not all databases. It is NULL, once something is undefined, adding a value to it does not suddenly define it.

  17. Hi Rusty,

    So I think this is where we run into some issues between audience expectations. From a technical database point of view, yes, you're correct. The problem I see here is that this tool was not build for database people, it was built for end users/analysts/business people. To us null = empty/nothing. When I take 1 + nothing, I expect 1. You can argue that this is not technically correct - and you may be right, but it confuses the heck out of end users. (This is the exact reason that the UNIQUE() function in Excel returns the count of DISTINCT values, not UNIQUE values. To most laypeople, they don't understand that UNIQUE means "only appears once".)
    It's a hard line to draw, to be fair, but it is a misalignment between what is technically correct vs understood by the majority of users. At the end of the day PQ decided to go with what is technically correct.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts