## Coalesce values from 2 columns into a single column in a pandas dataframe

I'm looking for a method that behaves similarly to COALESCE in T-SQL. I have two columns (column A and column B) that are sparsely populated in a pandas DataFrame. I'd like to create a new column using the following rules:

- If the value in column A *is not null*, use that value for the new column C
- If the value in column A *is null*, use the value in column B for the new column C

Like I mentioned, this can be accomplished in MS SQL Server via the coalesce function. I haven't found a good pythonic method for this; does one exist?

Use `combine_first()`:

```python
In [16]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))

In [17]: df.loc[::2, 'a'] = np.nan

In [18]: df
Out[18]:
     a  b
0  NaN  0
1  5.0  5
2  NaN  8
3  2.0  8
4  NaN  3
5  9.0  4
6  NaN  7
7  2.0  0
8  NaN  6
9  2.0  5

In [19]: df['c'] = df.a.combine_first(df.b)

In [20]: df
Out[20]:
     a  b    c
0  NaN  0  0.0
1  5.0  5  5.0
2  NaN  8  8.0
3  2.0  8  2.0
4  NaN  3  3.0
5  9.0  4  9.0
6  NaN  7  7.0
7  2.0  0  2.0
8  NaN  6  6.0
9  2.0  5  2.0
```


Try this as well; it's easier to remember:

```python
df['c'] = np.where(df["a"].isnull(), df["b"], df["a"])
```

This is slightly faster than `combine_first` (note that appending `== True` to `isnull()` is redundant, since `isnull()` already returns a boolean mask):

```python
%timeit df['d'] = df.a.combine_first(df.b)
1000 loops, best of 3: 472 µs per loop

%timeit df['c'] = np.where(df["a"].isnull(), df["b"], df["a"])
1000 loops, best of 3: 291 µs per loop
```

Use `axis=1` if you want to fill NaN values with data from an adjacent column. How does pandas `ffill` work? `ffill` is a forward-fill method used with `fillna`: if a cell is NaN, `ffill` replaces it with the most recent non-null value from the preceding row or column, depending on whether you choose `axis=0` or `axis=1`.
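As a small sketch of this idea (the column names `a`/`b` are just the example's): ordering the columns so the fallback comes first lets `ffill(axis=1)` implement a coalesce.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 5.0, np.nan], 'b': [0.0, 5.0, 8.0]})

# ffill(axis=1) propagates values left-to-right across columns, so a NaN
# is replaced by the value in the column before it. Ordering the columns
# as ['b', 'a'] means a NaN in 'a' is filled from 'b' -- i.e. coalesce(a, b).
df['c'] = df[['b', 'a']].ffill(axis=1)['a']
print(df['c'].tolist())  # [0.0, 5.0, 8.0]
```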

`combine_first` is the most straightforward option, but it is not the only one. Below I outline a few more solutions, some applicable to different cases.

##### Case #1: Non-mutually Exclusive NaNs

Not all rows have NaNs, and these `NaN`s are *not* mutually exclusive between columns.

```python
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, np.nan, 5.0, 7.0, np.nan],
    'b': [5.0, 3.0, np.nan, 4.0, np.nan, 6.0, 7.0]})

df
     a    b
0  1.0  5.0
1  2.0  3.0
2  3.0  NaN
3  NaN  4.0
4  5.0  NaN
5  7.0  6.0
6  NaN  7.0
```

Let's combine first on `a`.

```python
df['a'].mask(pd.isnull, df['b'])
# Equivalent: df['a'].mask(df['a'].isnull(), df['b'])

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    7.0
6    7.0
Name: a, dtype: float64
```

```python
df['a'].where(pd.notnull, df['b'])

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    7.0
6    7.0
Name: a, dtype: float64
```

You can use similar syntax with `np.where`.
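For instance, a sketch of the `np.where` equivalent on the same data (`np.where(condition, x, y)` picks `x` where the condition holds, else `y`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, np.nan, 5.0, 7.0, np.nan],
    'b': [5.0, 3.0, np.nan, 4.0, np.nan, 6.0, 7.0]})

# Take 'a' where it is non-null, otherwise fall back to 'b'.
out = pd.Series(np.where(df['a'].notnull(), df['a'], df['b']), index=df.index)
print(out.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0]
```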

Alternatively, to combine first on `b`, switch the conditions around.
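Switched around, that looks like the following sketch (same example data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, np.nan, 5.0, 7.0, np.nan],
    'b': [5.0, 3.0, np.nan, 4.0, np.nan, 6.0, 7.0]})

# Prefer 'b' and fall back to 'a' where 'b' is NaN.
out = df['b'].mask(df['b'].isnull(), df['a'])
print(out.tolist())  # [5.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```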

##### Case #2: Mutually Exclusive Positioned NaNs

All rows have `NaN`s which are mutually exclusive between columns.

```python
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, np.nan, 5.0, np.nan, np.nan],
    'b': [np.nan, np.nan, np.nan, 4.0, np.nan, 6.0, 7.0]})

df
     a    b
0  1.0  NaN
1  2.0  NaN
2  3.0  NaN
3  NaN  4.0
4  5.0  NaN
5  NaN  6.0
6  NaN  7.0
```

`Series.update` works in-place, modifying the original DataFrame, which makes it an efficient option for this use case.

```python
df['b'].update(df['a'])
# Or, to update "a" in-place:
# df['a'].update(df['b'])

df
     a    b
0  1.0  1.0
1  2.0  2.0
2  3.0  3.0
3  NaN  4.0
4  5.0  5.0
5  NaN  6.0
6  NaN  7.0
```

**`Series.add`** with `fill_value` (this only behaves like a coalesce here because the NaNs are mutually exclusive; rows with values in both columns would be summed)

```python
df['a'].add(df['b'], fill_value=0)

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
dtype: float64
```

**`DataFrame.fillna`** + `DataFrame.sum`

```python
df.fillna(0).sum(1)

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
dtype: float64
```

The pandas equivalent to COALESCE is the method `fillna()`:

```python
result = column_a.fillna(column_b)
```

The result is a column where each value is taken from `column_a` if that column provides a non-null value, otherwise the value is taken from `column_b`. Coalescing more than two columns is just a chain of calls:

```python
df['first'].fillna(df['second']).fillna(df['third'])
```
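A minimal, self-contained sketch of the `fillna` approach (the `a`/`b` column names are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [9.0, 2.0, np.nan]})

# Series.fillna accepts another Series: values are taken from 'a' where
# present, otherwise from 'b' -- the same row-wise behaviour as COALESCE.
df['c'] = df['a'].fillna(df['b'])
print(df['c'].tolist())  # [1.0, 2.0, 3.0]
```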

##### Coalesce for multiple columns with `DataFrame.bfill`

*December 2019 answer*

All these methods work for two columns and are fine with maybe three columns, but they all require method chaining if you have `n` columns when `n > 2`:

**example dataframe**:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 2, 4, 5, np.nan],
                   'col2': [np.nan, 5, 1, 0, np.nan],
                   'col3': [2, np.nan, 9, 1, np.nan],
                   'col4': [np.nan, 10, 11, 4, 8]})
print(df)

   col1  col2  col3  col4
0   NaN   NaN   2.0   NaN
1   2.0   5.0   NaN  10.0
2   4.0   1.0   9.0  11.0
3   5.0   0.0   1.0   4.0
4   NaN   NaN   NaN   8.0
```

Using `DataFrame.bfill` over the column axis (`axis=1`) we can get the values in a generalized way, even for a large number `n` of columns.

Plus, this also works for string-type columns!

```python
df['coalesce'] = df.bfill(axis=1).iloc[:, 0]

   col1  col2  col3  col4  coalesce
0   NaN   NaN   2.0   NaN       2.0
1   2.0   5.0   NaN  10.0       2.0
2   4.0   1.0   9.0  11.0       4.0
3   5.0   0.0   1.0   4.0       5.0
4   NaN   NaN   NaN   8.0       8.0
```
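To illustrate the string-column claim, here is a small sketch with made-up object columns (the data is invented for demonstration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 'x', np.nan],
                   'col2': ['y', np.nan, np.nan],
                   'col3': ['z', 'w', 'v']})

# bfill(axis=1) pulls each row's first non-null value into the leftmost
# column regardless of dtype, so it also coalesces object/string columns.
df['coalesce'] = df.bfill(axis=1).iloc[:, 0]
print(df['coalesce'].tolist())  # ['y', 'x', 'v']
```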

Using `Series.combine_first` (the accepted answer) can get quite cumbersome and eventually becomes unworkable as the number of columns grows:

```python
df['coalesce'] = (df['col1'].combine_first(df['col2'])
                            .combine_first(df['col3'])
                            .combine_first(df['col4']))

   col1  col2  col3  col4  coalesce
0   NaN   NaN   2.0   NaN       2.0
1   2.0   5.0   NaN  10.0       2.0
2   4.0   1.0   9.0  11.0       4.0
3   5.0   0.0   1.0   4.0       5.0
4   NaN   NaN   NaN   8.0       8.0
```


I encountered this problem too, but wanted to coalesce multiple columns, picking the first non-null value from several columns. I found the following helpful:

##### Build dummy data

```python
import pandas as pd

df = pd.DataFrame({'a1': [None, 2, 3, None],
                   'a2': [2, None, 4, None],
                   'a3': [4, 5, None, None],
                   'a4': [None, None, None, None],
                   'b1': [9, 9, 9, 999]})
df

    a1   a2   a3    a4   b1
0  NaN  2.0  4.0  None    9
1  2.0  NaN  5.0  None    9
2  3.0  4.0  NaN  None    9
3  NaN  NaN  NaN  None  999
```

##### Coalesce a1, a2, a3 into a new column A

```python
def get_first_non_null(dfrow, columns_to_search):
    for c in columns_to_search:
        if pd.notnull(dfrow[c]):
            return dfrow[c]
    return None

# sample usage:
cols_to_search = ['a1', 'a2', 'a3']
df['A'] = df.apply(lambda x: get_first_non_null(x, cols_to_search), axis=1)
print(df)

    a1   a2   a3    a4   b1    A
0  NaN  2.0  4.0  None    9  2.0
1  2.0  NaN  5.0  None    9  2.0
2  3.0  4.0  NaN  None    9  3.0
3  NaN  NaN  NaN  None  999  NaN
```
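A possible vectorized alternative to the row-wise `apply`, sketched here under the assumption that `bfill` on the column subset matches the helper's behavior:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': [None, 2, 3, None],
                   'a2': [2, None, 4, None],
                   'a3': [4, 5, None, None],
                   'a4': [None, None, None, None],
                   'b1': [9, 9, 9, 999]})

# Restrict bfill(axis=1) to the columns of interest, then keep the first
# column: each row gets its first non-null value among a1, a2, a3.
df['A'] = df[['a1', 'a2', 'a3']].bfill(axis=1).iloc[:, 0]
print(df['A'].tolist())  # [2.0, 2.0, 3.0, nan]
```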


##### Comments

- Nice that you added an example of usage and output. It's appreciated!
- I am wishing you a Merry Christmas! Thank you for the huge support! And a small present for you (7), good luck!
- @jezrael, hey, thanks a lot! I wish you a Merry Christmas and a Happy New Year ! :-)
- `df["a"].isnull` is the *method*. You didn't actually call it.