We see that our dataframe contains repetitive index (ID) and also not unique values for each unique index. Our aim is to convert Product_Name column into rows such as to keep index unique and in each new colum to be each product name. We need the following format:
ID
0
1
2
100
Apple
Banana
NaN
200
Cherry
Apricot
Apple
300
Avocado
Avocado
NaN
400
Orange
NaN
NaN
To achieve our aim we can use cumcount for new columns names to MultiIndex by set_index and reshape by unstack. The second way to do this will be to create Series of lists and new DataFrame by contructor:
# Second way of doing the same as aboves = df.groupby("ID")["Product_Name"].apply(list)pd.DataFrame(s.values.tolist(), index=s.index)
0
1
2
ID
100
Apple
Banana
None
200
Cherry
Apricot
Apple
300
Avocado
Avocado
None
400
Orange
None
None
That’s good. We found the solution, but what if we want to have just two columns. I mean to aggregate Product_Name column values in one cell for each unique index? That’s even easy to do. Use groupby then applyjoin and reset_index