
I have a dataset with Date and ID columns, and I want to set a third column, 'Status', by comparing each row's date with that ID's date in a dict or lookup table: 'Off' if the date is before the lookup date, and 'On' if it is on or after it.

For example, as a dict: dict = {'A':'1/2/2024',...} or as a lookup table:

ID  Lookup date
A   1/2/2024
B   1/4/2024
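If the lookup arrives as a table rather than a dict, it can be turned into a dict keyed by ID. A minimal sketch, assuming a DataFrame named lookup (hypothetical name) with the columns 'ID' and 'Lookup date' from the table above, and month/day/year date strings:

```python
import pandas as pd

# Lookup table from the question; column names taken from its header row
lookup = pd.DataFrame({'ID': ['A', 'B'], 'Lookup date': ['1/2/2024', '1/4/2024']})

# Parse the strings (assumed month/day/year) so later comparisons are by date, not text
lookup['Lookup date'] = pd.to_datetime(lookup['Lookup date'], format='%m/%d/%Y')

# Index by ID and convert the single column to a plain dict: {ID: Timestamp}
lookup_dict = lookup.set_index('ID')['Lookup date'].to_dict()

print(lookup_dict)
```

Parsing matters here: compared as strings, '1/10/2024' sorts before '1/2/2024'.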

The result I want:

Date      ID  Status
1/1/2024  A   Off
1/2/2024  A   On
1/3/2024  A   On

My current solution merges the lookup values onto the DataFrame, which duplicates them over every row. I want to avoid adding that extra column, since it wastes memory on big datasets with millions of rows:

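import pandas as pd

# Assumes df['Date'] and lookup['lookup date'] are already datetime64, not strings
df = df.merge(lookup, on='ID', how='left')  # merge returns a new frame, so reassign
df.loc[df['Date'] < df['lookup date'], 'Status'] = 'Off'
df.loc[df['Date'] >= df['lookup date'], 'Status'] = 'On'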
  • A left join creates a new column with millions of extra entries on a big dataset. Is there a way to do it by referring to the lookup table / dict directly?
    – yungkenny
    Commented Jul 11 at 18:41
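One way to refer to the dict directly, without a merge, is Series.map: it looks up each row's ID in the dict and returns the cutoff dates as a temporary Series that is compared and then discarded, so no column is added to df. It still allocates one length-n array while it runs, so it reduces rather than eliminates the extra memory. A minimal sketch, assuming the dict is renamed lookup_dict (hypothetical name; `dict` shadows the built-in) and the dates are month/day/year strings:

```python
import numpy as np
import pandas as pd

# Sample rows and dict from the question; dates assumed to be month/day/year
df = pd.DataFrame({'Date': ['1/1/2024', '1/2/2024', '1/3/2024'], 'ID': ['A', 'A', 'A']})
lookup_dict = {'A': '1/2/2024', 'B': '1/4/2024'}

# Parse the row dates, and parse the k lookup dates once (not once per row)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
cutoffs = {k: pd.to_datetime(v, format='%m/%d/%Y') for k, v in lookup_dict.items()}

# map() builds a temporary per-row cutoff Series; it is compared and discarded,
# never stored as a column of df.
# Caution: an ID missing from the dict maps to NaT, and comparisons with NaT are
# False, so such rows would be labeled 'On'.
df['Status'] = np.where(df['Date'] < df['ID'].map(cutoffs), 'Off', 'On')

print(df)
```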

1 Answer

Output:

      Date ID Status
2024-06-01  A    Off
2024-06-28  A     On
2025-06-01  B    Off
2025-06-28  B     On

In the code below, lookup_table is a dict and to_achieve_table is a DataFrame.

import datetime
import pandas as pd

lookup_table = {'A':'2024-06-06','B':'2025-06-06'}
lookup_table = {key: datetime.datetime.strptime(value, "%Y-%m-%d").date() for key, value in lookup_table.items()}

to_achieve_table = pd.DataFrame({
    'Date': ['2024-06-01','2024-06-28','2025-06-01','2025-06-28'],
    'ID': ['A', 'A', 'B', 'B']
})

to_achieve_table['Date'] = pd.to_datetime(to_achieve_table['Date'], format="%Y-%m-%d").dt.date

# Row-wise lookup: compare each row's date with its ID's cutoff date from the dict
to_achieve_table['Status'] = to_achieve_table.apply(
    lambda row: 'On' if row['Date'] >= lookup_table[row['ID']] else 'Off', axis=1
)

print(to_achieve_table.to_string(index=False))
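apply(axis=1) calls the lambda once per row in Python, which gets slow at millions of rows. Below is a vectorized sketch of the same logic on the same example data (rebuilt so the block runs on its own). Using Series.map instead of apply, keeping the dates as datetime64 instead of .dt.date objects, and storing Status as a categorical are substitutions, not part of the code above. The categorical stores one-byte codes per row plus the two labels, rather than an object pointer per row.

```python
import pandas as pd

# Same data as the example above
lookup_table = {'A': '2024-06-06', 'B': '2025-06-06'}
to_achieve_table = pd.DataFrame({
    'Date': ['2024-06-01', '2024-06-28', '2025-06-01', '2025-06-28'],
    'ID': ['A', 'A', 'B', 'B'],
})

# Keep datetime64 values (no .dt.date) so the comparison runs vectorized
to_achieve_table['Date'] = pd.to_datetime(to_achieve_table['Date'], format='%Y-%m-%d')
cutoffs = {k: pd.to_datetime(v, format='%Y-%m-%d') for k, v in lookup_table.items()}

# True where the row is on or after its ID's cutoff.
# An ID missing from the dict maps to NaT, compares False, and ends up 'Off'.
on = to_achieve_table['Date'] >= to_achieve_table['ID'].map(cutoffs)

# Categorical codes: 0 -> 'Off', 1 -> 'On'
to_achieve_table['Status'] = pd.Categorical.from_codes(
    on.to_numpy().astype('int8'), categories=['Off', 'On']
)

print(to_achieve_table.to_string(index=False))
```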
