I have a dataset of the form:
Now, for each combination of id1 and id2 (say id1 = 1 and id2 = 2), and for each date value, I want to pick the value from rows that lie within one week before and one week after the date in the current row, but in the previous year.
So, for example, for id1 = 1, id2 = 2, date = 2023-06-01, I want to fetch the value column from rows with id1 = 1, id2 = 2, and date between 2022-05-25 and 2022-06-08, and explode those values into new columns.
So the data will finally look like:
If not all days are present in this range, the missing values will be filled with 0 later on, but I need the data in sorted order, so if a middle day is missing, that should be reflected in the resulting columns.
How can I do this in PySpark, or even pandas? I thought of using window functions and joins, but couldn't figure it out.
Any help is greatly appreciated.
EDIT:
An example:
id1 id2 date value
0 1 2021-12-28 24
0 1 2021-12-30 24
0 1 2022-01-04 24
0 1 2022-01-06 24
0 1 2022-01-07 8
0 1 2022-01-11 16
0 1 2022-01-13 16
0 1 2023-01-03 16
0 1 2023-01-05 56
The target dataset should look like :
id1 id2 date value day-7 day-6 day-5 day-4 day-3 day-2 day-1 day-0 day+1 day+2 day+3 day+4 day+5 day+6 day+7
0 1 2021-12-28 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2021-12-30 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2022-01-04 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2022-01-06 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2022-01-07 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2022-01-11 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2022-01-13 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2023-01-03 16 0 24 0 24 0 0 0 0 24 0 24 8 0 0 0
0 1 2023-01-05 56 0 24 0 0 0 0 24 0 24 8 0 0 0 16 0
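For reference, here is a minimal pandas sketch of one way this could work, using the example data above. It builds a (id1, id2, date) → value lookup, then for each day offset from -7 to +7 looks up the value at (date minus one year) plus that offset, defaulting to 0. This is an illustrative approach, not a tested production solution; a PySpark analogue would likely use a join on the shifted date instead of a dictionary lookup.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id1": [0] * 9,
    "id2": [1] * 9,
    "date": pd.to_datetime([
        "2021-12-28", "2021-12-30", "2022-01-04", "2022-01-06",
        "2022-01-07", "2022-01-11", "2022-01-13", "2023-01-03", "2023-01-05",
    ]),
    "value": [24, 24, 24, 24, 8, 16, 16, 16, 56],
})

# Fast lookup: (id1, id2, date) -> value
lookup = df.set_index(["id1", "id2", "date"])["value"].to_dict()

# For each offset d in -7..+7, fetch the value at
# (date one year earlier) + d days, defaulting to 0.
for d in range(-7, 8):
    col = f"day+{d}" if d > 0 else f"day-{abs(d)}"
    target = df["date"] - pd.DateOffset(years=1) + pd.Timedelta(days=d)
    df[col] = [
        lookup.get((i1, i2, t), 0)
        for i1, i2, t in zip(df["id1"], df["id2"], target)
    ]

print(df.to_string(index=False))
```

Because every offset column is generated explicitly, missing days naturally come out as 0 in the correct sorted position. Note that `pd.DateOffset(years=1)` handles the year shift calendar-wise (e.g. it maps Feb 29 to Feb 28), which may or may not be what you want for leap years.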