How to add a time column that increments based on id column value in pandas?

Question

I have a DataFrame that looks like this:

index    id    name    time     
1        101    A       00:12:00    
2        101    A       00:13:00    
3        101    A       00:14:00    
4        101    A       00:15:00    
.        .      .        .           
.        .      .        .           
59       101    A       01:11:00    
60       101    A       01:12:00    
.        .      .        .           
.        .      .        .           
119      101    A       02:11:00    
120      101    A       02:12:00
121      312    B       00:10:00
122      312    B       00:11:00
123      312    B       00:12:00
.        .      .        .           
.        .      .        .
180      312    B       01:09:00
181      312    B       01:10:00

What I want to do: take the DataFrame (say df1) and add a new column 'hour' that increments each time a full hour has passed since the first time recorded for the corresponding id. The resulting DataFrame (df2) will be processed further. Below is an example of what df2 would look like:

index    id    name     time        hour
1        101    A       00:12:00    0
2        101    A       00:13:00    0
3        101    A       00:14:00    0
4        101    A       00:15:00    0
.        .      .       .           .
.        .      .       .           .
59       101    A       01:11:00    0
60       101    A       01:12:00    1
.        .      .       .           .
.        .      .       .           .
119      101    A       02:11:00    1
120      101    A       02:12:00    2
121      312    B       00:10:00    0
122      312    B       00:11:00    0
123      312    B       00:12:00    0
.        .      .       .           .       
.        .      .       .           .
180      312    B       01:09:00    0
181      312    B       01:10:00    1

(I'm unable to add a complete and explicit example due to its size. In the omitted rows, the time increments in steps of 1 minute and the hour keeps the value of the last row shown above them.)

Is there any easy way to do this?

I looked at this answer, but it didn't solve my problem, since I'm trying to increment based on time here.

What the columns mean:

id: uniquely identifies the individual

name: name of the individual

time: the time at which the data was recorded

hour: the number of full hours elapsed since the first recorded time for the id (e.g. the first recorded time for id 101 is 00:12:00, which means the hour should increment at 01:12:00, 02:12:00 and so on).

Edit: Removed irrelevant parts of the question

I'll be creating a new df with the hour column and use it for further processing — Snak, Commented Jul 10 at 9:13
Then please provide a meaningful example, there are no hours in your example, also your expected output should match the provided input and be complete (no ...). You're making us guess what you want, which is a waste of time for everyone. Please be explicit. — mozway, Commented Jul 10 at 9:51
@Snak: can you please click edit on the question and add those details? Details down here in comments are not indexed or searchable, also it's ephemeral and liable to get deleted at any point. — smci, Commented Jul 10 at 10:06

mozway · Accepted Answer · 2024-07-10 10:52:12Z

IIUC, convert the time column with to_datetime, get the first value per group with groupby.transform, subtract it from each row's time, then convert the difference to total_seconds and floor-divide by 3600 (the number of seconds in an hour):

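import pandas as pd

# Parse the HH:MM:SS strings as datetimes (the date part defaults to 1900-01-01)
time = pd.to_datetime(df['time'], format='%H:%M:%S')

# Subtract each group's first time, then count the whole hours elapsed
df['hour'] = (time.groupby([df['id'], df['name']])
                  .transform('first').rsub(time)     # time - first time of the group
                  .dt.total_seconds().floordiv(3600)
                  .convert_dtypes()                  # float -> nullable Int64
             )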

Output:

      id name      time  hour
1    101    A  00:12:00     0
2    101    A  00:13:00     0
3    101    A  00:14:00     0
4    101    A  00:15:00     0
59   101    A  01:11:00     0
60   101    A  01:12:00     1
119  101    A  02:11:00     1
120  101    A  02:12:00     2
121  312    B  00:10:00     0
122  312    B  00:11:00     0
123  312    B  00:12:00     0
180  312    B  01:09:00     0
181  312    B  01:10:00     1
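
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It is a variation on the approach above rather than the exact code from the answer: it parses the strings with pd.to_timedelta instead of pd.to_datetime, and it groups by id only, on the assumption (taken from the question) that id uniquely identifies the individual. The sample rows are just the ones visible in the question, and the names elapsed and start are illustrative.

```python
import pandas as pd

# Sample rows copied from the visible part of the question's data
df = pd.DataFrame(
    {
        "id": [101, 101, 101, 101, 101, 101, 312, 312, 312],
        "name": ["A"] * 6 + ["B"] * 3,
        "time": [
            "00:12:00", "00:13:00", "01:11:00", "01:12:00", "02:11:00", "02:12:00",
            "00:10:00", "01:09:00", "01:10:00",
        ],
    }
)

# Interpret each HH:MM:SS string as a duration since midnight
elapsed = pd.to_timedelta(df["time"])

# First recorded time per id, broadcast back to every row of that id
# (assumes the rows are already in chronological order within each id)
start = elapsed.groupby(df["id"]).transform("first")

# Whole hours elapsed since the first recorded time of the id
df["hour"] = (elapsed - start).dt.total_seconds().floordiv(3600).astype(int)

print(df)
# hour column: [0, 0, 0, 1, 1, 2, 0, 0, 1]
```

Note that, like the answer above, this assumes the times for an id do not wrap past midnight. If they can, the time strings alone are not enough and a date would be needed as well.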
