I have a DataFrame that looks like this:

index    id    name    time     
1        101    A       00:12:00    
2        101    A       00:13:00    
3        101    A       00:14:00    
4        101    A       00:15:00    
.        .      .        .           
.        .      .        .           
59       101    A       01:11:00    
60       101    A       01:12:00    
.        .      .        .           
.        .      .        .           
119      101    A       02:11:00    
120      101    A       02:12:00
121      312    B       00:10:00
122      312    B       00:11:00
123      312    B       00:12:00
.        .      .        .           
.        .      .        .
180      312    B       01:09:00
181      312    B       01:10:00          

What I want to do: take the DataFrame (say df1) and add a new column 'hour' that increments each time a full hour has passed since the first recorded time for the corresponding id. The resulting DataFrame (df2) will be processed further. Here is an example of what df2 would look like:

index    id    name     time        hour
1        101    A       00:12:00    0
2        101    A       00:13:00    0
3        101    A       00:14:00    0
4        101    A       00:15:00    0
.        .      .       .           .
.        .      .       .           .
59       101    A       01:11:00    0
60       101    A       01:12:00    1
.        .      .       .           .
.        .      .       .           .
119      101    A       02:11:00    1
120      101    A       02:12:00    2
121      312    B       00:10:00    0
122      312    B       00:11:00    0
123      312    B       00:12:00    0
.        .      .       .           .       
.        .      .       .           .
180      312    B       01:09:00    0
181      312    B       01:10:00    1

(I can't include a complete, explicit example because of its size. In the omitted rows, the hour keeps the last value shown, and the time advances in steps of 1 minute.)

Is there any easy way to do this?

I looked at this answer, but it didn't answer my question, since I'm trying to increment based on time here.

What the columns mean:

id: uniquely identifies the individual

name: name of the individual

time: the time at which the data was recorded

hour: the number of full hours elapsed since the first recorded time for the id (e.g. the first recorded time for id 101 is 00:12:00, which means the hour should increment at 01:12:00, 02:12:00 and so on).

Edit: Removed irrelevant parts of the question

  • How do you add the new data to the old one?
    – mozway
    Commented Jul 10 at 9:11
  • I'll be creating a new df with the hour column and use it for further processing
    – Snak
    Commented Jul 10 at 9:13
  • Please provide your current code
    – mozway
    Commented Jul 10 at 9:13
  • Then please provide a meaningful example, there are no hours in your example, also your expected output should match the provided input and be complete (no ...). You're making us guess what you want, which is a waste of time for everyone. Please be explicit.
    – mozway
    Commented Jul 10 at 9:51
  • @Snak: can you please click edit on the question and add those details? Details down here in comments are not indexed or searchable, and they are ephemeral and liable to get deleted at any point.
    – smci
    Commented Jul 10 at 10:06

1 Answer

IIUC, convert the time column with to_datetime, get the first value per group with groupby.transform, subtract it from each row, then convert the difference to total_seconds and floor-divide by 3600 s:

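import pandas as pd

# Parse the HH:MM:SS strings (the date part defaults to 1900-01-01).
time = pd.to_datetime(df['time'], format='%H:%M:%S')

# Time elapsed since each group's first row, expressed in whole hours.
df['hour'] = (time.groupby([df['id'], df['name']])
                  .transform('first').rsub(time)
                  .dt.total_seconds().floordiv(3600)
                  .convert_dtypes()
             )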

Output:

      id name      time  hour
1    101    A  00:12:00     0
2    101    A  00:13:00     0
3    101    A  00:14:00     0
4    101    A  00:15:00     0
59   101    A  01:11:00     0
60   101    A  01:12:00     1
119  101    A  02:11:00     1
120  101    A  02:12:00     2
121  312    B  00:10:00     0
122  312    B  00:11:00     0
123  312    B  00:12:00     0
180  312    B  01:09:00     0
181  312    B  01:10:00     1
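
To reproduce this end to end, here is a minimal self-contained sketch of the same idea. It is a variant, not the exact code above: it parses the times with pd.to_timedelta instead of to_datetime, and it groups by id only, since the question says id uniquely identifies the individual. The sample frame contains only a few of the rows written out in the question. It also assumes that time holds HH:MM:SS strings, that each id's first row is its earliest time, and that recordings do not cross midnight.

```python
import pandas as pd

# A handful of the rows shown explicitly in the question (not the full data).
df = pd.DataFrame({
    "id":   [101, 101, 101, 101, 312, 312, 312],
    "name": ["A", "A", "A", "A", "B", "B", "B"],
    "time": ["00:12:00", "01:11:00", "01:12:00", "02:12:00",
             "00:10:00", "01:09:00", "01:10:00"],
})

# Assumption: times are HH:MM:SS strings; parse them as durations since midnight.
elapsed = pd.to_timedelta(df["time"])

# Assumption: the first row of each id is its earliest recording.
since_first = elapsed - elapsed.groupby(df["id"]).transform("first")

# Floor-dividing by a one-hour Timedelta counts the whole hours elapsed.
df["hour"] = (since_first // pd.Timedelta(hours=1)).astype(int)

print(df)
```

If the rows are not already sorted by time within each id, transform("min") in place of transform("first") should give the same reference point.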
