All Questions
13,530 questions
0 votes, 1 answer, 31 views
Sort Pandas dataframe by Sub Total and count
I have a very large dataset called bin_df.
Using pandas and the following code, I've assigned a sub-total, "Total", to each group:
bin_df = df[df["category"].isin(model....
1 vote, 2 answers, 33 views
Group By Two Variables and then Create New Column which is the Value of One Variable Based on the Value of Another Variable in Python (pandas)
I can do this in R but have no idea how to do this in Python.
I have data with sbj, num_item, visit, and height. I want to create baseline_height using pandas.
Ex:
sbj  num_item  visit  height
...
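A minimal sketch of one way to approach this, not the asker's actual data or an accepted answer. The table in the excerpt is truncated, so the values below are invented for illustration, and the sketch assumes that "baseline" means the height recorded at the earliest visit within each (sbj, num_item) group.

```python
import pandas as pd

# Hypothetical data: the original table is truncated, so these values are invented.
df = pd.DataFrame({
    "sbj": [1, 1, 1, 2, 2],
    "num_item": [1, 1, 1, 1, 1],
    "visit": [2, 1, 3, 1, 2],
    "height": [152.0, 150.0, 153.0, 140.0, 141.0],
})

# Assumption: the baseline is the height at the earliest visit of each group.
# Sorting first makes "first" pick the row with the smallest visit value.
df = df.sort_values(["sbj", "num_item", "visit"])

# transform broadcasts the per-group value back onto every row of the group.
df["baseline_height"] = (
    df.groupby(["sbj", "num_item"])["height"].transform("first")
)
print(df)
```

If the baseline is instead defined by a specific visit label, the same idea would apply after filtering to that visit and merging the result back on the two grouping columns.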
1 vote, 2 answers, 54 views
Pandas dataframe groupby apply function with variable number of arguments
I have a pandas dataframe that looks like
import pandas as pd
data = {
"Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
"Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
"theta": [8,9,2,...
0 votes, 0 answers, 64 views
Sorting a DataFrame by Multiple Conditions in Pandas
I'm struggling with a specific sort that I can't manage to implement in Python.
Here's a sample dataframe:
import pandas as pd
data = {
'product': ['A', 'A', 'A', 'B', 'B', 'B'],
'quantity': ...
1 vote, 3 answers, 68 views
dataframe filter groupby based on a subset
df_example = pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b', 'b'],
                           'class': [1, 2, 2, 3, 2, 2],
                           'price': [3, 4, 2, 1, 6, 5]})
I want to filter each ...
2 votes, 1 answer, 48 views
Number every first unique piece in each group
In each group, the first occurrence of each unique item should be given a different number in a new column 'num'.
I can form the groups, but I don't know how to number the unique pieces.
Is there a way to do that?
Unique ...
1 vote, 1 answer, 179 views
Permutation summation in Pandas dataframe growing super exponentially
I have a pandas dataframe that looks like
import pandas as pd
data = {
"Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
"Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
"theta": [8,9,2,...
0 votes, 1 answer, 46 views
Pandas, How can I group column 1 by column 2 with column 1's absolute max values without changing column 1 to absolute values?
So let's say I have a df_1 like this:
   Floor   UV
1      1   -2
2      1    3
3      1   -5
4      1    4
5      2   14
6      2  -15
And I have written this code:
output_df = df_1.loc[df_1.groupby(&...
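For illustration only, here is a sketch of one common way to get the signed row with the largest absolute value per group. It uses the sample data shown above; the asker's own line is truncated, so this is not a reconstruction of it.

```python
import pandas as pd

# Sample data copied from the excerpt above (index labels 1..6).
df_1 = pd.DataFrame(
    {"Floor": [1, 1, 1, 1, 2, 2], "UV": [-2, 3, -5, 4, 14, -15]},
    index=[1, 2, 3, 4, 5, 6],
)

# Use absolute values only to locate the rows; the UV column itself is untouched.
idx = df_1["UV"].abs().groupby(df_1["Floor"]).idxmax()

# Select the original, signed rows at those index labels.
output_df = df_1.loc[idx]
print(output_df)
```

Because `idxmax` returns index labels rather than values, the selected rows keep their original sign (here, -5 for Floor 1 and -15 for Floor 2).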
1 vote, 1 answer, 26 views
Rolling Average with variable min_periods from another column
I have a dataframe with multiple accounts across the last few years and am trying to get the rolling average of a column, per account, which is easy enough. However, I also need to have the min_periods ...
3 votes, 1 answer, 43 views
Groupby multiple columns and extract top rows based on non-grouped column value
I am trying to solve a problem very similar to: https://platform.stratascratch.com/coding/10362-top-monthly-sellers?code_type=2
Here is my data frame:
product seller market ...
1 vote, 3 answers, 64 views
Problem using groupby and transform with conditional lambda on multiple columns in Pandas
I'm curious about some weird behavior I ran into while using Pandas.
My initial purpose was, for each group in my data, to replace all values in a column with NA when said column contains more than x% missing ...
0 votes, 0 answers, 36 views
Pandas interpolate on 2 missing values based on different columns and other specific filter function after groupby
After a groupby on a date column I would like to interpolate on 2 specific values based on different columns and also retrieve the value of another for which the sum of two columns is minimum ...
I ...
4 votes, 1 answer, 76 views
How to vectorize groupby combination lists of two columns in Pandas Dataframe
I have a dataframe and need to group by two columns, for all possible combinations of the dataframe columns ['A','B','C','D','E','F','G']
import pandas as pd
d = {'A': [0,1,1,0,0,1,0,0],
'B': [1,1,0,0,...
7 votes, 1 answer, 136 views
Translate Pandas groupby plus resample to Polars in Python
I have this code that generates a toy DataFrame (the production df is much more complex):
import polars as pl
import numpy as np
import pandas as pd
def create_timeseries_df(num_rows):
    date_rng = pd....
3 votes, 3 answers, 95 views
Efficiently remove rows from pandas df based on second latest time in column
I have a pandas DataFrame that looks similar to this:
Index  ID   time_1               time_2
0      101  2024-06-20 14:32:22  2024-06-20 14:10:31
1      101  2024-06-20 15:21:31  2024-06-20 14:32:22
2      101  2024-06-20 15:21:31  ...