Questions tagged [dataframe]
A data frame is a 2D tabular data structure. Usually, it contains data where rows are observations and columns are variables and are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
dataframe
146,961
questions
3
votes
3
answers
75
views
Polars - Filter DataFrame using another DataFrame's row's
I have two Dataframes - graph and search with the same schema
Schema for graph:
SCHEMA = {
START_RANGE: pl.Int64,
END_RANGE: pl.Int64,
}
Schema for search:
SCHEMA = {
START: pl.Int64,
...
0
votes
0
answers
13
views
A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]
I am aware that this is a common issue, but I am confused why I am getting it here:
train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0)
I am doing exactly what the warning says:
...
2
votes
2
answers
38
views
How to compare lists in two Pandas dataframes to get the common elements?
I want to compare lists from columns set_1 and set_2 in df_2 with ins column in df_1 to find all common elements.
I've started doing it for one row and one column but I have no idea how to compare all ...
-1
votes
0
answers
26
views
NameError Traceback (most recent call last) <ipython-input-3-9ec55f7a7976> in <module> : NameError: name 'books' is not defined
I am trying to plot the evolution of degree centrality over the books for some of the characters from Game of Thrones .I have a list evol that contains the computed degree centrality from all the ...
1
vote
5
answers
99
views
How to count the total entries by group when they are comma-separated
I'm working with the League of Legends Champions dataset
name
tags
Aatrox
Fighter
Ahri
Mage,Assassin
Akali
Assassin
Akshan
Marksman,Assassin
Alistar
Tank,Support
And I was wondering how to ...
1
vote
1
answer
48
views
Creating a large number of columns in R tidyverse based on a comparison with a specific column
I have a dataset in R tidyverse and I want to create 192 columns based on comparison with the sp column, just like the mp_comp_1 column. How can I do this for 192 columns in tidyverse?
library(...
0
votes
1
answer
37
views
Subtract dataframe into subdataframes using pandas
I have large dataframe and I want to substract this dataframe into smaller dataframes based on two conditions. Below is the small a piece of the dataframe:
| | id |outcome|
| -----...
1
vote
2
answers
56
views
Pandas dataframe groupby apply function with variable number of arguments
I have a pandas dataframe that looks like
import pandas as pd
data = {
"Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
"Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
"theta": [8,9,2,...
2
votes
1
answer
41
views
how do you sort column names in Date in descending order in pandas
I have this DataFrame:
Node Interface Speed Band_In carrier Date
Server1 wan1 100 80 ATT 2024-05-09
Server1 wan1 100 50 ...
-7
votes
2
answers
66
views
How to apply "if" condition on dataframes [duplicate]
So I am trying to create a list where it checks from the height column in the dataframe to see if the height is above 70, I want to append 2 and if it is between 66 and 70 append 1 otherwise append 0 ...
4
votes
1
answer
63
views
Get max date column name on polars
I'm trying to get the column name containing the maximum date value in my Polars DataFrame. I found a similar question that was already answered here.
However, in my case, I have many columns, and ...
0
votes
2
answers
56
views
Vectorized way to check if a string is in a dataframe column (set of strings)?
I have a pandas dataframe df. This dataframe has a column to_filter. to_filter is either an empty set or a set of strings. This dataframe also has an integer column id. The id may not be unique.
Given ...
0
votes
3
answers
38
views
How to use Python Pandas Groupby for multiple columns?
I have a dataframe that I am trying to do some calculations on and add a few columns.
Here is an example of the input dataframe:
df1:
Index Type Product Late or On Time
0 A X ...
1
vote
1
answer
33
views
Apply sklearn logloss with rolling on pandas dataframe
My function call looks something like
loss = log_loss(y_true=validate_d['y'], y_pred=validate_probs, sample_weight=validate_df['weight'], normalize=True)
Is there any way to combine this with pandas ...
1
vote
1
answer
22
views
finding the minimum value of matched rows between two dataframes
I have Two data frames
import pandas as pd
exam_1 = pd.DataFrame({'user': ['A', 'B', 'C'],
'marks': [10, 50, 40]})
exam_2 = pd.DataFrame({'user': ['A', 'C', 'D'],
...