Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, rows are observations and columns are variables, and the columns are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

2 votes
0 answers
17 views

How do I categorize projects in a dataframe according to their titles?

I have a dataframe where I want to categorize energy-related projects into 4 different topics according to their titles. For that I want to use pre-defined keywords to identify which topic the project ...
Barbara Bressan Rocha's user avatar
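
One possible approach to the keyword-based categorization described in this question is sketched below. The column name `title`, the four topic names, and the keyword lists are all hypothetical, since the excerpt does not give them. The first matching topic wins, and titles with no match fall into "other".

```python
import pandas as pd

# Hypothetical topics and keywords; the question does not list the real ones.
TOPIC_KEYWORDS = {
    "solar": ["solar", "photovoltaic"],
    "wind": ["wind", "turbine"],
    "storage": ["battery", "storage"],
    "grid": ["grid", "transmission"],
}


def categorize(title: str) -> str:
    """Return the first topic whose keyword occurs in the title, else 'other'."""
    lowered = title.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return "other"


# Illustrative data, not taken from the question.
df = pd.DataFrame({"title": [
    "Offshore Wind Farm Expansion",
    "Photovoltaic Rooftop Pilot",
    "Battery Storage for Microgrids",
    "Community Outreach Program",
]})
df["topic"] = df["title"].apply(categorize)
```

Plain substring matching is deliberately simple here: "wind" would also match "window", so a word-boundary regular expression may be a better fit for real titles.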
0 votes
0 answers
35 views

Why is data written to Excel not starting from the 'A' column?

I'm using pandas to copy data from one Excel file to another, and the data is being copied, just not to the right place. I have this function that reads the data: def updated_file(self, progress_bar): ...
Jugert Mucoimaj's user avatar
0 votes
0 answers
22 views

Pandas check if a column has NaT type, unable to find date diff with NaT values [duplicate]

I have two columns, StartDate and ExitDate, in my dataframe, with NaT values in the ExitDate column. I wish to create a third column, Tenure, by finding the difference between ExitDate and StartDate. StartDate ...
Vinita's user avatar
  • 1,842
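
As a minimal sketch of the situation in this question: in pandas, `isna()` detects NaT values, and subtracting two datetime columns propagates NaT instead of raising. Only the column names `StartDate`, `ExitDate`, and `Tenure` come from the question; the sample dates and the extra `TenureDays` column are made up for illustration.

```python
import pandas as pd

# Illustrative data; only the column names come from the question.
df = pd.DataFrame({
    "StartDate": pd.to_datetime(["2020-01-01", "2021-06-15"]),
    "ExitDate": pd.to_datetime(["2020-12-31", None]),  # None becomes NaT
})

# Check whether the column contains any NaT values.
has_nat = df["ExitDate"].isna().any()

# Subtraction propagates NaT: rows without an ExitDate get a NaT Tenure.
df["Tenure"] = df["ExitDate"] - df["StartDate"]

# Tenure in whole days; NaT rows become NaN in this column.
df["TenureDays"] = df["Tenure"].dt.days
```

If a missing ExitDate should count as "still active", one option is to fill it first, for example with `df["ExitDate"].fillna(pd.Timestamp.today())`, before subtracting.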
1 vote
1 answer
16 views

avg() over a whole dataframe causing different output

I see that dataframe.agg(avg(Col)) works fine, but when I calculate avg() over a window over the whole column (not using any partition), I see different results based on which column I use with orderBy. ...
anurag86's user avatar
  • 1,685
0 votes
2 answers
22 views

How to get nested xml structure as a string from an xml document using xpath in pyspark dataframe?

I have a dataframe with a string datatype column with XML string. Now I want to create a new column with a nested XML structure from the original column. For this, I tried using XPath in PySpark. ...
Krushna's user avatar
  • 13
0 votes
1 answer
18 views

How to filter a dataframe by a column from a different dataframe?

I want to filter a dataframe by a column, using strings from a different dataframe. val booksReadBestAuthors = userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead:_*)) ...
Joanna Kois's user avatar
0 votes
1 answer
21 views

Renaming dataframe column in Python with a string value in another dataframe by matching column/index names

Major edit: Apparently it is difficult to understand my question, so I'll do my best to make it concrete. I have two dataframes, "df1" and "df2". These are quite large, larger than in ...
Calle Flygare's user avatar
2 votes
3 answers
50 views

How to reorder duplicate answers in a Polars dataframe

I have a Polars dataframe that contains multiple questions and answers. The problem is that each answer is contained in its own column, which means that I have a lot of redundant information. ...
user24900119's user avatar
-1 votes
0 answers
44 views

How to split info in a single row in Excel into columns using Python [duplicate]

I have read a CSV file using pd.read_csv(). I am trying to clean up this data but it is proving a bit difficult. Essentially all of the information is in a single column and row 1 and I need to split ...
rogue1's user avatar
  • 1
-1 votes
1 answer
37 views

How do I perform a smear between two dataframes in python/pandas? [duplicate]

I have two dataframes and I need to perform a smear (if that is what it's generally called). Basically the first one is smaller (5 million rows) and the other is 40 million rows. I want to add the ...
babyface's user avatar
3 votes
3 answers
73 views

Polars - Filter DataFrame using another DataFrame's rows

I have two DataFrames, graph and search, with the same schema. Schema for graph: SCHEMA = { START_RANGE: pl.Int64, END_RANGE: pl.Int64, } Schema for search: SCHEMA = { START: pl.Int64, ...
Ashmeet Lamba's user avatar
0 votes
0 answers
13 views

A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]

I am aware that this is a common issue, but I am confused why I am getting it here: train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0) I am doing exactly what the warning says: ...
Baron Yugovich's user avatar
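
A common cause of this warning, even when `.loc` is used, is that the dataframe being assigned to was itself created earlier as a slice of another dataframe. Whether that applies here is an assumption, since the excerpt does not show how `train_df` was built. The sketch below uses made-up data (`full_df`, the `split` column, and the probability values are hypothetical) and takes an explicit `.copy()` before the assignment from the question.

```python
import numpy as np
import pandas as pd

# Made-up data; the question does not show how train_df was created.
full_df = pd.DataFrame({
    "feature": [0.1, 0.4, 0.7, 0.9],
    "split": ["train", "train", "test", "train"],
})

# Taking an explicit copy makes train_df an independent dataframe,
# so later assignments cannot be mistaken for writes to a slice of full_df.
train_df = full_df[full_df["split"] == "train"].copy()

# Hypothetical class probabilities with shape (n_rows, 2), and a cutoff.
train_probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.45, 0.55]])
cutoff = 0.5

# The assignment from the question, now applied to the independent copy.
train_df.loc[:, "decision"] = np.where(train_probs[:, 1] > cutoff, 1, 0)
```

The original `full_df` is left untouched; only the copy gains the `decision` column.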
2 votes
2 answers
38 views

How to compare lists in two Pandas dataframes to get the common elements?

I want to compare lists from columns set_1 and set_2 in df_2 with ins column in df_1 to find all common elements. I've started doing it for one row and one column but I have no idea how to compare all ...
emor's user avatar
  • 157
-1 votes
0 answers
25 views

NameError Traceback (most recent call last) <ipython-input-3-9ec55f7a7976> in <module> : NameError: name 'books' is not defined

I am trying to plot the evolution of degree centrality over the books for some of the characters from Game of Thrones. I have a list evol that contains the computed degree centrality from all the ...
acharyabibash's user avatar
1 vote
5 answers
95 views

How to count the total entries by group when they are comma-separated

I'm working with the League of Legends Champions dataset:
name     tags
Aatrox   Fighter
Ahri     Mage,Assassin
Akali    Assassin
Akshan   Marksman,Assassin
Alistar  Tank,Support
And I was wondering how to ...
Hiram Méndez's user avatar
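
A minimal pandas sketch for this kind of count, assuming the goal is the number of champions per tag (the excerpt is truncated, so that reading is a guess): split each `tags` string on commas, explode to one row per tag, then count. The five rows are the ones shown in the question.

```python
import pandas as pd

# The five rows shown in the question.
df = pd.DataFrame({
    "name": ["Aatrox", "Ahri", "Akali", "Akshan", "Alistar"],
    "tags": ["Fighter", "Mage,Assassin", "Assassin",
             "Marksman,Assassin", "Tank,Support"],
})

# Split on commas, put each tag on its own row, then count occurrences.
tag_counts = df["tags"].str.split(",").explode().value_counts()
```

Here `tag_counts` is a Series indexed by tag, so `tag_counts["Assassin"]` gives the number of champions carrying that tag.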
