
Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, it contains data where rows are observations and columns are variables and are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
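To make the definition concrete, here is a minimal sketch in pandas, one of the libraries named above. The column names and values are hypothetical. Each row is one observation, each column is one variable, and the columns keep different types.

```python
import pandas as pd

# Hypothetical observations: each row is one weather reading,
# each column is one variable with its own dtype.
df = pd.DataFrame(
    {
        "station": ["waldorf", "waldorf", "annapolis"],  # text column
        "temp_c": [21.5, 23.0, 19.8],                    # float column
        "rained": [False, True, False],                  # boolean column
    }
)

# Unlike a NumPy array or a matrix, the columns keep different types.
print(df.dtypes)
print(df.shape)  # (3, 3)
```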

0 votes
0 answers
8 views

Plot datetime data by month only, not month-year

I have a data frame that looks like this: structure(list(date = structure(c(1592611200, 1624665600, 1626480000, 1620086400, 1624147200, 1624752000, 1626566400, 1.566e+09, 1621036800, 1651536000), ...
Ryan Gary (143)
1 vote
2 answers
25 views

In a dataframe, copy values from one column to another under multiple conditions when they are not in the same row

I am trying to transfer values from one column to another column in a dataframe, with multiple conditions and not in the same row. Values from Columns 'BEGUZ_H' and 'ENDUZ_H' to Columns 'BEGUZ' and '...
mxplk (67)
0 votes
1 answer
24 views

Find the average temperature over a datetime range for each day in a dataframe

This is a subset of the dataframe I have: structure(list(name = c("waldorf", "waldorf", "waldorf", "waldorf", "waldorf", "waldorf", "...
Ryan Gary (143)
4 votes
2 answers
41 views

How do I categorize projects in a dataframe according to their titles?

I have a dataframe where I want to categorize energy-related projects into 4 different topics according to their titles. For that, I want to use pre-defined keywords to identify which topic the project ...
Barbara Bressan Rocha
0 votes
0 answers
43 views

Why is the data written to Excel not starting at column 'A'?

I'm using pandas to copy data from one Excel file to another, and the data is being copied, just not to the right place. I have this function that reads the data: def updated_file(self, progress_bar): ...
Jugert Mucoimaj
0 votes
0 answers
23 views

Pandas: check if a column has NaT values; unable to compute a date difference with NaT values [duplicate]

I have two columns, StartDate and ExitDate, in my dataframe, with NaT values in the ExitDate column. I wish to create a third column, Tenure, by finding the difference between ExitDate and StartDate. StartDate ...
Vinita (1,842)
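For the NaT question above, a minimal pandas sketch of the usual behaviour: subtracting two datetime64 columns gives a timedelta64 column, and a NaT operand simply propagates to NaT in the result. The column names StartDate, ExitDate and Tenure come from the question; the dates and the extra TenureDays column are hypothetical.

```python
import pandas as pd

# Hypothetical data; None in ExitDate is parsed as NaT.
df = pd.DataFrame(
    {
        "StartDate": pd.to_datetime(["2020-01-15", "2021-03-01", "2019-07-10"]),
        "ExitDate": pd.to_datetime(["2022-01-15", None, "2020-07-10"]),
    }
)

# isna() detects NaT in a datetime column.
print(df["ExitDate"].isna().any())  # True

# datetime64 - datetime64 -> timedelta64; NaT propagates instead of raising.
df["Tenure"] = df["ExitDate"] - df["StartDate"]

# Hypothetical extra column: tenure as a day count (NaT becomes NaN).
df["TenureDays"] = df["Tenure"].dt.days
print(df)
```

If a missing ExitDate should mean "still employed", one option is to fill NaT with a reference date before subtracting; that choice is an assumption and depends on the data.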
1 vote
1 answer
16 views

avg() over a whole dataframe gives different output

I see that dataframe.agg(avg(Col)) works fine, but when I calculate avg() over a window spanning the whole column (without any partition), I see different results depending on which column I use with orderBy. ...
anurag86 (1,685)
0 votes
2 answers
25 views

How to get nested xml structure as a string from an xml document using xpath in pyspark dataframe?

I have a dataframe with a string-type column containing XML strings. Now I want to create a new column with a nested XML structure from the original column. For this, I tried using XPath in PySpark. ...
Krushna (13)
0 votes
1 answer
19 views

How to filter a dataframe by a column from a different dataframe?

I want to filter a dataframe by a column, using strings from a different dataframe. val booksReadBestAuthors = userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead:_*)) ...
Joanna Kois
0 votes
1 answer
22 views

Renaming dataframe column in Python with a string value in another dataframe by matching column/index names

Major edit: Apparently my question is difficult to understand, so I'll do my best to make it concrete. I have two dataframes, "df1" and "df2". These are considerably larger, larger than in ...
Calle Flygare
3 votes
3 answers
52 views

How to reorder duplicate answers in a Polars dataframe

I have a Polars dataframe that contains multiple questions and answers. The problem is that each answer is contained in its own column, which means that I have a lot of redundant information. ...
user24900119
-1 votes
0 answers
44 views

How to split info in a single Excel row into columns using Python [duplicate]

I have read a CSV file using pd.read_csv(). I am trying to clean up the data, but it is proving a bit difficult. Essentially, all of the information is in a single column in row 1, and I need to split ...
rogue1 (1)
-1 votes
1 answer
39 views

How do I perform a smear between two dataframes in python/pandas? [duplicate]

I have two dataframes and I need to perform a smear (if that is what it's generally called). Basically, the first one is smaller (5 million rows) and the other has 40 million rows. I want to add the ...
babyface
3 votes
3 answers
74 views

Polars: filter a DataFrame using another DataFrame's rows

I have two DataFrames, graph and search, with the same schema. Schema for graph: SCHEMA = { START_RANGE: pl.Int64, END_RANGE: pl.Int64, } Schema for search: SCHEMA = { START: pl.Int64, ...
Ashmeet Lamba
0 votes
0 answers
13 views

A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]

I am aware that this is a common issue, but I am confused about why I am getting it here: train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0) I am doing exactly what the warning says: ...
Baron Yugovich
