Questions tagged [dataframe]
A data frame is a two-dimensional tabular data structure. Typically, rows are observations and columns are variables, and the columns may be of different types (unlike an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, Deedle, Maple, the pandas library in Python, and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
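As a minimal sketch of the idea above, the following pandas snippet builds a small data frame whose columns hold different types. The column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: each row is one observation,
# each column is one variable with its own type.
df = pd.DataFrame(
    {
        "name": ["waldorf", "statler", "animal"],    # text
        "temperature": [21.5, 19.0, 23.25],          # floating point
        "count": [3, 7, 1],                          # integer
        "measured_on": pd.to_datetime(
            ["2021-06-20", "2021-06-26", "2021-07-17"]
        ),                                           # datetime
    }
)

# Map each column to its NumPy-style kind code
# ("f" = float, "i" = integer, "M" = datetime).
column_kinds = {col: dtype.kind for col, dtype in df.dtypes.items()}

print(df.dtypes)
```

A matrix or array would force all of these values into a single common type, whereas the data frame keeps one type per column.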
77
questions
0
votes
0
answers
8
views
Plot datetime data as just month not month-year
I have a data frame that looks like this:
structure(list(date = structure(c(1592611200, 1624665600, 1626480000,
1620086400, 1624147200, 1624752000, 1626566400, 1.566e+09, 1621036800,
1651536000), ...
1
vote
2
answers
25
views
In a dataframe, replace values from one column with multiple conditions and not in the same row to another column
I am trying to transfer values from one column to another column in a dataframe, with multiple conditions and not in the same row.
Values from Columns 'BEGUZ_H' and 'ENDUZ_H' to Columns 'BEGUZ' and '...
0
votes
1
answer
24
views
Find average temperature from a range of datetime for each day in dataframe
This is a subset of the dataframe I have:
structure(list(name = c("waldorf", "waldorf", "waldorf", "waldorf",
"waldorf", "waldorf", "...
4
votes
2
answers
41
views
How do I categorize projects in a dataframe according to their titles?
I have a dataframe where I want to categorize energy-related projects into 4 different topics according to their titles.
For that I want to use pre-defined keywords to identify which topic the project ...
0
votes
0
answers
43
views
Why are data written to Excel not starting from the 'A' column?
I'm using pandas to copy data from one Excel file to another, and the data are being copied, just not to the right place.
I have this function that reads the data:
def updated_file(self, progress_bar):
...
0
votes
0
answers
23
views
Pandas check if a column has NaT type, unable to find date diff with NaT values [duplicate]
I have two columns in my dataframe, StartDate and ExitDate, with NaT values in the ExitDate column.
I wish to create a third column, Tenure, by finding the difference between ExitDate and StartDate.
StartDate ...
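A minimal sketch of one way to approach this kind of calculation in pandas, not the asker's code. The sample dates and the `as_of` reference date are hypothetical, since the original data is truncated in the excerpt.

```python
import pandas as pd

# Hypothetical sample data; the second row has no exit date (NaT).
df = pd.DataFrame(
    {
        "StartDate": pd.to_datetime(["2020-01-01", "2021-03-15"]),
        "ExitDate": pd.to_datetime(["2020-01-31", None]),
    }
)

# Check whether the column contains any NaT values.
has_nat = df["ExitDate"].isna().any()

# Subtracting datetime columns propagates NaT: rows without an
# exit date get a missing tenure instead of raising an error.
df["Tenure"] = df["ExitDate"] - df["StartDate"]

# Assumed alternative: treat a missing exit date as "still active"
# and measure tenure up to a chosen reference date.
as_of = pd.Timestamp("2021-04-14")
df["TenureToDate"] = df["ExitDate"].fillna(as_of) - df["StartDate"]

print(df)
```

Whether a missing exit date should stay missing or be filled with a reference date depends on what Tenure is meant to represent.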
1
vote
1
answer
16
views
avg() over a whole dataframe causing different output
I see that dataframe.agg(avg(Col)) works fine, but when I calculate avg() over a window over the whole column (not using any partition), I see different results depending on which column I use with orderBy.
...
0
votes
2
answers
25
views
How to get a nested XML structure as a string from an XML document using XPath in a PySpark dataframe?
I have a dataframe with a string-typed column containing an XML string. Now I want to create a new column with a nested XML structure from the original column. For this, I tried using XPath in PySpark.
...
0
votes
1
answer
19
views
How to filter a dataframe by a column from a different dataframe?
I want to filter a dataframe by a column, using strings from a different dataframe.
val booksReadBestAuthors = userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead:_*))
...
0
votes
1
answer
22
views
Renaming dataframe column in Python with a string value in another dataframe by matching column/index names
Major edit:
Apparently it is difficult to understand my question, so I'll do my best to make it more concrete.
I have two dataframes, "df1" and "df2". These are quite large, larger than in ...
3
votes
3
answers
52
views
How to reorder duplicate answers in a Polars dataframe
I have a Polars dataframe that contains multiple questions and answers. The problem is that each answer is contained in its own column, which means that I have a lot of redundant information. ...
-1
votes
0
answers
44
views
How to split info in a single row in Excel into columns using Python [duplicate]
I have read a CSV file using pd.read_csv()
I am trying to clean up this data but it is proving a bit difficult.
Essentially all of the information is in a single column and row 1 and I need to split ...
-1
votes
1
answer
39
views
How do I perform a smear between two dataframes in python/pandas? [duplicate]
I have two dataframes and I need to perform a smear (if that is what it's generally called). Basically the first one is smaller (5 million rows) and the other is 40 million rows. I want to add the ...
3
votes
3
answers
74
views
Polars - Filter DataFrame using another DataFrame's rows
I have two DataFrames, graph and search, with the same schema.
Schema for graph:
SCHEMA = {
    START_RANGE: pl.Int64,
    END_RANGE: pl.Int64,
}
Schema for search:
SCHEMA = {
START: pl.Int64,
...
0
votes
0
answers
13
views
A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]
I am aware that this is a common issue, but I am confused why I am getting it here:
train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0)
I am doing exactly what the warning says:
...