Questions tagged [dataframe]
A data frame is a 2D tabular data structure. Usually, its rows are observations and its columns are variables, and the columns are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
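As a minimal sketch of the description above, assuming pandas as the library, the following builds a small data frame whose columns hold different types. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each row is one observation, each column one variable.
df = pd.DataFrame({
    "name": ["a", "b", "c"],    # text column
    "count": [1, 2, 3],         # integer column
    "score": [0.5, 1.5, 2.5],   # floating-point column
})

# Unlike a matrix, every column carries its own dtype.
print(df.dtypes)
```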
16 questions
2 votes, 0 answers, 17 views
How do I categorize projects in a dataframe according to their titles?
I have a dataframe in which I want to categorize energy-related projects into 4 different topics according to their titles.
For that I want to use pre-defined keywords to identify which topic the project ...
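One possible approach to keyword-based categorization, sketched under assumptions: pandas is the library in use, and the topics, keywords, titles, and the `categorize` helper below are all hypothetical, since the question does not show them.

```python
import pandas as pd

# Hypothetical topics and keywords; the real lists are not given in the question.
keywords = {
    "solar": ["solar", "photovoltaic"],
    "wind": ["wind", "turbine"],
}

def categorize(title, keywords, default="other"):
    """Return the first topic whose keyword occurs in the title (case-insensitive)."""
    lowered = title.lower()
    for topic, words in keywords.items():
        if any(word in lowered for word in words):
            return topic
    return default

# Hypothetical project titles.
projects = pd.DataFrame({
    "title": ["Offshore Wind Farm", "Rooftop Photovoltaic Pilot", "Grid Study"],
})
projects["topic"] = projects["title"].apply(lambda t: categorize(t, keywords))
```

The first matching topic wins here, so the order of the dictionary matters when a title contains keywords from more than one topic.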
0 votes, 0 answers, 35 views
Why is data written to Excel not starting from column 'A'?
I'm using pandas to copy data from one Excel file to another, and the data is being copied, just not to the right place.
I have this function that reads the data:
def updated_file(self, progress_bar):
...
0 votes, 0 answers, 22 views
Pandas check if a column has NaT type, unable to find date diff with NaT values [duplicate]
I have two columns, StartDate and ExitDate, in my dataframe, with NaT values in the ExitDate column.
I wish to create a third column, Tenure, by finding the difference between ExitDate and StartDate.
StartDate ...
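A minimal sketch of this situation, assuming both columns are already of datetime dtype; the sample dates are made up. In pandas, subtracting datetime columns propagates NaT, so rows without an ExitDate get a missing Tenure rather than raising an error.

```python
import pandas as pd

# Hypothetical data: the second row has no exit date (NaT).
df = pd.DataFrame({
    "StartDate": pd.to_datetime(["2020-01-01", "2021-06-15"]),
    "ExitDate": pd.to_datetime(["2020-03-01", None]),
})

# Check whether the column contains any NaT values.
has_nat = df["ExitDate"].isna().any()

# The subtraction yields a timedelta column; NaT inputs give NaT results.
df["Tenure"] = df["ExitDate"] - df["StartDate"]

# Tenure in whole days; missing tenures show up as NaN.
df["TenureDays"] = df["Tenure"].dt.days
```

If a missing ExitDate should count up to some reference date instead, `df["ExitDate"].fillna(...)` with a chosen timestamp before subtracting is one option, but which date to use depends on the data.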
1 vote, 1 answer, 16 views
avg() over a whole dataframe causing different output
I see that dataframe.agg(avg(Col)) works fine, but when I calculate avg() over a window spanning the whole column (not using any partition), I see different results depending on which column I use with orderBy.
...
0 votes, 2 answers, 22 views
How to get a nested XML structure as a string from an XML document using XPath in a PySpark dataframe?
I have a dataframe with a string-typed column containing an XML string. Now I want to create a new column with a nested XML structure from the original column. For this, I tried using XPath in PySpark.
...
0 votes, 1 answer, 18 views
How to filter a dataframe by a column from a different dataframe?
I want to filter a dataframe by a column of strings from a different dataframe.
val booksReadBestAuthors = userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead:_*))
...
0 votes, 1 answer, 21 views
Renaming dataframe column in Python with a string value in another dataframe by matching column/index names
Major edit:
Apparently it is difficult to understand my question, so I'll do my best to make it more concrete.
I have two dataframes, "df1" and "df2". These are quite large, larger than in ...
2 votes, 3 answers, 50 views
How to reorder duplicate answers in a Polars dataframe
I have a Polars dataframe that contains multiple questions and answers. The problem is that each answer is contained in its own column, which means that I have a lot of redundant information. ...
-1 votes, 0 answers, 44 views
How to split info in a single row in Excel into columns using Python [duplicate]
I have read a CSV file using pd.read_csv().
I am trying to clean up this data but it is proving a bit difficult.
Essentially all of the information is in a single column and row 1 and I need to split ...
-1 votes, 1 answer, 37 views
How do I perform a smear between two dataframes in python/pandas? [duplicate]
I have two dataframes and I need to perform a smear (if that is what it's generally called). Basically the first one is smaller (5 million rows) and the other is 40 million rows. I want to add the ...
3 votes, 3 answers, 73 views
Polars - Filter DataFrame using another DataFrame's rows
I have two DataFrames, graph and search, with the same schema.
Schema for graph:
SCHEMA = {
    START_RANGE: pl.Int64,
    END_RANGE: pl.Int64,
}
Schema for search:
SCHEMA = {
    START: pl.Int64,
...
0 votes, 0 answers, 13 views
A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]
I am aware that this is a common issue, but I am confused why I am getting it here:
train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0)
I am doing exactly what the warning says:
...
2 votes, 2 answers, 38 views
How to compare lists in two Pandas dataframes to get the common elements?
I want to compare lists from columns set_1 and set_2 in df_2 with ins column in df_1 to find all common elements.
I've started doing it for one row and one column but I have no idea how to compare all ...
-1 votes, 0 answers, 25 views
NameError Traceback (most recent call last) <ipython-input-3-9ec55f7a7976> in <module> : NameError: name 'books' is not defined
I am trying to plot the evolution of degree centrality over the books for some of the characters from Game of Thrones. I have a list evol that contains the computed degree centrality from all the ...
1 vote, 5 answers, 95 views
How to count the total entries by group when they are comma-separated
I'm working with the League of Legends Champions dataset
name     tags
Aatrox   Fighter
Ahri     Mage,Assassin
Akali    Assassin
Akshan   Marksman,Assassin
Alistar  Tank,Support
And I was wondering how to ...
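The excerpt is cut off, so the following is only a sketch under two assumptions: that the goal is to count how many champions carry each tag, and that pandas is the library in use (the question may target another language). It uses the five rows shown above.

```python
import pandas as pd

# The five rows shown in the question excerpt.
champions = pd.DataFrame({
    "name": ["Aatrox", "Ahri", "Akali", "Akshan", "Alistar"],
    "tags": ["Fighter", "Mage,Assassin", "Assassin",
             "Marksman,Assassin", "Tank,Support"],
})

# Split each comma-separated string into a list, give every tag its own row,
# then count how often each tag occurs.
tag_counts = champions["tags"].str.split(",").explode().value_counts()
print(tag_counts)
```

On this sample, Assassin is counted three times and every other tag once.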