Questions tagged [dataframe]
A data frame is a 2D tabular data structure. Usually, rows are observations and columns are variables, and the columns may be of different types (unlike an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
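As a rough illustration of the description above, here is a minimal pandas sketch of a data frame whose columns hold different types. The column names and values are made up for the example and do not come from any question on this page.

```python
import pandas as pd

# Hypothetical sample data: each row is one observation,
# each column is one variable with its own type.
df = pd.DataFrame(
    {
        "label": ["a", "b", "c"],           # text
        "measurement": [0.5, 1.5, 2.5],     # floating point
        "flag": [True, False, True],        # boolean
    }
)

print(df.dtypes)  # one dtype per column, unlike a homogeneous matrix
print(df.shape)   # (rows, columns)
```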
dataframe
146,963
questions
0
votes
0
answers
7
views
Unable to delimit text into columns in R after pdf parsing
I have converted a PDF to text and am trying to delimit the text, but it is not working properly.
library(tidyverse)
library(pdftools)
library(lubridate)
pdf_rowwise <- strsplit(pdf_text("V://path//...
0
votes
0
answers
11
views
FutureWarning on The behavior of DataFrame concatenation
I am retrieving data from an API which contains multiple pages.
The required fields from the initial page are added to a pandas dataframe - in my code this variable is originally defined as df.
From ...
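The excerpt is truncated, so the asker's actual code is unknown. One common pattern for paged data, sketched here under that caveat, is to collect one frame per page in a list, drop the empty ones, and call `pd.concat` once. In recent pandas versions, a FutureWarning about "DataFrame concatenation with empty or all-NA entries" can appear when an empty frame is part of the concatenation; skipping empty frames addresses only that case, not all-NA columns. The helper name `combine_pages` and the sample pages are hypothetical.

```python
import pandas as pd

def combine_pages(pages):
    """Build one DataFrame from a list of pages (each a list of dict records)."""
    frames = [pd.DataFrame(records) for records in pages]
    # Skip empty frames so they never reach pd.concat.
    frames = [frame for frame in frames if not frame.empty]
    if not frames:
        return pd.DataFrame()
    # Concatenate once at the end instead of growing df inside the loop.
    return pd.concat(frames, ignore_index=True)

# Hypothetical API pages; the second page returned no records.
pages = [
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    [],
    [{"id": 3, "name": "c"}],
]
df = combine_pages(pages)
print(df)
```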
0
votes
1
answer
24
views
Python Pandas multi-indexing across column levels using .loc
I am still new to Python and pandas and want to know if there's a better way to go about the indexing problem I'm having. Since I've seen people doing pretty slick things on this site, beyond what I ...
1
vote
0
answers
85
views
+100
Proportionately split dataframe with multiple target columns
I have a dataframe with 30 rows and 10 columns. 5 of the columns are input features and the other 5 are output/target columns. The target columns contain classes represented as 0, 1, 2. I want to ...
0
votes
0
answers
15
views
Normalizing Dataframe with a json column
Hi, I am looking for an efficient way to normalise a data frame that contains a column with JSON data.
I get a JSON response from a website that is saved as a dataframe.
The structure is shown below
...
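The structure referred to in the excerpt is not visible here, so the following is only a sketch under assumptions: that the data is in pandas, and that the column (called `payload` here, a hypothetical name) holds JSON strings with nested objects. `pd.json_normalize` flattens the parsed records into dotted column names, which can then be joined back onto the remaining columns.

```python
import json
import pandas as pd

# Hypothetical frame: "payload" stands in for the JSON column from the question.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "payload": [
            '{"user": {"name": "a", "age": 30}}',
            '{"user": {"name": "b", "age": 25}}',
        ],
    }
)

# Parse each JSON string into a dict, then flatten the nested keys.
parsed = df["payload"].apply(json.loads)
flat = pd.json_normalize(parsed.tolist())

# Join on the shared default index and drop the raw JSON column.
result = df.drop(columns="payload").join(flat)
print(result)
```

If the column already holds dicts rather than strings, the `json.loads` step can be skipped.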
1
vote
2
answers
32
views
Plot datetime data as just month not month-year
I have a data frame that looks like this:
structure(list(date = structure(c(1592611200, 1624665600, 1626480000,
1620086400, 1624147200, 1624752000, 1626566400, 1.566e+09, 1621036800,
1651536000), ...
0
votes
2
answers
32
views
Find average temperature from a range of datetime for each day in dataframe
This is a subset of the dataframe I have:
structure(list(name = c("waldorf", "waldorf", "waldorf", "waldorf",
"waldorf", "waldorf", "...
1
vote
2
answers
38
views
In a dataframe, replace values from one column with multiple conditions and not in the same row to another column
I am trying to transfer values from one column to another column in a dataframe, with multiple conditions and not in the same row.
Values from Columns 'BEGUZ_H' and 'ENDUZ_H' to Columns 'BEGUZ' and '...
4
votes
2
answers
50
views
How do I categorize projects in a dataframe according to their titles?
I have a dataframe where I want to categorize energy-related projects into 4 different topics according to their titles.
For that I want to use pre-defined keywords to identify which topic the project ...
1
vote
4
answers
99
views
diagonal average calculation for my df in pandas
data = {
'SP': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
'State': ['cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', '...
431
votes
18
answers
521k
views
pandas get rows which are NOT in other dataframe
I have two pandas data frames that have some rows in common.
Suppose dataframe2 is a subset of dataframe1.
How can I get the rows of dataframe1 which are not in dataframe2?
df1 = pandas.DataFrame(data = ...
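The question's own data is truncated above, so the frames below are hypothetical stand-ins. One way to get the rows of `df1` that are not in `df2` is a left merge with `indicator=True`, keeping only the rows marked `left_only`. This sketch assumes both frames share the same columns, so the merge matches on all of them.

```python
import pandas as pd

# Hypothetical sample frames; df2 is a subset of df1.
df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [10, 11, 12, 13, 14]})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]})

# Left merge on all shared columns; "_merge" records where each row was found.
# drop_duplicates() keeps repeated df2 rows from multiplying df1 rows.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Keep rows that appear only in df1, then drop the helper column.
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Matching on whole rows this way avoids the pitfall of checking each column separately with `isin`, which can match values that never occur together in the same row of `df2`.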
3
votes
3
answers
54
views
How to reorder duplicate answers in a Polars dataframe
I have a Polars dataframe that contains multiple questions and answers. The problem is that each answer is contained in its own column, which means that I have a lot of redundant information. ...
0
votes
2
answers
29
views
How to get nested xml structure as a string from an xml document using xpath in pyspark dataframe?
I have a dataframe with a string column containing XML. Now I want to create a new column with a nested XML structure from the original column. For this, I tried using XPath in PySpark.
...
0
votes
0
answers
50
views
Why are data written to Excel not starting from the 'A' column?
I'm using pandas to copy data from one Excel file to another, and the data are being copied, just not to the right place.
I have this function that reads the data:
def updated_file(self, progress_bar):
...
1
vote
1
answer
18
views
avg() over a whole dataframe causing different output
I see that dataframe.agg(avg(Col)) works fine, but when I calculate avg() over a window spanning the whole column (not using any partition), I see different results depending on which column I use with orderBy.
...