Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, rows are observations and columns are variables, and the columns are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
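
To make the definition concrete, here is a minimal, library-free sketch of the idea in plain Python: one list per column, each column with its own type, and each row read across the columns. The names `frame`, `n_rows`, `row` and the sample values are invented for illustration and are not part of any data frame library.

```python
# A data frame sketched as a dict of equal-length columns (hypothetical data).
frame = {
    "name": ["Ada", "Grace", "Linus"],  # str column
    "age": [36, 45, 28],                # int column
    "score": [9.5, 8.7, 7.2],           # float column
}


def n_rows(frame):
    """Return the number of rows, checking that all columns have equal length."""
    lengths = {len(col) for col in frame.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return lengths.pop() if lengths else 0


def row(frame, i):
    """Return observation i as a mapping from column name to value."""
    return {name: col[i] for name, col in frame.items()}


# Each column (variable) carries its own type, unlike a homogeneous matrix.
column_types = {name: type(col[0]).__name__ for name, col in frame.items()}

print(n_rows(frame))
print(row(frame, 1))
print(column_types)
```

Real implementations such as pandas or R's `data.frame` add indexing, missing-value handling and vectorised operations on top of this basic column-oriented layout.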

0 votes
0 answers
7 views

Unable to delimit text into columns in R after pdf parsing

I have converted a PDF to text and am trying to delimit the text, which is not working properly. library(tidyverse) library(pdftools) library(lubridate) pdf_rowwise <- strsplit(pdf_text("V://path//...
0 votes
0 answers
11 views

FutureWarning on The behavior of DataFrame concatenation

I am retrieving data from an API which contains multiple pages. The required fields from the initial page are added to a pandas dataframe - in my code this variable is originally defined as df. From ...
0 votes
1 answer
24 views

Python Pandas multi-indexing across column levels using .loc

I am still new to python and pandas and want to know if there's a better way to go about the indexing problem I'm having. Since I've seen people doing pretty slick things on this site, beyond what I ...
1 vote
0 answers
85 views
+100

Proportionately split dataframe with multiple target columns

I have a dataframe with 30 rows and 10 columns. 5 of the columns are input features and the other 5 are output/target columns. The target columns contain classes represented as 0, 1, 2. I want to ...
0 votes
0 answers
15 views

Normalizing Dataframe with a json column

Hi, I am looking for an efficient way to normalise a data frame that contains a column with JSON data. I get a JSON response from a website that is saved as a dataframe. The structure is shown below ...
1 vote
2 answers
32 views

Plot datetime data as just month not month-year

I have a data frame that looks like this: structure(list(date = structure(c(1592611200, 1624665600, 1626480000, 1620086400, 1624147200, 1624752000, 1626566400, 1.566e+09, 1621036800, 1651536000), ...
0 votes
2 answers
32 views

Find average temperature from a range of datetime for each day in dataframe

This is a subset of the dataframe I have: structure(list(name = c("waldorf", "waldorf", "waldorf", "waldorf", "waldorf", "waldorf", "...
1 vote
2 answers
38 views

In a dataframe, replace values from one column with multiple conditions and not in the same row to another column

I am trying to transfer values from one column to another column in a dataframe, with multiple conditions and not in the same row. Values from Columns 'BEGUZ_H' and 'ENDUZ_H' to Columns 'BEGUZ' and '...
4 votes
2 answers
50 views

How do I categorize projects in a dataframe according to its title?

I have a dataframe where I want to categorize energy-related projects into 4 different topics according to their titles. For that I want to use pre-defined keywords to identify which topic the project ...
1 vote
4 answers
99 views

diagonal average calculation for my df in pandas

data = { 'SP': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], 'State': ['cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', 'cm', '...
431 votes
18 answers
521k views

pandas get rows which are NOT in other dataframe

I've two pandas data frames that have some rows in common. Suppose dataframe2 is a subset of dataframe1. How can I get the rows of dataframe1 which are not in dataframe2? df1 = pandas.DataFrame(data = ...
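
One common way to approach this kind of "rows in one frame but not the other" question is an anti-join via `DataFrame.merge` with `indicator=True`. The sketch below is only an illustration: the frames `left` and `right` and their values are invented here and are not the (truncated) data from the question. It also assumes that rows should be compared on all shared columns.

```python
import pandas as pd

# Hypothetical frames; `right` is a subset of `left`.
left = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [10, 11, 12, 13, 14]})
right = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]})

# A left merge on all common columns, with a `_merge` column that records
# whether each row was found in both frames or only in the left one.
merged = left.merge(right.drop_duplicates(), how="left", indicator=True)

# Keep the rows that appear only in `left`, then drop the helper column.
only_in_left = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

print(only_in_left)
```

Dropping duplicates from `right` before merging keeps a repeated row there from multiplying the matching rows of `left`.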
3 votes
3 answers
54 views

How to re order duplicates answers on polars dataframe

I have a Polars dataframe that contains multiple questions and answers. The problem is that each answer is contained in its own column, which means that I have a lot of redundant information. ...
0 votes
2 answers
29 views

How to get nested xml structure as a string from an xml document using xpath in pyspark dataframe?

I have a dataframe with a string datatype column containing an XML string. Now I want to create a new column with a nested XML structure from the original column. For this, I tried using XPath in PySpark. ...
0 votes
0 answers
50 views

Why data that are being written to excel are not starting from the 'A' column?

I'm using pandas to copy data from one Excel file to another, and the data are being copied, just not to the right place. I have this function that reads the data: def updated_file(self, progress_bar): ...
1 vote
1 answer
18 views

avg() over a whole dataframe causing different output

I see that dataframe.agg(avg(Col)) works fine, but when I calculate avg() over a window over the whole column (not using any partition), I see different results based on which column I use with orderBy. ...
