Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

0 votes
0 answers
32 views

Why are the data written to Excel not starting from the 'A' column?

I'm using pandas to copy data from one Excel file to another, but the data are not being copied to the right place. I have this function that reads the data: def updated_file(self, progress_bar): ...
Jugert Mucoimaj's user avatar
0 votes
0 answers
21 views

Is using a Pandas Dataframe as a read-only table scalable in a Flask App?

I'm developing a small website in Flask that relies on data from a CSV file to output data to a table on the frontend using JQuery. The user would select an ID from a drop-down on the front-end, then ...
GreenGodot's user avatar
  • 6,580
0 votes
0 answers
22 views

Pandas check if a column has NaT type, unable to find date diff with NaT values [duplicate]

I have two columns, StartDate and ExitDate, in my dataframe, with NaT values in the ExitDate column. I wish to create a third column, Tenure, by finding the difference between ExitDate and StartDate. StartDate ...
Vinita's user avatar
  • 1,842
1 vote
1 answer
26 views

Python Pandas difference in boolean indexing between ~ != and ==

I am confused about the different results of boolean indexing when using ~ after != versus when using just ==. I have a pandas df with 4 columns: dic = { "a": [1,1,1,0,0,1,1], "b"...
Martin's user avatar
  • 25
0 votes
0 answers
9 views

FutureWarning in emobpy: incompatible dtype assignment with Pandas DataFrame

I am using the emobpy library to set custom rules for a mobility analysis, but I encounter a FutureWarning about incompatible data types when trying to modify DataFrame items. Here's the problematic ...
OUSSAMA ZIADI's user avatar
-3 votes
0 answers
22 views

I cannot find the libraries I installed, such as pandas and opencv [closed]

My problem is that I installed libraries such as numpy, opencv, pandas, and many others from the command prompt on Windows 10, but when I open the Python development environment, PyCharm, and ...
Bello's user avatar
  • 1
-1 votes
1 answer
32 views

how do you merge values in rows, replace nan values in pandas

I am doing some manipulation on a data frame: df Node Interface Speed carrier 1-May 9-May 2-Jun 21-Jun Server1 internet1 10 ATT 20 30 ...
user1471980's user avatar
  • 10.5k
0 votes
0 answers
20 views

text_auto Parameter Not Working in Plotly

The text_auto parameter for a Plotly Express bar chart is not functioning for me, despite seemingly correct syntax. I am using both Jupyter Notebook and Eclipse and the issue persists in both. Plotly ...
Lysyd's user avatar
  • 1
0 votes
2 answers
41 views

Sort Pandas dataframe by Sub Total and count

I have a very large dataset called bin_df. Using pandas and the following code I've assigned sub-total "Total" to each group: bin_df = df[df["category"].isin(model....
Charlotte's user avatar
  • 411
0 votes
3 answers
56 views

How to find rows with value on either side of a given value?

Python, Pandas, I have a dataframe containing datetimes and values. # Create an empty DataFrame with 'timestamp' and 'value' columns df = pd.DataFrame(columns=['timestamp', 'value']) df.set_index('...
Dave's user avatar
  • 401
-1 votes
1 answer
31 views

pd.to_datetime() not consistently working to convert objects

I have been working with this data (csv) that exists in an AWS S3 bucket. When I am pulling the data I have to transform all the columns to their correct dtypes. All other dtypes are working properly ...
Keegan Husom's user avatar
1 vote
1 answer
67 views

How can I filter df “A” using as a condition a comparison to df “B”?

I’ve got 2 dataframes, dfA and dfB, with different shapes and with different orders. dfA is contained in dfB. There are 3 columns in this example, “Job Title”, “Job Department” and “Job Salary”. dfA ...
Alex's user avatar
  • 17
-1 votes
0 answers
30 views

Index into fields of a DataFrame row without needing a row index? [duplicate]

In pandas, is there a way to work with one row of a DataFrame at a time, and for each row, indexing into the columns by name but without indexing into the rows? My current approach is (say) modifying ...
user2153235's user avatar
  • 1,052
0 votes
1 answer
25 views

Pandas UDF to derive new column

In Spark/Databricks, I have a pandas dataframe with a string column. I need to perform multiple actions on this column (data cleansing type stuff), and produce a new column from the result. Here's my ...
Andrew's user avatar
  • 8,613
-5 votes
0 answers
39 views

Pandas .isin returns an empty dataframe even though I know the data is there [closed]

I have two dataframes, one titled small and one titled big. Small has a column called 'Part Number' while big has columns called 'fpartno' and 'fgroup'. I want to find all values in the 'Part Number' ...
Karl Boma's user avatar
-1 votes
0 answers
44 views

How to split info in a single row in Excel into columns using Python [duplicate]

I have read a CSV file using pd.read_csv(). I am trying to clean up this data, but it is proving a bit difficult. Essentially all of the information is in a single column and row 1 and I need to split ...
rogue1's user avatar
  • 1
2 votes
3 answers
57 views

How to convert JSONL to parquet efficiently?

Given a jsonl file like this: {"abc1": "hello world", "foo2": "foo bar"} {"foo2": "bar bar blah", "foo3": "blah foo"} ...
alvas's user avatar
  • 120k
0 votes
0 answers
29 views

Python threads 'starved' by pandas operations

I am creating a UI application with Qt in Python. It performs operations on pandas DataFrames in a separate threading.Thread to keep the UI responsive; no individual pandas instruction takes noticeable ...
AirToTec's user avatar
0 votes
0 answers
32 views

Minimize RAM usage of pandas operations in python

I have a Python function using pandas that does operations on some dataframes. This Python function currently consumes a lot of RAM. I have tried to minimize RAM usage as much as possible but ...
BD O's user avatar
  • 13
-1 votes
1 answer
37 views

How do I perform a smear between two dataframes in python/pandas? [duplicate]

I have two dataframes and I need to perform a smear (if that is what it's generally called). Basically the first one is smaller (5 million rows) and the other is 40 million rows. I want to add the ...
babyface's user avatar
-2 votes
1 answer
62 views

Convert time into seconds [removing milliseconds]

I have a Pandas dataframe where the column named 'A' holds date and time values (currently of type string). Column A Column B 2024-07-11 13:09:37.466 PC2 2024-07-11 13:24:43.03 PC1 May 6 2024 22:49:...
FarahR's user avatar
  • 3
0 votes
0 answers
13 views

A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]

I am aware that this is a common issue, but I am confused why I am getting it here: train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0) I am doing exactly what the warning says: ...
Baron Yugovich's user avatar
1 vote
1 answer
31 views

how do you pick the max value of each row of certain columns in pandas

I have this data frame: df Node Interface Speed Band_In carrier 1-Jun 10-Jun Server1 wan1 100 80 ATT 80 30 Server1 wan2 ...
user1471980's user avatar
  • 10.5k
2 votes
2 answers
38 views

How to compare lists in two Pandas dataframes to get the common elements?

I want to compare lists from columns set_1 and set_2 in df_2 with ins column in df_1 to find all common elements. I've started doing it for one row and one column but I have no idea how to compare all ...
emor's user avatar
  • 157
-5 votes
0 answers
45 views

Pandas introducing lineterminators via to_csv without cause or reason [closed]

I've bug checked this thoroughly. I know that the bug is introduced when outputting to csv via the df.to_csv method. The method is randomly adding lineterminators which aren't called for in any way. I ...
Josh's user avatar
  • 1
0 votes
1 answer
20 views

RDKit PandasTools WriteSDF: RuntimeError: Bad pickle format: unexpected End-of-File while reading

I face the error: PandasTools.WriteSDF(pp, args.output_file, molColName='ID', properties=list(pp.columns)) File "/scratch/micromamba/envs/biotools_py39/lib/python3.9/site-packages/rdkit/Chem/...
M.Vu's user avatar
  • 460
-1 votes
0 answers
25 views

NameError Traceback (most recent call last) <ipython-input-3-9ec55f7a7976> in <module> : NameError: name 'books' is not defined

I am trying to plot the evolution of degree centrality over the books for some of the characters from Game of Thrones. I have a list evol that contains the computed degree centrality from all the ...
acharyabibash's user avatar
0 votes
1 answer
61 views

How do I handle merged cells in Excel using Pandas parse function?

I have an Excel file with merged columns and rows, and I want to read the excel file and parse it to convert it into a DataFrame. This is just a small example of what happened because the real data ...
RMB's user avatar
  • 1
0 votes
1 answer
51 views

Multi-level rolling average with missing values

I have data on frequencies (N), for combinations of [from, to, subset], and the month. Importantly, when N=0, the row is missing. N from to subset month ...
FooBar's user avatar
  • 16.3k
0 votes
0 answers
22 views

Pandas to_sql takes forever with Google Cloud SQL

I'm attempting to insert some data into Google Cloud SQL (running postgres) and it takes forever. It takes roughly 1 minute to insert 10 rows. I am not doing anything fancy, just initializing the ...
wizmer's user avatar
  • 930
2 votes
1 answer
46 views

How to convert timedelta to integer in pandas dataframe

I am trying to convert timedelta to integer. time = (pd.to_datetime(each_date2)-pd.to_datetime(each_date1)) pd.to_numeric(time, downcast='integer') time has following value: Timedelta('7 days 00:00:...
mona's user avatar
  • 91
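
A minimal sketch for the question above, assuming the goal is the whole number of days held in a single Timedelta. The names each_date1 and each_date2 come from the excerpt, but the date values below are made up for illustration.

```python
import pandas as pd

# Hypothetical dates (not from the question), chosen so the gap is exactly 7 days.
each_date1 = "2024-07-01"
each_date2 = "2024-07-08"

time = pd.to_datetime(each_date2) - pd.to_datetime(each_date1)

# A scalar Timedelta exposes its whole-day component as a plain int.
days = time.days

# total_seconds() returns a float; cast it if an integer count of seconds is wanted.
seconds = int(time.total_seconds())
```

For a whole column of timedeltas, Series.dt.days returns the same day component row by row.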
0 votes
1 answer
45 views

String value from JSON response becomes a numeric value in Python pandas dataframe

I am using Python to pull data out of a REST API and store it in a SQL Database. Everything works fine except for one JSON value in the response. JSON Response [ { "pbxId": "...
Genius86's user avatar
0 votes
0 answers
45 views

Spark EOF Error (Parquet Read from S3)- Spark to Pandas conversion

I am reading close to 1 million rows stored in S3 as parquet files into a dataframe (900 MB size data in a bucket). Filtering the dataframe based on values and then later converting to a Pandas ...
Don Woodward's user avatar
1 vote
1 answer
39 views

How to replace text using a regex in a Pandas dataframe [duplicate]

I have the following dataset: meste = pd.DataFrame({'a':['06/33','40/2','05/22']}) a 0 06/33 1 40/2 2 05/22 And I want to remove the leading 0s in the text (06/33 to 6/33 for example). I ...
Alexis's user avatar
  • 2,242
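
One possible approach to the question above, sketched under the assumption that only zeros at the very start of each string should be removed (zeros after the slash are left untouched). It uses the meste DataFrame from the excerpt.

```python
import pandas as pd

meste = pd.DataFrame({'a': ['06/33', '40/2', '05/22']})

# ^0+ matches one or more zeros only at the start of each string,
# so the zero inside '40/2' is not affected.
meste['a'] = meste['a'].str.replace(r'^0+', '', regex=True)
```

Passing regex=True explicitly matters here, because recent pandas versions treat the pattern as a literal string by default.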
0 votes
1 answer
37 views

Subtract dataframe into subdataframes using pandas

I have a large dataframe and I want to subtract this dataframe into smaller dataframes based on two conditions. Below is a small piece of the dataframe: | | id |outcome| | -----...
WilliamAshoti's user avatar
1 vote
2 answers
33 views

Group By Two Variables and then Create New Column which is the Value of One Variable Based on the Value of Another Variable in Python (pandas)

I can do this in R but have no idea how to do this in Python. I have data with sbj, num_item, visit, and height. I want to create baseline_height using pandas. Ex: sbj num_item visit height ...
NICE8x's user avatar
  • 11
1 vote
2 answers
55 views

Pandas dataframe groupby apply function with variable number of arguments

I have a pandas dataframe that looks like import pandas as pd data = { "Race_ID": [2,2,2,2,2,5,5,5,5,5,5], "Student_ID": [1,2,3,4,5,9,10,2,3,6,5], "theta": [8,9,2,...
Ishigami's user avatar
  • 239
1 vote
2 answers
57 views

How to find the number of rows within a group since a nonzero value occurred for a pandas dataframe?

I have a dataframe like so: ID value A 0 A 1 A 0 A 0 B 0 B 0 B 2 B 0 B 4 B 0 I want to add a column that counts the number of rows since a nonzero value occurred within the group (in this ...
mdrishan's user avatar
  • 499
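
A rough sketch for the question above, using the ID/value data shown in the excerpt. The expected output is cut off in the excerpt, so the conventions here are assumptions: the counter resets to 0 on each nonzero row, and rows before the first nonzero value in a group are left as NaN. The column name since_nonzero is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    "value": [0, 1, 0, 0, 0, 0, 2, 0, 4, 0],
})

# Number the runs within each ID: the counter increases at every nonzero value,
# so block 0 means "no nonzero value seen yet in this group".
blocks = df["value"].ne(0).groupby(df["ID"]).cumsum()

# Position of each row inside its (ID, block) run, starting at 0 on the nonzero row.
since = df.groupby(["ID", blocks]).cumcount()

# Assumed convention: rows before the first nonzero value get NaN.
df["since_nonzero"] = since.where(blocks > 0)
```

If a different convention is wanted (for example, counting from 1 or filling the leading rows with 0), only the last line needs to change.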
0 votes
1 answer
35 views

Create a new column based on other columns for time series data in pandas

I have the following pandas dataframe with columns May, June, and July. Month June July Aug June a d g July b e h Aug c f i I want to create several new columns with a 1 month forecast, 2 month ...
kmm2204's user avatar
2 votes
1 answer
41 views

how do you sort column names in Date in descending order in pandas

I have this DataFrame: Node Interface Speed Band_In carrier Date Server1 wan1 100 80 ATT 2024-05-09 Server1 wan1 100 50 ...
user1471980's user avatar
  • 10.5k
0 votes
1 answer
47 views

Keeping a running total of quantities while matching items and dates within a range

I'm attempting to match job lines to purchase orders on items within a date range while tracking the available quantity of the items. If I have three dataframes: joblines = pd.DataFrame({ '...
Warcupine's user avatar
  • 4,590
0 votes
2 answers
56 views

Vectorized way to check if a string is in a dataframe column (set of strings)?

I have a pandas dataframe df. This dataframe has a column to_filter. to_filter is either an empty set or a set of strings. This dataframe also has an integer column id. The id may not be unique. Given ...
roulette01's user avatar
  • 2,354
0 votes
0 answers
16 views

Concurrency Control Mechanism For Dataframe Processing In Django WebApp

I have a Django web app that processes Excel file data directly with pandas DataFrames. Now I want to add concurrency control to these operations so that multiple requests can be processed simultaneously. Suggest ...
Enthu Learner's user avatar
0 votes
3 answers
38 views

How to use Python Pandas Groupby for multiple columns?

I have a dataframe that I am trying to do some calculations on and add a few columns. Here is an example of the input dataframe: df1: Index Type Product Late or On Time 0 A X ...
hobbsac's user avatar
  • 21
1 vote
1 answer
32 views

Apply sklearn logloss with rolling on pandas dataframe

My function call looks something like loss = log_loss(y_true=validate_d['y'], y_pred=validate_probs, sample_weight=validate_df['weight'], normalize=True) Is there any way to combine this with pandas ...
Baron Yugovich's user avatar
-2 votes
1 answer
48 views

trying to find out the logic of this page: approx ++ 100 results stored - and parsed with Python & BS4

I am trying to find out the logic behind this page. We have stored some results in the following db: https://www.raiffeisen.ch/rch/de/ueber-uns/raiffeisen-gruppe/organisation/raiffeisenbanken/...
zero's user avatar
  • 1,221
1 vote
1 answer
22 views

finding the minimum value of matched rows between two dataframes

I have two data frames: import pandas as pd exam_1 = pd.DataFrame({'user': ['A', 'B', 'C'], 'marks': [10, 50, 40]}) exam_2 = pd.DataFrame({'user': ['A', 'C', 'D'], ...
Naga's user avatar
  • 301
0 votes
2 answers
71 views

What is the best practice to calculate global frequency of list of elements with exact orders in python within multiple pandas dataframe?

Let's say I have the following dataframe df1 corresponding to user1: +-------------------+-------+--------+-------+-------+----------+----------------+ | Models | MAE | MSE | RMSE | ...
Mario's user avatar
  • 1,831
-2 votes
1 answer
60 views

Can I drop a row twice? [closed]

I want to drop rows with outliers in two different columns, and some of the outliers are present in both columns, so after I drop them in the first column, it drops them fine, but when I try to drop ...
Saad Kamboh's user avatar
0 votes
1 answer
30 views

Efficient calculation of volatility using EWMA

I am trying to calculate the volatility using EWMA (Exponentially Weighted Moving Average). Here is the function I developed: def ewm_std(x, param=0.99): n = len(x) coefs = param ** np.arange(...
NCall's user avatar
  • 121
