Questions tagged [pandas]
Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.
pandas
288,143
questions
-1
votes
0
answers
44
views
How to split info in a single row in Excel into columns using Python [duplicate]
I have read a CSV file using pd.read_csv()
I am trying to clean up this data but it is proving a bit difficult.
Essentially all of the information is in a single column and row 1 and I need to split ...
2
votes
3
answers
57
views
How to convert JSONL to parquet efficiently?
Given a jsonl file like this:
{"abc1": "hello world", "foo2": "foo bar"}
{"foo2": "bar bar blah", "foo3": "blah foo"}
...
0
votes
0
answers
29
views
Python threads 'starved' by pandas operations
I am creating a UI application with Qt in Python. It performs operations on pandas DataFrames in a separate threading.Thread to keep the UI responsive; no individual pandas instruction takes noticeable ...
0
votes
0
answers
32
views
Minimize RAM usage of pandas operations in python
I have a Python function using pandas that does operations on some dataframes. This function currently consumes a lot of RAM. I have tried to minimize RAM usage as much as possible but ...
-1
votes
1
answer
37
views
How do I perform a smear between two dataframes in python/pandas? [duplicate]
I have two dataframes and I need to perform a smear (if that is what it's generally called). Basically the first one is smaller (5 million rows) and the other is 40 million rows. I want to add the ...
-2
votes
1
answer
62
views
Convert time into seconds [removing milliseconds]
I have a pandas DataFrame in which column 'A' holds date and time values (currently stored as strings).
Column A                  Column B
2024-07-11 13:09:37.466   PC2
2024-07-11 13:24:43.03    PC1
May 6 2024 22:49:...
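A minimal sketch of one way to drop the fractional seconds, assuming the strings share the ISO-like layout of the first two sample rows (the truncated third value uses a different layout and is not handled here). It parses the column with `pd.to_datetime` and rounds down to whole seconds with `Series.dt.floor`. The column names mirror the excerpt.

```python
import pandas as pd

# Sample rows copied from the excerpt; only the ISO-like values are used.
df = pd.DataFrame(
    {
        "A": ["2024-07-11 13:09:37.466", "2024-07-11 13:24:43.03"],
        "B": ["PC2", "PC1"],
    }
)

# Parse the strings into datetime64 values, then round down to whole seconds.
df["A"] = pd.to_datetime(df["A"]).dt.floor("s")

print(df)
```

Flooring keeps the column as a datetime type, so later comparisons and resampling still work. Formatting back to strings with `dt.strftime` would be a separate step.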
0
votes
0
answers
13
views
A value is trying to be set on a copy of a slice from a DataFrame while using loc [duplicate]
I am aware that this is a common issue, but I am confused why I am getting it here:
train_df.loc[:,'decision'] = np.where(train_probs[:,1]>cutoff, 1, 0)
I am doing exactly what the warning says:
...
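One common cause, which the excerpt neither confirms nor rules out, is that `train_df` was itself created by slicing another DataFrame, so pandas can flag the later assignment even though it uses `.loc`. A minimal sketch under that assumption, with hypothetical `full_df`, `train_probs`, and `cutoff` values: taking an explicit `.copy()` where the subset is created makes it an independent DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's frames.
full_df = pd.DataFrame(
    {
        "feature": [0.2, 0.8, 0.5, 0.9],
        "split": ["train", "train", "test", "train"],
    }
)

# Explicit copy: train_df is now independent of full_df, not a slice of it.
train_df = full_df[full_df["split"] == "train"].copy()

# Hypothetical class probabilities, one row per training example.
train_probs = np.array([[0.7, 0.3], [0.1, 0.9], [0.4, 0.6]])
cutoff = 0.5

# Same assignment as in the question, now applied to the copied frame.
train_df.loc[:, "decision"] = np.where(train_probs[:, 1] > cutoff, 1, 0)

print(train_df)
```

The original `full_df` is left untouched, which is usually what is intended when a training subset gets its own derived columns.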
1
vote
1
answer
31
views
How do you pick the max value of each row of certain columns in pandas?
I have this data frame:
df
Node Interface Speed Band_In carrier 1-Jun 10-Jun
Server1 wan1 100 80 ATT 80 30
Server1 wan2 ...
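A minimal sketch of a row-wise maximum over selected columns, assuming the date-named columns (`1-Jun`, `10-Jun`) are the ones of interest. The first row is taken from the excerpt; the second row and the `max_value` column name are made up for illustration.

```python
import pandas as pd

# First row taken from the excerpt; the second row is hypothetical.
df = pd.DataFrame(
    {
        "Node": ["Server1", "Server1"],
        "Interface": ["wan1", "wan2"],
        "Speed": [100, 100],
        "Band_In": [80, 60],
        "carrier": ["ATT", "ATT"],
        "1-Jun": [80, 20],
        "10-Jun": [30, 45],
    }
)

# Columns to compare; assumed to be the date-named ones.
date_cols = ["1-Jun", "10-Jun"]

# axis=1 takes the maximum across the selected columns within each row.
df["max_value"] = df[date_cols].max(axis=1)

print(df[["Node", "Interface", "max_value"]])
```

To also learn which column held the maximum, `df[date_cols].idxmax(axis=1)` returns the column label for each row.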
2
votes
2
answers
38
views
How to compare lists in two Pandas dataframes to get the common elements?
I want to compare lists from columns set_1 and set_2 in df_2 with ins column in df_1 to find all common elements.
I've started doing it for one row and one column but I have no idea how to compare all ...
-5
votes
0
answers
46
views
Pandas introducing lineterminators via to_csv without cause or reason [closed]
I've bug checked this thoroughly. I know that the bug is introduced when outputting to csv via the df.to_csv method.
The method is randomly adding lineterminators which aren't called for in any way.
I ...
0
votes
1
answer
20
views
RDKit PandasTools WriteSDF: RuntimeError: Bad pickle format: unexpected End-of-File while reading
I face the error:
PandasTools.WriteSDF(pp, args.output_file, molColName='ID', properties=list(pp.columns))
File "/scratch/micromamba/envs/biotools_py39/lib/python3.9/site-packages/rdkit/Chem/...
-1
votes
0
answers
25
views
NameError Traceback (most recent call last) <ipython-input-3-9ec55f7a7976> in <module> : NameError: name 'books' is not defined
I am trying to plot the evolution of degree centrality over the books for some of the characters from Game of Thrones. I have a list evol that contains the computed degree centrality from all the ...
0
votes
1
answer
61
views
How do I handle merged cells in Excel using Pandas parse function?
I have an Excel file with merged columns and rows, and I want to read the Excel file and parse it to convert it into a DataFrame.
This is just a small example of what happened because the real data ...
0
votes
1
answer
51
views
Multi-level rolling average with missing values
I have data on frequencies (N), for combinations of [from, to, subset], and the month. Importantly, when N=0, the row is missing.
N from to subset
month ...
0
votes
0
answers
22
views
Pandas to_sql takes forever with Google Cloud SQL
I'm attempting to insert some data into Google Cloud SQL (running PostgreSQL) and it takes forever.
It takes roughly 1 minute to insert 10 rows.
I am not doing anything fancy, just initializing the ...