Questions tagged [pandas]


Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

65,609 questions with no upvoted or accepted answers

14 votes

2 answers

1k views

OperationalError: (sqlite3.OperationalError) too many SQL variables, while using SQL with dataframes

I have a pandas dataframe as below. activity User_Id \ 0 VIEWED MOVIE 158d292ec18a49 1 VIEWED MOVIE 158d292ec18a49 2 VIEWED MOVIE 158d292ec18a49 3 VIEWED MOVIE ...

Sarang Manjrekar

1,971

asked May 24, 2018 at 10:07
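A minimal sketch of one common workaround, assuming the error comes from DataFrame.to_sql writing to SQLite with method="multi". In that mode a single INSERT binds one parameter per cell, so the fix is a chunksize small enough that rows × columns stays under SQLite's variable limit. The limit of 999 is an assumption: it is the default in older SQLite builds, and newer builds allow more. The table name, helper name and data are hypothetical.

```python
import sqlite3

import pandas as pd


def write_in_chunks(df: pd.DataFrame, table: str, conn, max_vars: int = 999) -> int:
    """Write df to SQLite so no single INSERT binds more than max_vars parameters."""
    # With method="multi", one INSERT binds rows_per_chunk * n_columns parameters.
    rows_per_chunk = max(1, max_vars // len(df.columns))
    df.to_sql(table, conn, if_exists="replace", index=False,
              method="multi", chunksize=rows_per_chunk)
    return rows_per_chunk


# Hypothetical data shaped like the activity log in the question.
df = pd.DataFrame({
    "activity": ["VIEWED MOVIE"] * 5000,
    "User_Id": ["158d292ec18a49"] * 5000,
})

conn = sqlite3.connect(":memory:")
chunk = write_in_chunks(df, "events", conn)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(chunk, count)
```

Using index=False keeps the parameter count equal to the number of data columns. If the index is written too, it adds one more parameter per row.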

12 votes

1 answer

1k views

Pandas HDF5 query, where expression fails

I want to query an HDF5 file. I use df.to_hdf(pfad,'df', format='table') to write the dataframe to disk. To read it, I use hdf = pandas.HDFStore(pfad). I have a list that contains numpy.datetime64 ...

user3276418

1,797

asked Nov 27, 2014 at 11:57

11 votes

0 answers

617 views

Does Pandas use hashing for a single-indexed dataframe and binary searching for a multi-indexed dataframe?

I have always been under the impression that Pandas uses hashing when indexing the rows in a dataframe, so that operations like df.loc[some_label] are O(1). However, I just realized today that this is ...

victorx

3,467

asked Mar 3, 2020 at 3:38
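This does not settle how pandas works internally. It is a small, hedged illustration of a related observable fact: Index.get_loc returns different kinds of results depending on whether the index is unique and whether it is monotonic, which suggests that different lookup paths are involved. The labels are hypothetical.

```python
import pandas as pd

unique_idx = pd.Index(["a", "b", "c"])
sorted_dupes = pd.Index(["a", "a", "b"])
unsorted_dupes = pd.Index(["a", "b", "a"])

# Unique index: a single integer position.
pos = unique_idx.get_loc("b")

# Non-unique but monotonic: a slice covering the contiguous run of matches.
run = sorted_dupes.get_loc("a")

# Non-unique and non-monotonic: a boolean mask over every position.
mask = unsorted_dupes.get_loc("a")

print(pos, run, mask)
```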

11 votes

0 answers

2k views

Memory usage during and after drop_duplicates()

I am working with a data frame that takes up roughly 2 GB of memory (according to htop) with dimensions (6287475,19). The data frame is heterogeneous in data type, which probably does not matter. ...

isalteverything

asked Oct 9, 2014 at 23:58
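A hedged sketch for measuring the frame itself rather than the whole process. DataFrame.memory_usage(deep=True) reports what pandas holds, while htop shows process RSS. RSS may not shrink after rows are dropped, because freed memory can stay with the allocator. By default drop_duplicates returns a new frame, so the old and new copies coexist until the old one is released. The data below is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic frame: 100,000 rows but only 1,000 distinct (key, value) pairs.
df = pd.DataFrame({
    "key": np.arange(100_000) % 1_000,
    "value": np.zeros(100_000),
})

# deep=True also counts the contents of object columns, if there are any.
before = df.memory_usage(deep=True).sum()

# drop_duplicates returns a new frame. Rebinding the name lets the old
# frame be garbage-collected once nothing else references it.
df = df.drop_duplicates()
after = df.memory_usage(deep=True).sum()

print(f"{int(before):,} bytes -> {int(after):,} bytes")
```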

10 votes

3 answers

3k views

Python google-cloud-bigquery Parquet column 'Date' has type INT64 which does not match the target cpp_type INT32

I am trying to upload a DataFrame to a BigQuery table using client.load_table_from_dataframe. And I am also supplying a schema using the job_config = bigquery.LoadJobConfig( schema=[xxxxxx,xxxxxx,...

Siesta

asked Oct 12, 2020 at 11:41

10 votes

1 answer

3k views

XLDateAmbiguous workaround

Reading Excel files into Python often means tripping over the Excel leap year issue. This is described in many posts, but none offer a convenient solution. So this is what I'm asking here. With code ...

dmvianna

15.6k

asked Dec 5, 2013 at 5:06
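A minimal sketch of one workaround, assuming the dates arrive as raw Excel serial numbers in the 1900 date system. Counting days from 1899-12-30 gives the correct calendar date for serial 61 (1900-03-01) and later. Serial 60 is Excel's nonexistent 1900-02-29, and lower serials would come out one day off. The sketch therefore marks anything below 61 as missing instead of guessing. The function name is hypothetical, and workbooks that use the 1904 date system would need a different origin.

```python
import pandas as pd


def excel_serial_to_datetime(serials: pd.Series) -> pd.Series:
    """Convert 1900-system Excel serials to Timestamps; serials < 61 become NaT."""
    # Serial 60 is the fictitious 1900-02-29, and serials below it map one day
    # off with this origin, so they are masked rather than silently shifted.
    safe = serials.where(serials >= 61)
    return pd.to_datetime(safe, unit="D", origin="1899-12-30")


out = excel_serial_to_datetime(pd.Series([43831, 60, 61]))
print(out)
```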

10 votes

3 answers

3k views

Changing style of pandas.DataFrame: Permanently?

When I change the style of a pandas.DataFrame, for instance like so # color these columns color_columns = ['roi', 'percent_of_ath'] (portfolio_df .style ...

Ugur

2,004

asked May 16, 2019 at 21:14

9 votes

0 answers

5k views

ArrowInvalid: GetFileInfo() yielded path which is outside base dir parquet

I have a parquet dataset stored in my S3 bucket with multiple partition files. I want to read it into my pandas dataframe, but I am getting this ArrowInvalid error, which I didn't get before. Occasionally, ...

Wassadamo

1,306

asked Apr 28, 2022 at 18:09

9 votes

1 answer

2k views

Save sparse pandas dataframe to different file types

I'm working with IoT data from one source, which sends tons of GB of sparse data from sensor readings. To make snapshots for analysis, I try to export them to a small file and read it later as sparse ...

Coffeemug13

asked Mar 31, 2020 at 17:18
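One hedged option for local snapshots is to convert the columns to a sparse extension dtype and save with pickle, which round-trips pandas dtypes, SparseDtype included. Pickle is Python-specific and unsafe to load from untrusted sources, so it suits local snapshots rather than data exchange. The sensor column names and values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sensor readings: mostly zeros with a few non-zero values.
dense = pd.DataFrame({
    "sensor_a": [0.0, 0.0, 1.5, 0.0, 0.0, 0.0],
    "sensor_b": [0.0, 2.0, 0.0, 0.0, 0.0, 0.0],
})

# Store zeros implicitly by using a sparse extension dtype with fill_value=0.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

# Pickle preserves extension dtypes, so the frame comes back still sparse.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.pkl")
    sparse.to_pickle(path)
    restored = pd.read_pickle(path)

# Fraction of cells that are actually stored (not equal to the fill value).
print(restored.dtypes, restored.sparse.density)
```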

9 votes

1 answer

5k views

Why does pandas `loc` throw `KeyError` with column name?

I have a data frame that is given this initial construct: df_data = pd.DataFrame(columns=['name','date','c1','c2']).set_index(['name','date']) I then have code to fill this frame from a data base. ...

Brick

4,157

asked Nov 25, 2019 at 19:41
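The question is truncated, so this is a hedged sketch of one common cause. After set_index(['name', 'date']), those labels become index levels and are no longer columns, so df_data.loc[:, 'name'] raises KeyError. The rows are hypothetical stand-ins for the database data.

```python
import pandas as pd

# Hypothetical rows standing in for data loaded from a database.
df_data = pd.DataFrame({
    "name": ["alice", "alice", "bob"],
    "date": ["2019-11-24", "2019-11-25", "2019-11-25"],
    "c1": [1, 2, 3],
    "c2": [4, 5, 6],
}).set_index(["name", "date"])

# 'name' is now an index level, not a column, so a column lookup fails.
try:
    df_data.loc[:, "name"]
    raised = False
except KeyError:
    raised = True

# Read an index level directly...
names = df_data.index.get_level_values("name")
# ...or select rows by index labels: a partial key on the first level,
alice_c1 = df_data.loc["alice", "c1"]
# or a full (name, date) tuple for a single cell.
bob_c1 = df_data.loc[("bob", "2019-11-25"), "c1"]
```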

9 votes

2 answers

1k views

Tensorflow DNNclassifier: error while training (numpy.ndarray has no attribute index)

I am trying to train a DNNClassifier in tensorflow Here is my code train_input_fn = tf.estimator.inputs.pandas_input_fn( x=X_train, y=y_train, batch_size=1000, shuffle = True ) ...

Cybercop

8,660

asked Apr 4, 2018 at 11:29
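The code is truncated, so this is a hedged guess at the cause. In TensorFlow 1.x, tf.estimator.inputs.pandas_input_fn expects x as a pandas DataFrame and y as a Series, and it compares x.index with y.index. Passing NumPy arrays, such as the output of a split done on .values, therefore fails on the missing attribute. The sketch below does not import TensorFlow. It only shows how to re-wrap arrays as pandas objects that share an index. The feature names and array contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical arrays, e.g. the result of splitting df.values.
X_train_arr = np.array([[0.1, 1.0], [0.2, 0.0], [0.3, 1.0]])
y_train_arr = np.array([0, 1, 0])
feature_names = ["f1", "f2"]  # hypothetical column names

# Re-wrap as pandas objects so .index exists and is identical for x and y.
X_train = pd.DataFrame(X_train_arr, columns=feature_names)
y_train = pd.Series(y_train_arr, index=X_train.index, name="label")
```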

9 votes

3 answers

128k views

Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while using groupby on a dataframe

I have a dataframe news_count. Here are its column names, from the output of news_count.columns.values: [('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '') ('day', '') ('...

Arvinth Kumar

asked Oct 2, 2017 at 22:31
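A hedged sketch, assuming the AttributeError comes from attribute-style access on a grouped frame whose column labels are tuples, like those shown in news_count.columns.values. The sketch flattens the two-level labels into strings and then uses bracket access, which also works for labels that are not valid Python identifiers. The data and the joining scheme are hypothetical.

```python
import pandas as pd

# Hypothetical frame; tuple keys produce two-level column labels.
news_count = pd.DataFrame({
    ("date", ""): ["2017-10-01", "2017-10-01", "2017-10-02"],
    ("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG"): [0.2, 0.4, -0.1],
})

# Join each label's non-empty parts into one flat string.
news_count.columns = ["_".join(part for part in col if part) for col in news_count.columns]

# Bracket access works for any label; attribute access does not allow spaces.
sentiment = "EBIX UW Equity_NEWS_SENTIMENT_DAILY_AVG"
daily = news_count.groupby("date")[sentiment].mean()
print(daily)
```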

9 votes

1 answer

4k views

How can I merge two pandas DataFrames based on a function instead of just where values are equal?

I have two DataFrames that each have a column for firstname. I'd like to merge the columns on those strings, but on the Levenshtein distance as opposed to just where the strings are equal. I'm ...

Travis

asked Aug 27, 2015 at 13:05
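pandas merge only matches on equal keys. For small frames, one hedged approach is a cross join followed by a filter on edit distance. The levenshtein and fuzzy_merge helpers and the max_distance threshold are hypothetical and written in plain Python rather than taken from a third-party package. how="cross" needs pandas 1.2 or later, and the intermediate result has len(left) × len(right) rows.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, keeping only one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_merge(left: pd.DataFrame, right: pd.DataFrame, on: str,
                max_distance: int = 1) -> pd.DataFrame:
    """Pair every row of left with every row of right, keep close matches."""
    pairs = left.merge(right, how="cross", suffixes=("_left", "_right"))
    pairs["distance"] = [
        levenshtein(a, b) for a, b in zip(pairs[f"{on}_left"], pairs[f"{on}_right"])
    ]
    return pairs[pairs["distance"] <= max_distance].reset_index(drop=True)


left = pd.DataFrame({"firstname": ["Jon", "Travis"]})
right = pd.DataFrame({"firstname": ["John", "Travis", "Mary"]})
result = fuzzy_merge(left, right, on="firstname")
print(result)
```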

9 votes

1 answer

3k views

Stratified Cross validation of timeseries data

I want to do a time series cross validation based on group (grp column). In the below sample data, Temperature is my target variable import numpy as np import pandas as pd timeS=pd.date_range(start='...

XXavier

1,216

asked Oct 11, 2017 at 22:58
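A hedged sketch of one reading of the question: run an expanding-window split separately inside each grp and pool the results across groups. Every group then appears in every fold, and within a group the training rows always precede the test rows. The helper name and the equal-block splitting are assumptions. The column names follow the question's sample (grp, timeS, Temperature), but the values are made up.

```python
import numpy as np
import pandas as pd


def grouped_time_series_splits(df, group_col, time_col, n_splits=3):
    """Yield (train_idx, test_idx) label arrays; expanding window within each group."""
    folds = [([], []) for _ in range(n_splits)]
    for _, g in df.sort_values(time_col).groupby(group_col, sort=False):
        # Cut each group's chronologically ordered rows into n_splits + 1 blocks.
        blocks = np.array_split(g.index.to_numpy(), n_splits + 1)
        for k in range(n_splits):
            folds[k][0].extend(np.concatenate(blocks[: k + 1]))  # all earlier blocks
            folds[k][1].extend(blocks[k + 1])                     # the next block
    for train, test in folds:
        yield np.array(train), np.array(test)


timeS = pd.date_range(start="2017-01-01", periods=8, freq="D")
df = pd.DataFrame({
    "grp": ["a"] * 8 + ["b"] * 8,
    "timeS": list(timeS) * 2,
    "Temperature": range(16),
})
splits = list(grouped_time_series_splits(df, "grp", "timeS", n_splits=3))
```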

8 votes

2 answers

5k views

FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:8: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version. Do ...

PlutoSenthil

asked Oct 27, 2021 at 8:36
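A hedged sketch. The warning concerns implicit alignment in operators such as ==, where a Series is matched against the DataFrame's column labels by default. The flex comparison methods (eq, ne, lt, le, gt, ge) accept an axis argument, which states explicitly whether the Series lines up with the rows or with the columns. The frame and thresholds are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

# Series aligned with the row index: compare every column against it (axis=0).
row_ref = pd.Series([1, 2, 2])
row_wise = df.eq(row_ref, axis=0)

# Series aligned with the column labels: per-column thresholds (axis=1).
thresholds = pd.Series({"a": 2, "b": 2})
col_wise = df.ge(thresholds, axis=1)

print(row_wise, col_wise, sep="\n")
```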
