
Questions tagged [pandas]

Pandas is a Python library for manipulating and analyzing data such as dataframes, multidimensional time series, and cross-sectional datasets commonly found in statistics, experimental science, econometrics, and finance. It is one of the main data science libraries in Python.

65,609 questions with no upvoted or accepted answers
14 votes
2 answers
1k views

OperationalError: (sqlite3.OperationalError) too many SQL variables, while using SQL with dataframes

I have a pandas dataframe as below.
       activity         User_Id \
0  VIEWED MOVIE  158d292ec18a49
1  VIEWED MOVIE  158d292ec18a49
2  VIEWED MOVIE  158d292ec18a49
3  VIEWED MOVIE  ...
Sarang Manjrekar's user avatar
12 votes
1 answer
1k views

Pandas HD5-query, where expression fails

I want to query an HDF5 file. I use df.to_hdf(pfad, 'df', format='table') to write the dataframe to disk. To read it, I use hdf = pandas.HDFStore(pfad). I have a list that contains numpy.datetime64 ...
user3276418's user avatar
  • 1,797
11 votes
0 answers
617 views

Does Pandas use hashing for a single-indexed dataframe and binary searching for a multi-indexed dataframe?

I have always been under the impression that Pandas uses hashing when indexing the rows of a dataframe, so that operations like df.loc[some_label] are O(1). However, I just realized today that this is ...
victorx's user avatar
  • 3,467
11 votes
0 answers
2k views

Memory usage during and after drop_duplicates()

I am working with a data frame that takes up roughly 2 GB of memory (according to htop) with dimensions (6287475, 19). The data frame is heterogeneous in data type, which probably does not matter. ...
isalteverything's user avatar
10 votes
3 answers
3k views

Python google-cloud-bigquery Parquet column 'Date' has type INT64 which does not match the target cpp_type INT32

I am trying to upload a DataFrame to a BigQuery table using client.load_table_from_dataframe. I am also supplying a schema via job_config = bigquery.LoadJobConfig(schema=[xxxxxx,xxxxxx,...
Siesta's user avatar
  • 451
10 votes
1 answer
3k views

XLDateAmbiguous workaround

Reading Excel files into Python often means tripping over the Excel leap year issue. This is described in many posts, but none offer a convenient solution. So this is what I'm asking here. With code ...
dmvianna's user avatar
  • 15.6k
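
The leap year issue mentioned in the question above comes from Excel's 1900 date system, which treats 1900 as a leap year and so reserves serial number 60 for a nonexistent 29 February 1900. The following is only a minimal sketch of a conversion that makes that gap explicit, not the asker's code and not part of xlrd or pandas. The function name `excel_serial_to_date` is hypothetical, and the sketch assumes whole-number serials from a workbook that uses the 1900 date system.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Convert a whole-number Excel serial (1900 date system) to a date.

    Hypothetical helper for illustration only.
    """
    if serial < 1:
        raise ValueError("serial must be 1 or greater")
    if serial == 60:
        # Excel maps serial 60 to 29 February 1900, a day that never existed.
        raise ValueError("serial 60 is Excel's nonexistent 1900-02-29")
    if serial < 60:
        # Before the phantom leap day, serial 1 is 1900-01-01.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom leap day, every serial is shifted by one day.
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    for serial in (1, 59, 61, 45292):
        print(serial, excel_serial_to_date(serial))
```

Raising on serial 60 rather than silently picking a neighbouring day is a design choice: the caller then decides how to treat the ambiguous value.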
10 votes
3 answers
3k views

Changing style of pandas.DataFrame: Permanently?

When I change the style of a pandas.DataFrame, for instance like so # color these columns color_columns = ['roi', 'percent_of_ath'] (portfolio_df .style ...
Ugur's user avatar
  • 2,004
9 votes
0 answers
5k views

ArrowInvalid: GetFileInfo() yielded path which is outside base dir parquet

I have a parquet dataset stored in my S3 bucket with multiple partition files. I want to read it into my pandas dataframe, but am getting this ArrowInvalid error when I didn't before. Occasionally, ...
Wassadamo's user avatar
  • 1,306
9 votes
1 answer
2k views

Save sparse pandas dataframe to different file types

I'm working with IoT data from one source which sends tons of GB of sparse data from sensor readings. To make snapshots for analysis, I try to export them to a small file and read it later as sparse ...
Coffeemug13's user avatar
9 votes
1 answer
5k views

Why does pandas `loc` throw `KeyError` with column name?

I have a data frame that is given this initial construct: df_data = pd.DataFrame(columns=['name','date','c1','c2']).set_index(['name','date']). I then have code to fill this frame from a database. ...
Brick's user avatar
  • 4,157
9 votes
2 answers
1k views

Tensorflow DNNClassifier: error while training (numpy.ndarray has no attribute index)

I am trying to train a DNNClassifier in TensorFlow. Here is my code: train_input_fn = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=1000, shuffle=True) ...
Cybercop's user avatar
  • 8,660
9 votes
3 answers
128k views

Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while using groupby functionality on a dataframe

I have a dataframe news_count. Here are its column names, from the output of news_count.columns.values: [('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '') ('day', '') ('...
Arvinth Kumar's user avatar
9 votes
1 answer
4k views

How can I merge two pandas DataFrames based on a function instead of just where values are equal?

I have two DataFrames that each have a column for firstname. I'd like to merge the columns on those strings, but on the Levenshtein distance as opposed to just where the strings are equal. I'm ...
Travis's user avatar
  • 695
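
One way to approach the merge described in the question above is to build every pair of rows with a cross join and then keep only the pairs whose names are within a chosen edit distance. This is a rough sketch under stated assumptions, not the asker's code: `levenshtein`, `fuzzy_merge`, the `max_distance` threshold, and the sample columns `a` and `b` are all hypothetical, and `how="cross"` assumes pandas 1.2 or later. The cross join grows with the product of the two row counts, so this only suits small frames.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_merge(left: pd.DataFrame, right: pd.DataFrame,
                on: str, max_distance: int = 1) -> pd.DataFrame:
    """Pair every left row with every right row, keep the close matches."""
    pairs = left.merge(right, how="cross", suffixes=("_left", "_right"))
    distances = [
        levenshtein(a, b)
        for a, b in zip(pairs[f"{on}_left"], pairs[f"{on}_right"])
    ]
    pairs = pairs.assign(distance=distances)
    return pairs[pairs["distance"] <= max_distance].reset_index(drop=True)


if __name__ == "__main__":
    left = pd.DataFrame({"firstname": ["Jon", "Anna"], "a": [1, 2]})
    right = pd.DataFrame({"firstname": ["John", "Ana", "Bob"], "b": [10, 20, 30]})
    print(fuzzy_merge(left, right, on="firstname"))
```

The distance function is written in plain Python to keep the sketch self-contained; a dedicated string-matching library could be swapped in for larger inputs.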
9 votes
1 answer
3k views

Stratified Cross validation of timeseries data

I want to do a time series cross validation based on group (grp column). In the sample data below, Temperature is my target variable. import numpy as np import pandas as pd timeS=pd.date_range(start='...
XXavier's user avatar
  • 1,216
8 votes
2 answers
5k views

FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:8: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version. Do ...
PlutoSenthil's user avatar
