Questions tagged [pandas]
Pandas is a Python library for data manipulation and analysis. Its dataframes handle multidimensional time series and cross-sectional datasets of the kind commonly found in statistics, experimental science, econometrics, and finance. Pandas is one of the main data science libraries in Python.
pandas
65,609
questions with no upvoted or accepted answers
14 votes, 2 answers, 1k views
OperationalError: (sqlite3.OperationalError) too many SQL variables, while using SQL with dataframes
I have a pandas dataframe as below.
activity User_Id \
0 VIEWED MOVIE 158d292ec18a49
1 VIEWED MOVIE 158d292ec18a49
2 VIEWED MOVIE 158d292ec18a49
3 VIEWED MOVIE ...
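The excerpt is truncated, so the actual cause of the error is not visible here. A minimal sketch of one commonly suggested mitigation, assuming the error arises while writing the dataframe to SQLite with multi-row inserts: cap the rows per statement so that rows times columns stays under the bound-parameter limit. The table name `activity_log`, the sample values, and the 999 limit are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in data: column names follow the excerpt above,
# the repeated values are made up for illustration.
df = pd.DataFrame({
    "activity": ["VIEWED MOVIE"] * 5000,
    "User_Id": ["158d292ec18a49"] * 5000,
})

# Assumed conservative cap on bound parameters per statement (999 was the
# long-standing SQLite default; newer builds allow more).
MAX_SQL_VARIABLES = 999
rows_per_insert = MAX_SQL_VARIABLES // len(df.columns)

conn = sqlite3.connect(":memory:")

# method="multi" packs several rows into one INSERT statement, so each
# statement binds rows_per_insert * len(df.columns) parameters.
df.to_sql(
    "activity_log",  # hypothetical table name
    conn,
    index=False,
    method="multi",
    chunksize=rows_per_insert,
)

row_count = conn.execute("SELECT COUNT(*) FROM activity_log").fetchone()[0]
print(row_count)
```

If the error instead comes from a query with a very long `IN (...)` parameter list, the same idea applies: split the parameter list into batches below the limit.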
12 votes, 1 answer, 1k views
Pandas HDF5 query: where expression fails
I want to query an HDF5 file. I use
df.to_hdf(pfad,'df', format='table')
to write the dataframe to disk.
To read it, I use
hdf = pandas.HDFStore(pfad)
I have a list that contains numpy.datetime64 ...
11 votes, 0 answers, 617 views
Does Pandas use hashing for a single-indexed dataframe and binary searching for a multi-indexed dataframe?
I have always been under the impression that Pandas uses hashing when indexing the rows of a dataframe, so that operations like df.loc[some_label] are O(1).
However, I just realized today that this is ...
11 votes, 0 answers, 2k views
Memory usage during and after drop_duplicates()
I am working with a data frame that takes up roughly 2 GB of memory (according to htop) with dimensions (6287475, 19). The data frame is heterogeneous in data type, which probably does not matter. ...
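A minimal sketch of how the frame's own footprint could be measured before and after deduplication, using made-up data much smaller than the frame described above. The helper `frame_megabytes` is a hypothetical name. Note that `memory_usage` reports what pandas holds, whereas htop reports process memory, which may not shrink even after objects are freed.

```python
import numpy as np
import pandas as pd

# Made-up data with many repeated rows, purely for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 1000, size=100_000),
    "label": rng.choice(["a", "b", "c"], size=100_000),
})


def frame_megabytes(frame):
    """Memory held by the frame's columns and index, in megabytes."""
    return frame.memory_usage(deep=True).sum() / 1e6


before_mb = frame_megabytes(df)

# Rebinding the name drops the only reference to the original frame,
# so Python is free to release it once the deduplicated copy exists.
df = df.drop_duplicates(ignore_index=True)

after_mb = frame_megabytes(df)
print(f"before: {before_mb:.2f} MB, after: {after_mb:.2f} MB")
```

While `drop_duplicates` runs, both the original and the result exist at once, so peak usage can exceed either measurement.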
10 votes, 3 answers, 3k views
Python google-cloud-bigquery Parquet column 'Date' has type INT64 which does not match the target cpp_type INT32
I am trying to upload a DataFrame to a BigQuery table using client.load_table_from_dataframe, and I am also supplying a schema using the
job_config = bigquery.LoadJobConfig(
schema=[xxxxxx,xxxxxx,...
10 votes, 1 answer, 3k views
XLDateAmbiguous workaround
Reading Excel files into Python often means tripping over the Excel leap year issue. This is described in many posts, but none offer a convenient solution. So this is what I'm asking here. With code ...
10 votes, 3 answers, 3k views
Changing style of pandas.DataFrame: Permanently?
When I change the style of a pandas.DataFrame, for instance like so
# color these columns
color_columns = ['roi', 'percent_of_ath']
(portfolio_df
.style
...
9 votes, 0 answers, 5k views
ArrowInvalid: GetFileInfo() yielded path which is outside base dir parquet
I have a parquet dataset stored in my S3 bucket with multiple partition files. I want to read it into a pandas dataframe, but I am now getting this ArrowInvalid error, which did not occur before.
Occasionally, ...
9 votes, 1 answer, 2k views
Save sparse pandas dataframe to different file types
I'm working with IoT data from one source which sends many gigabytes of sparse data from sensor readings. To make snapshots for analysis I try to export them to a small file and read it later as sparse ...
9 votes, 1 answer, 5k views
Why does pandas `loc` throw `KeyError` with column name?
I have a data frame that is given this initial construct:
df_data = pd.DataFrame(columns=['name','date','c1','c2']).set_index(['name','date'])
I then have code to fill this frame from a database. ...
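The rest of the question is elided, so the failing call is unknown. A minimal sketch of one possible cause, assuming the lookup used 'name' or 'date' as a column label: after `set_index`, those labels are index levels and no longer columns. The sample rows are made up.

```python
import pandas as pd

# Made-up rows standing in for the data loaded from the database.
records = pd.DataFrame({
    "name": ["alice", "bob"],
    "date": ["2020-01-01", "2020-01-02"],
    "c1": [1, 2],
    "c2": [3, 4],
})
df_data = records.set_index(["name", "date"])

# Only c1 and c2 remain as columns; name and date are index levels.
print(list(df_data.columns))
print(list(df_data.index.names))

# Asking for an index level as if it were a column raises KeyError.
try:
    df_data.loc[:, "name"]
    raised = False
except KeyError:
    raised = True

# Ways to reach the level values instead:
names = df_data.index.get_level_values("name")  # read one index level
alice_rows = df_data.loc["alice"]                # select by first level
flat = df_data.reset_index()                     # turn levels back into columns
```

`reset_index()` is the simplest option when the levels are needed as ordinary columns again.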
9 votes, 2 answers, 1k views
TensorFlow DNNClassifier: error while training (numpy.ndarray has no attribute index)
I am trying to train a DNNClassifier in TensorFlow.
Here is my code:
train_input_fn = tf.estimator.inputs.pandas_input_fn(
x=X_train,
y=y_train,
batch_size=1000,
shuffle = True
)
...
9 votes, 3 answers, 128k views
Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while using groupby on a dataframe
I have a dataframe news_count. Here are its column names, from the output of news_count.columns.values:
[('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '')
('day', '') ('...
9 votes, 1 answer, 4k views
How can I merge two pandas DataFrames based on a function instead of just where values are equal?
I have two DataFrames that each have a column for firstname. I'd like to merge the columns on those strings, but on the Levenshtein distance as opposed to just where the strings are equal.
I'm ...
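A minimal sketch of one way such a merge could be done, assuming both frames are small enough for a full cross join: pair every row with every other row, compute the distance, and keep pairs under a threshold. The sample names, the `left_id` and `right_id` columns, the `levenshtein` helper, and `MAX_DISTANCE` are all hypothetical. `how="cross"` requires pandas 1.2 or later.

```python
import pandas as pd


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Made-up sample frames for illustration.
left = pd.DataFrame({"firstname": ["Jon", "Maria", "Steven"], "left_id": [1, 2, 3]})
right = pd.DataFrame({"firstname": ["John", "Marie", "Zoe"], "right_id": [10, 20, 30]})

# Every left row paired with every right row.
pairs = left.merge(right, how="cross", suffixes=("_left", "_right"))
pairs["distance"] = [
    levenshtein(a, b)
    for a, b in zip(pairs["firstname_left"], pairs["firstname_right"])
]

MAX_DISTANCE = 1  # assumed threshold for treating two names as a match
matches = pairs[pairs["distance"] <= MAX_DISTANCE].reset_index(drop=True)
print(matches)
```

The cross join grows as the product of the two row counts, so for large frames some form of blocking (for example, pairing only names that share a first letter) would be needed before computing distances.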
9 votes, 1 answer, 3k views
Stratified Cross validation of timeseries data
I want to do a time series cross validation based on group (grp column). In the sample data below, Temperature is my target variable:
import numpy as np
import pandas as pd
timeS=pd.date_range(start='...
8 votes, 2 answers, 5k views
FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:8: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version. Do ...
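The code that triggers the warning is not shown in the excerpt. A minimal sketch of the explicit alignment the message refers to, assuming a comparison between a DataFrame and a Series whose index does not match the frame's columns. The sample data is made up.

```python
import pandas as pd

# Made-up data: the Series index only partly overlaps the DataFrame columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
s = pd.Series({"b": 3, "c": 5})

# Align the frame's columns with the Series index explicitly, so the
# comparison no longer depends on automatic reindexing.
left, right = df.align(s, axis=1)

# Labels missing on either side become NaN, which compares as False.
result = left == right
print(result)
```

After alignment both operands share the labels a, b and c, so the comparison is well defined in versions where the automatic reindexing has been removed.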