All Questions
Tagged with pandas python-polars
177
questions
2
votes
0
answers
55
views
Reshape issue on a pandas dataframe which was converted from polars
Update:
I was able to work around the issue by converting it to numpy() first and then doing a reshape.
Before edit:
I have a Python program where I am using a polars dataframe for reading from the file and ...
7
votes
1
answer
137
views
Translate Pandas groupby plus resample to Polars in Python
I have this code that generates a toy DataFrame (the production df is much more complex):
import polars as pl
import numpy as np
import pandas as pd
def create_timeseries_df(num_rows):
    date_rng = pd....
0
votes
3
answers
97
views
How to replace a specific field inside a JSON string in each row of a csv file in Python with a random value?
I have a CSV file named input.csv with the following columns:
row_num | start_date_time | id | json_message
120 | 2024-02-02 00:01:00.001+00 | 1020240202450 | {'amount': 10000, 'currency': 'NZD','seqnbr': 161 }
...
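One possible approach to the per-row replacement, as a minimal sketch: the sample message uses single quotes, so it is treated here as a Python-style dict literal rather than strict JSON. The helper name `randomize_field`, the choice of `seqnbr` as the field to replace, and the random range are assumptions made for illustration; the question excerpt does not state them.

```python
import ast
import random


def randomize_field(message: str, field: str, low: int = 1, high: int = 999) -> str:
    """Return `message` with `field` set to a random integer in [low, high].

    Hypothetical helper: the field name and range are illustrative assumptions.
    """
    # Single-quoted keys are not valid JSON, so json.loads would reject this
    # text; ast.literal_eval parses the literal without executing any code.
    data = ast.literal_eval(message)
    data[field] = random.randint(low, high)
    # str() on a dict keeps the single-quoted style of the input.
    return str(data)


sample = "{'amount': 10000, 'currency': 'NZD','seqnbr': 161 }"
updated = randomize_field(sample, "seqnbr")
print(updated)
```

Applied to a file, the same function could be called on the json_message value of each row read with `csv.DictReader`.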
1
vote
0
answers
70
views
Utilize polars library to compute Levenshtein distance
I have sequences in the form of tuples in a column of a pandas DataFrame. A sample of my DataFrame is the following:
id sequence
33268 [(59, 2), (91, 2), (112, 2), (126, 2), (0, 3),...
44360 ...
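For reference, the edit distance between two such sequences can be computed in plain Python by treating each tuple as a single symbol. This is a sketch of the standard dynamic-programming algorithm, not a polars API; the function name and the second sample sequence are made up for illustration.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences of hashable items."""
    # prev[j] is the distance between the already processed prefix of `a`
    # and the first j items of `b`.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


s1 = [(59, 2), (91, 2), (112, 2)]   # first items of the sample row
s2 = [(59, 2), (112, 2), (126, 2)]  # illustrative second sequence
print(levenshtein(s1, s2))
```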
4
votes
3
answers
125
views
How to create a new column within a polars DataFrame that is equal to a list?
I am currently trying to create a new column within a polars dataframe (df). Within my df, there are many many rows, and within this new column I only want my existing list to populate wherever ...
1
vote
1
answer
75
views
Polars read AWS RDS DB with a table containing column of type jsonb
I'm trying to read an AWS RDS DB using the following method through polars:
df_rds_table_test = pl.read_database_uri(sql_query, uri)
The Postgres DB contains a table with a column named 'json_message' of type ...
2
votes
2
answers
82
views
Pandas vs. Polars: mean() function
The following Pandas code computes the mean value of a column:
pd.DataFrame({'id': ['A', 'A', 'B', 'B', 'B', 'B'], 'a': [1, 2, 3, 4, float('inf'), float('inf')]}).groupby('id').mean()
The result is:
...
0
votes
1
answer
70
views
How to concat multiple lazyframe created from numpy ndarray and datetime.datetime in Polars
I wish to convert this snippet of pandas code into polars code to learn polars and see if I can benefit in terms of speed:
df_list = []
for datum in data:
    df = pd.DataFrame()
    temp_data =...
1
vote
1
answer
76
views
Can I optimize this cpu-bound pandas code with polars?
I have this pandas code:
def last_non_null(s):
    return s.dropna().iloc[-1] if not s.dropna().empty else np.nan
def merge_rows_of_final_df(df_final):
    # Group by columns A, B, and C
    cols = ['...
1
vote
0
answers
98
views
Polars Downcast the Datatype without Precision Loss
I've observed that when I use polars.Expr.shrink_dtype to optimize the datatype of columns, it often alters the float values slightly.
For instance, a float64 value of 2.7 becomes 2.7000001 when ...
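The small shift described here is what a round trip through a 32-bit float produces. The sketch below uses only the standard library and does not call polars; it assumes that the column was narrowed to a 32-bit float, which the excerpt does not confirm. The helper name `to_float32` is illustrative.

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


original = 2.7
narrowed = to_float32(original)

print(repr(original))            # the 64-bit value
print(repr(narrowed))            # the nearest 32-bit value, widened again
print(narrowed == original)      # False: 2.7 has no exact 32-bit representation
print(to_float32(2.5) == 2.5)    # True: 2.5 is exactly representable
```

Values that are exactly representable in 32 bits, such as 2.5, survive the narrowing unchanged, so any loss depends on the data in the column.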
0
votes
0
answers
46
views
Dataframe larger than memory
In polars, pandas, or another dataframe library, is it possible to have a dataframe with data larger than RAM, as you can in DuckDB?
My current solution is to use the polars streaming API, polars....
3
votes
1
answer
117
views
Python - Rolling Indexing in Polars library?
Does anyone know how to do rolling indexing in polars?
I have personally tried a few solutions which did not work for me (I'll show them below):
What I'd like to do: Indexing the ...
1
vote
2
answers
180
views
Polars read_parquet method converts the original date value to a different value if the date is invalid
I'm reading a parquet file from an S3 bucket using polars, and below is the code that I use:
df = pl.read_parquet(parquet_file_name, storage_options=storage_options, hive_partitioning=False)
In the S3 ...
1
vote
0
answers
42
views
How can I expand a column of lists into the neighbouring columns (using Polars)? [duplicate]
Say I have the DataFrame:
>>> df = polars.DataFrame({"a": [0, 1, 2], "b": [1, 2, 3], "c": [[1, 2, 3], [2], [5, 0]]})
>>> df
shape: (3, 3)
┌─────┬─────┬──...
2
votes
3
answers
377
views
Polars compare two dataframes - is there a way to fail immediately on first mismatch
I'm using the polars.testing assert_frame_equal method to compare two sorted dataframes containing the same columns, and below is my code:
assert_frame_equal(src_df, tgt_df, check_dtype=False, check_row_order=...