Unable to retrieve object: only valid on seekable files #401

Open
abhra-gupta-trakstar opened this issue Feb 26, 2024 · 6 comments

@abhra-gupta-trakstar

Hi @matteofigus, since upgrading to v0.66 our Deletion Jobs have been failing with a FORGET_PARTIALLY_FAILED error. Looking into the logs, the ObjectUpdateFailed error is "Unable to retrieve object: only valid on seekable files".

Do you have any leads on what could cause this error?
We are using the fix in backend/ecs_tasks/delete_files/parquet_handler.py as mentioned here
Any advice?

@matteofigus
Member

Hi, could this be related to a corrupted parquet file? Have you managed to trace back which S3 object is failing specifically, and tried opening it to verify it's OK?
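
For anyone wanting a quick local check along these lines, here is a minimal sketch that uses only the Python standard library. It assumes the failing object has already been downloaded, and it only inspects the outer framing of the Parquet format (the 4-byte `PAR1` magic at both ends and the 4-byte little-endian footer length before the trailing magic). The function name `check_parquet_framing` is hypothetical and not part of this project. Passing this check does not prove the file is readable, but failing it is a strong sign of truncation or corruption. It also reports a non-seekable stream explicitly, since reading the footer requires seeking to the end of the file.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"  # magic bytes at the start and end of a Parquet file


def check_parquet_framing(stream):
    """Return (ok, reason) for a binary stream holding a Parquet file.

    Hypothetical helper for illustration; it checks framing only,
    not the contents of the footer or the row groups.
    """
    if not stream.seekable():
        return False, "stream is not seekable"

    size = stream.seek(0, io.SEEK_END)
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if size < 12:
        return False, "file too small"

    stream.seek(0)
    if stream.read(4) != PARQUET_MAGIC:
        return False, "missing leading magic"

    # The last 8 bytes are the footer length followed by the trailing magic.
    stream.seek(-8, io.SEEK_END)
    tail = stream.read(8)
    if tail[4:] != PARQUET_MAGIC:
        return False, "missing trailing magic"

    footer_len = struct.unpack("<I", tail[:4])[0]
    if footer_len + 12 > size:
        return False, "footer length exceeds file size"

    return True, "ok"
```

Usage would be something like `with open("downloaded.parquet", "rb") as f: print(check_parquet_framing(f))`, where the path is a placeholder for the downloaded object.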

@abhra-gupta-trakstar
Author

abhra-gupta-trakstar commented Feb 27, 2024

We were able to trace back the parquet file in S3, and the file doesn't look corrupted.
[Attached screenshots: "Screenshot 2024-02-27 at 2 17 08 PM", "Screenshot 2024-02-27 at 2 29 11 PM"]

@matteofigus
Member

matteofigus commented Feb 27, 2024

What version were you using before updating to v0.66?
Just to confirm, you didn't notice this behaviour before updating, is that right?

@abhra-gupta-trakstar
Author

We were using v0.64 before updating to v0.66.
Correct, I can confirm we started noticing this behaviour after the update. There were no such incidents before: all deletion jobs ran successfully while we were on v0.64.

@matteofigus
Member

matteofigus commented Feb 27, 2024

That's strange, because the only work we have done since 0.64 was improving performance on JSON. I see from the filename that this issue relates to an object that was recently created. Can you confirm you didn't change anything in the ingestion mechanism, perhaps using different versions of pandas or similar libraries to produce the parquet objects?

@abhra-gupta-trakstar
Author

Hi @matteofigus, I went back and confirmed with the team that we have made no ingestion changes recently. The only change was upgrading s3f2 to 0.66.
