Deletion Job failed with Job status: FORGET_PARTIALLY_FAILED #373

Open
abhra-gupta-trakstar opened this issue Jul 17, 2023 · 4 comments

abhra-gupta-trakstar commented Jul 17, 2023

Recently my Deletion Jobs have been consistently failing with the Job Status FORGET_PARTIALLY_FAILED.
I checked the troubleshooting docs and inspected the ObjectUpdateFailed events.
The error reported is "Apache Arrow processing error: Casting from timestamp[ns] to timestamp[us] would lose data:"
There are two such events in the Deletion Job.
[Screenshot, 2023-07-17 2:47:21 PM]
I looked up what this error might mean in the context of s3-find-and-forget and found an open issue about it.
Any suggestions would be greatly appreciated!

Full Error logs:

{
  "EventData": {
    "Message": {
      "DeleteOldVersions": true,
      "Format": "parquet",
      "Columns": [
        {
          "Column": "id",
          "Type": "Simple"
        }
      ],
      "IgnoreObjectNotFoundExceptions": true,
      "Object": "s3://<redacted>/processed/ts-hire/recruiterbox/candidate_candidate/ingest_year=2023/ingest_month=07/ingest_day=13/run-1689233710633-part-block-0-0-r-00008-snappy.parquet",
      "Manifest": "s3://s3f2-manifestsstack-<redacted>/manifests/2d62d0e2-2369-452d-b435-2d710a553618/ProcessedCandidateCandidateDataMapper/manifest.json",
      "JobId": "2d62d0e2-2369-452d-b435-2d710a553618",
      "RoleArn": "arn:aws:iam::<redacted>:role/S3F2DataAccessRole"
    },
    "Error": "Apache Arrow processing error: Casting from timestamp[ns] to timestamp[us] would lose data: 6493936287134515200"
  },
  "EmitterId": "ECSTask_f4659ba9a1124f04a8d3fd443068d795",
  "Sk": "1689579464450#06a03858-1071-410c-a8cd-1d2c8c3d517a",
  "Id": "2d62d0e2-2369-452d-b435-2d710a553618",
  "EventName": "ObjectUpdateFailed",
  "Type": "JobEvent",
  "CreatedAt": 1689579464
}
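
The number at the end of the "Error" field is the nanosecond value that could not be converted. As a minimal sketch in plain Python (not Arrow's actual implementation; the function names below are hypothetical), the check behind a "would lose data" failure can be pictured as a divisibility test: a nanosecond value converts cleanly to microseconds only when it is a multiple of 1000.

```python
# Nanosecond value copied from the "Error" field of the ObjectUpdateFailed event.
NS_VALUE = 6493936287134515200


def ns_to_us_checked(ns: int) -> int:
    """Convert nanoseconds to microseconds, refusing to drop sub-microsecond digits.

    Hypothetical helper that mimics the spirit of a safe cast; it is not Arrow code.
    """
    us, remainder = divmod(ns, 1000)
    if remainder:
        raise ValueError(
            f"Casting from timestamp[ns] to timestamp[us] would lose data: {ns}"
        )
    return us


def ns_to_us_truncating(ns: int) -> int:
    """Convert nanoseconds to microseconds, discarding sub-microsecond digits."""
    return ns // 1000


# The value from the log is not a multiple of 1000, so a lossless conversion fails.
remainder_ns = NS_VALUE % 1000
truncated_us = ns_to_us_truncating(NS_VALUE)

try:
    ns_to_us_checked(NS_VALUE)
except ValueError as exc:
    print(exc)
print(remainder_ns, truncated_us)
```

Here the leftover is 200 ns, which is why a strict conversion refuses the value, while a truncating conversion would silently drop those 200 ns.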
@abhra-gupta-trakstar
Author

@matteofigus Is this something that needs to be fixed on your end? Is there a workaround I can apply on my side until a solution is available? Thanks!

@matteofigus
Member

Hi @abhra-gupta-trakstar, thanks for submitting an issue. As mentioned in the linked issue, another customer solved this by configuring the Parquet handler to coerce timestamps in a particular way. I think a clean approach would be to add a global setting to the solution, so that customers who need this behaviour can configure it easily.

On my side, I think the best next step would be to find a way to reproduce the issue, so that we can test and publish the change.

I cannot offer a timeline for when we'll be able to prioritise this. In the meantime, you could test the proposed change (I recommend using a non-production environment) and see whether it solves the issue. There are instructions here on how to make code changes and release them. In particular, I think you want to look at the make redeploy-containers part, since the changes will be in the code that runs on Fargate and is deployed as a Docker container.

@abhra-gupta-trakstar
Author

abhra-gupta-trakstar commented Aug 14, 2023

Thank you for the advice, @matteofigus! It worked the way you suggested. I released the change to production today after testing it in QA.
Before I close this issue, I would like your general advice on how to maintain the repository going forward.

For example, I would like to deploy the updated CFn template when version 0.63 is released. How do I ensure that the changes we made to the backend Python scripts persist when the CFn stack is updated to >=0.63?

@matteofigus
Member

That's good news. We are working on a fix that we plan to publish in a later release, so you should eventually be able to just upgrade. Let's keep this thread open until then; I'll update you when we release it.

Thanks for being patient.
