The items I have in Cosmos DB contain large, unstructured data. Part of each item holds information that looks like this:

"object_results": {
            "read_file": {
                "step_name": "ReadFile",
                "step_id": "read_file",
                "start_time": "2024-05-23 16:33:56",
                "end_time": null,
                "status": "success"
            },
            "provide_note": {
                "error_message": "Missing category_name for note_key daily_dashboard",
                "step_name": "ProvideNote",
                "step_id": "provide_note",
                "start_time": "2024-05-23 16:35:27",
                "end_time": null,
                "status": "failed"
            }
        }

The number of entries inside the object_results dictionary varies from item to item. It can have more child entries, with different key names, as follows:

"object_results": {
            "display_result": {
                "step_name": "DisplayResult",
                "step_id": "display_result",
                "start_time": "2024-05-15 18:54:27",
                "end_time": null,
                "status": "failed"
            },
            "provide_note": {
                "error_message": "Missing category_name for note_key daily_dashboard",
                "step_name": "ProvideNote",
                "step_id": "provide_note",
                "start_time": "2024-05-15 18:58:14",
                "end_time": null,
                "status": "success"
            },
            "get_response": {
                "error_message": "Missing response_subject for response_key consumer_complaints",
                "step_name": "GetResponse",
                "step_id": "get_response",
                "start_time": "2024-05-15 20:13:45",
                "end_time": null,
                "status": "failed"
            }
        }

What I'm trying to extract is every item in which some step has a status of 'failed'. The problem is that the only way I've found to do this is to write each WHERE condition manually, as follows:

select * from response_results res
where res.object_results.provide_note.status = 'failed'
or res.object_results.display_result.status = 'failed'
or res.object_results.get_response.status = 'failed'

Is there a way to make the WHERE condition generic, so that I can fetch every item containing a failed status regardless of which key it sits under?

  • To be honest, you've introduced a data model anti-pattern, using a property name to define a particular object type. Also, you have a variable number of subdocuments. The real solution is to re-model this properly with either: an array of objects, moving the object type (e.g. display_result) into the value of some kind of objectType property; or a separate document per result (with some common id associating them together). Note that having an unlimited number of subdocuments or array elements is another anti-pattern (unbounded array).
    Commented Jul 10 at 14:51
  • tl;dr: there's no simple way to solve your problem. Without an array (or separate docs), your WHERE clause will just keep growing in complexity (and cost to execute) as you add more subdocuments.
    Commented Jul 10 at 14:52

1 Answer

You've introduced a data model anti-pattern, using a property name to define a particular object type. Also, you have a variable number of subdocuments. The real solution is to re-model this properly with either:

  • an array of objects, moving the object type (e.g. display_result) into the value of some kind of objectType property; or
  • a separate document per result (with some common id associating them together). Note that having an unlimited number of subdocuments or array elements is another anti-pattern (unbounded array).

Turns out, you already store the object type inside of step_name. With that in mind, moving to an array would look something like:

{
    "object_results": [
        {
            "step_name": "ReadFile",
            "step_id": "read_file",
            "start_time": "2024-05-23 16:33:56",
            "end_time": null,
            "status": "success"
        },
        {
            "error_message": "Missing category_name for note_key daily_dashboard",
            "step_name": "ProvideNote",
            "step_id": "provide_note",
            "start_time": "2024-05-23 16:35:27",
            "end_time": null,
            "status": "failed"
        }
    ]
}
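If you need to reshape existing items, the conversion itself is small. Below is a minimal Python sketch, assuming each item has already been loaded as a plain dict; the function name to_step_array is hypothetical, and reading and writing the items through whichever SDK you use is left out.

```python
from typing import Any


def to_step_array(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of item with object_results reshaped from a dict to a list."""
    results = item.get("object_results")
    if not isinstance(results, dict):
        # Already an array (or missing): nothing to reshape.
        return item
    steps = []
    for key, step in results.items():
        step = dict(step)  # shallow copy so the input item is not mutated
        # Keep the old property name as step_id if the step lacks one.
        step.setdefault("step_id", key)
        steps.append(step)
    return {**item, "object_results": steps}
```

The old key is only used as a fallback for step_id, since in your sample data it always matches the step_id you already store.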

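Now a single condition covers any number of objects. The third argument, true, tells ARRAY_CONTAINS to accept a partial match, so an element only needs to contain the given status property rather than equal the whole object: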

SELECT *
FROM res
WHERE ARRAY_CONTAINS(res.object_results, {"status": "failed"}, true)

Likewise, if you stored each object result as a separate document instead of in an array, it would be something like:

SELECT *
FROM res
WHERE res.object_results.status = "failed"

Or if your separate documents had everything at top level:

SELECT *
FROM res
WHERE res.status = "failed"
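For the separate-document route, here is a rough Python sketch of splitting one existing item into one document per step, with everything at top level. It assumes the parent item has an id property; the run_id field (the common id tying the steps back together), the id format, and the function name are all made up for illustration.

```python
from typing import Any


def split_into_step_documents(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one top-level document per entry in item["object_results"]."""
    docs = []
    for key, step in item.get("object_results", {}).items():
        step_id = step.get("step_id", key)
        docs.append({
            **step,
            # Hypothetical id scheme: parent id plus step id.
            "id": f"{item['id']}:{step_id}",
            # Hypothetical common id associating the steps of one run.
            "run_id": item["id"],
            "step_id": step_id,
        })
    return docs
```

Each resulting document carries its own status at top level, which is the shape the last query above expects.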
  • Immense thanks for your comments and suggestions. I'm totally new to Cosmos DB but I'm gonna dig deep into these suggestions. Looks like a good learning curve ahead for me! Thanks.
    – LearneR
    Commented Jul 10 at 15:12