The future of clinical data: building an intellige...

dharmez

A Clinical Intelligence Engine architecture guide

This blog post documents a Clinical Intelligence Engine architecture template for integrating data generated by Augmedix into a healthcare enterprise's data lake on Google Cloud. Clinical Intelligence Engine is a Google Cloud-based data analytics solution that proactively generates clinical intelligence and care recommendations for population health. It helps health systems act to improve their performance on both cost and quality of care, and the insight and intelligence it provides make it an essential tool for health systems pursuing clinical transformation.

Augmedix (Nasdaq: AUGX) delivers industry-leading, ambient medical documentation and data solutions to healthcare systems, physician practices, hospitals, and telemedicine practitioners.

Augmedix is on a mission to help clinicians and patients form a human connection by seamlessly integrating its technology at the point of care. Augmedix’s proprietary platform digitizes natural clinician-patient conversations and converts them into comprehensive medical notes and structured data in real time. The platform uses Google's latest LLM technology (MedLM and Gemini) and Speech-to-Text to generate accurate, timely medical notes that are added to the electronic health record (EHR).

This document is intended for enterprise architects who have experience with data lake architecture and are familiar with the relevant Google Cloud services. The architecture applies both to organizations that already have a data lake and want to integrate additional data sources, and to organizations deploying a data lake for the first time. The architecture is flexible and can be extended to integrate more data sources.

Figure 1: Reference Architecture for connecting Augmedix to healthcare data lake

Components

The components shown in the reference architecture (Figure 1) are described below:

The Augmedix Clinical Notes Engine is an ambient documentation engine that uses AI models, including LLMs, to transform the audio recording of the clinician-patient encounter into a diarized transcript and then uses the transcript to generate the encounter notes. Among other models, the notes engine uses Google's automatic speech recognition (ASR) service, MedLM, and Gemini.
Encounter_topics: A Pub/Sub topic deployed by the enterprise. Augmedix publishes encounter data to encounter_topics. The format of the encounter message is available here. The message contains a section with encounter details (patient details, location, and physicians), the raw transcript (section 'transcriptSummary'), structured data with a list of problems (section 'noteContent'), and the final note with its sub-sections (section 'noteSummary').
Encounter_service: Deployed on Google Kubernetes Engine (GKE), this service subscribes to the encounter_topics Pub/Sub topic, processes the data payload published in each message, and imports it into the Healthcare Data Platform.
Healthcare Data Platform is a data lake architecture designed for healthcare enterprises. It uses the following Google Cloud services:
- Healthcare Data Engine (HDEv2) aggregates healthcare data and standardizes it in FHIR R4 format. It provides harmonization and reconciliation APIs to process healthcare data ingested from siloed healthcare systems, and it creates a longitudinal patient record (LPR) in FHIR format. HDEv2 is in private preview; contact your Google Cloud account team to get early access.
- Healthcare FHIR API (FHIR Datastore) stores the LPR in the FHIR datastore and provides RESTful APIs for real-time, read-only access to the harmonized FHIR data. It supports bulk export of FHIR data to BigQuery and can also stream FHIR resource changes to BigQuery.
- BigQuery hosts the healthcare data lake, combining Augmedix data with the LPR loaded from the FHIR Datastore. BigQuery is a serverless enterprise data warehouse that supports analytical processing and machine learning on structured, semi-structured, and unstructured data, enabling data scientists and data analysts to build models for business intelligence and predictive analytics.
- Cloud Spanner is a managed database that stores the encounter data ingested from the Augmedix notes engine in a set of tables. The table schema may vary between organizations based on their needs, but it will typically include the following columns:

Name	Type	Description	Location in Pub/Sub message
_noteID	STRING	Provided by Augmedix and used for querying additional data about a note	/id
_timeEncounterStart	TIMESTAMP	Recording start time	/startTime
_timeEncounterEnd	TIMESTAMP	Recording end time	/endTime
_encounterID	STRING	ID provided by the Electronic Medical Record (EMR)	/encounterId
_transcriptSummary	JSON	Diarized transcript of the encounter generated by Augmedix	/transcriptSummary
_structuredData	JSON	Structured data with History of Present Illness (HPI) and complaints	/noteContent/ComplaintsSelections
_noteFull	STRING	Complete note with all subsections	/noteSummary/fullNote
_noteHPI	STRING	History of Present Illness (HPI) section of the note	/noteSummary/HPI
_noteAP	STRING	Assessment and Plan (AP) section of the note	/noteSummary/AP
_noteROS	STRING	Review of Systems (ROS) section of the note	/noteSummary/ROS
_notePE	STRING	Physical Exam (PE) section of the note	/noteSummary/PE
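As a rough illustration of step T3 in the data flow section, the sketch below maps an Augmedix message onto a row with the columns above, following the paths in the last column. It is a minimal sketch, assuming the payload is UTF-8 JSON whose nesting matches those paths. It leaves timestamps as the original strings and serializes the JSON-typed columns as JSON text; the actual Spanner write is left to your encounter_service. The names COLUMN_PATHS, resolve, and payload_to_row are hypothetical.

```python
import json
from typing import Any, Optional

# Column name -> path inside the Augmedix message (taken from the table above)
COLUMN_PATHS = {
    "_noteID": "/id",
    "_timeEncounterStart": "/startTime",
    "_timeEncounterEnd": "/endTime",
    "_encounterID": "/encounterId",
    "_transcriptSummary": "/transcriptSummary",
    "_structuredData": "/noteContent/ComplaintsSelections",
    "_noteFull": "/noteSummary/fullNote",
    "_noteHPI": "/noteSummary/HPI",
    "_noteAP": "/noteSummary/AP",
    "_noteROS": "/noteSummary/ROS",
    "_notePE": "/noteSummary/PE",
}

# Columns declared as JSON in the table; stored as serialized JSON text
JSON_COLUMNS = {"_transcriptSummary", "_structuredData"}


def resolve(doc: Any, path: str) -> Optional[Any]:
    """Walk a slash-separated path through nested dicts; return None if any key is missing."""
    node = doc
    for key in path.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def payload_to_row(payload: bytes) -> dict:
    """Build an encounter_table row (column -> value) from a raw Pub/Sub message payload."""
    message = json.loads(payload)
    row = {}
    for column, path in COLUMN_PATHS.items():
        value = resolve(message, path)
        if column in JSON_COLUMNS and value is not None:
            value = json.dumps(value)  # keep nested structures as JSON text
        row[column] = value
    return row
```

Missing sections come back as None rather than raising, so a payload without, for example, a Physical Exam section still produces a complete row.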

Looker supports the development of population_dashboards that analyze and visualize data from the healthcare data lake (BigQuery). Customers can use Looker to create population-level dashboards, generate insights, and visualize them.
Colab Enterprise is a collaborative, managed notebook environment with the security and compliance capabilities of Google Cloud. AI researchers use it to explore and analyze data from the healthcare data lake, for rapid prototyping, and for model development. Colab Enterprise is part of Vertex AI, Google Cloud's unified AI platform for building and using generative AI solutions.

Clinical Intelligence Engine data flow from Augmedix to healthcare data lake

The data flow shown in the reference architecture (Figure 1) is as follows:

T1: After the recording of the clinician-patient discussion is complete and fully digitized, Augmedix publishes a message to a predefined topic in the enterprise’s Pub/Sub. The message includes a JSON data payload with the auto-generated notes and the structured data elements extracted from the clinician-patient discussion. Augmedix continuously improves its services and makes more discrete data elements available in the data payload. Links to sample data are provided later in this document.
T2: The enterprise’s encounter_service, which is subscribed to the topic, receives the message.
T3: The encounter_service applies the necessary transformations to the data payload and stores the result in the encounter_table in a Spanner instance.
A1: The data is ingested into the healthcare_data_lake in BigQuery. The data lake integrates data from multiple sources in the enterprise, for example, longitudinal patient data from the EMR and other healthcare systems (see A2).
A2: Healthcare organizations can use Google’s Healthcare Data Engine (HDE) to aggregate, harmonize, de-duplicate, and reconcile data, and to generate longitudinal patient records (LPR) in FHIR format. Healthcare organizations can ingest this data from different enterprise systems, such as EMRs, revenue cycle management systems, registration systems, and pharmacy systems. HDE writes the LPR to the FHIR Datastore using the Healthcare FHIR APIs. The FHIR-formatted data is then exported from the FHIR Datastore to BigQuery for SQL-friendly access.
The healthcare_data_lake can now serve many stakeholders. For example:
1. A3: Using Looker (population_dashboards), Augmedix datasets can be integrated into business intelligence reports and dashboards.
2. A4: Using Vertex AI Colab notebooks, Augmedix datasets can enrich training datasets or serve as inputs for inference.
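Pub/Sub delivers messages at least once by default, so the same encounter message can reach encounter_service more than once during steps T2 and T3. One way to make duplicates harmless is to upsert on the note ID instead of inserting; with the Spanner client, this corresponds to an insert-or-update mutation. The sketch below assumes encounter_service receives messages through a Pub/Sub push subscription, whose request body wraps the base64-encoded payload in a message.data field. InMemoryEncounterTable is a hypothetical stand-in for the Spanner encounter_table, and only three columns are mapped for brevity.

```python
import base64
import json
from typing import Any


class InMemoryEncounterTable:
    """Hypothetical stand-in for the Spanner encounter_table, keyed on _noteID."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    def upsert(self, row: dict[str, Any]) -> None:
        # Insert-or-replace: a redelivered message overwrites the existing row
        # instead of creating a duplicate.
        self.rows[row["_noteID"]] = row


def decode_push_envelope(body: bytes) -> dict[str, Any]:
    """Extract the Augmedix JSON payload from a Pub/Sub push request body."""
    envelope = json.loads(body)
    data = base64.b64decode(envelope["message"]["data"])
    return json.loads(data)


def handle_push(body: bytes, table: InMemoryEncounterTable) -> None:
    """Decode one pushed message and upsert a (partial) encounter row."""
    payload = decode_push_envelope(body)
    row = {
        "_noteID": payload["id"],
        "_encounterID": payload.get("encounterId"),
        "_noteFull": payload.get("noteSummary", {}).get("fullNote"),
    }
    table.upsert(row)
```

With a push subscription, returning a success (2xx) HTTP status after handle_push completes acknowledges the message. A failure status causes Pub/Sub to redeliver it, and the upsert absorbs the repeat.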

Clinical Intelligence Engine use cases: why is integrating ambient clinical notes into your data lake important?

The traditional source of truth in healthcare service delivery has been the electronic medical record (EMR) system, but ambient documentation taps into a rich layer of data: the conversation between the clinician (provider) and the patient, where much of healthcare delivery actually takes place. This conversational layer at the point of care holds a valuable dataset that often never reaches the EHR, and when it does, it is often reduced to unstructured text during EHR writeback. Augmedix’s structured data output to a data lake (outside of the EHR) preserves the structured data captured at the point of care and enables several high-value use cases, such as:

Analyzing factors that frequently correlate with long-term hospital admissions for severe medical conditions, such as sepsis, and other critical healthcare events.
Analyzing workflow data, such as encounter length, to drive enterprise-level efficiency improvements.
Analyzing a patient's dataset and cross-mapping it against a library of investigational drugs, to match the patient's demographics, medical conditions, and geography to clinical trial opportunities.
Using de-identified transcript and note data for training purposes.
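For the workflow use case, encounter length can be approximated from the recording start and end timestamps in the encounter table. In BigQuery you would typically compute this in SQL with TIMESTAMP_DIFF. The Python sketch below shows the same arithmetic over rows shaped like the encounter table, assuming ISO 8601 timestamp strings; the function names are hypothetical. Because the columns record the recording window, the result measures recording length, which may differ from the full visit.

```python
from datetime import datetime
from statistics import median
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; Python < 3.11 fromisoformat() rejects a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recording_minutes(rows: list) -> list:
    """Recording length in minutes for each row that has both start and end times."""
    durations = []
    for row in rows:
        start = row.get("_timeEncounterStart")
        end = row.get("_timeEncounterEnd")
        if not start or not end:
            continue  # skip incomplete rows instead of guessing
        durations.append((parse_ts(end) - parse_ts(start)).total_seconds() / 60)
    return durations


def median_recording_minutes(rows: list) -> Optional[float]:
    """Median recording length, or None when no row has both timestamps."""
    durations = recording_minutes(rows)
    return median(durations) if durations else None
```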

What data is exported from Augmedix to the Clinical Intelligence Engine?

The data payload that can be sent from Augmedix’s ambient documentation solution includes:

Transcript: A full transcript of the encounter, diarized by the speaker
Notes: The clinical notes sent to the EMR, in two versions:
- Original notes as they were generated by Augmedix AI
- Notes after being edited by the provider on the Augmedix application
Structured data: A breakdown of the visit into key components for every problem. Data elements can include symptoms, associated symptoms, pain scale rating, current medications, chronic conditions, labs, and treatments.

All data elements can be delivered either in full or de-identified, with Protected Health Information (PHI) removed.

For a sample dataset, please refer to: Note Sample, Transcript Sample

Design alternatives

Pub/Sub vs. direct API calls

This reference architecture recommends using Pub/Sub as the communication infrastructure between Augmedix and the enterprise’s HDE, as described in the data flow section above. This approach is secure and scalable, and it keeps Augmedix’s solution and the enterprise's HDE highly decoupled. Alternatively, the integration can use direct API calls: Augmedix calls a set of API endpoints, provided by the enterprise, to send the data payload or report encounter status. Augmedix has the infrastructure to quickly adapt to customer-provided APIs and has done so before.

If you prefer this approach, please contact the Augmedix Partnerships team at partnerships@augmedix.com to get in touch with an integrations architect.

Injecting data into Augmedix ambient documentation

This reference architecture describes a one-way data flow, from Augmedix’s ambient documentation solution to the healthcare data lake. In addition to the encounter recording, Augmedix uses data from the EMR to improve the quality and comprehensiveness of the generated clinical notes. Advanced, highly utilized data lakes can hold data points from multiple systems or generate valuable insights that could further enhance the clinical notes. Injecting these data elements and insights into Augmedix’s clinical note generation process could deliver significant value to the enterprise: higher-quality notes, fewer missed charges, and greater efficiency gains for clinicians.

To discuss this option, please contact the Augmedix Partnerships team at partnerships@augmedix.com to get in touch with an integrations architect.

Sending data for only a subset of the encounters

Data lakes typically aim to aggregate as much data as possible to support future use cases. However, there may be cases where the enterprise prefers to filter the information it processes, for example, for cost control, to limit load on critical downstream resources, or for privacy reasons. Augmedix can send only a subset of the encounters by applying filters on encounter type, specific providers, specific facilities, and more.
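To make the filtering criteria concrete when you discuss them with Augmedix, the sketch below expresses them as a simple predicate. An encounter is sent only if it matches every configured dimension, and an empty dimension means no restriction. This is not Augmedix's configuration format: the EncounterFilter class and the field names encounterType, providerId, and facilityId are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EncounterFilter:
    """Hypothetical filter; an empty set means 'no restriction' for that dimension."""
    encounter_types: set = field(default_factory=set)
    provider_ids: set = field(default_factory=set)
    facility_ids: set = field(default_factory=set)

    def matches(self, encounter: dict) -> bool:
        # Every non-empty dimension must contain the encounter's value (logical AND).
        checks = [
            (self.encounter_types, encounter.get("encounterType")),
            (self.provider_ids, encounter.get("providerId")),
            (self.facility_ids, encounter.get("facilityId")),
        ]
        return all(not allowed or value in allowed for allowed, value in checks)
```

Writing the criteria down this way makes the combination rule explicit (all dimensions must match), which is worth confirming with Augmedix before the filter is configured.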

To discuss this option, please contact the Augmedix Partnerships team at partnerships@augmedix.com to get in touch with an integrations architect.

Note: Augmedix holds the encounter data for one week before it is permanently purged. If the data is not exported to the data lake within this timeframe, it will not be available.

Design considerations

High availability and disaster recovery

Augmedix guarantees a 99.0% uptime SLA for its data export API and retains the data (raw transcripts, notes, structured data) for one week after the encounter data is sent to the EMR. Under normal circumstances, this should suffice for most data lake use cases. Some events or unique use cases may require a higher uptime SLA or longer data retention in Augmedix’s database. For example:

Your data lake is down (for planned maintenance or an unplanned event), and you need Augmedix’s ambient documentation solution to retain the data for longer than one week so that it can be sent later.
You use your data lake for real-time alerting and require an uptime SLA greater than 99.0%.
The “encounter_service” misprocessed some of the data, and you need it to be resent.

In any of these cases, please promptly contact the Augmedix Partnerships team at partnerships@augmedix.com.
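To put these numbers in perspective, a 99.0% uptime SLA allows up to about 7.2 hours of downtime in a 30-day month (1% of 720 hours). The sketch below computes that allowance and checks whether a planned data lake outage would outlast the one-week retention window, in which case you would contact Augmedix before the outage. It assumes retention is counted from the time the data was sent to the EMR, as stated above, and treats data sent at the start of the outage as the earliest record at risk. The function names are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=1)  # Augmedix retention period stated above


def allowed_downtime(uptime_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime an uptime percentage permits over the given period."""
    return period * (1 - uptime_pct / 100)


def purge_time(sent_to_emr: datetime) -> datetime:
    """When Augmedix stops holding data that was sent to the EMR at `sent_to_emr`."""
    return sent_to_emr + RETENTION


def outage_outlasts_retention(outage_start: datetime, expected_recovery: datetime) -> bool:
    """True if data sent to the EMR when the outage began would be purged before recovery."""
    return expected_recovery >= purge_time(outage_start)
```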

Security, privacy, and compliance

HIPAA requires compliance with the Security Rule, the Privacy Rule, and the Breach Notification Rule. Google Cloud supports HIPAA compliance (within the scope of a Business Associate Agreement), but customers are ultimately responsible for evaluating their own HIPAA compliance. Complying with HIPAA is a shared responsibility among the customer, Augmedix, and Google. Customers that are subject to HIPAA and want to use any Google Cloud products in connection with PHI must review and accept Google's Business Associate Agreement (BAA). Google ensures that the Google products covered under the BAA meet the requirements under HIPAA and align with Google's ISO/IEC 27001, 27017, and 27018 certifications and SOC 2 report.
Customers can refer to these third-party audit reports to assess how Google’s products can meet their HIPAA compliance needs.
Vertex AI does not use customer data, prompts, responses, or training data to improve or train the foundation models. Customers can tune the foundation models with their own data and documents within their secured tenant on Google Cloud. Vertex AI supports Google Cloud security controls that you can use to meet your requirements for data residency, data encryption, network security, and access transparency. For more information, see Security controls for Vertex AI and Security controls for Generative AI.
Use Cloud IAM to apply the principles of least privilege and separation of duties to cloud resources. IAM can limit access at the project, folder, or dataset level.

When designing your data lake, carefully consider if PHI is needed for your use cases. The general rule of thumb is ‘if you don’t absolutely need it, don’t deal with it.’ Augmedix’s ambient clinical documentation solution can export the data with PHI or de-identified.

If your data lake use cases do not require holding PHI, it is recommended that you use the de-identified data export from Augmedix’s ambient clinical documentation solution. This keeps your database and the entire pipeline free of PHI, and it also permits broader access to the information. You may still need to link data from the ambient clinical documentation solution to the EMR record. The link (a token or a key) may itself be considered PHI and would need to be protected. Consult your compliance officer on the best approach to manage this link.
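One common way to implement such a link is a keyed hash. The token is derived from the EMR encounter ID with HMAC-SHA256, so it stays stable for joins but cannot be recomputed without a secret key. The sketch below is a minimal illustration of that general technique, not Augmedix's de-identification method, and it assumes the row still carries the EMR encounter ID in _encounterID. The function names are hypothetical. The key would need to be kept in a secrets store with tightly restricted access, and the resulting token may still count as PHI.

```python
import hashlib
import hmac


def link_token(secret_key: bytes, emr_encounter_id: str) -> str:
    """Stable, keyed pseudonym for an EMR encounter ID (HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, emr_encounter_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_row(row: dict, secret_key: bytes) -> dict:
    """Return a copy of the row with _encounterID replaced by its link token."""
    out = dict(row)
    if out.get("_encounterID"):
        out["_encounterID"] = link_token(secret_key, out["_encounterID"])
    return out
```

A plain, unkeyed hash would be weaker here: encounter IDs are often short or sequential, so anyone could hash candidate IDs and reverse the mapping.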

If your data lake use cases require PHI, you should make sure the design of the data lake complies with your organization’s HIPAA policy.

Please also note that the data exported from Augmedix’s ambient clinical documentation solution is not the final version of the clinician note. The clinician may edit the note in the EMR or in other systems after it is generated by Augmedix’s ambient clinical documentation solution. The EMR record is the source of truth and the final record for all clinical information.

Scaling and performance

Data lake use cases typically tolerate some latency in data processing. Pub/Sub serves as a buffer that absorbs load spikes coming from Augmedix’s ambient clinical documentation solution. Continuously monitor the number of unprocessed messages in the queue and the total time it takes to fully ingest a data payload into the data lake. If either exceeds the data-freshness SLA you have agreed with your stakeholders, consider scaling the required services.
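Pub/Sub exposes subscription metrics in Cloud Monitoring, such as subscription/num_undelivered_messages (backlog size) and subscription/oldest_unacked_message_age (how long the oldest message has waited, in seconds). The sketch below shows one possible scaling rule over sampled values of those two metrics. Fetching the metrics is not shown, and the BacklogSample class, the thresholds, and the 50% headroom are illustrative assumptions to tune against your own SLA.

```python
from dataclasses import dataclass


@dataclass
class BacklogSample:
    """One reading of two Pub/Sub subscription metrics (fetching them is not shown)."""
    undelivered_messages: int    # subscription/num_undelivered_messages
    oldest_unacked_age_s: float  # subscription/oldest_unacked_message_age


def needs_scaling(sample: BacklogSample, freshness_sla_s: float,
                  max_backlog: int, headroom: float = 0.5) -> bool:
    """Flag scaling before the freshness SLA is breached.

    A message still waiting in the queue has yet to be processed and loaded,
    so trigger once its age uses up `headroom` of the SLA, or when the
    backlog grows past `max_backlog`.
    """
    age_limit = freshness_sla_s * headroom
    return sample.oldest_unacked_age_s > age_limit or sample.undelivered_messages > max_backlog
```

Watching message age rather than queue depth alone matters because a short queue can still hide one stuck message that is about to violate the SLA.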

Augmedix guarantees sending the data to the data lake within 30 minutes of sending it to the EMR. Note that a few hours may pass between encounter completion and sending the note to the EMR.
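The sketch below turns the 30-minute commitment into a check you can run on ingested records. It also separately measures the end-to-end staleness that stakeholders experience, which includes the EMR writeback delay. It assumes you can obtain the time a note was sent to the EMR and the time the message was published. Those inputs and the function names are hypothetical and are not part of the documented payload.

```python
from datetime import datetime, timedelta

EXPORT_WINDOW = timedelta(minutes=30)  # Augmedix commitment stated above


def export_deadline(sent_to_emr: datetime) -> datetime:
    """Latest time the data should be published toward the data lake."""
    return sent_to_emr + EXPORT_WINDOW


def is_late(sent_to_emr: datetime, published_at: datetime) -> bool:
    """True if the message was published after the 30-minute window."""
    return published_at > export_deadline(sent_to_emr)


def staleness(encounter_end: datetime, ingested_at: datetime) -> timedelta:
    """End-to-end lag from encounter completion to availability in the data lake."""
    return ingested_at - encounter_end
```

Tracking both numbers separates delays on the Augmedix side (is_late) from the overall freshness your stakeholders see (staleness), which is dominated by the time before the note reaches the EMR.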

Deployment

You can deploy the solution in the customer's Google Cloud organization. Except for HDEv2, which is in private preview, all the Google Cloud products described in the architecture are generally available, so customers can start immediately. Augmedix supports publishing messages to Pub/Sub out of the box.

You may choose one of the following two options for implementation support and services:

Engage Google Cloud Consulting
Engage Augmedix at partnerships@augmedix.com

Products used

The following Augmedix and Google products are used in this reference architecture:

Augmedix Go: An ambient documentation engine from Augmedix.
Pub/Sub: An asynchronous and scalable messaging service that decouples services that produce messages from services that process those messages.
Google Kubernetes Engine (GKE): A managed environment for deploying, managing, and scaling containerized applications using Google infrastructure.
Cloud Run: A serverless compute platform that lets you run containers directly on top of Google's scalable infrastructure.
Spanner: A fully managed, relational database service for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL.
BigQuery: A fully managed, highly scalable data warehouse with built-in machine learning capabilities.
Looker: An enterprise platform for BI, data applications, and embedded analytics that helps you create visualizations to explore and share insights in real-time.
Vertex AI: A comprehensive and user-friendly machine learning platform that provides a unified environment for the machine learning lifecycle, from data preparation to model deployment and monitoring.
Colab Enterprise: A collaborative, managed notebook environment with the security and compliance capabilities of Google Cloud.
Healthcare Data Engine (HDE): Providers and health systems use Google Cloud's HDE to aggregate clinical data from disparate healthcare systems and create longitudinal patient records.

Whenever you use clinical information, familiarize yourself with the applicable data protection regulations. Consult your compliance officer; the following HHS summary of the HIPAA Security Rule is a good starting point:

https://www.hhs.gov/guidance/document/summary-hipaa-security-rule-0

Evolving the architecture

Automated ambient clinical documentation that is based on AI, and specifically LLMs, is a very new and dynamic domain. Augmedix fully recognizes that this reference architecture should evolve over the next few months. Augmedix would appreciate your feedback and a discussion about your current and future needs. Your feedback is valuable to Augmedix and will shape its product roadmap to more closely match your use cases.

Contributors

Authors:

Ian Shakil | Founder, Director, Chief Strategy Officer, Augmedix
Tomer Levy | SVP Engineering, Augmedix
Dharmesh Patel | Industry Solutions Architect, Healthcare, Google Cloud

Other contributors:

Saurav Chatterjee | CTO, Augmedix
Mike Leff | Strategic Partnerships, Augmedix
Carrie King | Project Manager, Augmedix
Himanshu Chavda | Customer Engineer, Healthcare Specialist, Google Cloud
Luis Urena | Developer Relations Engineer, Google Cloud
Valentin Huerta | AI Engineer