What data sources are available for monitoring GitHub activity?

David-French · 06-18-2024 09:34 AM

For many organizations, GitHub houses critical intellectual property and is a prime target for attackers seeking to steal valuable source code, disrupt software development operations, or carry out supply chain attacks. Security teams must proactively monitor their GitHub Enterprise environments and have the capability to detect and respond quickly to any suspicious activity before an incident occurs.

While working as a Detection & Response Engineer at another company, I received intelligence that a threat group’s modus operandi was to compromise a developer or software engineer’s user account, create a GitHub personal access token under the account, and proceed to clone all of the GitHub repositories that the compromised user had access to. The attackers used VPN services to anonymize their activity. Needless to say, I became curious about what logging, monitoring, and detection opportunities existed for my GitHub Enterprise environment.

In this two-part blog series, I’m going to demonstrate how a security team can use the Google Security Operations platform to proactively monitor for and detect suspicious and notable behaviors in their GitHub Enterprise environment. Part one walks through the process of ingesting GitHub audit logs into Google Security Operations. In part two, I’ll provide details on the 26 rules that we’ve shared to help security teams get started with monitoring their GitHub environment. I’ll also explain the detection logic for one of the YARA-L rules in detail and test the rule to validate that it detects the intended behavior.

What data sources are available for monitoring GitHub activity?

Good question. A first step as a Detection Engineer is to understand what data sources (or logs) are available to us to build detection rules that alert us when suspicious or notable activity occurs in our GitHub Enterprise environment. Luckily for us, GitHub has an audit log that tracks user, organization, and repository events.

The screenshot below shows what it looks like when I review the audit log for my GitHub Enterprise account. If you’re unfamiliar with GitHub nomenclature, know that a GitHub Enterprise account provides administrators with a single point of visibility and management across multiple GitHub “organizations”. A GitHub organization can contain multiple GitHub code “repositories”.

On the subject of exploring a new log type and learning what behaviors the various event types represent, GitHub has documented each event type for us, which gives us a good head start.

Reviewing GitHub audit log events

Ingesting GitHub audit logs into Google Security Operations

At this point, we’ve learned that GitHub has an audit log that we can use to look for suspicious activity and we can search the audit log in the GitHub admin console. To enable the security team to easily search, analyze, and detect behaviors in these audit log events from a central location, I’m going to ingest the events in Google Security Operations.

GitHub supports various options for streaming audit logs to an external system. For this project, I’m going to configure GitHub to stream audit logs to a Google Cloud Storage bucket and configure Google Security Operations to ingest the logs from the bucket. An alternative is to deploy an ingestion script as a Cloud Function that pulls log events from GitHub’s audit log API and forwards them to Google Security Operations. I chose the bucket approach because the volume of events in this environment is small, and the platform can be configured to delete the log files from the bucket after they’re ingested to minimize storage costs.
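For comparison, here is a minimal sketch of how the pull-based alternative might begin. It only builds a request for GitHub’s enterprise audit log REST endpoint and does not send it. The enterprise slug `example-enterprise`, the token value, and the helper name `build_audit_log_request` are hypothetical, and the forwarding of events to Google Security Operations is left out.

```python
import urllib.parse
import urllib.request

GITHUB_API = "https://api.github.com"


def build_audit_log_request(enterprise, token, include="all", per_page=100):
    """Build (but do not send) a GET request for an enterprise's audit log.

    `include` selects which event families to return ("web", "git", or "all").
    """
    query = urllib.parse.urlencode({"per_page": str(per_page), "include": include})
    slug = urllib.parse.quote(enterprise, safe="")
    url = f"{GITHUB_API}/enterprises/{slug}/audit-log?{query}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


# Hypothetical values for illustration only.
req = build_audit_log_request("example-enterprise", "HYPOTHETICAL_TOKEN")
print(req.get_method(), req.full_url)
```

A real ingestion script would also need pagination, rate-limit handling, and checkpointing of the last event it has seen. Streaming to a bucket avoids all of that extra work.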

Creating a service account for GitHub Enterprise

The first step in this process is to create a new service account in my Google Cloud project that will be used by GitHub Enterprise to stream audit log events to a Cloud Storage bucket.

Creating a Google Cloud service account for GitHub Enterprise log streaming

GitHub Enterprise also needs a way to authenticate to Google Cloud when it uses the service account. To accomplish this, we create a private key in JSON format as shown in the screenshot below. The JSON key file is automatically downloaded to your local machine. We’ll use this file in an upcoming step.

Creating a JSON key for the service account
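Before pasting the key into GitHub, it can be worth a quick sanity check that the downloaded file really is a service account key. The sketch below is one way to do that, assuming the usual fields of a Google Cloud service account key file (`type`, `project_id`, `private_key`, `client_email`). The helper name `check_service_account_key` and all sample values are hypothetical.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json):
    """Return the service account email if the key looks well formed.

    Raises ValueError if a required field is missing or the type is wrong.
    """
    key = json.loads(raw_json)
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]


# Placeholder key contents; a real file holds an actual private key.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n",
    "client_email": "github-audit-logs@example-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample_key))
```

The key file grants whatever access the service account has, so treat it like a password and delete the local copy once GitHub has been configured.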

Creating a storage bucket to store GitHub audit logs

Next, I create a new Cloud Storage bucket that GitHub will write its audit logs to. In the screenshot below, I give the bucket a name and ensure that public access prevention is on. We don’t want just anyone on the Internet reading our logs 🙂.

Creating a new Cloud Storage bucket to store GitHub audit logs

The service account I created earlier needs access to the new bucket. In the screenshot below, I’m assigning the “Storage Object Creator” role to the service account for the bucket, “github-audit-logs-1234”.

Assigning a role to the service account for the Cloud Storage bucket
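Behind the console, this grant amounts to adding a binding to the bucket’s IAM policy. The sketch below models that on a plain dictionary so that the shape of the binding is visible. It makes no Google Cloud API calls, and the helper name `add_bucket_binding` and the service account email are hypothetical. `roles/storage.objectCreator` is the identifier of the “Storage Object Creator” role.

```python
import copy


def add_bucket_binding(policy, role, service_account_email):
    """Return a copy of an IAM policy dict with the service account added to the role."""
    updated = copy.deepcopy(policy)
    member = f"serviceAccount:{service_account_email}"
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Role already bound: add the member only if it is not there yet.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical service account email for illustration.
policy = add_bucket_binding(
    {"bindings": []},
    "roles/storage.objectCreator",
    "github-audit-logs@example-project.iam.gserviceaccount.com",
)
print(policy)
```

The role lets GitHub create objects in the bucket but not read or list them, which keeps the streaming credential close to least privilege.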

Configuring GitHub audit log streaming

The next step is to configure my GitHub Enterprise account to stream its audit logs to the Google Cloud storage bucket. I enter the unique name for the bucket and the service account’s private (JSON) key that I created earlier. Before clicking “Save”, I click the “Check endpoint” button to ensure that GitHub can authenticate using the service account key and write data to the bucket.

Configuring audit log streaming for GitHub Enterprise account

The screenshot below shows a successfully configured audit log stream in GitHub Enterprise ✅.

Reviewing a configured audit log stream in GitHub Enterprise

After configuring the GitHub audit log stream, I verify that log files are being written to the Cloud Storage bucket.

Verifying that GitHub audit logs are being written to the Cloud Storage bucket
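To spot-check the contents of a streamed file, you could download one object from the bucket and summarize it locally. The sketch below assumes that each file is gzip-compressed and holds one JSON event per line, and that each event has an `action` field. Confirm both against the files in your own bucket. The helper name `count_actions` and the sample events are hypothetical.

```python
import gzip
import json
from collections import Counter


def count_actions(gz_bytes):
    """Count audit events per action in one gzip-compressed, newline-delimited JSON file."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        counts[event.get("action", "unknown")] += 1
    return counts


# Build an in-memory sample file instead of reading from the bucket.
sample_events = [
    {"action": "git.clone", "actor": "example-user", "repo": "example-org/repo-a"},
    {"action": "git.clone", "actor": "example-user", "repo": "example-org/repo-b"},
    {"action": "repo.create", "actor": "example-admin", "repo": "example-org/repo-c"},
]
sample_file = gzip.compress(
    "\n".join(json.dumps(event) for event in sample_events).encode("utf-8")
)
print(count_actions(sample_file))
```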

Adding a data feed in Google SecOps

At this stage, my GitHub Enterprise account is streaming its audit logs to a Google Cloud storage bucket. In Google Security Operations, I create a new data feed that takes care of ingesting and normalizing the logs so that my security team can search, analyze, and build detection rules that utilize these events.

In the “add feed” window below, I choose “Google Cloud Storage” as the source type and “GitHub” as the log type. Specifying the log type ensures that the platform uses the appropriate parser to normalize the logs into the Unified Data Model (UDM). I then click “get a service account” to create a Google Cloud service account that will be used to read the log files from the Cloud Storage bucket.

Adding a new feed in Google SecOps for GitHub logs

To finish up the configuration of this new data feed, I enter the URI for the Google Cloud Storage bucket and specify that the logs are stored in a directory that contains subdirectories. As I mentioned earlier, there is an option to have Google Security Operations delete the log files from the bucket after it ingests them. For this proof of concept, I’m not going to do this. You can read more about feed management in the documentation.

Configuring the input parameters for the data feed in Google SecOps

In the previous step, a new service account was created that will be used by Google Security Operations to read and ingest GitHub logs from the storage bucket. The last step in this configuration workflow is to navigate back to the Google Cloud Console and grant that service account access to the bucket where the GitHub logs are being written.

Granting the SecOps service account read-level access to the Cloud Storage bucket

The screenshot below shows that the new data feed has a status of “completed”, meaning that Google Security Operations is reading the log files from the Cloud Storage bucket and ingesting the events.

Viewing the status of the new data feed

Verifying log ingestion in Google SecOps

To verify that Google Security Operations is ingesting GitHub audit logs, I navigate to the search page and run a simple UDM search: metadata.log_type = "GITHUB". I see two events in the search results. These events were logged after I downloaded a couple of repositories from my GitHub Enterprise account 👀.

Searching GitHub logs in Google SecOps

Wrap up

That’s it for part one where I covered the following:

Why a security team needs the capability to monitor their organization’s GitHub Enterprise account for suspicious activity, including attacker tactics observed in the wild
How to ingest GitHub Enterprise audit logs into Google Security Operations
How to use a very basic search query in Google Security Operations to verify that GitHub logs are being ingested successfully

Join me in part two where I’ll provide details on a collection of detection rules that you can use to get started with monitoring your GitHub environment for suspicious activity. I’ll do a deep dive on one of the rules to explain how it works before testing it, tuning it, and validating that it detects the intended behavior.

Monitoring for Suspicious GitHub Activity with Google Security Operations (Part 1)
