BigQuery Data Transfer - Permission denied when transferring public dataset

I am trying to transfer a public dataset to our project. I am specifying a Scheduled Query with a service account. In my project (the destination project), I have given this service account the BigQuery Admin role. I have also made sure the BigQuery Data Transfer service agent has the Service Account Token Creator role and the bigquerydatatransfer.serviceAgent role.
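Roughly, those grants look like this in Terraform (the project_number output, the service-agent email interpolation, and the resource names here are illustrative placeholders, not my exact config):

# The transfer's service account gets BigQuery Admin in the destination project.
resource "google_project_iam_member" "bq_data_transfer_permissions" {
  project = module.gcp_project.project_id
  role    = "roles/bigquery.admin"
  member  = "serviceAccount:${google_service_account.bigquery_data_transfer_sa.email}"
}

# The BigQuery Data Transfer service agent can mint tokens for that service account...
resource "google_service_account_iam_member" "dts_token_creator" {
  service_account_id = google_service_account.bigquery_data_transfer_sa.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}

# ...and holds the Data Transfer service agent role in the project.
resource "google_project_iam_member" "dts_service_agent" {
  project = module.gcp_project.project_id
  role    = "roles/bigquerydatatransfer.serviceAgent"
  member  = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}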

Using Terraform, I have specified the following data transfer config:

 

resource "google_bigquery_data_transfer_config" "indications_query_config" {
  depends_on = [google_project_iam_member.bq_data_transfer_permissions]

  project                = module.gcp_project.project_id
  display_name           = "Open Targets Platform Data Transfer - Indications"
  location               = "us-west1"
  data_source_id         = "scheduled_query"
  schedule               = "14 of month 02:00"
  destination_dataset_id = module.bigquery.bigquery_dataset.dataset_id
  service_account_name   = google_service_account.bigquery_data_transfer_sa.email
  params = {
    destination_table_name_template = "indication"
    write_disposition               = "WRITE_APPEND"
    query                           = "SELECT * FROM `bigquery-public-data.open_targets_platform.indication`"
  }
}

 

When this transfer runs, I get the following error message:

Error code 7 : Failed to start BigQuery job. Error: Access Denied: Table bigquery-public-data:open_targets_platform.indication: User does not have permission to query table bigquery-public-data:open_targets_platform.indication, or perhaps it does not exist. User: SA_NAME@PROJECT_ID.iam.gserviceaccount.com.

I went ahead and created a service account key for this service account. I set the env var GOOGLE_APPLICATION_CREDENTIALS to the path of this key and ran the following Python code:

 

from google.cloud import bigquery
import google.auth
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform'
]
CREDENTIALS, GCP_PROJECT = google.auth.default(scopes=SCOPES)
client = bigquery.Client(credentials=CREDENTIALS, project=GCP_PROJECT)

# Perform a query.
QUERY = (
    'SELECT * FROM `bigquery-public-data.open_targets_platform.indication` '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

for row in rows:
    print(row.id)

 

When I do this, I am able to query the public dataset. I can also use the debugger to confirm that the bigquery client is indeed using the service account as credentials. 

Any idea what's going wrong here?

Solved
1 ACCEPTED SOLUTION

TL;DR - Give the service agent BigQuery Data Viewer in the project. Subscribe to the public dataset to create a linked dataset in your project. Create/ensure a destination dataset has the same location as the linked dataset (regional vs. multi-regional). Create a scheduled query.


3 REPLIES

The main issue is a mismatch in how permissions are granted for the BigQuery Data Transfer service to access the public dataset. Although your service account can query the dataset directly, as your Python script demonstrates, the Data Transfer service requires its own explicit authorization to use those permissions.

To resolve this, grant the necessary access to the Data Transfer service agent. In the Google Cloud Console, go to IAM & Admin -> IAM and locate the BigQuery Data Transfer service agent, which typically looks like service-[project-number]@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com. Edit its roles to include the BigQuery Data Viewer role, which allows the service to read data from BigQuery, including public datasets, and save the changes.
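Since you already manage IAM in Terraform, an equivalent project-level binding would look roughly like this (the project-number reference is a placeholder for however you look it up in your setup):

# Grant BigQuery Data Viewer to the Data Transfer service agent at the project level.
resource "google_project_iam_member" "dts_agent_data_viewer" {
  project = module.gcp_project.project_id
  role    = "roles/bigquery.dataViewer"
  member  = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}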

For additional verification, you can check the IAM policy for the public dataset itself by going to BigQuery -> Open Targets Platform -> indication, clicking on the Details tab, and then Permissions. Ensure that the BigQuery Data Transfer service account has the "BigQuery Data Viewer" role here as well.

Your Terraform configuration appears to be set up correctly, and no changes are necessary there. The point is that even though your service account has the required permissions, the Data Transfer service needs its own explicit authorization to access the dataset.

Granting the "BigQuery Data Viewer" role directly to the Data Transfer service agent gives it the read permissions it needs to perform the transfer from the public dataset. This distinction is crucial because the Data Transfer service operates as a separate identity from your regular service account.

For troubleshooting, consider temporarily adjusting the scheduled query interval to a shorter timeframe, such as every hour, to facilitate quicker testing and validation. Additionally, reviewing the logs in Cloud Logging for the BigQuery Data Transfer service can provide further insights if issues persist.
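For example, a temporary override of the schedule in your Terraform config might look like this (custom schedule syntax assumed; restore the original value once you are done testing):

  # Temporary: run the transfer hourly while testing instead of monthly.
  schedule = "every 1 hours"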

Thank you for your answer. I have added the BigQuery Data Viewer role to my project IAM. However, when I go to the public dataset's permissions, I do not see it appear:

[Screenshot: public dataset permissions - Screenshot 2024-07-09 at 10.47.03 AM.png]

When I rerun the scheduled query I get the same error message:

Error code 7 : Failed to start BigQuery job. Error: Access Denied: Table bigquery-public-data:open_targets_platform.indication: User does not have permission to query table bigquery-public-data:open_targets_platform.indication, or perhaps it does not exist. User: [SA_NAME]@[PROJECT].iam.gserviceaccount.com.

If I subscribe to this dataset and create a linked dataset in my project, then I am able to see the Data Transfer service agent listed under the BigQuery Data Viewer role. However, if I reconfigure my Scheduled Query transfer to use the following query:

SELECT * FROM `PROJECT.open_targets_platform.indication`

I get the following error:

Error code 5 : Not found: Dataset PROJECT:open_targets_platform was not found in location us-west1; JobID: PROJECT:66885aee-0000-2baa-88b5-d4f547ec4fb4

That makes sense, because the linked dataset is not a regional dataset and its location is listed as "US". Is there a way I can specify the source dataset's location as "US" and the destination as "us-west1"?

So then I tried to use Dataset Copy. I created the following Dataset Copy configuration: 

[Screenshot: Dataset Copy configuration - Screenshot 2024-07-09 at 3.52.17 PM.png]
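For reference, I believe that console configuration corresponds roughly to the following Terraform (I actually created it in the console, so the names and values here are placeholders):

# Approximate Terraform equivalent of the console-created Dataset Copy config.
resource "google_bigquery_data_transfer_config" "indication_dataset_copy" {
  project                = module.gcp_project.project_id
  display_name           = "Open Targets Platform Dataset Copy"
  location               = "us-west1"
  data_source_id         = "cross_region_copy"
  schedule               = "14 of month 02:00"
  destination_dataset_id = module.bigquery.bigquery_dataset.dataset_id
  params = {
    source_project_id = module.gcp_project.project_id
    source_dataset_id = "open_targets_platform"
  }
}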

I find that this also fails with a NotFound error.

However, if I use a multi-region destination dataset in the same location as the source dataset, the Scheduled Query works as expected, which should be enough to get me unblocked for now. It would be nice if the documentation mentioned such a limitation for these services. Unless they already do and I missed it 😛
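For anyone else hitting this, the workaround boils down to creating the destination dataset in the same multi-region location as the linked dataset, roughly like this in Terraform (dataset name is a placeholder):

# Destination dataset in the US multi-region, matching the linked dataset's location.
resource "google_bigquery_dataset" "open_targets_us" {
  project    = module.gcp_project.project_id
  dataset_id = "open_targets_us"
  location   = "US"
}

The scheduled query config then uses location = "US" and this dataset as its destination.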

 

 
