
Pre-deployment of HANA scale-out cluster with no stand-by nodes fails with Error "ParentResourceNotFound" #913

Status: Open
abravosuse opened this issue Nov 16, 2023 · 6 comments
Labels: azure, bug (Something isn't working)

abravosuse commented Nov 16, 2023

Used cloud platform
Azure

Used SLES4SAP version
SLES15SP4

Used client machine OS
openSUSE Leap 15.2

Expected behaviour vs observed behaviour
Expected behavior: Deployment of HANA scale-out cluster (with no standby node)
Observed behavior: deployment fails

How to reproduce
Step-by-step process to reproduce:

  1. Switch to the azure folder
  2. Create the terraform.tfvars file based on terraform.tfvars.example (content pasted later)
  3. Set up the Azure account credentials
  4. Initialize terraform
    terraform init
  5. Create and switch to terraform workspace hsonsb
    terraform workspace new hsonsb
  6. Execute deployment
    terraform apply -auto-approve

Used terraform.tfvars

resource_group_name = "<my_rg>"
vnet_address_range = "10.130.0.0/16"
subnet_address_range = "10.130.1.0/24"
admin_user = "cloudadmin"
reg_code = "<my_internal_code>"
reg_email = "alberto.bravo@suse.com"
os_image = "SUSE:sles-sap-15-sp4-byos:gen2:latest"
public_key  = "~/.ssh/id_rsa_cloud.pub"
private_key = "~/.ssh/id_rsa_cloud"
cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub"
cluster_ssh_key = "salt://sshkeys/cluster.id_rsa"
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:v9/"
provisioning_log_level = "debug"
pre_deployment = true
cleanup_secrets = true
bastion_enabled = false
hana_name = "vmhsonsb"
hana_count = "4"
hana_scale_out_enabled = true
hana_scale_out_standby_count = 0
hana_scale_out_shared_storage_type = "anf"
anf_pool_size                      = "15"
anf_pool_service_level             = "Ultra"
hana_scale_out_anf_quota_shared    = "2000"
storage_account_name = "<my_storage_account_name>"
storage_account_key = "<my_storage_account_key>"
hana_inst_master = "//<my_storage_account_name>.file.core.windows.net/hana/51055267"
hana_ha_enabled = true
hana_ips = ["10.130.1.11", "10.130.1.12", "10.130.1.13", "10.130.1.14"]
hana_cluster_vip = "10.130.1.15"
hana_sid = "SC1"
hana_instance_number = "30"
hana_master_password = "<my_password>"
hana_primary_site = "NBG"
hana_secondary_site = "WDF"
hana_cluster_fencing_mechanism = "sbd"
iscsi_name = "vmiscsihsonsb"
iscsi_srv_ip = "10.130.1.4"
hana_data_disks_configuration = {
disks_type       = "Premium_LRS,Premium_LRS,Premium_LRS,Premium_LRS,Premium_LRS"
disks_size       = "64,64,64,64,32,64"
caching          = "ReadOnly,ReadOnly,ReadOnly,ReadOnly,None"
writeaccelerator = "false,false,false,false,false"
luns             = "0,1#2,3#4#5"
names            = "data#log#usrsap#backup"
lv_sizes         = "100#100#30#60"
paths            = "/hana/data#/hana/log#/usr/sap#/hana/backup"
}

Logs

Full log files salt-os-setup.log, salt-predeployment.log and salt-result.log will be delivered via PM if needed.
The deployment ends with the following messages:

Error: creating Volume: (Name "vmhsonsb-netapp-volume-shared-2" / Capacity Pool Name "netapp-pool-hsonsb" / Net App Account Name "netapp-acc-hsonsb" / Resource Group "<my_rg>"): netapp.VolumesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="ParentResourceNotFound" Message="Failed to perform 'write' on resource(s) of type 'netAppAccounts/capacityPools/volumes', because the parent resource '/subscriptions/<subscription_id>/resourceGroups/<my_rg>/providers/Microsoft.NetApp/netAppAccounts/netapp-acc-hsonsb/capacityPools/netapp-pool-hsonsb' could not be found."

   with module.hana_node.azurerm_netapp_volume.hana-netapp-volume-shared[1],
   on modules/hana_node/main.tf line 339, in resource "azurerm_netapp_volume" "hana-netapp-volume-shared":
  339: resource "azurerm_netapp_volume" "hana-netapp-volume-shared" {



Error: creating Volume: (Name "vmhsonsb-netapp-volume-shared-1" / Capacity Pool Name "netapp-pool-hsonsb" / Net App Account Name "netapp-acc-hsonsb" / Resource Group "<my_rg>"): netapp.VolumesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="ParentResourceNotFound" Message="Failed to perform 'write' on resource(s) of type 'netAppAccounts/capacityPools/volumes', because the parent resource '/subscriptions/<subscription_id>/resourceGroups/<my_rg>/providers/Microsoft.NetApp/netAppAccounts/netapp-acc-hsonsb/capacityPools/netapp-pool-hsonsb' could not be found."

   with module.hana_node.azurerm_netapp_volume.hana-netapp-volume-shared[0],
   on modules/hana_node/main.tf line 339, in resource "azurerm_netapp_volume" "hana-netapp-volume-shared":
  339: resource "azurerm_netapp_volume" "hana-netapp-volume-shared" {



 Error: remote-exec provisioner error

   with module.hana_node.module.hana_majority_maker.module.majority_maker_provision.null_resource.provision[0],
   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
   78:   provisioner "remote-exec" {

 error executing "/tmp/terraform_1129644356.sh": Process exited with status 1
abravosuse added the bug label on Nov 16, 2023
yeoldegrove (Collaborator) commented

@abravosuse From what I have experienced in the past, the creation of the NetApp resources can take time and is prone to timing race conditions.
I can see the same behavior that you see: the NetApp volumes are being created even though the NetApp pool has not been created yet.

Could you try the following patch, which passes the actual output/names of the resources (after they are created) to the HANA/Netweaver modules?

--- main.tf	2023-11-17 11:18:37.678249072 +0100
+++ main.tf.new	2023-11-17 11:18:25.115136635 +0100
@@ -219,9 +219,9 @@
   virtual_host_ips            = local.netweaver_virtual_ips
   iscsi_srv_ip                = join("", module.iscsi_server.iscsi_ip)
   # ANF specific
-  anf_account_name           = local.anf_account_name
-  anf_pool_name              = local.anf_pool_name
-  anf_pool_service_level     = var.anf_pool_service_level
+  anf_account_name           = azurerm_netapp_account.mynetapp-acc.0.name
+  anf_pool_name              = azurerm_netapp_pool.mynetapp-pool.0.name
+  anf_pool_service_level     = azurerm_netapp_pool.mynetapp-pool.0.service_level
   netweaver_anf_quota_sapmnt = var.netweaver_anf_quota_sapmnt
   # only used by azure fence agent (native fencing)
   subscription_id           = data.azurerm_subscription.current.subscription_id
@@ -255,9 +255,9 @@
   os_image                      = local.hana_os_image
   iscsi_srv_ip                  = join("", module.iscsi_server.iscsi_ip)
   # ANF specific
-  anf_account_name                = local.anf_account_name
-  anf_pool_name                   = local.anf_pool_name
-  anf_pool_service_level          = var.anf_pool_service_level
+  anf_account_name                = azurerm_netapp_account.mynetapp-acc.0.name
+  anf_pool_name                   = azurerm_netapp_pool.mynetapp-pool.0.name
+  anf_pool_service_level          = azurerm_netapp_pool.mynetapp-pool.0.service_level
   hana_scale_out_anf_quota_data   = var.hana_scale_out_anf_quota_data
   hana_scale_out_anf_quota_log    = var.hana_scale_out_anf_quota_log
   hana_scale_out_anf_quota_backup = var.hana_scale_out_anf_quota_backup
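
For context on why the patch helps: Terraform only serializes resource creation along dependency edges. A value taken from a local or variable carries no edge to the pool resource, so the volumes can race its creation; referencing the resource's exported attribute creates that edge. A sketch only (resource and argument names mirror the diff above, not necessarily the repo's full code):

```hcl
module "hana_node" {
  # No dependency edge: local.anf_pool_name is known at plan time,
  # so the volumes inside the module may be created before the pool.
  # anf_pool_name = local.anf_pool_name

  # Implicit dependency on azurerm_netapp_pool.mynetapp-pool:
  # Terraform must create the pool before evaluating this reference.
  anf_pool_name = azurerm_netapp_pool.mynetapp-pool.0.name

  # Explicit alternative with the depends_on meta-argument:
  # depends_on = [azurerm_netapp_pool.mynetapp-pool]
}
```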
abravosuse (Author) commented

Thank you @yeoldegrove! Just to be on the safe side: applying the patch consists of updating lines 219-221 and 255-257 in azure/main.tf as indicated above, correct?

yeoldegrove (Collaborator) commented Nov 20, 2023

@abravosuse Yeah, just delete the lines indicated with - and add the ones indicated with +.
The lines should be unique, though.
Another way would be to put the patch in a file and simply run patch < patch1.patch.
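
For reference, a minimal self-contained sketch of that patch(1) workflow (the file name patch1.patch comes from the comment above; the file contents here are a one-line stand-in, not the repo's actual azure/main.tf):

```shell
#!/bin/sh
# Illustrative only: apply a unified diff with patch(1).
# In the issue's case you would cd into azure/ and run: patch < patch1.patch
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for one of the lines the real patch rewrites in main.tf
printf 'anf_pool_name = local.anf_pool_name\n' > main.tf

cat > patch1.patch <<'EOF'
--- main.tf
+++ main.tf
@@ -1 +1 @@
-anf_pool_name = local.anf_pool_name
+anf_pool_name = azurerm_netapp_pool.mynetapp-pool.0.name
EOF

# patch reads the target file name from the diff headers
patch < patch1.patch
cat main.tf
```

Running patch with --dry-run first is a cheap way to verify the hunks still apply cleanly if main.tf has drifted.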

abravosuse (Author) commented

I followed your suggestion above, @yeoldegrove, and the deployment now fails with the following errors:

│ Error: remote-exec provisioner error
│
│   with module.hana_node.module.hana_provision.null_resource.provision[2],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_709979505.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.hana_node.module.hana_provision.null_resource.provision[1],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1308350619.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.hana_node.module.hana_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_2089967319.sh": Process exited with status 1

These are errors in the Salt provisioner, so I guess I could get more details on the individual hosts. But which ones?

Thank you!

yeoldegrove (Collaborator) commented

@abravosuse I would need the /var/log/salt-* logs from hana01, hana02 and hana03. Or you could give me access to the hosts ;)
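
A small helper along these lines could summarize the failing states before sharing the full logs (a sketch: the log file names are the ones mentioned earlier in this issue, the function name is made up, and "Result: False" is the marker Salt's highstate output uses for failed states):

```shell
#!/bin/sh
# summarize_salt_failures DIR
# Print every failed Salt state ("Result: False") found in the
# provisioning logs under DIR. On the HANA nodes, DIR would be /var/log.
summarize_salt_failures() {
  dir="$1"
  for f in "$dir"/salt-os-setup.log "$dir"/salt-predeployment.log "$dir"/salt-result.log; do
    [ -f "$f" ] || continue      # skip logs that were not produced
    echo "== $f =="
    # Show a little context around each failed state
    grep -B 2 -A 3 'Result: False' "$f" || echo "no failures in $f"
  done
}

# Example (run on each HANA node): summarize_salt_failures /var/log
```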

abravosuse (Author) commented Jan 11, 2024

@yeoldegrove please send me your public SSH key and I will grant you access to the HANA hosts...
