Description
Environment:
- AWS ParallelCluster 2.1.1
- OS: CentOS 7
- Scheduler: SGE
- Master instance type: m5.large
- Compute instance type: m5.xlarge
Bug description and how to reproduce:
Deploying a ParallelCluster 2.1.1 cluster with a RAID 0 configuration fails with this error:
Beginning cluster creation for cluster: cluster1
Creating stack named: parallelcluster-cluster1
Status: parallelcluster-cluster1 - ROLLBACK_IN_PROGRESS
Cluster creation failed. Failed events:
- AWS::EC2::Instance MasterServer Received FAILURE signal with UniqueId i-0ecca142dxxxxx
I thought the failure might be caused by using encrypted EBS volumes with a custom KMS key, but after commenting out both the encrypted and ebs_kms_key_id settings I still get the same failure.
Additional context:
Configuration file (credentials and personal data removed):
[global]
update_check = true
sanity_check = true
cluster_template = default
[aws]
aws_region_name = us-west-2
[cluster default]
vpc_settings = vpc-0094xxxxx
key_name = cdns-cluster
base_os = centos7
compute_instance_type = m5.2xlarge
master_instance_type = m5.large
#compute_root_volume_size = 20
#master_root_volume_size = 20
initial_queue_size = 0
tags = {"BU" : "IT", "Sub_BU" : "IT"}
raid_settings = rs
#extra_json = { "cluster" : { "ganglia_enabled" : "yes" } }
[vpc vpc-0094xxxxx]
vpc_id = vpc-0094xxxxx
master_subnet_id = subnet-06cxxxxxx
use_public_ips = false
ssh_from = 172.16.0.0/12
[raid rs]
shared_dir = raid
raid_type = 0
num_of_raid_volumes = 2
volume_size = 100
encrypted = true
ebs_kms_key_id = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
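To rule out a simple mismatch between raid_settings and the [raid ...] section label, the lookup can be checked outside of ParallelCluster. The following is a minimal sketch using only Python's standard configparser; it is not ParallelCluster's own validation logic, and resolve_raid_section is a hypothetical helper name. The sample text is a trimmed copy of the configuration above.

```python
import configparser

# Trimmed copy of the configuration from this report (illustrative only).
SAMPLE_CONFIG = """
[global]
cluster_template = default

[cluster default]
base_os = centos7
master_instance_type = m5.large
raid_settings = rs

[raid rs]
shared_dir = raid
raid_type = 0
num_of_raid_volumes = 2
volume_size = 100
"""


def resolve_raid_section(config_text):
    """Return the [raid <label>] settings referenced by the active cluster section.

    Hypothetical helper: it only mirrors the naming convention visible in the
    config file ([cluster <name>] -> raid_settings -> [raid <label>]).
    """
    # Disable interpolation so values such as "{CFN_USER}" or "%" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    template = parser.get("global", "cluster_template")
    cluster = parser["cluster " + template]

    raid_label = cluster.get("raid_settings")
    if raid_label is None:
        return None  # the cluster section does not request RAID at all

    section = "raid " + raid_label
    if not parser.has_section(section):
        raise KeyError("raid_settings points at a missing section: [%s]" % section)
    return dict(parser[section])


raid = resolve_raid_section(SAMPLE_CONFIG)
print("raid_type:", raid["raid_type"])
print("num_of_raid_volumes:", raid["num_of_raid_volumes"])
print("shared_dir:", raid["shared_dir"])
```

With the configuration from this report the lookup resolves without error, so the section reference itself looks consistent.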
When I created the cluster with the --norollback option, I could see that the master has a 20 GB disk mounted and exported under /shared. I also noticed that the two disks for the RAID 0 configuration are not attached to the master.
Attachments:
cfn-init.log
cloud-init.log