ParallelCluster 2.1.1 with RAID 0 config on CentOS 7 fails during cluster creation #823

Closed
aws/aws-parallelcluster-cookbook#253
@ahmedelz

Description

Environment:

  • AWS ParallelCluster 2.1.1
  • OS: CentOS 7
  • Scheduler: SGE
  • Master instance type: m5.large
  • Compute instance type: m5.xlarge

Bug description and how to reproduce:
Creating a cluster with ParallelCluster 2.1.1 and a RAID 0 configuration fails with the following error:

Beginning cluster creation for cluster: cluster1
Creating stack named: parallelcluster-cluster1
Status: parallelcluster-cluster1 - ROLLBACK_IN_PROGRESS
Cluster creation failed.  Failed events:
  - AWS::EC2::Instance MasterServer Received FAILURE signal with UniqueId i-0ecca142dxxxxx

I suspected the failure was caused by using encrypted EBS volumes with a custom KMS key, but commenting out both the encrypted and ebs_kms_key_id settings produced the same failure.

Additional context:
Configuration file (credentials and personal data removed):

[global]
update_check = true
sanity_check = true
cluster_template = default

[aws]
aws_region_name = us-west-2

[cluster default]
vpc_settings = vpc-0094xxxxx
key_name = cdns-cluster
base_os = centos7
compute_instance_type = m5.2xlarge
master_instance_type = m5.large
#compute_root_volume_size = 20
#master_root_volume_size = 20
initial_queue_size = 0
tags = {"BU" : "IT", "Sub_BU" : "IT"}
raid_settings = rs
#extra_json = { "cluster" : { "ganglia_enabled" : "yes" } }

[vpc vpc-0094xxxxx]
vpc_id = vpc-0094xxxxx
master_subnet_id = subnet-06cxxxxxx
use_public_ips = false
ssh_from = 172.16.0.0/12

[raid rs]
shared_dir = raid
raid_type = 0
num_of_raid_volumes = 2
volume_size = 100
encrypted = true
ebs_kms_key_id = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
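
To rule out a simple mismatch between raid_settings and the [raid ...] section name, the config can be checked locally before creating the cluster. This is a minimal sketch using Python's standard configparser; the helper name raid_settings_for is hypothetical and is not part of ParallelCluster, and it only checks that the referenced section exists, not that ParallelCluster accepts the values.

```python
import configparser


def raid_settings_for(config_text, cluster_section="cluster default"):
    """Return the options of the [raid <label>] section referenced by the
    cluster section, or None if the cluster section sets no raid_settings.

    Hypothetical helper for local inspection only; it does not call
    ParallelCluster and does not validate the option values.
    """
    # interpolation=None keeps values such as the tags JSON and the
    # {CFN_USER} placeholders exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    label = parser.get(cluster_section, "raid_settings", fallback=None)
    if label is None:
        return None

    section = f"raid {label}"
    if not parser.has_section(section):
        raise KeyError(f"raid_settings = {label} but no [{section}] section found")

    return dict(parser[section])
```

Run against the config above (passing the file contents as config_text), this should return the [raid rs] options as strings, which suggests the section reference itself is consistent and the failure lies elsewhere.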

When I created the cluster with the --norollback option, I could see that the master has a 20 GB disk mounted and exported under /shared. I also noticed that the two disks for the RAID 0 configuration are not attached to the master.
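
The missing array can also be confirmed from the master node by looking at the kernel's software RAID status. The sketch below parses text in the /proc/mdstat format and lists any md arrays with their level and member devices. The function name md_arrays is hypothetical, and the parsing assumes the usual "mdN : state level member[idx] ..." line layout.

```python
import re


def md_arrays(mdstat_text):
    """Parse /proc/mdstat-style text into {array: {state, level, members}}.

    Hypothetical helper; assumes array lines look like
    'md0 : active raid0 devA[1] devB[0]'.
    """
    arrays = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\w+)\s*:\s*(.+)$", line)
        if not match:
            continue  # skip 'Personalities', detail lines, 'unused devices'
        name, rest = match.groups()
        # Drop parenthesised flags such as '(auto-read-only)'.
        tokens = [t for t in rest.split() if not t.startswith("(")]
        level = None
        members = []
        for token in tokens[1:]:
            if "[" in token:
                # Member entries look like 'sdb[0]' or 'sdb[0](S)'.
                members.append(token.split("[", 1)[0])
            elif level is None:
                level = token
        arrays[name] = {"state": tokens[0], "level": level, "members": members}
    return arrays
```

On the master this could be called as md_arrays(open("/proc/mdstat").read()), provided the file exists (it may be absent if the md driver is not loaded). An empty result would be consistent with the observation that the two RAID volumes were never attached or assembled.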

Attachments:
cfn-init.log
cloud-init.log
