
Terraform: Error when creating CloudWatch alarms for multiple instances

  • mohit  · 6 years ago

    I am creating multiple EC2 instances in two regions. I want to attach CloudWatch alarms for status checks and CPU utilization to them.

    I have two issues, both involving the logic that creates the CloudWatch alarms.

    ├── main.tf
    ├── modules
    │   ├── alb
    │   │   ├── aws_alb.tf
    │   │   ├── aws_instance.tf
    │   │   ├── bootstrap.sh
    │   │   ├── cloudwatch.tf
    │   │   ├── main.tf
    │   │   ├── output.tf
    │   │   ├── security-group.tf
    │   │   ├── sns.tf
    │   │   └── variables.tf
    │   └── route53
    │       ├── main.tf
    │       └── variables.tf
    └── variables.tf
    

    main.tf

    module "north-virginia" {
      source          = "./modules/alb"
      region          = "us-east-1"
      az              = ["us-east-1a", "us-east-1b", "us-east-1c"]
    }
    
    module "oregon" {
      source          = "./modules/alb"
      region          = "us-west-2"
      az              = ["us-west-2a", "us-west-2b", "us-west-2c"]
    }
    

    modules/alb/aws_instance.tf

    resource "aws_instance" "web" {
      ami               = "${data.aws_ami.amzn2.id}"
      instance_type     = "${var.instance_type}"
      count             = 3
      availability_zone = "${element(var.az, count.index)}"
      tags {
        Name = "${count.index}"
      }
    }
    

    resource "aws_cloudwatch_metric_alarm" "cpu_utilization" {
      count               = "${length(local.instance_id_var)}"
      alarm_name          = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}"
      comparison_operator = "GreaterThanOrEqualToThreshold"
      evaluation_periods  = "2"
      metric_name         = "CPUUtilization"
      namespace           = "AWS/EC2"
      period              = "120"
      statistic           = "Average"
      threshold           = "60"
      alarm_description   = "This metric monitors ec2 cpu utilization"
    
      dimensions {
        InstanceId = "${element(aws_instance.web.*.id, count.index)}"
      }
    }
    
    resource "aws_cloudwatch_metric_alarm" "status_check" {
      count               = 3
      alarm_name          = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}"
      comparison_operator = "GreaterThanOrEqualToThreshold"
      evaluation_periods  = "2"
      metric_name         = "StatusCheckFailed"
      namespace           = "AWS/EC2"
      period              = "120"
      statistic           = "Average"
      threshold           = "1"
      alarm_description   = "This metric monitors ec2 status check."
    
      dimensions {
        InstanceId = "${element(aws_instance.web.*.id, count.index)}"
      }
    }
    

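    Expected behavior:

    It should create 3 instances in each region and attach the alarms to them.

    Every time I run apply, it updates the alarms again, and vice versa.

    I get the error below. It goes away if I wait 2 minutes while the alarms are being updated, or if I use terraform apply -parallelism=1

    Error: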

    4 error(s) occurred:
    
    * module.north-virginia.aws_cloudwatch_metric_alarm.status_check[0]: 1 error(s) occurred:
    
    * aws_cloudwatch_metric_alarm.status_check.0: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
    status code: 400, request id: ea6c4502-dede-11e8-9262-c55251d6673a
    * module.north-virginia.aws_cloudwatch_metric_alarm.cpu_utilization[1]: 1 error(s) occurred:
    
    * aws_cloudwatch_metric_alarm.cpu_utilization.1: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
    status code: 400, request id: ea6c6c09-dede-11e8-a13f-bbb86ff53045
    * module.oregon.aws_cloudwatch_metric_alarm.status_check[1]: 1 error(s) occurred:
    
    * aws_cloudwatch_metric_alarm.status_check.1: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
    status code: 400, request id: ed198a56-dede-11e8-b95a-9d366b9f2e85
    * module.oregon.aws_cloudwatch_metric_alarm.cpu_utilization[3]: 1 error(s) occurred:
    
    * aws_cloudwatch_metric_alarm.cpu_utilization.3: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
    status code: 400, request id: ed193c4d-dede-11e8-9c63-21cde1551122
    

    1 answer
  •   KJH    6 years ago

    First, I would simplify things by removing or commenting out module "oregon". Once you get the north-virginia one working, add it back in.

    Second, I would change the code in the module so that count is computed from var.az:

    count = "${length(var.az)}"
    

    That way, you can change the number of AZs in the code that calls the module and the number of instances created changes with it.
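
    A minimal sketch of what that could look like inside the module, written in the same Terraform 0.11-style syntax as the question. The variable "az" declaration is an assumption, since the question does not show modules/alb/variables.tf:

    # modules/alb/variables.tf (assumed declaration, not shown in the question)
    variable "az" {
      type = "list"
    }

    # modules/alb/aws_instance.tf
    resource "aws_instance" "web" {
      # One instance per availability zone passed in by the calling module
      count             = "${length(var.az)}"
      ami               = "${data.aws_ami.amzn2.id}"
      instance_type     = "${var.instance_type}"
      availability_zone = "${element(var.az, count.index)}"

      tags {
        Name = "${count.index}"
      }
    }

    With this, passing two or four AZs from main.tf should create two or four instances without touching the module. The alarm resources can use the same count expression so that they stay in step with the instances.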

    Third, the alarm_name values you are giving the two CloudWatch alarms look identical. Try to differentiate them, e.g.:

    alarm_name = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}-cpu-util"
    alarm_name = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}-status-check"
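
    Put together, the two alarm resources might look like the sketch below. CloudWatch alarm names have to be unique within a region, and creating an alarm under a name that already exists overwrites it, so two resources that both use the bare instance ID as their name would keep replacing each other. This sketch assumes count comes from var.az as suggested above, and it indexes aws_instance.web.*.id directly with element() instead of the split/join round trip, which is a simplification and not part of the original snippet:

    resource "aws_cloudwatch_metric_alarm" "cpu_utilization" {
      count               = "${length(var.az)}"
      # Suffix keeps this name distinct from the status-check alarm
      alarm_name          = "${element(aws_instance.web.*.id, count.index)}-cpu-util"
      comparison_operator = "GreaterThanOrEqualToThreshold"
      evaluation_periods  = "2"
      metric_name         = "CPUUtilization"
      namespace           = "AWS/EC2"
      period              = "120"
      statistic           = "Average"
      threshold           = "60"
      alarm_description   = "This metric monitors ec2 cpu utilization"

      dimensions {
        InstanceId = "${element(aws_instance.web.*.id, count.index)}"
      }
    }

    resource "aws_cloudwatch_metric_alarm" "status_check" {
      count               = "${length(var.az)}"
      # Same instance, different suffix, so the two alarms no longer collide
      alarm_name          = "${element(aws_instance.web.*.id, count.index)}-status-check"
      comparison_operator = "GreaterThanOrEqualToThreshold"
      evaluation_periods  = "2"
      metric_name         = "StatusCheckFailed"
      namespace           = "AWS/EC2"
      period              = "120"
      statistic           = "Average"
      threshold           = "1"
      alarm_description   = "This metric monitors ec2 status check."

      dimensions {
        InstanceId = "${element(aws_instance.web.*.id, count.index)}"
      }
    }

    If the "A separate request to update this alarm is in progress" error comes from both resources writing to the same alarm name in parallel, distinct names like these should remove the need for -parallelism=1, though that is worth confirming with a clean apply.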
    

    PS: Between tests, make sure you have cleaned up any resources that may already have been created, so that each run is a clean test.