OpenStack Autoscaling: A New Approach

NOTE(kiennt): This is a legacy Faythe guideline. A new version is coming soon; check the Faythe repository for its status.

This guide describes how to automatically scale out your Compute instances in response to heavy system usage. By combining OpenStack Orchestration (Heat) with predefined Prometheus rules that evaluate factors such as CPU or memory usage, you can have instances added and removed automatically when they are needed.

The standard OpenStack Autoscaling approach

Let’s talk about the standard OpenStack Autoscaling approach before moving on to the new one.

Main components

Autoscaling process

For more details, see the IBM help documentation.

Drawbacks

The new approach with Faythe

The idea

Strictly speaking, this isn’t a completely new approach: it still leverages Orchestration (Heat) to perform the scaling actions. The difference lies in the monitoring service.

Take a look at the autoscaling slides by Rico Lin, Heat’s PTL. Basically, autoscaling is the combination of three steps:

OpenStack Telemetry takes care of Metering and Alarm. The new approach simply uses another service that can take over the Telemetry roles.

That other service is the Prometheus stack. So why did I choose it?

The implementation

The ideal architecture

                                               +--------------------------------------------------+
                                               |                                                  |
                                               |     +-----------------+  +-----------------+     |
+-------------------------+                    |     |   Instance 1    |  |   Instance 2    |     |
|                         |                    |     |                 |  |                 |     |
|                         |   Scrape metrics   |     |  +-----------+  |  |  +-----------+  |     |
|    Prometheus server    <--------------------+-----+--|Exporter(s)|  |  |  |Exporter(s)|  |     |
|                         |                    |     |  +-----------+  |  |  +-----------+  |     |
|                         |                    |     +-----------------+  +-----------------+     |
+------------+------------+                    |     +--------------------------------------+     |
             |                                 |     |          Autoscaling Group           |     |
             | Fire alerts                     |     +--------------------------------------+     |
             |                                 |                                                  |
+------------v------------+                    |     +--------------------------------------+     |
|                         | Send scale request |     |                                      |     |
| Prometheus Alertmanager ---------------------+----->            Scaling Policy            |     |
|                         |                    |     |                                      |     |
+-------------------------+                    |     +--------------------------------------+     |
                                               |                                                  |
                                               |                    Heat Stack                    |
                                               +--------------------------------------------------+

It’s a piece of cake, right? But where is Faythe? I don’t see it. Let’s talk about the problems with this solution:

We need a third service to solve these problems, and that is where Faythe does some magic.
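Part of that magic is reading the webhook call that Alertmanager sends. The Go sketch below decodes a webhook body and collects the stack labels that the alerting rule carries, one entry per stack. It is a minimal sketch, not Faythe’s code: the JSON fields status, alerts and labels follow Alertmanager’s webhook format, the label names come from the guideline below, and the type and function names (WebhookPayload, StackTarget, targetsFromPayload) are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Alert holds the subset of an Alertmanager webhook alert used here.
type Alert struct {
	Status string            `json:"status"`
	Labels map[string]string `json:"labels"`
}

// WebhookPayload holds the subset of the Alertmanager webhook body used here.
type WebhookPayload struct {
	Status string  `json:"status"`
	Alerts []Alert `json:"alerts"`
}

// StackTarget identifies the Heat stack an alert refers to (hypothetical type).
type StackTarget struct {
	ProjectID string
	StackName string
	StackID   string
}

// targetsFromPayload decodes a webhook body and returns one StackTarget per
// distinct stack_asg_id, skipping alerts that lack the stack labels.
func targetsFromPayload(body []byte) ([]StackTarget, error) {
	var p WebhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var targets []StackTarget
	for _, a := range p.Alerts {
		// Reading a missing key (or a nil map) yields "".
		id, name := a.Labels["stack_asg_id"], a.Labels["stack_asg_name"]
		if id == "" || name == "" || seen[id] {
			continue
		}
		seen[id] = true
		targets = append(targets, StackTarget{ProjectID: a.Labels["project_id"], StackName: name, StackID: id})
	}
	return targets, nil
}

func main() {
	body := []byte(`{"status": "firing", "alerts": [
		{"status": "firing", "labels": {"alertname": "high_memory_load", "project_id": "p1", "stack_asg_name": "mystack", "stack_asg_id": "abc"}},
		{"status": "firing", "labels": {"alertname": "other_alert"}}
	]}`)
	targets, err := targetsFromPayload(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(targets) // [{p1 mystack abc}]
}
```

Deduplicating by stack ID matters because, without a group_by in the Alertmanager route, one webhook call can carry alerts for several stacks.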


The actual architecture

                                               +--------------------------------------------------+
                                               |                                                  |
                                               |     +-----------------+  +-----------------+     |
+-------------------------+                    |     |   Instance 1    |  |   Instance 2    |     |
|                         |                    |     |                 |  |                 |     |
|                         |   Scrape metrics   |     |  +-----------+  |  |  +-----------+  |     |
|    Prometheus server    <--------------------+-----+--|Exporter(s)|  |  |  |Exporter(s)|  |     |
|                         |                    |     |  +-----------+  |  |  +-----------+  |     |
|                         |                    |     +-----------------+  +-----------------+     |
+------------+------------+                    |     +--------------------------------------+     |
             |                                 |     |          Autoscaling Group           |     |
             | Fire alerts                     |     +--------------------------------------+     |
             |                                 |                                                  |
+------------v------------+                    |     +--------------------------------------+     |
|                         |                    |     |                                      |     |
| Prometheus Alertmanager |                    |     |            Scaling Policy            |     |
|                         |                    |     |                                      |     |
+------------+------------+                    |     +---^----------------------------------+     |
             | Send request through            |         |                                        |
             | pre-configured webhook          |         |          Heat Stack                    |
             |                                 +---------+----------------------------------------+
             |                                           |
+------------v------------+                              |
|                         |                              |
|         Faythe          -------------------------------+
|                         |   Send actual scale request
+-------------------------+

NOTE: The stack leverages OpenStack instance metadata and Prometheus labels.
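Because the stack name and ID travel with the alert as labels, the middleman can address the right scaling policy directly. The sketch below shows one plausible way to do it, assuming Heat’s resource-signal API (POST .../stacks/{stack_name}/{stack_id}/resources/{resource_name}/signal) and an orchestration endpoint that already includes the project ID, such as http://controller:8004/v1/<project_id>. Faythe may well do this differently. The Direction type and the signalURL and policyResource functions are hypothetical; the resource names scaleout_policy and scalein_policy are the ones the Step 1 template requires.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Direction is a hypothetical type for the two scaling actions.
type Direction int

const (
	ScaleOut Direction = iota
	ScaleIn
)

// policyResource returns the fixed resource name required by the template.
func policyResource(d Direction) string {
	if d == ScaleOut {
		return "scaleout_policy"
	}
	return "scalein_policy"
}

// signalURL builds <heatEndpoint>/stacks/<name>/<id>/resources/<policy>/signal.
// heatEndpoint is assumed to be the project-scoped orchestration endpoint.
func signalURL(heatEndpoint, stackName, stackID string, d Direction) string {
	return strings.TrimRight(heatEndpoint, "/") +
		"/stacks/" + url.PathEscape(stackName) +
		"/" + url.PathEscape(stackID) +
		"/resources/" + policyResource(d) + "/signal"
}

func main() {
	u := signalURL("http://controller:8004/v1/p1/", "mystack", "abc", ScaleOut)
	fmt.Println(u)
	// http://controller:8004/v1/p1/stacks/mystack/abc/resources/scaleout_policy/signal
}
```

An actual request would be a POST to that URL carrying a Keystone token in the X-Auth-Token header; obtaining the token is left out here.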

Guideline

The current approach requires some extra setup and configuration in both Prometheus and the Heat stack. As you will see, it’s quite complicated.

The simplicity of the logic is paid for with complex configuration steps.

Step 1: Create a stack. The following is a sample template; its requirements are marked with ## comments:

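---
heat_template_version: 2016-10-14 ## Assumed; use a version your Heat release supports

parameters: ## Types assumed from how each parameter is used below
  min_size: { type: number }
  max_size: { type: number }
  service_template: { type: string } ## Nested template that defines one instance
  flavor: { type: string }
  image: { type: string }
  key_name: { type: string }
  network: { type: string }
  subnet: { type: string }
  security_group: { type: string }
  scale_out_cooldown: { type: number }
  scaling_out_adjustment: { type: number }
  scale_in_cooldown: { type: number }
  scaling_in_adjustment: { type: number } ## Negative (e.g. -1) to remove instances

resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: { get_param: min_size }
      max_size: { get_param: max_size }
      resource:
        ## service_template must accept these properties and pass metadata to its server
        type: { get_param: service_template }
        properties:
          flavor: { get_param: flavor }
          image: { get_param: image }
          key_name: { get_param: key_name }
          network: { get_param: network }
          subnet: { get_param: subnet }
          metadata:
            monitoring: "1" ## Required
            service: "myservice"
            stack_asg_name: { get_param: "OS::stack_name" } ## Required
            stack_asg_id: { get_param: "OS::stack_id" } ## Required
          security_group: { get_param: security_group }

  scaleout_policy: ## Must be named `scaleout_policy`
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: asg }
      cooldown: { get_param: scale_out_cooldown }
      scaling_adjustment: { get_param: scaling_out_adjustment }

  scalein_policy: ## Must be named `scalein_policy`
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: asg }
      cooldown: { get_param: scale_in_cooldown }
      scaling_adjustment: { get_param: scaling_in_adjustment }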
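Both ScalingPolicy resources in the template use adjustment_type: change_in_capacity, which adds scaling_adjustment to the current group size, and the AutoScalingGroup keeps the result within min_size and max_size. A tiny sketch of that arithmetic, using a hypothetical newCapacity helper (cooldown handling is not modelled):

```go
package main

import "fmt"

// newCapacity applies a change_in_capacity adjustment and clamps the result
// to [minSize, maxSize], mirroring the group's min_size/max_size bounds.
func newCapacity(current, adjustment, minSize, maxSize int) int {
	c := current + adjustment
	if c < minSize {
		c = minSize
	}
	if c > maxSize {
		c = maxSize
	}
	return c
}

func main() {
	// Hypothetical group with min_size 1 and max_size 5.
	fmt.Println(newCapacity(4, 2, 1, 5))  // scale out by 2 from 4: capped at 5
	fmt.Println(newCapacity(2, -3, 1, 5)) // scale in by 3 from 2: floored at 1
}
```

This is why scaling_in_adjustment should be negative: the same change_in_capacity arithmetic serves both policies.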

Step 2: Configure Prometheus OpenStack service discovery (an entry under scrape_configs), for example:

- job_name: openstack_scale_test
  openstack_sd_configs:
    - role: instance
      identity_endpoint: "<openstackendpoint>"
      username: "<openstackusername>"
      password: "<openstackpassword>"
      domain_name: "default"
      port: 9100 ## Exporter endpoint
      refresh_interval: 20s
      region: "RegionOne"
      project_name: "<openstackproject>"

  relabel_configs:
    ## Only keep metrics from ACTIVE instances
    - source_labels: [__meta_openstack_instance_status]
      action: keep
      regex: ACTIVE

    ## Only scrape instances whose "monitoring" metadata is "1"
    - source_labels: [__meta_openstack_tag_monitoring]
      action: keep
      regex: 1

    - source_labels: [__meta_openstack_project_id]
      target_label: project_id
      replacement: $1

    - source_labels: [__meta_openstack_tag_stack_asg_name]
      target_label: stack_asg_name
      replacement: $1

    - source_labels: [__meta_openstack_tag_stack_asg_id]
      target_label: stack_asg_id
      replacement: $1

    ## Needed by the service="myservice" selector in the Step 3 rule
    - source_labels: [__meta_openstack_tag_service]
      target_label: service
      replacement: $1
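To see what the relabel rules do to one discovered instance, the Go sketch below mimics the keep rules and the stack-label copies. Prometheus applies these rules itself; Target and relabel are hypothetical stand-ins. Two details it tries to reflect: a keep regex is fully anchored, so the monitoring metadata must be exactly "1", and a label set to an empty value is treated by Prometheus as absent.

```go
package main

import "fmt"

// Target is a hypothetical stand-in for an instance found by openstack_sd_configs.
type Target struct {
	Status    string            // value of __meta_openstack_instance_status
	ProjectID string            // value of __meta_openstack_project_id
	Metadata  map[string]string // exposed as __meta_openstack_tag_<key>
}

// relabel returns nil for a dropped target, otherwise the labels that would
// be attached to every series scraped from it.
func relabel(t Target) map[string]string {
	// keep rules: anchored regexes, so values must match exactly.
	if t.Status != "ACTIVE" || t.Metadata["monitoring"] != "1" {
		return nil
	}
	labels := map[string]string{"project_id": t.ProjectID}
	for _, key := range []string{"stack_asg_name", "stack_asg_id"} {
		if v := t.Metadata[key]; v != "" {
			labels[key] = v // empty values are skipped, as they would mean "no label"
		}
	}
	return labels
}

func main() {
	kept := relabel(Target{
		Status:    "ACTIVE",
		ProjectID: "p1",
		Metadata:  map[string]string{"monitoring": "1", "stack_asg_name": "mystack", "stack_asg_id": "abc"},
	})
	dropped := relabel(Target{Status: "SHUTOFF", Metadata: map[string]string{"monitoring": "1"}})
	fmt.Println(kept, dropped == nil)
}
```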

Step 3: Define a Prometheus rule, for example:

groups:
  - name: targets
    rules:
      - alert: high_memory_load
        expr: avg by(stack_asg_id, stack_asg_name, project_id) ((node_memory_MemTotal_bytes{service="myservice"} - node_memory_MemFree_bytes{service="myservice"}) / node_memory_MemTotal_bytes{service="myservice"} * 100) > 80
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory"
          description: "Stack {{ $labels.stack_asg_name }} ({{ $labels.stack_asg_id }}) has had memory usage above 80% for 5m"
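The rule computes the used-memory percentage of each instance and then averages it per stack, since the by clause keeps only stack_asg_id, stack_asg_name and project_id. A worked sketch of that arithmetic, with made-up readings for a three-instance group; MemSample and avgMemoryUsage are hypothetical names, and the for: 5m requirement (the condition must hold for five minutes before the alert fires) is not modelled.

```go
package main

import "fmt"

// MemSample holds one instance's node_exporter readings, in bytes.
type MemSample struct {
	Total, Free float64
}

// avgMemoryUsage mirrors the rule's expression: per-instance
// (MemTotal - MemFree) / MemTotal * 100, averaged over the group.
// An empty group returns 0 here; PromQL would simply return no series.
func avgMemoryUsage(samples []MemSample) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range samples {
		sum += (s.Total - s.Free) / s.Total * 100
	}
	return sum / float64(len(samples))
}

func main() {
	group := []MemSample{
		{Total: 4e9, Free: 0.4e9}, // 90% used
		{Total: 4e9, Free: 1.2e9}, // 70% used
		{Total: 2e9, Free: 0.2e9}, // 90% used
	}
	avg := avgMemoryUsage(group)
	fmt.Printf("%.1f%% used, alert condition met: %v\n", avg, avg > 80)
}
```

Here (90 + 70 + 90) / 3 ≈ 83.3, which is above the 80 threshold even though one instance sits at 70%.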

Step 4: Configure Prometheus Alertmanager webhook, for example:

route:
  receiver: "custom_alert"
  group_wait: 20s
  group_interval: 3m

receivers:
  - name: "custom_alert"
    webhook_configs:
      - send_resolved: true
        url: http://<faythe-host>:<faythe-port>/openstack/autoscaling/openstack-1f
        http_config:
          basic_auth:
            username: "admin"
            password: "password"

Note that openstack-1f must be the name of an OpenStack configuration group in the Faythe configuration file. This lets a single Faythe instance work with multiple OpenStack deployments.

Step 5: Configure Faythe

## OpenStack configuration groups, keyed by the name used in the webhook URL.
openstack_configs:
  openstack-1f:
    region_name: "RegionOne"
    domain_name: "Default"
    auth_url: "<openstackendpoint>"
    username: "<openstackusername>"
    password: "<openstackpassword>"
    project_name: "<openstackproject>"

server_config:
  ## Example:
  ## "www.example.com"
  ## "([a-z]+).domain.com"
  ## remote_host_pattern: "10.240.202.209.*"
  basic_auth:
    username: "admin"
    password: "password"
  log_dir: "/whatever/directory/faythe-logs"
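remote_host_pattern is commented out in this example, but its sample values look like regular expressions. The sketch below shows how such a filter could work, assuming the pattern must match the whole client IP taken from the request’s RemoteAddr. Whether Faythe anchors the pattern or also matches hostnames is not stated here, so treat remoteAllowed as a hypothetical illustration.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// remoteAllowed reports whether the host part of remoteAddr (an "ip:port"
// string such as http.Request.RemoteAddr) fully matches pattern.
// An empty pattern allows every host.
func remoteAllowed(pattern, remoteAddr string) (bool, error) {
	if pattern == "" {
		return true, nil
	}
	// Anchoring is an assumption: the whole host must match.
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present
	}
	return re.MatchString(host), nil
}

func main() {
	for _, addr := range []string{"10.240.202.209:53422", "192.168.1.5:1234"} {
		ok, err := remoteAllowed("10.240.202.209.*", addr)
		fmt.Println(addr, ok, err)
	}
}
```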

Step 6: Let’s see them all work together.


Drawbacks and TODO

Drawbacks

TODO