The Cloud Foundry Blog

Using AWS spot instances to cut the cost of your BOSH deployments

AWS, Google and Azure are in a price war, and all three seem committed to matching each other's on-demand prices. That is good, but it is not the whole story.

The under-reported price is that of the AWS spot market, which can be around 10x cheaper than on-demand.

| Instance Type | On-demand / month | Spot / month |
|---------------|-------------------|--------------|
| m3.medium     | $56               | $7           |
| m2.xlarge     | $200              | $14          |
| r3.xlarge     | $358              | $29          |

There are only 2 differences between AWS spot and on-demand instances.

1. Spot instances take longer to start (5 min vs 1 min)
2. Spot instances “fail” more frequently: when the spot price moves above your bid price, your instance is terminated.

You might think that (2) would prevent you from using spot to host “always on” services like Cloud Foundry.

However:

1. There is very little spot contention in regions like eu-west-1, so spot price spikes are actually rare.
2. Spot price spikes seem to be isolated to a single AZ at a time.

How can you take advantage of these massive spot market savings?

1. By running good cloud software with no single points of failure (e.g. Cloud Foundry using cf-release, or Elasticsearch ELK using logsearch-boshrelease).
2. By spreading your deployment across 3 AZs, so that a failure or price spike is an inconvenience rather than a disaster.
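
As a rough illustration of point 2, the fragment below sketches one spot resource pool per AZ, using the `spot_bid_price` property described later in this post. The pool names are hypothetical, the `eu-west-1b` and `eu-west-1c` zones are assumed to be available to your account, and the other keys a real manifest needs (network, stemcell and so on) are left out.

```yaml
# Sketch only: one spot pool per AZ, so a price spike in a single AZ
# takes out at most a third of the instances.
resource_pools:
  - name: large_spot_z1            # hypothetical pool name
    cloud_properties:
      availability_zone: eu-west-1a
      instance_type: m2.xlarge
      spot_bid_price: 0.4          # maximum price per instance hour
  - name: large_spot_z2            # hypothetical pool name
    cloud_properties:
      availability_zone: eu-west-1b
      instance_type: m2.xlarge
      spot_bid_price: 0.4
  - name: large_spot_z3            # hypothetical pool name
    cloud_properties:
      availability_zone: eu-west-1c
      instance_type: m2.xlarge
      spot_bid_price: 0.4
```

Each job would then be split across the three pools, so that no single AZ holds all the instances of any one component.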

The screencast below walks through deploying CF v170 to 3 AZs using spot instances, and shows how this gives an 80% cost saving (and 1/3 more capacity) compared with a 2 AZ deployment using on-demand instances.

Converting an existing BOSH release to use spot instances

I’m pleased to report that the BOSH AWS CPI supports spot instances as of build 2560, so you can take advantage of the techniques described above to reduce the cost of your BOSH deployments on AWS.

Simply add `resource_pools['name'].cloud_properties.spot_bid_price` properties to your existing BOSH deployments, specifying the maximum amount you’re prepared to pay per instance hour, and then run `bosh deploy`.

For example, below we have 2 resource pools for m2.xlarge instances:

resource_pools:
  - name: large_ondemand
    cloud_properties:
      availability_zone: eu-west-1a
      instance_type: m2.xlarge
  - name: large_spot
    cloud_properties:
      availability_zone: eu-west-1a
      instance_type: m2.xlarge
      spot_bid_price: 0.4

Both will start m2.xlarge instances in eu-west-1a, but large_spot will do so on the spot market and never pay more than $0.40 / hour (compared to the on-demand price of $0.27 / hour). The bid is only a ceiling: you are charged the going spot price, which is normally far lower. Due to spot price spikes we’ll probably “lose” the large_spot instance for about 1 day / month.

Warning: Here be dragons!

Finally, it should be noted that this is an experimental feature with very little “production” experience, so it is likely to have rough edges and unforeseen failure modes. Please use with caution!

Early testing suggests that:

1. You’re likely to bump into AWS spot request limits, and will need to get these raised for your account.
2. Spot price spike frequency differs by AWS region. Check your region and decide whether your SLAs can handle losing 1/3 of your instances each time the spot price spikes.
3. Spot instance creation is slower than on-demand instance creation; this slows down deployments AND resurrections from minutes to tens of minutes.
4. It’s easier to deploy to on-demand instances first, and then “update” your deployment to spot instances one resource_pool at a time.
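
One way to read point 4 is sketched below: a manifest part-way through the conversion, where one pool has just gained a `spot_bid_price` and another is left on-demand until a later `bosh deploy`. The pool names are hypothetical and the fragment omits the other keys a real resource pool needs.

```yaml
# Sketch only: convert one pool per deploy and check the deployment
# is healthy before touching the next one.
resource_pools:
  - name: large                    # hypothetical pool, converted in this deploy
    cloud_properties:
      availability_zone: eu-west-1a
      instance_type: m2.xlarge
      spot_bid_price: 0.4          # newly added: the pool now uses the spot market
  - name: xlarge                   # hypothetical pool, still on-demand for now
    cloud_properties:
      availability_zone: eu-west-1a
      instance_type: r3.xlarge
      # no spot_bid_price yet: add it in a later deploy
```

Converting a single pool per deploy keeps each slow spot-backed update small, which makes it easier to spot request-limit problems before they affect the whole deployment.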
