The following is a guest blog post by Julian Fischer (email@example.com, @railshoster), founder and CEO of AnyNines, a Cloud Foundry and Rails hosting service operated by Avarteq GmbH in Saarbrücken, Germany.
Cloud Foundry is well known for simplifying application portability from one CF-based PaaS to another, but how simple is it to move an entire, live, Cloud Foundry installation from one underlying IaaS to another? We asked the team at Pivotal, who recounted their experience moving the Cloud Foundry instance at run.pivotal.io from one Amazon AWS availability zone to another in 40 minutes. If Pivotal could do it in less than an hour, could we?
After running the AnyNines Cloud Foundry PaaS on vCloud, we decided to move our underlying IaaS installation over to OpenStack (see the official AnyNines OpenStack migration announcement for more details, and also check out our AnyNines blog). This decision was motivated in part by our wish to build competence in the emerging synergy between Cloud Foundry and OpenStack, and to gain experience in this domain for our growing cloud hosting and consulting business.
We already had experience running OpenStack, and more recently took an active contributing role as well with the release of the OpenStack Swift Cloud Foundry service, which gives simple access to OpenStack’s Amazon S3-like object store. Operating a Cloud Foundry PaaS layer on top of OpenStack was simply the logical choice.
Before Getting Started
Before jumping into the fray, we experimented with several migration scenarios, ultimately deciding to start with a fresh Cloud Foundry deployment on top of OpenStack (as opposed to moving the currently deployed CF stack). Customer uptime was our primary concern and we didn’t want to affect the existing production environment, just in case we needed to revert to the old stack. Deploying Cloud Foundry is relatively straightforward, and the incremental cost of running two CF deployments for a short period was a fair price for mitigating the risk of breaking anything in production. In addition, a second CF platform allowed us to test the migration before actually performing it on the production system.
Deploying Cloud Foundry using BOSH meant we could use the same manifest as our production deployment, with only minor adjustments for the cloud plugins, network configurations and resource allocations (see the Gist below to compare the details of each manifest). With only a few lines of difference, it was a relatively straightforward exercise to deploy a new CF stack.
vCloud manifest                        OpenStack manifest
network:                               network:
cloud_properties:                      cloud_properties:
name: "anynines"                       name: "anynines"
ip: 5.22.x.x                         | net_id: 12345678-9101-1121-1236-142355d67ca5
netmask: 255.255.255.0               | type: manual
gateway: 5.22.x.x                    | label: private
dns:                                 | ip: 10.10.10.10
- 109.234.x.x                        <
- 109.234.x.x                        <
resources:                             resources:
persistent_disk: 100000                persistent_disk: 100000
cloud_properties:                      cloud_properties:
ram: 1024                            | instance_type: any-infr-small
disk: 8192                             disk: 8192
cpu: 2                               <
cloud:                                 cloud:
plugin: vcloud                       | plugin: openstack
properties:                            properties:
vcds:                                | openstack:
- url: https://cloud.example.com     | auth_url: https://auth.example.com:5000/v2.0
user: anynines                       | username: anynines
password: secret                     | api_key: secret
entities:                            | tenant: anynines-tenant
organization: anynines               | default_key_name: secret-key
virtual_datacenter: anynines-vdc     | default_security_groups: ["anynines"]
vapp_catalog: Bosh                   | private_key: /root/.ssh/secret-key.pem
media_catalog: Bosh                  <
vm_metadata_key: cf-agent-env        <
description: Bosh                    <
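To make the "only a few lines of difference" claim easy to check on your own manifests, here is a minimal Python sketch that flattens two manifest fragments and lists the keys that were removed, added, or changed. The fragments are small excerpts of the values in the comparison above; the nesting shown in the dictionaries and the helper names `flatten` and `manifest_diff` are assumptions made for illustration, not part of BOSH.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def manifest_diff(old, new):
    """Return (removed, added, changed) key paths between two manifests."""
    old_flat, new_flat = flatten(old), flatten(new)
    removed = sorted(set(old_flat) - set(new_flat))
    added = sorted(set(new_flat) - set(old_flat))
    changed = sorted(
        key for key in set(old_flat) & set(new_flat)
        if old_flat[key] != new_flat[key]
    )
    return removed, added, changed


# Excerpts only; the nesting is an assumption about the manifest layout.
vcloud = {
    "resources": {
        "persistent_disk": 100000,
        "cloud_properties": {"ram": 1024, "disk": 8192, "cpu": 2},
    },
    "cloud": {"plugin": "vcloud"},
}
openstack = {
    "resources": {
        "persistent_disk": 100000,
        "cloud_properties": {"instance_type": "any-infr-small", "disk": 8192},
    },
    "cloud": {"plugin": "openstack"},
}

removed, added, changed = manifest_diff(vcloud, openstack)
print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

In practice you would load both manifests from their YAML files first; that step is left out here to keep the sketch free of third-party dependencies.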
Preparing the New OpenStack Environment
Preparing the new OpenStack environment involved the following steps:
- Deploy Bosh in the new OpenStack Cloud Foundry environment.
- Change all configuration variables in the deployment manifest to suit the new environment. We switched from static IPs to DNS names for database access, which required adjusting a few settings in the new environment.
- Set up a new SSL termination gateway and configure it to use the new gorouter.
- Running a mirrored Cloud Foundry deployment will result in two different domain names, which would normally require a post-migration step to change the domain names back to the original. To avoid this additional name change, adjust the DNS system included in CF BOSH to respond to both domains (for AnyNines, we used a9s.eu and a9sapp.eu). Insert both domains in the CF BOSH powerdns database as shown in this Gist. The DNS is queried by all instances deployed using CF BOSH, so the new environment will be able to connect to an SSL gateway without hitting the live endpoint.
- Deploy a clone of the existing Cloud Foundry installation to the new environment.
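The dual-domain trick boils down to publishing every internal host name under both domains. The Gist itself is not reproduced here, so the following Python sketch only illustrates the idea: it generates one A record per host and domain. The function name `dual_domain_records`, the `api` host label, the TTL, and the `(name, type, content, ttl)` tuple layout are all assumptions; map them onto whatever schema your powerdns database actually uses.

```python
def dual_domain_records(hosts, domains, ttl=300):
    """Build one A record per (host label, domain) pair.

    hosts:   {"label": "ip"} mapping of internal host labels to addresses
    domains: list of domain names that should all resolve to the same hosts
    """
    records = []
    for label, ip in sorted(hosts.items()):
        for domain in domains:
            records.append((f"{label}.{domain}", "A", ip, ttl))
    return records


# Hypothetical host label and address; the two domains are the ones named above.
records = dual_domain_records({"api": "10.10.10.10"}, ["a9s.eu", "a9sapp.eu"])
for record in records:
    print(record)
```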
Migrating Apps, Databases, Configurations, …
Once the OpenStack infrastructure is ready, focus attention on migrating key system state parameters, as follows:
- Transfer all persistent disks from the vCloud installation to the new OpenStack environment.
- Store the gorouter routing table for later comparison in the new environment.
- Shut down all instances with persistent disks. This may include service nodes. Additionally, stop the health manager to avoid a situation in which the Cloud Controller tries to restart all applications.
- Sync persistent disks (again).
- Start the nfs_server for the first step of the migration.
TIP: OpenStack requires minor adaptations to the Cloud Controller database. For faster processing of encrypted entries, use a script (AnyNines used a small Ruby script) to update all services with the new host IP. In addition, ensure all application environment variables are correctly updated.
- Start the service nodes. At this point, the new environment should be ready to restart all existing applications.
- Start the health manager to enable app health monitoring and the logic that restarts all previously running applications.
- To validate the startup, compare the routing table of the new environment with the old parameters.
- Start any remaining instances.
- Adjust the old gateway to point to the new environment.
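The routing table comparison from the validation step can be automated. Below is a minimal Python sketch that assumes both routing tables have already been exported as dictionaries mapping each route URI to its list of backend addresses; how you export them depends on your gorouter version and is not shown. The function name `compare_routes` and all sample routes and addresses are hypothetical. Only the URIs are compared, because backend addresses necessarily differ between the two environments.

```python
def compare_routes(old_table, new_table):
    """Compare two routing tables of the form {uri: [backend, ...]}."""
    old_uris, new_uris = set(old_table), set(new_table)
    return {
        # Routes that existed before the migration but are gone now.
        "missing": sorted(old_uris - new_uris),
        # Routes that only exist in the new environment.
        "unexpected": sorted(new_uris - old_uris),
        # Routes that are registered but have no backend serving them.
        "empty": sorted(uri for uri in old_uris & new_uris if not new_table[uri]),
    }


# Hypothetical sample data for illustration.
old_routes = {
    "shop.a9sapp.eu": ["5.22.0.11:61001"],
    "blog.a9sapp.eu": ["5.22.0.12:61002"],
    "api.a9s.eu": ["5.22.0.13:9022"],
}
new_routes = {
    "shop.a9sapp.eu": ["10.10.10.31:61001"],
    "blog.a9sapp.eu": [],
    "test.a9sapp.eu": ["10.10.10.33:61003"],
}

report = compare_routes(old_routes, new_routes)
for category, uris in report.items():
    print(category, uris)
```

An empty report in all three categories is a reasonable signal that every previously routed application came back up.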
At this point, the new environment should be up and running. Be sure to clean up (or archive) the old environment. Don’t forget to revert any DNS settings and to remove the CF BOSH DNS hack.
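The TIP in the list above mentions a small script that updates all services with the new host IP. The AnyNines script was written in Ruby and is not reproduced here; the following is only a Python illustration of the core rewrite step. It assumes the service credentials have already been read from the Cloud Controller database and decrypted into plain dictionaries (reading, decrypting, and writing back are deliberately left out). The function name `rewrite_hosts`, the credential keys, and all addresses are hypothetical.

```python
import re


def rewrite_hosts(credentials, host_map):
    """Replace old host IPs with new ones in every string value.

    credentials: dict of decrypted service credentials
    host_map:    {"old_ip": "new_ip"} mapping
    """
    if not host_map:
        return dict(credentials)
    # Longest addresses first, and no digit or dot on either side, so that
    # "5.22.0.7" never matches inside "5.22.0.70".
    alternatives = "|".join(
        re.escape(ip) for ip in sorted(host_map, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![\d.])(" + alternatives + r")(?![\d.])")
    updated = {}
    for key, value in credentials.items():
        if isinstance(value, str):
            value = pattern.sub(lambda match: host_map[match.group(1)], value)
        updated[key] = value
    return updated


# Hypothetical credentials and address mapping for illustration.
old_credentials = {
    "host": "5.22.0.7",
    "port": 3306,
    "uri": "mysql://user:secret@5.22.0.7:3306/db",
}
new_credentials = rewrite_hosts(old_credentials, {"5.22.0.7": "10.10.10.20"})
print(new_credentials)
```

All addresses are replaced in a single pass, so a new address that happens to equal another old address is not rewritten twice. The same function can be applied to application environment variables that embed service addresses.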
Post Migration – Facts and Lessons Learned
With solid prep work and some practice (to adjust our recipe and process flows), the whole migration took less than one hour:
- Customer downtime was limited to 30 minutes, all within a maintenance window. The downtime was required because we had to freeze the cc_db state and prevent any changes to customer apps/data during the migration.
- Start to finish, the migration took about 45 minutes.
- Several hundred live customer apps and services, with their data, were migrated within this timeframe.
- The startup on the new system took only 10 minutes – that’s the time it took to get all customer applications up and running.
As with any migration, the most time-consuming part was the prep work and the quality time invested in experimenting beforehand to ensure we had a sound and repeatable recipe. We took care to perform the work in a planned maintenance window, so as not to affect customers. We also created several backup and risk mitigation plans (“just in case” scenarios) to roll back any changes in case the migration didn’t go as planned.
Ultimately, we proved our original premise that Cloud Foundry is a platform robust enough to support full-stack cloud migrations. This was a critical requirement for us as a business, and is equally important to customers who want to protect their cloud investment with migration flexibility. Prep work minimized the risks, practice gave us confidence, and a recipe ensured a repeatable process from start to finish.