From Fedora Project Wiki

No edit summary
(completely revamp page)
Line 3: Line 3:
= Background =


Fedora Infrastructure is running 2 private cloudlets for various infrastructure projects. One of these is the primary or 'production' cloud, the other is used for newer versions and testing different setups or software tech. Currently the primary cloud is running openstack folsom, the other is testing openstack havana.  
Fedora Infrastructure is running 2 private cloudlets for various infrastructure projects. One of these is the primary or 'production' cloud, the other is used for newer versions and testing different setups or software tech. Currently the primary cloud is running openstack folsom, the other is testing openstack icehouse. We are in the process of migrating to the new icehouse based cloud now.  


= History =


In 2012 we set up 2 sets of machines for 2 cloudlets. We tested various cloud software on these cloudlets at various times.
In 2012 we set up 2 sets of machines for 2 cloudlets. We tested various cloud software on these cloudlets at various times. Finally a primary cloudlet was established with openstack folsom. In 2014 and 2015 we set up a new cloud using ansible playbooks to do a repeatable and maintainable setup.


= Two Cloudlets =
Line 15: Line 15:
= Current setup =


Current setup (as of 2014-03-25) as described in #fedora-classroom:
Current setup (as of 2014-03-25) is described in #fedora-classroom:
http://meetbot.fedoraproject.org/fedora-classroom/2014-03-25/infrastructure-private-cloud-class.2014-03-25-18.00.log.html


= Use cases =
old cloudlet (folsom, being migrated away from):


== Doesn't need persistent storage ==
* fed-cloud01, fed-cloud03, fed-cloud04, fed-cloud05, fed-cloud06, fed-cloud07, fed-cloud08 are all compute nodes in this cloud.
* fed-cloud02 is the main controller node.


* Fedora QA may use instances with its AutoQA setup. Instances would be created, tests run, and the instances destroyed. It's unknown how many instances we would need here.
new cloudlet (icehouse, being migrated to):


* Coprs uses our cloud for a frontend, backend and builders. Builds are submitted to the frontend, the backend processes them and creates builder instances to build and then terminates build instances when complete.
* fed-cloud09 is the main controller node
 
* fed-cloud10,11,12,13,14,15 are compute nodes.  
* Mass rebuilds of Fedora packages. This could be done for testing a new global rpm/package change, or to discover FTBFS (Fails to build from source) packages. This would use as many builders as we could easily spin up to reduce time for building all 10,000+ Fedora packages. Could use the chainbuilding setup as above as a scaffolding. Additionally, extra builder instances could be potentially used by the official build system during mass rebuilds to reduce rebuild time.
 
* Docs folks need to generate i18n versions of docs. This would require an instance, tools and a script running. Then data is synced off and the instance could be destroyed.
 
== Needs persistent storage, but possibly can use a /mnt ed volume ==
 
* Test instances may be used for testing new tech or applications as a proof of concept before pursuing an RFR.
 
== Needs persistent storage and snapshots ==
 
* Infrastructure Development hosts have been moved to this cloud. These instances could possibly be 'on demand' when development needs to take place. Currently we have about 8 development instances many on cloud. The rest should be migrated soon.
 
* We may want to move some of our one-off instances that are outside phx2 into the cloud for easier management. Things like keyservers, unbound instances, listservers or hosted resources. This is not yet planned for.
 
Further down the road:
 
* Instances for qa/packagers to test new packages or track down bugs.
 
* Instances for demos or events to show off Fedora.  


= Setup / deployment =
Line 50: Line 32:
This hardware is set up on the 'edge' of the network and is not connected to the rest of Fedora Infrastructure except via external networks. This allows us to use external IPs and ensures that cloud instances have no access to anything in the regular Fedora Infrastructure. Storage will be on the local servers.


We have 8 physical servers for this deployment. Currently 6 of them are in the 'production' cloudlet, and 2 are available for testing new deployments.  
We have 15 physical servers total. Currently 8 of them are in the 'production/old/folsom' cloudlet, and 7 are in the new icehouse cloudlet. As we migrate we will move more nodes to the icehouse cloudlet.


= Policies =  


We need to setup clear policies on usage and access to the private cloud. In general we plan to open things to a small group of trusted contributors, take their feedback and usage and expand access out to larger groups as capacity and desire allows.
Users or groups that need rare one-off images can simply request one via an infrastructure ticket.


Users or groups that need rare one off images can simply request one via a ticket. Users or groups that often need instances will be granted accounts to spin up and down their own images.  
Users or groups that often need instances may be granted accounts to spin up and down their own images.  


Instances may be rebooted at any time. Save your data off often.  


Persistent storage may be available as separate volumes. Data retention limits and quotas may be imposed on this data.
Default network policy will allow only ports 80, 443, and 22 tcp.


Instances are meant to assist in furthering the work of the Fedora Project. Please don't use them for unrelated activities.
Line 70: Line 50:
= Images =


We should customize available images for the above use cases:
We will provide fedora and centos and rhel images.  
 
== All images ==
 
Currently for Fedora we are using the produced cloud images. For RHEL we are using a similar minimal instance image.
 
== Infrastructure Dev Instances ==
 
Based on rhel6 image.
 
Should contain:
 
mod_wsgi, httpd, git-core, puppet, persistent volume mounted on /srv
 
== QA images ==
 
TBD
 
== Builder Images ==
 
For mockchain/kopers use. Should be limited to 24 hours.
 
= Using ansible with the cloud =
 
TBD: fill in with info on how to make transient or persistent instances via ansible on lockbox01.
 
= Moving to "production" =
 
This section is a checklist of things we need to do before we can consider either of the cloudlets "production". Once we move to production mode on them we will move to scheduling outages, try and keep instances running smoothly and just perform upgrades and maint on the cloudlets. We want to make sure before we do this that things are stable and processes are ready for users.
 
* SOP needs to be written for creating images. (what's in them, update policy, ssh keys policy, etc)
** We can reuse the Fedora cloud images for Fedora
** We still need to determine a 'standard' RHEL6 image.
** ssh keys should be added for root for sysadmin-main and sysadmin-cloud?
 
* <strike>Decide who gets a login to manage instances, and who can just request instances be made for them. </strike>
** Normal use cases have instances created by ansible. If further access is needed it's granted on case by case.
** Down the road we may want to integrate fas somehow, but not now.
 
* <strike>SOP on making an instance for a requestor (via infra ticket?)</strike> yes, via ticket.
** Write an instance-setup script to fetch the user's ssh key from fas based on the given fas login, so that the user can receive an email once the instance is created and log in.
** Do all instance creation in ansible?
 
* Decide on time limits or other resource limits per account/tenant. Set up initial accounts/tenants.
 
* OpenStack cloudlet needs ansible playbooks written to install/configure it.
 
* <strike>OpenStack needs folsom testing performed. </strike> folsom installed now.
 
* <strike>OpenStack needs vlan testing performed. </strike> vlans in use.
 
* <strike>OpenStack needs non glusterfs testing done. </strike> current install has base fs.
 
* <strike>Do we need to decide between euca and openstack? When?</strike> For now we are going to do both.  
 
* We need monitoring added. nagios? Controllers down, nodes down, capacity issues, etc.
** run a nagios persistent image in either cloudlet and monitor the other?


* We need reporting added. Note when instances are made, etc. Either logging to log02, or some separate report for cloud-sysadmins. Possibly some export from the software about cpu/mem/disk, etc.
If you need to add images, please name them the same as their filename. For example, "Fedora 22 Beta TC 2" is fine; please don't use 'test image', as we have no idea what it might be.
** Could possibly be done at ansible creation time or via a gather script
** email from ansible script is done now, still need reporting of non ansible instances.  


* <strike>Need to determine who has access to physical cloud machines. Repurpose sysadmin-cloud, setup fas and sudo?</strike>
= Major users =
** if we (and we should) add fas, why not configure it to only create shell accounts for people who are admins or sponsors in sysadmin-cloud? That brings us back to the 2nd point, where approved people from sysadmin-cloud could have access to request instances to be made.
** shell access to the physical cloudlet machines doesn't grant you any access to the cloud software directly.
** sysadmin-cloud will have access to the compute and head nodes, but no special access to the cloud instances.


* <strike>Setup group that can run ansible against physical cloud machines for updates, etc. (see above question too)</strike>
* The copr buildsystem is housed entirely in the Fedora Infrastructure Private cloud.  


* <strike>Consider a recurring maint window for reboots/updates... ie, tell everyone that every month we have a window to do so, save work before then?</strike>
* Jenkins: Fedora Infrastructure provides a Jenkins instance to run tests on some open source projects.
** will just schedule these as needed.


* Figure out how to handle dns. Should we setup some kind of dyndns? Should we just leave it with generic dns? Should we ask for control of reverse dns?
* Many Infrastructure dev instances are housed in the Fedora Infrastructure private cloud.  
** We have asked for control of reverse dns.  


* How do we back these systems up and what should we actually be backing up.
* The twisted project runs some buildbot tests.  
** At this point I'd say we don't back up and note to users to always back up their data often.


* <strike>Figure out how to make some system be or seem to be persistent</strike> This can be done in ansible repo on lockbox
= hardware access =


* store more metadata per instance created so we can track who/when/where/what (requires tags in eucalyptus 3.3 - not existent as of now)
ssh access to the bare nodes will be for sysadmin-cloud and possibly fi-apprentice (with no sudo).  


= Post-Production/2.0 =
= maint windows =


* backup 'subscription' service - so users of the cloud can request that backups be performed on their instances and how they should happen
With the move to the new icehouse cloud, we will be reserving the right to update and reboot the cloud when and as needed. We will schedule these outages as we do for any outage and will spin back up any persistent cloud instances we have in our ansible inventory after the outage is over. It's up to owners of any other instances to spin up new versions of them after the outage and make sure all updates are applied.


* openid/fas integration might be nice.
= Contact / more info =


* increase capacity and gather more use cases.
Please contact the #fedora-admin channel or the fedora infrastructure list for any issues or questions around our private cloud.

Revision as of 16:50, 13 May 2015

Note -- there was a meeting held describing the current state of the cloud on March 25, 2014. The logs of that are more up to date than this wiki page.

Background

Fedora Infrastructure is running 2 private cloudlets for various infrastructure projects. One of these is the primary or 'production' cloud, the other is used for newer versions and testing different setups or software tech. Currently the primary cloud is running openstack folsom, the other is testing openstack icehouse. We are in the process of migrating to the new icehouse based cloud now.

History

In 2012 we set up 2 sets of machines for 2 cloudlets. We tested various cloud software on these cloudlets at various times. Finally a primary cloudlet was established with openstack folsom. In 2014 and 2015 we set up a new cloud using ansible playbooks to do a repeatable and maintainable setup.

Two Cloudlets

We have things set up in 2 cloudlets to allow us to serve existing cloud needs, while still having the ability to test new software and tech. From time to time we may migrate uses from one to the other as a newer version or kind of setup is determined to meet our production needs more closely.

Current setup

Current setup (as of 2014-03-25) is described in #fedora-classroom: http://meetbot.fedoraproject.org/fedora-classroom/2014-03-25/infrastructure-private-cloud-class.2014-03-25-18.00.log.html

old cloudlet (folsom, being migrated away from):

  • fed-cloud01, fed-cloud03, fed-cloud04, fed-cloud05, fed-cloud06, fed-cloud07, fed-cloud08 are all compute nodes in this cloud.
  • fed-cloud02 is the main controller node.

new cloudlet (icehouse, being migrated to):

  • fed-cloud09 is the main controller node
  • fed-cloud10,11,12,13,14,15 are compute nodes.

Setup / deployment

This hardware is set up on the 'edge' of the network and is not connected to the rest of Fedora Infrastructure except via external networks. This allows us to use external IPs and ensures that cloud instances have no access to anything in the regular Fedora Infrastructure. Storage will be on the local servers.

We have 15 physical servers total. Currently 8 of them are in the 'production/old/folsom' cloudlet, and 7 are in the new icehouse cloudlet. As we migrate we will move more nodes to the icehouse cloudlet.
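
As a cross-check of the node lists and counts above, here is a minimal Python sketch. Only the host names come from this page; the dictionary layout and the names `CLOUDLETS` and `count_nodes` are illustrative assumptions, not part of any Fedora Infrastructure tooling.

```python
# Hypothetical inventory built from the host lists on this page.
CLOUDLETS = {
    "folsom": {
        "controller": "fed-cloud02",
        # fed-cloud01 plus fed-cloud03 through fed-cloud08
        "compute": ["fed-cloud01"] + [f"fed-cloud{n:02d}" for n in range(3, 9)],
    },
    "icehouse": {
        "controller": "fed-cloud09",
        # fed-cloud10 through fed-cloud15
        "compute": [f"fed-cloud{n:02d}" for n in range(10, 16)],
    },
}


def count_nodes(cloudlet):
    """Return the number of physical servers: one controller plus its compute nodes."""
    return 1 + len(cloudlet["compute"])


if __name__ == "__main__":
    for name, cloudlet in CLOUDLETS.items():
        print(name, count_nodes(cloudlet))
    print("total", sum(count_nodes(c) for c in CLOUDLETS.values()))
```

Counting this way gives 8 servers in the folsom cloudlet and 7 in the icehouse cloudlet, which matches the total of 15 stated above.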

Policies

Users or groups that need rare one-off images can simply request one via an infrastructure ticket.

Users or groups that often need instances may be granted accounts to spin up and down their own images.

Instances may be rebooted at any time. Save your data off often.

Persistent storage may be available as separate volumes. Data retention limits and quotas may be imposed on this data.

Instances are meant to assist in furthering the work of the Fedora Project. Please don't use them for unrelated activities.

We reserve the right to shut down, delete or revoke access to any instances at any time for any reason.

Images

We will provide fedora and centos and rhel images.

If you need to add images, please name them the same as their filename. For example, "Fedora 22 Beta TC 2" is fine; please don't use 'test image', as we have no idea what it might be.
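
One way to follow the naming guideline above is to derive the image name from the file being uploaded. The sketch below is only an illustration: the function names, the list of vague names, and the example file path are assumptions made up for this example, not an existing Fedora Infrastructure script.

```python
from pathlib import PurePosixPath

# Hypothetical examples of names too vague to identify an image.
VAGUE_NAMES = {"test", "test image", "image", "new image"}


def image_name_from_path(path):
    """Derive an image name from a file path by dropping the directory and the final extension."""
    return PurePosixPath(path).stem


def is_vague(name):
    """Return True for placeholder names that say nothing about the image contents."""
    return name.strip().lower() in VAGUE_NAMES


if __name__ == "__main__":
    # The path below is a made-up example.
    print(image_name_from_path("/tmp/Fedora-Cloud-Base-22.qcow2"))
    print(is_vague("test image"))
```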

Major users

  • The copr buildsystem is housed entirely in the Fedora Infrastructure Private cloud.
  • Jenkins: Fedora Infrastructure provides a Jenkins instance to run tests on some open source projects.
  • Many Infrastructure dev instances are housed in the Fedora Infrastructure private cloud.
  • The twisted project runs some buildbot tests.

hardware access

ssh access to the bare nodes will be for sysadmin-cloud and possibly fi-apprentice (with no sudo).

maint windows

With the move to the new icehouse cloud, we will be reserving the right to update and reboot the cloud when and as needed. We will schedule these outages as we do for any outage and will spin back up any persistent cloud instances we have in our ansible inventory after the outage is over. It's up to owners of any other instances to spin up new versions of them after the outage and make sure all updates are applied.
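
The split of responsibility described above can be sketched in a few lines of Python. This is a hedged illustration only: the function name and the instance names are hypothetical, and the real ansible inventory is not consulted this way.

```python
def split_after_outage(running_before, ansible_inventory):
    """Partition instance names into those admins respin and those owners must respin.

    running_before: names of instances that were running before the outage.
    ansible_inventory: names of persistent instances tracked in the ansible inventory.
    """
    inventory = set(ansible_inventory)
    respun_by_admins = sorted(n for n in running_before if n in inventory)
    owner_responsibility = sorted(n for n in running_before if n not in inventory)
    return respun_by_admins, owner_responsibility


if __name__ == "__main__":
    # All instance names here are made up for the example.
    admins, owners = split_after_outage(
        ["persistent-a", "scratch-1", "persistent-b"],
        ["persistent-a", "persistent-b"],
    )
    print("respun by admins:", admins)
    print("owner must respin:", owners)
```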

Contact / more info

Please contact the #fedora-admin channel or the fedora infrastructure list for any issues or questions around our private cloud.