From Fedora Project Wiki

(add some updates)
(Add in section that this page is dead.)
 
(13 intermediate revisions by 6 users not shown)
Line 1: Line 1:
 +
{{admon/warning|This page is obsolete and should be removed by 2021-06. There is currently no Fedora Infrastructure Cloud and no plans to bring it back. Currently Fedora Infrastructure relies on gifted cloud units from Amazon for this.}}
 +
 
= Background =
 
= Background =
  
Fedora Infrastructure is looking to set up a private cloud instance in 2012. This cloud instance will be used in a number of ways to benefit Fedora. We are continuing to evaluate a number of cloud technologies for the software side of this cloud.
+
Fedora Infrastructure is running a private cloud infrastructure for various infrastructure and community projects. This infrastructure is currently running RHOSP 5 (Red Hat OpenStack Platform 5).
  
 
= Two Cloudlets =
 
= Two Cloudlets =
  
Our original setup was going to be a single eucalyptus cloud instance. However, when testing deployment, we determined it would be better to split our resources into 2 clouds. This will allow us to do things like upgrade or re-install one cloud while the other is running. Resources could be redirected/rebooted into the other cloud to allow one to be in downtime.
+
We have things set up in 2 cloudlets to allow us to serve existing cloud needs while still being able to test new software and tech. From time to time we may migrate uses from one to the other as a newer version or kind of setup is determined to meet our production needs more closely.
 
 
= Software =
 
 
 
When we evaluated software early in 2012, Eucalyptus was the clear leader. However, later in 2012 things are not as clear, so we are investigating other cloud software to determine which we wish to go with. We may well decide on one for one cloudlet and another for the other one, depending on ongoing setup and maintenance costs.
 
 
 
= Use cases =
 
 
 
== Doesn't need persistent storage ==
 
 
 
* Fedora QA may use instances with its AutoQA setup. Instances would be created, tests run, and the instances destroyed. It's unknown how many instances we would need here.
 
 
 
* Chainbuilding / Kopers may use this cloud to build chains of packages that are not yet in Fedora and thus cannot be built via scratch builds in the existing buildsystem. These may also be used for spinning test live or install images by QA. This may be open to Fedora contributors or restricted to a subset such as packagers.
 
 
 
* Mass rebuilds of Fedora packages. This could be done for testing a new global rpm/package change, or to discover FTBFS (Fails to build from source) packages. This would use as many builders as we could easily spin up to reduce time for building all 10,000+ Fedora packages. Could use the chainbuilding setup as above as a scaffolding. Additionally, extra builder instances could be potentially used by the official build system during mass rebuilds to reduce rebuild time.
 
 
 
* Docs folks need to generate i18n versions of docs. This would require an instance, tools and a script running. Then data is synced off and the instance could be destroyed.
 
 
 
== Needs persistent storage, but possibly can use a /mnt ed volume ==
 
 
 
* Test instances may be used for testing new tech or applications as a proof of concept before pursuing an RFR. We currently have several publictest instances.
 
 
 
== Needs persistent storage and snapshots ==
 
 
 
* Infrastructure Development hosts may be moved to this cloud. These instances could possibly be 'on demand' when development needs to take place. Currently we have about 8 development instances.
 
 
 
* Infrastructure Staging hosts may be moved to this cloud. Some of these may be 'always on' and some may be on demand. Currently we have about 13 of these instances.
 
 
 
* We may want to move some of our one-off instances that are outside phx2 into the cloud for easier management. Things like keyservers, unbound instances, listservers or hosted resources.  
 
  
Further down the road:
+
= Current primary setup =
  
* Instances for qa/packagers to test new packages or track down bugs.
+
* fed-cloud09 is the main controller node
 
+
* fed-cloud03,04,05,06,07,08,10,11,12,13,14,15 and fed-cloud-ppc02 are compute nodes.  
* Instances for demos or events to show off Fedora.
 
 
 
For initial deployment, we would need to be able to run ~30 or so instances at a time with ability to grow rapidly above that for qa and building needs. 
 
 
 
= Dependencies =
 
 
 
* Need a way to easily provision new instances with limited admin intervention. Looking at ansible for this task.
 
 
 
* Would like to be able to create images via kickstart and normal install/deployment methods if needed.
 
 
 
* Hardware needs to be ordered and installed.
 
 
 
* Public IP addresses need to be made available.
 
 
 
* Would be nice to get full EPEL packages to deploy with.  
 
  
 
= Setup / deployment =
 
= Setup / deployment =
  
This hardware will be on the 'edge' of the network and not connected to the rest of Fedora Infrastructure except via external networks. This will allow us to use external IPs and make sure the cloud instance doesn't have access to anything in the regular Fedora Infrastructure. Storage will be on the local servers for caching, with additional NetApp space for images and data.
+
This hardware is set up on the 'edge' of the network and not connected to the rest of Fedora Infrastructure except via external networks. This allows us to use external IPs and make sure the cloud instance doesn't have access to anything in the regular Fedora Infrastructure. Storage will be on the local servers.
  
We have 8 physical servers for this deployment, 4 in each 'cloudlet'. In each cloudlet, one node will be a controller node with access to external IPs and the other 3 will be compute nodes.
+
We have 17 physical servers total. Currently 14 of them are in production, and 3 are being used for testing purposes by the infrastructure team.
  
= Implementation overview / timelines =
+
Storage is provided by 2 Dell EqualLogic boxes (one with ~20TB of space, the other with ~10TB).
  
<strike>2012-04 - Hardware is being determined and finalized. </strike>
+
The current setup has only 1 controller node so outages can and will occur when upgrades are being done, etc.  
  
<strike>2012-07 - Initial hardware setup and install</strike>
+
Nodes in this cloud use the 'fedorainfracloud.org' domain in most cases (with some few exceptions).
  
<strike>2012-08 - Initial use cases gathered</strike>
+
x86_64 and ppc64 and ppc64le instances are provided by the current cloud.
  
2012-09 - Finish software evaluations, setup 'production' instances.
+
= Upcoming plans =
  
2012-10 - Announce availability and collect more use cases.  
+
As many of the existing hardware boxes are reaching end of life/support, we have ordered some new hardware in Q2 of 2017. We plan to set up RHOSP10 (or later) on this new hardware and then recreate instances in this new cloud and retire the old one. This is planned for mid/late 2017. This new install should allow us to set up 2 head nodes with HA so we can do upgrades or the like without much in the way of outages. Additionally we hope to add armv7 and aarch64 support via OpenStack Ironic.
 
 
2012-11 - Evaluate load and expansion needs.
 
  
 
= Policies =  
 
= Policies =  
  
This section is currently under discussion. We need to set up clear policies on usage and access to the private cloud. In general we plan to open things to a small group of trusted contributors, take their feedback and usage, and expand access out to larger groups as capacity and desire allow.
+
Users or groups that need rare one-off images can simply request one via an infrastructure ticket.
  
(This section is a DRAFT)
+
Users or groups that often need instances may be granted accounts to spin up and down their own images.  
 
 
Users or groups that need rare one off images can simply request one via a ticket. Users or groups that often need instances will be granted accounts to spin up and down their own images.  
 
  
 
Instances may be rebooted at any time. Save your data off often.  
 
Instances may be rebooted at any time. Save your data off often.  
  
 
Persistent storage may be available as separate volumes. Data retention and quotas may be imposed on this data.

Persistent storage may be available as separate volumes. Data retention and quotas may be imposed on this data.
 
Default instance time to live would be a week.
 
 
Default network policy will allow only ports 80, 443, and 22 tcp.
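The two defaults above (a one-week time to live and TCP ports 80, 443 and 22 only) can be sketched as a pair of checks. This is only an illustration of the draft policy in Python; the names ALLOWED_TCP_PORTS, DEFAULT_TTL, is_port_allowed and is_expired are hypothetical and do not come from any tool we run.

```python
from datetime import datetime, timedelta

# Values come from the draft policy text; the names are hypothetical.
ALLOWED_TCP_PORTS = frozenset({22, 80, 443})
DEFAULT_TTL = timedelta(weeks=1)


def is_port_allowed(port: int, protocol: str = "tcp") -> bool:
    """Return True if the default network policy would admit this port."""
    return protocol.lower() == "tcp" and port in ALLOWED_TCP_PORTS


def is_expired(created: datetime, now: datetime,
               ttl: timedelta = DEFAULT_TTL) -> bool:
    """Return True once an instance has reached its time to live."""
    return now - created >= ttl
```

A cleanup job could call is_expired with each instance's creation time to decide which instances are due for removal, assuming the cloud software exposes that timestamp.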
 
  
 
Instances are meant to assist in furthering the work related to the Fedora Project. Please don't use them for unrelated activities.

Instances are meant to assist in furthering the work related to the Fedora Project. Please don't use them for unrelated activities.
  
 
We reserve the right to shut down, delete or revoke access to any instances at any time for any reason.

We reserve the right to shut down, delete or revoke access to any instances at any time for any reason.
 
= Eucalyptus Cloud Information =
 
 
The Eucalyptus cloudlet is up and under limited testing now. If you are given an account, this section shows some basic workflow, setup and example commands to help you get started.
 
 
== Obtaining an account ==
 
 
Currently we are under limited testing. Please ask in #fedora-admin on IRC if you are willing to help us test. Note that we are going to be pretty limited on testers for a while until we have more bugs worked out and policies in place. See above section.
 
 
== First steps ==
 
 
Once you have an account issued to you, go to the web interface at https://ec2.cloud.fedoraproject.org:8443/ and enter the account, user and password you were given. The interface will ask you to change your password; please do so and pick a nice long passphrase. After logging in, select your username at the top and from the pull-down list choose "Download New credentials". This will download a .zip file to your local computer. Unpack this zip file and 'source eucarc' to set up your environment.
 
 
== Simple euca commands ==
 
 
Install the command line euca tools: 'yum install euca2ools'
 
 
Check what kinds of images are available: 'euca-describe-images'
 
 
Check what instances are running: 'euca-describe-instances'
 
 
Create an ssh keypair for use with instances: 'euca-add-keypair keyname > keyname.priv' (the command prints the private key, so keep this file private, for example with 'chmod 600 keyname.priv')


Create an instance with that key set up: 'euca-run-instances -k keyname ami-00000006' (the ami is the image ami from the describe images above)


Log in to your instance: 'ssh -i keyname.priv ec2-user@externalIP'
 
 
Terminate your instance: 'euca-terminate-instances i-NNNNN' (where this is from 'euca-describe-instances')
 
 
Many more commands at: http://cheat.errtheblog.com/s/euca2ools/1
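The command sequence above can also be assembled programmatically. The following is a minimal dry-run sketch in Python that only builds the command lines and never executes them; the helper names euca_workflow, terminate_command and render are hypothetical, and only the euca2ools commands and the '-k' option already shown in this section are used.

```python
import shlex
from typing import List


def euca_workflow(key_name: str, image_id: str) -> List[List[str]]:
    """Build (but do not run) the commands needed to start one instance."""
    return [
        ["euca-describe-images"],
        # The caller is expected to save this command's output to a file.
        ["euca-add-keypair", key_name],
        ["euca-run-instances", "-k", key_name, image_id],
        ["euca-describe-instances"],
    ]


def terminate_command(instance_id: str) -> List[str]:
    """Build the command that terminates a single instance."""
    return ["euca-terminate-instances", instance_id]


def render(commands: List[List[str]]) -> List[str]:
    """Turn argument lists into shell-quoted strings for review."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing render(euca_workflow("keyname", "ami-00000006")) gives the commands for review; passing each argument list to subprocess.run would execute them once 'eucarc' has been sourced.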
 
  
 
= Images =
 
= Images =
  
Currently we have available: Fedora 16, Fedora 17 and RHEL6.3
+
We will provide fedora, centos and rhel images.  
 
 
We should customize available images for the above use cases:
 
 
 
== All images ==
 
 
 
Standardize on root or ec2user access.
 
 
 
Set hostname to ami name or something unique and descriptive. Perhaps "fedoracloud-AMI" or something?
 
 
 
Sudo configured for the ec2user
 
 
 
git-core installed.
 
 
 
== Infrastructure Dev Instances ==
 
 
 
Based on rhel6 image.
 
 
 
Should contain:
 
 
 
mod_wsgi, httpd, git-core, puppet, persistent volume mounted on /srv
 
 
 
== QA images ==
 
 
 
TBD
 
 
 
== Builder Images ==
 
 
 
For mockchain/kopers use. Should be limited to 24 hours.
 
 
 
= Moving to "production" =
 
 
 
This section is a checklist of things we need to do before we can consider either of the cloudlets "production". Once we move to production mode on them we will move to scheduling outages, try and keep instances running smoothly and just perform upgrades and maint on the cloudlets. We want to make sure before we do this that things are stable and processes are ready for users.
 
 
 
* SOP needs to be written for creating images (what's in them, update policy, ssh keys policy, etc.).
 
 
 
* Decide who gets a login to manage instances, and who can just request instances be made for them.
 
 
 
* SOP on making an instance for a requestor (via infra ticket?)
 
 
 
* Decide on time limits or other resource limits per account/tenant. Set up initial accounts/tenants.
 
  
* OpenStack cloudlet needs ansible playbooks written to install/configure it.  
+
If you need to add images, please name them the same as their filename. For example, "Fedora 22 Beta TC 2" is fine; please don't use 'test image', as we have no idea what it might be.
  
* <strike>OpenStack needs folsom testing performed. </strike> folsom installed now.
+
= Major users =
  
* <strike>OpenStack needs vlan testing performed. </strike> vlans in use.
+
* The copr buildsystem is housed entirely in the Fedora Infrastructure Private cloud.  
  
* <strike>OpenStack needs non glusterfs testing done. </strike> current install has base fs.  
+
* jenkins. Fedora infrastructure provides a jenkins instance to run tests on some open source projects.  
  
* <strike>Do we need to decide between euca and openstack? When?</strike> For now we are going to do both.  
+
* Many Infrastructure dev instances are housed in the Fedora Infrastructure private cloud.  
  
* We need monitoring added. nagios? Controllers down, nodes down, capacity issues, etc.  
+
* Fedora magazine and community blogs are hosted here.  
  
* We need reporting added. Note when instances are made, etc. Either logging to log02, or some separate report for cloud-sysadmins. Possibly some export from the software about cpu/mem/disk, etc.
+
* The twisted project runs some buildbot tests.  
  
* Need to determine who has access to physical cloud machines. Repurpose sysadmin-cloud, setup fas and sudo?
+
= hardware access =
  
* Setup group that can run ansible against physical cloud machines for updates, etc. (see above question too)
+
ssh access to the bare nodes will be for sysadmin-cloud and possibly fi-apprentice (with no sudo).
  
* Consider a recurring maintenance window for reboots/updates, i.e. tell everyone that every month we have a window to do so, so they should save work before then?
+
= maint windows =
  
* Figure out how to handle dns. Should we setup some kind of dyndns? Should we just leave it with generic dns? Should we ask for control of reverse dns?
+
We are reserving the right to update and reboot the cloud when and as needed. We will schedule these outages as we do for any outage and will spin back up any persistent cloud instances we have in our ansible inventory after the outage is over. It's up to owners of any other instances to spin up new versions of them after the outage and make sure all updates are applied.
  
* How do we back these systems up, and what should we actually be backing up?
+
= Contact / more info =
  
* Figure out how to make some system be or seem to be persistent
+
Please contact the #fedora-admin channel or the fedora infrastructure list for any issues or questions around our private cloud.

Latest revision as of 12:18, 4 April 2021

This page is obsolete and should be removed by 2021-06. There is currently no Fedora Infrastructure Cloud and no plans to bring it back. Currently Fedora Infrastructure relies on gifted cloud units from Amazon for this.

Background

Fedora Infrastructure is running a private cloud infrastructure for various infrastructure and community projects. This infrastructure is currently running RHOSP 5 (Red Hat OpenStack Platform 5).

Two Cloudlets

We have things set up in 2 cloudlets to allow us to serve existing cloud needs while still being able to test new software and tech. From time to time we may migrate uses from one to the other as a newer version or kind of setup is determined to meet our production needs more closely.

Current primary setup

  • fed-cloud09 is the main controller node
  • fed-cloud03,04,05,06,07,08,10,11,12,13,14,15 and fed-cloud-ppc02 are compute nodes.

Setup / deployment

This hardware is set up on the 'edge' of the network and not connected to the rest of Fedora Infrastructure except via external networks. This allows us to use external IPs and make sure the cloud instance doesn't have access to anything in the regular Fedora Infrastructure. Storage will be on the local servers.

We have 17 physical servers total. Currently 14 of them are in production, and 3 are being used for testing purposes by the infrastructure team.

Storage is provided by 2 Dell EqualLogic boxes (one with ~20TB of space, the other with ~10TB).

The current setup has only 1 controller node so outages can and will occur when upgrades are being done, etc.

Nodes in this cloud use the 'fedorainfracloud.org' domain in most cases (with some few exceptions).

x86_64 and ppc64 and ppc64le instances are provided by the current cloud.

Upcoming plans

As many of the existing hardware boxes are reaching end of life/support, we have ordered some new hardware in Q2 of 2017. We plan to set up RHOSP10 (or later) on this new hardware and then recreate instances in this new cloud and retire the old one. This is planned for mid/late 2017. This new install should allow us to set up 2 head nodes with HA so we can do upgrades or the like without much in the way of outages. Additionally we hope to add armv7 and aarch64 support via OpenStack Ironic.

Policies

Users or groups that need rare one-off images can simply request one via an infrastructure ticket.

Users or groups that often need instances may be granted accounts to spin up and down their own images.

Instances may be rebooted at any time. Save your data off often.

Persistent storage may be available as separate volumes. Data retention and quotas may be imposed on this data.

Instances are meant to assist in furthering the work related to the Fedora Project. Please don't use them for unrelated activities.

We reserve the right to shut down, delete or revoke access to any instances at any time for any reason.

Images

We will provide fedora, centos and rhel images.

If you need to add images, please name them the same as their filename. For example, "Fedora 22 Beta TC 2" is fine; please don't use 'test image', as we have no idea what it might be.
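As a rough illustration of the naming rule, the sketch below derives an image name from the file being uploaded and rejects the placeholder name mentioned above. It is only a sketch in Python: the function names and the example path are hypothetical, and keeping the file extension in the name is an assumption, since the rule does not say whether to strip it.

```python
from pathlib import Path

# 'test image' is the one name the text explicitly asks us not to use.
REJECTED_NAMES = frozenset({"test image"})


def image_name_from_file(path: str) -> str:
    """Use the file's own name (assumption: extension kept) as the image name."""
    return Path(path).name


def is_descriptive(name: str) -> bool:
    """Reject empty names and the placeholder name called out in the policy."""
    cleaned = name.strip().lower()
    return cleaned != "" and cleaned not in REJECTED_NAMES
```

For example, image_name_from_file("/tmp/Fedora-Cloud-Base-22.x86_64.qcow2") returns "Fedora-Cloud-Base-22.x86_64.qcow2", which passes is_descriptive, while "test image" does not.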

Major users

  • The copr buildsystem is housed entirely in the Fedora Infrastructure Private cloud.
  • jenkins. Fedora infrastructure provides a jenkins instance to run tests on some open source projects.
  • Many Infrastructure dev instances are housed in the Fedora Infrastructure private cloud.
  • Fedora magazine and community blogs are hosted here.
  • The twisted project runs some buildbot tests.

hardware access

ssh access to the bare nodes will be for sysadmin-cloud and possibly fi-apprentice (with no sudo).

maint windows

We are reserving the right to update and reboot the cloud when and as needed. We will schedule these outages as we do for any outage and will spin back up any persistent cloud instances we have in our ansible inventory after the outage is over. It's up to owners of any other instances to spin up new versions of them after the outage and make sure all updates are applied.
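The split of responsibility after an outage can be sketched as a simple partition: instances listed in our ansible inventory are brought back by us, and everything else is up to its owner. The Python below is only an illustration under that reading; the function name plan_respin and the instance names in the usage note are hypothetical, and it does not read a real inventory file.

```python
from typing import Iterable, List, Tuple


def plan_respin(running: Iterable[str],
                inventory: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split instance names into (respun by infrastructure, left to owners)."""
    known = set(inventory)
    names = sorted(set(running))
    by_infra = [name for name in names if name in known]
    by_owner = [name for name in names if name not in known]
    return by_infra, by_owner
```

Calling plan_respin(["instance-a", "instance-b"], ["instance-a"]) would put "instance-a" in the first list and "instance-b" in the second, so its owner would need to spin it up again and apply updates.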

Contact / more info

Please contact the #fedora-admin channel or the fedora infrastructure list for any issues or questions around our private cloud.