VC3 Infrastructure
Prerequisites for all services
Adding keys
When a piece of static infrastructure is initialized on OpenStack or AWS, it will only contain your public key at first. You'll need to add the public keys for other developers as well.
vi ~/.ssh/authorized_keys
The CentOS account should, by default, have sudo privileges.
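If you have a developer's public key on hand, it can simply be appended; the key filename below is only an example:
# append a developer's public key and keep permissions strict (example filename)
cat jdoe_id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys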
Installing the repos
On all pieces of the static infrastructure, you will need to install the VC3 package repository. Add the following contents to /etc/yum.repos.d/vc3.repo:
[vc3-x86_64]
name=VC3 x86_64
baseurl=http://build.virtualclusters.org/production/x86_64
gpgcheck=0
[vc3-noarch]
name=VC3 noarch
baseurl=http://build.virtualclusters.org/production/noarch
gpgcheck=0
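To confirm that yum can see the new repositories, a quick check (output will vary with repository contents):
yum repolist | grep -i vc3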
Bootstrapping authentication on the Master
Installing Credible and setting up certificates
Credible will be used to issue certificates to all VC3 components for authentication. We will only set it up on the Master, and then copy certificates and keys to other pieces of the infrastructure.
Install credible from the repos:
yum install credible
And update /etc/credible/credible.conf as appropriate:
[credible]
storageplugin = Memory
vardir = /var/credible
[credible-ssh]
keytype = rsa
bitlength = 4096
[credible-ssca]
roottemplate=/etc/credible/openssl.cnf.root.template
intermediatetemplate=/etc/credible/openssl.cnf.intermediate.template
vardir = /var/credible
bitlength = 4096
# Test defaults.
sscaname = VC3
country = US
locality = Upton
state = NY
organization = BNL
orgunit = SDCC
email = jhover@bnl.gov
Issuing certificates
Retrieve the CA Chain and generate a host certificate and key for the Master:
credible -c /etc/credible/credible.conf hostcert master-test.virtualclusters.org > /etc/pki/tls/certs/hostcert.pem
credible -c /etc/credible/credible.conf hostkey master-test.virtualclusters.org > /etc/pki/tls/private/hostkey.pem
credible -c /etc/credible/credible.conf certchain > /etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
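As a sanity check, you can inspect the issued host certificate with openssl to confirm its subject and validity window:
openssl x509 -in /etc/pki/tls/certs/hostcert.pem -noout -subject -dates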
VC3 Infoservice
Prerequisites
As with the Master, you will need to install the VC3 repo and any public keys.
Installing the Infoservice
Assuming you have already configured the VC3 repos, install the following:
yum install epel-release -y
yum install vc3-infoservice pluginmanager python-pip openssl -y
pip install pyOpenSSL CherryPy==11.0.0
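A quick way to confirm the pinned CherryPy version was installed:
python -c "import cherrypy; print(cherrypy.__version__)"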
And edit the config at /etc/vc3/vc3-infoservice.conf:
[DEFAULT]
loglevel = debug
[netcomm]
chainfile=/etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
certfile=/etc/pki/tls/certs/hostcert.pem
keyfile=/etc/pki/tls/private/hostkey.pem
sslmodule=pyopenssl
httpport=20333
httpsport=20334
[persistence]
plugin = DiskDump
[plugin-diskdump]
filename=/tmp/infoservice.diskdump
On the Master host, you will need to issue certificates for the Infoservice. Copy and paste the output into the corresponding files on the Infoservice host:
(master)# credible -c /etc/credible/credible.conf hostcert info-test.virtualclusters.org
(infoservice)# vi /etc/pki/tls/certs/hostcert.pem
(master)# credible -c /etc/credible/credible.conf hostkey info-test.virtualclusters.org
(infoservice)# vi /etc/pki/tls/private/hostkey.pem
(master)# credible -c /etc/credible/credible.conf certchain
(infoservice)# vi /etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
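If the Infoservice host can ssh to the Master, the same credentials can be pulled over ssh rather than pasted by hand. A sketch, assuming root ssh access to master-test.virtualclusters.org:
(infoservice)# ssh root@master-test.virtualclusters.org "credible -c /etc/credible/credible.conf hostcert info-test.virtualclusters.org" > /etc/pki/tls/certs/hostcert.pem
(infoservice)# ssh root@master-test.virtualclusters.org "credible -c /etc/credible/credible.conf hostkey info-test.virtualclusters.org" > /etc/pki/tls/private/hostkey.pem
(infoservice)# ssh root@master-test.virtualclusters.org "credible -c /etc/credible/credible.conf certchain" > /etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem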
Starting the Infoservice
Due to a bug (CORE-261), we need to create /var/log/vc3:
mkdir -p /var/log/vc3
For now, we'll use the sysv-init style startup scripts:
/etc/init.d/vc3-infoservice.init start
If it's running, you should see the following after /etc/init.d/vc3-infoservice.init status:
● vc3-infoservice.init.service - LSB: start and stop vc3-info-service
Loaded: loaded (/etc/rc.d/init.d/vc3-infoservice.init; bad; vendor preset: disabled)
Active: active (running) since Fri 2017-09-08 16:31:13 UTC; 36s ago
Check the logs for any ERROR statements:
grep "ERROR" /var/log/vc3/*log || echo "Everything OK"
VC3 Master
Installing the Master
The Master depends on the vc3-client and vc3-infoservice packages for the client APIs. We also need ansible and the VC3 playbooks to configure nodes. Install them along with the plugin manager:
yum install vc3-client vc3-infoservice vc3-master pluginmanager ansible vc3-playbooks -y
If using OpenStack for dynamic head node provisioning, you'll also need python-novaclient from the OpenStack repositories.
yum install centos-release-openstack-ocata -y
yum install python-novaclient -y
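A quick check that the nova client libraries are importable:
python -c "import novaclient" && echo "novaclient OK"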
And configure /etc/vc3/vc3-master.conf:
[DEFAULT]
loglevel = debug
[master]
taskconf=/etc/vc3/tasks.conf
[credible]
credconf=/etc/credible/credible.conf
[dynamic]
plugin=Execute
[netcomm]
chainfile=/etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
certfile=/etc/pki/tls/certs/hostcert.pem
keyfile=/etc/pki/tls/private/hostkey.pem
infohost=info-test.virtualclusters.org
httpport=20333
httpsport=20334
[core]
whitelist = cctools-catalog-server,vc3-factory
You will also need to modify the client config at /etc/vc3/vc3-client.conf:
[DEFAULT]
logLevel = warn
[netcomm]
chainfile=/etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
certfile=/etc/pki/tls/certs/hostcert.pem
keyfile=/etc/pki/tls/private/hostkey.pem
infohost=info-test.virtualclusters.org
httpport=20333
httpsport=20334
Finally, configure the Master for OpenStack/Ansible in /etc/vc3/tasks.conf:
[DEFAULT]
# in seconds
polling_interval = 120
[vc3init]
taskplugins = InitInstanceAuth,HandlePairingRequests
[vcluster-lifecycle]
# run at least once per vcluster-requestcycle
taskplugins = InitResources,HandleAllocations
polling_interval = 45
[vcluster-requestcycle]
taskplugins = HandleRequests
polling_interval = 60
[consistency-checks]
taskplugins = CheckAllocations
[access-checks]
taskplugins = CheckResourceAccess
polling_interval = 360
[vcluster-headnodecycle]
taskplugins = HandleHeadNodes
polling_interval = 10
username = myosuser
password = secret
user_domain_name = default
project_domain_name = default
auth_url = http://10.32.70.9:5000/v3
#CentOS 7 vanilla minimal install
#node_image = 730253d8-d585-43d2-b2d2-16d3af388306
#CentOS 7 minimal install + condor,cvmfs,gcc,epel,osg-oasis
node_image = 093fd316-fffc-441c-944c-6ba2de582f8f
#large: 2 VCPUS 4GB RAM 10 GB Disk
node_flavor = 344f29c8-7370-49b4-aaf8-b1427582970f
#small: 1 VCPUS 2GB RAM 10 GB Disk
#node_flavor = 15d4a4c3-3b97-409a-91b2-4bc1226382d3
node_user = centos
node_private_key_file = ~/.ssh/initnode
node_public_key_name = initnode-openstack-name
node_security_groups = ssh,default
node_network_id = 04e64bbe-d017-4aef-928b-0c2c0dd3fc9e
node_prefix = dev-
node_max_no_contact_time = 900
node_max_initializing_count = 3
ansible_path = /etc/vc3/vc3-playbooks/login
ansible_playbook = login-dynamic.yaml
ansible_debug_file = /var/log/vc3/ansible.log
Note that you will need to fill in the username and password above. You'll also need to put the private key for root on the head nodes under /etc/vc3/keys.
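For example, the key might be installed as follows (the filename initnode is illustrative; it should correspond to the node_private_key_file setting above, and the source path is site-specific):
mkdir -p /etc/vc3/keys
cp /path/to/initnode /etc/vc3/keys/   # site-specific path to the head-node root private key
chmod 700 /etc/vc3/keys
chmod 600 /etc/vc3/keys/initnode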
Launching the Master
Once the Infoservice has been started, you can start the VC3 Master to process requests. Make sure that infohost= points to the correct hostname in /etc/vc3/vc3-master.conf.
It may be necessary to create the vc3 log directory (see CORE-140) and change permissions on the Credible directory (see CORE-144):
mkdir -p /var/log/vc3
chown vc3: /var/log/vc3
Finally, start the service:
systemctl start vc3-master
You should see something similar to the following if things are working:
vc3-master.service - VC3 Master
Loaded: loaded (/usr/lib/systemd/system/vc3-master.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2017-09-08 17:38:48 UTC; 48s ago
You can check to see if things are working with the following command:
grep "ERROR" /var/log/vc3/*.log || echo "Everything OK"
VC3 Factory
The VC3 factory launches, maintains, and destroys virtual clusters by sending and monitoring pilot jobs that contain the VC3 builder.
Prerequisites
As with the master and infoservice, you'll need the VC3 repo installed and public keys distributed before continuing.
Installing the Factory
In addition to the factory itself, you'll need to install VC3-specific plugins, the infoservice and client for APIs, and the pluginmanager.
yum install epel-release -y
yum install autopyfactory vc3-factory-plugins vc3-client vc3-infoservice pluginmanager vc3-remote-manager vc3-builder python-paramiko -y
We will also need the HTCondor software. Install the public key, repo, and condor package:
rpm --import http://research.cs.wisc.edu/htcondor/yum/RPM-GPG-KEY-HTCondor
curl http://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel7.repo > /etc/yum.repos.d/htcondor-stable-rhel7.repo
yum install condor
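To confirm the installation, and optionally enable HTCondor at boot:
condor_version
systemctl enable condor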
As before, we will need certificates issued to the Factory host:
(master)# credible -c /etc/credible/credible.conf hostcert apf-test.virtualclusters.org
(factory)# vi /etc/pki/tls/certs/hostcert.pem
(master)# credible -c /etc/credible/credible.conf hostkey apf-test.virtualclusters.org
(factory)# vi /etc/pki/tls/private/hostkey.pem
(master)# credible -c /etc/credible/credible.conf certchain
(factory)# vi /etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
The client will need to be configured to point at the test infoservice in /etc/vc3/vc3-client.conf:
[DEFAULT]
logLevel = warn
[netcomm]
chainfile=/etc/pki/ca-trust/extracted/pem/vc3-chain.cert.pem
certfile=/etc/pki/tls/certs/hostcert.pem
keyfile=/etc/pki/tls/private/hostkey.pem
infohost=info-test.virtualclusters.org
httpport=20333
httpsport=20334
The factory defaults for VC3, in /etc/autopyfactory/vc3defaults.conf, may need to be adjusted as follows:
[DEFAULT]
vo = VC3
status = online
override = True
enabled = True
cleanlogs.keepdays = 7
batchstatusplugin = Condor
wmsstatusplugin = None
schedplugin = KeepNRunning, MinPerCycle, MaxPerCycle, MaxPending
sched.maxtorun.maximum = 9999
sched.maxpending.maximum = 100
sched.maxpercycle.maximum = 50
sched.minpercycle.minimum = 0
sched.keepnrunning.keep_running = 0
monitorsection = dummy-monitor
builder = /usr/local/libexec/vc3-builder
periodic_remove = periodic_remove=(JobStatus == 5 && (CurrentTime - EnteredCurrentStatus) > 3600) || (JobStatus == 1 && globusstatus =!= 1 && (CurrentTime - EnteredCurrentStatus) > 86400) || (JobStatus == 2 && (CurrentTime - EnteredCurrentStatus) > 604800)
batchsubmit.condorosgce.proxy = None
batchsubmit.condorec2.proxy = None
batchsubmit.condorec2.peaceful = True
batchsubmit.condorlocal.proxy = None
batchsubmit.condorssh.killorder = newest
batchsubmit.condorssh.peaceful = False
apfqueue.sleep = 60
batchstatus.condor.sleep = 25
Likewise, the main autopyfactory.conf needs to be adjusted:
# =================================================================================================================
#
# autopyfactory.conf Configuration file for main Factory component of AutoPyFactory.
#
# Documentation:
# https://twiki.grid.iu.edu/bin/view/Documentation/Release3/AutoPyFactory
# https://twiki.grid.iu.edu/bin/view/Documentation/Release3/AutoPyFactoryConfiguration#5_2_autopyfactory_conf
#
# =================================================================================================================
# template for a configuration file
[Factory]
factoryAdminEmail = neo@matrix.net
factoryId = MYSITE-hostname-sysadminname
factorySMTPServer = mail.matrix.net
factoryMinEmailRepeatSeconds = 43200
factoryUser = autopyfactory
enablequeues = True
queueConf = file:///etc/autopyfactory/queues.conf
queueDirConf = None
proxyConf = /etc/autopyfactory/proxy.conf
authmanager.enabled = True
proxymanager.enabled = True
proxymanager.sleep = 30
authmanager.sleep = 30
authConf = /etc/autopyfactory/auth.conf
monitorConf = /etc/autopyfactory/monitor.conf
mappingsConf = /etc/autopyfactory/mappings.conf
cycles = 9999999
cleanlogs.keepdays = 14
factory.sleep=30
wmsstatus.panda.sleep = 150
wmsstatus.panda.maxage = 360
wmsstatus.condor.sleep = 150
wmsstatus.condor.maxage = 360
batchstatus.condor.sleep = 150
batchstatus.condor.maxage = 360
baseLogDir = /home/autopyfactory/factory/logs
baseLogDirUrl = http://myhost.matrix.net:25880
logserver.enabled = True
logserver.index = True
logserver.allowrobots = False
# Automatic (re)configuration
config.reconfig = True
config.reconfig.interval = 30
config.queues.plugin = File, VC3
config.auth.plugin = File, VC3
config.queues.vc3.vc3clientconf = /etc/vc3/vc3-client.conf
config.queues.vc3.tempfile = ~/queues.conf.tmp
# For a static central factory, use 'all' to check all requests.
config.queues.vc3.requestname = all
config.auth.vc3.vc3clientconf = /etc/vc3/vc3-client.conf
config.auth.vc3.tempfile = ~/auth.conf.tmp
config.auth.vc3.requestname = all
# For the factory-level monitor plugin VC3
monitor = VC3
monitor.vc3.vc3clientconf = /etc/vc3/vc3-client.conf
Starting the Factory and Condor
Once the Factory, Condor, and the builder have been installed on the factory host, you'll need to start the services:
service condor start
service autopyfactory start
As usual, check for any errors in the factory startup:
grep "ERROR" /var/log/autopyfactory/*.log || echo "Everything OK"
Monitoring the pilots
We use a graphite server provided by MWT2 to plot time series data. Create the monitoring script in /usr/local/bin/monitor-pilots.sh:
#!/bin/bash
condor_q -nobatch -global -const 'Jobstatus == 1' -long | grep "MATCH_APF" | sort | uniq -c | sed 's/\"//g' |awk -v date=$(date +%s) -v hostname=$(hostname | tr '.' '_') '{ print "condor.factory."hostname".idle." $4,$1,date}' | nc -w 30 graphite.mwt2.org 2003
condor_q -nobatch -global -const 'Jobstatus == 2' -long | grep "MATCH_APF" | sort | uniq -c | sed 's/\"//g' |awk -v date=$(date +%s) -v hostname=$(hostname | tr '.' '_') '{ print "condor.factory."hostname".running." $4,$1,date}' | nc -w 30 graphite.mwt2.org 2003
Make it executable, install nc if necessary, and test for any errors:
chmod +x /usr/local/bin/monitor-pilots.sh
yum install nc -y
/usr/local/bin/monitor-pilots.sh
If successful, there should be no output. Finally, add it to root's crontab (crontab -e as root) with the following cron entry:
* * * * * /usr/local/bin/monitor-pilots.sh
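To verify that datapoints are reaching graphite, you can push a single test metric by hand using graphite's plaintext protocol (the metric name here is an arbitrary example):
echo "condor.factory.$(hostname | tr '.' '_').test 1 $(date +%s)" | nc -w 30 graphite.mwt2.org 2003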