Baserock project public infrastructure
======================================

This repository contains the definitions for all of the Baserock Project's
infrastructure. This includes every service used by the project, except for
the mailing lists (hosted by [Pepperfish]), the wiki (hosted by [Branchable]),
and the GitLab CI runners (set up by Javier Jardón).

Some of these systems are Baserock systems. This has proved an obstacle to
keeping them up to date with security updates, and we plan to switch everything
to run on mainstream distros in future.

All files necessary for (re)deploying the systems should be contained in this
Git repository. Private tokens should be encrypted using
[ansible-vault](https://www.ansible.com/blog/2014/02/19/ansible-vault).
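
For example, to encrypt or edit a file of secrets before committing it
(standard `ansible-vault` usage; the path is only illustrative):

    ansible-vault encrypt path/to/secrets.yml
    ansible-vault edit path/to/secrets.yml

Playbooks that need these secrets are then run with `--vault-password-file`,
as shown in the deployment sections below.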

[Pepperfish]: http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo
[Branchable]: http://www.branchable.com/


General notes
-------------

When instantiating a machine that will be public, remember to give shell
access to everyone on the ops team. This can be done using a post-creation
customisation script that injects all of their SSH keys.

Additionally, ensure SSH password login is disabled in all systems you deploy!
See: <https://testbit.eu/is-ssh-insecure/> for why.

The Ansible playbook `admin/sshd_config.yaml` can ensure that all systems have
password login disabled, and all the SSH keys installed.

    ansible-playbook -i hosts admin/sshd_config.yaml


Administration
--------------

You can use [Ansible] to automate tasks on the baserock.org systems.

To run a playbook:

    ansible-playbook -i hosts $PLAYBOOK.yaml

To run an ad-hoc command (upgrading, for example):

    ansible -i hosts ubuntu -m command -a 'sudo apt -y upgrade'

[Ansible]: http://www.ansible.com


Security updates
----------------

The [LWN Alerts](https://lwn.net/Alerts/) service gives you security alert
information from all major Linux distributions.

If there is a vulnerability discovered in some software we use, we might need
to upgrade all of the systems that use that component at baserock.org.

Bear in mind some systems are not accessible except via the frontend-haproxy
system. Those are usually less at risk than those that face the web directly.
Also bear in mind we use OpenStack security groups to block most ports.

### Check the inventory

Make sure the Ansible inventory file is up to date, and that you have access to
all machines. Run this:

    ansible \* -i ./hosts -m ping

You should see lots of this sort of output:

    frontend-haproxy | success >> {
        "changed": false,
        "ping": "pong"
    }

You may find some host key errors like this:

    paste | FAILED => SSH Error: Host key verification failed.
    It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

If you have a host key problem, that could be because somebody redeployed
the system since the last time you connected to it with SSH, and did not
transfer the SSH host keys from the old system to the new system. Check with
other ops team members about this. If you are sure the new host keys can
be trusted, you can remove the old ones with `ssh-keygen -R 10.3.x.y`, where
10.3.x.y is the internal IP address of the machine. You'll then be prompted to
accept the new ones when you run Ansible again.
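
For example, using the placeholder address from above:

    ssh-keygen -R 10.3.x.y
    ansible \* -i ./hosts -m ping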

Once all machines respond to the Ansible 'ping' module, double check that
every machine you can see in the OpenStack Horizon dashboard has a
corresponding entry in the 'hosts' file, to ensure the next steps operate
on all of the machines.
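
One way to cross-check is to list the server names with the `openstack`
command-line client (assuming it is installed and the credentials described
in the 'Deployment to OpenStack' section below are exported), then compare
the output against the entries in 'hosts':

    openstack server list -f value -c Name | sort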

### Check and update Debian/Ubuntu systems

Check what version of a package is in use with this command (using NGINX as an
example).

    ansible ubuntu -i hosts -m command -a 'dpkg -s nginx'

You can see what updates are available using the `apt-cache policy` command,
which also gives you information about the currently installed version.

    ansible -i hosts ubuntu -m command -a 'apt-cache policy nginx'

You can then use `apt -y upgrade` to install all available updates. Or use
`apt-get --only-upgrade install <package name>` to update just that package.

You will then need to restart services, but rebooting the whole machine is
probably easiest.
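
For example, to upgrade just NGINX across the Ubuntu machines and restart it
afterwards (assuming the service is managed by systemd, as on current Ubuntu
releases):

    ansible ubuntu -i hosts -m command -a 'sudo apt-get --only-upgrade install -y nginx'
    ansible ubuntu -i hosts -m command -a 'sudo systemctl restart nginx'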


Deployment to OpenStack
-----------------------

The intention is that all of the systems defined here are deployed to an
OpenStack cloud. The instructions here hardcode some details about the specific
tenancy at [CityCloud](https://citycontrolpanel.com/) that the Baserock project
uses. It should be easy to adapt them for other OpenStack hosts, though.

### Credentials

The instructions below assume you have the following environment variables set
according to the OpenStack host you are deploying to:

 - `OS_AUTH_URL`
 - `OS_TENANT_NAME`
 - `OS_USERNAME`
 - `OS_PASSWORD`

For CityCloud you also need to ensure that `OS_REGION_NAME` is set to `Fra1`
(for the Frankfurt datacentre).
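
For example (the values below are placeholders, not the real CityCloud
endpoints or credentials):

    export OS_AUTH_URL='https://auth.example.com:5000/v3'  # placeholder
    export OS_TENANT_NAME='example-tenant'                 # placeholder
    export OS_USERNAME='your-username'
    export OS_PASSWORD='your-password'
    export OS_REGION_NAME='Fra1'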

Backups
-------

Backups of git.baserock.org's data volume are run by, and stored on, a
Codethink-managed machine named 'access'. They will need to migrate off this
system before long. The backups are taken without pausing services or
snapshotting the data, so they will not be 100% clean. The current
git.baserock.org data volume does not use LVM and cannot be easily snapshotted.

> Note: backups currently not running

Systems
-------

All the servers needed are deployed using Terraform. To install all the systems
below, first run Terraform to create the required resources at the service
provider:

    cd terraform
    terraform init
    terraform apply
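
Before applying, you can preview the proposed changes (standard Terraform
usage, run from the `terraform/` directory):

    terraform plan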


This will create/update the tfstate currently stored in OpenStack (via Swift).
If you want to download the state file you can run the following command, but
this isn't necessary:

    openstack object save terraform-state-baserock tfstate.tf

> The `tfstate` is shared by everyone. It is not recommended for multiple
> people to work on the Terraform side of the infrastructure at the same time.


These scripts will create:
 - Networks, subnetworks, floating IPs
 - Security groups
 - Volumes
 - Instances (servers) using all the above

### Front-end

The front-end provides a reverse proxy, to allow more flexible routing than
simply pointing each subdomain to a different instance using separate public
IPs. It also provides a starting point for future load-balancing and failover
configuration.

To deploy this system:

    ansible-playbook -i hosts baserock_frontend/image-config.yml
    ansible-playbook -i hosts baserock_frontend/instance-config.yml \
        --vault-password-file=~/vault-infra-pass
    # backups not being done at the moment
    # ansible-playbook -i hosts baserock_frontend/instance-backup-config.yml


The baserock_frontend system is stateless.

Full HAProxy 2.0 documentation: <https://cbonte.github.io/haproxy-dconv/2.0/configuration.html>.

If you want to add a new service to the Baserock Project infrastructure via
the frontend, do the following:

- request a subdomain that points at the frontend IP
- alter the haproxy.cfg file in the baserock_frontend/ directory in this repo
  as necessary to proxy requests to the real instance
- run the baserock_frontend/instance-config.yml playbook

OpenStack doesn't provide any kind of internal DNS service, so you must use
the fixed internal IP of each instance in `haproxy.cfg` (see the sketch below).
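
A minimal sketch of the kind of `haproxy.cfg` change meant here, assuming a
hypothetical `paste.baserock.org` subdomain and backend name; replace
`10.3.x.y` with the instance's fixed internal IP:

    # In the existing frontend section:
    acl host_paste hdr(host) -i paste.baserock.org
    use_backend baserock_paste if host_paste

    # New backend; 10.3.x.y is a placeholder for the instance's fixed IP:
    backend baserock_paste
        server paste0 10.3.x.y:80 check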

The internal IP address of this machine is hardcoded in some places (beyond the
usual haproxy.cfg file); use `git grep` to find all of them. You'll need to
update all the relevant config files. We really need some internal DNS system
to avoid this hassle.

### General webserver

The general-purpose webserver provides downloads, plus IRC logging and a
pastebin service.

To deploy to production:

    ansible-playbook -i hosts baserock_webserver/image-config.yml
    ansible-playbook -i hosts baserock_webserver/instance-config.yml
    ansible-playbook -i hosts baserock_webserver/instance-gitlabirced-config.yml \
        --vault-password-file ~/vault-infra-pass
    ansible-playbook -i hosts baserock_webserver/instance-hastebin-config.yml \
        --vault-password-file ~/vault-infra-pass
    ansible-playbook -i hosts baserock_webserver/instance-irclogs-config.yml

### Trove

Deployment of Trove is done using [Lorry Depot]. To deploy it:

    git clone https://gitlab.com/CodethinkLabs/lorry/lorry-depot
    cd lorry-depot
    git clone https://gitlab.com/baserock/git.baserock.org.git
    ansible-playbook -i git.baserock.org/static-inventory.yml lorry-depots.yml


### OSTree artifact cache

To deploy this system to production:

    ansible-playbook -i hosts baserock_ostree/image-config.yml
    ansible-playbook -i hosts baserock_ostree/instance-config.yml
    ansible-playbook -i hosts baserock_ostree/ostree-access-config.yml


SSL certificates
================

The certificates used for our infrastructure are provided for free
by Let's Encrypt. These certificates expire every 3 months, but are
renewed automatically via certbot.
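
To check that automatic renewal is working on a given host (standard certbot
usage; this performs a test renewal without touching the live certificates):

    certbot renew --dry-run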


GitLab CI runners setup
=======================

Baserock uses [GitLab CI] for build and test automation. For performance reasons
we provide our own runners and avoid using the free, shared runners provided by
GitLab. The runners are hosted at [DigitalOcean] and managed by the 'baserock'
team account there.

There is a persistent 'manager' machine with a public IP of 138.68.150.249 that
runs GitLab Runner and [docker-machine]. This doesn't run any builds itself --
we use the [autoscaling feature] of GitLab Runner to spawn new VMs for building
in. The configuration for this is in `/etc/gitlab-runner/config.toml`.

Each build occurs in a Docker container on one of the transient VMs. As per
the ['runners.docker' section] of `config.toml`, each gets a newly created
volume mounted at `/cache`. The YBD and BuildStream cache directories get
located here because jobs were running out of disk space when using the default
configuration.

There is a second persistent machine with a public IP of 46.101.48.48 that
hosts a Docker registry and a [Minio] cache. These services run as Docker
containers. The Docker registry exists to cache the Docker images we use which
improves the spin-up time of the transient builder VMs, as documented
[here](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-docker-registry-mirroring).
The Minio cache is used for the [distributed caching] feature of GitLab CI.
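
To check the state of this setup from the manager machine (assuming
`gitlab-runner` and `docker-machine` are installed in the standard locations
there):

    sudo gitlab-runner verify   # confirm the runner is registered and can reach GitLab
    sudo docker-machine ls      # list any transient builder VMs currently alive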


[GitLab CI]: https://about.gitlab.com/features/gitlab-ci-cd/
[DigitalOcean]: https://cloud.digitalocean.com/
[docker-machine]: https://docs.docker.com/machine/
[autoscaling feature]: https://docs.gitlab.com/runner/configuration/autoscale.html
[Minio]: https://www.minio.io/
['runners.docker' section]: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section
[distributed caching]: https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching
[Lorry Depot]: https://gitlab.com/CodethinkLabs/lorry/lorry-depot