summaryrefslogtreecommitdiff
path: root/doc/source/ovn/migration.rst
blob: 2123767882725ee76c50be4ef68e64b61ca810ba (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
.. _ovn_migration:

Migration Strategy
==================

This document details an in-place migration strategy from ML2/OVS to ML2/OVN
in either ovs-firewall or ovs-hybrid mode for a TripleO OpenStack deployment.

For non TripleO deployments, please refer to the file ``migration/README.rst``
and the ansible playbook ``migration/migrate-to-ovn.yml``.

Overview
--------
The migration process is orchestrated through the shell script
ovn_migration.sh, which is provided with the OVN driver.

The administrator uses ovn_migration.sh to perform readiness steps
and migration from the undercloud node.
The readiness steps, such as host inventory production, DHCP and MTU
adjustments, prepare the environment for the procedure.

Subsequent steps start the migration via Ansible.

Plan for a 24-hour wait after the reduce-dhcp-t1 step to allow VMs to catch up
with the new MTU size from the DHCP server. The default neutron ML2/OVS
configuration has a dhcp_lease_duration of 86400 seconds (24h).

Also, if there are instances using static IP assignment, the administrator
should be ready to update the MTU of those instances to the new value of 8
bytes less than the ML2/OVS (VXLAN) MTU value. For example, the typical
1500 MTU network value that makes VXLAN tenant networks use 1450 bytes of MTU
will need to change to 1442 under Geneve. Or under the same overlay network,
a GRE encapsulated tenant network would use a 1458 MTU, but again a 1442 MTU
for Geneve.

If there are instances which use DHCP but don't support lease update during
the T1 period the administrator will need to reboot them to ensure that MTU
is updated inside those instances.


Steps for migration
-------------------

Perform the following steps in the overcloud/undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Ensure that you have updated to the latest openstack/neutron version.

Perform the following steps in the undercloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Install openstack-neutron-ovn-migration-tool.

   .. code-block:: console

      # yum install openstack-neutron-ovn-migration-tool

2. Create a working directory on the undercloud, and copy the ansible playbooks

   .. code-block:: console

      $ mkdir ~/ovn_migration
      $ cd ~/ovn_migration
      $ cp -rfp /usr/share/ansible/neutron-ovn-migration/playbooks .

3. Create  ``~/overcloud-deploy-ovn.sh`` script in your ``$HOME``.
   This script must source your stackrc file, and then execute an ``openstack
   overcloud deploy`` with your original deployment parameters, plus
   the following environment files, added to the end of the command
   in the following order:

   When your network topology is DVR and your compute nodes have connectivity
   to the external network:

   .. code-block:: none

      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
      -e $HOME/ovn-extras.yaml

   When your compute nodes don't have external connectivity and you don't use
   DVR:

   .. code-block:: none

      -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-ha.yaml \
      -e $HOME/ovn-extras.yaml

   Make sure that all users have execution privileges on the script, because it
   will be called by ovn_migration.sh/ansible during the migration process.

   .. code-block:: console

      $ chmod a+x ~/overcloud-deploy-ovn.sh

4. To configure the parameters of your migration you can set the environment
   variables that will be used by ``ovn_migration.sh``. You can skip setting
   any values matching the defaults.

   * STACKRC_FILE - must point to your stackrc file in your undercloud.
     Default:  ~/stackrc

   * OVERCLOUDRC_FILE - must point to your overcloudrc file in your
     undercloud.
     Default: ~/overcloudrc

   * OVERCLOUD_OVN_DEPLOY_SCRIPT - must point to the script described in
     step 1.
     Default: ~/overcloud-deploy-ovn.sh

   * UNDERCLOUD_NODE_USER - user used on the undercloud nodes
     Default: heat-admin

   * STACK_NAME - Name or ID of the heat stack
     Default: 'overcloud'
     If the stack that is migrated differs from the default, please set this
     environment variable to the stack name or ID.

   * PUBLIC_NETWORK_NAME - Name of your public network.
     Default: 'public'.
     To support migration validation, this network must have available
     floating IPs, and those floating IPs must be pingable from the
     undercloud. If that's not possible please configure VALIDATE_MIGRATION
     to False.

   * OOO_WORKDIR - Name of TripleO working directory
     Default: '$HOME/overcloud-deploy'
     This directory contains different stacks in TripleO and its files. It
     should be configured if TripleO commands were invoked with --work-dir
     option.

   * IMAGE_NAME - Name/ID of the glance image to us for booting a test server.
     Default:'cirros'.
     If the image does not exist it will automatically download and use
     cirros during the pre-validation / post-validation process.

   * VALIDATE_MIGRATION - Create migration resources to validate the
     migration. The migration script, before starting the migration, boot a
     server and validates that the server is reachable after the migration.
     Default: False

   * SERVER_USER_NAME - User name to use for logging into the migration
     instances.
     Default: 'cirros'.

   * DHCP_RENEWAL_TIME - DHCP renewal time in seconds to configure in DHCP
     agent configuration file. This renewal time is used only temporarily
     during migration to ensure a synchronized MTU switch across the networks.
     Default: 30

   * CREATE_BACKUP - Flag to create a backup of the controllers that can be
     used as a revert mechanism.
     Default: True

   * BACKUP_MIGRATION_IP - Only used if CREATE_BACKUP is enabled, IP of the
     server that will be used as a NFS server to store the backup.
     Default: 192.168.24.1

   .. warning::

      Please note that VALIDATE_MIGRATION requires enough quota (2
      available floating ips, 2 networks, 2 subnets, 2 instances,
      and 2 routers as admin).

   For example:

   .. code-block:: console

      $ export PUBLIC_NETWORK_NAME=my-public-network
      $ ovn_migration.sh .........

5. Run ``ovn_migration.sh generate-inventory`` to generate the inventory
   file - ``hosts_for_migration`` and ``ansible.cfg``. Please review
   ``hosts_for_migration`` for correctness.

   .. code-block:: console

      $ ovn_migration.sh generate-inventory


   At this step the script will inspect the TripleO ansible inventory
   and generate an inventory of hosts, specifically tagged to work
   with the migration playbooks.


6. Run ``ovn_migration.sh reduce-dhcp-t1``

   .. code-block:: console

      $ ovn_migration.sh reduce-dhcp-t1


   This lowers the T1 parameter
   of the internal neutron DHCP servers configuring the ``dhcp_renewal_time``
   in /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
   in all the nodes where DHCP agent is running.

   We lower the T1 parameter to make sure that the instances start refreshing
   the DHCP lease quicker (every 30 seconds by default) during the migration
   proccess. The reason why we force this is to make sure that the MTU update
   happens quickly across the network during step 8, this is very important
   because during those 30 seconds there will be connectivity issues with
   bigger packets (MTU missmatchess across the network), this is also why
   step 7 is very important, even though we reduce T1, the previous T1 value
   the instances leased from the DHCP server will be much higher
   (24h by default) and we need to wait those 24h to make sure they have
   updated T1. After migration the DHCP T1 parameter returns to normal values.

7. If you are using VXLAN or GRE tenant networking, ``wait at least 24 hours``
   before continuing. This will allow VMs to catch up with the new MTU size
   of the next step.

   .. warning::

      If you are using VXLAN or GRE networks, this 24-hour wait step is critical.
      If you are using VLAN tenant networks you can proceed to the next step without delay.

   .. warning::

      If you have any instance with static IP assignment on VXLAN or
      GRE tenant networks, you must manually modify the configuration of those instances.
      If your instances don't honor the T1 parameter of DHCP they will need
      to be rebooted.
      to configure the new geneve MTU, which is the current VXLAN MTU minus 8 bytes.
      For instance, if the VXLAN-based MTU was 1450, change it to 1442.

   .. note::

      24 hours is the time based on default configuration. It actually depends on
      /var/lib/config-data/puppet-generated/neutron/etc/neutron/dhcp_agent.ini
      dhcp_renewal_time and
      /var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf
      dhcp_lease_duration parameters. (defaults to 86400 seconds)

   .. note::

      Please note that migrating a deployment which uses VLAN for tenant/project
      networks is not recommended at this time because of a bug in core ovn,
      full support is being worked out here:
      https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347594.html


   One way to verify that the T1 parameter has propagated to existing VMs
   is to connect to one of the compute nodes, and run ``tcpdump`` over one
   of the VM taps attached to a tenant network. If T1 propegation was a success,
   you should see that requests happen on an interval of approximately 30 seconds.

   .. code-block:: shell

      [heat-admin@overcloud-novacompute-0 ~]$ sudo tcpdump -i tap52e872c2-e6 port 67 or port 68 -n
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on tap52e872c2-e6, link-type EN10MB (Ethernet), capture size 262144 bytes
      13:17:28.954675 IP 192.168.99.5.bootpc > 192.168.99.3.bootps: BOOTP/DHCP, Request from fa:16:3e:6b:41:3d, length 300
      13:17:28.961321 IP 192.168.99.3.bootps > 192.168.99.5.bootpc: BOOTP/DHCP, Reply, length 355
      13:17:56.241156 IP 192.168.99.5.bootpc > 192.168.99.3.bootps: BOOTP/DHCP, Request from fa:16:3e:6b:41:3d, length 300
      13:17:56.249899 IP 192.168.99.3.bootps > 192.168.99.5.bootpc: BOOTP/DHCP, Reply, length 355

   .. note::

      This verification is not possible with cirros VMs. The cirros
      udhcpc implementation does not obey DHCP option 58 (T1). Please
      try this verification on a port that belongs to a full linux VM.
      We recommend you to check all the different types of workloads your
      system runs (Windows, different flavors of linux, etc..).

8. Run ``ovn_migration.sh reduce-mtu``.

   This lowers the MTU of the pre migration VXLAN and GRE networks. The
   tool will ignore non-VXLAN/GRE networks, so if you use VLAN for tenant
   networks it will be fine if you find this step not doing anything.

   .. code-block:: console

      $ ovn_migration.sh reduce-mtu

   This step will go network by network reducing the MTU, and tagging with
   ``adapted_mtu`` the networks which have been already handled.

   Every time a network is updated all the existing L3/DHCP agents
   connected to such network will update their internal leg MTU, instances
   will start fetching the new MTU as the DHCP T1 timer expires. As explained
   before, instances not obeying the DHCP T1 parameter will need to be
   restarted, and instances with static IP assignment will need to be manually
   updated.


9. Make TripleO ``prepare the new container images`` for OVN.

   If your deployment didn't have a containers-prepare-parameter.yaml, you can
   create one with:

   .. code-block:: console

       $ test -f $HOME/containers-prepare-parameter.yaml || \
             openstack tripleo container image prepare default \
                   --output-env-file $HOME/containers-prepare-parameter.yaml


   If you had to create the file, please make sure it's included at the end of
   your $HOME/overcloud-deploy-ovn.sh and $HOME/overcloud-deploy.sh

   Change the neutron_driver in the containers-prepare-parameter.yaml file to
   ovn:

   .. code-block:: console

      $ sed -i -E 's/neutron_driver:([ ]\w+)/neutron_driver: ovn/' $HOME/containers-prepare-parameter.yaml

   You can verify with:

   .. code-block:: shell

      $ grep neutron_driver $HOME/containers-prepare-parameter.yaml
      neutron_driver: ovn


   Then update the images:

   .. code-block:: console

      $ openstack tripleo container image prepare \
           --environment-file $HOME/containers-prepare-parameter.yaml

   .. note::

      It's important to provide the full path to your containers-prepare-parameter.yaml
      otherwise the command will finish very quickly and won't work (current
      version doesn't seem to output any error).


   During this step TripleO will build a list of containers, pull them from
   the remote registry and push them to your deployment local registry.


10. Run ``ovn_migration.sh start-migration`` to kick start the migration
    process.

    .. code-block:: console

       $ ovn_migration.sh start-migration


    During this step, this is what will happen:

    * Create pre-migration resources (network and VM) to validate existing
      deployment and final migration.

    * Update the overcloud stack to deploy OVN alongside reference
      implementation services using a temporary bridge "br-migration" instead
      of br-int.

    * Start the migration process:

      1. generate the OVN north db by running neutron-ovn-db-sync util
      2. clone the existing resources from br-int to br-migration, so OVN
         can find the same resources UUIDS over br-migration
      3. re-assign ovn-controller to br-int instead of br-migration
      4. cleanup network namespaces (fip, snat, qrouter, qdhcp),
      5. remove any unnecessary patch ports on br-int
      6. remove br-tun and br-migration ovs bridges
      7. delete qr-*, ha-* and qg-* ports from br-int (via neutron netns
         cleanup)

    * Delete neutron agents and neutron HA internal networks from the database
      via API.

    * Validate connectivity on pre-migration resources.

    * Delete pre-migration resources.

    * Create post-migration resources.

    * Validate connectivity on post-migration resources.

    * Cleanup post-migration resources.

    * Re-run deployment tool to update OVN on br-int, this step ensures
      that the TripleO database is updated with the final integration bridge.

    * Run an extra validation round to ensure the final state of the system is
      fully operational.

Migration is complete !!!