summaryrefslogtreecommitdiff
path: root/doc/development
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development')
-rw-r--r--doc/development/README.md4
-rw-r--r--doc/development/database_merge_request_checklist.md15
-rw-r--r--doc/development/fe_guide/style_guide_js.md17
-rw-r--r--doc/development/hash_indexes.md20
-rw-r--r--doc/development/ordering_table_columns.md127
-rw-r--r--doc/development/serializing_data.md3
-rw-r--r--doc/development/sql.md26
-rw-r--r--doc/development/testing.md69
-rw-r--r--doc/development/verifying_database_capabilities.md26
9 files changed, 303 insertions, 4 deletions
diff --git a/doc/development/README.md b/doc/development/README.md
index 58993c52dcd..dd150421b65 100644
--- a/doc/development/README.md
+++ b/doc/development/README.md
@@ -46,6 +46,7 @@
## Databases
+- [Merge Request Checklist](database_merge_request_checklist.md)
- [What requires downtime?](what_requires_downtime.md)
- [Adding database indexes](adding_database_indexes.md)
- [Post Deployment Migrations](post_deployment_migrations.md)
@@ -56,6 +57,9 @@
- [Background Migrations](background_migrations.md)
- [Storing SHA1 Hashes As Binary](sha1_as_binary.md)
- [Iterating Tables In Batches](iterating_tables_in_batches.md)
+- [Ordering Table Columns](ordering_table_columns.md)
+- [Verifying Database Capabilities](verifying_database_capabilities.md)
+- [Hash Indexes](hash_indexes.md)
## i18n
diff --git a/doc/development/database_merge_request_checklist.md b/doc/development/database_merge_request_checklist.md
new file mode 100644
index 00000000000..75c395b61ef
--- /dev/null
+++ b/doc/development/database_merge_request_checklist.md
@@ -0,0 +1,15 @@
+# Merge Request Checklist
+
+When creating a merge request that performs database related changes (schema
+changes, adjusting queries to optimise performance, etc) you should use the
+merge request template called "Database Changes". This template contains a
+checklist of steps to follow to make sure the changes are up to snuff.
+
+To use the checklist, create a new merge request and click on the "Choose a
+template" dropdown, then click "Database Changes".
+
+An example of this checklist can be found at
+https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12463.
+
+The source code of the checklist can be found in at
+https://gitlab.com/gitlab-org/gitlab-ce/blob/master/.gitlab/merge_request_templates/Database%20Changes.md
diff --git a/doc/development/fe_guide/style_guide_js.md b/doc/development/fe_guide/style_guide_js.md
index 6ade3231fac..9c72fda0229 100644
--- a/doc/development/fe_guide/style_guide_js.md
+++ b/doc/development/fe_guide/style_guide_js.md
@@ -511,7 +511,24 @@ A forEach will cause side effects, it will be mutating the array being iterated.
$('span').tooltip('fixTitle');
```
+### The Javascript/Vue Accord
+The goal of this accord is to make sure we are all on the same page.
+1. When writing Vue, you may not use jQuery in your application.
+1.1 If you need to grab data from the DOM, you may query the DOM 1 time while bootstrapping your application to grab data attributes using `dataset`. You can do this without jQuery.
+1.2 You may use a jQuery dependency in Vue.js following [this example from the docs](https://vuejs.org/v2/examples/select2.html).
+1.3 If an outside jQuery Event needs to be listen to inside the Vue application, you may use jQuery event listeners.
+1.4 We will avoid adding new jQuery events when they are not required. Instead of adding new jQuery events take a look at [different methods to do the same task](https://vuejs.org/v2/api/#vm-emit).
+
+1. You may query the `window` object 1 time, while bootstrapping your application for application specific data (e.g. `scrollTo` is ok to access anytime). Do this access during the bootstrapping of your application.
+
+1. You may have a temporary but immediate need to create technical debt by writing code that does not follow our standards, to be refactored later. Maintainers need to be ok with the tech debt in the first place. An issue should be created for that tech debt to evaluate it further and discuss. In the coming months you should fix that tech debt, with it's priority to be determined by maintainers.
+
+1. When creating tech debt you must write the tests for that code before hand and those tests may not be rewritten. e.g. jQuery tests rewritten to Vue tests.
+
+1. You may choose to use VueX as a centralized state management. If you choose not to use VueX, you must use the *store pattern* which can be found in the [Vue.js documentation](https://vuejs.org/v2/guide/state-management.html#Simple-State-Management-from-Scratch).
+
+1. Once you have chosen a centralized state management solution you must use it for your entire application. i.e. Don't mix and match your state management solutions.
## SCSS
- [SCSS](style_guide_scss.md)
diff --git a/doc/development/hash_indexes.md b/doc/development/hash_indexes.md
new file mode 100644
index 00000000000..e6c1b3590b1
--- /dev/null
+++ b/doc/development/hash_indexes.md
@@ -0,0 +1,20 @@
+# Hash Indexes
+
+Both PostgreSQL and MySQL support hash indexes besides the regular btree
+indexes. Hash indexes however are to be avoided at all costs. While they may
+_sometimes_ provide better performance the cost of rehashing can be very high.
+More importantly: at least until PostgreSQL 10.0 hash indexes are not
+WAL-logged, meaning they are not replicated to any replicas. From the PostgreSQL
+documentation:
+
+> Hash index operations are not presently WAL-logged, so hash indexes might need
+> to be rebuilt with REINDEX after a database crash if there were unwritten
+> changes. Also, changes to hash indexes are not replicated over streaming or
+> file-based replication after the initial base backup, so they give wrong
+> answers to queries that subsequently use them. For these reasons, hash index
+> use is presently discouraged.
+
+RuboCop is configured to register an offence when it detects the use of a hash
+index.
+
+Instead of using hash indexes you should use regular btree indexes.
diff --git a/doc/development/ordering_table_columns.md b/doc/development/ordering_table_columns.md
new file mode 100644
index 00000000000..249e70c7b0e
--- /dev/null
+++ b/doc/development/ordering_table_columns.md
@@ -0,0 +1,127 @@
+# Ordering Table Columns
+
+Similar to C structures the space of a table is influenced by the order of
+columns. This is because the size of columns is aligned depending on the type of
+the column. Take the following column order for example:
+
+* id (integer, 4 bytes)
+* name (text, variable)
+* user_id (integer, 4 bytes)
+
+Integers are aligned to the word size. This means that on a 64 bit platform the
+actual size of each column would be: 8 bytes, variable, 8 bytes. This means that
+each row will require at least 16 bytes for the two integers, and a variable
+amount for the text field. If a table has a few rows this is not an issue, but
+once you start storing millions of rows you can save space by using a different
+order. For the above example a more ideal column order would be the following:
+
+* id (integer, 4 bytes)
+* user_id (integer, 4 bytes)
+* name (text, variable)
+
+In this setup the `id` and `user_id` columns can be packed together, which means
+we only need 8 bytes to store _both_ of them. This in turn each row will require
+8 bytes less of space.
+
+For GitLab we require that columns of new tables are ordered based to use the
+least amount of space. An easy way of doing this is to order them based on the
+type size in descending order with variable sizes (string and text columns for
+example) at the end.
+
+## Type Sizes
+
+While the PostgreSQL docuemntation
+(https://www.postgresql.org/docs/current/static/datatype.html) contains plenty
+of information we will list the sizes of common types here so it's easier to
+look them up. Here "word" refers to the word size, which is 4 bytes for a 32
+bits platform and 8 bytes for a 64 bits platform.
+
+| Type | Size | Aligned To |
+|:-----------------|:-------------------------------------|:-----------|
+| smallint | 2 bytes | 1 word |
+| integer | 4 bytes | 1 word |
+| bigint | 8 bytes | 8 bytes |
+| real | 4 bytes | 1 word |
+| double precision | 8 bytes | 8 bytes |
+| boolean | 1 byte | not needed |
+| text / string | variable, 1 byte plus the data | 1 word |
+| bytea | variable, 1 or 4 bytes plus the data | 1 word |
+| timestamp | 8 bytes | 8 bytes |
+| timestamptz | 8 bytes | 8 bytes |
+| date | 4 bytes | 1 word |
+
+A "variable" size means the actual size depends on the value being stored. If
+PostgreSQL determines this can be embedded directly into a row it may do so, but
+for very large values it will store the data externally and store a pointer (of
+1 word in size) in the column. Because of this variable sized columns should
+always be at the end of a table.
+
+## Real Example
+
+Let's use the "events" table as an example, which currently has the following
+layout:
+
+| Column | Type | Size |
+|:------------|:----------------------------|:---------|
+| id | integer | 4 bytes |
+| target_type | character varying | variable |
+| target_id | integer | 4 bytes |
+| title | character varying | variable |
+| data | text | variable |
+| project_id | integer | 4 bytes |
+| created_at | timestamp without time zone | 8 bytes |
+| updated_at | timestamp without time zone | 8 bytes |
+| action | integer | 4 bytes |
+| author_id | integer | 4 bytes |
+
+After adding padding to align the columns this would translate to columns being
+divided into fixed size chunks as follows:
+
+| Chunk Size | Columns |
+|:-----------|:------------------|
+| 8 bytes | id |
+| variable | target_type |
+| 8 bytes | target_id |
+| variable | title |
+| variable | data |
+| 8 bytes | project_id |
+| 8 bytes | created_at |
+| 8 bytes | updated_at |
+| 8 bytes | action, author_id |
+
+This means that excluding the variable sized data we need at least 48 bytes per
+row.
+
+We can optimise this by using the following column order instead:
+
+| Column | Type | Size |
+|:------------|:----------------------------|:---------|
+| created_at | timestamp without time zone | 8 bytes |
+| updated_at | timestamp without time zone | 8 bytes |
+| id | integer | 4 bytes |
+| target_id | integer | 4 bytes |
+| project_id | integer | 4 bytes |
+| action | integer | 4 bytes |
+| author_id | integer | 4 bytes |
+| target_type | character varying | variable |
+| title | character varying | variable |
+| data | text | variable |
+
+This would produce the following chunks:
+
+| Chunk Size | Columns |
+|:-----------|:-------------------|
+| 8 bytes | created_at |
+| 8 bytes | updated_at |
+| 8 bytes | id, target_id |
+| 8 bytes | project_id, action |
+| 8 bytes | author_id |
+| variable | target_type |
+| variable | title |
+| variable | data |
+
+Here we only need 40 bytes per row excluding the variable sized data. 8 bytes
+being saved may not sound like much, but for tables as large as the "events"
+table it does begin to matter. For example, when storing 80 000 000 rows this
+translates to a space saving of at least 610 MB: all by just changing the order
+of a few columns.
diff --git a/doc/development/serializing_data.md b/doc/development/serializing_data.md
index 2b56f48bc44..37332c20147 100644
--- a/doc/development/serializing_data.md
+++ b/doc/development/serializing_data.md
@@ -1,7 +1,8 @@
# Serializing Data
**Summary:** don't store serialized data in the database, use separate columns
-and/or tables instead.
+and/or tables instead. This includes storing of comma separated values as a
+string.
Rails makes it possible to store serialized data in JSON, YAML or other formats.
Such a field can be defined as follows:
diff --git a/doc/development/sql.md b/doc/development/sql.md
index 23fd7604957..974b1d99dff 100644
--- a/doc/development/sql.md
+++ b/doc/development/sql.md
@@ -216,4 +216,30 @@ exact same results. This also means there's no need to add an index on
`created_at` to ensure consistent performance as `id` is already indexed by
default.
+## Use WHERE EXISTS instead of WHERE IN
+
+While `WHERE IN` and `WHERE EXISTS` can be used to produce the same data it is
+recommended to use `WHERE EXISTS` whenever possible. While in many cases
+PostgreSQL can optimise `WHERE IN` quite well there are also many cases where
+`WHERE EXISTS` will perform (much) better.
+
+In Rails you have to use this by creating SQL fragments:
+
+```ruby
+Project.where('EXISTS (?)', User.select(1).where('projects.creator_id = users.id AND users.foo = X'))
+```
+
+This would then produce a query along the lines of the following:
+
+```sql
+SELECT *
+FROM projects
+WHERE EXISTS (
+ SELECT 1
+ FROM users
+ WHERE projects.creator_id = users.id
+ AND users.foo = X
+)
+```
+
[gin-index]: http://www.postgresql.org/docs/current/static/gin.html
diff --git a/doc/development/testing.md b/doc/development/testing.md
index ea94c87d8c6..efd56484b12 100644
--- a/doc/development/testing.md
+++ b/doc/development/testing.md
@@ -157,8 +157,9 @@ trade-off:
- Unit tests are usually cheap, and you should consider them like the basement
of your house: you need them to be confident that your code is behaving
- correctly. However if you run only unit tests without integration / system tests, you might [miss] the [big] [picture]!
-- Integration tests are a bit more expensive, but don't abuse them. A feature test
+ correctly. However if you run only unit tests without integration / system
+ tests, you might [miss] the [big] [picture]!
+- Integration tests are a bit more expensive, but don't abuse them. A system test
is often better than an integration test that is stubbing a lot of internals.
- System tests are expensive (compared to unit tests), even more if they require
a JavaScript driver. Make sure to follow the guidelines in the [Speed](#test-speed)
@@ -195,11 +196,27 @@ Please consult the [dedicated "Frontend testing" guide](./fe_guide/testing.md).
- Try to match the ordering of tests to the ordering within the class.
- Try to follow the [Four-Phase Test][four-phase-test] pattern, using newlines
to separate phases.
-- Try to use `Gitlab.config.gitlab.host` rather than hard coding `'localhost'`
+- Use `Gitlab.config.gitlab.host` rather than hard coding `'localhost'`
+- Don't assert against the absolute value of a sequence-generated attribute (see
+ [Gotchas](gotchas.md#dont-assert-against-the-absolute-value-of-a-sequence-generated-attribute)).
+- Don't supply the `:each` argument to hooks since it's the default.
- On `before` and `after` hooks, prefer it scoped to `:context` over `:all`
[four-phase-test]: https://robots.thoughtbot.com/four-phase-test
+### Automatic retries and flaky tests detection
+
+On our CI, we use [rspec-retry] to automatically retry a failing example a few
+times (see [`spec/spec_helper.rb`] for the precise retries count).
+
+We also use a home-made `RspecFlaky::Listener` listener which records flaky
+examples in a JSON report file on `master` (`retrieve-tests-metadata` and `update-tests-metadata` jobs), and warns when a new flaky example
+is detected in any other branch (`flaky-examples-check` job). In the future, the
+`flaky-examples-check` job will not be allowed to fail.
+
+[rspec-retry]: https://github.com/NoRedInk/rspec-retry
+[`spec/spec_helper.rb`]: https://gitlab.com/gitlab-org/gitlab-ce/blob/master/spec/spec_helper.rb
+
### `let` variables
GitLab's RSpec suite has made extensive use of `let` variables to reduce
@@ -262,6 +279,43 @@ end
- Avoid scenario titles that add no information, such as "successfully".
- Avoid scenario titles that repeat the feature title.
+### Table-based / Parameterized tests
+
+This style of testing is used to exercise one piece of code with a comprehensive
+range of inputs. By specifying the test case once, alongside a table of inputs
+and the expected output for each, your tests can be made easier to read and more
+compact.
+
+We use the [rspec-parameterized](https://github.com/tomykaira/rspec-parameterized)
+gem. A short example, using the table syntax and checking Ruby equality for a
+range of inputs, might look like this:
+
+```ruby
+describe "#==" do
+ using Rspec::Parameterized::TableSyntax
+
+ let(:project1) { create(:project) }
+ let(:project2) { create(:project) }
+ where(:a, :b, :result) do
+ 1 | 1 | true
+ 1 | 2 | false
+ true | true | true
+ true | false | false
+ project1 | project1 | true
+ project2 | project2 | true
+ project 1 | project2 | false
+ end
+
+ with_them do
+ it { expect(a == b).to eq(result) }
+
+ it 'is isomorphic' do
+ expect(b == a).to eq(result)
+ end
+ end
+end
+```
+
### Matchers
Custom matchers should be created to clarify the intent and/or hide the
@@ -270,6 +324,15 @@ complexity of RSpec expectations.They should be placed under
a certain type of specs only (e.g. features, requests etc.) but shouldn't be if
they apply to multiple type of specs.
+#### have_gitlab_http_status
+
+Prefer `have_gitlab_http_status` over `have_http_status` because the former
+could also show the response body whenever the status mismatched. This would
+be very useful whenever some tests start breaking and we would love to know
+why without editing the source and rerun the tests.
+
+This is especially useful whenever it's showing 500 internal server error.
+
### Shared contexts
All shared contexts should be be placed under `spec/support/shared_contexts/`.
diff --git a/doc/development/verifying_database_capabilities.md b/doc/development/verifying_database_capabilities.md
new file mode 100644
index 00000000000..cc6d62957e3
--- /dev/null
+++ b/doc/development/verifying_database_capabilities.md
@@ -0,0 +1,26 @@
+# Verifying Database Capabilities
+
+Sometimes certain bits of code may only work on a certain database and/or
+version. While we try to avoid such code as much as possible sometimes it is
+necessary to add database (version) specific behaviour.
+
+To facilitate this we have the following methods that you can use:
+
+* `Gitlab::Database.postgresql?`: returns `true` if PostgreSQL is being used
+* `Gitlab::Database.mysql?`: returns `true` if MySQL is being used
+* `Gitlab::Database.version`: returns the PostgreSQL version number as a string
+ in the format `X.Y.Z`. This method does not work for MySQL
+
+This allows you to write code such as:
+
+```ruby
+if Gitlab::Database.postgresql?
+ if Gitlab::Database.version.to_f >= 9.6
+ run_really_fast_query
+ else
+ run_fast_query
+ end
+else
+ run_query
+end
+```