summaryrefslogtreecommitdiff
path: root/lib/elixir/pages/Library Guidelines.md
blob: 71e4b991632aa06686ee12fa71ee759cfd71366f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
# Library Guidelines

This document outlines general guidelines, anti-patterns, and rules for those writing and publishing Elixir libraries meant to be consumed by other developers.

## Getting started

You can create a new Elixir library by running the `mix new` command:

    $ mix new my_library

The project name is given in the `snake_case` convention where all letters are lowercase and words are separate with underscores. This is the same convention used by variables, function names and atoms in Elixir. See the [Naming Conventions](naming-conventions.html) document for more information.

Every project has a `mix.exs` file, with instructions on how to build, compile, run tests, and so on. Libraries commonly have a `lib` directory, which includes Elixir source code, and a `test` directory. A `src` directory may also exist for Erlang sources.

For more information on running your project, see the official [Mix & OTP guide](https://elixir-lang.org/getting-started/mix-otp/introduction-to-mix.html) or [Mix documentation](https://hexdocs.pm/mix/Mix.html).

### Applications with supervision tree

The `mix new` command also allows the `--sup` flag to scaffold an application with a supervision tree out of the box. We talk about supervision trees later on when discussing one of the common anti-patterns when writing libraries.

## Publishing

Writing code is only the first of many steps to publish a package. We strongly recommend developers to:

  * Choose a versioning schema. Elixir requires versions to be in the format `MAJOR.MINOR.PATCH` but the meaning of those numbers is up to you. Most projects choose [Semantic Versioning](https://semver.org/).

  * Choose a [license](https://choosealicense.com/). The most common licenses in the Elixir community are the [MIT License](https://choosealicense.com/licenses/mit/) and the [Apache 2.0 License](https://choosealicense.com/licenses/apache-2.0/). The latter is also the one used by Elixir itself.

  * Run the [code formatter](https://hexdocs.pm/mix/Mix.Tasks.Format.html). The code formatter formats your code according to a consistent style shared by your library and the whole community, making it easier for other developers to understand your code and contribute.

  * Write tests. Elixir ships with a test-framework named [ExUnit](https://hexdocs.pm/ex_unit/ExUnit.html). The project generated by `mix new` includes sample tests and doctests.

  * Write documentation. The Elixir community is proud of treating documentation as a first-class citizen and making documentation easily accessible. Libraries contribute to the status quo by providing complete API documentation with examples for their modules, types and functions. See the [Writing Documentation](writing-documentation.html) guide for more information. Projects like [ExDoc](https://github.com/elixir-lang/ex_doc) can be used to generate HTML and EPUB documents from the documentation. ExDoc also supports "extra pages", like this one that you are reading. Such pages augment the documentation with tutorials, guides and references.

Projects are often made available to other developers [by publishing a Hex package](https://hex.pm/docs/publish). Hex also [supports private packages for organizations](https://hex.pm/pricing). If ExDoc is configured for the Mix project, publishing a package on Hex will also automatically publish the generated documentation to [HexDocs](https://hexdocs.pm).

## Anti-patterns

In this section we document common anti-patterns to avoid when writing libraries.

### Avoid using exceptions for control-flow

You should avoid using exceptions for control-flow. For example, instead of:

```elixir
try do
  contents = File.read!("some_path_that_may_or_may_not_exist")
  {:it_worked, contents}
rescue
  File.Error ->
    :it_failed
end
```

you should prefer:

```elixir
case File.read("some_path_that_may_or_may_not_exist") do
  {:ok, contents} -> {:it_worked, contents}
  {:error, _} -> :it_failed
end
```

As a library author, it is your responsibility to make sure users are not required to use exceptions for control-flow in their applications. You can follow the same convention as Elixir here, using the name without `!` for returning `:ok`/`:error` tuples and appending `!` for a version of the function which raises an exception.

It is important to note that a name without `!` does not mean a function will never raise. For example, even `File.read/1` can fail in case of bad arguments:

```iex
iex> File.read(1)
** (FunctionClauseError) no function clause matching in IO.chardata_to_string/1
```

The usage of `:ok`/`:error` tuples is about the domain that the function works on, in this case, filesystem access. Bad arguments, logical errors, invalid options should raise regardless of the function name. If in doubt, prefer to return tuples instead of raising, as users of your library can always match on the results and raise if necessary.

### Avoid working with invalid data

Elixir programs should prefer to validate data as close to the end user as possible, so the errors are easy to locate and fix. This practice also saves you from writing defensive code in the internals of the library.

For example, imagine you have an API that receives a filename as a binary. At some point you will want to write to this file. You could have a function like this:

```elixir
def my_fun(some_arg, file_to_write_to, options \\ []) do
  ...some code...
  AnotherModuleInLib.invoke_something_that_will_eventually_write_to_file(file_to_write_to)
  ...more code...
end
```

The problem with the code above is that, if the user supplies an invalid input, the error will be raised deep inside the library, which makes it confusing for users. Furthermore, when you don't validate the values at the boundary, the internals of your library are never quite sure which kind of values they are working with.

A better function definition would be:

```elixir
def my_fun(some_arg, file_to_write_to, options \\ []) when is_binary(file_to_write_to) do
```

Elixir also leverages pattern matching and guards in function clauses to provide clear error messages in case invalid arguments are given.

This advice does not only apply to libraries but to any Elixir code. Every time you receive multiple options or work with external data, you should validate the data at the boundary and convert it to structured data. For example, if you provide a `GenServer` that can be started with multiple options, you want to validate those options when the server starts and rely only on structured data throughout the process life cycle. Similarly, if a database or a socket gives you a map of strings, after you receive the data, you should validate it and potentially convert it to a struct or a map of atoms.

### Avoid application configuration

You should avoid using [the application environment](https://hexdocs.pm/elixir/Application.html#get_env/2) as the configuration mechanism for libraries. The application environment is **global** which means it becomes impossible for two dependencies to use your library in two different ways.

Let's see a simple example. Imagine that you implement a library that breaks a string in two parts based on the first occurrence of the dash `-` character:

```elixir
defmodule DashSplitter do
  def split(string) when is_binary(string) do
    String.split(string, "-", parts: 2)
  end
end
```

Now imagine someone wants to split the string in three parts. You decide to make the number of parts configurable via the application environment:

```elixir
def split(string) when is_binary(string) do
  parts = Application.get_env(:dash_splitter, :parts, 2)
  String.split(string, "-", parts: parts)
end
```

Now users can configure your library in their `config/config.exs` file as follows:

```elixir
config :dash_splitter, :parts, 3
```

Once your library is configured, it will change the behaviour of all users of your library. If a library was expecting it to split the string in 2 parts, since the configuration is global, it will now split it in 3 parts.

The solution is to provide configuration as close as possible to where it is used and not via the application environment. In case of a function, you could expect keyword lists as a new argument:

```elixir
def split(string, opts \\ []) when is_binary(string) and is_list(opts) do
  parts = Keyword.get(opts, :parts, 2)
  String.split(string, "-", parts: parts)
end
```

In case you need to configure a process, the options should be passed when starting that process.

The application environment should be reserved only for configurations that are truly global, for example, to control your application boot process and its supervision tree.

For all remaining scenarios, libraries should not force their users to use the application environment for configuration. If the user of a library believes that certain parameter should be configured globally, then they can wrap the library functionality with their own application environment configuration.

### Avoid `use` when an `import` is enough

A library should not provide `use MyLib` functionality if all `use MyLib` does is to `import`/`alias` the module itself. For example, this is an anti-pattern:

```elixir
defmodule MyLib do
  defmacro __using__(_) do
    quote do
      import MyLib
    end
  end

  def some_fun(arg1, arg2) do
    ...
  end
end
```

The reason why defining the `__using__` macro above should be avoided is because when a developer writes:

```elixir
defmodule MyApp do
  use MyLib
end
```

it allows `use MyLib` to run *any* code into the `MyApp` module. For someone reading the code, it is impossible to assess the impact that `use MyLib` has in a module without looking at the implementation of `__using__`.

The following code is much clearer:

```elixir
defmodule MyApp do
  import MyLib
end
```

The code above says we are only bringing in the functions from `MyLib` so we can invoke `some_fun(arg1, arg2)` directly without the `MyLib.` prefix. Even more important, `import MyLib` says that we have an option to not `import MyLib` at all as we can simply invoke the function as `MyLib.some_fun(arg1, arg2)`.

If the module you want to invoke a function on has a long name, such as `SomeLibrary.Namespace.MyLib`, and you find it verbose, you can leverage the `alias/2` special form and still refer to the module as `MyLib`.

While there are many situations where using a module is required, `use` should be skipped when all it does is to `import` or `alias` a module. In a nutshell, `alias` is simpler and clearer than `import`, and `import` is simpler and clearer than `use`.

### Avoid macros

Although the previous section could be summarized as "avoid macros", both topics are important enough to deserve their own sections.

To quote [the official guide on Macros](https://elixir-lang.org/getting-started/meta/macros.html):

> Even though Elixir attempts its best to provide a safe environment for macros, the major responsibility of writing clean code with macros falls on developers. Macros are harder to write than ordinary Elixir functions and it’s considered to be bad style to use them when they’re not necessary. So write macros responsibly.
>
> Elixir already provides mechanisms to write your everyday code in a simple and readable fashion by using its data structures and functions. Macros should only be used as a last resort. Remember that **explicit is better than implicit**. **Clear code is better than concise code**.

When you absolutely have to use a macro, make sure that a macro is not the only way the user can interface with your library and keep the amount of code generated by a macro to a minimum. For example, the `Logger` module provides `debug/2`, `info/2` and friends as macros that are capable of extracting environment information, but a low-level mechanism for logging is still available with `Logger.bare_log/3`.

### Avoid using processes for code organization

A developer must never use a process for code organization purposes. A process must be used to model runtime properties such as:

  * Mutable state and access to shared resources (such as ets, files, etc)
  * Concurrency and distribution
  * Initialization, shutdown and restart logic (as seen in supervisors)
  * System messages such as timer messages and monitoring events

In Elixir, code organization is done by modules and functions, processes are not necessary. For example, imagine you are implementing a calculator and you decide to put all the calculator operations behind a `GenServer`:

```elixir
def add(a, b) do
  GenServer.call(__MODULE__, {:add, a, b})
end

def handle_call({:add, a, b}, _from, state) do
  {:reply, a + b, state}
end

def handle_call({:subtract, a, b}, _from, state) do
  {:reply, a - b, state}
end
```

This is an anti-pattern not only because it convolutes the calculator logic but also because you put the calculator logic behind a single process that will potentially become a bottleneck in your system, especially as the number of calls grow. Instead just define the functions directly:

```elixir
def add(a, b) do
  a + b
end

def subtract(a, b) do
  a - b
end
```

Use processes only to model runtime properties, never for code organization. And even when you think something could be done in parallel with processes, often it is best to let the callers of your library decide how to paralleiize, rather than impose a certain execution flow in users of your code.

### Avoid spawning unsupervised processes

You should avoid spawning processes outside of a supervision tree, especially long-running ones. Instead, processes must be started inside supervision trees. This guarantees developers have full control over the initialization, restarts, and shutdown of the system.

If your application does not have a supervision tree, one can be added by changing `def application` inside `mix.exs` to include a `:mod` key with the application callback name:

```elixir
def application do
  [
    extra_applications: [:logger],
    mod: {MyApp.Application, []}
  ]
end
```

and then defining a `my_app/application.ex` file with the following template:

```elixir
defmodule MyApp.Application do
    # See https://hexdocs.pm/elixir/Application.html
    # for more information on OTP Applications
    @moduledoc false

    use Application

    def start(_type, _args) do
      # List all child processes to be supervised
      children = [
        # Starts a worker by calling: MyApp.Worker.start_link(arg)
        # {MyApp.Worker, arg},
      ]

      # See https://hexdocs.pm/elixir/Supervisor.html
      # for other strategies and supported options
      opts = [strategy: :one_for_one, name: MyApp.Supervisor]
      Supervisor.start_link(children, opts)
    end
  end
end
```

This is the same template generated by `mix new --sup`.

Each process started with the application must be listed as a child under the `Supervisor` above. We call those "static processes" because they are known upfront. For handling dynamic processes, such as the ones started during requests and other user inputs, look at the `DynamicSupervisor` module.

One of the few times where it is acceptable to start a process outside of a supervision tree is with `Task.async/1` and `Task.await/2`. Opposite to `Task.start_link/1`, the `async/await` mechanism gives you full control over the spawned process life cycle - which is also why you must always call `Task.await/2` after starting a task with `Task.async/1`. Even though, if your application is spawning multiple async processes, you should consider using `Task.Supervisor` for better visibility when instrumenting and monitoring the system.