DOC: filter documentation was bigger than DEVELOP so separated it out. Hopefully it may get read more

author: Daniel Black <grooverdan@users.sourceforge.net> 2013-12-29 07:26:41 +0000
committer: Daniel Black <grooverdan@users.sourceforge.net> 2013-12-29 07:26:41 +0000
commit: 2b9d4f86cd1155b548a92b4429f05ed36a7a7a79 (patch)
tree: 32647b00f6cac3eb7196a8b2f9d8d0182458fcee /DEVELOP
parent: 4a0e42856341e73412f998f50f1bfea98ed59cee (diff)
download: fail2ban-2b9d4f86cd1155b548a92b4429f05ed36a7a7a79.tar.gz
1 files changed, 1 insertions, 459 deletions
diff --git a/DEVELOP b/DEVELOP
index 3939fe47..18d29ad4 100644
--- a/DEVELOP
+++ b/DEVELOP
@@ -34,465 +34,7 @@ When submitting pull requests on GitHub we ask you to:
 * Include a change to the relevant section of the ChangeLog; and
 * Include yourself in THANKS if not already there.
 
-Filters
-=======
-
-Filters are tricky. They need to:
-* work with a variety of versions of the software that generates the logs;
-* work with the range of logging configuration options available in the
-  software;
-* work with multiple operating systems;
-* not make assumptions about the log format in excess of the software
-  (e.g. do not assume a username doesn't contain spaces and use \S+ unless
-  you've checked the source code);
-* account for how future versions of the software will log messages
-  (e.g. guess what would happen to the log message if different authentication
-  types are added);
-* not be susceptible to DoS vulnerabilities (see Filter Security below); and
-* match intended log lines only.
-
-Please follow the steps from Filter Test Cases to Developing Filter Regular
-Expressions and submit a GitHub pull request (PR) afterwards. If you get stuck,
-you can push your unfinished changes and still submit a PR -- describe
-what you have done and what the hurdle is, and we'll attempt to help (the
-PR will be updated automatically with any further commits you push to
-complete it).
-
-Filter test cases
------------------
-
-Purpose:
-
-Start by finding the log messages that the application generates related to
-some form of authentication failure. If you are adding to an existing filter
-think about whether the log messages are of a similar importance and purpose
-to the existing filter. If you were a user of Fail2Ban, and did a package
-update of Fail2Ban that started matching new log messages, would anything
-unexpected happen?  Would the bantime/findtime for the jail be appropriate for
-the new log messages?  If not, perhaps they need to be in a separate filter
-definition; for example, the exim filter targets authentication failures
-and exim-spam targets log messages related to spam.
-
-Even if it is a new filter you may consider separating the log messages into
-different filters based on purpose.
-
-Cause:
-
-Are some of the log lines a result of the same action? For example, is a PAM
-failure log message, followed by an application specific failure message the
-result of the same user/script action?  If you add regular expressions for
-both you would end up with two failures for a single action.
-Therefore, select the most appropriate log message and document the other log
-message(s) with a test case not to match it and a description as to why you chose
-one over another.
-
-With the selected log lines consider what action has caused those log
-messages and whether they could have been generated by accident? Could
-the log message be occurring due to the first step towards the application
-asking for authentication? Could the log messages occur often? If some of
-these are true make a note of this in the jail.conf example that you provide.
-
-Samples:
-
-It is important to include log file samples so any future change in the regular
-expression will still work with the log lines you have identified.
-
-The sample log messages are provided in a file under testcases/files/logs/
-named identically as the corresponding filter (but without .conf extension).
-Each log line should be preceded by a line with failJSON metadata (so the log
-lines are tested in the test suite) directly above the log line. If there is
-any specific information about the log message, such as version or an
-application configuration option that is needed for the message to occur,
-include this in a comment (line beginning with #) above the failJSON metadata.
-
-Log samples should include only one, and definitely not more than 3, examples
-of log messages of the same form. If log messages differ between versions of
-the application, log messages that show this are encouraged.
-
-Also attempt to inject an IP into the application (e.g. by specifying
-it as a username) so that Fail2Ban possibly detects the IP
-from user input rather than the true origin. See the Filter Security section
-and the top example in testcases/files/logs/apache-auth as to how to do this.
-Once you have discovered that this is possible, correct the regex so it doesn't
-match and provide this as a test case with "match": false (see failJSON below).
-
-If the mechanism to create the log message isn't obvious provide a
-configuration and/or sample scripts in testcases/files/config/{filtername} and
-reference these in the comments above the log line.
-
-FailJSON metadata:
-
-failJSON metadata is a comment immediately above the log message. It will
-look like:
-
-# failJSON: { "time": "2013-06-10T10:10:59", "match": true , "host": "93.184.216.119" }
-
-Time should match the time of the log message. It is in a specific format of
-Year-Month-Day'T'Hour:Minute:Second.  If your log message does not include a
-year, like the example below, the year should be listed as 2005, if before Sun
-Aug 14 10am UTC, and 2004 if afterwards.  Here is an example failJSON
-line preceding a sample log line:
-
-# failJSON: { "time": "2005-03-24T15:25:51", "match": true , "host": "198.51.100.87" }
-Mar 24 15:25:51 buffalo1 dropbear[4092]: bad password attempt for 'root' from 198.51.100.87:5543
-
-The "host" in failJSON should contain the IP or domain that should be blocked.
-
-For long lines that you do not want to be matched (e.g. from log injection
-attacks) and any log lines to be excluded (see "Cause" section above), set
-"match": false in the failJSON and describe the reason in the comment above.
-
-After developing regexes, the following command will test all failJSON metadata
-against the log lines in all sample log files:
-
-./fail2ban-testcases testSampleRegex
-
-Developing Filter Regular Expressions
--------------------------------------
-
-Date/Time:
-
-At the moment, Fail2Ban depends on log lines to have time stamps.  That is why,
-before starting to develop a failregex, check if your log line format is known to
-Fail2Ban.  Copy the time component from the log line and append an IP address to
-test with the following command:
-
-./fail2ban-regex "2013-09-19 02:46:12 1.2.3.4" "<HOST>"
-
-The output of this command should contain something like:
-
-Date template hits:
-|- [# of hits] date format
-|  [1] Year-Month-Day Hour:Minute:Second
-
-Ensure that the template description matches time/date elements in your log line
-time stamp.  If no format matches, a date template needs to be added
-to server/datedetector.py.  Ensure that a new template is added in an order such
-that more specific matches occur first and that there is no confusion between a
-Day and a Month.
-
-Filter file:
-
-The filter is specified in a config/filter.d/{filtername}.conf file. A filter
-file can have the sections INCLUDES (optional) and Definition as follows:
-
-[INCLUDES]
-
-before = common.conf
-
-after = filtername.local
-
-[Definition]
-
-failregex = ....
-
-ignoreregex = ....
-
-This is also documented in the man page jail.conf (section 5). Other definitions
-can be added and used through string interpolation to make failregexes more
-readable and maintainable (see http://docs.python.org/2.7/library/configparser.html).
-
-
-General rules:
-
-Use "before" if you need to include a common set of rules, like syslog or if
-there is a common set of regexes for multiple filters.
-
-Use "after" if you wish to allow the user to overwrite a set of customisations
-of the current filter. This file doesn't need to exist.
-
-Try to avoid using ignoreregex, mainly for performance reasons. The one case
-where it is justified is when avoiding it would leave you with an unreadable
-failregex.
-
-Syslog:
-
-If your application logs to syslog you can take advantage of log line prefix
-definitions present in common.conf.  So as a base use:
-
-[INCLUDES]
-
-before = common.conf
-
-[Definition]
-
-_daemon = app
-
-failregex = ^%(__prefix_line)s
-
-In this example common.conf defines __prefix_line which also contains the
-_daemon name (in syslog terms the service) you have just specified. _daemon
-can also be a regex.
-
-For example, to capture the following line _daemon should be set to "dovecot":
-
-Dec 12 11:19:11 dunnart dovecot: pop3-login: Aborted login (tried to use disabled plaintext auth): rip=190.210.136.21, lip=113.212.99.193
-
-and then ^%(__prefix_line)s would match "Dec 12 11:19:11 dunnart dovecot:
-". Note it matches the trailing space(s) as well.
-
-Substitutions (AKA string interpolations):
-
-We have used string interpolations in the above examples.  They are useful for
-making the regexes more readable, reusing generic patterns in multiple failregex
-lines, and also for deferring the definition of regex parts to specific filters
-or even to the user.  The general principle is that the value of a _name variable
-replaces occurrences of %(_name)s within the same section, or anywhere in the
-config file if defined in the [DEFAULT] section.
-
-Regular Expressions:
-
-Regular expressions (failregex, ignoreregex) assume that the date/time has been
-removed from the log line (this is just how fail2ban works internally ATM).
-
-If the format is like '<date...> error 1.2.3.4 is evil' then you need to match
-the < at the start so the regex should be similar to '^<> error <HOST> is evil$'
-using <HOST> where the IP/domain name appears in the log line.
-
-The following general rules apply to regular expressions:
-
-* ensure regexes start with a ^ and are as restrictive as possible. E.g. do not
-  use .* if \d+ is sufficient;
-* use functionality of Python regexes defined in the standard Python re library
-  http://docs.python.org/2/library/re.html;
-* make regular expressions readable (as much as possible). E.g.
-  (?:...) represents a non-capturing group but (...) is more readable, thus
-  preferred.
-
-If you have only a basic knowledge of regular expressions we advise reading
-http://docs.python.org/2/library/re.html first.  It doesn't take long and will
-remind you e.g. which characters you need to escape and which you don't.
-
-Developing/testing a regex:
-
-You can develop a regex in a file or using command line depending on your
-preference. You can also use samples you have already created in the test cases
-or test them one at a time.
-
-The general tool for testing Fail2Ban regexes is fail2ban-regex. To see how to
-use it run:
-
-./fail2ban-regex --help
-
-Take note of  -l heavydebug  / -l debug  and -v as they might be very useful.
-
-TIP: Take a look at the source code of the application you are developing
-     failregex for. You may see optional or extra log messages, or parts
-     thereof, that need to form part of your regex.  It may also reveal how some
-     parts are constrained and different formats depending on configuration or
-     less common usages.
-
-TIP: For looking through source code - http://sourcecodebrowser.com/ . It has
-     call graphs and can browse different versions.
-
-TIP: Some applications log spaces at the end. If you are not sure add \s*$ as
-     the end part of the regex.
-
-If your regex is not matching, http://www.debuggex.com/?flavor=python can help
-to tune it.  fail2ban-regex -D ...  will present Debuggex URLs for the regexes
-and sample log files that you pass into it.
-
-General guidelines when using regex debuggers for generating fail2ban filters:
-* use the regex from the ./fail2ban-regex output (to ensure all substitutions
-  are done)
-* replace <HOST> with (?&.ipv4)
-* make sure that the regex type is set to Python
-* for the test data put your log output with the date/time removed
-
-When you have fixed the regex put it back into your filter file.
-
-Please spread the good word about Debuggex - Serge Toarca is kindly continuing
-its free availability to Open Source developers.
-
-Finishing up:
-
-If you've added a new filter, add a new entry in config/jail.conf. The theory
-here is that a user will create a jail.local with [filtername]\nenabled=true to
-enable your jail.
-
-So more specifically in the [filter] section in jail.conf:
-* ensure that you have "enabled = false" (users will enable as needed);
-* use "filter =" set to your filter name;
-* use a typical action to disable ports associated with the application;
-* set "logpath" to the usual location of application log file;
-* if the default findtime or bantime isn't appropriate to the filter, specify
-  more appropriate choices (possibly with a brief comment line).
-
-Submit a GitHub pull request (see "Pull Requests" above) to
-github.com/fail2ban/fail2ban containing your great work.
-
-Filter Security
----------------
-
-Poor filter regular expressions are susceptible to DoS attacks.
-
-When a remote user has the ability to introduce text that would match a filter's
-failregex, while matching inserted text to the <HOST> part, they have the
-ability to deny any host they choose.
-
-So the <HOST> part must be anchored on text generated by the application, and
-not the user, to an extent sufficient to prevent user inserting the entire text
-matching this or any other failregex.
-
-Ideally filter regex should anchor at the beginning and at the end of log line.
-However as more applications log at the beginning than the end, anchoring the
-beginning is more important. If the log file used by the application is shared
-with other applications, like system logs, ensure the other applications that use
-that log file do not log user generated text at the beginning of the line, or,
-if they do, ensure the regexes of the filter are sufficient to mitigate the risk
-of insertion.
-
-
-Examples of poor filters
-------------------------
-
-1. Too restrictive
-
-We find a log message:
-
-    Apr-07-13 07:08:36 Invalid command fial2ban from 1.2.3.4
-
-We make a failregex
-
-    ^Invalid command \S+ from <HOST>
-
-Now think evil. The user does the command 'blah from 1.2.3.44'
-
-The program diligently logs:
-
-    Apr-07-13 07:08:36 Invalid command blah from 1.2.3.44 from 1.2.3.4
-
-And fail2ban matches 1.2.3.44 as the IP that it bans. A DoS attack was successful.
-
-The fix here is that the command can be anything so .* is appropriate.
-
-    ^Invalid command .* from <HOST>
-
-Here the .* will match until the end of the string. The engine then realises it
-has more to match, i.e. "from <HOST>", and backtracks until it finds this. It then
-bans 1.2.3.4 correctly. Since the <HOST> is always at the end, end the regex with a $.
-
-    ^Invalid command .* from <HOST>$
-
-Note if we'd just had the expression:
-
-    ^Invalid command \S+ from <HOST>$
-
-Then provided the user put a space in their command they would have never been
-banned.
-
-2. Unanchored regex can match other user injected data
-
-From the Apache vulnerability CVE-2013-2178
-( original ref: https://vndh.net/note:fail2ban-089-denial-service ).
-
-An example bad regex for Apache:
-
-    failregex = [[]client <HOST>[]] user .* not found
-
-Since the user can make a GET request for:
-
-    GET /[client%20192.168.0.1]%20user%20root%20not%20found HTTP/1.0
-    Host: remote.site
-
-Now the log line will be:
-
-    [Sat Jun 01 02:17:42 2013] [error] [client 192.168.33.1] File does not exist: /srv/http/site/[client 192.168.0.1] user root not found
-
-Because the regex is unanchored, it matches the injected text at the end of this
-log line and blocks 192.168.0.1, a host chosen by the HTTP requester: a denial of service.
-
-3.  Over greedy pattern matching
-
-From: https://github.com/fail2ban/fail2ban/pull/426
-
-An example ssh log (simplified)
-
-    Sep 29 17:15:02 spaceman sshd[12946]: Failed password for user from 127.0.0.1 port 20000 ssh1: ruser remoteuser
-
-As we assume the username can include anything, including spaces, it's prudent to put
-.* here. The remote user can also be anything, so let's not make assumptions again.
-
-    failregex = ^%(__prefix_line)sFailed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$
-
-So this works. The problem arises if the text after "ruser" is injected by the
-user to be 'from 1.2.3.4'. The resultant log line is:
-
-    Sep 29 17:15:02 spaceman sshd[12946]: Failed password for user from 127.0.0.1 port 20000 ssh1: ruser from 1.2.3.4
-
-Testing with:
-
-    fail2ban-regex -v 'Sep 29 17:15:02 Failed password for user from 127.0.0.1 port 20000 ssh1: ruser from 1.2.3.4' '^ Failed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$'
-
-TIP: I've removed the bit that matches __prefix_line from the regex and log.
-
-Shows:
-
-    1) [1] ^ Failed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$
-       1.2.3.4  Sun Sep 29 17:15:02 2013
-
-It should have matched 127.0.0.1. The first greedy part of the regex
-matched until the end of the string. There was no "from <HOST>" left, so the regex
-engine worked backwards from the end of the string until this was matched.
-
-The result was that 1.2.3.4 was matched, injected by the user, and the wrong IP
-was banned.
-
-The solution here is to make the first .* non-greedy with .*?. Here it matches
-as little as required and the fail2ban-regex tool shows the output:
-
-    fail2ban-regex -v 'Sep 29 17:15:02 Failed password for user from 127.0.0.1 port 20000 ssh1: ruser from 1.2.3.4' '^ Failed \S+ for .*? from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$'
-
-    1) [1] ^ Failed \S+ for .*? from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$
-       127.0.0.1  Sun Sep 29 17:15:02 2013
-
-So the general case here is a log line that contains:
-
-    (fixed_data_1)<HOST>(fixed_data_2)(user_injectable_data)
-
-Where the regex that matches fixed_data_1 is greedy and matches the entire
-string, before moving backwards and user_injectable_data can match the entire
-string.
-
-Another case:
-
-ref: https://www.debuggex.com/r/CtAbeKMa2sDBEfA2/0
-
-A webserver logs the following without URL escaping:
-
-    [error] 2865#0: *66647 user "xyz" was not found in "/file", client: 1.2.3.1, server: www.host.com, request: "GET ", client: 3.2.1.1, server: fake.com, request: "GET exploited HTTP/3.3", host: "injected.host", host: "www.myhost.com"
-
-regex:
-
-    failregex = ^ \[error\] \d+#\d+: \*\d+ user "\S+":? (?:password mismatch|was not found in ".*"), client: <HOST>, server: \S+, request: "\S+ .+ HTTP/\d+\.\d+", host: "\S+"
-
-The .* matches to the end of the string. It finds that it can't continue to match
-", client ... so it backtracks from the end and finds the user injected web URL:
-
-    ", client: 3.2.1.1, server: fake.com, request: "GET exploited HTTP/3.3", host: "injected.host
-
-In this case there is a fixed host: "www.myhost.com" at the end so the solution
-is to anchor the regex at the end with a $.
-
-If this wasn't the case then the first .* would need to be constrained so that it
-doesn't match beyond <HOST>.
-
-4. Application generates two identical log messages with different meanings
-
-If the application generates the following two messages under different
-circumstances:
-
-    client <IP>: authentication failed
-    client <USER>: authentication failed
-
-
-Then it's obvious that a regex of "^client <HOST>: authentication
-failed$" will still cause problems if the user can trigger the second
-log message with a <USER> of 123.1.1.1.
-
-Here there's nothing to do except request/change the application so it logs
-messages differently.
-
+If you are developing filters see the FILTERS file for documentation.
 
 Code Testing
 ============
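
The over-greedy matching problem described in example 3 of the removed Filters text can be reproduced with Python's re module alone. This is only a sketch under stated assumptions: HOST is a simplified, IPv4-only stand-in for fail2ban's <HOST> tag, and matched_host() is a hypothetical helper, not a fail2ban API. The sample line has the date/time and the __prefix_line part already removed, as in the fail2ban-regex commands quoted in that example.

```python
import re

# Assumption: simplified IPv4-only replacement for fail2ban's <HOST> tag.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"


def matched_host(failregex, line):
    """Hypothetical helper: return the host the regex extracts, or None."""
    match = re.search(failregex.replace("<HOST>", HOST), line)
    return match.group("host") if match else None


# Log line after date removal; "from 1.2.3.4" was injected as the ruser value.
LINE = (" Failed password for user from 127.0.0.1 port 20000 ssh1:"
        " ruser from 1.2.3.4")

# The first .* is greedy: it runs to the end, then backtracks to the LAST " from ".
GREEDY = r"^ Failed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$"
# The first .*? is non-greedy: it stops at the FIRST " from " that lets the rest match.
LAZY = r"^ Failed \S+ for .*? from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$"

greedy_host = matched_host(GREEDY, LINE)
lazy_host = matched_host(LAZY, LINE)
print("greedy:", greedy_host)
print("non-greedy:", lazy_host)
```

Under these assumptions the greedy pattern extracts the injected 1.2.3.4, while the non-greedy pattern extracts 127.0.0.1, which agrees with the fail2ban-regex output quoted in the example.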
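
Examples 1 and 2 of the removed "Examples of poor filters" text can be checked the same way. Again this is a sketch, not fail2ban code: HOST is a simplified IPv4-only stand-in for <HOST>, banned_host() is a hypothetical helper, and the log lines are the ones from the text with the date/time removed. The Apache regex is written with \[ and \] rather than [[] and []], which match the same characters.

```python
import re

# Assumption: simplified IPv4-only replacement for fail2ban's <HOST> tag.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"


def banned_host(failregex, line):
    """Hypothetical helper: return the host the regex extracts, or None."""
    match = re.search(failregex.replace("<HOST>", HOST), line)
    return match.group("host") if match else None


# Example 1: the user typed the command "blah from 1.2.3.44".
INVALID = "Invalid command blah from 1.2.3.44 from 1.2.3.4"

# \S+ with no end anchor stops at the injected address.
too_strict = banned_host(r"^Invalid command \S+ from <HOST>", INVALID)
# .* plus a trailing $ forces <HOST> to be the last thing on the line.
anchored = banned_host(r"^Invalid command .* from <HOST>$", INVALID)
# \S+ plus $ cannot match once the command contains a space.
strict_anchored = banned_host(r"^Invalid command \S+ from <HOST>$", INVALID)

# Example 2: the requested URL contains text that looks like a log message.
APACHE = ("[error] [client 192.168.33.1] File does not exist: "
          "/srv/http/site/[client 192.168.0.1] user root not found")
unanchored = banned_host(r"\[client <HOST>\] user .* not found", APACHE)

print(too_strict, anchored, strict_anchored, unanchored)
```

With these inputs the unanchored and under-constrained patterns extract the addresses supplied by the user (1.2.3.44 and 192.168.0.1), the \S+ pattern with $ matches nothing, and only the pattern anchored at both ends extracts the true origin 1.2.3.4.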