Commit message | Author | Age | Files | Lines
* DOC: add size format section to manual (HEAD, master) | Daniel Epperson | 2023-05-17 | 1 | -2/+19
  The manual refers to an HAProxy size format but does not define it. This patch adds a section to the manual to define the HAProxy size format.
* [RELEASE] Released version 2.8-dev12 (v2.8-dev12) | Christopher Faulet | 2023-05-17 | 4 | -3/+89
  Released version 2.8-dev12 with the following main changes:
    - BUILD: mjson: Fix warning about unused variables
    - MINOR: spoe: Don't stop disabled proxies
    - BUG/MEDIUM: filters: Don't deinit filters for disabled proxies during startup
    - BUG/MINOR: hlua_fcn/queue: fix broken pop_wait()
    - BUG/MINOR: hlua_fcn/queue: fix reference leak
    - CLEANUP: hlua_fcn/queue: make queue:push() easier to read
    - BUG/MINOR: quic: Buggy acknowlegments of acknowlegments function
    - DEBUG: list: add DEBUG_LIST to purposely corrupt list heads after delete
    - MINOR: stats: report the total number of warnings issued
    - MINOR: stats: report the number of times the global maxconn was reached
    - BUG/MINOR: mux-quic: do not prevent shutw on error
    - BUG/MINOR: mux-quic: do not free frame already released by quic-conn
    - BUG/MINOR: mux-quic: no need to subscribe for detach streams
    - MINOR: mux-quic: add traces for stream wake
    - MINOR: mux-quic: do not send STREAM frames if already subscribe
    - MINOR: mux-quic: factorize send subscribing
    - MINOR: mux-quic: simplify return path of qc_send()
    - MEDIUM: quic: streamline error notification
    - MEDIUM: mux-quic: adjust transport layer error handling
    - MINOR: stats: report the listener's protocol along with the address in stats
    - BUG/MEDIUM: mux-fcgi: Never set SE_FL_EOS without SE_FL_EOI or SE_FL_ERROR
    - BUG/MEDIUM: mux-fcgi: Don't request more room if mux is waiting for more data
    - MINOR: stconn: Add a cross-reference between SE descriptor
    - BUG/MINOR: proxy: missing free in free_proxy for redirect rules
    - MINOR: proxy: add http_free_redirect_rule() function
    - BUG/MINOR: http_rules: fix errors paths in http_parse_redirect_rule()
    - CLEANUP: http_act: use http_free_redirect_rule() to clean redirect act
    - MINOR: tree-wide: use free_acl_cond() where relevant
    - CLEANUP: acl: discard prune_acl_cond() function
    - BUG/MINOR: cli: don't complain about empty command on empty lines
    - MINOR: cli: add an option to display the uptime in the CLI's prompt
    - MINOR: master/cli: also implement the timed prompt on the master CLI
    - MINOR: cli: make "show fd" identify QUIC connections and listeners
    - MINOR: httpclient: allow to disable the DNS resolvers of the httpclient
    - BUILD: debug: fix build issue on 32-bit platforms in "debug dev task"
    - MINOR: ncbuf: missing malloc checks in standalone code
    - DOC: lua: fix core.{proxies,frontends,backends} visibility
    - EXAMPLES: fix race condition in lua mailers script
    - BUG/MINOR: errors: handle malloc failure in usermsgs_put()
    - BUG/MINOR: log: fix memory error handling in parse_logsrv()
    - BUG/MINOR: quic: Wrong redispatch for external data on connection socket
    - MINOR: htx: add function to set EOM reliably
    - MINOR: mux-quic: remove dedicated function to handle standalone FIN
    - BUG/MINOR: mux-quic: properly handle buf alloc failure
    - BUG/MINOR: mux-quic: handle properly recv ncbuf alloc failure
    - BUG/MINOR: quic: do not alloc buf count on alloc failure
    - BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc
    - BUG/MINOR: mux-quic: free task on qc_init() app ops failure
    - MEDIUM: session/ssl: return the SSL error string during a SSL handshake error
    - CI: enable monthly Fedora Rawhide clang builds
    - MEDIUM: mworker/cli: does not disconnect the master CLI upon error
    - MINOR: stconn: Remove useless test on sedesc on detach to release the xref
    - MEDIUM: proxy: stop emitting logs for internal proxies when stopping
    - MINOR: ssl: add new sample ssl_c_r_dn
    - BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout
    - BUILD: ssl: ssl_c_r_dn fetches uses functiosn only available since 1.1.1
    - BUG/MINOR: mux-quic: handle properly Tx buf exhaustion
    - BUG/MINOR: h3: missing goto on buf alloc failure
    - BUILD: ssl: get0_verified chain is available on libreSSL
    - BUG/MINOR: makefile: use USE_LIBATOMIC instead of USE_ATOMIC
    - MINOR: mux-quic: add trace to stream rcv_buf operation
    - MINOR: mux-quic: properly report end-of-stream on recv
    - MINOR: mux-quic: uninline qc_attach_sc()
    - BUG/MEDIUM: mux-quic: fix EOI for request without payload
    - MINOR: checks: make sure spread-checks is used also at boot time
    - BUG/MINOR: tcp-rules: Don't shortened the inspect-delay when EOI is set
    - REGTESTS: log: Reduce response inspect-delay for last_rule.vtc
    - DOC: config: Clarify conditions to shorten the inspect-delay for TCP rules
    - CLEANUP: server: remove useless tmptrash assigments in srv_update_status()
    - BUG/MINOR: server: memory leak in _srv_update_status_op() on server DOWN
    - CLEANUP: check; Remove some useless assignments to NULL
    - CLEANUP: stats: update the trash chunk where it's used
    - MINOR: clock: measure the total boot time
    - MINOR: stats: report the boot time in "show info"
    - BUG/MINOR: checks: postpone the startup of health checks by the boot time
    - MINOR: clock: provide a function to automatically adjust now_offset
    - BUG/MINOR: clock: automatically adjust the internal clock with the boot time
    - CLEANUP: fcgi-app; Remove useless assignment to NULL
    - REGTESTS: log: Reduce again response inspect-delay for last_rule.vtc
    - CI: drop Fedora m32 pipeline in favour of cross matrix
    - MEDIUM: checks: Stop scheduling healthchecks during stopping stage
    - MEDIUM: resolvers: Stop scheduling resolution during stopping stage
    - BUG/MINOR: hlua: SET_SAFE_LJMP misuse in hlua_event_runner()
    - BUG/MINOR: debug: fix pointer check in debug_parse_cli_task()
* BUG/MINOR: debug: fix pointer check in debug_parse_cli_task() | Aurelien DARRAGON | 2023-05-17 | 1 | -1/+1
  The task pointer check in debug_parse_cli_task() computes the theoretical end address of the provided task pointer to check whether it is valid, using the may_access() helper function.

  However, the end address is calculated by adding the task size to the 't' pointer (a struct task pointer), which yields an incorrect address: the compiler automatically translates 't + x' into 't + x * sizeof(*t)' internally (and sizeof(*t) != 1 here).

  The issue is solved by using 'ptr' (the raw void * address) as the starting address, which prevents automatic address scaling.

  This was revealed by Coverity, see GH #2157.

  No backport is needed, unless 9867987 ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") gets backported.
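The scaling rule behind this bug can be reproduced in isolation. The sketch below is illustrative only: `struct demo_task`, `task_end_buggy()` and `task_end_fixed()` are hypothetical names, not HAProxy code, and the fixed variant casts to `const char *` because standard C does not define arithmetic on `void *` (GCC accepts it as an extension).

```c
#include <stddef.h>

/* Hypothetical stand-in for HAProxy's struct task; only its size matters. */
struct demo_task {
    void *ctx;
    unsigned int state;
    int expire;
};

/* Buggy pattern: 't' is a struct pointer, so the addend is scaled and the
 * result lies sizeof(*t) * sizeof(*t) bytes past 't'. */
static const void *task_end_buggy(const struct demo_task *t)
{
    return t + sizeof(*t);
}

/* Fixed pattern: start from the raw address and add a byte count, so the
 * result lies exactly sizeof(struct demo_task) bytes past 'ptr'. */
static const void *task_end_fixed(const void *ptr)
{
    return (const char *)ptr + sizeof(struct demo_task);
}
```

On a typical LP64 build, where `struct demo_task` occupies 16 bytes, the buggy variant points 256 bytes past the start of the object instead of 16. Applied to a lone object, it even forms an out-of-bounds pointer, which is undefined behavior by itself.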
* BUG/MINOR: hlua: SET_SAFE_LJMP misuse in hlua_event_runner() | Aurelien DARRAGON | 2023-05-17 | 1 | -9/+14
  When hlua_event_runner() pauses the subscription (i.e. if the consumer can't keep up the pace), hlua_traceback() is used to get the current Lua trace (running context) to provide some info to the user.

  However, as hlua_traceback() may raise an error (__LJMP is set), it is used within a SET_SAFE_LJMP() / RESET_SAFE_LJMP() combination to ensure Lua errors are properly handled and don't result in unexpected behavior.

  But the current usage of SET_SAFE_LJMP() within the function is wrong, since hlua_traceback() will run a second time (unprotected) if the first (protected) attempt fails. This is undefined behavior and could even lead to crashes.

  Fortunately, it is very hard to trigger this code path, so we can consider this a minor bug.

  This is also used as an opportunity to make the reported message more meaningful to the user.

  This should fix GH #2159.

  It is a 2.8-specific bug; no backport is needed unless c84899c636 ("MEDIUM: hlua/event_hdl: initial support for event handlers") gets backported.
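A minimal model of the correct pattern, assuming plain `setjmp()`/`longjmp()` as a stand-in for HAProxy's Lua error handling (`may_raise()`, `get_trace_safely()`, `current_guard` and `unprotected_calls` are all hypothetical): the risky call is made once under protection, and a failure falls back to a fixed message instead of retrying the call unprotected.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical guard installed by a protected region, loosely mirroring
 * what SET_SAFE_LJMP() provides for Lua calls. */
static jmp_buf *current_guard;
static int unprotected_calls; /* calls made while no guard was installed */

/* Hypothetical helper that may "raise" by jumping to the installed guard.
 * Without a guard, a real error would be fatal; here it is only counted. */
static const char *may_raise(int fail)
{
    if (current_guard == NULL)
        unprotected_calls++;
    else if (fail)
        longjmp(*current_guard, 1);
    return "traceback";
}

/* Run the risky call once under protection; on error, fall back to a fixed
 * message instead of calling may_raise() again without a guard. */
static const char *get_trace_safely(int fail)
{
    jmp_buf guard;
    const char *volatile msg = "<unknown>"; /* volatile: survives longjmp */

    if (setjmp(guard) == 0) {
        current_guard = &guard;
        msg = may_raise(fail);
    }
    current_guard = NULL; /* reset the guard on both paths */
    return msg;
}
```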
* MEDIUM: resolvers: Stop scheduling resolution during stopping stage | Christopher Faulet | 2023-05-17 | 1 | -1/+4
  When the process is stopping, the server resolutions are suspended. However, the task is still periodically woken up for nothing. If there is a huge number of resolutions, it may lead to noticeable CPU consumption for no reason.

  To avoid this extra CPU cost, we stop scheduling the resolution tasks during the stopping stage. Of course, this is only true for server resolutions. Dynamic ones, via do-resolve actions, are not concerned; these must still be triggered during the stopping stage. Concretely, during the stopping stage, the resolvers task is no longer scheduled if there are no running resolutions. In this case, if a do-resolve action is evaluated, the task is woken up.

  This patch should partially solve issue #2145.
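The scheduling decision can be sketched as a small pure function. This is a model under stated assumptions, not the resolvers code: `resolvers_next_wakeup()` and `DEMO_TICK_ETERNITY` are hypothetical, and a do-resolve action is assumed to wake the task explicitly rather than going through this function.

```c
#include <stdbool.h>

/* Hypothetical "never wake up" value for the task's next expiration date. */
#define DEMO_TICK_ETERNITY 0

/* Decide when the resolvers task should run next. During the stopping stage,
 * with no resolution in progress, the task is not rescheduled at all; a
 * do-resolve action is assumed to wake it up explicitly when needed. */
static int resolvers_next_wakeup(bool stopping, unsigned int running_resolutions,
                                 int next_tick)
{
    if (stopping && running_resolutions == 0)
        return DEMO_TICK_ETERNITY;
    return next_tick;
}
```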
* MEDIUM: checks: Stop scheduling healthchecks during stopping stage | Christopher Faulet | 2023-05-17 | 1 | -2/+7
  When the process is stopping, the health-checks are suspended. However, the task is still periodically woken up for nothing. If there is a huge number of health-checks and they are woken up at the same time, it may lead to noticeable CPU consumption for no reason.

  To avoid this extra CPU cost, we stop scheduling the health-check tasks when the proxy is disabled or stopped.

  This patch should partially solve issue #2145.
* CI: drop Fedora m32 pipeline in favour of cross matrix | Ilya Shipitsin | 2023-05-17 | 1 | -42/+0
  The Fedora m32 monthly pipeline was introduced before the cross matrix. Since many of the cross builds are 32-bit, there is no need to keep a dedicated Fedora definition.
* REGTESTS: log: Reduce again response inspect-delay for last_rule.vtc | Christopher Faulet | 2023-05-17 | 1 | -1/+1
  It was previously reduced from 10s to 1s but it remains too high, especially for the CI. It may be drastically reduced to 100ms. The idea is just to make sure we wait for the response before evaluating the TCP rules.
* CLEANUP: fcgi-app; Remove useless assignment to NULL | Christopher Faulet | 2023-05-17 | 1 | -1/+0
  When the fcgi configuration is checked and fcgi rules are created, a useless assignment to NULL is reported by Coverity. Let's remove it.

  This patch should fix Coverity report #2161.
* BUG/MINOR: clock: automatically adjust the internal clock with the boot time | Willy Tarreau | 2023-05-17 | 2 | -2/+5
  This is a better and more general solution to the problem described in this commit:

      BUG/MINOR: checks: postpone the startup of health checks by the boot time

  Now we're updating the now_offset that is used to compute now_ms at the few points where we update the ready date during boot. This ensures that now_ms, while remaining stable throughout the boot process, will be correct and will start with the boot value right after the boot is finished.

  As such, the patch above is rolled back (we don't want to count the boot time twice).

  This must not be backported because it relies on the more flexible clock architecture in 2.8.
* MINOR: clock: provide a function to automatically adjust now_offset | Willy Tarreau | 2023-05-17 | 2 | -0/+6
  Right now there's no way to enforce a specific value of now_ms upon startup in order to compensate for the time it takes to load a config, specifically when dealing with the health check startup. For this we'd need to force the now_offset value to compensate for the last known value of the current date. This patch exposes a function to do exactly this.
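The offset mechanism can be modeled as follows, assuming the scheduler clock is a raw monotonic reading plus an offset. `struct demo_clock`, `demo_now_ms()` and `demo_clock_set_now()` are hypothetical names; HAProxy's actual clock code is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical model: the scheduler clock is a raw monotonic reading (in ms)
 * shifted by an offset, in the spirit of now_ms and now_offset. */
struct demo_clock {
    int64_t offset_ms;
};

static uint32_t demo_now_ms(const struct demo_clock *c, int64_t raw_ms)
{
    return (uint32_t)(raw_ms + c->offset_ms);
}

/* Adjust the offset so that, at the current raw reading, the derived clock
 * reads exactly 'target_ms'; later readings advance from that value. */
static void demo_clock_set_now(struct demo_clock *c, int64_t raw_ms, uint32_t target_ms)
{
    c->offset_ms = (int64_t)target_ms - raw_ms;
}
```

For example, setting the clock to 100 at raw time 5000 makes it read 350 at raw time 5250: only the offset changes, and the clock keeps advancing at the raw rate.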
* BUG/MINOR: checks: postpone the startup of health checks by the boot time | Willy Tarreau | 2023-05-17 | 1 | -1/+2
  When health checks are started at boot, now_ms could be off by the boot time. In general it's not even noticeable, but with very large configs taking up to one or even a few seconds to start, this can result in some of the servers' checks being scheduled slightly in the past. As such, all of them will start grouped, partially defeating the purpose of the spread-checks setting. For example, this can cause a burst of connections on the network, or an excess of CPU usage during SSL handshakes, possibly even causing some timeouts to expire early.

  In order to compensate for this, we simply add the known boot time to the computed delay when scheduling the startup of checks. That's very simple and particularly efficient. For example, a config with 5k servers in 800 backends checked every 5 seconds, which took 3.8 seconds to start, previously showed this distribution of health checks despite spread-checks 50:

      3690 08:59:25
       417 08:59:26
       213 08:59:27
        71 08:59:28
       428 08:59:29
       860 08:59:30
       918 08:59:31
       938 08:59:32
      1124 08:59:33
       904 08:59:34
       647 08:59:35
       890 08:59:36
       973 08:59:37
       856 08:59:38
       893 08:59:39
       154 08:59:40

  Now with the fix it shows this:

       470 08:59:59
       929 09:00:00
       896 09:00:01
       937 09:00:02
       854 09:00:03
       827 09:00:04
       906 09:00:05
       863 09:00:06
       913 09:00:07
       873 09:00:08
       162 09:00:09

  This should be backported to all supported versions. It depends on this commit:

      MINOR: clock: measure the total boot time

  For 2.8, where the internal clock is now totally independent of the human one, a more generic fix will consist in simply updating now_ms to reflect the startup time.
* MINOR: stats: report the boot time in "show info" | Willy Tarreau | 2023-05-17 | 2 | -0/+6
  Just like we have the uptime in "show info", let's add the boot time. It's trivial to collect, as it's just the difference between the ready date and the start date, and it will allow users to monitor this element in order to take action before it starts becoming problematic. Here the boot time is reported in milliseconds, so even sub-second anomalies in startup delays can be observed.
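Computing such a duration from two dates is a plain subtraction. The sketch assumes both dates are available as `struct timeval`; `boot_time_ms()` is a hypothetical helper, not the function used by HAProxy. Converting to microseconds first handles the borrow between the seconds and microseconds fields in one step.

```c
#include <sys/time.h>

/* Hypothetical helper: elapsed milliseconds between a start date and a ready
 * date, computed through a single microsecond difference. */
static long long boot_time_ms(const struct timeval *start, const struct timeval *ready)
{
    long long us = (long long)(ready->tv_sec - start->tv_sec) * 1000000LL
                 + (long long)(ready->tv_usec - start->tv_usec);

    return us / 1000;
}
```

For a start date of 10.900000 s and a ready date of 12.150000 s, this yields 1250 ms.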
* MINOR: clock: measure the total boot time | Willy Tarreau | 2023-05-17 | 4 | -3/+17
  Some huge configs take a significant amount of time to start and this can cause some trouble (e.g. health checks getting delayed and grouped, process not responding to the CLI, etc.). For example, some configs might start fast in certain environments and slowly in others just due to the use of a wrong DNS server that delays all libc resolutions.

  Let's start by measuring it, keeping a copy of the most recently known ready date once before calling check_config_validity() and then refining it when leaving this function. A last call is finally performed just before deciding to split between master and worker processes, and it covers the whole boot.

  It's trivial to collect and even allows us to get rid of a call to clock_update_date() in function check_config_validity() that was used in the hope of better scheduling future events.
* CLEANUP: stats: update the trash chunk where it's used | Willy Tarreau | 2023-05-17 | 1 | -1/+1
  When integrating the number of warnings in "show info" in 2.8 with commit 3c4a297d2 ("MINOR: stats: report the total number of warnings issued"), the update of the trash buffer used by the Tainted flag got moved further down. There's no harm for now, until someone adds a new metric requiring a call to chunk_newstr() and gets both values merged. Let's move the call to its location now.
* CLEANUP: check; Remove some useless assignments to NULL | Christopher Faulet | 2023-05-17 | 1 | -3/+1
  In process_chk_conn(), some assignments to NULL are useless and are reported by Coverity as unused values. While this is harmless, these assignments can be removed.

  This patch should fix Coverity report #2158.
* BUG/MINOR: server: memory leak in _srv_update_status_op() on server DOWN | Aurelien DARRAGON | 2023-05-17 | 1 | -0/+1
  When a server is transitioning from UP to DOWN, a log message is generated (e.g. "Server backend_name/server_name is DOWN").

  However, since f71e064 ("MEDIUM: server: split srv_update_status() in two functions"), the allocated buffer tmptrash, which is used to prepare the log message, is not freed after it has been used, resulting in a small memory leak each time a server goes DOWN because of an operational change.

  This is a 2.8-specific bug; no backport is needed unless the above commit gets backported.
* CLEANUP: server: remove useless tmptrash assigments in srv_update_status() | Aurelien DARRAGON | 2023-05-17 | 1 | -11/+0
  Within the srv_update_status subfunctions _op() and _adm(), each time tmptrash is freed, we assign it to NULL to ensure it will not be reused.

  However, within those functions this is not very useful, given that tmptrash is never checked against NULL except upon allocation through alloc_trash_chunk(), which happens every time a new log message is generated, sent, and then freed right away. So there are no code paths that could lead to tmptrash being checked for reuse (tmptrash is systematically overwritten since all log messages are independent from each other).

  This was raised by Coverity, see GH #2162.
* DOC: config: Clarify conditions to shorten the inspect-delay for TCP rules | Christopher Faulet | 2023-05-17 | 1 | -0/+3
  Add a sentence to state when the inspect-delay is shortened for a TCP rule.
* REGTESTS: log: Reduce response inspect-delay for last_rule.vtc | Christopher Faulet | 2023-05-17 | 1 | -1/+1
  Because of the previous fix, the log/last_rule.vtc script fails. The inspect-delay is no longer shortened when the end of the message is reached, so the WAIT_END ACL is truly respected. 10s is too high and hits the VTest timeout, making the script fail.
* BUG/MINOR: tcp-rules: Don't shortened the inspect-delay when EOI is set | Christopher Faulet | 2023-05-17 | 1 | -2/+2
  A regression was introduced with commit cb59e0bc3 ("BUG/MINOR: tcp-rules: Stop content rules eval on read error and end-of-input"). We should not shorten the inspect-delay when the EOI flag is set on the SC.

  The idea of the inspect-delay is to wait until a TCP rule matches. It is only interrupted if an error occurs, on abort, or if the peer shuts down. It is also interrupted if the buffer is full. This last case is a bit ambiguous and debatable. It could be good to add ACLs, like "wait_complete" and "wait_full", to do so. But for now, we only remove the test on the SC_FL_EOI flag.

  This patch must be backported to all stable versions.
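The stop conditions listed above can be summarized as a predicate. The structure and function names are hypothetical; the point of the sketch is that end-of-input (`eoi`) alone no longer ends the wait.

```c
#include <stdbool.h>

/* Hypothetical summary of what the TCP content rules can observe. */
struct demo_inspect_state {
    bool error;       /* read error */
    bool aborted;     /* abort */
    bool peer_shut;   /* the peer shut down its side */
    bool buffer_full; /* no more room to receive data */
    bool eoi;         /* end of input: deliberately not a reason to stop */
};

/* Returns true if rule evaluation may stop waiting before the
 * inspect-delay expires. */
static bool may_stop_waiting(const struct demo_inspect_state *st)
{
    return st->error || st->aborted || st->peer_shut || st->buffer_full;
}
```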
* MINOR: checks: make sure spread-checks is used also at boot time | Willy Tarreau | 2023-05-17 | 1 | -0/+8
  This makes use of spread-checks also for the startup of the check tasks. This provides a smoother load on startup for uneven configurations which tend to enable only *some* servers. Below is the connection distribution per second of the SSL checks of a config with 5k servers spread over 800 backends, with a check inter of 5 seconds:

    - default:
         682 08:00:50
         826 08:00:51
         773 08:00:52
        1016 08:00:53
         885 08:00:54
         889 08:00:55
         825 08:00:56
         773 08:00:57
        1016 08:00:58
         884 08:00:59
         888 08:01:00
         491 08:01:01

    - with spread-checks 50:
         437 08:01:19
         866 08:01:20
         777 08:01:21
        1023 08:01:22
        1118 08:01:23
         923 08:01:24
         641 08:01:25
         859 08:01:26
         962 08:01:27
         860 08:01:28
         929 08:01:29
         909 08:01:30
         866 08:01:31
         849 08:01:32
         114 08:01:33

    - with spread-checks 50 + this patch:
         680 08:01:55
         922 08:01:56
         962 08:01:57
         899 08:01:58
         819 08:01:59
         843 08:02:00
         916 08:02:01
         896 08:02:02
         886 08:02:03
         846 08:02:04
         903 08:02:05
         894 08:02:06
         178 08:02:07

  The load is much smoother from the start; this can help initial health checks succeed when many target the same overloaded server, for example.

  This could be backported as it should make borderline configs more reliable across reloads.
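A generic way to spread an interval by a random plus or minus N percent, in the spirit of `spread-checks`, is sketched below. It is not HAProxy's exact formula: `spread_interval()` is hypothetical, and the random value is passed in by the caller so the function stays deterministic. Applying the same spread to the delay before the first check would smooth the startup load in the way described above.

```c
#include <stdlib.h>

/* Spread 'inter_ms' by a random deviation within +/- 'spread_pct' percent.
 * 'rnd' is a caller-supplied value in [0, RAND_MAX] (e.g. from rand()), so
 * the function itself stays deterministic. */
static unsigned int spread_interval(unsigned int inter_ms, unsigned int spread_pct, int rnd)
{
    long long span = (long long)inter_ms * spread_pct / 100; /* max deviation */
    /* map rnd onto [0, 2 * span], then shift it to [-span, +span] */
    long long dev = (long long)rnd * (2 * span + 1) / ((long long)RAND_MAX + 1) - span;

    return (unsigned int)((long long)inter_ms + dev);
}
```

With a 5000 ms interval and a 50% spread, the result ranges from 2500 ms (for `rnd == 0`) to 7500 ms (for `rnd == RAND_MAX`).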
* BUG/MEDIUM: mux-quic: fix EOI for request without payload | Amaury Denoyelle | 2023-05-16 | 3 | -4/+9
  When a full message is received for a stream, the MUX is responsible for setting the EOI flag. This was done through the rcv_buf stream callback by checking if the QCS HTX buffer contained the EOM flag.

  This is not correct for HTTP messages without a body. In this case, the QCS HTX buffer is never used. Only a local HTX buffer is used to transfer headers just as the stream endpoint is created. As such, EOI is never transmitted to the upper layer.

  If the transfer occurs without any issue, this does not seem to cause any problem. However, in case the transfer is aborted, the stream is never released, which causes a memory leak and prevents the process from soft-stopping.

  To fix this, also check if EOM is put by the application layer during headers conversion. If true, this is transferred through a new argument to the qc_attach_sc() MUX function, which is responsible for setting the EOI flag.

  This issue was reproduced using h2load with hundreds of connections. h2load is interrupted with a SIGINT, which causes streams to never be closed on the haproxy side.

  This should be backported up to 2.6.
* MINOR: mux-quic: uninline qc_attach_sc() | Amaury Denoyelle | 2023-05-16 | 2 | -44/+46
  Uninline qc_attach_sc() and move it to the implementation source file. This will be useful for the next commit, which adds traces to it.

  This should be backported up to 2.7.
* MINOR: mux-quic: properly report end-of-stream on recv | Amaury Denoyelle | 2023-05-16 | 2 | -1/+32
  The MUX is responsible for setting EOS on the stream when the read channel is closed. This happens if the underlying connection is closed or a RESET_STREAM is received. FIN STREAM is ignored in this case.

  For connection closure, simply check for CO_FL_SOCK_RD_SH.

  For RESET_STREAM reception, a new flag QC_CF_RECV_RESET has been introduced. It is set when RESET_STREAM is received, unless all data was already received. This conforms to the QUIC RFC, which allows ignoring a RESET_STREAM in this case. During RESET_STREAM processing, the input buffer is emptied, so EOS can be reported right away on the recv_buf operation.

  This should be backported up to 2.7.
* MINOR: mux-quic: add trace to stream rcv_buf operation | Amaury Denoyelle | 2023-05-16 | 1 | -6/+11
  Add traces to make each stream transition more explicit.

  Also, move the ERR_PENDING to ERROR transition after the other stream flags are set, as in the MUX H2 implementation.

  This is purely a cosmetic change and it should have no functional impact.

  This should be backported up to 2.7.
* BUG/MINOR: makefile: use USE_LIBATOMIC instead of USE_ATOMIC | Dragan Dosen | 2023-05-15 | 1 | -1/+1
  The issue was introduced with commit c108f37c2 ("BUILD: makefile: rework 51D to split v3/v4"), and is also related to commit b16d9b58 ("BUILD: makefile: never force -latomic, set USE_LIBATOMIC instead") where USE_ATOMIC has been replaced.
* BUILD: ssl: get0_verified chain is available on libreSSL | William Lallemand | 2023-05-15 | 1 | -0/+4
  Define HAVE_SSL_get0_verified_chain when it's using libreSSL >= 3.3.6.
* BUG/MINOR: h3: missing goto on buf alloc failure | Amaury Denoyelle | 2023-05-15 | 1 | -0/+1
  The following patch introduced proper error management on buffer allocation failure:

      0abde9dee69fe151f5f181a34e0782ef840abe53
      BUG/MINOR: mux-quic: properly handle buf alloc failure

  However, when decoding an empty STREAM frame with just the FIN bit set, this was not done correctly. Indeed, there is a missing goto statement after a NULL buffer check.

  This was reported thanks to Coverity analysis. This should fix GitHub issue #2163.

  This must be backported up to 2.6.
* BUG/MINOR: mux-quic: handle properly Tx buf exhaustion | Amaury Denoyelle | 2023-05-15 | 1 | -2/+2
  Since the following patch

      commit 6c501ed23bea953518059117e7dd19e8d6cb6bd8
      BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc

  it is now possible to check whether Tx buffer allocation failed due to exhaustion of a configured limit or a simple memory failure.

  This patch fixes that check, as the condition was inverted. Indeed, if buf_avail is null, this means that the limit has been reached. Otherwise, this is a real memory allocation failure. This caused the QC_CF_CONN_FULL flag not to be used properly and may have caused disruption on transfers with several streams or large data.

  This was detected thanks to abnormal errors in the QUIC MUX traces. Also, the trace for limit exhaustion is changed accordingly to be more explicit.

  This must be backported up to 2.6.
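The corrected decision can be sketched as follows, assuming an allocator that reports how many buffers were still available under the per-connection limit. All names are hypothetical stand-ins for `qc_stream_buf_alloc()` and its caller. The counter is incremented only after a successful allocation, so a failed `malloc()` cannot push it over the limit.

```c
#include <stdlib.h>

/* Hypothetical outcome of a Tx buffer request. */
enum demo_txbuf_res { DEMO_TXBUF_OK, DEMO_TXBUF_LIMIT, DEMO_TXBUF_NOMEM };

/* Hypothetical per-connection buffer accounting. */
struct demo_conn {
    unsigned int used;  /* buffers currently allocated */
    unsigned int limit; /* configured maximum per connection */
};

/* Allocate a buffer and report in '*avail' how many buffers were still
 * available under the limit (0 means the limit is reached). */
static void *demo_buf_alloc(struct demo_conn *c, size_t size, unsigned int *avail)
{
    void *buf;

    *avail = c->used < c->limit ? c->limit - c->used : 0;
    if (*avail == 0)
        return NULL;   /* limit reached: nothing is allocated */

    buf = malloc(size);
    if (buf != NULL)
        c->used++;     /* count only successful allocations */
    return buf;
}

/* Caller side: a NULL buffer with no available slot means limit exhaustion
 * (wait for a buffer to be released); with slots left, it is a genuine
 * memory allocation failure (close the connection). */
static enum demo_txbuf_res demo_get_txbuf(struct demo_conn *c, size_t size, void **out)
{
    unsigned int avail;

    *out = demo_buf_alloc(c, size, &avail);
    if (*out != NULL)
        return DEMO_TXBUF_OK;
    return avail == 0 ? DEMO_TXBUF_LIMIT : DEMO_TXBUF_NOMEM;
}
```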
* BUILD: ssl: ssl_c_r_dn fetches uses functiosn only available since 1.1.1 | William Lallemand | 2023-05-15 | 4 | -2/+9
  Fix the build with older OpenSSL versions by disabling the new ssl_c_r_dn fetch.

  This also disables the ssl_client_samples.vtc file for OpenSSL versions older than 1.1.1.
* BUG/MEDIUM: mux-h2: make sure control frames do not refresh the idle timeout | Willy Tarreau | 2023-05-15 | 1 | -23/+30
  Christopher found, as part of the analysis of Tim's issue #1891, that commit 15a4733d5 ("BUG/MEDIUM: mux-h2: make use of http-request and keep-alive timeouts"), introduced in 2.6, incompletely addressed a timeout issue in the H2 mux.

  The problem was that the http-keepalive and http-request timeouts were not applied before it. With that commit they are now considered, but if a GOAWAY is sent (or even attempted to be sent), then they are no longer used, because the code applies the client-fin timeout (if set) to the current date and falls back to the client timeout, without considering the idle_start period. This means that a config having a "timeout http-keepalive" would still not close the connection quickly when facing a client that periodically sends PING, PRIORITY or any other frame type.

  In addition, after the GOAWAY was attempted to be sent, there was no check for pending data in the output buffer, meaning that some responses could be truncated in configs involving a very short client-fin timeout.

  Finally, the spreading of the closures during the soft-stop, brought in 2.6 by commit b5d968d9b ("MEDIUM: global: Add a "close-spread-time" option to spread soft-stop on time window"), didn't consider the particular case of an idle "pre-connect" connection, which would also live long if a browser failed to deliver a valid request for a long time.

  All of this indicates that the conditions must be reworked so as not to have that level of exclusion between conditions, but rather stick to the rules from the doc that are already enforced on other muxes:
    - timeout client always applies if there are data pending, and is relative to each new I/O;
    - timeout http-request applies before the first complete request and is relative to the entry in idle state;
    - timeout http-keepalive applies between idle and the next complete request and is relative to the entry in idle state;
    - timeout client-fin applies when in idle after a shut was sent (here the shut is the GOAWAY). The shut may only be considered as sent if the buffer is empty and the flags indicate that it was successfully sent (or failed), but not if it's still waiting for some room in the output buffer, for example. This implies that this timeout may then lower the http-keepalive/http-request ones.

  This is what this patch implements. Of course, the client timeout still applies as a fallback when none of the ones above are set or when their conditions are not met.

  It would seem reasonable to backport this to 2.7 first, then only after one or two releases to 2.6.
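The four rules above can be condensed into one function. This is a simplified model under stated assumptions, not the H2 mux code: all names are hypothetical, a value of 0 means "not configured", client-fin is measured from the idle start like the other idle timeouts, and an unset idle timeout falls back directly to the client timeout. Because only `idle_start` (not the date of the last control frame) anchors the idle timeouts, a client sending periodic PING frames cannot keep the connection alive indefinitely in this model.

```c
#include <stdbool.h>

#define DEMO_UNSET 0 /* hypothetical: 0 means "timeout not configured" */

/* Hypothetical configured timeouts, in milliseconds. */
struct demo_h2_timeouts {
    unsigned int client, client_fin, http_request, http_keepalive;
};

/* Hypothetical connection state relevant to timeout selection. */
struct demo_h2_state {
    bool data_pending;       /* data still pending */
    bool got_request;        /* at least one complete request was received */
    bool shut_sent;          /* GOAWAY fully sent and output buffer empty */
    unsigned int last_io;    /* date of the last I/O, in ms */
    unsigned int idle_start; /* date the connection entered idle state, in ms */
};

/* Return the connection's expiration date according to the rules above. */
static unsigned int demo_h2_expire(const struct demo_h2_timeouts *to,
                                   const struct demo_h2_state *st)
{
    unsigned int idle_to;

    /* timeout client: data pending, relative to each new I/O */
    if (st->data_pending)
        return st->last_io + to->client;

    /* idle: http-request before the first request, http-keepalive after */
    idle_to = st->got_request ? to->http_keepalive : to->http_request;

    /* once the shut is really sent, client-fin may lower the idle timeout */
    if (st->shut_sent && to->client_fin != DEMO_UNSET &&
        (idle_to == DEMO_UNSET || to->client_fin < idle_to))
        idle_to = to->client_fin;

    if (idle_to != DEMO_UNSET)
        return st->idle_start + idle_to;

    /* fallback: plain client timeout */
    return st->last_io + to->client;
}
```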
* MINOR: ssl: add new sample ssl_c_r_dn | Abhijeet Rastogi | 2023-05-15 | 5 | -0/+101
  This patch addresses #1514 and adds the ability to fetch the DN of the root CA that was in the chain when the client certificate was verified during the SSL handshake.
* MEDIUM: proxy: stop emitting logs for internal proxies when stopping | William Lallemand | 2023-05-15 | 1 | -2/+2
  The HTTPCLIENT and the OCSP-UPDATE proxies are internal proxies, so there is no need to display logs of them stopping while the process is stopping.

  This patch checks whether a proxy has the PR_CAP_INT flag so that these annoying messages are not displayed.
* MINOR: stconn: Remove useless test on sedesc on detach to release the xref | Christopher Faulet | 2023-05-15 | 1 | -7/+5
  When the SC is detached from the endpoint, the xref between the endpoints is removed. At this stage, the sedesc cannot be undefined, so we can remove the test on it.

  This patch should fix issue #2156. No backport needed.
* MEDIUM: mworker/cli: does not disconnect the master CLI upon error | William Lallemand | 2023-05-14 | 1 | -8/+24
  In the proxy CLI analyzer, when pcli_parse_request returns -1, the client was shut to prevent any problem with the master CLI.

  This behavior is a little bit excessive and not handy at all in prompt mode. For example, one could have activated multiple modes, then hit an error which disconnects the CLI, and would have to reconnect and enter all the modes again.

  This patch introduces the pcli_error() function, which only outputs an error and flushes the input buffer instead of closing everything.

  When a parsing error is encountered, this function is used, and the prompt is written again, without any disconnection.
* CI: enable monthly Fedora Rawhide clang builds | Ilya Shipitsin | 2023-05-13 | 1 | -3/+1
  This was temporarily disabled due to https://github.com/haproxy/haproxy/issues/1868. We are now unblocked, so let us enable clang in the matrix.
* MEDIUM: session/ssl: return the SSL error string during a SSL handshake error | William Lallemand | 2023-05-12 | 1 | -9/+44
  SSL handshake errors were unable to dump the OpenSSL error string by default; to do so, it was mandatory to configure an error-log-format with the ssl_fc_err fetch.

  This patch implements the session_build_err_string() function, which creates the error log to send during session_kill_embryonic(). A special case is made for CO_ER_SSL_HANDSHAKE, which is able to dump the error string with ERR_error_string().

  Before:

      <134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure

  After:

      <134>May 12 17:14:04 haproxy[183151]: 127.0.0.1:49346 [12/May/2023:17:14:04.571] frt2/1: SSL handshake failure (error:0A000418:SSL routines::tlsv1 alert unknown ca)
* BUG/MINOR: mux-quic: free task on qc_init() app ops failure | Amaury Denoyelle | 2023-05-12 | 1 | -0/+1
  qc_init() is used to initialize a QUIC MUX instance. On failure, each resource is released via a series of goto statements. There is one issue if the app_ops.init callback fails. In this case, the MUX task is not freed. This can cause a crash as the task is already scheduled: when the handler runs, it crashes when trying to access the qcc instance.

  To fix this, properly destroy the qcc task on the fail_install_app_ops label.

  The impact of this bug is minor as the app_ops.init callback succeeds most of the time. However, it may fail on allocation failure due to memory exhaustion.

  This may fix GitHub issue #2154.

  This must be backported up to 2.7.
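The error path follows the usual `goto` cleanup ladder, where each label releases what was set up before the failing step. The sketch below is hypothetical (`struct demo_mux`, `demo_app_init()`, `demo_mux_init()`); only the `fail_install_app_ops` label name comes from the description above, and plain `malloc()`/`free()` stand in for HAProxy's task API.

```c
#include <stdlib.h>

/* Hypothetical MUX context owning a task and some application state. */
struct demo_mux {
    void *task;
    void *app_ctx;
};

/* Hypothetical application init step; 'fail' forces the error path. */
static int demo_app_init(struct demo_mux *m, int fail)
{
    if (fail)
        return -1;
    m->app_ctx = malloc(16);
    return m->app_ctx != NULL ? 0 : -1;
}

/* Each label releases what was set up before the failing step, in reverse
 * order. Freeing the task under fail_install_app_ops corresponds to the fix
 * described above. */
static struct demo_mux *demo_mux_init(int app_init_fails)
{
    struct demo_mux *m = calloc(1, sizeof(*m));

    if (m == NULL)
        goto fail_no_mux;

    m->task = malloc(32);
    if (m->task == NULL)
        goto fail_no_task;

    if (demo_app_init(m, app_init_fails) < 0)
        goto fail_install_app_ops;

    return m;

 fail_install_app_ops:
    free(m->task);
 fail_no_task:
    free(m);
 fail_no_mux:
    return NULL;
}
```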
* BUG/MINOR: mux-quic: differentiate failure on qc_stream_desc alloc | Amaury Denoyelle | 2023-05-12 | 3 | -10/+17
  qc_stream_buf_alloc() can fail for two reasons:
    * limit of Tx buffers per connection reached
    * allocation failure

  The first case is properly treated. A QC_CF_CONN_FULL flag is set on the connection to interrupt emission. It is cleared, and the MUX tasklet woken up, when a buffer becomes available after in-order ACK reception.

  The allocation failure was handled with the same mechanism, which in this case is not appropriate and could lead to a connection transfer freeze. Instead, prefer to close the connection with a QUIC internal error code.

  To differentiate the two causes, the qc_stream_buf_alloc() API was changed to return the number of available buffers to the caller.

  This must be backported up to 2.6.
* BUG/MINOR: quic: do not alloc buf count on alloc failure | Amaury Denoyelle | 2023-05-12 | 1 | -2/+2
  The total number of buffers per connection for sending is limited by a configuration value. To ensure this, the <stream_buf_count> quic_conn field is incremented on qc_stream_buf_alloc().

  qc_stream_buf_alloc() may fail if the buffer cannot be allocated. In this case, <stream_buf_count> should not be incremented. To fix this, simply move the increment operation after the buffer allocation.

  The impact of this bug is low. However, if a connection suffers from several buffer allocation failures, it may cause <stream_buf_count> to be incremented over the limit without being able to go back down.

  This must be backported up to 2.6.
* BUG/MINOR: mux-quic: handle properly recv ncbuf alloc failure (Amaury Denoyelle, 2023-05-12, 1 file, -5/+9)
  The function qc_get_ncbuf() is used to allocate ncbuf content. Allocation failure was handled with a plain BUG_ON(). Replace this with proper error management. This buffer is only used for STREAM frame reception, to support out-of-order offsets. When an allocation fails, close the connection with a QUIC internal error code. This should be backported up to 2.6.
* BUG/MINOR: mux-quic: properly handle buf alloc failure (Amaury Denoyelle, 2023-05-12, 2 files, -12/+44)
  The convenience function qc_get_buf() centralizes buffer allocation in the MUX and H3 layers. However, allocation failure was not handled properly, as it relied on a BUG_ON() statement. Replace this with proper error management. On emission, a stream whose buffer cannot be allocated is temporarily skipped until the next qc_send() invocation. On reception, H3 uses this function for HTX conversion; on allocation failure, the connection is closed with a QUIC internal error code. This must be backported up to 2.6.
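A simplified sketch of the two error-handling policies described above, assuming a hypothetical `get_buf()` helper that returns NULL on memory exhaustion instead of aborting. On emission, a stream is skipped and retried on a later pass; on reception, the caller is told to close the connection. None of these names are HAProxy's.

```c
#include <stdlib.h>

static int oom;  /* when set, allocation fails */

/* Hypothetical centralized allocator: NULL on failure, never aborts. */
static void *get_buf(void)
{
	return oom ? NULL : malloc(64);
}

struct strm { void *txbuf; int sent; };

/* Emission: a stream whose buffer cannot be allocated is skipped and will be
 * retried on the next pass. Returns the number of streams that progressed. */
static int send_pass(struct strm *s, int n)
{
	int progressed = 0;

	for (int i = 0; i < n; i++) {
		if (!s[i].txbuf && !(s[i].txbuf = get_buf()))
			continue;  /* skip for now, retry on next call */
		s[i].sent = 1;
		progressed++;
	}
	return progressed;
}

/* Reception: without a buffer the payload cannot be converted, so the
 * caller must close the connection with an internal error. */
enum rcv_res { RCV_OK, RCV_CLOSE_INTERNAL_ERROR };

static enum rcv_res rcv_convert(void **rxbuf)
{
	if (!*rxbuf && !(*rxbuf = get_buf()))
		return RCV_CLOSE_INTERNAL_ERROR;
	return RCV_OK;
}
```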
* MINOR: mux-quic: remove dedicated function to handle standalone FIN (Amaury Denoyelle, 2023-05-12, 4 files, -26/+24)
  Remove the QUIC MUX function qcs_http_handle_standalone_fin(). This function was only used when receiving an empty STREAM frame with the FIN bit set. Besides, it was called by each application protocol, which could take different approaches and made the function's purpose unclear. Invocations of qcs_http_handle_standalone_fin() have been replaced by explicit code in both the H3 and HTTP/0.9 modules. In the process, use htx_set_eom() to reliably put EOM on the HTX message. This should be backported up to 2.7, along with the previous patch which introduced htx_set_eom().
* MINOR: htx: add function to set EOM reliably (Amaury Denoyelle, 2023-05-12, 2 files, -12/+21)
  Implement a new HTX utility function, htx_set_eom(). If the HTX message is empty, it first adds a dummy EOT block. This is a small trick needed to ensure readers will detect the HTX buffer as not empty and retrieve the EOM flag. Replace the related H2 code with an htx_set_eom() invocation. QUIC has the same code, which will be replaced in the next commit. This should be backported up to 2.7 before the related QUIC patch.
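A toy model of the htx_set_eom() trick, assuming a mock HTX structure that only tracks a block count and flags; `struct mock_htx` and its helpers are hypothetical and far simpler than HAProxy's HTX API. It shows why an empty message needs a dummy EOT block before the EOM flag becomes visible to a reader that skips empty messages.

```c
#include <stdbool.h>

/* Minimal mock of an HTX message: only a block count and a flag word. */
#define MOCK_HTX_FL_EOM 0x1
enum mock_blk { MOCK_BLK_DATA, MOCK_BLK_EOT };

struct mock_htx {
	int nb_blocks;
	enum mock_blk last;
	unsigned int flags;
};

static bool mock_htx_is_empty(const struct mock_htx *h)
{
	return h->nb_blocks == 0;
}

static bool mock_htx_add_blk(struct mock_htx *h, enum mock_blk type)
{
	h->nb_blocks++;
	h->last = type;
	return true;
}

/* Set EOM so readers can always see it: an empty message first gets a dummy
 * EOT block, because a reader finding no block may treat the buffer as empty
 * and never look at the flags. */
static bool mock_htx_set_eom(struct mock_htx *h)
{
	if (mock_htx_is_empty(h) && !mock_htx_add_blk(h, MOCK_BLK_EOT))
		return false;
	h->flags |= MOCK_HTX_FL_EOM;
	return true;
}

/* A reader that ignores empty messages entirely. */
static bool mock_reader_sees_eom(const struct mock_htx *h)
{
	return !mock_htx_is_empty(h) && (h->flags & MOCK_HTX_FL_EOM);
}
```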
* BUG/MINOR: quic: Wrong redispatch for external data on connection socket (Frédéric Lécaille, 2023-05-12, 1 file, -1/+1)
  It is possible to receive datagrams from another connection on a dedicated quic-conn socket. This is due to a race condition between the bind() and connect() system calls. To handle this, an explicit check is done on each datagram: if the DCID is not associated with the connection which owns the socket, the datagram is redispatched as if it had arrived on the listener socket. This redispatch step was not done properly because the source address passed to the redispatch function was incorrect: instead of the datagram source address, we used the address of the quic-conn socket which received the datagram due to the above race condition. Fix this simply by using the address returned by the recvmsg() system call. The impact of this bug is minor, as redispatch on a connection socket should be really rare. However, when it happens it can lead to several kinds of problems, for example a connection initialized with an incorrect peer address. It can also break the Retry token check, as this relies on the peer address. In fact, Retry token check failures were how this bug was found: when using h2load with thousands of clients, the counter of Retry token failures was unusually high. With this patch, no Retry failures are reported anymore. Must be backported to 2.7.
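A small POSIX sketch of where the correct redispatch address comes from: recvmsg() fills `msg_name` with the sender's address, whereas getsockname() on the receiving socket only yields the local endpoint. `recv_with_peer()` and `udp_loopback()` are hypothetical helpers written for this illustration, not HAProxy functions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram on <fd> and store the sender's address in <peer>.
 * This is the address a redispatch must use, not the local address of the
 * socket that happened to receive the datagram. */
static ssize_t recv_with_peer(int fd, void *buf, size_t len,
                              struct sockaddr_storage *peer)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	memset(peer, 0, sizeof(*peer));
	msg.msg_name = peer;
	msg.msg_namelen = sizeof(*peer);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	return recvmsg(fd, &msg, 0);
}

/* Example helper: a UDP socket bound to 127.0.0.1 on an ephemeral port.
 * Returns the fd (or -1) and fills <addr> with the bound address. */
static int udp_loopback(struct sockaddr_in *addr)
{
	socklen_t alen = sizeof(*addr);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr->sin_port = 0;
	if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, &alen) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

When one loopback socket sends to another, the peer port reported by recvmsg() matches the sending socket rather than the socket that received the datagram.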
* BUG/MINOR: log: fix memory error handling in parse_logsrv() (Aurelien DARRAGON, 2023-05-12, 1 file, -0/+4)
  A check was missing in parse_logsrv() to make sure that a malloc-dependent variable is non-NULL before it is used. If malloc fails, the function now raises an error and stops, as is already done at a few other places within the function. This partially fixes GH #2130. It should be backported to every stable version.
* BUG/MINOR: errors: handle malloc failure in usermsgs_put() (Aurelien DARRAGON, 2023-05-12, 1 file, -1/+2)
  usermsgs_buf.size is set without first checking whether the previous malloc attempt succeeded. This could fool the buffer API into assuming that the buffer is initialized, resulting in unsafe reads/writes. Guard the usermsgs_buf.size assignment with the malloc result to make buffer initialization safe against malloc failures. This partially fixes GH #2130. It should be backported up to 2.6.
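A simplified sketch of the guarded initialization, assuming a hypothetical `struct sbuf` where a non-zero `size` marks the buffer as usable; `msgs_put()` and `force_oom` are illustrative and not the real usermsgs_put(). The size is only assigned after malloc() succeeds, so a failed attempt leaves the buffer looking uninitialized.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified buffer: a non-zero <size> tells users that <area> is valid. */
struct sbuf { char *area; size_t size; size_t data; };

static struct sbuf msgs;   /* zero-initialized, like a static buffer */
static int force_oom;      /* mimics malloc() failing */

/* Append <msg>; returns 0 on success, -1 if the buffer could not be set up
 * or is too small. */
static int msgs_put(const char *msg, size_t cap)
{
	size_t len = strlen(msg);

	if (!msgs.size) {
		msgs.area = force_oom ? NULL : malloc(cap);
		if (!msgs.area)
			return -1;      /* size stays 0: buffer still "uninitialized" */
		msgs.size = cap;
	}
	if (msgs.data + len > msgs.size)
		return -1;
	memcpy(msgs.area + msgs.data, msg, len);
	msgs.data += len;
	return 0;
}
```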
* EXAMPLES: fix race condition in lua mailers script (Aurelien DARRAGON, 2023-05-12, 1 file, -3/+3)
  Christopher reported a rare race condition involving 'healthcheckmail.vtc'. The regtest would randomly FAIL with this kind of error:
      ** S1 === expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Health check for server b...
      **** S1 EXPECT MATCH ~ "[^:\[ ]\[581669\]: Health check for server be1/srv1 failed.+check duration: [[:digit:]]+ms.+status: 0/1 DOWN."
      ** S1 === recv info
      **** S1 syslog|<25>May 11 15:38:46 haproxy[581669]: Server be1/srv1 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
      **** S1 syslog|<24>May 11 15:38:46 haproxy[581669]: backend be1 has no server available!
  It turns out that this is due to the recent commit 7963fb5 ("REGTESTS: use lua mailer script for mailers tests"), in which we tell the regtest to use the new lua mailers instead of the legacy mailers API. However, in the lua mailers script, because the event subscriptions are performed from a Lua task, the subscription may be delayed during startup. Indeed, Lua tasks rely on the scheduler, which runs tasks with no ordering guarantees. Thus early tasks, including the server checks used in the regtest, compete during startup. As such, we may end up with some events being generated right before the lua mailers script starts subscribing to events (because the Lua task is scheduled but not started yet), resulting in events being lost from Lua's point of view. To fix this and to make lua mailers more reliable during startup, we now perform the event subscription from an init function instead of an asynchronous task. (The init function is called synchronously during haproxy post_init, and runs exclusively before the scheduler starts.) This should be enough to prevent healthcheckmail.vtc from randomly failing.
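A toy C model of the startup race, under the assumption that events published before anyone subscribes are simply dropped. The event bus and `startup()` are hypothetical; they only mirror the ordering difference between subscribing from an init function (core.register_init()) and from a task that the scheduler may run after a health check has already fired.

```c
/* Toy event bus: an event published while nobody is subscribed is lost. */
#define MAX_SUBS 4
typedef void (*ev_cb)(int ev);

static ev_cb subs[MAX_SUBS];
static int nb_subs;
static int received;

static void subscribe(ev_cb cb)
{
	if (nb_subs < MAX_SUBS)
		subs[nb_subs++] = cb;
}

static void publish(int ev)
{
	for (int i = 0; i < nb_subs; i++)
		subs[i](ev);
}

static void mailer_cb(int ev)
{
	(void)ev;
	received++;
}

/* Models startup ordering. With <subscribe_in_init>, subscription happens in
 * the synchronous init phase, before any task runs. Otherwise it is deferred
 * to a task that runs after the check task in this worst-case ordering.
 * Returns the number of events the mailer received. */
static int startup(int subscribe_in_init)
{
	nb_subs = 0;
	received = 0;

	if (subscribe_in_init)
		subscribe(mailer_cb);   /* init phase, before the scheduler */

	/* Scheduler starts; worst case: the check task runs first. */
	publish(1);                 /* "server DOWN" event */
	if (!subscribe_in_init)
		subscribe(mailer_cb);   /* subscription from a task, too late */
	publish(2);                 /* a later event */
	return received;
}
```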
* DOC: lua: fix core.{proxies,frontends,backends} visibility (Aurelien DARRAGON, 2023-05-12, 1 file, -3/+3)
  Despite the doc not mentioning it, the core.{proxies,frontends,backends} methods are also available from the init context (through core.register_init() functions). Update the documentation to reflect this possibility.