From c81c7f51c38de6dff5ffc55b5184061b84c7ea5f Mon Sep 17 00:00:00 2001 From: Itamar Haber Date: Wed, 23 Feb 2022 22:34:58 +0200 Subject: Add stream consumer group lag tracking and reporting (#9127) Adds the ability to track the lag of a consumer group (CG), that is, the number of entries yet-to-be-delivered from the stream. The proposed constant-time solution is in the spirit of "best-effort." Partially addresses #8737. ## Description of approach We add a new "entries_added" property to the stream. This starts at 0 for a new stream and is incremented by 1 with every `XADD`. It is essentially an all-time counter of the entries added to the stream. Given the stream's length and this counter value, we can trivially find the logical "entries_added" counter of the first ID if and only if the stream is contiguous. A fragmented stream contains one or more tombstones generated by `XDEL`s. The new "xdel_max_id" stream property tracks the latest tombstone. The CG also tracks its last delivered ID's as an "entries_read" counter and increments it independently when delivering new messages, unless the this read counter is invalid (-1 means invalid offset). When the CG's counter is available, the reported lag is the difference between added and read counters. Lastly, this also adds a "first_id" field to the stream structure in order to make looking it up cheaper in most cases. ## Limitations There are two cases in which the mechanism isn't able to track the lag. In these cases, `XINFO` replies with `null` in the "lag" field. The first case is when a CG is created with an arbitrary last delivered ID, that isn't "0-0", nor the first or the last entries of the stream. In this case, it is impossible to obtain a valid read counter (short of an O(N) operation). The second case is when there are one or more tombstones fragmenting the stream's entries range. In both cases, given enough time and assuming that the consumers are active (reading and lacking) and advancing, the CG should be able to catch up with the tip of the stream and report zero lag. Once that's achieved, lag tracking would resume as normal (until the next tombstone is set). ## API changes * `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]` for explicitly specifying the new CG's counter. * `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]` for specifying the CG's counter. * `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total number of entries added to the stream. * `XINFO` reports the current lag and logical read counter of CGs. * `XSETID` is an internal command that's used in replication/aof. It has been added with the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]` for propagating the CG's offset and maximal tombstone ID of the stream. ## The generic unsolved problem The current stream implementation doesn't provide an efficient way to obtain the approximate/exact size of a range of entries. While it could've been nice to have that ability (#5813) in general, let alone specifically in the context of CGs, the risk and complexities involved in such implementation are in all likelihood prohibitive. ## A refactoring note The `streamGetEdgeID` has been refactored to accommodate both the existing seek of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones` argument). Furthermore, this refactoring also migrated the seek logic to use the `streamIterator` (rather than `raxIterator`) that was, in turn, extended with the `skip_tombstones` Boolean struct field to control the emission of these. Co-authored-by: Guy Benoish Co-authored-by: Oran Agra --- src/stream.h | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) (limited to 'src/stream.h') diff --git a/src/stream.h b/src/stream.h index 724f2c2ad..2d4997919 100644 --- a/src/stream.h +++ b/src/stream.h @@ -15,8 +15,11 @@ typedef struct streamID { typedef struct stream { rax *rax; /* The radix tree holding the stream. */ - uint64_t length; /* Number of elements inside this stream. */ + uint64_t length; /* Current number of elements inside this stream. */ streamID last_id; /* Zero if there are yet no items. */ + streamID first_id; /* The first non-tombstone entry, zero if empty. */ + streamID max_deleted_entry_id; /* The maximal ID that was deleted. */ + uint64_t entries_added; /* All time count of elements added. */ rax *cgroups; /* Consumer groups dictionary: name -> streamCG */ } stream; @@ -34,6 +37,7 @@ typedef struct streamIterator { unsigned char *master_fields_ptr; /* Master field to emit next. */ int entry_flags; /* Flags of entry we are emitting. */ int rev; /* True if iterating end to start (reverse). */ + int skip_tombstones; /* True if not emitting tombstone entries. */ uint64_t start_key[2]; /* Start key as 128 bit big endian. */ uint64_t end_key[2]; /* End key as 128 bit big endian. */ raxIterator ri; /* Rax iterator. */ @@ -52,6 +56,11 @@ typedef struct streamCG { streamID last_id; /* Last delivered (not acknowledged) ID for this group. Consumers that will just ask for more messages will served with IDs > than this. */ + long long entries_read; /* In a perfect world (CG starts at 0-0, no dels, no + XGROUP SETID, ...), this is the total number of + group reads. In the real world, the reasoning behind + this value is detailed at the top comment of + streamEstimateDistanceFromFirstEverEntry(). */ rax *pel; /* Pending entries list. This is a radix tree that has every message delivered to consumers (without the NOACK option) that was yet not acknowledged @@ -105,6 +114,8 @@ struct client; #define SCC_NO_NOTIFY (1<<0) /* Do not notify key space if consumer created */ #define SCC_NO_DIRTIFY (1<<1) /* Do not dirty++ if consumer created */ +#define SCG_INVALID_ENTRIES_READ -1 + stream *streamNew(void); void freeStream(stream *s); unsigned long streamLength(const robj *subject); @@ -117,7 +128,7 @@ void streamIteratorStop(streamIterator *si); streamCG *streamLookupCG(stream *s, sds groupname); streamConsumer *streamLookupConsumer(streamCG *cg, sds name, int flags); streamConsumer *streamCreateConsumer(streamCG *cg, sds name, robj *key, int dbid, int flags); -streamCG *streamCreateCG(stream *s, char *name, size_t namelen, streamID *id); +streamCG *streamCreateCG(stream *s, char *name, size_t namelen, streamID *id, long long entries_read); streamNACK *streamCreateNACK(streamConsumer *consumer); void streamDecodeID(void *buf, streamID *id); int streamCompareID(streamID *a, streamID *b); @@ -131,6 +142,8 @@ int streamParseID(const robj *o, streamID *id); robj *createObjectFromStreamID(streamID *id); int streamAppendItem(stream *s, robj **argv, int64_t numfields, streamID *added_id, streamID *use_id, int seq_given); int streamDeleteItem(stream *s, streamID *id); +void streamGetEdgeID(stream *s, int first, int skip_tombstones, streamID *edge_id); +long long streamEstimateDistanceFromFirstEverEntry(stream *s, streamID *id); int64_t streamTrimByLength(stream *s, long long maxlen, int approx); int64_t streamTrimByID(stream *s, streamID minid, int approx); -- cgit v1.2.1