Added some basic implementation documentation for stored procedures.

author: unknown <pem@mysql.com> 2003-02-02 17:49:42 +0100
committer: unknown <pem@mysql.com> 2003-02-02 17:49:42 +0100
commit: 77f30e19a6da4d17a5e282e5967cf4c61b35ddec (patch)
tree: 5336654be31b33f8285f915b0af7ccad2375fd86 /Docs/sp-imp-spec.txt
parent: a3111ad09b074dcf7c95b0a055e0bbbf6005b2ba (diff)
download: mariadb-git-77f30e19a6da4d17a5e282e5967cf4c61b35ddec.tar.gz
1 files changed, 469 insertions, 0 deletions
diff --git a/Docs/sp-imp-spec.txt b/Docs/sp-imp-spec.txt
new file mode 100644
index 00000000000..23559497cae
--- /dev/null
+++ b/Docs/sp-imp-spec.txt
@@ -0,0 +1,469 @@
+
+	      Implementation specification for Stored Procedures
+	      ==================================================
+
+This is a first draft, only covering the basics for parsing, creating, and
+calling a PROCEDURE.
+
+
+- How parsing and execution of queries work
+
+  In order to execute a query, the function sql_parse.cc:mysql_parse() is
+  called, which in turn calls the parser (yyparse()) with an updated Lex
+  structure as the result. mysql_parse() then calls mysql_execute_command()
+  which dispatches on the command code (in Lex) to the corresponding code for
+  executing that particular query.
+
+  There are thre structures involved in the execution of a query which are of
+  interest to the stored procedure implementation:
+
+  - Lex (mentioned above) is the "compiled" query, that is the output from
+    the parser and what is then interpreted to do the actual work.
+    It constains an enum value (sql_command) which is the query type, and
+    all the data collected by the parser needed for the execution  (table
+    names, fields, values, etc).
+  - THD is the "run-time" state of a connection, containing all that is
+    needed for a particular client connection, and, among other things, the
+    Lex structure currently being executed.
+  - Item_*:  During parsing, all data is translated into "items", objects of
+    the subclasses of "Item", such as Item_int, Item_real, Item_string, etc,
+    for basic datatypes, and also various more specialized Item types for
+    expressions to be evaluated (Item_func objects).
+
+
+- How to fit Stored Procedure into this scheme
+
+  - An overview of the classes and files for stored procedures
+    (More detailed APIs at the end of this file)
+
+    - class sp_head (sp_head.{cc,h})
+      This contains, among other things, an array of "instructions" and the
+      method for executing the procedure.
+
+    - class sp_pcontext (sp_pcontext.{cc,h}
+      This is the parse context for the procedure. It's primarily used during
+      parsing to keep track of local parameters, variables and labels, but
+      it's also used at CALL time do find parameters mode (IN, OUT or INOUT)
+      and type when setting up the runtime context.
+
+    - class sp_instr (sp_head.{cc,h})
+      This is the base class for "instructions", that is, what is generated
+      by the parser. It turns out that we only need 4 different sub classes:
+      - sp_instr_stmt
+        Execute a statement. This is the "call-out" any normal SQL statement,
+        like a SELECT, INSERT etc. It contains the Lex structure for the
+        statement in question.
+      - sp_instr_set
+        Set the value of a local variable (or parameter)
+      - sp_instr_jump
+        An unconditional jump.
+      - sp_instr_jump_if_not
+        Jump if condition is not true. It turns out that the negative test is
+        most convenient when generating the code for the flow control
+        constructs.
+
+    - class sp_rcontext (sp_rcontext.h)
+      This is the runtime context in the THD structure.
+      It contains an array of items, the parameters and local variables for
+      the currently executing stored procedure.
+      This means that variable value lookup is in runtime is constant time,
+      a simple index operation.
+
+    - class Item_splocal (Item.{cc,h})
+      This is a subclass of Item. Its sole purpose is to hide the fact that
+      the real Item is actually in the current frame (runtime context).
+      It contains the frame offset and defers all methods to the real Item
+      in the frame. This is what the parser generates for local variables.
+
+    - Utility functions (sp.{cc,h})
+      This contains functions for creating, dropping and finding a stored
+      procedure in the mysql.proc table (or internal cache, when it is
+      implemented).
+
+
+  - Parsing CREATE PROCEDURE ...
+
+    When parsing a CREATE PROCEDURE the parser first initializes the
+    sphead and spcont (runtime context) fields in the Lex.
+    The sql_command code for the result of parsing a is
+    SQLCOM_CREATE_PROCEDURE.
+
+    The parsing of the parameter list and body is relatively
+    straight-forward:
+
+    - Parameters:
+        name, type and mode (IN/OUT/INOUT) is pushed to spcont
+    - Declared local variables:
+        Same as parameters (mode is then IN)
+    - Local Variable references:
+        If an identifier is found in in spcont, an Item_splocal is created
+        with the variable's frame index, otherwise an Item_field or Item_ref
+        is created (as before).
+    - Statements:
+        The Lex in THD is replaced by a new Lex structure and the statement,
+        is parsed as usual. A sp_instr_stmt is created, containing the new
+        Lex, and added to added to the instructions in  sphead.
+        Afterwards, the procedure's Lex is restored in THD.
+    - SET var:
+        Setting a local variable generates a sp_instr_set instruction,
+        containing the variable's frame offset, the expression (an Item),
+        and the type.
+    - Flow control:
+        Flow control constructs like, IF, WHILE, etc, generate a conditional
+        and unconditional jumps in the "obvious" way, but a few notes may
+        be required:
+        - Forward jumps: When jumping forward, the exact destination is not
+          known at the time of the creation of the jump instruction. The
+          sphead therefore contains list of instruction-label pairs for
+          each forward reference. When the position later is known, the
+          instructions in the list are updated with the correct location.
+        - Loop constructs have optional labels. If a loop doesn't have a
+          label, an anonymous label is generated to simplify the parsing.
+        - There are two types of CASE. The "simple" case is implemented
+          with an anonymous variable bound to the value to be tested.
+
+
+    - An example
+
+      Parsing the procedure:
+
+      create procedure a(s char(16))
+      begin
+        declare x int;
+        set x = 3;
+        while x > 0 do
+          set x = x-1;
+          insert into db.tab values (x, s);
+        end while
+      end
+
+      would generate the following structures:
+            ______
+      thd: |      |     _________
+           | lex -+--->|         |                     ___________________
+           |______|    | spcont -+------------------->| "s",in,char(16):0 |
+                       | sphead -+------              |("x",in,int     :1)|
+                       |_________|      |             |___________________|
+                                    ____V__________________
+                                   | m_name: "a"           |
+                                   | m_defstr: "create ..."|
+                                   | m_instr: ...          |
+                                   |_______________________|
+
+      Note that the contents of the spcont is changing during the parsing,
+      at all times reflecting the state of the would-be runtime frame.
+      The m_instr is an array of instructions:
+
+      Pos.  Instruction
+       0    sp_instr_set(1, '3')
+       1    sp_instr_jump_if_not(5, 'x>0')
+       2    sp_instr_set(1, 'x-1')
+       3    sp_instr_stmt('insert into ...')
+       4    sp_instr_jump(1)
+       5    <end>
+
+     Here, '3', 'x>0', etc, represent the Items or Lex for the respective
+     expressions or statements.
+
+
+  - Storing, caching, dropping...
+
+    As seen above, the entired definition string, including the "CREATE
+    PROCEDURE" is kept. The procedure definition string is stored in the
+    table mysql.proc with the name as the key.
+
+    This means that we can reparse the procedure as many time as we want.
+    The first time, the resulting Lex is used to store the procedure in
+    the database (using the function sp.c:sp_create_procedure()).
+
+    The simplest way would be to just leave it at that, and re-read the
+    procedure from the database each time it is called. (And in fact, that's
+    the way the earliest implementation will work.)
+    However, this is not very efficient, and we can do better. The full
+    implementation should work like this:
+
+    1) Upon creation time, parse and store the procedure. Note that we still
+       need to parse it to catch syntax errors, but we can't check if called
+       procedures exists for instance.
+    2) Upon first CALL, read from the database, parse it, and cache the
+       resulting Lex in memory. This time we can do more error checking.
+    3) Upon subsequent CALLs, use the cached Lex.
+
+    Note that this implies that the Lex structure with its sphead must be
+    reentrant, that is, reusable and shareable between different threads
+    and calls. The runtime state for a procedure is kept in the sp_rcontext
+    in THD.
+
+    The mechanisms of storing, finding, and dropping procedures are
+    encapsulated in the files sp.{cc,h}.
+
+
+  - CALL
+
+    A CALL is parsed just like any statement. The resulting Lex has the
+    sql_command SQLCOM_CALL, the procedure's name and the parameters are
+    pushed to the Lex' value_list.
+
+    sql_parse.cc:mysql_execute_command() then uses sp.cc:sp_find() to
+    get the sp_head for the procedure (which may have been read from the
+    database or feetched from the in-memory cache) and calls the sp_head's
+    method execute().
+    Note: It's important that substatements called by the procedure do not
+          do send_ok(). Fortunately, there is a flag in THD->net to disable
+          this during CALLs. If a substatement fails, it will however send
+          an error back to the client, so the CALL mechanism must return
+          immediately and without sending an error.
+
+    The sp_head::execute() method works as follows:
+
+    1) Keep a pointer to the old runtime context in THD (if any)
+    2) Create a new runtime context. The information about the required size
+       is in sp_head's parse time context.
+    3) Push each parameter (from the CALL's Lex->value_list) to the new
+       context. If it's an OUT or INOUT parameter, the parameter's offset
+       in the caller's frame is set in the new context as well.
+    4) For each instruction, call its execute() method.
+       The result is a pointer to the next instruction to execute (or NULL)
+       if an error occured.
+    5) On success, set the new values of the OUT and INOUT parameters in
+       the caller's frame.
+
+
+    - Evaluating Items
+
+      There are three occasions where we need to evaluate an expression:
+
+      - When SETing a variable
+      - When CALLing a procedure
+      - When testing an expression for a branch (in IF, WHILE, etc)
+
+      The semantics in stored procedures is "call-by-value", so we have to
+      evaluate any "func" Items at the point of the CALL or SET, otherwise
+      we would get a kind of "lazy" evaluation with unexpected results with
+      respect to OUT parameters for instance.
+      For this the support function, sp_head.cc:eval_func_item() is needed.
+
+
+  - Parsing DROP PROCEDURE
+
+    The procedure name is pushed to Lex->value_list.
+    The sql_command code for the result of parsing a is
+    SQLCOM_DROP_PROCEDURE.
+
+    Dropping is done by simply getting the procedure with the sp_find()
+    function and calling sp_drop() (both in sp.{cc,h}).
+
+
+  - Class and function APIs
+
+    - The parser context: sp_pcontext.h
+
+      typedef enum
+      {
+        sp_param_in,
+	sp_param_out,
+	sp_param_inout
+      } sp_param_mode_t;
+
+      typedef struct
+      {
+        Item_string *name;
+	enum enum_field_types type;
+	sp_param_mode_t mode;
+	uint offset;			// Offset in current frame
+	my_bool isset;
+      } sp_pvar_t;
+
+      class sp_pcontext
+      {
+        sp_pcontext();
+
+	// Return the maximum frame size
+	uint max_framesize();
+
+	// Return the current frame size
+	uint current_framesize();
+
+	// Return the number of parameters
+	uint params();
+
+	// Set the number of parameters to the current frame size
+	void set_params();
+
+	// Set type of the variable at offset 'i' in the frame
+	void set_type(uint i, enum enum_field_types type);
+
+	// Mark the i:th variable to "set" (i.e. having a value) with
+	// 'val' true.
+	void set_isset(uint i, my_bool val);
+
+	// Push the variable 'name' to the frame.
+        void push(LEX_STRING *name,
+	          enum enum_field_types type, sp_param_mode_t mode);
+
+        // Pop 'num' variables from the frame.
+        void pop(uint num = 1);
+
+	// Find variable by name
+	sp_pvar_t *find_pvar(LEX_STRING *name);
+
+	// Find variable by index
+	sp_pvar_t *find_pvar(uint i);
+
+	// Push label 'name' of instruction index 'ip' to the label context
+	sp_label_t *push_label(char *name, uint ip);
+
+	// Find label 'name' in the context
+	sp_label_t *find_label(char *name);
+
+	// Return the last pushed label
+	sp_label_t *last_label();
+
+	// Return and remove the last pushed label.
+	sp_label_t *pop_label();
+      }
+
+
+    - The run-time context (call frame): sp_rcontext.h
+
+    class sp_rcontext
+    {
+      // 'size' is the max size of the context
+      sp_rcontext(uint size);
+
+      // Push value (parameter) 'i' to the frame
+      void push_item(Item *i);
+
+      // Set slot 'idx' to value 'i'
+      void set_item(uint idx, Item *i);
+
+      // Return the item in slot 'idx'
+      Item *get_item(uint idx);
+
+      // Set the "out" index 'oidx' for slot 'idx. If it's an IN slot,
+      // use 'oidx' -1.
+      void set_oindex(uint idx, int oidx);
+
+      // Return the "out" index for slot 'idx'
+      int get_oindex(uint idx);
+    }
+
+
+    - The procedure: sp_head.h
+
+      class sp_head
+      {
+        sp_head(LEX_STRING *name, LEX*);
+
+        // Store this procedure in the database. This is a wrapper around
+        // the function sp_create_procedure().
+        int create(THD *);
+
+        // CALL this procedure.
+        int execute(THD *);
+
+        // Add the instruction to this procedure.
+        void add_instr(sp_instr *);
+
+        // Return the number of instructions.
+        uint instructions();
+
+        // Resets lex in 'thd' and keeps a copy of the old one.
+        void reset_lex(THD *);
+
+        // Restores lex in 'thd' from our copy, but keeps some status from the
+        // one in 'thd', like ptr, tables, fields, etc.
+        void restore_lex(THD *);
+
+        // Put the instruction on the backpatch list, associated with
+        // the label.
+        void push_backpatch(sp_instr *, struct sp_label *);
+
+        // Update all instruction with this label in the backpatch list to
+        // the current position.
+        void backpatch(struct sp_label *);
+      }
+
+    - Instructions
+
+      - The base class:
+        class sp_instr
+        {
+          // 'ip' is the index of this instruction
+          sp_instr(uint ip);
+
+          // Execute this instrution.
+          // '*nextp' will be set to the index of the next instruction
+          // to execute. (For most instruction this will be the
+          // instruction following this one.)
+          // Returns 0 on success, non-zero if some error occured.
+          virtual int execute(THD *, uint *nextp)
+        }
+
+      - Statement instruction:
+        class sp_instr_stmt : public sp_instr
+        {
+          sp_instr_stmt(uint ip);
+
+          int execute(THD *, uint *nextp);
+
+	  // Set the statement's Lex
+          void set_lex(LEX *);
+
+	  // Return the statement's Lex
+	  LEX *get_lex();
+        }
+
+     -  SET instruction:
+        class sp_instr_set : public sp_instr
+	{
+	  // 'offset' is the variable's frame offset, 'val' the value,
+	  // and 'type' the variable type.
+	  sp_instr_set(uint ip,
+	               uint offset, Item *val, enum enum_field_types type);
+
+	  int execute(THD *, uint *nextp);
+	}
+
+      - Unconditional jump
+        class sp_instr_jump : public sp_instr
+	{
+	  // No destination, must be set.
+	  sp_instr_jump(uint ip);
+
+	  // 'dest' is the destination instruction index.
+	  sp_instr_jump(uint ip, uint dest);
+
+	  virtual int execute(THD *, uint *nextp);
+
+	  // Set the destination instruction 'dest'.
+	  void set_destination(uint dest);
+        }
+
+      - Conditional jump
+        class sp_instr_jump_if_not : public sp_instr_jump
+	{
+	  // Jump if 'i' evaluates to false. Destination not set yet.
+	  sp_instr_jump_if_not(uint ip, Item *i);
+
+	  // Jump to 'dest' if 'i' evaluates to false.
+	  sp_instr_jump_if_not(uint ip, Item *i, uint dest)
+
+	  int execute(THD *, uint *nextp);
+        }
+
+    - Utility functions: sp.h
+
+      // Finds a stored procedure given its name. Returns NULL if not found.
+      sp_head *sp_find(THD *, Item_string *name);
+
+      // Store the procedure 'name' in the database. 'def' is the complete
+      // definition string ("create procedure ...").
+      int sp_create_procedure(THD *,
+                              char *name, uint namelen,
+			      char *def, uint deflen);
+
+      // Drop the procedure 'name' from the database.
+      int sp_drop(THD *, char *name, uint namelen);
+
+--
author	unknown <pem@mysql.com>	2003-02-02 17:49:42 +0100
committer	unknown <pem@mysql.com>	2003-02-02 17:49:42 +0100
commit	77f30e19a6da4d17a5e282e5967cf4c61b35ddec (patch)
tree	5336654be31b33f8285f915b0af7ccad2375fd86 /Docs/sp-imp-spec.txt
parent	a3111ad09b074dcf7c95b0a055e0bbbf6005b2ba (diff)
download	mariadb-git-77f30e19a6da4d17a5e282e5967cf4c61b35ddec.tar.gz