Update documentation, add tutorial

author: David Schleef <ds@schleef.org> 2010-08-20 12:15:14 -0700
committer: David Schleef <ds@schleef.org> 2010-08-20 12:15:14 -0700
commit: 02ec311f66e21d23541830424bdb735a4d2df484 (patch)
tree: 022db8ec4ac72884074c4281dd478343f71af1c6 /doc/tutorial.xml
parent: 241a7ad309ee2969672500792933f6c0afe330ee (diff)
download: orc-02ec311f66e21d23541830424bdb735a4d2df484.tar.gz
1 files changed, 510 insertions, 0 deletions
diff --git a/doc/tutorial.xml b/doc/tutorial.xml
new file mode 100644
index 0000000..c8b8a62
--- /dev/null
+++ b/doc/tutorial.xml
@@ -0,0 +1,510 @@
+<?xml version="1.0"?>
+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
+<!ENTITY % version-entities SYSTEM "version.entities">
+%version-entities;
+<!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
+]>
+<refentry id="orc-tutorial" revision="29 may 2009">
+<refmeta>
+<refentrytitle>Orc Tutorial</refentrytitle>
+<manvolnum>3</manvolnum>
+<refmiscinfo>Orc</refmiscinfo>
+</refmeta>
+
+<refnamediv>
+<refname>Orc Tutorial</refname>
+<refpurpose>
+Getting started writing Orc code.
+</refpurpose>
+</refnamediv>
+
+<refsect1>
+<title>Orc Tutorial</title>
+
+  <para>
+    This section walks you through several examples of increasing
+    complexity to get you started working with Orc.  Each of these
+    examples is available in the Orc source code, in the examples
+    directory.  The first three examples use static Orc code that
+    is in a source file, and is compiled into intermediate C code
+    by the orcc tool.
+  </para>
+
+  <para>
+    The first example demonstrates how to add two arrays of 16-bit
+    signed integers together.  A possible use case for this is
+    combining two stereo audio streams together.
+  </para>
+
+  <para>
+    The second example builds from the first, replacing one of the
+    stereo input streams with a mono stream, converting it to stereo
+    in the process, and also adjusting the volume of the stream.
+  </para>
+
+  <para>
+    The third example shows how to convert a planar 4:2:0 video
+    image into a packed 4:4:4 video image with an alpha channel.
+  </para>
+
+</refsect1>
+
+<refsect1>
+<title>Example 1</title>
+
+<para>
+  This example demonstrates combining two stereo audio streams
+  by adding.  Uncompressed audio streams (i.e., PCM format) can
+  be in a variety of formats, but one of the most common is
+  interleaved signed 16-bit integers, and we will choose that
+  for the purposes of this example.  Extending to other formats
+  is left as an exercise for the reader.  Interleaved means that
+  left and right channel samples are consecutive: in memory, the
+  data look like LRLRLR...  The sampling rate is unimportant, as
+  long as both streams use the same rate.
+</para>
+
+<para>
+  One important feature/limitation of signed 16-bit audio samples
+  is that adding two together could cause an overflow.  For example,
+  adding the value 25000 to 10000 gives 35000, but this overflows
+  16 bits, so a standard addition would instead give the value
+  -30536 (35000-65536).  Overflows handled this way sound like
+  crackling or worse, so we would like a better solution.  One
+  solution is to use saturating addition: in this case, the addition
+  of 25000 and 10000 would be limited by the upper end of signed
+  16-bit values to give 32767.  Although this still causes
+  distortion in the output signal, it is much less audible and
+  annoying.
+</para>
+
+<para>
+  In normal C code, 16-bit saturating addition is difficult to express
+  without using 32-bit intermediates.  In Orc, saturating addition
+  is a basic operation with opcodes for each size, both signed and
+  unsigned.  In this case, we want "addssw", for "add signed saturated
+  word".
+</para>
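+
+<para>
+  For comparison, here is a minimal sketch of what 16-bit saturating
+  addition might look like in plain C using a 32-bit intermediate.
+  The helper name add_saturate_s16() is hypothetical and is not part
+  of Orc; it only illustrates the arithmetic that addssw performs on
+  each element.
+
+<programlisting>
+#include &lt;stdint.h&gt;
+
+/* Hypothetical helper, not part of Orc: signed 16-bit saturating add. */
+static int16_t
+add_saturate_s16 (int16_t a, int16_t b)
+{
+  /* The sum of two 16-bit values always fits in 32 bits. */
+  int32_t sum = (int32_t) a + (int32_t) b;
+
+  /* Clamp to the signed 16-bit range instead of wrapping around. */
+  if (sum &gt; INT16_MAX) return INT16_MAX;
+  if (sum &lt; INT16_MIN) return INT16_MIN;
+  return (int16_t) sum;
+}
+</programlisting>
+
+  With this helper, adding 25000 and 10000 yields 32767 rather than
+  wrapping around to -30536.
+</para>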
+
+<para>
+  Also, we're going to make one simplification: adding two
+  interleaved stereo streams is the same as adding two mono streams
+  with twice as many samples.  So we'll use 2*n_samples in the calling
+  code.
+</para>
+
+<para>
+  To the code:
+
+<programlisting>
+.function audio_add_s16
+.dest 2 d1 short
+.source 2 s1 short
+.source 2 s2 short
+
+addssw d1, s1, s2
+</programlisting>
+</para>
+
+<para>
+  Line by line:
+
+<programlisting>
+.function audio_add_s16
+</programlisting>
+
+  This starts a function.  A function (represented internally by the
+  object OrcProgram) is equivalent to a C function.  When you generate
+  C code from this Orc example using the orcc tool, it generates a C
+  stub function called "audio_add_s16()", which at runtime will
+  generate an OrcProgram object corresponding to the above code,
+  compile it, and then run it.
+
+<programlisting>
+.dest 2 d1 short
+</programlisting>
+
+  This specifies that you want a destination (output) array named "d1",
+  with the element size being 2.  Orc does not differentiate between
+  signed and unsigned arrays (or even floating point); however, you
+  may optionally specify a type afterwards that will be used in any
+  autogenerated C code.
+
+<programlisting>
+.source 2 s1 short
+.source 2 s2 short
+</programlisting>
+
+  This specifies that you want two source (input) arrays, "s1" and "s2",
+  similar to the destination array.
+
+<programlisting>
+addssw d1, s1, s2
+</programlisting>
+
+  This specifies the (only) opcode that we want for this program: signed
+  saturated addition of each member of the two source arrays, storing
+  the result in the destination array.
+</para>
+
+<para>
+  A few notes about the above program: The loop over the array members
+  is implied.  Everything that Orc does is based on looping over each
+  array element and executing the opcodes in a program.
+</para>
+
+<para>
+  When you generate C code from the above Orc code using
+  'orcc --implementation example1orc.orc',
+  you get a bunch of boilerplate code, plus three C functions:
+
+<programlisting>
+/* audio_add_s16 */
+#ifdef DISABLE_ORC
+void
+audio_add_s16 (short * d1, const short * s1, const short * s2, int n)
+{
+  ...
+}
+</programlisting>
+  
+  This function is used if DISABLE_ORC is defined.  As one might guess,
+  if you define DISABLE_ORC, no runtime Orc features are used, and all
+  calls to audio_add_s16() use this function.  The interior of the function
+  is a for() loop that implements the Orc function.  The generated code
+  may not necessarily be easy to read, but it is straightforward: all
+  the verbosity and use of unions is to avoid compiler warnings without
+  making the compiler too complex.  But this is the place to go if you
+  are trying to understand what Orc is doing.
+
+<programlisting>
+#else
+static void
+_backup_audio_add_s16 (OrcExecutor * ORC_RESTRICT ex)
+{
+  ...
+}
+</programlisting>
+ 
+  This function is used when runtime Orc is enabled, but Orc was unable
+  to generate code for the function at runtime.  There are various
+  reasons why that might happen -- unimplemented rules for a target, or
+  more temporary variables used than available registers.
+
+<programlisting>
+void
+audio_add_s16 (short * d1, const short * s1, const short * s2, int n)
+{
+  ...
+}
+</programlisting>
+
+  The third generated function is the important part: It is used when
+  Orc is enabled at runtime, and creates the OrcProgram corresponding
+  to the function you defined.  Then it compiles the function and
+  calls it.
+</para>
+
+<para>
+  After generating the C code, you should generate the header file,
+  using: 'orcc --header example1orc.orc -o example1orc.h'.
+  After similar boilerplate code, there is the expected declaration
+  of audio_add_s16():
+
+<programlisting>
+void audio_add_s16 (short * d1, const short * s1, const short * s2, int n);
+</programlisting>
+
+
+</para>
+
+<para>
+  Some C code to generate sample data, call the generated code, and
+  print out the results:
+
+<programlisting>
+#include &lt;stdio.h&gt;
+#include "example1orc.h"
+
+#define N 10
+
+short a[N];
+short b[N];
+short c[N];
+
+int
+main (int argc, char *argv[])
+{
+  int i;
+
+  /* Create some data in the source arrays */
+  for(i=0;i &lt; N;i++){
+    a[i] = 100*i;
+    b[i] = 32000;
+  }
+
+  /* Call a function that uses Orc */
+  audio_add_s16 (c, a, b, N);
+
+  /* Print the results */
+  for(i=0;i &lt; N;i++){
+    printf("%d: %d %d -&gt; %d\n", i, a[i], b[i], c[i]);
+  }
+
+  return 0;
+}
+</programlisting>
+</para>
+
+<para>
+  The output of the program:
+
+<programlisting>
+0: 0 32000 -> 32000
+1: 100 32000 -> 32100
+2: 200 32000 -> 32200
+3: 300 32000 -> 32300
+4: 400 32000 -> 32400
+5: 500 32000 -> 32500
+6: 600 32000 -> 32600
+7: 700 32000 -> 32700
+8: 800 32000 -> 32767
+9: 900 32000 -> 32767
+</programlisting>
+</para>
+
+</refsect1>
+
+<refsect1>
+<title>Example 2</title>
+
+<para>
+  In this example, we will expand on the previous example by making
+  one of the input arrays a mono stream, and also scale the mono
+  input stream by a volume.  Rather than iterating over each
+  signed 16-bit value, in this example we will iterate over samples,
+  meaning the member size for the stereo arrays is 4, since each
+  array member contains a left and right 16-bit value.
+</para>
+
+<para>
+<programlisting>
+.function audio_add_mono_to_stereo_scaled_s16
+.dest 4 d1 short
+.source 4 s1 short
+.source 2 s2 short
+.param 2 volume
+.temp 4 s2_scaled
+.temp 2 t
+.temp 4 s2_stereo
+
+mulswl s2_scaled, s2, volume
+shrsl s2_scaled, s2_scaled, 12
+convssslw t, s2_scaled
+mergewl s2_stereo, t, t
+x2 addssw d1, s1, s2_stereo
+</programlisting>
+
+  Piece by piece:
+
+<programlisting>
+.function audio_add_mono_to_stereo_scaled_s16
+.dest 4 d1 short
+.source 4 s1 short
+.source 2 s2 short
+</programlisting>
+  
+  This is the same as the previous example, except that the stereo
+  arrays are increased in size to 4.  However, we'll use the short
+  type, since Orc does not care what type we use, and short is 
+  the type of the array we want to use in the C code.
+
+<programlisting>
+.param 2 volume
+</programlisting>
+
+  This specifies a parameter, which is an integer that is passed to
+  an Orc function.  In the generated C code, parameters are always of
+  type int.  There are also float parameters for the floating point
+  equivalent.
+
+<programlisting>
+.temp 4 s2_scaled
+.temp 2 t
+.temp 4 s2_stereo
+</programlisting>
+
+  This specifies a few temporary variables that are used later in the
+  code.  These definitions are similar to defining local variables in
+  C code.  Note that the size is important:  each opcode has
+  specific sizes for source and destination operands, and it is
+  important to match these correctly with temporary variables.
+
+<programlisting>
+mulswl s2_scaled, s2, volume
+shrsl s2_scaled, s2_scaled, 12
+</programlisting>
+
+  This scales the mono input: signed multiply of s2 and volume, giving
+  a 32-bit value, and then a signed right shift by 12.  Since the
+  second operand of mulswl is 16-bit, only the lower 16 bits of
+  volume will be used in the multiply.  The right shift is
+  effectively the same as dividing by 4096.  Thus, a neutral scaling
+  that does not increase or decrease the mono input would correspond
+  to calling the function with a parameter value of 4096.
+
+<programlisting>
+convssslw t, s2_scaled
+mergewl s2_stereo, t, t
+</programlisting>
+
+  The first instruction is "convert saturated signed 32-bit to signed
+  16-bit", and the second merges the two values of (16-bit) t into the
+  high and low halves of s2_stereo.  This duplicates the mono signal
+  into the right and left channels.  It is important to use the
+  saturated conversion, since the effective scaling value may have
+  been greater than 1.0, thus the larger values may need to be clipped.
+
+<programlisting>
+x2 addssw d1, s1, s2_stereo
+</programlisting>
+
+  The "x2" prefix indicates that we want the operation specified to be
+  done twice, first to the upper half of all operands, and again
+  separately to the lower half of all operands.  Since addssw is
+  normally a 16-bit operation, the x2 prefix causes it to be a 32-bit
+  operation.  And so, it adds the newly created right and left values
+  of the scaled mono signal into the s1 signal.
+</para>
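+
+<para>
+  To make the data flow concrete, the following is a rough C sketch
+  of what the Orc program above computes for each sample.  The
+  function names clamp_s16() and add_mono_to_stereo_scaled_ref() are
+  hypothetical and are not generated by orcc.  The sketch also assumes
+  that the C compiler implements right shift of negative values as an
+  arithmetic shift, which is common but not guaranteed by the C
+  standard.
+
+<programlisting>
+#include &lt;stdint.h&gt;
+
+/* Saturate a 32-bit value to the signed 16-bit range (like convssslw). */
+static int16_t
+clamp_s16 (int32_t x)
+{
+  if (x &gt; INT16_MAX) return INT16_MAX;
+  if (x &lt; INT16_MIN) return INT16_MIN;
+  return (int16_t) x;
+}
+
+/* Hypothetical reference loop.  d1 and s1 hold n interleaved LR pairs,
+ * s2 holds n mono samples, and a volume of 4096 means unity gain. */
+static void
+add_mono_to_stereo_scaled_ref (int16_t *d1, const int16_t *s1,
+    const int16_t *s2, int volume, int n)
+{
+  int i;
+
+  for (i = 0; i &lt; n; i++) {
+    /* mulswl: 16x16 multiply to 32 bits, using the low 16 bits of volume */
+    int32_t scaled = (int32_t) s2[i] * (int16_t) volume;
+    int16_t t;
+
+    /* shrsl: signed shift right by 12 (assumed arithmetic, see above) */
+    scaled &gt;&gt;= 12;
+    /* convssslw: saturating conversion back to 16 bits */
+    t = clamp_s16 (scaled);
+    /* mergewl + x2 addssw: add t to both channels with saturation */
+    d1[2 * i] = clamp_s16 ((int32_t) s1[2 * i] + t);
+    d1[2 * i + 1] = clamp_s16 ((int32_t) s1[2 * i + 1] + t);
+  }
+}
+</programlisting>
+
+  Calling this sketch with a volume of 4096 adds the mono samples
+  unchanged, and a volume of 2048 adds them at half amplitude.
+</para>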
+
+<para>
+  There are several variations of the above program that might be
+  more suitable for a particular application.  This function only
+  handles a limited dynamic range of volume scaling factors; however,
+  by changing the shift constant, or turning the shift into a
+  parameter, the dynamic range can be increased significantly.
+</para>
+
+
+</refsect1>
+
+<refsect1>
+<title>Example 3</title>
+
+<para>
+  The third example shows how to convert a planar 4:2:0 video
+  image into a packed 4:4:4 video image with an alpha channel.  The
+  first format is often referred to as I420 and the second as AYUV.
+</para>
+
+<para>
+  For simplicity in the following discussion, we'll assume that the
+  image dimensions are 640x480.  The 4:2:0 subsampling means the
+  input chroma planes are 320x240 (subsampled by 2 in each direction).
+  These need to be upsampled to 640x480, then repacked with the input
+  Y plane, with an added dummy alpha value.  There are many ways to
+  perform upsampling; the simplest is to duplicate each value
+  horizontally and vertically.  The result is low quality, but
+  adequate for demonstration purposes.
+</para>
+
+<para>
+  There are several choices for the Orc array size and dimensionality.
+  Iterating vertically can be done in the C code or in the Orc code.  If
+  done in the Orc code, we would need to use an array size of 240 and
+  have two separate arrays for the even and odd Y rows.  If done in the
+  C code, there is no such limitation.  Horizontally, the story is
+  different: we can use the loadupdb opcode to duplicate each byte in
+  the U and V arrays, so we can iterate over 640 array elements.  It
+  is also possible to iterate over 320 elements and duplicate the U
+  and V elements using mergebw.  There is a very slight speed
+  advantage to iterating vertically in Orc, so we do that here.  For
+  demonstration purposes, we will use the loadupdb opcode, thus we
+  will be iterating over 640x240 elements.
+</para>
+
+<para>
+  The code:
+
+<programlisting>
+.function convert_I420_AYUV
+.flags 2d
+.dest 4 d1
+.dest 4 d2
+.source 1 y1
+.source 1 y2
+.source 1 u
+.source 1 v
+.const 1 c255 255
+.temp 2 uv
+.temp 2 ay
+.temp 1 tu
+.temp 1 tv
+
+loadupdb tu, u
+loadupdb tv, v
+mergebw uv, tu, tv
+mergebw ay, c255, y1
+mergewl d1, ay, uv
+mergebw ay, c255, y2
+mergewl d2, ay, uv
+</programlisting>
+
+  A few things of note: The ".flags 2d" line is used to indicate that
+  Orc should iterate over two dimensions, and generate a prototype that
+  includes row strides for each array and a size parameter for the
+  second dimension.
+</para>
+
+<para>
+  Since we are working on two input Y lines and two output AYUV lines
+  at a time, we need two source and destination arrays corresponding
+  to the even and odd lines.  The row strides for these are doubled
+  compared to the normal 2-D array.
+</para>
+
+<para>
+  The mergebw and mergewl opcodes join two 8-bit values into one 16-bit
+  value (or 16-bit values into a 32-bit value) by concatenating them
+  in memory order.  Thus, to get AYUV in memory order, we merge AY and
+  UV, and to get UV, we merge U and V.  Since we're duplicating each
+  U and V line, we use the same UV value for the even and odd output
+  lines.
+</para>
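+
+<para>
+  As a cross-check of the byte layout, here is a rough C sketch of
+  what the Orc program computes for one pair of rows.  The function
+  name i420_to_ayuv_row_pair_ref() is hypothetical and is not
+  generated by orcc.  Unlike the Orc version, the sketch writes the
+  output one byte at a time, so the A, Y, U, V memory order does not
+  depend on the endianness of the machine.
+
+<programlisting>
+#include &lt;stdint.h&gt;
+
+/* Hypothetical reference: convert two Y rows plus one U row and one
+ * V row into two AYUV rows of `width` pixels each. */
+static void
+i420_to_ayuv_row_pair_ref (uint8_t *d1, uint8_t *d2,
+    const uint8_t *y1, const uint8_t *y2,
+    const uint8_t *u, const uint8_t *v, int width)
+{
+  int i;
+
+  for (i = 0; i &lt; width; i++) {
+    /* loadupdb: each chroma byte is used for two adjacent pixels */
+    uint8_t tu = u[i / 2];
+    uint8_t tv = v[i / 2];
+
+    /* Even output row: A, Y, U, V in memory order */
+    d1[4 * i + 0] = 255;
+    d1[4 * i + 1] = y1[i];
+    d1[4 * i + 2] = tu;
+    d1[4 * i + 3] = tv;
+
+    /* Odd output row reuses the same U and V values */
+    d2[4 * i + 0] = 255;
+    d2[4 * i + 1] = y2[i];
+    d2[4 * i + 2] = tu;
+    d2[4 * i + 3] = tv;
+  }
+}
+</programlisting>
+
+  Calling this once per pair of rows, 240 times for a 640x480 image,
+  covers the same ground as the second dimension of the 2-D Orc
+  function.
+</para>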
+
+<para>
+  The prototype that is generated is:
+
+<programlisting>
+void convert_I420_AYUV (orc_uint32 * d1, int d1_stride, orc_uint32 * d2,
+  int d2_stride, const orc_uint8 * s1, int s1_stride, const orc_uint8 * s2,
+  int s2_stride, const orc_uint8 * s3, int s3_stride, const orc_uint8 * s4,
+  int s4_stride, int n, int m);
+</programlisting>
+
+  The orcc tool unhelpfully changed the names of the parameters;
+  however, the order is standard: first destinations, then sources, then
+  parameters, then array sizes.  Think of it like memcpy() or memset().
+</para>
+
+<para>
+  Calling the function:
+
+<programlisting>
+convert_I420_AYUV (output, 1280*4, output + 640, 1280 * 4,
+    input_y, 1280, input_y + 640, 1280,
+    input_u, 320, input_v, 320,
+    640, 240);
+</programlisting>
+
+</para>
+
+</refsect1>
+
+</refentry>
+