summaryrefslogtreecommitdiff
path: root/doc/tutorial.xml
diff options
context:
space:
mode:
authorDavid Schleef <ds@schleef.org>2010-08-20 12:15:14 -0700
committerDavid Schleef <ds@schleef.org>2010-08-20 12:15:14 -0700
commit02ec311f66e21d23541830424bdb735a4d2df484 (patch)
tree022db8ec4ac72884074c4281dd478343f71af1c6 /doc/tutorial.xml
parent241a7ad309ee2969672500792933f6c0afe330ee (diff)
downloadorc-02ec311f66e21d23541830424bdb735a4d2df484.tar.gz
Update documentation, add tutorial
Diffstat (limited to 'doc/tutorial.xml')
-rw-r--r--doc/tutorial.xml510
1 files changed, 510 insertions, 0 deletions
diff --git a/doc/tutorial.xml b/doc/tutorial.xml
new file mode 100644
index 0000000..c8b8a62
--- /dev/null
+++ b/doc/tutorial.xml
@@ -0,0 +1,510 @@
+<?xml version="1.0"?>
+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+ "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
+<!ENTITY % version-entities SYSTEM "version.entities">
+%version-entities;
+<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
+]>
+<refentry id="orc-tutorial" revision="29 may 2009">
+<refmeta>
+<refentrytitle>Orc Tutorial</refentrytitle>
+<manvolnum>3</manvolnum>
+<refmiscinfo>Orc</refmiscinfo>
+</refmeta>
+
+<refnamediv>
+<refname>Orc Tutorial</refname>
+<refpurpose>
+Getting started writing Orc code.
+</refpurpose>
+</refnamediv>
+
+<refsect1>
+<title>Orc Tutorial</title>
+
+ <para>
+ This section walks you through several examples of increasing
+ complexity to get you started working with Orc. Each of these
+ examples are available in the Orc source code, in the examples
+ directory. The first three examples use static Orc code that
+ is in a source file, and is compiled into intermediate C code
+ by the orcc tool.
+ </para>
+
+ <para>
+ The first example demonstrates how to add two arrays of 16-bit
+ signed integers together. A possible use case for this is
+ combining two stereo audio streams together.
+ </para>
+
+ <para>
+ The second example builds from the first, replacing one of the
+ stereo input streams with a mono stream, converting it to stereo
+ in the process, and also adjusting the volume of the stream.
+ </para>
+
+ <para>
+ The third example shows how to convert a planar 4:2:0 video
+ image into a packed 4:4:4 video image with an alpha channel.
+ </para>
+
+</refsect1>
+
+<refsect1>
+<title>Example 1</title>
+
+<para>
+ This example demonstrates combining two stereo audio streams
+ by adding. Uncompressed audio streams (i.e., PCM format) can
+ be in a variety of formats, but one of the most common is
+ interleaved signed 16-bit integers, and we will choose that
+ for the purposes of this example. Extending to other formats
+ is left as an exercise for the reader. Interleaved means that
+ left and right channel samples are consecutive: in memory, the
+ data look like LRLRLR... The sampling rate is unimportant, as
+ long as both streams are the same.
+</para>
+
+<para>
+ One important feature/limitation of signed 16-bit audio samples
+ is that adding two together could cause an overflow. For example,
+ adding the value 25000 to 10000 gives 35000, but this overflows
+ 16 bits, so a standard addition would instead give the value
+ -30536 (35000-65536). Overflows handled this way sound like
+ crackling or worse, so we would like a better solution. One
+ solution is to use saturating addition: in this case, the addition
+ of 25000 and 10000 would be limited by the upper end of signed
+ 16-bit values to give 32767. Although this still causes
+ distortion in the output signal, it is much less audible and
+ annoying.
+</para>
+
+<para>
+ In normal C code, 16-bit saturating addition is difficult to express
+ without using 32-bit intermediates. In Orc, saturating addition
+ is a basic operation with opcodes for each size, both signed and
+ unsigned. In this case, we want "addssw", for "add signed saturated
+ word".
+</para>
+
+<para>
+ Also, we're going to make a one simplification: Adding two
+ interleaved stereo streams is the same as adding two mono streams
+ with twice as many samples. So we'll use 2*n_samples in the calling
+ code.
+</para>
+
+<para>
+ To the code:
+
+<programlisting>
+.function audio_add_s16
+.dest 2 d1
+.source 2 s1
+.source 2 s2
+
+addssw d1, s1, s2
+</programlisting>
+</para>
+
+<para>
+ Line by line:
+
+<programlisting>
+.function audio_add_s16
+</programlisting>
+
+ This starts a function. A function (represented internally by the
+ object OrcProgram) is equivalent to a C function. When you generate
+ C code from this Orc exmaple using the orcc tool, it generates a C
+ stub function called "audio_add_s16()", which at runtime will
+ generate an OrcProgram object corresponding to the above code,
+ compile it, and then run it.
+
+<programlisting>
+.dest 2 d1 short
+</programlisting>
+
+ This specifies that you want a destination (output) array named "d1",
+ with the element size being 2. Orc does not differentiate between
+ signed and unsigned arrays (or even floating point), however, you
+ may optionally specify a type afterwards that will be used in any
+ autogenerated C code.
+
+<programlisting>
+.source 2 s1 short
+.source 2 s2 short
+</programlisting>
+
+ This specifies that you want two source (input) arrays, "s1" and "s2",
+ similar to the destination array.
+
+<programlisting>
+addssw d1, s1, s2
+</programlisting>
+
+ This specifies the (only) opcode that we want for this program: signed
+ saturated addition of each member of the two source arrays, and store
+ the result in the destination array.
+</para>
+
+<para>
+ A few notes about the above program: The loop over the array members
+ is implied. Everything that Orc does is based on looping over each
+ array element and executing the opcodes in a program.
+</para>
+
+<para>
+ When you generate C code from the above Orc code using
+ 'orcc --implementation example1.orc',
+ you get a bunch of boilerplate code, plus three C functions:
+
+<programlisting>
+/* audio_add_s16 */
+#ifdef DISABLE_ORC
+void
+audio_add_s16 (int16 * d1, const int16 * s1, const int16 * s2, int n)
+{
+ ...
+}
+</programlisting>
+
+ This function is used if DISABLE_ORC is defined. As one might guess,
+ if you define DISABLE_ORC, no runtime Orc features are used, and all
+ calls to audio_add_s16() use this function. The interior of the function
+ is a for() loop that implements the Orc function. The generated code
+ may not necessarily be easy to read, but it is straightforward: all
+ the verbosity and use of unions is to avoid compiler warnings without
+ making the compiler too complex. But this is the place to go if you
+ are trying to understand what Orc is doing.
+
+<programlisting>
+#else
+static void
+_backup_audio_add_s16 (OrcExecutor * ORC_RESTRICT ex)
+{
+ ...
+}
+</programlisting>
+
+ This function is used when runtime Orc is enabled, but Orc was unable
+ to generate code for the function at runtime. There are various
+ reasons why that might happen -- unimplemented rules for a target, or
+ more temporary variables used than available registers.
+
+<programlisting>
+void
+audio_add_s16 (short * d1, const short * s1, const short * s2, int n)
+{
+ ...
+}
+</programlisting>
+
+ The third generated function is the important part: It is used when
+ Orc is enabled at runtime, and creates the OrcProgram corresponding
+ to the function you defined. Then it compiles the function and
+ calls it.
+</para>
+
+<para>
+ After generating the C code, you should generate the header file,
+ using: 'orcc --header example1orc.orc -o example1orc.h'.
+ After similar boilerplate code, there is the expected declaration
+ of audio_add_s16():
+
+<programlisting>
+void audio_add_s16 (short * d1, const short * s1, const short * s2, int n);
+</programlisting>
+
+
+</para>
+
+<para>
+ Some C code to generate sample data, call the generated code, and
+ print out the results:
+
+<programlisting>
+#include &lt;stdio.h&gt;
+#include "example1orc.h"
+
+#define N 10
+
+short a[N];
+short b[N];
+short c[N];
+
+int
+main (int argc, char *argv[])
+{
+ int i;
+
+ /* Create some data in the source arrays */
+ for(i=0;i &lt; N;i++){
+ a[i] = 100*i;
+ b[i] = 32000;
+ }
+
+ /* Call a function that uses Orc */
+ audio_add_s16 (c, a, b, N);
+
+ /* Print the results */
+ for(i=0;i &lt; N;i++){
+ printf("%d: %d %d -&gt; %d\n", i, a[i], b[i], c[i]);
+ }
+
+ return 0;
+}
+</programlisting>
+</para>
+
+<para>
+ The output of the program:
+
+<programlisting>
+0: 0 32000 -> 32000
+1: 100 32000 -> 32100
+2: 200 32000 -> 32200
+3: 300 32000 -> 32300
+4: 400 32000 -> 32400
+5: 500 32000 -> 32500
+6: 600 32000 -> 32600
+7: 700 32000 -> 32700
+8: 800 32000 -> 32767
+9: 900 32000 -> 32767
+</programlisting>
+</para>
+
+<para>
+
+</para>
+
+</refsect1>
+
+<refsect1>
+<title>Example 2</title>
+
+<para>
+ In this example, we will expand on the previous example by making
+ one of the input arrays a mono stream, and also scale the mono
+ input stream by a volume. Rather than iterating over each
+ signed 16-bit value, in this example we will iterate over samples,
+ meaning the member size for the stereo arrays is 4, since each
+ array member contains a left and right 16 bit value.
+</para>
+
+<para>
+<programlisting>
+.function audio_add_mono_to_stereo_scaled_s16
+.dest 4 d1 short
+.source 4 s1 short
+.source 2 s2 short
+.param 2 volume
+.temp 4 s2_scaled
+.temp 2 t
+.temp 4 s2_stereo
+
+mulswl s2_scaled, s2, volume
+shrsl s2_scaled, s2_scaled, 12
+convssslw t, s2_scaled
+mergewl s2_stereo, t, t
+x2 addssw d1, s1, s2_stereo
+</programlisting>
+
+ Piece by piece:
+
+<programlisting>
+.function audio_add_mono_to_stereo_scaled_s16
+.dest 4 d1 short
+.source 4 s1 short
+.source 2 s2 short
+</programlisting>
+
+ This is the same as the previous example, except that the stereo
+ arrays are increased in size to 4. However, we'll use the short
+ type, since Orc does not care what type we use, and short is
+ the type of the array we want to use in the C code.
+
+<programlisting>
+.param 2 volume
+</programlisting>
+
+ This specifies a parameter, which is an integer that is passed to
+ an Orc function. In the generated C code, parameters are always of
+ type int. There are also float parameters for the floating point
+ equivalent.
+
+<programlisting>
+.temp 4 s2_scaled
+.temp 2 t
+.temp 4 s2_stereo
+</programlisting>
+
+ This specifies a few temporary variables that are used later in the
+ code. These definitions are similar to defining local variables in
+ C code. Note that the size is important: each opcode has
+ specific sizes for source and destination operands, and it is
+ important to match these correctly with temporary variables.
+
+<programlisting>
+mulswl s2_scaled, s2, volume
+shrsl s2_scaled, s2_scaled, 12
+</programlisting>
+
+ This scales the mono input: signed multiply of s2 and volume, giving
+ a 32-bit value, and then a signed right shift by 12. Since the
+ second operand of mulswl is 16-bit, only the lower 16 bits of
+ volume will be used in the multiply. The right shift is
+ effectively the same as dividing by 4096. Thus, a neutral scaling
+ that does not increase or decrease the mono input would correspond
+ to calling the function with a parameter value of 4096.
+
+<programlisting>
+convssslw t, s2_scaled
+mergewl s2_stereo, t, t
+</programlisting>
+
+ The first instruction is "convert saturated signed 32-bit to signed
+ 16-bit", and the second merges the two values of (16 bit) t into the
+ high and low halves of s2_stereo. This duplicates the mono signal
+ into the right and left channels. It is important to use the
+ saturated conversion, since the effective scaling value may have
+ been greater than 1.0, thus the larger values may need to be clipped.
+
+<programlisting>
+x2 addssw d1, s1, s2_stereo
+</programlisting>
+
+ The "x2" prefix indicates that we want the operation specified to be
+ done twice, first to the upper half of all operands, and again
+ separately to the lower half of all operands. Since addssw is
+ normally a 16-bit operation, the x2 prefix causes it to be a 32-bit
+ operation. And so, it adds the newly created right and left values
+ of the scaled mono signal into the s1 signal.
+</para>
+
+<para>
+ There are several variations of the above program that might be
+ more suitable for a particular application. This function only
+ handles a limited dynamic range of volume scaling factors, however,
+ by changing the shift constant, or turning the shift into a
+ parameter, the dynamic range can be increased significantly.
+</para>
+
+
+</refsect1>
+
+<refsect1>
+<title>Example 3</title>
+
+<para>
+ The third example shows how to convert a planar 4:2:0 video
+ image into a packed 4:4:4 video image with an alpha channel. The
+ first format is often referred to as I420 and the second as AYUV.
+</para>
+
+<para>
+ For simplicity in the following discussion, we'll assume that the
+ image dimensions are 640x480. The 4:2:0 subsampling means the
+ input chroma planes are 320x240 (subsampled by 2 in each direction).
+ These need to be upsampled to 640x480, then repacked with the input
+ Y plane, with an added dummy alpha value. There are many ways to
+ perform upsampling; the simplest is to duplicate each value
+ horizontally and vertically. The result is low quality, but
+ adequate for demonstration purposes.
+</para>
+
+<para>
+ There are several choices for the Orc array size and dimensionality.
+ Iterating vertically can be done in the C code or in the Orc code. If
+ done in the Orc code, we would need to use an array size of 240 and
+ have two separate arrays for the even and odd Y rows. If done in the
+ C code, there is no such limitation. Horizontally, the story is
+ different: we can use the loadupsdb opcode to duplicate each byte in
+ the U and V arrays, so we can iterate over 640 array elements. It
+ is also possible to iterate over 320 elements and duplicate the U
+ and V elements using mergebw. There is a very slight speed
+ advantage to iterating vertically in Orc, and for demonstration
+ purposes, we will choose to use the loadupsdb opcode, thus we will
+ be iterating over 320x240 elements.
+</para>
+
+<para>
+ The code:
+
+<programlisting>
+.function convert_I420_AYUV
+.flags 2d
+.dest 4 d1
+.dest 4 d2
+.source 1 y1
+.source 1 y2
+.source 1 u
+.source 1 v
+.const 1 c255 255
+.temp 2 uv
+.temp 2 ay
+.temp 1 tu
+.temp 1 tv
+
+loadupdb tu, u
+loadupdb tv, v
+mergebw uv, tu, tv
+mergebw ay, c255, y1
+mergewl d1, ay, uv
+mergebw ay, c255, y2
+mergewl d2, ay, uv
+</programlisting>
+
+ A few things of note: The ".flags 2d" line is used to indicate that
+ Orc should iterate over two dimensions, and generate a prototype that
+ includes row strides for each array and a size parameter for the
+ second dimension.
+</para>
+
+<para>
+ Since we are working on two input Y lines and two output AYUV lines
+ at a time, we need two source and destination arrays corresponding
+ to the even and odd lines. The row strides for these are doubled
+ compared to the normal 2-D array.
+</para>
+
+<para>
+ The mergebw and mergewl opcodes join two 8-bit values into one 16-bit
+ value (or 16-bit values into a 32-bit value) by concatinating them
+ in memory order. Thus, to get AYUV in memory order, we merge AY and
+ UV, and to get UV, we merge U and V. Since we're duplicating each
+ U and V line, we use the same UV value for the even and odd output
+ lines.
+</para>
+
+<para>
+ The prototype that is generated is:
+
+<programlisting>
+void convert_I420_AYUV (orc_uint32 * d1, int d1_stride, orc_uint32 * d2,
+ int d2_stride, const orc_uint8 * s1, int s1_stride, const orc_uint8 * s2,
+ int s2_stride, const orc_uint8 * s3, int s3_stride, const orc_uint8 * s4,
+ int s4_stride, int n, int m);
+</programlisting>
+
+ The orcc tool unhelpfully changed the names of the parameters,
+ however, the order is standard: first destinations, then sources, then
+ parameters, then array sizes. Think of it like memcpy() or memset().
+</para>
+
+<para>
+ Calling the function:
+
+<programlisting>
+convert_I420_AYUV (output, 1280*4, output + 640, 1280 * 4,
+ input_y, 1280, input_y + 640, 1280,
+ input_u, 320, input_v, 320,
+ 320, 240);
+</programlisting>
+
+</para>
+
+</refsect1>
+
+</refentry>
+