author | David Schleef <ds@schleef.org> | 2010-08-20 12:15:14 -0700 |
---|---|---|
committer | David Schleef <ds@schleef.org> | 2010-08-20 12:15:14 -0700 |
commit | 02ec311f66e21d23541830424bdb735a4d2df484 (patch) | |
tree | 022db8ec4ac72884074c4281dd478343f71af1c6 /doc/tutorial.xml | |
parent | 241a7ad309ee2969672500792933f6c0afe330ee (diff) | |
download | orc-02ec311f66e21d23541830424bdb735a4d2df484.tar.gz |
Update documentation, add tutorial
Diffstat (limited to 'doc/tutorial.xml')
-rw-r--r-- | doc/tutorial.xml | 510 |
1 files changed, 510 insertions, 0 deletions
diff --git a/doc/tutorial.xml b/doc/tutorial.xml new file mode 100644 index 0000000..c8b8a62 --- /dev/null +++ b/doc/tutorial.xml @@ -0,0 +1,510 @@ +<?xml version="1.0"?> +<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" + "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [ +<!ENTITY % version-entities SYSTEM "version.entities"> +%version-entities; +<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'"> +]> +<refentry id="orc-tutorial" revision="29 may 2009"> +<refmeta> +<refentrytitle>Orc Tutorial</refentrytitle> +<manvolnum>3</manvolnum> +<refmiscinfo>Orc</refmiscinfo> +</refmeta> + +<refnamediv> +<refname>Orc Tutorial</refname> +<refpurpose> +Getting started writing Orc code. +</refpurpose> +</refnamediv> + +<refsect1> +<title>Orc Tutorial</title> + + <para> + This section walks you through several examples of increasing + complexity to get you started working with Orc. Each of these + examples is available in the Orc source code, in the examples + directory. The first three examples use static Orc code that + is in a source file, and is compiled into intermediate C code + by the orcc tool. + </para> + + <para> + The first example demonstrates how to add two arrays of 16-bit + signed integers together. A possible use case for this is + combining two stereo audio streams together. + </para> + + <para> + The second example builds from the first, replacing one of the + stereo input streams with a mono stream, converting it to stereo + in the process, and also adjusting the volume of the stream. + </para> + + <para> + The third example shows how to convert a planar 4:2:0 video + image into a packed 4:4:4 video image with an alpha channel. + </para> + +</refsect1> + +<refsect1> +<title>Example 1</title> + +<para> + This example demonstrates combining two stereo audio streams + by adding.
Uncompressed audio streams (i.e., PCM format) can + be in a variety of formats, but one of the most common is + interleaved signed 16-bit integers, and we will choose that + for the purposes of this example. Extending to other formats + is left as an exercise for the reader. Interleaved means that + left and right channel samples are consecutive: in memory, the + data look like LRLRLR... The sampling rate is unimportant, as + long as it is the same for both streams. +</para> + +<para> + One important feature/limitation of signed 16-bit audio samples + is that adding two together could cause an overflow. For example, + adding the value 25000 to 10000 gives 35000, but this overflows + the signed 16-bit range, so a standard addition would instead give the value + -30536 (35000-65536). Overflows handled this way sound like + crackling or worse, so we would like a better solution. One + solution is to use saturating addition: in this case, the addition + of 25000 and 10000 would be limited by the upper end of signed + 16-bit values to give 32767. Although this still causes + distortion in the output signal, it is much less audible and + annoying. +</para> + +<para> + In normal C code, 16-bit saturating addition is difficult to express + without using 32-bit intermediates. In Orc, saturating addition + is a basic operation with opcodes for each size, both signed and + unsigned. In this case, we want "addssw", for "add signed saturated + word". +</para> + +<para> + Also, we're going to make one simplification: Adding two + interleaved stereo streams is the same as adding two mono streams + with twice as many samples. So we'll use 2*n_samples in the calling + code. +</para> + +<para> + To the code: + +<programlisting> +.function audio_add_s16 +.dest 2 d1 short +.source 2 s1 short +.source 2 s2 short + +addssw d1, s1, s2 +</programlisting> +</para> + +<para> + Line by line: + +<programlisting> +.function audio_add_s16 +</programlisting> + + This starts a function.
A function (represented internally by the + object OrcProgram) is equivalent to a C function. When you generate + C code from this Orc example using the orcc tool, it generates a C + stub function called "audio_add_s16()", which at runtime will + generate an OrcProgram object corresponding to the above code, + compile it, and then run it. + +<programlisting> +.dest 2 d1 short +</programlisting> + + This specifies that you want a destination (output) array named "d1", + with the element size being 2. Orc does not differentiate between + signed and unsigned arrays (or even floating point); however, you + may optionally specify a type afterwards that will be used in any + autogenerated C code. + +<programlisting> +.source 2 s1 short +.source 2 s2 short +</programlisting> + + This specifies that you want two source (input) arrays, "s1" and "s2", + similar to the destination array. + +<programlisting> +addssw d1, s1, s2 +</programlisting> + + This specifies the (only) opcode that we want for this program: signed + saturated addition of each member of the two source arrays, storing + the result in the destination array. +</para> + +<para> + A few notes about the above program: The loop over the array members + is implied. Everything that Orc does is based on looping over each + array element and executing the opcodes in a program. +</para> + +<para> + When you generate C code from the above Orc code using + 'orcc --implementation example1orc.orc', + you get a bunch of boilerplate code, plus three C functions: + +<programlisting> +/* audio_add_s16 */ +#ifdef DISABLE_ORC +void +audio_add_s16 (short * d1, const short * s1, const short * s2, int n) +{ + ... +} +</programlisting> + + This function is used if DISABLE_ORC is defined. As one might guess, + if you define DISABLE_ORC, no runtime Orc features are used, and all + calls to audio_add_s16() use this function. The interior of the function + is a for() loop that implements the Orc function.
The generated code + may not necessarily be easy to read, but it is straightforward: all + the verbosity and use of unions is to avoid compiler warnings without + making the compiler too complex. But this is the place to go if you + are trying to understand what Orc is doing. + +<programlisting> +#else +static void +_backup_audio_add_s16 (OrcExecutor * ORC_RESTRICT ex) +{ + ... +} +</programlisting> + + This function is used when runtime Orc is enabled, but Orc was unable + to generate code for the function at runtime. There are various + reasons why that might happen -- unimplemented rules for a target, or + more temporary variables used than available registers. + +<programlisting> +void +audio_add_s16 (short * d1, const short * s1, const short * s2, int n) +{ + ... +} +</programlisting> + + The third generated function is the important part: It is used when + Orc is enabled at runtime, and creates the OrcProgram corresponding + to the function you defined. Then it compiles the function and + calls it. +</para> + +<para> + After generating the C code, you should generate the header file, + using: 'orcc --header example1orc.orc -o example1orc.h'. 
+ After similar boilerplate code, there is the expected declaration + of audio_add_s16(): + +<programlisting> +void audio_add_s16 (short * d1, const short * s1, const short * s2, int n); +</programlisting> + + +</para> + +<para> + Some C code to generate sample data, call the generated code, and + print out the results: + +<programlisting> +#include <stdio.h> +#include "example1orc.h" + +#define N 10 + +short a[N]; +short b[N]; +short c[N]; + +int +main (int argc, char *argv[]) +{ + int i; + + /* Create some data in the source arrays */ + for(i=0;i < N;i++){ + a[i] = 100*i; + b[i] = 32000; + } + + /* Call a function that uses Orc */ + audio_add_s16 (c, a, b, N); + + /* Print the results */ + for(i=0;i < N;i++){ + printf("%d: %d %d -> %d\n", i, a[i], b[i], c[i]); + } + + return 0; +} +</programlisting> +</para> + +<para> + The output of the program: + +<programlisting> +0: 0 32000 -> 32000 +1: 100 32000 -> 32100 +2: 200 32000 -> 32200 +3: 300 32000 -> 32300 +4: 400 32000 -> 32400 +5: 500 32000 -> 32500 +6: 600 32000 -> 32600 +7: 700 32000 -> 32700 +8: 800 32000 -> 32767 +9: 900 32000 -> 32767 +</programlisting> +</para> + +<para> + +</para> + +</refsect1> + +<refsect1> +<title>Example 2</title> + +<para> + In this example, we will expand on the previous example by making + one of the input arrays a mono stream, and also scale the mono + input stream by a volume. Rather than iterating over each + signed 16-bit value, in this example we will iterate over samples, + meaning the member size for the stereo arrays is 4, since each + array member contains a left and right 16 bit value. 
+</para> + +<para> +<programlisting> +.function audio_add_mono_to_stereo_scaled_s16 +.dest 4 d1 short +.source 4 s1 short +.source 2 s2 short +.param 2 volume +.temp 4 s2_scaled +.temp 2 t +.temp 4 s2_stereo + +mulswl s2_scaled, s2, volume +shrsl s2_scaled, s2_scaled, 12 +convssslw t, s2_scaled +mergewl s2_stereo, t, t +x2 addssw d1, s1, s2_stereo +</programlisting> + + Piece by piece: + +<programlisting> +.function audio_add_mono_to_stereo_scaled_s16 +.dest 4 d1 short +.source 4 s1 short +.source 2 s2 short +</programlisting> + + This is the same as the previous example, except that the stereo + arrays are increased in size to 4. However, we'll use the short + type, since Orc does not care what type we use, and short is + the type of the array we want to use in the C code. + +<programlisting> +.param 2 volume +</programlisting> + + This specifies a parameter, which is an integer that is passed to + an Orc function. In the generated C code, parameters are always of + type int. There are also float parameters for the floating point + equivalent. + +<programlisting> +.temp 4 s2_scaled +.temp 2 t +.temp 4 s2_stereo +</programlisting> + + This specifies a few temporary variables that are used later in the + code. These definitions are similar to defining local variables in + C code. Note that the size is important: each opcode has + specific sizes for source and destination operands, and it is + important to match these correctly with temporary variables. + +<programlisting> +mulswl s2_scaled, s2, volume +shrsl s2_scaled, s2_scaled, 12 +</programlisting> + + This scales the mono input: signed multiply of s2 and volume, giving + a 32-bit value, and then a signed right shift by 12. Since the + second operand of mulswl is 16-bit, only the lower 16 bits of + volume will be used in the multiply. The right shift is + effectively the same as dividing by 4096. 
Thus, a neutral scaling + that does not increase or decrease the mono input would correspond + to calling the function with a parameter value of 4096. + +<programlisting> +convssslw t, s2_scaled +mergewl s2_stereo, t, t +</programlisting> + + The first instruction is "convert saturated signed 32-bit to signed + 16-bit", and the second merges the two values of (16-bit) t into the + high and low halves of s2_stereo. This duplicates the mono signal + into the right and left channels. It is important to use the + saturated conversion, since the effective scaling value may have + been greater than 1.0, so the larger values may need to be clipped. + +<programlisting> +x2 addssw d1, s1, s2_stereo +</programlisting> + + The "x2" prefix indicates that we want the operation specified to be + done twice, first to the upper half of all operands, and again + separately to the lower half of all operands. Since addssw is + normally a 16-bit operation, the x2 prefix causes it to be a 32-bit + operation. And so, it adds the newly created right and left values + of the scaled mono signal into the s1 signal. +</para> + +<para> + There are several variations of the above program that might be + more suitable for a particular application. This function only + handles a limited dynamic range of volume scaling factors; however, + by changing the shift constant, or turning the shift into a + parameter, the dynamic range can be increased significantly. +</para> + + +</refsect1> + +<refsect1> +<title>Example 3</title> + +<para> + The third example shows how to convert a planar 4:2:0 video + image into a packed 4:4:4 video image with an alpha channel. The + first format is often referred to as I420 and the second as AYUV. +</para> + +<para> + For simplicity in the following discussion, we'll assume that the + image dimensions are 640x480. The 4:2:0 subsampling means the + input chroma planes are 320x240 (subsampled by 2 in each direction).
+ These need to be upsampled to 640x480, then repacked with the input + Y plane, with an added dummy alpha value. There are many ways to + perform upsampling; the simplest is to duplicate each value + horizontally and vertically. The result is low quality, but + adequate for demonstration purposes. +</para> + +<para> + There are several choices for the Orc array size and dimensionality. + Iterating vertically can be done in the C code or in the Orc code. If + done in the Orc code, we would need to use an array size of 240 and + have two separate arrays for the even and odd Y rows. If done in the + C code, there is no such limitation. Horizontally, the story is + different: we can use the loadupdb opcode to duplicate each byte in + the U and V arrays, so we can iterate over 640 array elements. It + is also possible to iterate over 320 elements and duplicate the U + and V elements using mergebw. There is a very slight speed + advantage to iterating vertically in Orc, and for demonstration + purposes, we will choose to use the loadupdb opcode, thus we will + be iterating over 320x240 elements. +</para> + +<para> + The code: + +<programlisting> +.function convert_I420_AYUV +.flags 2d +.dest 4 d1 +.dest 4 d2 +.source 1 y1 +.source 1 y2 +.source 1 u +.source 1 v +.const 1 c255 255 +.temp 2 uv +.temp 2 ay +.temp 1 tu +.temp 1 tv + +loadupdb tu, u +loadupdb tv, v +mergebw uv, tu, tv +mergebw ay, c255, y1 +mergewl d1, ay, uv +mergebw ay, c255, y2 +mergewl d2, ay, uv +</programlisting> + + A few things of note: The ".flags 2d" line is used to indicate that + Orc should iterate over two dimensions, and generate a prototype that + includes row strides for each array and a size parameter for the + second dimension. +</para> + +<para> + Since we are working on two input Y lines and two output AYUV lines + at a time, we need two source and destination arrays corresponding + to the even and odd lines. The row strides for these are doubled + compared to the normal 2-D array.
+</para> + +<para> + The mergebw and mergewl opcodes join two 8-bit values into one 16-bit + value (or 16-bit values into a 32-bit value) by concatenating them + in memory order. Thus, to get AYUV in memory order, we merge AY and + UV, and to get UV, we merge U and V. Since we're duplicating each + U and V line, we use the same UV value for the even and odd output + lines. +</para> + +<para> + The prototype that is generated is: + +<programlisting> +void convert_I420_AYUV (orc_uint32 * d1, int d1_stride, orc_uint32 * d2, + int d2_stride, const orc_uint8 * s1, int s1_stride, const orc_uint8 * s2, + int s2_stride, const orc_uint8 * s3, int s3_stride, const orc_uint8 * s4, + int s4_stride, int n, int m); +</programlisting> + + The orcc tool unhelpfully changed the names of the parameters; + however, the order is standard: first destinations, then sources, then + parameters, then array sizes. Think of it like memcpy() or memset(). +</para> + +<para> + Calling the function: + +<programlisting> +convert_I420_AYUV (output, 1280*4, output + 640, 1280 * 4, + input_y, 1280, input_y + 640, 1280, + input_u, 320, input_v, 320, + 320, 240); +</programlisting> + +</para> + +</refsect1> + +</refentry> + |
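
To check the arithmetic of Examples 1 and 2 without running Orc, the following is a minimal plain-C sketch of what the two Orc programs compute, assuming the opcode semantics described in the tutorial text (saturating add, 16x16 to 32-bit signed multiply, arithmetic right shift, saturating narrowing). It is not the output of orcc and is not part of the commit shown above. The names ref_audio_add_s16, ref_audio_add_mono_to_stereo_scaled_s16, clamp_s16 and shift_right_arith are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value to the signed 16-bit range.
 * This is the "saturation" step of addssw and convssslw. */
static int16_t clamp_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper: arithmetic (sign-preserving) right shift, written so
 * that it does not rely on how the C compiler shifts negative values. */
static int32_t shift_right_arith(int32_t v, int shift)
{
    return v >= 0 ? v >> shift : ~(~v >> shift);
}

/* Sketch of Example 1: d1[i] = addssw (s1[i], s2[i]) for each element.
 * The sum is formed in 32 bits, then saturated back to 16 bits. */
void ref_audio_add_s16(int16_t *d1, const int16_t *s1, const int16_t *s2,
                       int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        d1[i] = clamp_s16((int32_t)s1[i] + (int32_t)s2[i]);
    }
}

/* Sketch of Example 2. d1 and s1 hold n interleaved stereo frames
 * (2*n shorts); s2 holds n mono samples. */
void ref_audio_add_mono_to_stereo_scaled_s16(int16_t *d1, const int16_t *s1,
                                             const int16_t *s2, int volume,
                                             int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* mulswl: only the low 16 bits of volume take part in the multiply
         * (assumes volume fits in 16 bits, as in the tutorial's usage). */
        int32_t scaled = (int32_t)s2[i] * (int32_t)(int16_t)volume;
        /* shrsl: signed shift right by 12, i.e. divide by 4096. */
        scaled = shift_right_arith(scaled, 12);
        /* convssslw: saturating conversion from 32 to 16 bits. */
        int16_t t = clamp_s16(scaled);
        /* mergewl + x2 addssw: the same t goes into both halves, and each
         * half is added with saturation to the matching stereo channel. */
        d1[2 * i]     = clamp_s16((int32_t)s1[2 * i] + t);
        d1[2 * i + 1] = clamp_s16((int32_t)s1[2 * i + 1] + t);
    }
}
```

Calling ref_audio_add_s16 (c, a, b, N) with the sample data from Example 1 (a[i] = 100*i, b[i] = 32000) should reproduce the output listed there, with the last two results clipped to 32767. For the second function, a volume of 4096 leaves the mono input unscaled, as the tutorial describes.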