Creating a 3.5 branch, which is compatible with LLVM 3.5

This branch will probably not be maintained. Its purpose is to mark the last commit that is compatible with LLVM 3.5.

llvm-svn: 225040
author: Tom Stellard <thomas.stellard@amd.com> 2014-12-31 14:56:56 +0000
committer: Tom Stellard <thomas.stellard@amd.com> 2014-12-31 14:56:56 +0000
commit: 6ad55e925a906382607f275be2c78da988d13d2a
tree: 4389f06ba859bb72d4c8e0f68e74d1de6736d0fe
parent: 30a0597a0d84153fda5a595fd3b801aaea183cd8
parent: 67978556a5eb7e69ba54113c8b3be38e30ebc2a9
361 files changed, 7878 insertions, 0 deletions
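
The commit message ties this branch to LLVM 3.5, and the configure.py added in this commit gates one step on that version: it strips the "svn" suffix from the `llvm-config --version` output and only asks for `--system-libs` when the version is 3.5 or newer. The following is a minimal sketch of that check, not the script's actual code. It is written for Python 3, whereas the committed script targets Python 2, and the helper names `parse_llvm_version` and `needs_system_libs` are hypothetical, introduced here for illustration only.

```python
def parse_llvm_version(version_text):
    """Turn llvm-config --version output such as '3.5.0svn' into (major, minor)."""
    # Drop the 'svn' suffix used by development builds, then split on dots.
    parts = version_text.strip().replace('svn', '').split('.')
    return int(parts[0]), int(parts[1])


def needs_system_libs(version_text):
    """Return True when the version is 3.5 or newer.

    Tuple comparison is equivalent to the condition in configure.py:
    (major == 3 and minor >= 5) or major > 3.
    """
    return parse_llvm_version(version_text) >= (3, 5)


if __name__ == '__main__':
    # Sample version strings chosen for illustration.
    for sample in ('3.4', '3.5.0svn', '3.6.0'):
        print(sample, needs_system_libs(sample))
```

In the committed script the version string comes from running the llvm-config executable; the sketch takes it as a plain argument so the check can be tried without an LLVM installation.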
diff --git a/libclc/CREDITS.TXT b/libclc/CREDITS.TXT
new file mode 100644
index 000000000000..b18d40bd7339
--- /dev/null
+++ b/libclc/CREDITS.TXT
@@ -0,0 +1,2 @@
+N: Peter Collingbourne
+E: peter@pcc.me.uk
diff --git a/libclc/LICENSE.TXT b/libclc/LICENSE.TXT
new file mode 100644
index 000000000000..03a00447d6f8
--- /dev/null
+++ b/libclc/LICENSE.TXT
@@ -0,0 +1,64 @@
+==============================================================================
+libclc License
+==============================================================================
+
+The libclc library is dual licensed under both the University of Illinois
+"BSD-Like" license and the MIT license.  As a user of this code you may choose
+to use it under either license.  As a contributor, you agree to allow your code
+to be used under both.
+
+Full text of the relevant licenses is included below.
+
+==============================================================================
+
+Copyright (c) 2011-2014 by the contributors listed in CREDITS.TXT
+
+All rights reserved.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal with
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+    * Redistributions of source code must retain the above copyright notice,
+      this list of conditions and the following disclaimers.
+
+    * Redistributions in binary form must reproduce the above copyright notice,
+      this list of conditions and the following disclaimers in the
+      documentation and/or other materials provided with the distribution.
+
+    * The names of the contributors may not be used to endorse or promote
+      products derived from this Software without specific prior written
+      permission.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+SOFTWARE.
+
+==============================================================================
+
+Copyright (c) 2011-2014 by the contributors listed in CREDITS.TXT
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff --git a/libclc/README.TXT b/libclc/README.TXT
new file mode 100644
index 000000000000..00ae6bfa40a1
--- /dev/null
+++ b/libclc/README.TXT
@@ -0,0 +1,52 @@
+libclc
+------
+
+libclc is an open source, BSD licensed implementation of the library
+requirements of the OpenCL C programming language, as specified by the
+OpenCL 1.1 Specification. The following sections of the specification
+impose library requirements:
+
+  * 6.1: Supported Data Types
+  * 6.2.3: Explicit Conversions
+  * 6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
+  * 6.9: Preprocessor Directives and Macros
+  * 6.11: Built-in Functions
+  * 9.3: Double Precision Floating-Point
+  * 9.4: 64-bit Atomics
+  * 9.5: Writing to 3D image memory objects
+  * 9.6: Half Precision Floating-Point
+
+libclc is intended to be used with the Clang compiler's OpenCL frontend.
+
+libclc is designed to be portable and extensible. To this end, it provides
+generic implementations of most library requirements, allowing the target
+to override the generic implementation at the granularity of individual
+functions.
+
+libclc currently only supports the PTX target, but support for more
+targets is welcome.
+
+Compiling and installing with Make
+----------------------------------
+
+$ ./configure.py --with-llvm-config=/path/to/llvm-config && make
+$ make install
+
+Note you can use the DESTDIR Makefile variable to do staged installs.
+
+$ make install DESTDIR=/path/for/staged/install
+
+Compiling and installing with Ninja
+-----------------------------------
+
+$ ./configure.py -g ninja --with-llvm-config=/path/to/llvm-config && ninja
+$ ninja install
+
+Note you can use the DESTDIR environment variable to do staged installs.
+
+$ DESTDIR=/path/for/staged/install ninja install
+
+Website
+-------
+
+http://www.pcc.me.uk/~peter/libclc/
diff --git a/libclc/build/metabuild.py b/libclc/build/metabuild.py
new file mode 100644
index 000000000000..4ab5db58e06e
--- /dev/null
+++ b/libclc/build/metabuild.py
@@ -0,0 +1,100 @@
+import ninja_syntax
+import os
+
+# Simple meta-build system.
+
+class Make(object):
+  def __init__(self):
+    self.output = open(self.output_filename(), 'w')
+    self.rules = {}
+    self.rule_text = ''
+    self.all_targets = []
+    self.default_targets = []
+    self.clean_files = []
+    self.distclean_files = []
+    self.output.write("""all::
+
+ifndef VERBOSE
+  Verb = @
+endif
+
+""")
+
+  def output_filename(self):
+    return 'Makefile'
+
+  def rule(self, name, command, description=None, depfile=None,
+           generator=False):
+    self.rules[name] = {'command': command, 'description': description,
+                        'depfile': depfile, 'generator': generator}
+
+  def build(self, output, rule, inputs=[], implicit=[], order_only=[]):
+    inputs = self._as_list(inputs)
+    implicit = self._as_list(implicit)
+    order_only = self._as_list(order_only)
+
+    output_dir = os.path.dirname(output)
+    if output_dir != '' and not os.path.isdir(output_dir):
+      os.makedirs(output_dir)
+
+    dollar_in = ' '.join(inputs)
+    subst = lambda text: text.replace('$in', dollar_in).replace('$out', output)
+
+    deps = ' '.join(inputs + implicit)
+    if order_only:
+      deps += ' | '
+      deps += ' '.join(order_only)
+    self.output.write('%s: %s\n' % (output, deps))
+
+    r = self.rules[rule]
+    command = subst(r['command'])
+    if r['description']:
+      desc = subst(r['description'])
+      self.output.write('\t@echo %s\n\t$(Verb) %s\n' % (desc, command))
+    else:
+      self.output.write('\t%s\n' % command)
+    if r['depfile']:
+      depfile = subst(r['depfile'])
+      self.output.write('-include '+depfile+'\n')
+    self.output.write('\n')
+
+    self.all_targets.append(output)
+    if r['generator']:
+      self.distclean_files.append(output)
+      if r['depfile']:
+        self.distclean_files.append(depfile)
+    else:
+      self.clean_files.append(output)
+      if r['depfile']:
+        self.distclean_files.append(depfile)
+
+
+  def _as_list(self, input):
+    if isinstance(input, list):
+      return input
+    return [input]
+
+  def default(self, paths):
+    self.default_targets += self._as_list(paths)
+
+  def finish(self):
+    self.output.write('all:: %s\n\n' % ' '.join(self.default_targets or self.all_targets))
+    self.output.write('clean: \n\trm -f %s\n\n' % ' '.join(self.clean_files))
+    self.output.write('distclean: clean\n\trm -f %s\n' % ' '.join(self.distclean_files))
+
+class Ninja(ninja_syntax.Writer):
+  def __init__(self):
+    ninja_syntax.Writer.__init__(self, open(self.output_filename(), 'w'))
+
+  def output_filename(self):
+    return 'build.ninja'
+
+  def finish(self):
+    pass
+
+def from_name(name):
+  if name == 'make':
+    return Make()
+  if name == 'ninja':
+    return Ninja()
+  raise LookupError, 'unknown generator: %s; supported generators are make and ninja' % name
diff --git a/libclc/build/ninja_syntax.py b/libclc/build/ninja_syntax.py
new file mode 100644
index 000000000000..7d9f592dfadf
--- /dev/null
+++ b/libclc/build/ninja_syntax.py
@@ -0,0 +1,118 @@
+#!/usr/bin/python
+
+"""Python module for generating .ninja files.
+
+Note that this is emphatically not a required piece of Ninja; it's
+just a helpful utility for build-file-generation systems that already
+use Python.
+"""
+
+import textwrap
+import re
+
+class Writer(object):
+    def __init__(self, output, width=78):
+        self.output = output
+        self.width = width
+
+    def newline(self):
+        self.output.write('\n')
+
+    def comment(self, text):
+        for line in textwrap.wrap(text, self.width - 2):
+            self.output.write('# ' + line + '\n')
+
+    def variable(self, key, value, indent=0):
+        if value is None:
+            return
+        if isinstance(value, list):
+            value = ' '.join(value)
+        self._line('%s = %s' % (key, value), indent)
+
+    def rule(self, name, command, description=None, depfile=None,
+             generator=False):
+        self._line('rule %s' % name)
+        self.variable('command', escape(command), indent=1)
+        if description:
+            self.variable('description', description, indent=1)
+        if depfile:
+            self.variable('depfile', depfile, indent=1)
+        if generator:
+            self.variable('generator', '1', indent=1)
+
+    def build(self, outputs, rule, inputs=None, implicit=None, order_only=None,
+              variables=None):
+        outputs = self._as_list(outputs)
+        all_inputs = self._as_list(inputs)[:]
+
+        if implicit:
+            all_inputs.append('|')
+            all_inputs.extend(self._as_list(implicit))
+        if order_only:
+            all_inputs.append('||')
+            all_inputs.extend(self._as_list(order_only))
+
+        self._line('build %s: %s %s' % (' '.join(outputs),
+                                        rule,
+                                        ' '.join(all_inputs)))
+
+        if variables:
+            for key, val in variables:
+                self.variable(key, val, indent=1)
+
+        return outputs
+
+    def include(self, path):
+        self._line('include %s' % path)
+
+    def subninja(self, path):
+        self._line('subninja %s' % path)
+
+    def default(self, paths):
+        self._line('default %s' % ' '.join(self._as_list(paths)))
+
+    def _line(self, text, indent=0):
+        """Write 'text' word-wrapped at self.width characters."""
+        leading_space = '  ' * indent
+        while len(text) > self.width:
+            # The text is too wide; wrap if possible.
+
+            # Find the rightmost space that would obey our width constraint.
+            available_space = self.width - len(leading_space) - len(' $')
+            space = text.rfind(' ', 0, available_space)
+            if space < 0:
+                # No such space; just use the first space we can find.
+                space = text.find(' ', available_space)
+            if space < 0:
+                # Give up on breaking.
+                break
+
+            self.output.write(leading_space + text[0:space] + ' $\n')
+            text = text[space+1:]
+
+            # Subsequent lines are continuations, so indent them.
+            leading_space = '  ' * (indent+2)
+
+        self.output.write(leading_space + text + '\n')
+
+    def _as_list(self, input):
+        if input is None:
+            return []
+        if isinstance(input, list):
+            return input
+        return [input]
+
+
+def escape(string):
+    """Escape a string such that Makefile and shell variables are
+       correctly escaped for use in a Ninja file.
+    """
+    assert '\n' not in string, 'Ninja syntax does not allow newlines'
+    # We only have one special metacharacter: '$'.
+
+    # We should leave $in and $out untouched.
+    # Just look for makefile/shell style substitutions
+    return re.sub(r'(\$[{(][a-z_]+[})])',
+                  r'$\1',
+                  string,
+                  flags=re.IGNORECASE)
diff --git a/libclc/compile-test.sh b/libclc/compile-test.sh
new file mode 100755
index 000000000000..47c7f385bb92
--- /dev/null
+++ b/libclc/compile-test.sh
@@ -0,0 +1,3 @@
+#!/bin/sh
+
+clang -target nvptx--nvidiacl -Iptx-nvidiacl/include -Igeneric/include -Xclang -mlink-bitcode-file -Xclang nvptx--nvidiacl/lib/builtins.bc -include clc/clc.h -Dcl_clang_storage_class_specifiers -Dcl_khr_fp64 "$@"
diff --git a/libclc/configure.py b/libclc/configure.py
new file mode 100755
index 000000000000..7170f46cd7a5
--- /dev/null
+++ b/libclc/configure.py
@@ -0,0 +1,247 @@
+#!/usr/bin/python
+
+def c_compiler_rule(b, name, description, compiler, flags):
+  command = "%s -MMD -MF $out.d %s -c -o $out $in" % (compiler, flags)
+  b.rule(name, command, description + " $out", depfile="$out.d")
+
+version_major = 0;
+version_minor = 0;
+version_patch = 1;
+
+from optparse import OptionParser
+import os
+import string
+from subprocess import *
+import sys
+
+srcdir = os.path.dirname(sys.argv[0])
+
+sys.path.insert(0, os.path.join(srcdir, 'build'))
+import metabuild
+
+p = OptionParser()
+p.add_option('--with-llvm-config', metavar='PATH',
+             help='use given llvm-config script')
+p.add_option('--with-cxx-compiler', metavar='PATH',
+             help='use given C++ compiler')
+p.add_option('--prefix', metavar='PATH',
+             help='install to given prefix')
+p.add_option('--libexecdir', metavar='PATH',
+             help='install *.bc to given dir')
+p.add_option('--includedir', metavar='PATH',
+             help='install include files to given dir')
+p.add_option('--pkgconfigdir', metavar='PATH',
+             help='install clc.pc to given dir')
+p.add_option('-g', metavar='GENERATOR', default='make',
+             help='use given generator (default: make)')
+(options, args) = p.parse_args()
+
+llvm_config_exe = options.with_llvm_config or "llvm-config"
+
+prefix = options.prefix
+if not prefix:
+  prefix = '/usr/local'
+
+libexecdir = options.libexecdir
+if not libexecdir:
+  libexecdir = os.path.join(prefix, 'lib/clc')
+
+includedir = options.includedir
+if not includedir:
+  includedir = os.path.join(prefix, 'include')
+
+pkgconfigdir = options.pkgconfigdir
+if not pkgconfigdir:
+  pkgconfigdir = os.path.join(prefix, 'share/pkgconfig')
+
+def llvm_config(args):
+  try:
+    proc = Popen([llvm_config_exe] + args, stdout=PIPE)
+    return proc.communicate()[0].rstrip().replace('\n', ' ')
+  except OSError:
+    print "Error executing llvm-config."
+    print "Please ensure that llvm-config is in your $PATH, or use --with-llvm-config."
+    sys.exit(1)
+
+llvm_version = string.split(string.replace(llvm_config(['--version']), 'svn', ''), '.')
+llvm_system_libs = ''
+if (int(llvm_version[0]) == 3 and int(llvm_version[1]) >= 5) or int(llvm_version[0]) > 3:
+    llvm_system_libs = llvm_config(['--system-libs'])
+llvm_bindir = llvm_config(['--bindir'])
+llvm_core_libs = llvm_config(['--libs', 'core', 'bitreader', 'bitwriter']) + ' ' + \
+                 llvm_system_libs + ' ' + \
+                 llvm_config(['--ldflags'])
+llvm_cxxflags = llvm_config(['--cxxflags']) + ' -fno-exceptions -fno-rtti'
+llvm_libdir = llvm_config(['--libdir'])
+
+llvm_clang = os.path.join(llvm_bindir, 'clang')
+llvm_link = os.path.join(llvm_bindir, 'llvm-link')
+llvm_opt = os.path.join(llvm_bindir, 'opt')
+
+cxx_compiler = options.with_cxx_compiler
+if not cxx_compiler:
+  cxx_compiler = os.path.join(llvm_bindir, 'clang++')
+
+available_targets = {
+  'r600--' : { 'devices' :
+               [{'gpu' : 'cedar',   'aliases' : ['palm', 'sumo', 'sumo2', 'redwood', 'juniper']},
+                {'gpu' : 'cypress', 'aliases' : ['hemlock']},
+                {'gpu' : 'barts',   'aliases' : ['turks', 'caicos']},
+                {'gpu' : 'cayman',  'aliases' : ['aruba']},
+                {'gpu' : 'tahiti',  'aliases' : ['pitcairn', 'verde', 'oland', 'hainan', 'bonaire', 'kabini', 'kaveri', 'hawaii','mullins']}]},
+  'nvptx--'   : { 'devices' : [{'gpu' : '', 'aliases' : []}] },
+  'nvptx64--'   : { 'devices' : [{'gpu' : '', 'aliases' : []}] },
+  'nvptx--nvidiacl'   : { 'devices' : [{'gpu' : '', 'aliases' : []}] },
+  'nvptx64--nvidiacl' : { 'devices' : [{'gpu' : '', 'aliases' : []}] }
+}
+
+default_targets = ['nvptx--nvidiacl', 'nvptx64--nvidiacl', 'r600--']
+
+targets = args
+if not targets:
+  targets = default_targets
+
+b = metabuild.from_name(options.g)
+
+b.rule("LLVM_AS", "%s -o $out $in" % os.path.join(llvm_bindir, "llvm-as"),
+       'LLVM-AS $out')
+b.rule("LLVM_LINK", command = llvm_link + " -o $out $in",
+       description = 'LLVM-LINK $out')
+b.rule("OPT", command = llvm_opt + " -O3 -o $out $in",
+       description = 'OPT $out')
+
+c_compiler_rule(b, "LLVM_TOOL_CXX", 'CXX', cxx_compiler, llvm_cxxflags)
+b.rule("LLVM_TOOL_LINK", cxx_compiler + " -o $out $in %s" % llvm_core_libs + " -Wl,-rpath %s" % llvm_libdir, 'LINK $out')
+
+prepare_builtins = os.path.join('utils', 'prepare-builtins')
+b.build(os.path.join('utils', 'prepare-builtins.o'), "LLVM_TOOL_CXX",
+        os.path.join(srcdir, 'utils', 'prepare-builtins.cpp'))
+b.build(prepare_builtins, "LLVM_TOOL_LINK",
+        os.path.join('utils', 'prepare-builtins.o'))
+
+b.rule("PREPARE_BUILTINS", "%s -o $out $in" % prepare_builtins,
+       'PREPARE-BUILTINS $out')
+b.rule("PYTHON_GEN", "python < $in > $out", "PYTHON_GEN $out")
+b.build('generic/lib/convert.cl', "PYTHON_GEN", ['generic/lib/gen_convert.py'])
+
+manifest_deps = set([sys.argv[0], os.path.join(srcdir, 'build', 'metabuild.py'),
+                     os.path.join(srcdir, 'build', 'ninja_syntax.py')])
+
+install_files_bc = []
+install_deps = []
+
+# Create libclc.pc
+clc = open('libclc.pc', 'w')
+clc.write('includedir=%(inc)s\nlibexecdir=%(lib)s\n\nName: libclc\nDescription: Library requirements of the OpenCL C programming language\nVersion: %(maj)s.%(min)s.%(pat)s\nCflags: -I${includedir}\nLibs: -L${libexecdir}' %
+{'inc': includedir, 'lib': libexecdir, 'maj': version_major, 'min': version_minor, 'pat': version_patch})
+clc.close()
+
+for target in targets:
+  (t_arch, t_vendor, t_os) = target.split('-')
+  archs = [t_arch]
+  if t_arch == 'nvptx' or t_arch == 'nvptx64':
+    archs.append('ptx')
+  archs.append('generic')
+
+  subdirs = []
+  for arch in archs:
+    subdirs.append("%s-%s-%s" % (arch, t_vendor, t_os))
+    subdirs.append("%s-%s" % (arch, t_os))
+    subdirs.append(arch)
+
+  incdirs = filter(os.path.isdir,
+               [os.path.join(srcdir, subdir, 'include') for subdir in subdirs])
+  libdirs = filter(lambda d: os.path.isfile(os.path.join(d, 'SOURCES')),
+                   [os.path.join(srcdir, subdir, 'lib') for subdir in subdirs])
+
+  clang_cl_includes = ' '.join(["-I%s" % incdir for incdir in incdirs])
+
+  for device in available_targets[target]['devices']:
+    # The rule for building a .bc file for the specified architecture using clang.
+    clang_bc_flags = "-target %s -I`dirname $in` %s " \
+                     "-fno-builtin " \
+                     "-Dcl_clang_storage_class_specifiers " \
+                     "-Dcl_khr_fp64 " \
+                     "-Dcles_khr_int64 " \
+                     "-D__CLC_INTERNAL " \
+                     "-emit-llvm" % (target, clang_cl_includes)
+    if device['gpu'] != '':
+      clang_bc_flags += ' -mcpu=' + device['gpu']
+    clang_bc_rule = "CLANG_CL_BC_" + target + "_" + device['gpu']
+    c_compiler_rule(b, clang_bc_rule, "LLVM-CC", llvm_clang, clang_bc_flags)
+
+    objects = []
+    sources_seen = set()
+
+    if device['gpu'] == '':
+      full_target_name = target
+      obj_suffix = ''
+    else:
+      full_target_name = device['gpu'] + '-' + target
+      obj_suffix = '.' + device['gpu']
+
+    for libdir in libdirs:
+      subdir_list_file = os.path.join(libdir, 'SOURCES')
+      manifest_deps.add(subdir_list_file)
+      override_list_file = os.path.join(libdir, 'OVERRIDES')
+
+      # Add target overrides
+      if os.path.exists(override_list_file):
+        for override in open(override_list_file).readlines():
+          override = override.rstrip()
+          sources_seen.add(override)
+
+      for src in open(subdir_list_file).readlines():
+        src = src.rstrip()
+        if src not in sources_seen:
+          sources_seen.add(src)
+          obj = os.path.join(target, 'lib', src + obj_suffix + '.bc')
+          objects.append(obj)
+          src_file = os.path.join(libdir, src)
+          ext = os.path.splitext(src)[1]
+          if ext == '.ll':
+            b.build(obj, 'LLVM_AS', src_file)
+          else:
+            b.build(obj, clang_bc_rule, src_file)
+
+    builtins_link_bc = os.path.join(target, 'lib', 'builtins.link' + obj_suffix + '.bc')
+    builtins_opt_bc = os.path.join(target, 'lib', 'builtins.opt' + obj_suffix + '.bc')
+    builtins_bc = os.path.join('built_libs', full_target_name + '.bc')
+    b.build(builtins_link_bc, "LLVM_LINK", objects)
+    b.build(builtins_opt_bc, "OPT", builtins_link_bc)
+    b.build(builtins_bc, "PREPARE_BUILTINS", builtins_opt_bc, prepare_builtins)
+    install_files_bc.append((builtins_bc, builtins_bc))
+    install_deps.append(builtins_bc)
+    for alias in device['aliases']:
+      # Ninja cannot have multiple rules with same name so append suffix
+      ruleName = "CREATE_ALIAS_{0}_for_{1}".format(alias, device['gpu'])
+      b.rule(ruleName, "ln -fs %s $out" % os.path.basename(builtins_bc)
+             ,"CREATE-ALIAS $out")
+
+      alias_file = os.path.join('built_libs', alias + '-' + target + '.bc')
+      b.build(alias_file, ruleName, builtins_bc)
+      install_files_bc.append((alias_file, alias_file))
+      install_deps.append(alias_file)
+    b.default(builtins_bc)
+
+
+install_cmd = ' && '.join(['mkdir -p ${DESTDIR}/%(dst)s && cp -r %(src)s ${DESTDIR}/%(dst)s' %
+                           {'src': file,
+                            'dst': libexecdir}
+                           for (file, dest) in install_files_bc])
+install_cmd = ' && '.join(['%(old)s && mkdir -p ${DESTDIR}/%(dst)s && cp -r %(srcdir)s/generic/include/clc ${DESTDIR}/%(dst)s' %
+                           {'old': install_cmd,
+                            'dst': includedir,
+                            'srcdir': srcdir}])
+install_cmd = ' && '.join(['%(old)s && mkdir -p ${DESTDIR}/%(dst)s && cp -r libclc.pc ${DESTDIR}/%(dst)s' %
+                           {'old': install_cmd, 
+                            'dst': pkgconfigdir}])
+  
+b.rule('install', command = install_cmd, description = 'INSTALL')
+b.build('install', 'install', install_deps)
+
+b.rule("configure", command = ' '.join(sys.argv), description = 'CONFIGURE',
+       generator = True)
+b.build(b.output_filename(), 'configure', list(manifest_deps))
+
+b.finish()
diff --git a/libclc/generic/include/clc/as_type.h b/libclc/generic/include/clc/as_type.h
new file mode 100644
index 000000000000..0bb9ee2e8313
--- /dev/null
+++ b/libclc/generic/include/clc/as_type.h
@@ -0,0 +1,68 @@
+#define as_char(x) __builtin_astype(x, char)
+#define as_uchar(x) __builtin_astype(x, uchar)
+#define as_short(x) __builtin_astype(x, short)
+#define as_ushort(x) __builtin_astype(x, ushort)
+#define as_int(x) __builtin_astype(x, int)
+#define as_uint(x) __builtin_astype(x, uint)
+#define as_long(x) __builtin_astype(x, long)
+#define as_ulong(x) __builtin_astype(x, ulong)
+#define as_float(x) __builtin_astype(x, float)
+
+#define as_char2(x) __builtin_astype(x, char2)
+#define as_uchar2(x) __builtin_astype(x, uchar2)
+#define as_short2(x) __builtin_astype(x, short2)
+#define as_ushort2(x) __builtin_astype(x, ushort2)
+#define as_int2(x) __builtin_astype(x, int2)
+#define as_uint2(x) __builtin_astype(x, uint2)
+#define as_long2(x) __builtin_astype(x, long2)
+#define as_ulong2(x) __builtin_astype(x, ulong2)
+#define as_float2(x) __builtin_astype(x, float2)
+
+#define as_char3(x) __builtin_astype(x, char3)
+#define as_uchar3(x) __builtin_astype(x, uchar3)
+#define as_short3(x) __builtin_astype(x, short3)
+#define as_ushort3(x) __builtin_astype(x, ushort3)
+#define as_int3(x) __builtin_astype(x, int3)
+#define as_uint3(x) __builtin_astype(x, uint3)
+#define as_long3(x) __builtin_astype(x, long3)
+#define as_ulong3(x) __builtin_astype(x, ulong3)
+#define as_float3(x) __builtin_astype(x, float3)
+
+#define as_char4(x) __builtin_astype(x, char4)
+#define as_uchar4(x) __builtin_astype(x, uchar4)
+#define as_short4(x) __builtin_astype(x, short4)
+#define as_ushort4(x) __builtin_astype(x, ushort4)
+#define as_int4(x) __builtin_astype(x, int4)
+#define as_uint4(x) __builtin_astype(x, uint4)
+#define as_long4(x) __builtin_astype(x, long4)
+#define as_ulong4(x) __builtin_astype(x, ulong4)
+#define as_float4(x) __builtin_astype(x, float4)
+
+#define as_char8(x) __builtin_astype(x, char8)
+#define as_uchar8(x) __builtin_astype(x, uchar8)
+#define as_short8(x) __builtin_astype(x, short8)
+#define as_ushort8(x) __builtin_astype(x, ushort8)
+#define as_int8(x) __builtin_astype(x, int8)
+#define as_uint8(x) __builtin_astype(x, uint8)
+#define as_long8(x) __builtin_astype(x, long8)
+#define as_ulong8(x) __builtin_astype(x, ulong8)
+#define as_float8(x) __builtin_astype(x, float8)
+
+#define as_char16(x) __builtin_astype(x, char16)
+#define as_uchar16(x) __builtin_astype(x, uchar16)
+#define as_short16(x) __builtin_astype(x, short16)
+#define as_ushort16(x) __builtin_astype(x, ushort16)
+#define as_int16(x) __builtin_astype(x, int16)
+#define as_uint16(x) __builtin_astype(x, uint16)
+#define as_long16(x) __builtin_astype(x, long16)
+#define as_ulong16(x) __builtin_astype(x, ulong16)
+#define as_float16(x) __builtin_astype(x, float16)
+
+#ifdef cl_khr_fp64
+#define as_double(x) __builtin_astype(x, double)
+#define as_double2(x) __builtin_astype(x, double2)
+#define as_double3(x) __builtin_astype(x, double3)
+#define as_double4(x) __builtin_astype(x, double4)
+#define as_double8(x) __builtin_astype(x, double8)
+#define as_double16(x) __builtin_astype(x, double16)
+#endif
diff --git a/libclc/generic/include/clc/async/async_work_group_copy.h b/libclc/generic/include/clc/async/async_work_group_copy.h
new file mode 100644
index 000000000000..39c637b0e265
--- /dev/null
+++ b/libclc/generic/include/clc/async/async_work_group_copy.h
@@ -0,0 +1,15 @@
+#define __CLC_DST_ADDR_SPACE local
+#define __CLC_SRC_ADDR_SPACE global
+#define __CLC_BODY <clc/async/async_work_group_copy.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_DST_ADDR_SPACE
+#undef __CLC_SRC_ADDR_SPACE
+#undef __CLC_BODY
+
+#define __CLC_DST_ADDR_SPACE global
+#define __CLC_SRC_ADDR_SPACE local
+#define __CLC_BODY <clc/async/async_work_group_copy.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_DST_ADDR_SPACE
+#undef __CLC_SRC_ADDR_SPACE
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/async/async_work_group_copy.inc b/libclc/generic/include/clc/async/async_work_group_copy.inc
new file mode 100644
index 000000000000..d85df6c8fadd
--- /dev/null
+++ b/libclc/generic/include/clc/async/async_work_group_copy.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL event_t async_work_group_copy(
+  __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst,
+  const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src,
+  size_t num_gentypes,
+  event_t event);
diff --git a/libclc/generic/include/clc/async/async_work_group_strided_copy.h b/libclc/generic/include/clc/async/async_work_group_strided_copy.h
new file mode 100644
index 000000000000..bfa6f31faca8
--- /dev/null
+++ b/libclc/generic/include/clc/async/async_work_group_strided_copy.h
@@ -0,0 +1,15 @@
+#define __CLC_DST_ADDR_SPACE local
+#define __CLC_SRC_ADDR_SPACE global
+#define __CLC_BODY <clc/async/async_work_group_strided_copy.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_DST_ADDR_SPACE
+#undef __CLC_SRC_ADDR_SPACE
+#undef __CLC_BODY
+
+#define __CLC_DST_ADDR_SPACE global
+#define __CLC_SRC_ADDR_SPACE local
+#define __CLC_BODY <clc/async/async_work_group_strided_copy.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_DST_ADDR_SPACE
+#undef __CLC_SRC_ADDR_SPACE
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/async/async_work_group_strided_copy.inc b/libclc/generic/include/clc/async/async_work_group_strided_copy.inc
new file mode 100644
index 000000000000..bdbea3aa4a16
--- /dev/null
+++ b/libclc/generic/include/clc/async/async_work_group_strided_copy.inc
@@ -0,0 +1,6 @@
+_CLC_OVERLOAD _CLC_DECL event_t async_work_group_strided_copy(
+  __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst,
+  const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src,
+  size_t num_gentypes,
+  size_t stride,
+  event_t event);
diff --git a/libclc/generic/include/clc/async/gentype.inc b/libclc/generic/include/clc/async/gentype.inc
new file mode 100644
index 000000000000..6b79acdff171
--- /dev/null
+++ b/libclc/generic/include/clc/async/gentype.inc
@@ -0,0 +1,204 @@
+
+#define __CLC_GENTYPE char
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE char2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE char4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE char8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE char16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uchar
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uchar2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uchar4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uchar8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uchar16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE short
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE short2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE short4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE short8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE short16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ushort
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ushort2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ushort4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ushort8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ushort16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE long
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE long2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE long4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE long8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE long16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ulong
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ulong2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ulong4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ulong8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE ulong16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#ifdef cl_khr_fp64
+
+#define __CLC_GENTYPE double
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#endif
diff --git a/libclc/generic/include/clc/async/prefetch.h b/libclc/generic/include/clc/async/prefetch.h
new file mode 100644
index 000000000000..f64bc2045de9
--- /dev/null
+++ b/libclc/generic/include/clc/async/prefetch.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/async/prefetch.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/async/prefetch.inc b/libclc/generic/include/clc/async/prefetch.inc
new file mode 100644
index 000000000000..f817a66c249c
--- /dev/null
+++ b/libclc/generic/include/clc/async/prefetch.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL void prefetch(const global __CLC_GENTYPE *p, size_t num_gentypes);
diff --git a/libclc/generic/include/clc/async/wait_group_events.h b/libclc/generic/include/clc/async/wait_group_events.h
new file mode 100644
index 000000000000..799efa0a791c
--- /dev/null
+++ b/libclc/generic/include/clc/async/wait_group_events.h
@@ -0,0 +1 @@
+void wait_group_events(int num_events, event_t *event_list);
diff --git a/libclc/generic/include/clc/atomic/atomic_add.h b/libclc/generic/include/clc/atomic/atomic_add.h
new file mode 100644
index 000000000000..7dd4fd3c682e
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_add.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_add
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_and.h b/libclc/generic/include/clc/atomic/atomic_and.h
new file mode 100644
index 000000000000..a198c46b7ee9
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_and.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_and
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_cmpxchg.h b/libclc/generic/include/clc/atomic/atomic_cmpxchg.h
new file mode 100644
index 000000000000..2e4f1c21dcc2
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_cmpxchg.h
@@ -0,0 +1,15 @@
+#define __CLC_FUNCTION atomic_cmpxchg
+
+#define __CLC_DECLARE_ATOMIC_3_ARG(ADDRSPACE, TYPE) \
+	_CLC_OVERLOAD _CLC_DECL TYPE __CLC_FUNCTION (volatile ADDRSPACE TYPE *, TYPE, TYPE);
+
+#define __CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG(TYPE) \
+	__CLC_DECLARE_ATOMIC_3_ARG(global, TYPE) \
+	__CLC_DECLARE_ATOMIC_3_ARG(local, TYPE)
+
+__CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG(int)
+__CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG(uint)
+
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC_3_ARG
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE_3_ARG
diff --git a/libclc/generic/include/clc/atomic/atomic_dec.h b/libclc/generic/include/clc/atomic/atomic_dec.h
new file mode 100644
index 000000000000..15d05884aeb4
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_dec.h
@@ -0,0 +1 @@
+#define atomic_dec(p) atomic_sub(p, 1)
diff --git a/libclc/generic/include/clc/atomic/atomic_decl.inc b/libclc/generic/include/clc/atomic/atomic_decl.inc
new file mode 100644
index 000000000000..49ccde2bae52
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_decl.inc
@@ -0,0 +1,10 @@
+
+#define __CLC_DECLARE_ATOMIC(ADDRSPACE, TYPE) \
+	_CLC_OVERLOAD _CLC_DECL TYPE __CLC_FUNCTION (volatile ADDRSPACE TYPE *, TYPE);
+
+#define __CLC_DECLARE_ATOMIC_ADDRSPACE(TYPE) \
+	__CLC_DECLARE_ATOMIC(global, TYPE) \
+	__CLC_DECLARE_ATOMIC(local, TYPE)
+
+__CLC_DECLARE_ATOMIC_ADDRSPACE(int)
+__CLC_DECLARE_ATOMIC_ADDRSPACE(uint)
diff --git a/libclc/generic/include/clc/atomic/atomic_inc.h b/libclc/generic/include/clc/atomic/atomic_inc.h
new file mode 100644
index 000000000000..d8bc342aa5f6
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_inc.h
@@ -0,0 +1 @@
+#define atomic_inc(p) atomic_add(p, 1)
diff --git a/libclc/generic/include/clc/atomic/atomic_max.h b/libclc/generic/include/clc/atomic/atomic_max.h
new file mode 100644
index 000000000000..ed09ec9caef2
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_max.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_max
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_min.h b/libclc/generic/include/clc/atomic/atomic_min.h
new file mode 100644
index 000000000000..6a46af403d06
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_min.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_min
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_or.h b/libclc/generic/include/clc/atomic/atomic_or.h
new file mode 100644
index 000000000000..2369d81a3a06
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_or.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_or
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_sub.h b/libclc/generic/include/clc/atomic/atomic_sub.h
new file mode 100644
index 000000000000..993e995001fa
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_sub.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_sub
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_xchg.h b/libclc/generic/include/clc/atomic/atomic_xchg.h
new file mode 100644
index 000000000000..ebe0d9af8098
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_xchg.h
@@ -0,0 +1,6 @@
+#define __CLC_FUNCTION atomic_xchg
+#include <clc/atomic/atomic_decl.inc>
+__CLC_DECLARE_ATOMIC_ADDRSPACE(float)
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/atomic/atomic_xor.h b/libclc/generic/include/clc/atomic/atomic_xor.h
new file mode 100644
index 000000000000..2cb74803ca92
--- /dev/null
+++ b/libclc/generic/include/clc/atomic/atomic_xor.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION atomic_xor
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_DECLARE_ATOMIC
+#undef __CLC_DECLARE_ATOMIC_ADDRSPACE
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_add.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_add.h
new file mode 100644
index 000000000000..9740b3ddab63
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_add.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_add(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_add(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h
new file mode 100644
index 000000000000..168f423396a6
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_cmpxchg(global int *p, int cmp, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_cmpxchg(global unsigned int *p, unsigned int cmp, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
new file mode 100644
index 000000000000..bbc872ce0527
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h
new file mode 100644
index 000000000000..050747c79403
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_inc(global int *p);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_inc(global unsigned int *p);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h
new file mode 100644
index 000000000000..c435c726798c
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_sub(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h
new file mode 100644
index 000000000000..6a18e9e8e1b1
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_xchg(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_xchg(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h
new file mode 100644
index 000000000000..19df7d6ed6ea
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_and(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_and(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h
new file mode 100644
index 000000000000..b46ce29c40c5
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_max(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_max(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h
new file mode 100644
index 000000000000..0e458eb60eae
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_min(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_min(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h
new file mode 100644
index 000000000000..91cde56a4d7b
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_or(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_or(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h
new file mode 100644
index 000000000000..f787849cff00
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_xor(global int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_xor(global unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_add.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_add.h
new file mode 100644
index 000000000000..096d01107d89
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_add.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_add(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_add(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h
new file mode 100644
index 000000000000..e10a84f2cb47
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_cmpxchg(local int *p, int cmp, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_cmpxchg(local unsigned int *p, unsigned int cmp, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h
new file mode 100644
index 000000000000..e74d8fc12b92
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_dec(local int *p);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(local unsigned int *p);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h
new file mode 100644
index 000000000000..718f1f2b8041
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_inc(local int *p);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_inc(local unsigned int *p);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h
new file mode 100644
index 000000000000..6363780e9dec
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_sub(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_sub(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h
new file mode 100644
index 000000000000..c5a1f09b0849
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_xchg(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_xchg(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h
new file mode 100644
index 000000000000..96d7b1a89b6e
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_and(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_and(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h
new file mode 100644
index 000000000000..7d6b17df2a55
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_max(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_max(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h
new file mode 100644
index 000000000000..ddb6cf379284
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_min(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_min(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h
new file mode 100644
index 000000000000..518c256dfbb8
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_or(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_or(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h
new file mode 100644
index 000000000000..e6c9f2f57521
--- /dev/null
+++ b/libclc/generic/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h
@@ -0,0 +1,2 @@
+_CLC_OVERLOAD _CLC_DECL int atom_xor(local int *p, int val);
+_CLC_OVERLOAD _CLC_DECL unsigned int atom_xor(local unsigned int *p, unsigned int val);
diff --git a/libclc/generic/include/clc/clc.h b/libclc/generic/include/clc/clc.h
new file mode 100644
index 000000000000..bd92fdb12b5a
--- /dev/null
+++ b/libclc/generic/include/clc/clc.h
@@ -0,0 +1,195 @@
+#ifndef cl_clang_storage_class_specifiers
+#error Implementation requires cl_clang_storage_class_specifiers extension!
+#endif
+
+#pragma OPENCL EXTENSION cl_clang_storage_class_specifiers : enable
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+/* Function Attributes */
+#include <clc/clcfunc.h>
+
+/* 6.1 Supported Data Types */
+#include <clc/clctypes.h>
+
+/* 6.2.3 Explicit Conversions */
+#include <clc/convert.h>
+
+/* 6.2.4.2 Reinterpreting Types Using as_type() and as_typen() */
+#include <clc/as_type.h>
+
+/* 6.9 Preprocessor Directives and Macros */
+#include <clc/clcversion.h>
+
+/* 6.11.1 Work-Item Functions */
+#include <clc/workitem/get_global_size.h>
+#include <clc/workitem/get_global_id.h>
+#include <clc/workitem/get_local_size.h>
+#include <clc/workitem/get_local_id.h>
+#include <clc/workitem/get_num_groups.h>
+#include <clc/workitem/get_group_id.h>
+
+/* 6.11.2 Math Functions */
+#include <clc/math/acos.h>
+#include <clc/math/asin.h>
+#include <clc/math/atan.h>
+#include <clc/math/atan2.h>
+#include <clc/math/copysign.h>
+#include <clc/math/cos.h>
+#include <clc/math/ceil.h>
+#include <clc/math/exp.h>
+#include <clc/math/exp10.h>
+#include <clc/math/exp2.h>
+#include <clc/math/fabs.h>
+#include <clc/math/floor.h>
+#include <clc/math/fma.h>
+#include <clc/math/fmax.h>
+#include <clc/math/fmin.h>
+#include <clc/math/fmod.h>
+#include <clc/math/hypot.h>
+#include <clc/math/log.h>
+#include <clc/math/log1p.h>
+#include <clc/math/log2.h>
+#include <clc/math/mad.h>
+#include <clc/math/mix.h>
+#include <clc/math/nextafter.h>
+#include <clc/math/pow.h>
+#include <clc/math/pown.h>
+#include <clc/math/rint.h>
+#include <clc/math/round.h>
+#include <clc/math/sin.h>
+#include <clc/math/sincos.h>
+#include <clc/math/sqrt.h>
+#include <clc/math/tan.h>
+#include <clc/math/trunc.h>
+#include <clc/math/native_cos.h>
+#include <clc/math/native_divide.h>
+#include <clc/math/native_exp.h>
+#include <clc/math/native_exp10.h>
+#include <clc/math/native_exp2.h>
+#include <clc/math/native_log.h>
+#include <clc/math/native_log2.h>
+#include <clc/math/native_powr.h>
+#include <clc/math/native_sin.h>
+#include <clc/math/native_sqrt.h>
+#include <clc/math/rsqrt.h>
+
+/* 6.11.2.1 Floating-point macros */
+#include <clc/float/definitions.h>
+
+/* 6.11.3 Integer Functions */
+#include <clc/integer/abs.h>
+#include <clc/integer/abs_diff.h>
+#include <clc/integer/add_sat.h>
+#include <clc/integer/clz.h>
+#include <clc/integer/hadd.h>
+#include <clc/integer/mad24.h>
+#include <clc/integer/mad_hi.h>
+#include <clc/integer/mad_sat.h>
+#include <clc/integer/mul24.h>
+#include <clc/integer/mul_hi.h>
+#include <clc/integer/rhadd.h>
+#include <clc/integer/rotate.h>
+#include <clc/integer/sub_sat.h>
+#include <clc/integer/upsample.h>
+
+/* 6.11.3 Integer Definitions */
+#include <clc/integer/definitions.h>
+
+/* 6.11.2 and 6.11.3 Shared Integer/Math Functions */
+#include <clc/shared/clamp.h>
+#include <clc/shared/max.h>
+#include <clc/shared/min.h>
+#include <clc/shared/vload.h>
+#include <clc/shared/vstore.h>
+
+/* 6.11.4 Common Functions */
+#include <clc/common/sign.h>
+
+/* 6.11.5 Geometric Functions */
+#include <clc/geometric/cross.h>
+#include <clc/geometric/dot.h>
+#include <clc/geometric/length.h>
+#include <clc/geometric/normalize.h>
+
+/* 6.11.6 Relational Functions */
+#include <clc/relational/all.h>
+#include <clc/relational/any.h>
+#include <clc/relational/bitselect.h>
+#include <clc/relational/isequal.h>
+#include <clc/relational/isfinite.h>
+#include <clc/relational/isgreater.h>
+#include <clc/relational/isgreaterequal.h>
+#include <clc/relational/isinf.h>
+#include <clc/relational/isless.h>
+#include <clc/relational/islessequal.h>
+#include <clc/relational/islessgreater.h>
+#include <clc/relational/isnan.h>
+#include <clc/relational/isnormal.h>
+#include <clc/relational/isnotequal.h>
+#include <clc/relational/isordered.h>
+#include <clc/relational/isunordered.h>
+#include <clc/relational/select.h>
+#include <clc/relational/signbit.h>
+
+/* 6.11.8 Synchronization Functions */
+#include <clc/synchronization/cl_mem_fence_flags.h>
+#include <clc/synchronization/barrier.h>
+
+/* 6.11.10 Async Copy and Prefetch Functions */
+#include <clc/async/async_work_group_copy.h>
+#include <clc/async/async_work_group_strided_copy.h>
+#include <clc/async/prefetch.h>
+#include <clc/async/wait_group_events.h>
+
+/* 6.11.11 Atomic Functions */
+#include <clc/atomic/atomic_add.h>
+#include <clc/atomic/atomic_and.h>
+#include <clc/atomic/atomic_cmpxchg.h>
+#include <clc/atomic/atomic_dec.h>
+#include <clc/atomic/atomic_inc.h>
+#include <clc/atomic/atomic_max.h>
+#include <clc/atomic/atomic_min.h>
+#include <clc/atomic/atomic_or.h>
+#include <clc/atomic/atomic_sub.h>
+#include <clc/atomic/atomic_xchg.h>
+#include <clc/atomic/atomic_xor.h>
+
+/* cl_khr_global_int32_base_atomics Extension Functions */
+#include <clc/cl_khr_global_int32_base_atomics/atom_add.h>
+#include <clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h>
+#include <clc/cl_khr_global_int32_base_atomics/atom_dec.h>
+#include <clc/cl_khr_global_int32_base_atomics/atom_inc.h>
+#include <clc/cl_khr_global_int32_base_atomics/atom_sub.h>
+#include <clc/cl_khr_global_int32_base_atomics/atom_xchg.h>
+
+/* cl_khr_global_int32_extended_atomics Extension Functions */
+#include <clc/cl_khr_global_int32_extended_atomics/atom_and.h>
+#include <clc/cl_khr_global_int32_extended_atomics/atom_max.h>
+#include <clc/cl_khr_global_int32_extended_atomics/atom_min.h>
+#include <clc/cl_khr_global_int32_extended_atomics/atom_or.h>
+#include <clc/cl_khr_global_int32_extended_atomics/atom_xor.h>
+
+/* cl_khr_local_int32_base_atomics Extension Functions */
+#include <clc/cl_khr_local_int32_base_atomics/atom_add.h>
+#include <clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h>
+#include <clc/cl_khr_local_int32_base_atomics/atom_dec.h>
+#include <clc/cl_khr_local_int32_base_atomics/atom_inc.h>
+#include <clc/cl_khr_local_int32_base_atomics/atom_sub.h>
+#include <clc/cl_khr_local_int32_base_atomics/atom_xchg.h>
+
+/* cl_khr_local_int32_extended_atomics Extension Functions */
+#include <clc/cl_khr_local_int32_extended_atomics/atom_and.h>
+#include <clc/cl_khr_local_int32_extended_atomics/atom_max.h>
+#include <clc/cl_khr_local_int32_extended_atomics/atom_min.h>
+#include <clc/cl_khr_local_int32_extended_atomics/atom_or.h>
+#include <clc/cl_khr_local_int32_extended_atomics/atom_xor.h>
+
+/* libclc internal definitions */
+#ifdef __CLC_INTERNAL
+#include <math/clc_nextafter.h>
+#endif
+
+#pragma OPENCL EXTENSION all : disable
diff --git a/libclc/generic/include/clc/clcfunc.h b/libclc/generic/include/clc/clcfunc.h
new file mode 100644
index 000000000000..5f166c5a4143
--- /dev/null
+++ b/libclc/generic/include/clc/clcfunc.h
@@ -0,0 +1,4 @@
+#define _CLC_OVERLOAD __attribute__((overloadable))
+#define _CLC_DECL
+#define _CLC_DEF __attribute__((always_inline))
+#define _CLC_INLINE __attribute__((always_inline)) inline
diff --git a/libclc/generic/include/clc/clctypes.h b/libclc/generic/include/clc/clctypes.h
new file mode 100644
index 000000000000..2e3db60dbdfe
--- /dev/null
+++ b/libclc/generic/include/clc/clctypes.h
@@ -0,0 +1,89 @@
+/* 6.1.1 Built-in Scalar Data Types */
+
+typedef unsigned char uchar;
+typedef unsigned short ushort;
+typedef unsigned int uint;
+typedef unsigned long ulong;
+
+typedef __SIZE_TYPE__ size_t;
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+#define __stdint_join3(a,b,c) a ## b ## c
+
+#define  __intn_t(n) __stdint_join3(__INT, n, _TYPE__)
+#define __uintn_t(n) __stdint_join3(unsigned __INT, n, _TYPE__)
+
+typedef  __intn_t(__INTPTR_WIDTH__)  intptr_t;
+typedef __uintn_t(__INTPTR_WIDTH__) uintptr_t;
+
+#undef __uintn_t
+#undef __intn_t
+#undef __stdint_join3
+
+/* 6.1.2 Built-in Vector Data Types */
+
+typedef __attribute__((ext_vector_type(2))) char char2;
+typedef __attribute__((ext_vector_type(3))) char char3;
+typedef __attribute__((ext_vector_type(4))) char char4;
+typedef __attribute__((ext_vector_type(8))) char char8;
+typedef __attribute__((ext_vector_type(16))) char char16;
+
+typedef __attribute__((ext_vector_type(2))) uchar uchar2;
+typedef __attribute__((ext_vector_type(3))) uchar uchar3;
+typedef __attribute__((ext_vector_type(4))) uchar uchar4;
+typedef __attribute__((ext_vector_type(8))) uchar uchar8;
+typedef __attribute__((ext_vector_type(16))) uchar uchar16;
+
+typedef __attribute__((ext_vector_type(2))) short short2;
+typedef __attribute__((ext_vector_type(3))) short short3;
+typedef __attribute__((ext_vector_type(4))) short short4;
+typedef __attribute__((ext_vector_type(8))) short short8;
+typedef __attribute__((ext_vector_type(16))) short short16;
+
+typedef __attribute__((ext_vector_type(2))) ushort ushort2;
+typedef __attribute__((ext_vector_type(3))) ushort ushort3;
+typedef __attribute__((ext_vector_type(4))) ushort ushort4;
+typedef __attribute__((ext_vector_type(8))) ushort ushort8;
+typedef __attribute__((ext_vector_type(16))) ushort ushort16;
+
+typedef __attribute__((ext_vector_type(2))) int int2;
+typedef __attribute__((ext_vector_type(3))) int int3;
+typedef __attribute__((ext_vector_type(4))) int int4;
+typedef __attribute__((ext_vector_type(8))) int int8;
+typedef __attribute__((ext_vector_type(16))) int int16;
+
+typedef __attribute__((ext_vector_type(2))) uint uint2;
+typedef __attribute__((ext_vector_type(3))) uint uint3;
+typedef __attribute__((ext_vector_type(4))) uint uint4;
+typedef __attribute__((ext_vector_type(8))) uint uint8;
+typedef __attribute__((ext_vector_type(16))) uint uint16;
+
+typedef __attribute__((ext_vector_type(2))) long long2;
+typedef __attribute__((ext_vector_type(3))) long long3;
+typedef __attribute__((ext_vector_type(4))) long long4;
+typedef __attribute__((ext_vector_type(8))) long long8;
+typedef __attribute__((ext_vector_type(16))) long long16;
+
+typedef __attribute__((ext_vector_type(2))) ulong ulong2;
+typedef __attribute__((ext_vector_type(3))) ulong ulong3;
+typedef __attribute__((ext_vector_type(4))) ulong ulong4;
+typedef __attribute__((ext_vector_type(8))) ulong ulong8;
+typedef __attribute__((ext_vector_type(16))) ulong ulong16;
+
+typedef __attribute__((ext_vector_type(2))) float float2;
+typedef __attribute__((ext_vector_type(3))) float float3;
+typedef __attribute__((ext_vector_type(4))) float float4;
+typedef __attribute__((ext_vector_type(8))) float float8;
+typedef __attribute__((ext_vector_type(16))) float float16;
+
+/* 9.3 Double Precision Floating-Point */
+
+#ifdef cl_khr_fp64
+typedef __attribute__((ext_vector_type(2))) double double2;
+typedef __attribute__((ext_vector_type(3))) double double3;
+typedef __attribute__((ext_vector_type(4))) double double4;
+typedef __attribute__((ext_vector_type(8))) double double8;
+typedef __attribute__((ext_vector_type(16))) double double16;
+#endif
+
+#define NULL ((void *)0)
diff --git a/libclc/generic/include/clc/clcversion.h b/libclc/generic/include/clc/clcversion.h
new file mode 100644
index 000000000000..57c989e3b713
--- /dev/null
+++ b/libclc/generic/include/clc/clcversion.h
@@ -0,0 +1,8 @@
+#if __OPENCL_VERSION__ >= 110
+#define CLC_VERSION_1_0 100
+#define CLC_VERSION_1_1 110
+#endif
+
+#if __OPENCL_VERSION__ >= 120
+#define CLC_VERSION_1_2 120
+#endif
diff --git a/libclc/generic/include/clc/common/sign.h b/libclc/generic/include/clc/common/sign.h
new file mode 100644
index 000000000000..fa9aa096541f
--- /dev/null
+++ b/libclc/generic/include/clc/common/sign.h
@@ -0,0 +1,5 @@
+#define __CLC_FUNCTION sign
+#define __CLC_BODY <clc/math/unary_decl.inc>
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/convert.h b/libclc/generic/include/clc/convert.h
new file mode 100644
index 000000000000..f0ba796864d4
--- /dev/null
+++ b/libclc/generic/include/clc/convert.h
@@ -0,0 +1,60 @@
+#define _CLC_CONVERT_DECL(FROM_TYPE, TO_TYPE, SUFFIX) \
+  _CLC_OVERLOAD _CLC_DECL TO_TYPE convert_##TO_TYPE##SUFFIX(FROM_TYPE x);
+
+#define _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, TO_TYPE, SUFFIX) \
+  _CLC_CONVERT_DECL(FROM_TYPE, TO_TYPE, SUFFIX) \
+  _CLC_CONVERT_DECL(FROM_TYPE##2, TO_TYPE##2, SUFFIX) \
+  _CLC_CONVERT_DECL(FROM_TYPE##3, TO_TYPE##3, SUFFIX) \
+  _CLC_CONVERT_DECL(FROM_TYPE##4, TO_TYPE##4, SUFFIX) \
+  _CLC_CONVERT_DECL(FROM_TYPE##8, TO_TYPE##8, SUFFIX) \
+  _CLC_CONVERT_DECL(FROM_TYPE##16, TO_TYPE##16, SUFFIX)
+
+#define _CLC_VECTOR_CONVERT_FROM1(FROM_TYPE, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, char, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, uchar, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, int, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, uint, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, short, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, ushort, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, long, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, ulong, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, float, SUFFIX)
+
+#ifdef cl_khr_fp64
+#define _CLC_VECTOR_CONVERT_FROM(FROM_TYPE, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM1(FROM_TYPE, SUFFIX) \
+  _CLC_VECTOR_CONVERT_DECL(FROM_TYPE, double, SUFFIX)
+#else
+#define _CLC_VECTOR_CONVERT_FROM(FROM_TYPE, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM1(FROM_TYPE, SUFFIX)
+#endif
+
+#define _CLC_VECTOR_CONVERT_TO1(SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(char, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(uchar, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(int, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(uint, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(short, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(ushort, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(long, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(ulong, SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(float, SUFFIX)
+
+#ifdef cl_khr_fp64
+#define _CLC_VECTOR_CONVERT_TO(SUFFIX) \
+  _CLC_VECTOR_CONVERT_TO1(SUFFIX) \
+  _CLC_VECTOR_CONVERT_FROM(double, SUFFIX)
+#else
+#define _CLC_VECTOR_CONVERT_TO(SUFFIX) \
+  _CLC_VECTOR_CONVERT_TO1(SUFFIX)
+#endif
+
+#define _CLC_VECTOR_CONVERT_TO_SUFFIX(ROUND) \
+  _CLC_VECTOR_CONVERT_TO(_sat##ROUND) \
+  _CLC_VECTOR_CONVERT_TO(ROUND)
+
+_CLC_VECTOR_CONVERT_TO_SUFFIX(_rtn)
+_CLC_VECTOR_CONVERT_TO_SUFFIX(_rte)
+_CLC_VECTOR_CONVERT_TO_SUFFIX(_rtz)
+_CLC_VECTOR_CONVERT_TO_SUFFIX(_rtp)
+_CLC_VECTOR_CONVERT_TO_SUFFIX()
diff --git a/libclc/generic/include/clc/float/definitions.h b/libclc/generic/include/clc/float/definitions.h
new file mode 100644
index 000000000000..329b6238c3f4
--- /dev/null
+++ b/libclc/generic/include/clc/float/definitions.h
@@ -0,0 +1,74 @@
+#define MAXFLOAT        0x1.fffffep127f
+#define HUGE_VALF       __builtin_huge_valf()
+#define INFINITY        __builtin_inff()
+#define NAN             __builtin_nanf("")
+
+#define FLT_DIG         6
+#define FLT_MANT_DIG    24
+#define FLT_MAX_10_EXP  +38
+#define FLT_MAX_EXP     +128
+#define FLT_MIN_10_EXP  -37
+#define FLT_MIN_EXP     -125
+#define FLT_RADIX       2
+#define FLT_MAX         MAXFLOAT
+#define FLT_MIN         0x1.0p-126f
+#define FLT_EPSILON     0x1.0p-23f
+
+#define M_E_F           0x1.5bf0a8p+1f
+#define M_LOG2E_F       0x1.715476p+0f
+#define M_LOG10E_F      0x1.bcb7b2p-2f
+#define M_LN2_F         0x1.62e430p-1f
+#define M_LN10_F        0x1.26bb1cp+1f
+#define M_PI_F          0x1.921fb6p+1f
+#define M_PI_2_F        0x1.921fb6p+0f
+#define M_PI_4_F        0x1.921fb6p-1f
+#define M_1_PI_F        0x1.45f306p-2f
+#define M_2_PI_F        0x1.45f306p-1f
+#define M_2_SQRTPI_F    0x1.20dd76p+0f
+#define M_SQRT2_F       0x1.6a09e6p+0f
+#define M_SQRT1_2_F     0x1.6a09e6p-1f
+
+#ifdef cl_khr_fp64
+
+#define HUGE_VAL        __builtin_huge_val()
+
+#define DBL_DIG         15
+#define DBL_MANT_DIG    53
+#define DBL_MAX_10_EXP  +308
+#define DBL_MAX_EXP     +1024
+#define DBL_MIN_10_EXP  -307
+#define DBL_MIN_EXP     -1021
+#define DBL_MAX         0x1.fffffffffffffp1023
+#define DBL_MIN         0x1.0p-1022
+#define DBL_EPSILON     0x1.0p-52
+
+#define M_E             0x1.5bf0a8b145769p+1
+#define M_LOG2E         0x1.71547652b82fep+0
+#define M_LOG10E        0x1.bcb7b1526e50ep-2
+#define M_LN2           0x1.62e42fefa39efp-1
+#define M_LN10          0x1.26bb1bbb55516p+1
+#define M_PI            0x1.921fb54442d18p+1
+#define M_PI_2          0x1.921fb54442d18p+0
+#define M_PI_4          0x1.921fb54442d18p-1
+#define M_1_PI          0x1.45f306dc9c883p-2
+#define M_2_PI          0x1.45f306dc9c883p-1
+#define M_2_SQRTPI      0x1.20dd750429b6dp+0
+#define M_SQRT2         0x1.6a09e667f3bcdp+0
+#define M_SQRT1_2       0x1.6a09e667f3bcdp-1
+
+#endif
+
+#ifdef cl_khr_fp16
+
+#if __OPENCL_VERSION__ >= 120
+
+#define HALF_DIG        3
+#define HALF_MANT_DIG   11
+#define HALF_MAX_10_EXP +4
+#define HALF_MAX_EXP    +16
+#define HALF_MIN_10_EXP -4
+#define HALF_MIN_EXP    -13
+
+#endif
+
+#endif
diff --git a/libclc/generic/include/clc/geometric/cross.h b/libclc/generic/include/clc/geometric/cross.h
new file mode 100644
index 000000000000..eee0cc81bb92
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/cross.h
@@ -0,0 +1,7 @@
+_CLC_OVERLOAD _CLC_DECL float3 cross(float3 p0, float3 p1);
+_CLC_OVERLOAD _CLC_DECL float4 cross(float4 p0, float4 p1);
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD _CLC_DECL double3 cross(double3 p0, double3 p1);
+_CLC_OVERLOAD _CLC_DECL double4 cross(double4 p0, double4 p1);
+#endif
diff --git a/libclc/generic/include/clc/geometric/distance.h b/libclc/generic/include/clc/geometric/distance.h
new file mode 100644
index 000000000000..3e91332d7838
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/distance.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/distance.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/dot.h b/libclc/generic/include/clc/geometric/dot.h
new file mode 100644
index 000000000000..7f65fed9760d
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/dot.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/dot.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/dot.inc b/libclc/generic/include/clc/geometric/dot.inc
new file mode 100644
index 000000000000..34245e2935a4
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/dot.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_FLOAT dot(__CLC_FLOATN p0, __CLC_FLOATN p1);
diff --git a/libclc/generic/include/clc/geometric/floatn.inc b/libclc/generic/include/clc/geometric/floatn.inc
new file mode 100644
index 000000000000..fb7a9ae601cd
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/floatn.inc
@@ -0,0 +1,45 @@
+#define __CLC_FLOAT float
+
+#define __CLC_FLOATN float
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float2
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float3
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float4
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#undef __CLC_FLOAT
+
+#ifdef cl_khr_fp64
+
+#define __CLC_FLOAT double
+
+#define __CLC_FLOATN double
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double2
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double3
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double4
+#include __CLC_BODY
+#undef __CLC_FLOATN
+
+#undef __CLC_FLOAT
+
+#endif
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/geometric/length.h b/libclc/generic/include/clc/geometric/length.h
new file mode 100644
index 000000000000..cb992b9bc72e
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/length.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/length.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/length.inc b/libclc/generic/include/clc/geometric/length.inc
new file mode 100644
index 000000000000..c2d95e876831
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/length.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_FLOAT length(__CLC_FLOATN p0);
diff --git a/libclc/generic/include/clc/geometric/normalize.h b/libclc/generic/include/clc/geometric/normalize.h
new file mode 100644
index 000000000000..dccff9b4e041
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/normalize.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/geometric/normalize.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/include/clc/geometric/normalize.inc b/libclc/generic/include/clc/geometric/normalize.inc
new file mode 100644
index 000000000000..6eb13150603e
--- /dev/null
+++ b/libclc/generic/include/clc/geometric/normalize.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_FLOATN normalize(__CLC_FLOATN p);
diff --git a/libclc/generic/include/clc/integer/abs.h b/libclc/generic/include/clc/integer/abs.h
new file mode 100644
index 000000000000..77a4cbeb4fe3
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/abs.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/abs.inc b/libclc/generic/include/clc/integer/abs.inc
new file mode 100644
index 000000000000..952bce7e29e3
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_U_GENTYPE abs(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/integer/abs_diff.h b/libclc/generic/include/clc/integer/abs_diff.h
new file mode 100644
index 000000000000..3f3b4b43c5d7
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs_diff.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/abs_diff.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/abs_diff.inc b/libclc/generic/include/clc/integer/abs_diff.inc
new file mode 100644
index 000000000000..e844d46e808b
--- /dev/null
+++ b/libclc/generic/include/clc/integer/abs_diff.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_U_GENTYPE abs_diff(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/add_sat.h b/libclc/generic/include/clc/integer/add_sat.h
new file mode 100644
index 000000000000..2e5e69851442
--- /dev/null
+++ b/libclc/generic/include/clc/integer/add_sat.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/add_sat.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/add_sat.inc b/libclc/generic/include/clc/integer/add_sat.inc
new file mode 100644
index 000000000000..913841a1dada
--- /dev/null
+++ b/libclc/generic/include/clc/integer/add_sat.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE add_sat(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/clz.h b/libclc/generic/include/clc/integer/clz.h
new file mode 100644
index 000000000000..f7cdbf78ec06
--- /dev/null
+++ b/libclc/generic/include/clc/integer/clz.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/clz.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/clz.inc b/libclc/generic/include/clc/integer/clz.inc
new file mode 100644
index 000000000000..45826d10c9fa
--- /dev/null
+++ b/libclc/generic/include/clc/integer/clz.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE clz(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/integer/definitions.h b/libclc/generic/include/clc/integer/definitions.h
new file mode 100644
index 000000000000..a407974a0d4e
--- /dev/null
+++ b/libclc/generic/include/clc/integer/definitions.h
@@ -0,0 +1,15 @@
+#define CHAR_BIT 8
+#define INT_MAX 2147483647
+#define INT_MIN (-2147483647 - 1)
+#define LONG_MAX  0x7fffffffffffffffL
+#define LONG_MIN (-0x7fffffffffffffffL - 1)
+#define SCHAR_MAX 127
+#define SCHAR_MIN -128
+#define CHAR_MAX 127
+#define CHAR_MIN -128
+#define SHRT_MAX 32767
+#define SHRT_MIN -32768
+#define UCHAR_MAX 255
+#define USHRT_MAX 65535
+#define UINT_MAX 0xffffffff
+#define ULONG_MAX 0xffffffffffffffffUL
diff --git a/libclc/generic/include/clc/integer/gentype.inc b/libclc/generic/include/clc/integer/gentype.inc
new file mode 100644
index 000000000000..6f4d6996d8f5
--- /dev/null
+++ b/libclc/generic/include/clc/integer/gentype.inc
@@ -0,0 +1,435 @@
+// __CLC_GENSIZE and __CLC_SCALAR_GENTYPE are redefined only when the data size
+// or base type changes, which keeps this file manageable.
+#define __CLC_GENSIZE 8
+#define __CLC_SCALAR_GENTYPE char
+
+#define __CLC_GENTYPE char
+#define __CLC_U_GENTYPE uchar
+#define __CLC_S_GENTYPE char
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char2
+#define __CLC_U_GENTYPE uchar2
+#define __CLC_S_GENTYPE char2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char3
+#define __CLC_U_GENTYPE uchar3
+#define __CLC_S_GENTYPE char3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char4
+#define __CLC_U_GENTYPE uchar4
+#define __CLC_S_GENTYPE char4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char8
+#define __CLC_U_GENTYPE uchar8
+#define __CLC_S_GENTYPE char8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE char16
+#define __CLC_U_GENTYPE uchar16
+#define __CLC_S_GENTYPE char16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE uchar
+
+#define __CLC_GENTYPE uchar
+#define __CLC_U_GENTYPE uchar
+#define __CLC_S_GENTYPE char
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar2
+#define __CLC_U_GENTYPE uchar2
+#define __CLC_S_GENTYPE char2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar3
+#define __CLC_U_GENTYPE uchar3
+#define __CLC_S_GENTYPE char3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar4
+#define __CLC_U_GENTYPE uchar4
+#define __CLC_S_GENTYPE char4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar8
+#define __CLC_U_GENTYPE uchar8
+#define __CLC_S_GENTYPE char8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uchar16
+#define __CLC_U_GENTYPE uchar16
+#define __CLC_S_GENTYPE char16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#define __CLC_GENSIZE 16
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE short
+
+#define __CLC_GENTYPE short
+#define __CLC_U_GENTYPE ushort
+#define __CLC_S_GENTYPE short
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short2
+#define __CLC_U_GENTYPE ushort2
+#define __CLC_S_GENTYPE short2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short3
+#define __CLC_U_GENTYPE ushort3
+#define __CLC_S_GENTYPE short3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short4
+#define __CLC_U_GENTYPE ushort4
+#define __CLC_S_GENTYPE short4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short8
+#define __CLC_U_GENTYPE ushort8
+#define __CLC_S_GENTYPE short8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE short16
+#define __CLC_U_GENTYPE ushort16
+#define __CLC_S_GENTYPE short16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE ushort
+
+#define __CLC_GENTYPE ushort
+#define __CLC_U_GENTYPE ushort
+#define __CLC_S_GENTYPE short
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort2
+#define __CLC_U_GENTYPE ushort2
+#define __CLC_S_GENTYPE short2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort3
+#define __CLC_U_GENTYPE ushort3
+#define __CLC_S_GENTYPE short3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort4
+#define __CLC_U_GENTYPE ushort4
+#define __CLC_S_GENTYPE short4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort8
+#define __CLC_U_GENTYPE ushort8
+#define __CLC_S_GENTYPE short8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ushort16
+#define __CLC_U_GENTYPE ushort16
+#define __CLC_S_GENTYPE short16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#define __CLC_GENSIZE 32
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE int
+
+#define __CLC_GENTYPE int
+#define __CLC_U_GENTYPE uint
+#define __CLC_S_GENTYPE int
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int2
+#define __CLC_U_GENTYPE uint2
+#define __CLC_S_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int3
+#define __CLC_U_GENTYPE uint3
+#define __CLC_S_GENTYPE int3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int4
+#define __CLC_U_GENTYPE uint4
+#define __CLC_S_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int8
+#define __CLC_U_GENTYPE uint8
+#define __CLC_S_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE int16
+#define __CLC_U_GENTYPE uint16
+#define __CLC_S_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE uint
+
+#define __CLC_GENTYPE uint
+#define __CLC_U_GENTYPE uint
+#define __CLC_S_GENTYPE int
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint2
+#define __CLC_U_GENTYPE uint2
+#define __CLC_S_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint3
+#define __CLC_U_GENTYPE uint3
+#define __CLC_S_GENTYPE int3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint4
+#define __CLC_U_GENTYPE uint4
+#define __CLC_S_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint8
+#define __CLC_U_GENTYPE uint8
+#define __CLC_S_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE uint16
+#define __CLC_U_GENTYPE uint16
+#define __CLC_S_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#define __CLC_GENSIZE 64
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE long
+
+#define __CLC_GENTYPE long
+#define __CLC_U_GENTYPE ulong
+#define __CLC_S_GENTYPE long
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long2
+#define __CLC_U_GENTYPE ulong2
+#define __CLC_S_GENTYPE long2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long3
+#define __CLC_U_GENTYPE ulong3
+#define __CLC_S_GENTYPE long3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long4
+#define __CLC_U_GENTYPE ulong4
+#define __CLC_S_GENTYPE long4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long8
+#define __CLC_U_GENTYPE ulong8
+#define __CLC_S_GENTYPE long8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE long16
+#define __CLC_U_GENTYPE ulong16
+#define __CLC_S_GENTYPE long16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_SCALAR_GENTYPE
+#define __CLC_SCALAR_GENTYPE ulong
+
+#define __CLC_GENTYPE ulong
+#define __CLC_U_GENTYPE ulong
+#define __CLC_S_GENTYPE long
+#define __CLC_SCALAR 1
+#include __CLC_BODY
+#undef __CLC_SCALAR
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong2
+#define __CLC_U_GENTYPE ulong2
+#define __CLC_S_GENTYPE long2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong3
+#define __CLC_U_GENTYPE ulong3
+#define __CLC_S_GENTYPE long3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong4
+#define __CLC_U_GENTYPE ulong4
+#define __CLC_S_GENTYPE long4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong8
+#define __CLC_U_GENTYPE ulong8
+#define __CLC_S_GENTYPE long8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#define __CLC_GENTYPE ulong16
+#define __CLC_U_GENTYPE ulong16
+#define __CLC_S_GENTYPE long16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_U_GENTYPE
+#undef __CLC_S_GENTYPE
+
+#undef __CLC_GENSIZE
+#undef __CLC_SCALAR_GENTYPE
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/hadd.h b/libclc/generic/include/clc/integer/hadd.h
new file mode 100644
index 000000000000..37304e26cc2d
--- /dev/null
+++ b/libclc/generic/include/clc/integer/hadd.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/hadd.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/hadd.inc b/libclc/generic/include/clc/integer/hadd.inc
new file mode 100644
index 000000000000..f698989cef20
--- /dev/null
+++ b/libclc/generic/include/clc/integer/hadd.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE hadd(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/integer-gentype.inc b/libclc/generic/include/clc/integer/integer-gentype.inc
new file mode 100644
index 000000000000..e4115cf45ebb
--- /dev/null
+++ b/libclc/generic/include/clc/integer/integer-gentype.inc
@@ -0,0 +1,47 @@
+#define __CLC_GENTYPE int
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE int16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE uint16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
diff --git a/libclc/generic/include/clc/integer/mad24.h b/libclc/generic/include/clc/integer/mad24.h
new file mode 100644
index 000000000000..0c120faac2b1
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad24.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/integer/mad24.inc>
+#include <clc/integer/integer-gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/mad24.inc b/libclc/generic/include/clc/integer/mad24.inc
new file mode 100644
index 000000000000..81fe0c2a8926
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad24.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mad24(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z);
diff --git a/libclc/generic/include/clc/integer/mad_hi.h b/libclc/generic/include/clc/integer/mad_hi.h
new file mode 100644
index 000000000000..863ce92d9f2d
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad_hi.h
@@ -0,0 +1 @@
+#define mad_hi(a, b, c) (mul_hi((a),(b))+(c))
diff --git a/libclc/generic/include/clc/integer/mad_sat.h b/libclc/generic/include/clc/integer/mad_sat.h
new file mode 100644
index 000000000000..3e92372a27d0
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad_sat.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/integer/mad_sat.inc>
+#include <clc/integer/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/mad_sat.inc b/libclc/generic/include/clc/integer/mad_sat.inc
new file mode 100644
index 000000000000..5da2bdf8908d
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mad_sat.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mad_sat(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z);
diff --git a/libclc/generic/include/clc/integer/mul24.h b/libclc/generic/include/clc/integer/mul24.h
new file mode 100644
index 000000000000..4f97098d70f0
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul24.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/integer/mul24.inc>
+#include <clc/integer/integer-gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/integer/mul24.inc b/libclc/generic/include/clc/integer/mul24.inc
new file mode 100644
index 000000000000..8cbf7c10ac44
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul24.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mul24(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/mul_hi.h b/libclc/generic/include/clc/integer/mul_hi.h
new file mode 100644
index 000000000000..27b95d83442f
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul_hi.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/mul_hi.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/mul_hi.inc b/libclc/generic/include/clc/integer/mul_hi.inc
new file mode 100644
index 000000000000..ce9e5c0b2c18
--- /dev/null
+++ b/libclc/generic/include/clc/integer/mul_hi.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mul_hi(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/rhadd.h b/libclc/generic/include/clc/integer/rhadd.h
new file mode 100644
index 000000000000..69b43faeebd2
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rhadd.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/rhadd.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/rhadd.inc b/libclc/generic/include/clc/integer/rhadd.inc
new file mode 100644
index 000000000000..88ccaf09fd5e
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rhadd.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE rhadd(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/rotate.h b/libclc/generic/include/clc/integer/rotate.h
new file mode 100644
index 000000000000..6320223e7cf2
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rotate.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/rotate.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/rotate.inc b/libclc/generic/include/clc/integer/rotate.inc
new file mode 100644
index 000000000000..c97711ecf882
--- /dev/null
+++ b/libclc/generic/include/clc/integer/rotate.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE rotate(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/sub_sat.h b/libclc/generic/include/clc/integer/sub_sat.h
new file mode 100644
index 000000000000..f84152944817
--- /dev/null
+++ b/libclc/generic/include/clc/integer/sub_sat.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/integer/sub_sat.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/include/clc/integer/sub_sat.inc b/libclc/generic/include/clc/integer/sub_sat.inc
new file mode 100644
index 000000000000..425df2e4b696
--- /dev/null
+++ b/libclc/generic/include/clc/integer/sub_sat.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE sub_sat(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/integer/upsample.h b/libclc/generic/include/clc/integer/upsample.h
new file mode 100644
index 000000000000..0b36b692a2c8
--- /dev/null
+++ b/libclc/generic/include/clc/integer/upsample.h
@@ -0,0 +1,25 @@
+#define __CLC_UPSAMPLE_DECL(BGENTYPE, GENTYPE, UGENTYPE) \
+    _CLC_OVERLOAD _CLC_DECL BGENTYPE upsample(GENTYPE hi, UGENTYPE lo);
+
+#define __CLC_UPSAMPLE_VEC(BGENTYPE, GENTYPE, UGENTYPE) \
+    __CLC_UPSAMPLE_DECL(BGENTYPE, GENTYPE, UGENTYPE) \
+    __CLC_UPSAMPLE_DECL(BGENTYPE##2, GENTYPE##2, UGENTYPE##2) \
+    __CLC_UPSAMPLE_DECL(BGENTYPE##3, GENTYPE##3, UGENTYPE##3) \
+    __CLC_UPSAMPLE_DECL(BGENTYPE##4, GENTYPE##4, UGENTYPE##4) \
+    __CLC_UPSAMPLE_DECL(BGENTYPE##8, GENTYPE##8, UGENTYPE##8) \
+    __CLC_UPSAMPLE_DECL(BGENTYPE##16, GENTYPE##16, UGENTYPE##16)
+
+#define __CLC_UPSAMPLE_TYPES() \
+    __CLC_UPSAMPLE_VEC(short, char, uchar) \
+    __CLC_UPSAMPLE_VEC(ushort, uchar, uchar) \
+    __CLC_UPSAMPLE_VEC(int, short, ushort) \
+    __CLC_UPSAMPLE_VEC(uint, ushort, ushort) \
+    __CLC_UPSAMPLE_VEC(long, int, uint) \
+    __CLC_UPSAMPLE_VEC(ulong, uint, uint)
+
+__CLC_UPSAMPLE_TYPES()
+
+#undef __CLC_UPSAMPLE_TYPES
+#undef __CLC_UPSAMPLE_DECL
+#undef __CLC_UPSAMPLE_VEC
+
diff --git a/libclc/generic/include/clc/math/acos.h b/libclc/generic/include/clc/math/acos.h
new file mode 100644
index 000000000000..e753dee36aa5
--- /dev/null
+++ b/libclc/generic/include/clc/math/acos.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/acos.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/acos.inc b/libclc/generic/include/clc/math/acos.inc
new file mode 100644
index 000000000000..4ca8c7538aef
--- /dev/null
+++ b/libclc/generic/include/clc/math/acos.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE acos(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/asin.h b/libclc/generic/include/clc/math/asin.h
new file mode 100644
index 000000000000..2a858721e952
--- /dev/null
+++ b/libclc/generic/include/clc/math/asin.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/asin.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/asin.inc b/libclc/generic/include/clc/math/asin.inc
new file mode 100644
index 000000000000..b4ad8ff1231d
--- /dev/null
+++ b/libclc/generic/include/clc/math/asin.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE asin(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/atan.h b/libclc/generic/include/clc/math/atan.h
new file mode 100644
index 000000000000..d9697194ee8a
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/atan.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/atan.inc b/libclc/generic/include/clc/math/atan.inc
new file mode 100644
index 000000000000..d217c955593f
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE atan(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/atan2.h b/libclc/generic/include/clc/math/atan2.h
new file mode 100644
index 000000000000..9c082a082f0a
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan2.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/atan2.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/atan2.inc b/libclc/generic/include/clc/math/atan2.inc
new file mode 100644
index 000000000000..ce273da53346
--- /dev/null
+++ b/libclc/generic/include/clc/math/atan2.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE atan2(__CLC_GENTYPE a, __CLC_GENTYPE b);
diff --git a/libclc/generic/include/clc/math/binary_decl.inc b/libclc/generic/include/clc/math/binary_decl.inc
new file mode 100644
index 000000000000..70a711477704
--- /dev/null
+++ b/libclc/generic/include/clc/math/binary_decl.inc
@@ -0,0 +1,6 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE a, __CLC_GENTYPE b);
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE a, float b);
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE a, double b);
+#endif
diff --git a/libclc/generic/include/clc/math/binary_intrin.inc b/libclc/generic/include/clc/math/binary_intrin.inc
new file mode 100644
index 000000000000..cfbe74159ec2
--- /dev/null
+++ b/libclc/generic/include/clc/math/binary_intrin.inc
@@ -0,0 +1,18 @@
+_CLC_OVERLOAD float __CLC_FUNCTION(float, float) __asm(__CLC_INTRINSIC ".f32");
+_CLC_OVERLOAD float2 __CLC_FUNCTION(float2, float2) __asm(__CLC_INTRINSIC ".v2f32");
+_CLC_OVERLOAD float3 __CLC_FUNCTION(float3, float3) __asm(__CLC_INTRINSIC ".v3f32");
+_CLC_OVERLOAD float4 __CLC_FUNCTION(float4, float4) __asm(__CLC_INTRINSIC ".v4f32");
+_CLC_OVERLOAD float8 __CLC_FUNCTION(float8, float8) __asm(__CLC_INTRINSIC ".v8f32");
+_CLC_OVERLOAD float16 __CLC_FUNCTION(float16, float16) __asm(__CLC_INTRINSIC ".v16f32");
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double __CLC_FUNCTION(double, double) __asm(__CLC_INTRINSIC ".f64");
+_CLC_OVERLOAD double2 __CLC_FUNCTION(double2, double2) __asm(__CLC_INTRINSIC ".v2f64");
+_CLC_OVERLOAD double3 __CLC_FUNCTION(double3, double3) __asm(__CLC_INTRINSIC ".v3f64");
+_CLC_OVERLOAD double4 __CLC_FUNCTION(double4, double4) __asm(__CLC_INTRINSIC ".v4f64");
+_CLC_OVERLOAD double8 __CLC_FUNCTION(double8, double8) __asm(__CLC_INTRINSIC ".v8f64");
+_CLC_OVERLOAD double16 __CLC_FUNCTION(double16, double16) __asm(__CLC_INTRINSIC ".v16f64");
+#endif
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/ceil.h b/libclc/generic/include/clc/math/ceil.h
new file mode 100644
index 000000000000..5b40abf97c20
--- /dev/null
+++ b/libclc/generic/include/clc/math/ceil.h
@@ -0,0 +1,6 @@
+#undef ceil
+#define ceil __clc_ceil
+
+#define __CLC_FUNCTION __clc_ceil
+#define __CLC_INTRINSIC "llvm.ceil"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/clc_nextafter.h b/libclc/generic/include/clc/math/clc_nextafter.h
new file mode 100644
index 000000000000..81c8f369c3bd
--- /dev/null
+++ b/libclc/generic/include/clc/math/clc_nextafter.h
@@ -0,0 +1,11 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+
+#define __CLC_FUNCTION nextafter
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+
+#define __CLC_FUNCTION __clc_nextafter
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/copysign.h b/libclc/generic/include/clc/math/copysign.h
new file mode 100644
index 000000000000..8f0742e451fd
--- /dev/null
+++ b/libclc/generic/include/clc/math/copysign.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/copysign.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/copysign.inc b/libclc/generic/include/clc/math/copysign.inc
new file mode 100644
index 000000000000..6091abcc1fc5
--- /dev/null
+++ b/libclc/generic/include/clc/math/copysign.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE copysign(__CLC_GENTYPE a, __CLC_GENTYPE b);
diff --git a/libclc/generic/include/clc/math/cos.h b/libclc/generic/include/clc/math/cos.h
new file mode 100644
index 000000000000..3d4cf39a0f80
--- /dev/null
+++ b/libclc/generic/include/clc/math/cos.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/math/cos.inc>
+#include <clc/math/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/cos.inc b/libclc/generic/include/clc/math/cos.inc
new file mode 100644
index 000000000000..160e625c6912
--- /dev/null
+++ b/libclc/generic/include/clc/math/cos.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE cos(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/exp.h b/libclc/generic/include/clc/math/exp.h
new file mode 100644
index 000000000000..986652476295
--- /dev/null
+++ b/libclc/generic/include/clc/math/exp.h
@@ -0,0 +1,9 @@
+#undef exp
+
+#define __CLC_BODY <clc/math/unary_decl.inc>
+#define __CLC_FUNCTION exp
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/math/exp10.h b/libclc/generic/include/clc/math/exp10.h
new file mode 100644
index 000000000000..a1d426a20ab0
--- /dev/null
+++ b/libclc/generic/include/clc/math/exp10.h
@@ -0,0 +1,9 @@
+#undef exp10
+
+#define __CLC_BODY <clc/math/unary_decl.inc>
+#define __CLC_FUNCTION exp10
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/math/exp2.h b/libclc/generic/include/clc/math/exp2.h
new file mode 100644
index 000000000000..ec0dad268a7b
--- /dev/null
+++ b/libclc/generic/include/clc/math/exp2.h
@@ -0,0 +1,6 @@
+#undef exp2
+#define exp2 __clc_exp2
+
+#define __CLC_FUNCTION __clc_exp2
+#define __CLC_INTRINSIC "llvm.exp2"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/fabs.h b/libclc/generic/include/clc/math/fabs.h
new file mode 100644
index 000000000000..ee2f8932a94d
--- /dev/null
+++ b/libclc/generic/include/clc/math/fabs.h
@@ -0,0 +1,6 @@
+#undef fabs
+#define fabs __clc_fabs
+
+#define __CLC_FUNCTION __clc_fabs
+#define __CLC_INTRINSIC "llvm.fabs"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/floor.h b/libclc/generic/include/clc/math/floor.h
new file mode 100644
index 000000000000..2337d35caae6
--- /dev/null
+++ b/libclc/generic/include/clc/math/floor.h
@@ -0,0 +1,6 @@
+#undef floor
+#define floor __clc_floor
+
+#define __CLC_FUNCTION __clc_floor
+#define __CLC_INTRINSIC "llvm.floor"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/fma.h b/libclc/generic/include/clc/math/fma.h
new file mode 100644
index 000000000000..02d39f681675
--- /dev/null
+++ b/libclc/generic/include/clc/math/fma.h
@@ -0,0 +1,6 @@
+#undef fma
+#define fma __clc_fma
+
+#define __CLC_FUNCTION __clc_fma
+#define __CLC_INTRINSIC "llvm.fma"
+#include <clc/math/ternary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/fmax.h b/libclc/generic/include/clc/math/fmax.h
new file mode 100644
index 000000000000..d6956af85a5f
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmax.h
@@ -0,0 +1,11 @@
+#undef fmax
+#define fmax __clc_fmax
+
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION __clc_fmax
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
+
diff --git a/libclc/generic/include/clc/math/fmin.h b/libclc/generic/include/clc/math/fmin.h
new file mode 100644
index 000000000000..5588ba93a8b8
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmin.h
@@ -0,0 +1,11 @@
+#undef fmin
+#define fmin __clc_fmin
+
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION __clc_fmin
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
+
diff --git a/libclc/generic/include/clc/math/fmod.h b/libclc/generic/include/clc/math/fmod.h
new file mode 100644
index 000000000000..49068675b0ef
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmod.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/fmod.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/fmod.inc b/libclc/generic/include/clc/math/fmod.inc
new file mode 100644
index 000000000000..39d915365c25
--- /dev/null
+++ b/libclc/generic/include/clc/math/fmod.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE fmod(__CLC_GENTYPE a, __CLC_GENTYPE b);
diff --git a/libclc/generic/include/clc/math/gentype.inc b/libclc/generic/include/clc/math/gentype.inc
new file mode 100644
index 000000000000..9f79f6eb037f
--- /dev/null
+++ b/libclc/generic/include/clc/math/gentype.inc
@@ -0,0 +1,67 @@
+#define __CLC_SCALAR_GENTYPE float
+#define __CLC_FPSIZE 32
+
+#define __CLC_GENTYPE float
+#define __CLC_SCALAR
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_SCALAR
+
+#define __CLC_GENTYPE float2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE float16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#undef __CLC_FPSIZE
+#undef __CLC_SCALAR_GENTYPE
+
+#ifdef cl_khr_fp64
+#define __CLC_SCALAR_GENTYPE double
+#define __CLC_FPSIZE 64
+
+#define __CLC_SCALAR
+#define __CLC_GENTYPE double
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+#undef __CLC_SCALAR
+
+#define __CLC_GENTYPE double2
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double3
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double4
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double8
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#define __CLC_GENTYPE double16
+#include __CLC_BODY
+#undef __CLC_GENTYPE
+
+#undef __CLC_FPSIZE
+#undef __CLC_SCALAR_GENTYPE
+#endif
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/hypot.h b/libclc/generic/include/clc/math/hypot.h
new file mode 100644
index 000000000000..c00eb4532461
--- /dev/null
+++ b/libclc/generic/include/clc/math/hypot.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/hypot.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/hypot.inc b/libclc/generic/include/clc/math/hypot.inc
new file mode 100644
index 000000000000..08b46058b0aa
--- /dev/null
+++ b/libclc/generic/include/clc/math/hypot.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE hypot(__CLC_GENTYPE x, __CLC_GENTYPE y);
diff --git a/libclc/generic/include/clc/math/log.h b/libclc/generic/include/clc/math/log.h
new file mode 100644
index 000000000000..644f8575c1b3
--- /dev/null
+++ b/libclc/generic/include/clc/math/log.h
@@ -0,0 +1,4 @@
+#undef log
+
+// log(x) = log2(x) * (1/log2(e)) = log2(x) * ln(2), where ln(2) ~= 0.693147181
+#define log(val) (__clc_log2(val) * 0.693147181f)
diff --git a/libclc/generic/include/clc/math/log1p.h b/libclc/generic/include/clc/math/log1p.h
new file mode 100644
index 000000000000..4d716dd18d9c
--- /dev/null
+++ b/libclc/generic/include/clc/math/log1p.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/log1p.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/log1p.inc b/libclc/generic/include/clc/math/log1p.inc
new file mode 100644
index 000000000000..4cbfbf38fc11
--- /dev/null
+++ b/libclc/generic/include/clc/math/log1p.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE log1p(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/log2.h b/libclc/generic/include/clc/math/log2.h
new file mode 100644
index 000000000000..880124097ed0
--- /dev/null
+++ b/libclc/generic/include/clc/math/log2.h
@@ -0,0 +1,6 @@
+#undef log2
+#define log2 __clc_log2
+
+#define __CLC_FUNCTION __clc_log2
+#define __CLC_INTRINSIC "llvm.log2"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/mad.h b/libclc/generic/include/clc/math/mad.h
new file mode 100644
index 000000000000..c4e50840ced0
--- /dev/null
+++ b/libclc/generic/include/clc/math/mad.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/mad.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/mad.inc b/libclc/generic/include/clc/math/mad.inc
new file mode 100644
index 000000000000..61194b6ca4a7
--- /dev/null
+++ b/libclc/generic/include/clc/math/mad.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mad(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_GENTYPE c);
diff --git a/libclc/generic/include/clc/math/mix.h b/libclc/generic/include/clc/math/mix.h
new file mode 100644
index 000000000000..c3c95c1f0c4b
--- /dev/null
+++ b/libclc/generic/include/clc/math/mix.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/mix.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/mix.inc b/libclc/generic/include/clc/math/mix.inc
new file mode 100644
index 000000000000..52cb10ad9027
--- /dev/null
+++ b/libclc/generic/include/clc/math/mix.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mix(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_GENTYPE c);
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE mix(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_SCALAR_GENTYPE c);
+#endif
diff --git a/libclc/generic/include/clc/math/native_cos.h b/libclc/generic/include/clc/math/native_cos.h
new file mode 100644
index 000000000000..c7212cc4b663
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_cos.h
@@ -0,0 +1 @@
+#define native_cos cos
diff --git a/libclc/generic/include/clc/math/native_divide.h b/libclc/generic/include/clc/math/native_divide.h
new file mode 100644
index 000000000000..5c52167fd3e7
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_divide.h
@@ -0,0 +1 @@
+#define native_divide(x, y) ((x) / (y))
diff --git a/libclc/generic/include/clc/math/native_exp.h b/libclc/generic/include/clc/math/native_exp.h
new file mode 100644
index 000000000000..e206de66926d
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_exp.h
@@ -0,0 +1 @@
+#define native_exp exp
diff --git a/libclc/generic/include/clc/math/native_exp10.h b/libclc/generic/include/clc/math/native_exp10.h
new file mode 100644
index 000000000000..1156f58c53a5
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_exp10.h
@@ -0,0 +1 @@
+#define native_exp10 exp10
diff --git a/libclc/generic/include/clc/math/native_exp2.h b/libclc/generic/include/clc/math/native_exp2.h
new file mode 100644
index 000000000000..b6759390ee43
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_exp2.h
@@ -0,0 +1 @@
+#define native_exp2 exp2
diff --git a/libclc/generic/include/clc/math/native_log.h b/libclc/generic/include/clc/math/native_log.h
new file mode 100644
index 000000000000..7805a39ed696
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_log.h
@@ -0,0 +1 @@
+#define native_log log
diff --git a/libclc/generic/include/clc/math/native_log2.h b/libclc/generic/include/clc/math/native_log2.h
new file mode 100644
index 000000000000..0c692eec27f4
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_log2.h
@@ -0,0 +1 @@
+#define native_log2 log2
diff --git a/libclc/generic/include/clc/math/native_powr.h b/libclc/generic/include/clc/math/native_powr.h
new file mode 100644
index 000000000000..e8a37d9cb066
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_powr.h
@@ -0,0 +1 @@
+#define native_powr pow
diff --git a/libclc/generic/include/clc/math/native_sin.h b/libclc/generic/include/clc/math/native_sin.h
new file mode 100644
index 000000000000..569a051ccc75
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_sin.h
@@ -0,0 +1 @@
+#define native_sin sin
diff --git a/libclc/generic/include/clc/math/native_sqrt.h b/libclc/generic/include/clc/math/native_sqrt.h
new file mode 100644
index 000000000000..a9525fccb7c1
--- /dev/null
+++ b/libclc/generic/include/clc/math/native_sqrt.h
@@ -0,0 +1 @@
+#define native_sqrt sqrt
diff --git a/libclc/generic/include/clc/math/nextafter.h b/libclc/generic/include/clc/math/nextafter.h
new file mode 100644
index 000000000000..06e1b2a53c52
--- /dev/null
+++ b/libclc/generic/include/clc/math/nextafter.h
@@ -0,0 +1,5 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION nextafter
+#include <clc/math/gentype.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/pow.h b/libclc/generic/include/clc/math/pow.h
new file mode 100644
index 000000000000..320d341a6830
--- /dev/null
+++ b/libclc/generic/include/clc/math/pow.h
@@ -0,0 +1,6 @@
+#undef pow
+#define pow __clc_pow
+
+#define __CLC_FUNCTION __clc_pow
+#define __CLC_INTRINSIC "llvm.pow"
+#include <clc/math/binary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/pown.h b/libclc/generic/include/clc/math/pown.h
new file mode 100644
index 000000000000..bdbf50c1de6f
--- /dev/null
+++ b/libclc/generic/include/clc/math/pown.h
@@ -0,0 +1,24 @@
+#define _CLC_POWN_INTRINSIC "llvm.powi"
+
+#define _CLC_POWN_DECL(GENTYPE, INTTYPE) \
+  _CLC_OVERLOAD _CLC_DECL GENTYPE pown(GENTYPE x, INTTYPE y);
+
+#define _CLC_VECTOR_POWN_DECL(GENTYPE, INTTYPE) \
+  _CLC_POWN_DECL(GENTYPE##2, INTTYPE##2)  \
+  _CLC_POWN_DECL(GENTYPE##3, INTTYPE##3)  \
+  _CLC_POWN_DECL(GENTYPE##4, INTTYPE##4)  \
+  _CLC_POWN_DECL(GENTYPE##8, INTTYPE##8)  \
+  _CLC_POWN_DECL(GENTYPE##16, INTTYPE##16)
+
+_CLC_OVERLOAD float pown(float x, int y) __asm(_CLC_POWN_INTRINSIC ".f32");
+
+_CLC_VECTOR_POWN_DECL(float, int)
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double pown(double x, int y) __asm(_CLC_POWN_INTRINSIC ".f64");
+_CLC_VECTOR_POWN_DECL(double, int)
+#endif
+
+#undef _CLC_POWN_INTRINSIC
+#undef _CLC_POWN_DECL
+#undef _CLC_VECTOR_POWN_DECL
diff --git a/libclc/generic/include/clc/math/rint.h b/libclc/generic/include/clc/math/rint.h
new file mode 100644
index 000000000000..d257634a6f95
--- /dev/null
+++ b/libclc/generic/include/clc/math/rint.h
@@ -0,0 +1,6 @@
+#undef rint
+#define rint __clc_rint
+
+#define __CLC_FUNCTION __clc_rint
+#define __CLC_INTRINSIC "llvm.rint"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/round.h b/libclc/generic/include/clc/math/round.h
new file mode 100644
index 000000000000..43e16aed028f
--- /dev/null
+++ b/libclc/generic/include/clc/math/round.h
@@ -0,0 +1,9 @@
+#undef round
+#define round __clc_round
+
+#define __CLC_FUNCTION __clc_round
+#define __CLC_INTRINSIC "llvm.round"
+#include <clc/math/unary_intrin.inc>
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/rsqrt.h b/libclc/generic/include/clc/math/rsqrt.h
new file mode 100644
index 000000000000..9d49ee652262
--- /dev/null
+++ b/libclc/generic/include/clc/math/rsqrt.h
@@ -0,0 +1 @@
+#define rsqrt(x) (1.f/sqrt(x))
diff --git a/libclc/generic/include/clc/math/sin.h b/libclc/generic/include/clc/math/sin.h
new file mode 100644
index 000000000000..6d4cf5a3142c
--- /dev/null
+++ b/libclc/generic/include/clc/math/sin.h
@@ -0,0 +1,3 @@
+#define __CLC_BODY <clc/math/sin.inc>
+#include <clc/math/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/math/sin.inc b/libclc/generic/include/clc/math/sin.inc
new file mode 100644
index 000000000000..e722fa352731
--- /dev/null
+++ b/libclc/generic/include/clc/math/sin.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE sin(__CLC_GENTYPE a);
diff --git a/libclc/generic/include/clc/math/sincos.h b/libclc/generic/include/clc/math/sincos.h
new file mode 100644
index 000000000000..fbb9b55cd1f7
--- /dev/null
+++ b/libclc/generic/include/clc/math/sincos.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/sincos.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/sincos.inc b/libclc/generic/include/clc/math/sincos.inc
new file mode 100644
index 000000000000..444ac82a5204
--- /dev/null
+++ b/libclc/generic/include/clc/math/sincos.inc
@@ -0,0 +1,8 @@
+#define __CLC_DECLARE_SINCOS(ADDRSPACE, TYPE) \
+  _CLC_OVERLOAD _CLC_DECL TYPE sincos (TYPE x, ADDRSPACE TYPE * cosval);
+
+__CLC_DECLARE_SINCOS(global, __CLC_GENTYPE)
+__CLC_DECLARE_SINCOS(local, __CLC_GENTYPE)
+__CLC_DECLARE_SINCOS(private, __CLC_GENTYPE)
+
+#undef __CLC_DECLARE_SINCOS
diff --git a/libclc/generic/include/clc/math/sqrt.h b/libclc/generic/include/clc/math/sqrt.h
new file mode 100644
index 000000000000..f69de847e629
--- /dev/null
+++ b/libclc/generic/include/clc/math/sqrt.h
@@ -0,0 +1,6 @@
+#undef sqrt
+#define sqrt __clc_sqrt
+
+#define __CLC_FUNCTION __clc_sqrt
+#define __CLC_INTRINSIC "llvm.sqrt"
+#include <clc/math/unary_intrin.inc>
diff --git a/libclc/generic/include/clc/math/tan.h b/libclc/generic/include/clc/math/tan.h
new file mode 100644
index 000000000000..d2d52a9459d0
--- /dev/null
+++ b/libclc/generic/include/clc/math/tan.h
@@ -0,0 +1,2 @@
+#define __CLC_BODY <clc/math/tan.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/math/tan.inc b/libclc/generic/include/clc/math/tan.inc
new file mode 100644
index 000000000000..50c5b1d160c8
--- /dev/null
+++ b/libclc/generic/include/clc/math/tan.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE tan(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/ternary_intrin.inc b/libclc/generic/include/clc/math/ternary_intrin.inc
new file mode 100644
index 000000000000..9633696ed9c4
--- /dev/null
+++ b/libclc/generic/include/clc/math/ternary_intrin.inc
@@ -0,0 +1,18 @@
+_CLC_OVERLOAD float __CLC_FUNCTION(float, float, float) __asm(__CLC_INTRINSIC ".f32");
+_CLC_OVERLOAD float2 __CLC_FUNCTION(float2, float2, float2) __asm(__CLC_INTRINSIC ".v2f32");
+_CLC_OVERLOAD float3 __CLC_FUNCTION(float3, float3, float3) __asm(__CLC_INTRINSIC ".v3f32");
+_CLC_OVERLOAD float4 __CLC_FUNCTION(float4, float4, float4) __asm(__CLC_INTRINSIC ".v4f32");
+_CLC_OVERLOAD float8 __CLC_FUNCTION(float8, float8, float8) __asm(__CLC_INTRINSIC ".v8f32");
+_CLC_OVERLOAD float16 __CLC_FUNCTION(float16, float16, float16) __asm(__CLC_INTRINSIC ".v16f32");
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double __CLC_FUNCTION(double, double, double) __asm(__CLC_INTRINSIC ".f64");
+_CLC_OVERLOAD double2 __CLC_FUNCTION(double2, double2, double2) __asm(__CLC_INTRINSIC ".v2f64");
+_CLC_OVERLOAD double3 __CLC_FUNCTION(double3, double3, double3) __asm(__CLC_INTRINSIC ".v3f64");
+_CLC_OVERLOAD double4 __CLC_FUNCTION(double4, double4, double4) __asm(__CLC_INTRINSIC ".v4f64");
+_CLC_OVERLOAD double8 __CLC_FUNCTION(double8, double8, double8) __asm(__CLC_INTRINSIC ".v8f64");
+_CLC_OVERLOAD double16 __CLC_FUNCTION(double16, double16, double16) __asm(__CLC_INTRINSIC ".v16f64");
+#endif
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/trunc.h b/libclc/generic/include/clc/math/trunc.h
new file mode 100644
index 000000000000..d34f66190433
--- /dev/null
+++ b/libclc/generic/include/clc/math/trunc.h
@@ -0,0 +1,9 @@
+#undef trunc
+#define trunc __clc_trunc
+
+#define __CLC_FUNCTION __clc_trunc
+#define __CLC_INTRINSIC "llvm.trunc"
+#include <clc/math/unary_intrin.inc>
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/math/unary_decl.inc b/libclc/generic/include/clc/math/unary_decl.inc
new file mode 100644
index 000000000000..9858d908da09
--- /dev/null
+++ b/libclc/generic/include/clc/math/unary_decl.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE __CLC_FUNCTION(__CLC_GENTYPE x);
diff --git a/libclc/generic/include/clc/math/unary_intrin.inc b/libclc/generic/include/clc/math/unary_intrin.inc
new file mode 100644
index 000000000000..8c62d8827fe7
--- /dev/null
+++ b/libclc/generic/include/clc/math/unary_intrin.inc
@@ -0,0 +1,18 @@
+_CLC_OVERLOAD float __CLC_FUNCTION(float f) __asm(__CLC_INTRINSIC ".f32");
+_CLC_OVERLOAD float2 __CLC_FUNCTION(float2 f) __asm(__CLC_INTRINSIC ".v2f32");
+_CLC_OVERLOAD float3 __CLC_FUNCTION(float3 f) __asm(__CLC_INTRINSIC ".v3f32");
+_CLC_OVERLOAD float4 __CLC_FUNCTION(float4 f) __asm(__CLC_INTRINSIC ".v4f32");
+_CLC_OVERLOAD float8 __CLC_FUNCTION(float8 f) __asm(__CLC_INTRINSIC ".v8f32");
+_CLC_OVERLOAD float16 __CLC_FUNCTION(float16 f) __asm(__CLC_INTRINSIC ".v16f32");
+
+#ifdef cl_khr_fp64
+_CLC_OVERLOAD double __CLC_FUNCTION(double d) __asm(__CLC_INTRINSIC ".f64");
+_CLC_OVERLOAD double2 __CLC_FUNCTION(double2 d) __asm(__CLC_INTRINSIC ".v2f64");
+_CLC_OVERLOAD double3 __CLC_FUNCTION(double3 d) __asm(__CLC_INTRINSIC ".v3f64");
+_CLC_OVERLOAD double4 __CLC_FUNCTION(double4 d) __asm(__CLC_INTRINSIC ".v4f64");
+_CLC_OVERLOAD double8 __CLC_FUNCTION(double8 d) __asm(__CLC_INTRINSIC ".v8f64");
+_CLC_OVERLOAD double16 __CLC_FUNCTION(double16 d) __asm(__CLC_INTRINSIC ".v16f64");
+#endif
+
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
diff --git a/libclc/generic/include/clc/relational/all.h b/libclc/generic/include/clc/relational/all.h
new file mode 100644
index 000000000000..f8b0942444a2
--- /dev/null
+++ b/libclc/generic/include/clc/relational/all.h
@@ -0,0 +1,18 @@
+#define _CLC_ALL_DECL(TYPE) \
+  _CLC_OVERLOAD _CLC_DECL int all(TYPE v);
+
+#define _CLC_VECTOR_ALL_DECL(TYPE) \
+  _CLC_ALL_DECL(TYPE)     \
+  _CLC_ALL_DECL(TYPE##2)  \
+  _CLC_ALL_DECL(TYPE##3)  \
+  _CLC_ALL_DECL(TYPE##4)  \
+  _CLC_ALL_DECL(TYPE##8)  \
+  _CLC_ALL_DECL(TYPE##16)
+
+_CLC_VECTOR_ALL_DECL(char)
+_CLC_VECTOR_ALL_DECL(short)
+_CLC_VECTOR_ALL_DECL(int)
+_CLC_VECTOR_ALL_DECL(long)
+
+#undef _CLC_ALL_DECL
+#undef _CLC_VECTOR_ALL_DECL
diff --git a/libclc/generic/include/clc/relational/any.h b/libclc/generic/include/clc/relational/any.h
new file mode 100644
index 000000000000..4687ed263793
--- /dev/null
+++ b/libclc/generic/include/clc/relational/any.h
@@ -0,0 +1,16 @@
+
+#define _CLC_ANY_DECL(TYPE) \
+  _CLC_OVERLOAD _CLC_DECL int any(TYPE v);
+
+#define _CLC_VECTOR_ANY_DECL(TYPE) \
+  _CLC_ANY_DECL(TYPE)     \
+  _CLC_ANY_DECL(TYPE##2)  \
+  _CLC_ANY_DECL(TYPE##3)  \
+  _CLC_ANY_DECL(TYPE##4)  \
+  _CLC_ANY_DECL(TYPE##8)  \
+  _CLC_ANY_DECL(TYPE##16)
+
+_CLC_VECTOR_ANY_DECL(char)
+_CLC_VECTOR_ANY_DECL(short)
+_CLC_VECTOR_ANY_DECL(int)
+_CLC_VECTOR_ANY_DECL(long)
diff --git a/libclc/generic/include/clc/relational/binary_decl.inc b/libclc/generic/include/clc/relational/binary_decl.inc
new file mode 100644
index 000000000000..c9e4aee839a1
--- /dev/null
+++ b/libclc/generic/include/clc/relational/binary_decl.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_INTN __CLC_FUNCTION(__CLC_FLOATN a, __CLC_FLOATN b);
diff --git a/libclc/generic/include/clc/relational/bitselect.h b/libclc/generic/include/clc/relational/bitselect.h
new file mode 100644
index 000000000000..e91cbfded8b7
--- /dev/null
+++ b/libclc/generic/include/clc/relational/bitselect.h
@@ -0,0 +1 @@
+#define bitselect(x, y, z) ((x) ^ ((z) & ((y) ^ (x))))
diff --git a/libclc/generic/include/clc/relational/floatn.inc b/libclc/generic/include/clc/relational/floatn.inc
new file mode 100644
index 000000000000..8d7fd52cc7da
--- /dev/null
+++ b/libclc/generic/include/clc/relational/floatn.inc
@@ -0,0 +1,81 @@
+
+#define __CLC_FLOATN float
+#define __CLC_INTN int
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float2
+#define __CLC_INTN int2
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float3
+#define __CLC_INTN int3
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float4
+#define __CLC_INTN int4
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float8
+#define __CLC_INTN int8
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN float16
+#define __CLC_INTN int16
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#undef __CLC_FLOAT
+#undef __CLC_INT
+
+#ifdef cl_khr_fp64
+
+#define __CLC_FLOATN double
+#define __CLC_INTN int
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double2
+#define __CLC_INTN long2
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double3
+#define __CLC_INTN long3
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double4
+#define __CLC_INTN long4
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double8
+#define __CLC_INTN long8
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#define __CLC_FLOATN double16
+#define __CLC_INTN long16
+#include __CLC_BODY
+#undef __CLC_INTN
+#undef __CLC_FLOATN
+
+#endif
+
+#undef __CLC_BODY
diff --git a/libclc/generic/include/clc/relational/isequal.h b/libclc/generic/include/clc/relational/isequal.h
new file mode 100644
index 000000000000..c28a98565ee3
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isequal.h
@@ -0,0 +1,20 @@
+#define _CLC_ISEQUAL_DECL(TYPE, RETTYPE) \
+  _CLC_OVERLOAD _CLC_DECL RETTYPE isequal(TYPE x, TYPE y);
+
+#define _CLC_VECTOR_ISEQUAL_DECL(TYPE, RETTYPE) \
+  _CLC_ISEQUAL_DECL(TYPE##2, RETTYPE##2)  \
+  _CLC_ISEQUAL_DECL(TYPE##3, RETTYPE##3)  \
+  _CLC_ISEQUAL_DECL(TYPE##4, RETTYPE##4)  \
+  _CLC_ISEQUAL_DECL(TYPE##8, RETTYPE##8)  \
+  _CLC_ISEQUAL_DECL(TYPE##16, RETTYPE##16)
+
+_CLC_ISEQUAL_DECL(float, int)
+_CLC_VECTOR_ISEQUAL_DECL(float, int)
+
+#ifdef cl_khr_fp64
+_CLC_ISEQUAL_DECL(double, int)
+_CLC_VECTOR_ISEQUAL_DECL(double, long)
+#endif
+
+#undef _CLC_ISEQUAL_DECL
+#undef _CLC_VECTOR_ISEQUAL_DECL
diff --git a/libclc/generic/include/clc/relational/isfinite.h b/libclc/generic/include/clc/relational/isfinite.h
new file mode 100644
index 000000000000..48e261a54ff7
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isfinite.h
@@ -0,0 +1,9 @@
+#undef isfinite
+
+#define __CLC_FUNCTION isfinite
+#define __CLC_BODY <clc/relational/unary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isgreater.h b/libclc/generic/include/clc/relational/isgreater.h
new file mode 100644
index 000000000000..d17ae0c00c82
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isgreater.h
@@ -0,0 +1,9 @@
+#undef isgreater
+
+#define __CLC_FUNCTION isgreater
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isgreaterequal.h b/libclc/generic/include/clc/relational/isgreaterequal.h
new file mode 100644
index 000000000000..835332858d29
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isgreaterequal.h
@@ -0,0 +1,9 @@
+#undef isgreaterequal
+
+#define __CLC_FUNCTION isgreaterequal
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isinf.h b/libclc/generic/include/clc/relational/isinf.h
new file mode 100644
index 000000000000..869f0c8a9ac4
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isinf.h
@@ -0,0 +1,21 @@
+
+#define _CLC_ISINF_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_OVERLOAD _CLC_DECL RET_TYPE isinf(ARG_TYPE);
+
+#define _CLC_VECTOR_ISINF_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_ISINF_DECL(RET_TYPE##2, ARG_TYPE##2) \
+  _CLC_ISINF_DECL(RET_TYPE##3, ARG_TYPE##3) \
+  _CLC_ISINF_DECL(RET_TYPE##4, ARG_TYPE##4) \
+  _CLC_ISINF_DECL(RET_TYPE##8, ARG_TYPE##8) \
+  _CLC_ISINF_DECL(RET_TYPE##16, ARG_TYPE##16)
+
+_CLC_ISINF_DECL(int, float)
+_CLC_VECTOR_ISINF_DECL(int, float)
+
+#ifdef cl_khr_fp64
+_CLC_ISINF_DECL(int, double)
+_CLC_VECTOR_ISINF_DECL(long, double)
+#endif
+
+#undef _CLC_ISINF_DECL
+#undef _CLC_VECTOR_ISINF_DECL
diff --git a/libclc/generic/include/clc/relational/isless.h b/libclc/generic/include/clc/relational/isless.h
new file mode 100644
index 000000000000..1debd87f386e
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isless.h
@@ -0,0 +1,7 @@
+#define __CLC_FUNCTION isless
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/islessequal.h b/libclc/generic/include/clc/relational/islessequal.h
new file mode 100644
index 000000000000..e6a99d7f21c8
--- /dev/null
+++ b/libclc/generic/include/clc/relational/islessequal.h
@@ -0,0 +1,7 @@
+#define __CLC_FUNCTION islessequal
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/islessgreater.h b/libclc/generic/include/clc/relational/islessgreater.h
new file mode 100644
index 000000000000..005ba1090789
--- /dev/null
+++ b/libclc/generic/include/clc/relational/islessgreater.h
@@ -0,0 +1,7 @@
+#define __CLC_FUNCTION islessgreater
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isnan.h b/libclc/generic/include/clc/relational/isnan.h
new file mode 100644
index 000000000000..93eb9dffb424
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isnan.h
@@ -0,0 +1,21 @@
+
+#define _CLC_ISNAN_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_OVERLOAD _CLC_DECL RET_TYPE isnan(ARG_TYPE);
+
+#define _CLC_VECTOR_ISNAN_DECL(RET_TYPE, ARG_TYPE) \
+  _CLC_ISNAN_DECL(RET_TYPE##2, ARG_TYPE##2) \
+  _CLC_ISNAN_DECL(RET_TYPE##3, ARG_TYPE##3) \
+  _CLC_ISNAN_DECL(RET_TYPE##4, ARG_TYPE##4) \
+  _CLC_ISNAN_DECL(RET_TYPE##8, ARG_TYPE##8) \
+  _CLC_ISNAN_DECL(RET_TYPE##16, ARG_TYPE##16)
+
+_CLC_ISNAN_DECL(int, float)
+_CLC_VECTOR_ISNAN_DECL(int, float)
+
+#ifdef cl_khr_fp64
+_CLC_ISNAN_DECL(int, double)
+_CLC_VECTOR_ISNAN_DECL(long, double)
+#endif
+
+#undef _CLC_ISNAN_DECL
+#undef _CLC_VECTOR_ISNAN_DECL
diff --git a/libclc/generic/include/clc/relational/isnormal.h b/libclc/generic/include/clc/relational/isnormal.h
new file mode 100644
index 000000000000..f568c56f8e6e
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isnormal.h
@@ -0,0 +1,9 @@
+#undef isnormal
+
+#define __CLC_FUNCTION isnormal
+#define __CLC_BODY <clc/relational/unary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isnotequal.h b/libclc/generic/include/clc/relational/isnotequal.h
new file mode 100644
index 000000000000..f2ceea211046
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isnotequal.h
@@ -0,0 +1,9 @@
+#undef isnotequal
+
+#define __CLC_FUNCTION isnotequal
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isordered.h b/libclc/generic/include/clc/relational/isordered.h
new file mode 100644
index 000000000000..89e9620a4600
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isordered.h
@@ -0,0 +1,9 @@
+#undef isordered
+
+#define __CLC_FUNCTION isordered
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/isunordered.h b/libclc/generic/include/clc/relational/isunordered.h
new file mode 100644
index 000000000000..a6b8e2557d23
--- /dev/null
+++ b/libclc/generic/include/clc/relational/isunordered.h
@@ -0,0 +1,9 @@
+#undef isunordered
+
+#define __CLC_FUNCTION isunordered
+#define __CLC_BODY <clc/relational/binary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/select.h b/libclc/generic/include/clc/relational/select.h
new file mode 100644
index 000000000000..33a6909fb929
--- /dev/null
+++ b/libclc/generic/include/clc/relational/select.h
@@ -0,0 +1 @@
+#define select(a, b, c) ((c) ? (b) : (a))
diff --git a/libclc/generic/include/clc/relational/signbit.h b/libclc/generic/include/clc/relational/signbit.h
new file mode 100644
index 000000000000..41e5284bb34c
--- /dev/null
+++ b/libclc/generic/include/clc/relational/signbit.h
@@ -0,0 +1,9 @@
+#undef signbit
+
+#define __CLC_FUNCTION signbit
+#define __CLC_BODY <clc/relational/unary_decl.inc>
+
+#include <clc/relational/floatn.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/include/clc/relational/unary_decl.inc b/libclc/generic/include/clc/relational/unary_decl.inc
new file mode 100644
index 000000000000..ab9b776a46ec
--- /dev/null
+++ b/libclc/generic/include/clc/relational/unary_decl.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_INTN __CLC_FUNCTION(__CLC_FLOATN x);
diff --git a/libclc/generic/include/clc/shared/clamp.h b/libclc/generic/include/clc/shared/clamp.h
new file mode 100644
index 000000000000..a389b85d2666
--- /dev/null
+++ b/libclc/generic/include/clc/shared/clamp.h
@@ -0,0 +1,5 @@
+#define __CLC_BODY <clc/shared/clamp.inc>
+#include <clc/integer/gentype.inc>
+
+#define __CLC_BODY <clc/shared/clamp.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/shared/clamp.inc b/libclc/generic/include/clc/shared/clamp.inc
new file mode 100644
index 000000000000..aaff9d0ff07f
--- /dev/null
+++ b/libclc/generic/include/clc/shared/clamp.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z);
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_SCALAR_GENTYPE y, __CLC_SCALAR_GENTYPE z);
+#endif
diff --git a/libclc/generic/include/clc/shared/max.h b/libclc/generic/include/clc/shared/max.h
new file mode 100644
index 000000000000..ee20b9e64df7
--- /dev/null
+++ b/libclc/generic/include/clc/shared/max.h
@@ -0,0 +1,5 @@
+#define __CLC_BODY <clc/shared/max.inc>
+#include <clc/integer/gentype.inc>
+
+#define __CLC_BODY <clc/shared/max.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/shared/max.inc b/libclc/generic/include/clc/shared/max.inc
new file mode 100644
index 000000000000..590107435e66
--- /dev/null
+++ b/libclc/generic/include/clc/shared/max.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_GENTYPE b);
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b);
+#endif
diff --git a/libclc/generic/include/clc/shared/min.h b/libclc/generic/include/clc/shared/min.h
new file mode 100644
index 000000000000..e11d9f9551ff
--- /dev/null
+++ b/libclc/generic/include/clc/shared/min.h
@@ -0,0 +1,5 @@
+#define __CLC_BODY <clc/shared/min.inc>
+#include <clc/integer/gentype.inc>
+
+#define __CLC_BODY <clc/shared/min.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/include/clc/shared/min.inc b/libclc/generic/include/clc/shared/min.inc
new file mode 100644
index 000000000000..d8c1568a590c
--- /dev/null
+++ b/libclc/generic/include/clc/shared/min.inc
@@ -0,0 +1,5 @@
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_GENTYPE b);
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b);
+#endif
diff --git a/libclc/generic/include/clc/shared/vload.h b/libclc/generic/include/clc/shared/vload.h
new file mode 100644
index 000000000000..93d07501d4a1
--- /dev/null
+++ b/libclc/generic/include/clc/shared/vload.h
@@ -0,0 +1,37 @@
+#define _CLC_VLOAD_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \
+  _CLC_OVERLOAD _CLC_DECL VEC_TYPE vload##WIDTH(size_t offset, const ADDR_SPACE PRIM_TYPE *x);
+
+#define _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE)
+
+#define _CLC_VECTOR_VLOAD_PRIM1(PRIM_TYPE) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __private) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __local) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __constant) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __global) \
+
+#define _CLC_VECTOR_VLOAD_PRIM() \
+    _CLC_VECTOR_VLOAD_PRIM1(char) \
+    _CLC_VECTOR_VLOAD_PRIM1(uchar) \
+    _CLC_VECTOR_VLOAD_PRIM1(short) \
+    _CLC_VECTOR_VLOAD_PRIM1(ushort) \
+    _CLC_VECTOR_VLOAD_PRIM1(int) \
+    _CLC_VECTOR_VLOAD_PRIM1(uint) \
+    _CLC_VECTOR_VLOAD_PRIM1(long) \
+    _CLC_VECTOR_VLOAD_PRIM1(ulong) \
+    _CLC_VECTOR_VLOAD_PRIM1(float) \
+
+#ifdef cl_khr_fp64
+#define _CLC_VECTOR_VLOAD() \
+  _CLC_VECTOR_VLOAD_PRIM1(double) \
+  _CLC_VECTOR_VLOAD_PRIM()
+#else
+#define _CLC_VECTOR_VLOAD() \
+  _CLC_VECTOR_VLOAD_PRIM()
+#endif
+
+_CLC_VECTOR_VLOAD()
diff --git a/libclc/generic/include/clc/shared/vstore.h b/libclc/generic/include/clc/shared/vstore.h
new file mode 100644
index 000000000000..1f784f82fec0
--- /dev/null
+++ b/libclc/generic/include/clc/shared/vstore.h
@@ -0,0 +1,36 @@
+#define _CLC_VSTORE_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \
+  _CLC_OVERLOAD _CLC_DECL void vstore##WIDTH(VEC_TYPE vec, size_t offset, ADDR_SPACE PRIM_TYPE *out);
+
+#define _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE)
+
+#define _CLC_VECTOR_VSTORE_PRIM1(PRIM_TYPE) \
+  _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __private) \
+  _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __local) \
+  _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __global) \
+
+#define _CLC_VECTOR_VSTORE_PRIM() \
+    _CLC_VECTOR_VSTORE_PRIM1(char) \
+    _CLC_VECTOR_VSTORE_PRIM1(uchar) \
+    _CLC_VECTOR_VSTORE_PRIM1(short) \
+    _CLC_VECTOR_VSTORE_PRIM1(ushort) \
+    _CLC_VECTOR_VSTORE_PRIM1(int) \
+    _CLC_VECTOR_VSTORE_PRIM1(uint) \
+    _CLC_VECTOR_VSTORE_PRIM1(long) \
+    _CLC_VECTOR_VSTORE_PRIM1(ulong) \
+    _CLC_VECTOR_VSTORE_PRIM1(float) \
+
+#ifdef cl_khr_fp64
+#define _CLC_VECTOR_VSTORE() \
+  _CLC_VECTOR_VSTORE_PRIM1(double) \
+  _CLC_VECTOR_VSTORE_PRIM()
+#else
+#define _CLC_VECTOR_VSTORE() \
+  _CLC_VECTOR_VSTORE_PRIM()
+#endif
+
+_CLC_VECTOR_VSTORE()
diff --git a/libclc/generic/include/clc/synchronization/barrier.h b/libclc/generic/include/clc/synchronization/barrier.h
new file mode 100644
index 000000000000..7167a3d3f093
--- /dev/null
+++ b/libclc/generic/include/clc/synchronization/barrier.h
@@ -0,0 +1 @@
+_CLC_DECL void barrier(cl_mem_fence_flags flags);
diff --git a/libclc/generic/include/clc/synchronization/cl_mem_fence_flags.h b/libclc/generic/include/clc/synchronization/cl_mem_fence_flags.h
new file mode 100644
index 000000000000..c57eb4249a41
--- /dev/null
+++ b/libclc/generic/include/clc/synchronization/cl_mem_fence_flags.h
@@ -0,0 +1,4 @@
+typedef uint cl_mem_fence_flags;
+
+#define CLK_LOCAL_MEM_FENCE 1
+#define CLK_GLOBAL_MEM_FENCE 2
diff --git a/libclc/generic/include/clc/workitem/get_global_id.h b/libclc/generic/include/clc/workitem/get_global_id.h
new file mode 100644
index 000000000000..92759f146894
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_global_id.h
@@ -0,0 +1 @@
+_CLC_DECL size_t get_global_id(uint dim);
diff --git a/libclc/generic/include/clc/workitem/get_global_size.h b/libclc/generic/include/clc/workitem/get_global_size.h
new file mode 100644
index 000000000000..2f8370585397
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_global_size.h
@@ -0,0 +1 @@
+_CLC_DECL size_t get_global_size(uint dim);
diff --git a/libclc/generic/include/clc/workitem/get_group_id.h b/libclc/generic/include/clc/workitem/get_group_id.h
new file mode 100644
index 000000000000..346c82c6c316
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_group_id.h
@@ -0,0 +1 @@
+_CLC_DECL size_t get_group_id(uint dim);
diff --git a/libclc/generic/include/clc/workitem/get_local_id.h b/libclc/generic/include/clc/workitem/get_local_id.h
new file mode 100644
index 000000000000..169aeed86786
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_local_id.h
@@ -0,0 +1 @@
+_CLC_DECL size_t get_local_id(uint dim);
diff --git a/libclc/generic/include/clc/workitem/get_local_size.h b/libclc/generic/include/clc/workitem/get_local_size.h
new file mode 100644
index 000000000000..040ec58a3d8b
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_local_size.h
@@ -0,0 +1 @@
+_CLC_DECL size_t get_local_size(uint dim);
diff --git a/libclc/generic/include/clc/workitem/get_num_groups.h b/libclc/generic/include/clc/workitem/get_num_groups.h
new file mode 100644
index 000000000000..e555c7efc2d2
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_num_groups.h
@@ -0,0 +1 @@
+_CLC_DECL size_t get_num_groups(uint dim);
diff --git a/libclc/generic/include/clc/workitem/get_work_dim.h b/libclc/generic/include/clc/workitem/get_work_dim.h
new file mode 100644
index 000000000000..6d1982567063
--- /dev/null
+++ b/libclc/generic/include/clc/workitem/get_work_dim.h
@@ -0,0 +1 @@
+_CLC_DECL uint get_work_dim();
diff --git a/libclc/generic/include/math/clc_nextafter.h b/libclc/generic/include/math/clc_nextafter.h
new file mode 100644
index 000000000000..2b674b707956
--- /dev/null
+++ b/libclc/generic/include/math/clc_nextafter.h
@@ -0,0 +1,7 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION __clc_nextafter
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/libclc/generic/lib/SOURCES b/libclc/generic/lib/SOURCES
new file mode 100644
index 000000000000..b76fec98f634
--- /dev/null
+++ b/libclc/generic/lib/SOURCES
@@ -0,0 +1,99 @@
+async/async_work_group_copy.cl
+async/async_work_group_strided_copy.cl
+async/prefetch.cl
+async/wait_group_events.cl
+atomic/atomic_xchg.cl
+atomic/atomic_impl.ll
+cl_khr_global_int32_base_atomics/atom_add.cl
+cl_khr_global_int32_base_atomics/atom_cmpxchg.cl
+cl_khr_global_int32_base_atomics/atom_dec.cl
+cl_khr_global_int32_base_atomics/atom_inc.cl
+cl_khr_global_int32_base_atomics/atom_sub.cl
+cl_khr_global_int32_base_atomics/atom_xchg.cl
+cl_khr_global_int32_extended_atomics/atom_and.cl
+cl_khr_global_int32_extended_atomics/atom_max.cl
+cl_khr_global_int32_extended_atomics/atom_min.cl
+cl_khr_global_int32_extended_atomics/atom_or.cl
+cl_khr_global_int32_extended_atomics/atom_xor.cl
+cl_khr_local_int32_base_atomics/atom_add.cl
+cl_khr_local_int32_base_atomics/atom_cmpxchg.cl
+cl_khr_local_int32_base_atomics/atom_dec.cl
+cl_khr_local_int32_base_atomics/atom_inc.cl
+cl_khr_local_int32_base_atomics/atom_sub.cl
+cl_khr_local_int32_base_atomics/atom_xchg.cl
+cl_khr_local_int32_extended_atomics/atom_and.cl
+cl_khr_local_int32_extended_atomics/atom_max.cl
+cl_khr_local_int32_extended_atomics/atom_min.cl
+cl_khr_local_int32_extended_atomics/atom_or.cl
+cl_khr_local_int32_extended_atomics/atom_xor.cl
+convert.cl
+common/sign.cl
+geometric/cross.cl
+geometric/dot.cl
+geometric/length.cl
+geometric/normalize.cl
+integer/abs.cl
+integer/abs_diff.cl
+integer/add_sat.cl
+integer/add_sat_if.ll
+integer/add_sat_impl.ll
+integer/clz.cl
+integer/clz_if.ll
+integer/clz_impl.ll
+integer/hadd.cl
+integer/mad24.cl
+integer/mad_sat.cl
+integer/mul24.cl
+integer/mul_hi.cl
+integer/rhadd.cl
+integer/rotate.cl
+integer/sub_sat.cl
+integer/sub_sat_if.ll
+integer/sub_sat_impl.ll
+integer/upsample.cl
+math/acos.cl
+math/asin.cl
+math/atan.cl
+math/atan2.cl
+math/copysign.cl
+math/cos.cl
+math/exp.cl
+math/exp10.cl
+math/fmax.cl
+math/fmin.cl
+math/fmod.cl
+math/hypot.cl
+math/log1p.cl
+math/mad.cl
+math/mix.cl
+math/tables.cl
+math/clc_nextafter.cl
+math/nextafter.cl
+math/pown.cl
+math/sin.cl
+math/sincos.cl
+math/sincos_helpers.cl
+math/tan.cl
+relational/all.cl
+relational/any.cl
+relational/isequal.cl
+relational/isfinite.cl
+relational/isgreater.cl
+relational/isgreaterequal.cl
+relational/isinf.cl
+relational/isless.cl
+relational/islessequal.cl
+relational/islessgreater.cl
+relational/isnan.cl
+relational/isnormal.cl
+relational/isnotequal.cl
+relational/isordered.cl
+relational/isunordered.cl
+relational/signbit.cl
+shared/clamp.cl
+shared/max.cl
+shared/min.cl
+shared/vload.cl
+shared/vstore.cl
+workitem/get_global_id.cl
+workitem/get_global_size.cl
diff --git a/libclc/generic/lib/async/async_work_group_copy.cl b/libclc/generic/lib/async/async_work_group_copy.cl
new file mode 100644
index 000000000000..fe20ecfd9bba
--- /dev/null
+++ b/libclc/generic/lib/async/async_work_group_copy.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <async_work_group_copy.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/lib/async/async_work_group_copy.inc b/libclc/generic/lib/async/async_work_group_copy.inc
new file mode 100644
index 000000000000..a143ddfb9f6c
--- /dev/null
+++ b/libclc/generic/lib/async/async_work_group_copy.inc
@@ -0,0 +1,17 @@
+_CLC_OVERLOAD _CLC_DEF event_t async_work_group_copy(
+    local __CLC_GENTYPE *dst,
+    const global __CLC_GENTYPE *src,
+    size_t num_gentypes,
+    event_t event) {
+
+  return async_work_group_strided_copy(dst, src, num_gentypes, 1, event);
+}
+
+_CLC_OVERLOAD _CLC_DEF event_t async_work_group_copy(
+    global __CLC_GENTYPE *dst,
+    const local __CLC_GENTYPE *src,
+    size_t num_gentypes,
+    event_t event) {
+
+  return async_work_group_strided_copy(dst, src, num_gentypes, 1, event);
+}
diff --git a/libclc/generic/lib/async/async_work_group_strided_copy.cl b/libclc/generic/lib/async/async_work_group_strided_copy.cl
new file mode 100644
index 000000000000..61b88986fe47
--- /dev/null
+++ b/libclc/generic/lib/async/async_work_group_strided_copy.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <async_work_group_strided_copy.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/lib/async/async_work_group_strided_copy.inc b/libclc/generic/lib/async/async_work_group_strided_copy.inc
new file mode 100644
index 000000000000..d81a8b79430d
--- /dev/null
+++ b/libclc/generic/lib/async/async_work_group_strided_copy.inc
@@ -0,0 +1,34 @@
+
+#define STRIDED_COPY(dst, src, num_gentypes, dst_stride, src_stride)       \
+  size_t size = get_local_size(0) * get_local_size(1) * get_local_size(2); \
+  size_t id = (get_local_size(1) * get_local_size(2) * get_local_id(0)) +  \
+              (get_local_size(2) * get_local_id(1)) +                      \
+              get_local_id(2);                                             \
+  size_t i;                                                                \
+                                                                           \
+  for (i = id; i < num_gentypes; i += size) {                              \
+    dst[i * dst_stride] = src[i * src_stride];                             \
+  }
+
+
+_CLC_OVERLOAD _CLC_DEF event_t async_work_group_strided_copy(
+    local __CLC_GENTYPE *dst,
+    const global __CLC_GENTYPE *src,
+    size_t num_gentypes,
+    size_t src_stride,
+    event_t event) {
+
+  STRIDED_COPY(dst, src, num_gentypes, 1, src_stride);
+  return event;
+}
+
+_CLC_OVERLOAD _CLC_DEF event_t async_work_group_strided_copy(
+    global __CLC_GENTYPE *dst,
+    const local __CLC_GENTYPE *src,
+    size_t num_gentypes,
+    size_t dst_stride,
+    event_t event) {
+
+  STRIDED_COPY(dst, src, num_gentypes, dst_stride, 1);
+  return event;
+}
diff --git a/libclc/generic/lib/async/prefetch.cl b/libclc/generic/lib/async/prefetch.cl
new file mode 100644
index 000000000000..45af21b4d9ff
--- /dev/null
+++ b/libclc/generic/lib/async/prefetch.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <prefetch.inc>
+#include <clc/async/gentype.inc>
+#undef __CLC_BODY
diff --git a/libclc/generic/lib/async/prefetch.inc b/libclc/generic/lib/async/prefetch.inc
new file mode 100644
index 000000000000..6747e4cf5819
--- /dev/null
+++ b/libclc/generic/lib/async/prefetch.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DEF void prefetch(const global __CLC_GENTYPE *p, size_t num_gentypes) { }
diff --git a/libclc/generic/lib/async/wait_group_events.cl b/libclc/generic/lib/async/wait_group_events.cl
new file mode 100644
index 000000000000..05c9d58db45e
--- /dev/null
+++ b/libclc/generic/lib/async/wait_group_events.cl
@@ -0,0 +1,5 @@
+#include <clc/clc.h>
+
+_CLC_DEF void wait_group_events(int num_events, event_t *event_list) {
+  barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
+}
diff --git a/libclc/generic/lib/atomic/atomic_impl.ll b/libclc/generic/lib/atomic/atomic_impl.ll
new file mode 100644
index 000000000000..019147f8c509
--- /dev/null
+++ b/libclc/generic/lib/atomic/atomic_impl.ll
@@ -0,0 +1,133 @@
+define i32 @__clc_atomic_add_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile add i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_add_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile add i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_and_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile and i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_and_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile and i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_cmpxchg_addr1(i32 addrspace(1)* nocapture %ptr, i32 %compare, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = cmpxchg volatile i32 addrspace(1)* %ptr, i32 %compare, i32 %value seq_cst seq_cst
+  %1 = extractvalue { i32, i1 } %0, 0
+  ret i32 %1
+}
+
+define i32 @__clc_atomic_cmpxchg_addr3(i32 addrspace(3)* nocapture %ptr, i32 %compare, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = cmpxchg volatile i32 addrspace(3)* %ptr, i32 %compare, i32 %value seq_cst seq_cst
+  %1 = extractvalue { i32, i1 } %0, 0
+  ret i32 %1
+}
+
+define i32 @__clc_atomic_max_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile max i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_max_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile max i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_min_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile min i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_min_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile min i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_or_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile or i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_or_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile or i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_umax_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile umax i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_umax_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile umax i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_umin_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile umin i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_umin_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile umin i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_sub_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile sub i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_sub_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile sub i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_xchg_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile xchg i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_xchg_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile xchg i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_xor_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile xor i32 addrspace(1)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
+
+define i32 @__clc_atomic_xor_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+  %0 = atomicrmw volatile xor i32 addrspace(3)* %ptr, i32 %value seq_cst
+  ret i32 %0
+}
diff --git a/libclc/generic/lib/atomic/atomic_xchg.cl b/libclc/generic/lib/atomic/atomic_xchg.cl
new file mode 100644
index 000000000000..9aee5950141c
--- /dev/null
+++ b/libclc/generic/lib/atomic/atomic_xchg.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+_CLC_OVERLOAD _CLC_DEF float atomic_xchg(volatile global float *p, float val) {
+  return as_float(atomic_xchg((volatile global int *)p, as_int(val)));
+}
+
+_CLC_OVERLOAD _CLC_DEF float atomic_xchg(volatile local float *p, float val) {
+  return as_float(atomic_xchg((volatile local int *)p, as_int(val)));
+}
diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_add.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_add.cl
new file mode 100644
index 000000000000..9151b0ccf8d9
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_add.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_add(global TYPE *p, TYPE val) { \
+  return atomic_add(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_cmpxchg.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_cmpxchg.cl
new file mode 100644
index 000000000000..76477406c7f1
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_cmpxchg.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_cmpxchg(global TYPE *p, TYPE cmp, TYPE val) { \
+  return atomic_cmpxchg(p, cmp, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
new file mode 100644
index 000000000000..a74158d45fc8
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_dec(global TYPE *p) { \
+  return atom_sub(p, 1); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
new file mode 100644
index 000000000000..1404b5aa4477
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_inc(global TYPE *p) { \
+  return atom_add(p, 1); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_sub.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_sub.cl
new file mode 100644
index 000000000000..7faa3cc040f0
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_sub.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_sub(global TYPE *p, TYPE val) { \
+  return atomic_sub(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_xchg.cl b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_xchg.cl
new file mode 100644
index 000000000000..9c77db13f309
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_base_atomics/atom_xchg.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_xchg(global TYPE *p, TYPE val) { \
+  return atomic_xchg(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_and.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_and.cl
new file mode 100644
index 000000000000..e58796961b98
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_and.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_and(global TYPE *p, TYPE val) { \
+  return atomic_and(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
\ No newline at end of file
diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_max.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_max.cl
new file mode 100644
index 000000000000..09177ed8eef4
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_max.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_max(global TYPE *p, TYPE val) { \
+  return atomic_max(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_min.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_min.cl
new file mode 100644
index 000000000000..277c41ba90dc
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_min.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_min(global TYPE *p, TYPE val) { \
+  return atomic_min(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_or.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_or.cl
new file mode 100644
index 000000000000..a936a8ea7d31
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_or.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_or(global TYPE *p, TYPE val) { \
+  return atomic_or(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_xor.cl b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_xor.cl
new file mode 100644
index 000000000000..1a8e35004cd5
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_global_int32_extended_atomics/atom_xor.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_xor(global TYPE *p, TYPE val) { \
+  return atomic_xor(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_add.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_add.cl
new file mode 100644
index 000000000000..a5dea1824a16
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_add.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_add(local TYPE *p, TYPE val) { \
+  return atomic_add(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_cmpxchg.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_cmpxchg.cl
new file mode 100644
index 000000000000..16e957964dbb
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_cmpxchg.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_cmpxchg(local TYPE *p, TYPE cmp, TYPE val) { \
+  return atomic_cmpxchg(p, cmp, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
new file mode 100644
index 000000000000..d22c333f5d56
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_dec(local TYPE *p) { \
+  return atom_sub(p, 1); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
new file mode 100644
index 000000000000..4ba0d062997c
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_inc(local TYPE *p) { \
+  return atom_add(p, 1); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_sub.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_sub.cl
new file mode 100644
index 000000000000..c96696ac2084
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_sub.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_sub(local TYPE *p, TYPE val) { \
+  return atomic_sub(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_xchg.cl b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_xchg.cl
new file mode 100644
index 000000000000..7d4bcca3fe7a
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_base_atomics/atom_xchg.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_xchg(local TYPE *p, TYPE val) { \
+  return atomic_xchg(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_and.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_and.cl
new file mode 100644
index 000000000000..180103acc01e
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_and.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_and(local TYPE *p, TYPE val) { \
+  return atomic_and(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
\ No newline at end of file
diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_max.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_max.cl
new file mode 100644
index 000000000000..b90301ba0f76
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_max.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_max(local TYPE *p, TYPE val) { \
+  return atomic_max(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_min.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_min.cl
new file mode 100644
index 000000000000..3acedd8350fc
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_min.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_min(local TYPE *p, TYPE val) { \
+  return atomic_min(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_or.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_or.cl
new file mode 100644
index 000000000000..338ff2c01088
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_or.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_or(local TYPE *p, TYPE val) { \
+  return atomic_or(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_xor.cl b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_xor.cl
new file mode 100644
index 000000000000..51ae3c0e9194
--- /dev/null
+++ b/libclc/generic/lib/cl_khr_local_int32_extended_atomics/atom_xor.cl
@@ -0,0 +1,9 @@
+#include <clc/clc.h>
+
+#define IMPL(TYPE) \
+_CLC_OVERLOAD _CLC_DEF TYPE atom_xor(local TYPE *p, TYPE val) { \
+  return atomic_xor(p, val); \
+}
+
+IMPL(int)
+IMPL(unsigned int)
diff --git a/libclc/generic/lib/clcmacro.h b/libclc/generic/lib/clcmacro.h
new file mode 100644
index 000000000000..ef102ea54e9f
--- /dev/null
+++ b/libclc/generic/lib/clcmacro.h
@@ -0,0 +1,76 @@
+#define _CLC_UNARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE) \
+  DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x) { \
+    return (RET_TYPE##2)(FUNCTION(x.x), FUNCTION(x.y)); \
+  } \
+\
+  DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x) { \
+    return (RET_TYPE##3)(FUNCTION(x.x), FUNCTION(x.y), FUNCTION(x.z)); \
+  } \
+\
+  DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x) { \
+    return (RET_TYPE##4)(FUNCTION(x.lo), FUNCTION(x.hi)); \
+  } \
+\
+  DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x) { \
+    return (RET_TYPE##8)(FUNCTION(x.lo), FUNCTION(x.hi)); \
+  } \
+\
+  DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x) { \
+    return (RET_TYPE##16)(FUNCTION(x.lo), FUNCTION(x.hi)); \
+  }
+
+#define _CLC_BINARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+  DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x, ARG2_TYPE##2 y) { \
+    return (RET_TYPE##2)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y)); \
+  } \
+\
+  DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x, ARG2_TYPE##3 y) { \
+    return (RET_TYPE##3)(FUNCTION(x.x, y.x), FUNCTION(x.y, y.y), \
+                         FUNCTION(x.z, y.z)); \
+  } \
+\
+  DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x, ARG2_TYPE##4 y) { \
+    return (RET_TYPE##4)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
+  } \
+\
+  DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x, ARG2_TYPE##8 y) { \
+    return (RET_TYPE##8)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
+  } \
+\
+  DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x, ARG2_TYPE##16 y) { \
+    return (RET_TYPE##16)(FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)); \
+  }
+
+#define _CLC_TERNARY_VECTORIZE(DECLSPEC, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE, ARG3_TYPE) \
+  DECLSPEC RET_TYPE##2 FUNCTION(ARG1_TYPE##2 x, ARG2_TYPE##2 y, ARG3_TYPE##2 z) { \
+    return (RET_TYPE##2)(FUNCTION(x.x, y.x, z.x), FUNCTION(x.y, y.y, z.y)); \
+  } \
+\
+  DECLSPEC RET_TYPE##3 FUNCTION(ARG1_TYPE##3 x, ARG2_TYPE##3 y, ARG3_TYPE##3 z) { \
+    return (RET_TYPE##3)(FUNCTION(x.x, y.x, z.x), FUNCTION(x.y, y.y, z.y), \
+                         FUNCTION(x.z, y.z, z.z)); \
+  } \
+\
+  DECLSPEC RET_TYPE##4 FUNCTION(ARG1_TYPE##4 x, ARG2_TYPE##4 y, ARG3_TYPE##4 z) { \
+    return (RET_TYPE##4)(FUNCTION(x.lo, y.lo, z.lo), FUNCTION(x.hi, y.hi, z.hi)); \
+  } \
+\
+  DECLSPEC RET_TYPE##8 FUNCTION(ARG1_TYPE##8 x, ARG2_TYPE##8 y, ARG3_TYPE##8 z) { \
+    return (RET_TYPE##8)(FUNCTION(x.lo, y.lo, z.lo), FUNCTION(x.hi, y.hi, z.hi)); \
+  } \
+\
+  DECLSPEC RET_TYPE##16 FUNCTION(ARG1_TYPE##16 x, ARG2_TYPE##16 y, ARG3_TYPE##16 z) { \
+    return (RET_TYPE##16)(FUNCTION(x.lo, y.lo, z.lo), FUNCTION(x.hi, y.hi, z.hi)); \
+  }
+
+#define _CLC_DEFINE_BINARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return BUILTIN(x, y); \
+} \
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE)
+
+#define _CLC_DEFINE_UNARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x) { \
+  return BUILTIN(x); \
+} \
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, RET_TYPE, FUNCTION, ARG1_TYPE)
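Note (illustrative, not part of the patch): the vectorize macros in clcmacro.h apply the scalar overload per component for widths 2 and 3, and split widths 4, 8 and 16 into their .lo and .hi halves. A minimal Python sketch of that decomposition follows; the name vectorize_unary and the use of lists to stand in for OpenCL vectors are assumptions made for this example.

```python
def vectorize_unary(f, v):
    """Model of _CLC_UNARY_VECTORIZE: apply scalar f to each component of v.

    v is a Python list standing in for an OpenCL vector (an assumption of
    this sketch); the recursion mirrors the macro's use of .lo and .hi.
    """
    n = len(v)
    if n == 1:
        # Scalar overload, defined separately from the macro.
        return [f(v[0])]
    if n in (2, 3):
        # Widths 2 and 3 call the scalar function on .x, .y (and .z).
        return [f(c) for c in v]
    if n in (4, 8, 16):
        # Wider vectors recurse on the lower and upper halves.
        half = n // 2
        return vectorize_unary(f, v[:half]) + vectorize_unary(f, v[half:])
    raise ValueError("unsupported vector width: %d" % n)
```

The binary and ternary macros follow the same pattern with two or three argument vectors split in lockstep.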
diff --git a/libclc/generic/lib/common/sign.cl b/libclc/generic/lib/common/sign.cl
new file mode 100644
index 000000000000..25832e0b4f8b
--- /dev/null
+++ b/libclc/generic/lib/common/sign.cl
@@ -0,0 +1,28 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+#define SIGN(TYPE, F) \
+_CLC_DEF _CLC_OVERLOAD TYPE sign(TYPE x) { \
+  if (isnan(x)) { \
+    return 0.0F;   \
+  }               \
+  if (x > 0.0F) { \
+    return 1.0F;  \
+  }               \
+  if (x < 0.0F) { \
+    return -1.0F; \
+  }               \
+  return x; /* -0.0 or +0.0 */  \
+}
+
+SIGN(float, f)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, sign, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+SIGN(double, )
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, sign, double)
+
+#endif
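Note (illustrative, not part of the patch): the scalar behaviour of the SIGN macro above can be modelled in Python as follows. This is a sketch of the branch order only; it uses Python floats (doubles) and does not model the float/double overloads.

```python
import math

def sign(x):
    """Model of the scalar sign() body: NaN maps to 0.0, zeros keep their sign."""
    if math.isnan(x):
        return 0.0
    if x > 0.0:
        return 1.0
    if x < 0.0:
        return -1.0
    return x  # -0.0 or +0.0, returned unchanged
```

Returning x in the last branch is what preserves the sign bit of a zero input.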
diff --git a/libclc/generic/lib/gen_convert.py b/libclc/generic/lib/gen_convert.py
new file mode 100644
index 000000000000..f91a89a3c321
--- /dev/null
+++ b/libclc/generic/lib/gen_convert.py
@@ -0,0 +1,388 @@
+#!/usr/bin/env python3
+
+# OpenCL built-in library: type conversion functions
+#
+# Copyright (c) 2013 Victor Oliveira <victormatheus@gmail.com>
+# Copyright (c) 2013 Jesse Towner <jessetowner@lavabit.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+# This script generates the file convert_type.cl, which contains all of the
+# OpenCL functions in the form:
+#
+# convert_<destTypen><_sat><_roundingMode>(<sourceTypen>)
+
+types = ['char', 'uchar', 'short', 'ushort', 'int', 'uint', 'long', 'ulong', 'float', 'double']
+int_types = ['char', 'uchar', 'short', 'ushort', 'int', 'uint', 'long', 'ulong']
+unsigned_types = ['uchar', 'ushort', 'uint', 'ulong']
+float_types = ['float', 'double']
+int64_types = ['long', 'ulong']
+float64_types = ['double']
+vector_sizes = ['', '2', '3', '4', '8', '16']
+half_sizes = [('2',''), ('4','2'), ('8','4'), ('16','8')]
+
+saturation = ['','_sat']
+rounding_modes = ['_rtz','_rte','_rtp','_rtn']
+float_prefix = {'float':'FLT_', 'double':'DBL_'}
+float_suffix = {'float':'f', 'double':''}
+
+bool_type = {'char'  : 'char',
+             'uchar' : 'char',
+             'short' : 'short',
+             'ushort': 'short',
+             'int'   : 'int',
+             'uint'  : 'int',
+             'long'  : 'long',
+             'ulong' : 'long',
+             'float'  : 'int',
+             'double' : 'long'}
+
+unsigned_type = {'char'  : 'uchar',
+                 'uchar' : 'uchar',
+                 'short' : 'ushort',
+                 'ushort': 'ushort',
+                 'int'   : 'uint',
+                 'uint'  : 'uint',
+                 'long'  : 'ulong',
+                 'ulong' : 'ulong'}
+
+sizeof_type = {'char'  : 1, 'uchar'  : 1,
+               'short' : 2, 'ushort' : 2,
+               'int'   : 4, 'uint'   : 4,
+               'long'  : 8, 'ulong'  : 8,
+               'float' : 4, 'double' : 8}
+
+limit_max = {'char'  : 'CHAR_MAX',
+             'uchar' : 'UCHAR_MAX',
+             'short' : 'SHRT_MAX',
+             'ushort': 'USHRT_MAX',
+             'int'   : 'INT_MAX',
+             'uint'  : 'UINT_MAX',
+             'long'  : 'LONG_MAX',
+             'ulong' : 'ULONG_MAX'}
+
+limit_min = {'char'  : 'CHAR_MIN',
+             'uchar' : '0',
+             'short' : 'SHRT_MIN',
+             'ushort': '0',
+             'int'   : 'INT_MIN',
+             'uint'  : '0',
+             'long'  : 'LONG_MIN',
+             'ulong' : '0'}
+
+def conditional_guard(src, dst):
+  int64_count = 0
+  float64_count = 0
+  if src in int64_types:
+    int64_count = int64_count + 1
+  elif src in float64_types:
+    float64_count = float64_count + 1
+  if dst in int64_types:
+    int64_count = int64_count + 1
+  elif dst in float64_types:
+    float64_count = float64_count + 1
+  if float64_count > 0 and int64_count > 0:
+    print("#if defined(cl_khr_fp64) && defined(cles_khr_int64)")
+    return True
+  elif float64_count > 0:
+    print("#ifdef cl_khr_fp64")
+    return True
+  elif int64_count > 0:
+    print("#ifdef cles_khr_int64")
+    return True
+  return False
+
+
+print("""/* !!!! AUTOGENERATED FILE generated by gen_convert.py !!!!!
+
+   DON'T CHANGE THIS FILE. MAKE YOUR CHANGES TO gen_convert.py AND RUN:
+   $ ./generate-conversion-type-cl.sh
+
+   OpenCL type conversion functions
+
+   Copyright (c) 2013 Victor Oliveira <victormatheus@gmail.com>
+   Copyright (c) 2013 Jesse Towner <jessetowner@lavabit.com>
+
+   Permission is hereby granted, free of charge, to any person obtaining a copy
+   of this software and associated documentation files (the "Software"), to deal
+   in the Software without restriction, including without limitation the rights
+   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+   copies of the Software, and to permit persons to whom the Software is
+   furnished to do so, subject to the following conditions:
+
+   The above copyright notice and this permission notice shall be included in
+   all copies or substantial portions of the Software.
+
+   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+   THE SOFTWARE.
+*/
+
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+""")
+
+#
+# Default Conversions
+#
+# All conversions are in accordance with the OpenCL specification,
+# which cites the C99 conversion rules.
+#
+# Casting from floating point to integer results in conversions
+# with truncation, so it should be suitable for the default convert
+# functions.
+#
+# Conversions from integer to floating-point, and from floating-point to
+# floating-point, through casting are done with the default rounding
+# mode. While C99 allows dynamically changing the rounding mode
+# during runtime, it is not a supported feature in OpenCL according
+# to Section 7.1 - Rounding Modes in the OpenCL 1.2 specification.
+#
+# Therefore, we can assume for optimization purposes that the
+# rounding mode is fixed to round-to-nearest-even. Platform target
+# authors should ensure that the rounding-control registers remain
+# in this state, and that this invariant holds.
+#
+# Also note, even though the OpenCL specification isn't entirely
+# clear on this matter, we implement all rounding mode combinations
+# even for integer-to-integer conversions. When such a conversion
+# is used, the rounding mode is ignored.
+#
+
+def generate_default_conversion(src, dst, mode):
+  close_conditional = conditional_guard(src, dst)
+
+  # scalar conversions
+  print("""_CLC_DEF _CLC_OVERLOAD
+{DST} convert_{DST}{M}({SRC} x)
+{{
+  return ({DST})x;
+}}
+""".format(SRC=src, DST=dst, M=mode))
+
+  # vector conversions, done through decomposition to components
+  for size, half_size in half_sizes:
+    print("""_CLC_DEF _CLC_OVERLOAD
+{DST}{N} convert_{DST}{N}{M}({SRC}{N} x)
+{{
+  return ({DST}{N})(convert_{DST}{H}(x.lo), convert_{DST}{H}(x.hi));
+}}
+""".format(SRC=src, DST=dst, N=size, H=half_size, M=mode))
+
+  # 3-component vector conversions
+  print("""_CLC_DEF _CLC_OVERLOAD
+{DST}3 convert_{DST}3{M}({SRC}3 x)
+{{
+  return ({DST}3)(convert_{DST}2(x.s01), convert_{DST}(x.s2));
+}}""".format(SRC=src, DST=dst, M=mode))
+
+  if close_conditional:
+    print("#endif")
+
+
+for src in types:
+  for dst in types:
+    generate_default_conversion(src, dst, '')
+
+for src in int_types:
+  for dst in int_types:
+    for mode in rounding_modes:
+      generate_default_conversion(src, dst, mode)
+
+#
+# Saturated Conversions To Integers
+#
+# These functions are dependent on the unsaturated conversion functions
+# generated above, and use clamp, max, min, and select to eliminate
+# branching and vectorize the conversions.
+#
+# Again, as above, we allow all rounding modes for integer-to-integer
+# conversions with saturation.
+#
+
+def generate_saturated_conversion(src, dst, size):
+  # Header
+  close_conditional = conditional_guard(src, dst)
+  print("""_CLC_DEF _CLC_OVERLOAD
+{DST}{N} convert_{DST}{N}_sat({SRC}{N} x)
+{{""".format(DST=dst, SRC=src, N=size))
+
+  # FIXME: This is a workaround for the lack of a select function with
+  # signed third argument when the first two arguments are unsigned types.
+  # We cast to the signed type for sign-extension, then do a bitcast to
+  # the unsigned type.
+  if dst in unsigned_types:
+    bool_prefix = "as_{DST}{N}(convert_{BOOL}{N}".format(DST=dst, BOOL=bool_type[dst], N=size)
+    bool_suffix = ")"
+  else:
+    bool_prefix = "convert_{BOOL}{N}".format(BOOL=bool_type[dst], N=size)
+    bool_suffix = ""
+
+  # Body
+  if src == dst:
+
+    # Conversion between same types
+    print("  return x;")
+
+  elif src in float_types:
+
+    # Conversion from float to int
+    print("""  {DST}{N} y = convert_{DST}{N}(x);
+  y = select(y, ({DST}{N}){DST_MIN}, {BP}(x < ({SRC}{N}){DST_MIN}){BS});
+  y = select(y, ({DST}{N}){DST_MAX}, {BP}(x > ({SRC}{N}){DST_MAX}){BS});
+  return y;""".format(SRC=src, DST=dst, N=size,
+      DST_MIN=limit_min[dst], DST_MAX=limit_max[dst],
+      BP=bool_prefix, BS=bool_suffix))
+
+  else:
+
+    # Integer to integer conversion with sizeof(src) == sizeof(dst)
+    if sizeof_type[src] == sizeof_type[dst]:
+      if src in unsigned_types:
+        print("  x = min(x, ({SRC}){DST_MAX});".format(SRC=src, DST_MAX=limit_max[dst]))
+      else:
+        print("  x = max(x, ({SRC})0);".format(SRC=src))
+
+    # Integer to integer conversion where sizeof(src) > sizeof(dst)
+    elif sizeof_type[src] > sizeof_type[dst]:
+      if src in unsigned_types:
+        print("  x = min(x, ({SRC}){DST_MAX});".format(SRC=src, DST_MAX=limit_max[dst]))
+      else:
+        print("  x = clamp(x, ({SRC}){DST_MIN}, ({SRC}){DST_MAX});"
+          .format(SRC=src, DST_MIN=limit_min[dst], DST_MAX=limit_max[dst]))
+
+    # Integer to integer conversion where sizeof(src) < sizeof(dst)
+    elif src not in unsigned_types and dst in unsigned_types:
+        print("  x = max(x, ({SRC})0);".format(SRC=src))
+
+    print("  return convert_{DST}{N}(x);".format(DST=dst, N=size))
+
+  # Footer
+  print("}")
+  if close_conditional:
+    print("#endif")
+
+
+for src in types:
+  for dst in int_types:
+    for size in vector_sizes:
+      generate_saturated_conversion(src, dst, size)
+
+
+def generate_saturated_conversion_with_rounding(src, dst, size, mode):
+  # Header
+  close_conditional = conditional_guard(src, dst)
+
+  # Body
+  print("""_CLC_DEF _CLC_OVERLOAD
+{DST}{N} convert_{DST}{N}_sat{M}({SRC}{N} x)
+{{
+  return convert_{DST}{N}_sat(x);
+}}
+""".format(DST=dst, SRC=src, N=size, M=mode))
+
+  # Footer
+  if close_conditional:
+    print("#endif")
+
+
+for src in int_types:
+  for dst in int_types:
+    for size in vector_sizes:
+      for mode in rounding_modes:
+        generate_saturated_conversion_with_rounding(src, dst, size, mode)
+
+#
+# Conversions To/From Floating-Point With Rounding
+#
+# Note that we assume as above that casts from floating-point to
+# integer are done with truncation, and that the default rounding
+# mode is fixed to round-to-nearest-even, as per C99 and OpenCL
+# rounding rules.
+#
+# These functions rely on the use of abs, ceil, fabs, floor,
+# nextafter, sign, rint and the above generated conversion functions.
+#
+# Only conversions to integers can have saturation.
+#
+
+def generate_float_conversion(src, dst, size, mode, sat):
+  # Header
+  close_conditional = conditional_guard(src, dst)
+  print("""_CLC_DEF _CLC_OVERLOAD
+{DST}{N} convert_{DST}{N}{S}{M}({SRC}{N} x)
+{{""".format(SRC=src, DST=dst, N=size, M=mode, S=sat))
+
+  # Perform conversion
+  if dst in int_types:
+    if mode == '_rte':
+      print("  x = rint(x);")
+    elif mode == '_rtp':
+      print("  x = ceil(x);")
+    elif mode == '_rtn':
+      print("  x = floor(x);")
+    print("  return convert_{DST}{N}{S}(x);".format(DST=dst, N=size, S=sat))
+  elif mode == '_rte':
+    print("  return convert_{DST}{N}(x);".format(DST=dst, N=size))
+  else:
+    print("  {DST}{N} r = convert_{DST}{N}(x);".format(DST=dst, N=size))
+    print("  {SRC}{N} y = convert_{SRC}{N}(r);".format(SRC=src, N=size))
+    if mode == '_rtz':
+      if src in int_types:
+        print("  {USRC}{N} abs_x = abs(x);".format(USRC=unsigned_type[src], N=size))
+        print("  {USRC}{N} abs_y = abs(y);".format(USRC=unsigned_type[src], N=size))
+      else:
+        print("  {SRC}{N} abs_x = fabs(x);".format(SRC=src, N=size))
+        print("  {SRC}{N} abs_y = fabs(y);".format(SRC=src, N=size))
+      print("  return select(r, nextafter(r, sign(r) * ({DST}{N})-INFINITY), convert_{BOOL}{N}(abs_y > abs_x));"
+        .format(DST=dst, N=size, BOOL=bool_type[dst]))
+    if mode == '_rtp':
+      print("  return select(r, nextafter(r, ({DST}{N})INFINITY), convert_{BOOL}{N}(y < x));"
+        .format(DST=dst, N=size, BOOL=bool_type[dst]))
+    if mode == '_rtn':
+      print("  return select(r, nextafter(r, ({DST}{N})-INFINITY), convert_{BOOL}{N}(y > x));"
+        .format(DST=dst, N=size, BOOL=bool_type[dst]))
+
+  # Footer
+  print("}")
+  if close_conditional:
+    print("#endif")
+
+
+for src in float_types:
+  for dst in int_types:
+    for size in vector_sizes:
+      for mode in rounding_modes:
+        for sat in saturation:
+          generate_float_conversion(src, dst, size, mode, sat)
+
+
+for src in types:
+  for dst in float_types:
+    for size in vector_sizes:
+      for mode in rounding_modes:
+        generate_float_conversion(src, dst, size, mode, '')
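Note (illustrative, not part of the patch): the integer-to-integer branch of generate_saturated_conversion picks min, max or clamp depending on the relative size and signedness of the two types. The sketch below models that decision on Python integers. The names BITS, UNSIGNED, limits and convert_sat are assumptions made for this example, and the float branch (which uses select) is not modelled.

```python
# Bit widths and signedness of the OpenCL integer types handled by the script.
BITS = {'char': 8, 'uchar': 8, 'short': 16, 'ushort': 16,
        'int': 32, 'uint': 32, 'long': 64, 'ulong': 64}
UNSIGNED = {'uchar', 'ushort', 'uint', 'ulong'}

def limits(t):
    """Return (min, max) representable by integer type t."""
    bits = BITS[t]
    if t in UNSIGNED:
        return 0, (1 << bits) - 1
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def convert_sat(x, src, dst):
    """Model of convert_<dst>_sat(<src> x) for integer src and dst."""
    src_min, src_max = limits(src)
    if not src_min <= x <= src_max:
        raise ValueError("value does not fit the source type")
    dst_min, dst_max = limits(dst)
    if src == dst:
        return x
    if BITS[src] == BITS[dst]:
        # Same width, different signedness.
        x = min(x, dst_max) if src in UNSIGNED else max(x, 0)
    elif BITS[src] > BITS[dst]:
        # Narrowing: unsigned sources only need the upper bound.
        if src in UNSIGNED:
            x = min(x, dst_max)
        else:
            x = max(dst_min, min(x, dst_max))  # clamp(x, DST_MIN, DST_MAX)
    elif src not in UNSIGNED and dst in UNSIGNED:
        # Widening from signed to unsigned: only negatives saturate.
        x = max(x, 0)
    # After the bound checks the plain conversion cannot overflow.
    return x
```

For example, convert_sat(300, 'int', 'uchar') yields 255 and convert_sat(-1, 'char', 'uint') yields 0, matching a plain clamp to the destination range.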
diff --git a/libclc/generic/lib/geometric/cross.cl b/libclc/generic/lib/geometric/cross.cl
new file mode 100644
index 000000000000..3b4ca6cafae9
--- /dev/null
+++ b/libclc/generic/lib/geometric/cross.cl
@@ -0,0 +1,25 @@
+#include <clc/clc.h>
+
+_CLC_OVERLOAD _CLC_DEF float3 cross(float3 p0, float3 p1) {
+  return (float3)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z,
+                  p0.x*p1.y - p0.y*p1.x);
+}
+
+_CLC_OVERLOAD _CLC_DEF float4 cross(float4 p0, float4 p1) {
+  return (float4)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z,
+                  p0.x*p1.y - p0.y*p1.x, 0.f);
+}
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_OVERLOAD _CLC_DEF double3 cross(double3 p0, double3 p1) {
+  return (double3)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z,
+                   p0.x*p1.y - p0.y*p1.x);
+}
+
+_CLC_OVERLOAD _CLC_DEF double4 cross(double4 p0, double4 p1) {
+  return (double4)(p0.y*p1.z - p0.z*p1.y, p0.z*p1.x - p0.x*p1.z,
+                   p0.x*p1.y - p0.y*p1.x, 0.0);
+}
+#endif
diff --git a/libclc/generic/lib/geometric/dot.cl b/libclc/generic/lib/geometric/dot.cl
new file mode 100644
index 000000000000..0d6fe6c9a4e8
--- /dev/null
+++ b/libclc/generic/lib/geometric/dot.cl
@@ -0,0 +1,39 @@
+#include <clc/clc.h>
+
+_CLC_OVERLOAD _CLC_DEF float dot(float p0, float p1) {
+  return p0*p1;
+}
+
+_CLC_OVERLOAD _CLC_DEF float dot(float2 p0, float2 p1) {
+  return p0.x*p1.x + p0.y*p1.y;
+}
+
+_CLC_OVERLOAD _CLC_DEF float dot(float3 p0, float3 p1) {
+  return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z;
+}
+
+_CLC_OVERLOAD _CLC_DEF float dot(float4 p0, float4 p1) {
+  return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z + p0.w*p1.w;
+}
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_OVERLOAD _CLC_DEF double dot(double p0, double p1) {
+  return p0*p1;
+}
+
+_CLC_OVERLOAD _CLC_DEF double dot(double2 p0, double2 p1) {
+  return p0.x*p1.x + p0.y*p1.y;
+}
+
+_CLC_OVERLOAD _CLC_DEF double dot(double3 p0, double3 p1) {
+  return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z;
+}
+
+_CLC_OVERLOAD _CLC_DEF double dot(double4 p0, double4 p1) {
+  return p0.x*p1.x + p0.y*p1.y + p0.z*p1.z + p0.w*p1.w;
+}
+
+#endif
diff --git a/libclc/generic/lib/geometric/length.cl b/libclc/generic/lib/geometric/length.cl
new file mode 100644
index 000000000000..ef087c75f9f1
--- /dev/null
+++ b/libclc/generic/lib/geometric/length.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <length.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/lib/geometric/length.inc b/libclc/generic/lib/geometric/length.inc
new file mode 100644
index 000000000000..5faaaffbd6a8
--- /dev/null
+++ b/libclc/generic/lib/geometric/length.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_FLOAT length(__CLC_FLOATN p) {
+  return native_sqrt(dot(p, p));
+}
diff --git a/libclc/generic/lib/geometric/normalize.cl b/libclc/generic/lib/geometric/normalize.cl
new file mode 100644
index 000000000000..b06b2fe3a4c4
--- /dev/null
+++ b/libclc/generic/lib/geometric/normalize.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <normalize.inc>
+#include <clc/geometric/floatn.inc>
diff --git a/libclc/generic/lib/geometric/normalize.inc b/libclc/generic/lib/geometric/normalize.inc
new file mode 100644
index 000000000000..423ff79fc4e2
--- /dev/null
+++ b/libclc/generic/lib/geometric/normalize.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_FLOATN normalize(__CLC_FLOATN p) {
+  return p/length(p);
+}
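Note (illustrative, not part of the patch): length and normalize above are both built on dot. A minimal Python sketch of that layering follows, with lists standing in for OpenCL vectors and math.sqrt standing in for native_sqrt (both assumptions of this example).

```python
import math

def dot(p0, p1):
    """Sum of component-wise products, as in dot.cl."""
    return sum(a * b for a, b in zip(p0, p1))

def length(p):
    """sqrt(dot(p, p)), as in length.inc."""
    return math.sqrt(dot(p, p))

def normalize(p):
    """p / length(p), as in normalize.inc; no special case for zero length."""
    n = length(p)
    return [c / n for c in p]
```

Like the OpenCL source, this sketch does not guard against a zero-length input; Python raises ZeroDivisionError in that case, whereas floating-point division on a device would typically produce NaN components.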
diff --git a/libclc/generic/lib/integer/abs.cl b/libclc/generic/lib/integer/abs.cl
new file mode 100644
index 000000000000..faff8d05fefc
--- /dev/null
+++ b/libclc/generic/lib/integer/abs.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <abs.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/lib/integer/abs.inc b/libclc/generic/lib/integer/abs.inc
new file mode 100644
index 000000000000..cfe7bfecd294
--- /dev/null
+++ b/libclc/generic/lib/integer/abs.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_U_GENTYPE abs(__CLC_GENTYPE x) {
+  return __builtin_astype((__CLC_GENTYPE)(x > (__CLC_GENTYPE)(0) ? x : -x), __CLC_U_GENTYPE);
+}
diff --git a/libclc/generic/lib/integer/abs_diff.cl b/libclc/generic/lib/integer/abs_diff.cl
new file mode 100644
index 000000000000..3d751057819e
--- /dev/null
+++ b/libclc/generic/lib/integer/abs_diff.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <abs_diff.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/lib/integer/abs_diff.inc b/libclc/generic/lib/integer/abs_diff.inc
new file mode 100644
index 000000000000..f39c3ff4d3e8
--- /dev/null
+++ b/libclc/generic/lib/integer/abs_diff.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_U_GENTYPE abs_diff(__CLC_GENTYPE x, __CLC_GENTYPE y) {
+  return __builtin_astype((__CLC_GENTYPE)(x > y ? x-y : y-x), __CLC_U_GENTYPE);
+}
diff --git a/libclc/generic/lib/integer/add_sat.cl b/libclc/generic/lib/integer/add_sat.cl
new file mode 100644
index 000000000000..d4df66db3ede
--- /dev/null
+++ b/libclc/generic/lib/integer/add_sat.cl
@@ -0,0 +1,53 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+// From add_sat_if.ll
+_CLC_DECL char   __clc_add_sat_s8(char, char);
+_CLC_DECL uchar  __clc_add_sat_u8(uchar, uchar);
+_CLC_DECL short  __clc_add_sat_s16(short, short);
+_CLC_DECL ushort __clc_add_sat_u16(ushort, ushort);
+_CLC_DECL int    __clc_add_sat_s32(int, int);
+_CLC_DECL uint   __clc_add_sat_u32(uint, uint);
+_CLC_DECL long   __clc_add_sat_s64(long, long);
+_CLC_DECL ulong  __clc_add_sat_u64(ulong, ulong);
+
+_CLC_OVERLOAD _CLC_DEF char add_sat(char x, char y) {
+  return __clc_add_sat_s8(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF uchar add_sat(uchar x, uchar y) {
+  return __clc_add_sat_u8(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF short add_sat(short x, short y) {
+  return __clc_add_sat_s16(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF ushort add_sat(ushort x, ushort y) {
+  return __clc_add_sat_u16(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF int add_sat(int x, int y) {
+  return __clc_add_sat_s32(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF uint add_sat(uint x, uint y) {
+  return __clc_add_sat_u32(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF long add_sat(long x, long y) {
+  return __clc_add_sat_s64(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF ulong add_sat(ulong x, ulong y) {
+  return __clc_add_sat_u64(x, y);
+}
+
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, add_sat, char, char)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, add_sat, uchar, uchar)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, add_sat, short, short)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, add_sat, ushort, ushort)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, add_sat, int, int)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, add_sat, uint, uint)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, add_sat, long, long)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, add_sat, ulong, ulong)
diff --git a/libclc/generic/lib/integer/add_sat_if.ll b/libclc/generic/lib/integer/add_sat_if.ll
new file mode 100644
index 000000000000..bcbe4c0dd348
--- /dev/null
+++ b/libclc/generic/lib/integer/add_sat_if.ll
@@ -0,0 +1,55 @@
+declare i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
+
+define i8 @__clc_add_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
+
+define i8 @__clc_add_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
+
+define i16 @__clc_add_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
+
+define i16 @__clc_add_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
+
+define i32 @__clc_add_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
+
+define i32 @__clc_add_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
+
+define i64 @__clc_add_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
+  ret i64 %call
+}
+
+declare i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y)
+
+define i64 @__clc_add_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y)
+  ret i64 %call
+}
diff --git a/libclc/generic/lib/integer/add_sat_impl.ll b/libclc/generic/lib/integer/add_sat_impl.ll
new file mode 100644
index 000000000000..c150ecb56b8b
--- /dev/null
+++ b/libclc/generic/lib/integer/add_sat_impl.ll
@@ -0,0 +1,83 @@
+declare {i8, i1} @llvm.sadd.with.overflow.i8(i8, i8)
+declare {i8, i1} @llvm.uadd.with.overflow.i8(i8, i8)
+
+define i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call {i8, i1} @llvm.sadd.with.overflow.i8(i8 %x, i8 %y)
+  %res = extractvalue {i8, i1} %call, 0
+  %over = extractvalue {i8, i1} %call, 1
+  %x.msb = ashr i8 %x, 7
+  %x.limit = xor i8 %x.msb, 127
+  %sat = select i1 %over, i8 %x.limit, i8 %res
+  ret i8 %sat
+}
+
+define i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call {i8, i1} @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
+  %res = extractvalue {i8, i1} %call, 0
+  %over = extractvalue {i8, i1} %call, 1
+  %sat = select i1 %over, i8 -1, i8 %res
+  ret i8 %sat
+}
+
+declare {i16, i1} @llvm.sadd.with.overflow.i16(i16, i16)
+declare {i16, i1} @llvm.uadd.with.overflow.i16(i16, i16)
+
+define i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call {i16, i1} @llvm.sadd.with.overflow.i16(i16 %x, i16 %y)
+  %res = extractvalue {i16, i1} %call, 0
+  %over = extractvalue {i16, i1} %call, 1
+  %x.msb = ashr i16 %x, 15
+  %x.limit = xor i16 %x.msb, 32767
+  %sat = select i1 %over, i16 %x.limit, i16 %res
+  ret i16 %sat
+}
+
+define i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call {i16, i1} @llvm.uadd.with.overflow.i16(i16 %x, i16 %y)
+  %res = extractvalue {i16, i1} %call, 0
+  %over = extractvalue {i16, i1} %call, 1
+  %sat = select i1 %over, i16 -1, i16 %res
+  ret i16 %sat
+}
+
+declare {i32, i1} @llvm.sadd.with.overflow.i32(i32, i32)
+declare {i32, i1} @llvm.uadd.with.overflow.i32(i32, i32)
+
+define i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %x, i32 %y)
+  %res = extractvalue {i32, i1} %call, 0
+  %over = extractvalue {i32, i1} %call, 1
+  %x.msb = ashr i32 %x, 31
+  %x.limit = xor i32 %x.msb, 2147483647
+  %sat = select i1 %over, i32 %x.limit, i32 %res
+  ret i32 %sat
+}
+
+define i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %x, i32 %y)
+  %res = extractvalue {i32, i1} %call, 0
+  %over = extractvalue {i32, i1} %call, 1
+  %sat = select i1 %over, i32 -1, i32 %res
+  ret i32 %sat
+}
+
+declare {i64, i1} @llvm.sadd.with.overflow.i64(i64, i64)
+declare {i64, i1} @llvm.uadd.with.overflow.i64(i64, i64)
+
+define i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call {i64, i1} @llvm.sadd.with.overflow.i64(i64 %x, i64 %y)
+  %res = extractvalue {i64, i1} %call, 0
+  %over = extractvalue {i64, i1} %call, 1
+  %x.msb = ashr i64 %x, 63
+  %x.limit = xor i64 %x.msb, 9223372036854775807
+  %sat = select i1 %over, i64 %x.limit, i64 %res
+  ret i64 %sat
+}
+
+define i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call {i64, i1} @llvm.uadd.with.overflow.i64(i64 %x, i64 %y)
+  %res = extractvalue {i64, i1} %call, 0
+  %over = extractvalue {i64, i1} %call, 1
+  %sat = select i1 %over, i64 -1, i64 %res
+  ret i64 %sat
+}
diff --git a/libclc/generic/lib/integer/clz.cl b/libclc/generic/lib/integer/clz.cl
new file mode 100644
index 000000000000..17e3fe014741
--- /dev/null
+++ b/libclc/generic/lib/integer/clz.cl
@@ -0,0 +1,53 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+// From clz_if.ll
+_CLC_DECL char   __clc_clz_s8(char);
+_CLC_DECL uchar  __clc_clz_u8(uchar);
+_CLC_DECL short  __clc_clz_s16(short);
+_CLC_DECL ushort __clc_clz_u16(ushort);
+_CLC_DECL int    __clc_clz_s32(int);
+_CLC_DECL uint   __clc_clz_u32(uint);
+_CLC_DECL long   __clc_clz_s64(long);
+_CLC_DECL ulong  __clc_clz_u64(ulong);
+
+_CLC_OVERLOAD _CLC_DEF char clz(char x) {
+  return __clc_clz_s8(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF uchar clz(uchar x) {
+  return __clc_clz_u8(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF short clz(short x) {
+  return __clc_clz_s16(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF ushort clz(ushort x) {
+  return __clc_clz_u16(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF int clz(int x) {
+  return __clc_clz_s32(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF uint clz(uint x) {
+  return __clc_clz_u32(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF long clz(long x) {
+  return __clc_clz_s64(x);
+}
+
+_CLC_OVERLOAD _CLC_DEF ulong clz(ulong x) {
+  return __clc_clz_u64(x);
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, clz, char)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, clz, uchar)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, clz, short)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, clz, ushort)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, clz, int)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, clz, uint)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, clz, long)
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, clz, ulong)
diff --git a/libclc/generic/lib/integer/clz_if.ll b/libclc/generic/lib/integer/clz_if.ll
new file mode 100644
index 000000000000..23dfc74a8a82
--- /dev/null
+++ b/libclc/generic/lib/integer/clz_if.ll
@@ -0,0 +1,55 @@
+declare i8 @__clc_clz_impl_s8(i8 %x)
+
+define i8 @__clc_clz_s8(i8 %x) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_clz_impl_s8(i8 %x)
+  ret i8 %call
+}
+
+declare i8 @__clc_clz_impl_u8(i8 %x)
+
+define i8 @__clc_clz_u8(i8 %x) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_clz_impl_u8(i8 %x)
+  ret i8 %call
+}
+
+declare i16 @__clc_clz_impl_s16(i16 %x)
+
+define i16 @__clc_clz_s16(i16 %x) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_clz_impl_s16(i16 %x)
+  ret i16 %call
+}
+
+declare i16 @__clc_clz_impl_u16(i16 %x)
+
+define i16 @__clc_clz_u16(i16 %x) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_clz_impl_u16(i16 %x)
+  ret i16 %call
+}
+
+declare i32 @__clc_clz_impl_s32(i32 %x)
+
+define i32 @__clc_clz_s32(i32 %x) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_clz_impl_s32(i32 %x)
+  ret i32 %call
+}
+
+declare i32 @__clc_clz_impl_u32(i32 %x)
+
+define i32 @__clc_clz_u32(i32 %x) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_clz_impl_u32(i32 %x)
+  ret i32 %call
+}
+
+declare i64 @__clc_clz_impl_s64(i64 %x)
+
+define i64 @__clc_clz_s64(i64 %x) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_clz_impl_s64(i64 %x)
+  ret i64 %call
+}
+
+declare i64 @__clc_clz_impl_u64(i64 %x)
+
+define i64 @__clc_clz_u64(i64 %x) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_clz_impl_u64(i64 %x)
+  ret i64 %call
+}
diff --git a/libclc/generic/lib/integer/clz_impl.ll b/libclc/generic/lib/integer/clz_impl.ll
new file mode 100644
index 000000000000..b5c3d98ae418
--- /dev/null
+++ b/libclc/generic/lib/integer/clz_impl.ll
@@ -0,0 +1,44 @@
+declare i8 @llvm.ctlz.i8(i8, i1)
+declare i16 @llvm.ctlz.i16(i16, i1)
+declare i32 @llvm.ctlz.i32(i32, i1)
+declare i64 @llvm.ctlz.i64(i64, i1)
+
+define i8 @__clc_clz_impl_s8(i8 %x) nounwind readnone alwaysinline {
+  %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0)
+  ret i8 %call
+}
+
+define i8 @__clc_clz_impl_u8(i8 %x) nounwind readnone alwaysinline {
+  %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0)
+  ret i8 %call
+}
+
+define i16 @__clc_clz_impl_s16(i16 %x) nounwind readnone alwaysinline {
+  %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0)
+  ret i16 %call
+}
+
+define i16 @__clc_clz_impl_u16(i16 %x) nounwind readnone alwaysinline {
+  %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0)
+  ret i16 %call
+}
+
+define i32 @__clc_clz_impl_s32(i32 %x) nounwind readnone alwaysinline {
+  %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0)
+  ret i32 %call
+}
+
+define i32 @__clc_clz_impl_u32(i32 %x) nounwind readnone alwaysinline {
+  %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0)
+  ret i32 %call
+}
+
+define i64 @__clc_clz_impl_s64(i64 %x) nounwind readnone alwaysinline {
+  %call = call i64 @llvm.ctlz.i64(i64 %x, i1 0)
+  ret i64 %call
+}
+
+define i64 @__clc_clz_impl_u64(i64 %x) nounwind readnone alwaysinline {
+  %call = call i64 @llvm.ctlz.i64(i64 %x, i1 0)
+  ret i64 %call
+}
diff --git a/libclc/generic/lib/integer/hadd.cl b/libclc/generic/lib/integer/hadd.cl
new file mode 100644
index 000000000000..749026e5a8ad
--- /dev/null
+++ b/libclc/generic/lib/integer/hadd.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <hadd.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/lib/integer/hadd.inc b/libclc/generic/lib/integer/hadd.inc
new file mode 100644
index 000000000000..ea59d9bd7db5
--- /dev/null
+++ b/libclc/generic/lib/integer/hadd.inc
@@ -0,0 +1,6 @@
+//hadd = (x+y)>>1
+//This can be simplified to (x>>1) + (y>>1) + (1 if both x and y have the 1s bit set)
+//This saves us having to do any checks for overflow in the addition sum
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE hadd(__CLC_GENTYPE x, __CLC_GENTYPE y) {
+    return (x>>(__CLC_GENTYPE)1)+(y>>(__CLC_GENTYPE)1)+(x&y&(__CLC_GENTYPE)1);
+}
diff --git a/libclc/generic/lib/integer/mad24.cl b/libclc/generic/lib/integer/mad24.cl
new file mode 100644
index 000000000000..e29e99f28b56
--- /dev/null
+++ b/libclc/generic/lib/integer/mad24.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <mad24.inc>
+#include <clc/integer/integer-gentype.inc>
diff --git a/libclc/generic/lib/integer/mad24.inc b/libclc/generic/lib/integer/mad24.inc
new file mode 100644
index 000000000000..902b0aafe4c8
--- /dev/null
+++ b/libclc/generic/lib/integer/mad24.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mad24(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z){
+  return mul24(x, y) + z;
+}
diff --git a/libclc/generic/lib/integer/mad_sat.cl b/libclc/generic/lib/integer/mad_sat.cl
new file mode 100644
index 000000000000..1708b29efffc
--- /dev/null
+++ b/libclc/generic/lib/integer/mad_sat.cl
@@ -0,0 +1,72 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+_CLC_OVERLOAD _CLC_DEF char mad_sat(char x, char y, char z) {
+  return clamp((short)mad24((short)x, (short)y, (short)z), (short)CHAR_MIN, (short) CHAR_MAX);
+}
+
+_CLC_OVERLOAD _CLC_DEF uchar mad_sat(uchar x, uchar y, uchar z) {
+  return clamp((ushort)mad24((ushort)x, (ushort)y, (ushort)z), (ushort)0, (ushort) UCHAR_MAX);
+}
+
+_CLC_OVERLOAD _CLC_DEF short mad_sat(short x, short y, short z) {
+  return clamp((int)mad24((int)x, (int)y, (int)z), (int)SHRT_MIN, (int) SHRT_MAX);
+}
+
+_CLC_OVERLOAD _CLC_DEF ushort mad_sat(ushort x, ushort y, ushort z) {
+  return clamp((uint)mad24((uint)x, (uint)y, (uint)z), (uint)0, (uint) USHRT_MAX);
+}
+
+_CLC_OVERLOAD _CLC_DEF int mad_sat(int x, int y, int z) {
+  int mhi = mul_hi(x, y);
+  uint mlo = x * y;
+  long m = upsample(mhi, mlo);
+  m += z;
+  if (m > INT_MAX)
+    return INT_MAX;
+  if (m < INT_MIN)
+    return INT_MIN;
+  return m;
+}
+
+_CLC_OVERLOAD _CLC_DEF uint mad_sat(uint x, uint y, uint z) {
+  if (mul_hi(x, y) != 0)
+    return UINT_MAX;
+  return add_sat(x * y, z);
+}
+
+_CLC_OVERLOAD _CLC_DEF long mad_sat(long x, long y, long z) {
+  long hi = mul_hi(x, y);
+  ulong ulo = x * y;
+  long  slo = x * y;
+  /* Big overflow of more than 2 bits, add can't fix this */
+  if (((x < 0) == (y < 0)) && hi != 0)
+    return LONG_MAX;
+  /* Low overflow in mul and z not neg enough to correct it */
+  if (hi == 0 && ulo >= LONG_MAX && (z > 0 || (ulo + z) > LONG_MAX))
+    return LONG_MAX;
+  /* Big overflow of more than 2 bits, add can't fix this */
+  if (((x < 0) != (y < 0)) && hi != -1)
+    return LONG_MIN;
+  /* Low overflow in mul and z not pos enough to correct it */
+  if (hi == -1 && ulo <= ((ulong)LONG_MAX + 1UL) && (z < 0 || z < (LONG_MAX - ulo)))
+    return LONG_MIN;
+  /* We have checked all conditions, any overflow in addition returns
+   * the correct value */
+  return ulo + z;
+}
+
+_CLC_OVERLOAD _CLC_DEF ulong mad_sat(ulong x, ulong y, ulong z) {
+  if (mul_hi(x, y) != 0)
+    return ULONG_MAX;
+  return add_sat(x * y, z);
+}
+
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, mad_sat, char, char, char)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, mad_sat, uchar, uchar, uchar)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, mad_sat, short, short, short)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, mad_sat, ushort, ushort, ushort)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, mad_sat, int, int, int)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, mad_sat, uint, uint, uint)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, mad_sat, long, long, long)
+_CLC_TERNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, mad_sat, ulong, ulong, ulong)
diff --git a/libclc/generic/lib/integer/mul24.cl b/libclc/generic/lib/integer/mul24.cl
new file mode 100644
index 000000000000..8aedca64b859
--- /dev/null
+++ b/libclc/generic/lib/integer/mul24.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <mul24.inc>
+#include <clc/integer/integer-gentype.inc>
diff --git a/libclc/generic/lib/integer/mul24.inc b/libclc/generic/lib/integer/mul24.inc
new file mode 100644
index 000000000000..95a2f1d6f31b
--- /dev/null
+++ b/libclc/generic/lib/integer/mul24.inc
@@ -0,0 +1,11 @@
+
+// We need to use shifts here in order to maintain the sign bit for signed
+// integers.  The compiler should optimize this to (x & 0x00FFFFFF) for
+// unsigned integers.
+#define CONVERT_TO_24BIT(x) (((x) << 8) >> 8)
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mul24(__CLC_GENTYPE x, __CLC_GENTYPE y){
+  return CONVERT_TO_24BIT(x) * CONVERT_TO_24BIT(y);
+}
+
+#undef CONVERT_TO_24BIT
diff --git a/libclc/generic/lib/integer/mul_hi.cl b/libclc/generic/lib/integer/mul_hi.cl
new file mode 100644
index 000000000000..174d893afb14
--- /dev/null
+++ b/libclc/generic/lib/integer/mul_hi.cl
@@ -0,0 +1,109 @@
+#include <clc/clc.h>
+
+//For all types EXCEPT long, which is implemented separately
+#define __CLC_MUL_HI_IMPL(BGENTYPE, GENTYPE, GENSIZE) \
+    _CLC_OVERLOAD _CLC_DEF GENTYPE mul_hi(GENTYPE x, GENTYPE y){ \
+        return (GENTYPE)(((BGENTYPE)x * (BGENTYPE)y) >> GENSIZE); \
+    } \
+
+//FOIL-based long mul_hi
+//
+// Summary: Treat mul_hi(long x, long y) as:
+// (a+b) * (c+d) where a and c are the high-order parts of x and y respectively
+// and b and d are the low-order parts of x and y.
+// Thinking back to algebra, we use FOIL to do the work.
+
+_CLC_OVERLOAD _CLC_DEF long mul_hi(long x, long y){
+    long f, o, i;
+    ulong l;
+
+    //Move the high/low halves of x/y into the lower 32-bits of variables so
+    //that we can multiply them without worrying about overflow.
+    long x_hi = x >> 32;
+    long x_lo = x & UINT_MAX;
+    long y_hi = y >> 32;
+    long y_lo = y & UINT_MAX;
+
+    //Multiply all of the components according to FOIL method
+    f = x_hi * y_hi;
+    o = x_hi * y_lo;
+    i = x_lo * y_hi;
+    l = x_lo * y_lo;
+
+    //Now add the components back together in the following steps:
+    //F: doesn't need to be modified
+    //O/I: Need to be added together.
+    //L: Shift right by 32-bits, then add into the sum of O and I
+    //Once O/I/L are summed up, then shift the sum by 32-bits and add to F.
+    //
+    //We use hadd to give us a bit of extra precision for the intermediate sums
+    //but as a result, we shift by 31 bits instead of 32
+    return (long)(f + (hadd(o, (i + (long)((ulong)l>>32))) >> 31));
+}
+
+_CLC_OVERLOAD _CLC_DEF ulong mul_hi(ulong x, ulong y){
+    ulong f, o, i;
+    ulong l;
+
+    //Move the high/low halves of x/y into the lower 32-bits of variables so
+    //that we can multiply them without worrying about overflow.
+    ulong x_hi = x >> 32;
+    ulong x_lo = x & UINT_MAX;
+    ulong y_hi = y >> 32;
+    ulong y_lo = y & UINT_MAX;
+
+    //Multiply all of the components according to FOIL method
+    f = x_hi * y_hi;
+    o = x_hi * y_lo;
+    i = x_lo * y_hi;
+    l = x_lo * y_lo;
+
+    //Now add the components back together, taking care to respect the fact that:
+    //F: doesn't need to be modified
+    //O/I: Need to be added together.
+    //L: Shift right by 32-bits, then add into the sum of O and I
+    //Once O/I/L are summed up, then shift the sum by 32-bits and add to F.
+    //
+    //We use hadd to give us a bit of extra precision for the intermediate sums
+    //but as a result, we shift by 31 bits instead of 32
+    return (f + (hadd(o, (i + (l>>32))) >> 31));
+}
+
+#define __CLC_MUL_HI_VEC(GENTYPE) \
+    _CLC_OVERLOAD _CLC_DEF GENTYPE##2 mul_hi(GENTYPE##2 x, GENTYPE##2 y){ \
+        return (GENTYPE##2){mul_hi(x.s0, y.s0), mul_hi(x.s1, y.s1)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF GENTYPE##3 mul_hi(GENTYPE##3 x, GENTYPE##3 y){ \
+        return (GENTYPE##3){mul_hi(x.s0, y.s0), mul_hi(x.s1, y.s1), mul_hi(x.s2, y.s2)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF GENTYPE##4 mul_hi(GENTYPE##4 x, GENTYPE##4 y){ \
+        return (GENTYPE##4){mul_hi(x.lo, y.lo), mul_hi(x.hi, y.hi)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF GENTYPE##8 mul_hi(GENTYPE##8 x, GENTYPE##8 y){ \
+        return (GENTYPE##8){mul_hi(x.lo, y.lo), mul_hi(x.hi, y.hi)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF GENTYPE##16 mul_hi(GENTYPE##16 x, GENTYPE##16 y){ \
+        return (GENTYPE##16){mul_hi(x.lo, y.lo), mul_hi(x.hi, y.hi)}; \
+    } \
+
+#define __CLC_MUL_HI_DEC_IMPL(BTYPE, TYPE, BITS) \
+    __CLC_MUL_HI_IMPL(BTYPE, TYPE, BITS) \
+    __CLC_MUL_HI_VEC(TYPE)
+
+#define __CLC_MUL_HI_TYPES() \
+    __CLC_MUL_HI_DEC_IMPL(short, char, 8) \
+    __CLC_MUL_HI_DEC_IMPL(ushort, uchar, 8) \
+    __CLC_MUL_HI_DEC_IMPL(int, short, 16) \
+    __CLC_MUL_HI_DEC_IMPL(uint, ushort, 16) \
+    __CLC_MUL_HI_DEC_IMPL(long, int, 32) \
+    __CLC_MUL_HI_DEC_IMPL(ulong, uint, 32) \
+    __CLC_MUL_HI_VEC(long) \
+    __CLC_MUL_HI_VEC(ulong)
+
+__CLC_MUL_HI_TYPES()
+
+#undef __CLC_MUL_HI_TYPES
+#undef __CLC_MUL_HI_DEC_IMPL
+#undef __CLC_MUL_HI_IMPL
+#undef __CLC_MUL_HI_VEC
+#undef __CLC_B32
diff --git a/libclc/generic/lib/integer/rhadd.cl b/libclc/generic/lib/integer/rhadd.cl
new file mode 100644
index 000000000000..c985870f7c7a
--- /dev/null
+++ b/libclc/generic/lib/integer/rhadd.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <rhadd.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/lib/integer/rhadd.inc b/libclc/generic/lib/integer/rhadd.inc
new file mode 100644
index 000000000000..3d6076874808
--- /dev/null
+++ b/libclc/generic/lib/integer/rhadd.inc
@@ -0,0 +1,6 @@
+//rhadd = (x+y+1)>>1
+//This can be simplified to (x>>1) + (y>>1) + (1 if either x or y has the 1s bit set)
+//This saves us having to do any checks for overflow in the addition sums
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE rhadd(__CLC_GENTYPE x, __CLC_GENTYPE y) {
+    return (x>>(__CLC_GENTYPE)1)+(y>>(__CLC_GENTYPE)1)+((x&(__CLC_GENTYPE)1)|(y&(__CLC_GENTYPE)1));
+}
diff --git a/libclc/generic/lib/integer/rotate.cl b/libclc/generic/lib/integer/rotate.cl
new file mode 100644
index 000000000000..27ce515c7293
--- /dev/null
+++ b/libclc/generic/lib/integer/rotate.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <rotate.inc>
+#include <clc/integer/gentype.inc>
diff --git a/libclc/generic/lib/integer/rotate.inc b/libclc/generic/lib/integer/rotate.inc
new file mode 100644
index 000000000000..33bb0a85241d
--- /dev/null
+++ b/libclc/generic/lib/integer/rotate.inc
@@ -0,0 +1,42 @@
+/**
+ * Not necessarily optimal... but it produces correct results (at least for int)
+ * If we're lucky, LLVM will recognize the pattern and produce rotate
+ * instructions:
+ * http://llvm.1065342.n5.nabble.com/rotate-td47679.html
+ * 
+ * Eventually, someone should feel free to implement an llvm-specific version
+ */
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE rotate(__CLC_GENTYPE x, __CLC_GENTYPE n){
+    //Try to avoid extra work if someone's spinning the value through multiple
+    //full rotations
+    n = n % (__CLC_GENTYPE)__CLC_GENSIZE;
+
+#ifdef __CLC_SCALAR
+    if (n > 0){
+        return (x << n) | (((__CLC_U_GENTYPE)x) >> (__CLC_GENSIZE - n));
+    } else if (n == 0){
+        return x;
+    } else {
+        return ( (((__CLC_U_GENTYPE)x) >> -n) | (x << (__CLC_GENSIZE + n)) );
+    }
+#else
+    //XXX: There are a lot of __builtin_astype calls to cast everything to
+    //     unsigned ... This should be improved so that if __CLC_GENTYPE==__CLC_U_GENTYPE, no
+    //     casts are required.
+    
+    __CLC_U_GENTYPE x_1 = __builtin_astype(x, __CLC_U_GENTYPE);
+
+    //XXX: Is (__CLC_U_GENTYPE >> S__CLC_GENTYPE) | (__CLC_U_GENTYPE << S__CLC_GENTYPE) legal?
+    //     If so, then combine the amt and shifts into a single set of statements
+    
+    __CLC_U_GENTYPE amt;
+    amt = (n < (__CLC_GENTYPE)0 ? __builtin_astype((__CLC_GENTYPE)0-n, __CLC_U_GENTYPE) : (__CLC_U_GENTYPE)0);
+    x_1 = (x_1 >> amt) | (x_1 << ((__CLC_U_GENTYPE)__CLC_GENSIZE - amt));
+
+    amt = (n < (__CLC_GENTYPE)0 ? (__CLC_U_GENTYPE)0 : __builtin_astype(n, __CLC_U_GENTYPE));
+    x_1 = (x_1 << amt) | (x_1 >> ((__CLC_U_GENTYPE)__CLC_GENSIZE - amt));
+
+    return __builtin_astype(x_1, __CLC_GENTYPE);
+#endif
+}
diff --git a/libclc/generic/lib/integer/sub_sat.cl b/libclc/generic/lib/integer/sub_sat.cl
new file mode 100644
index 000000000000..6b42cc86a74c
--- /dev/null
+++ b/libclc/generic/lib/integer/sub_sat.cl
@@ -0,0 +1,53 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+// From sub_sat_if.ll
+_CLC_DECL char   __clc_sub_sat_s8(char, char);
+_CLC_DECL uchar  __clc_sub_sat_u8(uchar, uchar);
+_CLC_DECL short  __clc_sub_sat_s16(short, short);
+_CLC_DECL ushort __clc_sub_sat_u16(ushort, ushort);
+_CLC_DECL int    __clc_sub_sat_s32(int, int);
+_CLC_DECL uint   __clc_sub_sat_u32(uint, uint);
+_CLC_DECL long   __clc_sub_sat_s64(long, long);
+_CLC_DECL ulong  __clc_sub_sat_u64(ulong, ulong);
+
+_CLC_OVERLOAD _CLC_DEF char sub_sat(char x, char y) {
+  return __clc_sub_sat_s8(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF uchar sub_sat(uchar x, uchar y) {
+  return __clc_sub_sat_u8(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF short sub_sat(short x, short y) {
+  return __clc_sub_sat_s16(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF ushort sub_sat(ushort x, ushort y) {
+  return __clc_sub_sat_u16(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF int sub_sat(int x, int y) {
+  return __clc_sub_sat_s32(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF uint sub_sat(uint x, uint y) {
+  return __clc_sub_sat_u32(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF long sub_sat(long x, long y) {
+  return __clc_sub_sat_s64(x, y);
+}
+
+_CLC_OVERLOAD _CLC_DEF ulong sub_sat(ulong x, ulong y) {
+  return __clc_sub_sat_u64(x, y);
+}
+
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, char, sub_sat, char, char)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uchar, sub_sat, uchar, uchar)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, short, sub_sat, short, short)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ushort, sub_sat, ushort, ushort)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, int, sub_sat, int, int)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, uint, sub_sat, uint, uint)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, long, sub_sat, long, long)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, ulong, sub_sat, ulong, ulong)
diff --git a/libclc/generic/lib/integer/sub_sat_if.ll b/libclc/generic/lib/integer/sub_sat_if.ll
new file mode 100644
index 000000000000..7252574b5b8e
--- /dev/null
+++ b/libclc/generic/lib/integer/sub_sat_if.ll
@@ -0,0 +1,55 @@
+declare i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y)
+
+define i8 @__clc_sub_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y)
+
+define i8 @__clc_sub_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y)
+
+define i16 @__clc_sub_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y)
+
+define i16 @__clc_sub_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y)
+
+define i32 @__clc_sub_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y)
+
+define i32 @__clc_sub_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y)
+
+define i64 @__clc_sub_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y)
+  ret i64 %call
+}
+
+declare i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y)
+
+define i64 @__clc_sub_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y)
+  ret i64 %call
+}
diff --git a/libclc/generic/lib/integer/sub_sat_impl.ll b/libclc/generic/lib/integer/sub_sat_impl.ll
new file mode 100644
index 000000000000..e82b632f43b4
--- /dev/null
+++ b/libclc/generic/lib/integer/sub_sat_impl.ll
@@ -0,0 +1,83 @@
+declare {i8, i1} @llvm.ssub.with.overflow.i8(i8, i8)
+declare {i8, i1} @llvm.usub.with.overflow.i8(i8, i8)
+
+define i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call {i8, i1} @llvm.ssub.with.overflow.i8(i8 %x, i8 %y)
+  %res = extractvalue {i8, i1} %call, 0
+  %over = extractvalue {i8, i1} %call, 1
+  %x.msb = ashr i8 %x, 7
+  %x.limit = xor i8 %x.msb, 127
+  %sat = select i1 %over, i8 %x.limit, i8 %res
+  ret i8 %sat
+}
+
+define i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call {i8, i1} @llvm.usub.with.overflow.i8(i8 %x, i8 %y)
+  %res = extractvalue {i8, i1} %call, 0
+  %over = extractvalue {i8, i1} %call, 1
+  %sat = select i1 %over, i8 0, i8 %res
+  ret i8 %sat
+}
+
+declare {i16, i1} @llvm.ssub.with.overflow.i16(i16, i16)
+declare {i16, i1} @llvm.usub.with.overflow.i16(i16, i16)
+
+define i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call {i16, i1} @llvm.ssub.with.overflow.i16(i16 %x, i16 %y)
+  %res = extractvalue {i16, i1} %call, 0
+  %over = extractvalue {i16, i1} %call, 1
+  %x.msb = ashr i16 %x, 15
+  %x.limit = xor i16 %x.msb, 32767
+  %sat = select i1 %over, i16 %x.limit, i16 %res
+  ret i16 %sat
+}
+
+define i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call {i16, i1} @llvm.usub.with.overflow.i16(i16 %x, i16 %y)
+  %res = extractvalue {i16, i1} %call, 0
+  %over = extractvalue {i16, i1} %call, 1
+  %sat = select i1 %over, i16 0, i16 %res
+  ret i16 %sat
+}
+
+declare {i32, i1} @llvm.ssub.with.overflow.i32(i32, i32)
+declare {i32, i1} @llvm.usub.with.overflow.i32(i32, i32)
+
+define i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %x, i32 %y)
+  %res = extractvalue {i32, i1} %call, 0
+  %over = extractvalue {i32, i1} %call, 1
+  %x.msb = ashr i32 %x, 31
+  %x.limit = xor i32 %x.msb, 2147483647
+  %sat = select i1 %over, i32 %x.limit, i32 %res
+  ret i32 %sat
+}
+
+define i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %x, i32 %y)
+  %res = extractvalue {i32, i1} %call, 0
+  %over = extractvalue {i32, i1} %call, 1
+  %sat = select i1 %over, i32 0, i32 %res
+  ret i32 %sat
+}
+
+declare {i64, i1} @llvm.ssub.with.overflow.i64(i64, i64)
+declare {i64, i1} @llvm.usub.with.overflow.i64(i64, i64)
+
+define i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call {i64, i1} @llvm.ssub.with.overflow.i64(i64 %x, i64 %y)
+  %res = extractvalue {i64, i1} %call, 0
+  %over = extractvalue {i64, i1} %call, 1
+  %x.msb = ashr i64 %x, 63
+  %x.limit = xor i64 %x.msb, 9223372036854775807
+  %sat = select i1 %over, i64 %x.limit, i64 %res
+  ret i64 %sat
+}
+
+define i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call {i64, i1} @llvm.usub.with.overflow.i64(i64 %x, i64 %y)
+  %res = extractvalue {i64, i1} %call, 0
+  %over = extractvalue {i64, i1} %call, 1
+  %sat = select i1 %over, i64 0, i64 %res
+  ret i64 %sat
+}
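The saturation pattern in sub_sat_impl.ll can be paraphrased in host-side C for the 8-bit case. This is only a sketch under stated assumptions, not libclc code: `sub_sat_s8`, `sub_sat_u8`, and `check_sub_sat` are hypothetical names, and the overflow test uses a wider `int` in place of the `llvm.ssub.with.overflow` / `llvm.usub.with.overflow` intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Signed case: on overflow, clamp toward the sign of x. */
static int8_t sub_sat_s8(int8_t x, int8_t y) {
    int wide = (int)x - (int)y;                 /* exact in a wider type */
    int over = wide < INT8_MIN || wide > INT8_MAX;
    /* Same trick as the IR: (x >> 7) ^ 127 is -128 for negative x, 127 otherwise. */
    int msb = x < 0 ? -1 : 0;                   /* stands in for ashr i8 %x, 7 */
    int8_t limit = (int8_t)(msb ^ INT8_MAX);
    return over ? limit : (int8_t)wide;
}

/* Unsigned case: a borrow means the true result is negative, so clamp to 0. */
static uint8_t sub_sat_u8(uint8_t x, uint8_t y) {
    return x < y ? 0 : (uint8_t)(x - y);
}

/* Hypothetical usage check. */
static void check_sub_sat(void) {
    assert(sub_sat_s8(-128, 1) == -128);
    assert(sub_sat_s8(127, -1) == 127);
    assert(sub_sat_u8(3, 5) == 0);
}
```

The signed limit depends only on the sign of x because x - y can overflow only when x and y have opposite signs, and the true result then lies on the same side of zero as x.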
diff --git a/libclc/generic/lib/integer/upsample.cl b/libclc/generic/lib/integer/upsample.cl
new file mode 100644
index 000000000000..da77315f8f93
--- /dev/null
+++ b/libclc/generic/lib/integer/upsample.cl
@@ -0,0 +1,34 @@
+#include <clc/clc.h>
+
+#define __CLC_UPSAMPLE_IMPL(BGENTYPE, GENTYPE, UGENTYPE, GENSIZE) \
+    _CLC_OVERLOAD _CLC_DEF BGENTYPE upsample(GENTYPE hi, UGENTYPE lo){ \
+        return ((BGENTYPE)hi << GENSIZE) | lo; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF BGENTYPE##2 upsample(GENTYPE##2 hi, UGENTYPE##2 lo){ \
+        return (BGENTYPE##2){upsample(hi.s0, lo.s0), upsample(hi.s1, lo.s1)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF BGENTYPE##3 upsample(GENTYPE##3 hi, UGENTYPE##3 lo){ \
+        return (BGENTYPE##3){upsample(hi.s0, lo.s0), upsample(hi.s1, lo.s1), upsample(hi.s2, lo.s2)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF BGENTYPE##4 upsample(GENTYPE##4 hi, UGENTYPE##4 lo){ \
+        return (BGENTYPE##4){upsample(hi.lo, lo.lo), upsample(hi.hi, lo.hi)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF BGENTYPE##8 upsample(GENTYPE##8 hi, UGENTYPE##8 lo){ \
+        return (BGENTYPE##8){upsample(hi.lo, lo.lo), upsample(hi.hi, lo.hi)}; \
+    } \
+    _CLC_OVERLOAD _CLC_DEF BGENTYPE##16 upsample(GENTYPE##16 hi, UGENTYPE##16 lo){ \
+        return (BGENTYPE##16){upsample(hi.lo, lo.lo), upsample(hi.hi, lo.hi)}; \
+    } \
+
+#define __CLC_UPSAMPLE_TYPES() \
+    __CLC_UPSAMPLE_IMPL(short, char, uchar, 8) \
+    __CLC_UPSAMPLE_IMPL(ushort, uchar, uchar, 8) \
+    __CLC_UPSAMPLE_IMPL(int, short, ushort, 16) \
+    __CLC_UPSAMPLE_IMPL(uint, ushort, ushort, 16) \
+    __CLC_UPSAMPLE_IMPL(long, int, uint, 32) \
+    __CLC_UPSAMPLE_IMPL(ulong, uint, uint, 32) \
+
+__CLC_UPSAMPLE_TYPES()
+
+#undef __CLC_UPSAMPLE_TYPES
+#undef __CLC_UPSAMPLE_IMPL
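For reference, the scalar case of the upsample macro can be mirrored in host-side C. A minimal sketch, assuming the 8-bit to 16-bit instantiations; `upsample_s8`, `upsample_u8`, and `check_upsample` are hypothetical names that do not appear in libclc.

```c
#include <assert.h>
#include <stdint.h>

/* short upsample(char hi, uchar lo): hi fills the top byte, lo the bottom one. */
static int16_t upsample_s8(int8_t hi, uint8_t lo) {
    /* hi * 256 has a zero low byte, so adding lo equals OR-ing it in; the
       multiply avoids left-shifting a negative value, which C leaves undefined. */
    return (int16_t)(hi * 256 + lo);
}

/* ushort upsample(uchar hi, uchar lo) */
static uint16_t upsample_u8(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Hypothetical usage check. */
static void check_upsample(void) {
    assert(upsample_u8(0x12, 0x34) == 0x1234);
    assert(upsample_s8(-1, 0xFF) == -1);        /* 0xFFFF as a signed 16-bit value */
    assert(upsample_s8(-128, 0) == INT16_MIN);
}
```

The low operand is unsigned in every instantiation of the macro, so it is never sign-extended into the high half.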
diff --git a/libclc/generic/lib/math/acos.cl b/libclc/generic/lib/math/acos.cl
new file mode 100644
index 000000000000..3ce96554fef3
--- /dev/null
+++ b/libclc/generic/lib/math/acos.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <acos.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/acos.inc b/libclc/generic/lib/math/acos.inc
new file mode 100644
index 000000000000..8612415f37bd
--- /dev/null
+++ b/libclc/generic/lib/math/acos.inc
@@ -0,0 +1,21 @@
+/*
+ * There are multiple formulas for calculating arccosine of x:
+ * 1) acos(x) = pi/2 + i * ln(i*x + sqrt(1-x^2)) (notice the 'i'...)
+ * 2) acos(x) = pi/2 + asin(-x) (asin isn't implemented yet)
+ * 3) acos(x) = pi/2 - asin(x) (ditto)
+ * 4) acos(x) = 2*atan2(sqrt(1-x), sqrt(1+x))
+ * 5) acos(x) = pi/2 - atan2(x, ( sqrt(1-x^2) ) )
+ *
+ * Options 1-3 are not currently usable. Option 5 generates slightly more
+ * concise radeonsi bitcode and assembly than option 4 (134 vs 132
+ * instructions), but option 4 may be more precise, so it is used below.
+ */
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE acos(__CLC_GENTYPE x) {
+  return (
+    (__CLC_GENTYPE) 2.0 * atan2(
+      sqrt((__CLC_GENTYPE) 1.0 - x),
+      sqrt((__CLC_GENTYPE) 1.0 + x)
+    )
+  );
+}
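Formula 4 in the comment above follows from the half-angle identity for the tangent. A short derivation, with the symbol θ introduced here only for illustration:

```latex
\begin{aligned}
\theta &= \arccos x, \qquad \theta \in [0, \pi] \\
\tan\frac{\theta}{2} &= \sqrt{\frac{1-\cos\theta}{1+\cos\theta}}
                      = \frac{\sqrt{1-x}}{\sqrt{1+x}} \\
\arccos x &= 2\,\operatorname{atan2}\!\left(\sqrt{1-x},\ \sqrt{1+x}\right)
\end{aligned}
```

Because θ/2 lies in [0, π/2], both arguments of atan2 are non-negative, and the endpoint x = -1 (where the second argument is zero) still yields π.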
diff --git a/libclc/generic/lib/math/asin.cl b/libclc/generic/lib/math/asin.cl
new file mode 100644
index 000000000000..d56dbd780a7b
--- /dev/null
+++ b/libclc/generic/lib/math/asin.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <asin.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/asin.inc b/libclc/generic/lib/math/asin.inc
new file mode 100644
index 000000000000..a109c367fc79
--- /dev/null
+++ b/libclc/generic/lib/math/asin.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE asin(__CLC_GENTYPE x) {
+  return atan2(x, sqrt( (__CLC_GENTYPE)1.0 -(x*x) ));
+}
+\ No newline at end of file
diff --git a/libclc/generic/lib/math/atan.cl b/libclc/generic/lib/math/atan.cl
new file mode 100644
index 000000000000..fa3633cef748
--- /dev/null
+++ b/libclc/generic/lib/math/atan.cl
@@ -0,0 +1,183 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "math.h"
+#include "../clcmacro.h"
+
+#include <clc/clc.h>
+
+_CLC_OVERLOAD _CLC_DEF float atan(float x)
+{
+    const float piby2 = 1.5707963267948966f; // pi/2; rounds to 0x3fc90fdb in single precision
+
+    uint ux = as_uint(x);
+    uint aux = ux & EXSIGNBIT_SP32;
+    uint sx = ux ^ aux;
+
+    float spiby2 = as_float(sx | as_uint(piby2));
+
+    float v = as_float(aux);
+
+    // Return for NaN
+    float ret = x;
+
+    // 2^26 <= |x| <= Inf => atan(x) is close to piby2
+    ret = aux <= PINFBITPATT_SP32 ? spiby2 : ret;
+
+    // Reduce arguments 2^-19 <= |x| < 2^26
+
+    // 39/16 <= x < 2^26
+    x = -MATH_RECIP(v);
+    float c = 1.57079632679489655800f; // atan(infinity)
+
+    // 19/16 <= x < 39/16
+    int l = aux < 0x401c0000;
+    float xx = MATH_DIVIDE(v - 1.5f, mad(v, 1.5f, 1.0f));
+    x = l ? xx : x;
+    c = l ? 9.82793723247329054082e-01f : c; // atan(1.5)
+
+    // 11/16 <= x < 19/16
+    l = aux < 0x3f980000U;
+    xx = MATH_DIVIDE(v - 1.0f, 1.0f + v);
+    x = l ? xx : x;
+    c = l ? 7.85398163397448278999e-01f : c; // atan(1)
+
+    // 7/16 <= x < 11/16
+    l = aux < 0x3f300000;
+    xx = MATH_DIVIDE(mad(v, 2.0f, -1.0f), 2.0f + v);
+    x = l ? xx : x;
+    c = l ? 4.63647609000806093515e-01f : c; // atan(0.5)
+
+    // 2^-19 <= x < 7/16
+    l = aux < 0x3ee00000;
+    x = l ? v : x;
+    c = l ? 0.0f : c;
+
+    // Core approximation: Remez(2,2) on [-7/16,7/16]
+
+    float s = x * x;
+    float a = mad(s,
+                  mad(s, 0.470677934286149214138357545549e-2f, 0.192324546402108583211697690500f),
+                  0.296528598819239217902158651186f);
+
+    float b = mad(s,
+                  mad(s, 0.299309699959659728404442796915f, 0.111072499995399550138837673349e1f),
+                  0.889585796862432286486651434570f);
+
+    float q = x * s * MATH_DIVIDE(a, b);
+
+    float z = c - (q - x);
+    float zs = as_float(sx | as_uint(z));
+
+    ret = aux < 0x4c800000 ? zs : ret;
+
+    // |x| < 2^-19
+    ret = aux < 0x36000000 ? as_float(ux) : ret;
+    return ret;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, atan, float);
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+
+_CLC_OVERLOAD _CLC_DEF double atan(double x)
+{
+    const double piby2 = 1.5707963267948966e+00; // 0x3ff921fb54442d18
+
+    double v = fabs(x);
+
+    // 2^56 > v > 39/16
+    double a = -1.0;
+    double b = v;
+    // (chi + clo) = arctan(infinity)
+    double chi = 1.57079632679489655800e+00;
+    double clo = 6.12323399573676480327e-17;
+
+    double ta = v - 1.5;
+    double tb = 1.0 + 1.5 * v;
+    int l = v <= 0x1.38p+1; // 39/16 > v > 19/16
+    a = l ? ta : a;
+    b = l ? tb : b;
+    // (chi + clo) = arctan(1.5)
+    chi = l ? 9.82793723247329054082e-01 : chi;
+    clo = l ? 1.39033110312309953701e-17 : clo;
+
+    ta = v - 1.0;
+    tb = 1.0 + v;
+    l = v <= 0x1.3p+0; // 19/16 > v > 11/16
+    a = l ? ta : a;
+    b = l ? tb : b;
+    // (chi + clo) = arctan(1.)
+    chi = l ? 7.85398163397448278999e-01 : chi;
+    clo = l ? 3.06161699786838240164e-17 : clo;
+
+    ta = 2.0 * v - 1.0;
+    tb = 2.0 + v;
+    l = v <= 0x1.6p-1; // 11/16 > v > 7/16
+    a = l ? ta : a;
+    b = l ? tb : b;
+    // (chi + clo) = arctan(0.5)
+    chi = l ? 4.63647609000806093515e-01 : chi;
+    clo = l ? 2.26987774529616809294e-17 : clo;
+
+    l = v <= 0x1.cp-2; // v < 7/16
+    a = l ? v : a;
+    b = l ? 1.0 : b;
+    chi = l ? 0.0 : chi;
+    clo = l ? 0.0 : clo;
+
+    // Core approximation: Remez(4,4) on [-7/16,7/16]
+    double r = a / b;
+    double s = r * r;
+    double qn = fma(s,
+                    fma(s,
+                        fma(s,
+                            fma(s, 0.142316903342317766e-3,
+                                   0.304455919504853031e-1),
+                            0.220638780716667420e0),
+                        0.447677206805497472e0),
+                    0.268297920532545909e0);
+
+    double qd = fma(s,
+                    fma(s,
+                        fma(s,
+                            fma(s, 0.389525873944742195e-1,
+                                   0.424602594203847109e0),
+                            0.141254259931958921e1),
+                        0.182596787737507063e1),
+                    0.804893761597637733e0);
+
+    double q = r * s * qn / qd;
+    r = chi - ((q - clo) - r);
+
+    double z = isnan(x) ? x : piby2;
+    z = v <= 0x1.0p+56 ? r : z;
+    z = v < 0x1.0p-26 ? v : z;
+    return x == v ? z : -z;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, atan, double);
+
+#endif // cl_khr_fp64
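The interval tests in both atan variants rest on the arctangent subtraction identity. As a sketch of the reasoning (the symbols c, r, s, P, Q are notation chosen here, not names from the source):

```latex
\begin{aligned}
\arctan v &= \arctan c + \arctan\frac{v - c}{1 + c\,v},
  \qquad c \in \left\{\tfrac{1}{2},\ 1,\ \tfrac{3}{2}\right\},\ v > 0 \\
\arctan v &= \frac{\pi}{2} + \arctan\!\left(-\frac{1}{v}\right)
  \qquad \text{(the limit } c \to \infty\text{)} \\
\arctan r &\approx r - r\,s\,\frac{P(s)}{Q(s)}, \qquad s = r^2
\end{aligned}
```

For c = 1/2 the reduced argument (v - 1/2)/(1 + v/2) equals (2v - 1)/(2 + v), which is the form the code evaluates. The rational term P/Q is the Remez approximation named in the comments, and the final result is c-table value plus r minus the correction.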
diff --git a/libclc/generic/lib/math/atan2.cl b/libclc/generic/lib/math/atan2.cl
new file mode 100644
index 000000000000..9e5fb587d422
--- /dev/null
+++ b/libclc/generic/lib/math/atan2.cl
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "math.h"
+#include "../clcmacro.h"
+
+#include <clc/clc.h>
+
+_CLC_OVERLOAD _CLC_DEF float atan2(float y, float x)
+{
+    const float pi = 0x1.921fb6p+1f;
+    const float piby2 = 0x1.921fb6p+0f;
+    const float piby4 = 0x1.921fb6p-1f;
+    const float threepiby4 = 0x1.2d97c8p+1f;
+
+    float ax = fabs(x);
+    float ay = fabs(y);
+    float v = min(ax, ay);
+    float u = max(ax, ay);
+
+    // Scale since u could be large, as in "regular" divide
+    float s = u > 0x1.0p+96f ? 0x1.0p-32f : 1.0f;
+    float vbyu = s * MATH_DIVIDE(v, s*u);
+
+    float vbyu2 = vbyu * vbyu;
+
+#define USE_2_2_APPROXIMATION
+#if defined USE_2_2_APPROXIMATION
+    float p = mad(vbyu2, mad(vbyu2, -0x1.7e1f78p-9f, -0x1.7d1b98p-3f), -0x1.5554d0p-2f) * vbyu2 * vbyu;
+    float q = mad(vbyu2, mad(vbyu2, 0x1.1a714cp-2f, 0x1.287c56p+0f), 1.0f);
+#else
+    float p = mad(vbyu2, mad(vbyu2, -0x1.55cd22p-5f, -0x1.26cf76p-2f), -0x1.55554ep-2f) * vbyu2 * vbyu;
+    float q = mad(vbyu2, mad(vbyu2, mad(vbyu2, 0x1.9f1304p-5f, 0x1.2656fap-1f), 0x1.76b4b8p+0f), 1.0f);
+#endif
+
+    // Octant 0 result
+    float a = mad(p, MATH_RECIP(q), vbyu);
+
+    // Fix up 3 other octants
+    float at = piby2 - a;
+    a = ay > ax ? at : a;
+    at = pi - a;
+    a = x < 0.0f ? at : a;
+
+    // y == 0 => 0 for x >= 0, pi for x < 0
+    at = as_int(x) < 0 ? pi : 0.0f;
+    a = y == 0.0f ? at : a;
+
+    // if (!FINITE_ONLY()) {
+        // x and y are +- Inf
+        at = x > 0.0f ? piby4 : threepiby4;
+        a = (ax == INFINITY) & (ay == INFINITY) ? at : a;
+
+        // x or y is NaN
+        a = isnan(x) | isnan(y) ? as_float(QNANBITPATT_SP32) : a;
+    // }
+
+    // Fixup sign and return
+    return copysign(a, y);
+}
+
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, atan2, float, float);
diff --git a/libclc/generic/lib/math/binary_impl.inc b/libclc/generic/lib/math/binary_impl.inc
new file mode 100644
index 000000000000..c9bf97242672
--- /dev/null
+++ b/libclc/generic/lib/math/binary_impl.inc
@@ -0,0 +1,22 @@
+
+#ifndef __CLC_SCALAR
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE FUNCTION(__CLC_GENTYPE x, __CLC_GENTYPE y) {
+  return FUNCTION_IMPL(x, y);
+}
+
+#endif
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE FUNCTION(__CLC_GENTYPE x, float y) {
+  __CLC_GENTYPE vec_y = (__CLC_GENTYPE) (y);
+  return FUNCTION_IMPL(x, vec_y);
+}
+
+#ifdef cl_khr_fp64
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE FUNCTION(__CLC_GENTYPE x, double y) {
+  __CLC_GENTYPE vec_y = (__CLC_GENTYPE) (y);
+  return FUNCTION_IMPL(x, vec_y);
+}
+
+#endif
diff --git a/libclc/generic/lib/math/clc_nextafter.cl b/libclc/generic/lib/math/clc_nextafter.cl
new file mode 100644
index 000000000000..e53837d179fb
--- /dev/null
+++ b/libclc/generic/lib/math/clc_nextafter.cl
@@ -0,0 +1,43 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+// This file provides OpenCL C implementations of nextafter for targets that
+// don't support the clang builtin.
+
+#define FLT_NAN (0.0f/0.0f)
+
+#define NEXTAFTER(FLOAT_TYPE, UINT_TYPE, NAN, ZERO, NEXTAFTER_ZERO) \
+_CLC_OVERLOAD _CLC_DEF FLOAT_TYPE __clc_nextafter(FLOAT_TYPE x, FLOAT_TYPE y) { \
+  union {                     \
+    FLOAT_TYPE f;             \
+    UINT_TYPE i;              \
+  } next;                     \
+  if (isnan(x) || isnan(y)) { \
+    return NAN;               \
+  }                           \
+  if (x == y) {               \
+    return y;                 \
+  }                           \
+  next.f = x;                 \
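+  if (x == ZERO) {            \
+    /* Step off +-0 to the smallest subnormal on y's side. */ \
+    next.i = (y > ZERO) ? 1 : NEXTAFTER_ZERO; \
+  } else if ((x < y) == (x > ZERO)) { \
+    /* Moving away from zero: the magnitude bits grow. */ \
+    next.i++;                 \
+  } else {                    \
+    next.i--;                 \
+  }                           \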
+  return next.f;              \
+}
+
+NEXTAFTER(float, uint, FLT_NAN, 0.0f, 0x80000001)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, __clc_nextafter, float, float)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#define DBL_NAN (0.0/0.0)
+
+NEXTAFTER(double, ulong, DBL_NAN, 0.0, 0x8000000000000001)
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, __clc_nextafter, double, double)
+#endif
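Stepping a float through its bit pattern has to respect the sign-magnitude layout: incrementing the integer moves away from zero for either sign. The following host-side C sketch illustrates that rule for single precision. It is an assumption-laden illustration rather than the libclc implementation; `float_bits`, `bits_float`, `next_after_f`, and `check_next_after` are hypothetical names, and `memcpy` stands in for the union used in the macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);                /* well-defined type pun in C */
    return u;
}

static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Next representable float after x in the direction of y. */
static float next_after_f(float x, float y) {
    if (x != x || y != y) return x + y;      /* NaN in, NaN out */
    if (x == y) return y;
    if (x == 0.0f)                           /* covers +0 and -0 */
        return bits_float(y > 0.0f ? 0x00000001u : 0x80000001u);
    uint32_t u = float_bits(x);
    /* Sign-magnitude layout: moving away from zero increments the bits. */
    if ((x < y) == (x > 0.0f)) u++; else u--;
    return bits_float(u);
}

/* Hypothetical usage check. */
static void check_next_after(void) {
    assert(next_after_f(1.0f, 2.0f) == 0x1.000002p+0f);
    assert(next_after_f(-1.0f, 0.0f) == -0x1.fffffep-1f);
    assert(float_bits(next_after_f(0.0f, -1.0f)) == 0x80000001u);
}
```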
diff --git a/libclc/generic/lib/math/copysign.cl b/libclc/generic/lib/math/copysign.cl
new file mode 100644
index 000000000000..4e0c51b09373
--- /dev/null
+++ b/libclc/generic/lib/math/copysign.cl
@@ -0,0 +1,12 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+_CLC_DEFINE_BINARY_BUILTIN(float, copysign, __builtin_copysignf, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_DEFINE_BINARY_BUILTIN(double, copysign, __builtin_copysign, double, double)
+
+#endif
diff --git a/libclc/generic/lib/math/cos.cl b/libclc/generic/lib/math/cos.cl
new file mode 100644
index 000000000000..bbd96b42bc12
--- /dev/null
+++ b/libclc/generic/lib/math/cos.cl
@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "math.h"
+#include "sincos_helpers.h"
+#include "../clcmacro.h"
+
+_CLC_OVERLOAD _CLC_DEF float cos(float x)
+{
+    int ix = as_int(x);
+    int ax = ix & 0x7fffffff;
+    float dx = as_float(ax);
+
+    float r0, r1;
+    int regn = argReductionS(&r0, &r1, dx);
+
+    float ss = -sinf_piby4(r0, r1);
+    float cc =  cosf_piby4(r0, r1);
+
+    float c =  (regn & 1) != 0 ? ss : cc;
+    c = as_float(as_int(c) ^ ((regn > 1) << 31));
+
+    c = ax >= PINFBITPATT_SP32 ? as_float(QNANBITPATT_SP32) : c;
+
+    return c;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, cos, float);
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+#define __CLC_FUNCTION __clc_cos_intrinsic
+#define __CLC_INTRINSIC "llvm.cos"
+#include <clc/math/unary_intrin.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
+
+_CLC_OVERLOAD _CLC_DEF double cos(double x) {
+    return __clc_cos_intrinsic(x);
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, cos, double);
+
+#endif
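The quadrant selection at the end of the float cos can be checked in isolation. A minimal host-side C sketch, assuming that argReductionS yields a quadrant index in 0..3 (its definition is in sincos_helpers.h, which this part of the patch does not show); `cos_from_quadrant` and `check_cos_quadrant` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Given s = sin(r) and c = cos(r) for the reduced argument r, and the quadrant
   index regn in 0..3, return cos(r + regn * pi/2). */
static float cos_from_quadrant(float s, float c, int regn) {
    float v = (regn & 1) ? -s : c;           /* odd quadrants use -sin(r) */
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits ^= (uint32_t)(regn > 1) << 31;      /* quadrants 2 and 3 flip the sign */
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Hypothetical usage check. */
static void check_cos_quadrant(void) {
    const float s = 0.6f, c = 0.8f;          /* any sin/cos pair will do */
    assert(cos_from_quadrant(s, c, 0) ==  c);
    assert(cos_from_quadrant(s, c, 1) == -s);
    assert(cos_from_quadrant(s, c, 2) == -c);
    assert(cos_from_quadrant(s, c, 3) ==  s);
}
```

The four cases match cos(r), cos(r + π/2) = -sin(r), cos(r + π) = -cos(r), and cos(r + 3π/2) = sin(r).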
diff --git a/libclc/generic/lib/math/exp.cl b/libclc/generic/lib/math/exp.cl
new file mode 100644
index 000000000000..dbf4a930b01d
--- /dev/null
+++ b/libclc/generic/lib/math/exp.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <exp.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/exp.inc b/libclc/generic/lib/math/exp.inc
new file mode 100644
index 000000000000..525fb59c9967
--- /dev/null
+++ b/libclc/generic/lib/math/exp.inc
@@ -0,0 +1,10 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE exp(__CLC_GENTYPE val) {
+  // exp(x) = exp2(x * log2(e))
+#if __CLC_FPSIZE == 32
+  return exp2(val * M_LOG2E_F);
+#elif __CLC_FPSIZE == 64
+  return exp2(val * M_LOG2E);
+#else
+#error unknown __CLC_FPSIZE
+#endif
+}
diff --git a/libclc/generic/lib/math/exp10.cl b/libclc/generic/lib/math/exp10.cl
new file mode 100644
index 000000000000..c8039cb8dedc
--- /dev/null
+++ b/libclc/generic/lib/math/exp10.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <exp10.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/exp10.inc b/libclc/generic/lib/math/exp10.inc
new file mode 100644
index 000000000000..a592c1948799
--- /dev/null
+++ b/libclc/generic/lib/math/exp10.inc
@@ -0,0 +1,10 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE exp10(__CLC_GENTYPE val) {
+  // exp10(x) = exp2(x * log2(10))
+#if __CLC_FPSIZE == 32
+  return exp2(val * log2(10.0f));
+#elif __CLC_FPSIZE == 64
+  return exp2(val * log2(10.0));
+#else
+#error unknown __CLC_FPSIZE
+#endif
+}
diff --git a/libclc/generic/lib/math/fmax.cl b/libclc/generic/lib/math/fmax.cl
new file mode 100644
index 000000000000..58583d6767aa
--- /dev/null
+++ b/libclc/generic/lib/math/fmax.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define FUNCTION __clc_fmax
+#define FUNCTION_IMPL(x, y) ((x) < (y) ? (y) : (x))
+
+#define __CLC_BODY <binary_impl.inc>
+#include <clc/math/gentype.inc>
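One property of the select-based FUNCTION_IMPL worth keeping in mind is how it treats NaN: every comparison with NaN is false, so the expression returns its first operand whenever either input is NaN. The host-side C sketch below shows this; `fmax_select` and `check_fmax_select` are hypothetical names. The OpenCL C specification says fmax returns the other argument when exactly one argument is NaN, so the select alone does not provide that when x is NaN. Whether any target actually relies on this fallback is not visible in this part of the patch.

```c
#include <assert.h>
#include <math.h>

/* Same expression as FUNCTION_IMPL in fmax.cl. */
static float fmax_select(float x, float y) {
    return x < y ? y : x;
}

/* Hypothetical usage check. */
static void check_fmax_select(void) {
    assert(fmax_select(1.0f, 2.0f) == 2.0f);
    assert(fmax_select(1.0f, NAN) == 1.0f);  /* comparison is false, x is returned */
    float r = fmax_select(NAN, 1.0f);        /* comparison is false, NaN x is returned */
    assert(r != r);                          /* only NaN compares unequal to itself */
}
```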
diff --git a/libclc/generic/lib/math/fmin.cl b/libclc/generic/lib/math/fmin.cl
new file mode 100644
index 000000000000..a61ad4757289
--- /dev/null
+++ b/libclc/generic/lib/math/fmin.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define FUNCTION __clc_fmin
+#define FUNCTION_IMPL(x, y) ((y) < (x) ? (y) : (x))
+
+#define __CLC_BODY <binary_impl.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/fmod.cl b/libclc/generic/lib/math/fmod.cl
new file mode 100644
index 000000000000..f9a4e3176137
--- /dev/null
+++ b/libclc/generic/lib/math/fmod.cl
@@ -0,0 +1,12 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+_CLC_DEFINE_BINARY_BUILTIN(float, fmod, __builtin_fmodf, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_DEFINE_BINARY_BUILTIN(double, fmod, __builtin_fmod, double, double)
+
+#endif
diff --git a/libclc/generic/lib/math/hypot.cl b/libclc/generic/lib/math/hypot.cl
new file mode 100644
index 000000000000..eca042c91535
--- /dev/null
+++ b/libclc/generic/lib/math/hypot.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <hypot.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/hypot.inc b/libclc/generic/lib/math/hypot.inc
new file mode 100644
index 000000000000..036cee7e1f06
--- /dev/null
+++ b/libclc/generic/lib/math/hypot.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE hypot(__CLC_GENTYPE x, __CLC_GENTYPE y) {
+  return sqrt(x*x + y*y);
+}
diff --git a/libclc/generic/lib/math/log1p.cl b/libclc/generic/lib/math/log1p.cl
new file mode 100644
index 000000000000..be25c64bf6a4
--- /dev/null
+++ b/libclc/generic/lib/math/log1p.cl
@@ -0,0 +1,177 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "math.h"
+#include "tables.h"
+#include "../clcmacro.h"
+
+_CLC_OVERLOAD _CLC_DEF float log1p(float x)
+{
+    float w = x;
+    uint ux = as_uint(x);
+    uint ax = ux & EXSIGNBIT_SP32;
+
+    // |x| < 2^-4
+    float u2 = MATH_DIVIDE(x, 2.0f + x);
+    float u = u2 + u2;
+    float v = u * u;
+    // 2/(5 * 2^5), 2/(3 * 2^3)
+    float zsmall = mad(-u2, x, mad(v, 0x1.99999ap-7f, 0x1.555556p-4f) * v * u) + x;
+
+    // |x| >= 2^-4
+    ux = as_uint(x + 1.0f);
+
+    int m = (int)((ux >> EXPSHIFTBITS_SP32) & 0xff) - EXPBIAS_SP32;
+    float mf = (float)m;
+    uint indx = (ux & 0x007f0000) + ((ux & 0x00008000) << 1);
+    float F = as_float(indx | 0x3f000000);
+
+    // x > 2^24
+    float fg24 = F - as_float(0x3f000000 | (ux & MANTBITS_SP32));
+
+    // x <= 2^24
+    uint xhi = ux & 0xffff8000;
+    float xh = as_float(xhi);
+    float xt = (1.0f - xh) + w;
+    uint xnm = ((~(xhi & 0x7f800000)) - 0x00800000) & 0x7f800000;
+    xt = xt * as_float(xnm) * 0.5f;
+    float fl24 = F - as_float(0x3f000000 | (xhi & MANTBITS_SP32)) - xt;
+
+    float f = mf > 24.0f ? fg24 : fl24;
+
+    indx = indx >> 16;
+    float r = f * USE_TABLE(log_inv_tbl, indx);
+
+    // 1/3, 1/2
+    float poly = mad(mad(r, 0x1.555556p-2f, 0x1.0p-1f), r*r, r);
+
+    const float LOG2_HEAD = 0x1.62e000p-1f;   // 0.693115234
+    const float LOG2_TAIL = 0x1.0bfbe8p-15f;  // 0.0000319461833
+
+    float2 tv = USE_TABLE(loge_tbl, indx);
+    float z1 = mad(mf, LOG2_HEAD, tv.s0);
+    float z2 = mad(mf, LOG2_TAIL, -poly) + tv.s1;
+    float z = z1 + z2;
+
+    z = ax < 0x3d800000U ? zsmall : z;
+
+
+
+    // Edge cases
+    z = ax >= PINFBITPATT_SP32 ? w : z;
+    z = w  < -1.0f ? as_float(QNANBITPATT_SP32) : z;
+    z = w == -1.0f ? as_float(NINFBITPATT_SP32) : z;
+    // Fix subnormals
+    z = ax < 0x33800000 ? x : z;
+
+    return z;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, log1p, float);
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_OVERLOAD _CLC_DEF double log1p(double x)
+{
+    // Computes natural log(1+x). Algorithm based on:
+    // Ping-Tak Peter Tang
+    // "Table-driven implementation of the logarithm function in IEEE
+    // floating-point arithmetic"
+    // ACM Transactions on Mathematical Software (TOMS)
+    // Volume 16, Issue 4 (December 1990)
+    // Note that we use a lookup table of size 64 rather than 128,
+    // and compensate by having extra terms in the minimax polynomial
+    // for the kernel approximation.
+
+    // Process Inside the threshold now
+    ulong ux = as_ulong(1.0 + x);
+    int xexp = ((as_int2(ux).hi >> 20) & 0x7ff) - EXPBIAS_DP64;
+    double f = as_double(ONEEXPBITS_DP64 | (ux & MANTBITS_DP64));
+
+    int j = as_int2(ux).hi >> 13;
+    j = ((0x80 | (j & 0x7e)) >> 1) + (j & 0x1);
+    double f1 = (double)j * 0x1.0p-6;
+    j -= 64;
+
+    double f2temp = f - f1;
+    double m2 = as_double(convert_ulong(0x3ff - xexp) << EXPSHIFTBITS_DP64);
+    double f2l = fma(m2, x, m2 - f1);
+    double f2g = fma(m2, x, -f1) + m2;
+    double f2 = xexp <= MANTLENGTH_DP64-1 ? f2l : f2g;
+    f2 = (xexp <= -2) | (xexp >= MANTLENGTH_DP64+8) ? f2temp : f2;
+
+    double2 tv = USE_TABLE(ln_tbl, j);
+    double z1 = tv.s0;
+    double q = tv.s1;
+
+    double u = MATH_DIVIDE(f2, fma(0.5, f2, f1));
+    double v = u * u;
+
+    double poly = v * fma(v,
+                          fma(v, 2.23219810758559851206e-03, 1.24999999978138668903e-02),
+                          8.33333333333333593622e-02);
+
+    // log2_lead and log2_tail sum to an extra-precise version of log(2)
+    const double log2_lead = 6.93147122859954833984e-01; /* 0x3fe62e42e0000000 */
+    const double log2_tail = 5.76999904754328540596e-08; /* 0x3e6efa39ef35793c */
+
+    double z2 = q + fma(u, poly, u);
+    double dxexp = (double)xexp;
+    double r1 = fma(dxexp, log2_lead, z1);
+    double r2 = fma(dxexp, log2_tail, z2);
+    double result1 = r1 + r2;
+
+    // Process Outside the threshold now
+    double r = x;
+    u = r / (2.0 + r);
+    double correction = r * u;
+    u = u + u;
+    v = u * u;
+    r1 = r;
+
+    poly = fma(v,
+               fma(v,
+                   fma(v, 4.34887777707614552256e-04, 2.23213998791944806202e-03),
+                   1.25000000037717509602e-02),
+               8.33333333333317923934e-02);
+
+    r2 = fma(u*v, poly, -correction);
+
+    // The values exp(-1/16)-1 and exp(1/16)-1
+    const double log1p_thresh1 = -0x1.f0540438fd5c3p-5;
+    const double log1p_thresh2 =  0x1.082b577d34ed8p-4;
+    double result2 = r1 + r2;
+    result2 = (x < log1p_thresh1) | (x > log1p_thresh2) ? result1 : result2;
+
+    result2 = isinf(x) ? x : result2;
+    result2 = x < -1.0 ? as_double(QNANBITPATT_DP64) : result2;
+    result2 = x == -1.0 ? as_double(NINFBITPATT_DP64) : result2;
+    return result2;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, log1p, double);
+
+#endif // cl_khr_fp64
diff --git a/libclc/generic/lib/math/mad.cl b/libclc/generic/lib/math/mad.cl
new file mode 100644
index 000000000000..6c7b90d150d5
--- /dev/null
+++ b/libclc/generic/lib/math/mad.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <mad.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/mad.inc b/libclc/generic/lib/math/mad.inc
new file mode 100644
index 000000000000..d32c7839d1b9
--- /dev/null
+++ b/libclc/generic/lib/math/mad.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mad(__CLC_GENTYPE a, __CLC_GENTYPE b, __CLC_GENTYPE c) {
+  return a * b + c;
+}
diff --git a/libclc/generic/lib/math/math.h b/libclc/generic/lib/math/math.h
new file mode 100644
index 000000000000..f46c7ea7a7d0
--- /dev/null
+++ b/libclc/generic/lib/math/math.h
@@ -0,0 +1,90 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define SNAN 0x001
+#define QNAN 0x002
+#define NINF 0x004
+#define NNOR 0x008
+#define NSUB 0x010
+#define NZER 0x020
+#define PZER 0x040
+#define PSUB 0x080
+#define PNOR 0x100
+#define PINF 0x200
+
+#define HAVE_HW_FMA32() (1)
+#define HAVE_BITALIGN() (0)
+#define HAVE_FAST_FMA32() (0)
+
+#define MATH_DIVIDE(X, Y) ((X) / (Y))
+#define MATH_RECIP(X) (1.0f / (X))
+#define MATH_SQRT(X) sqrt(X)
+
+#define SIGNBIT_SP32      0x80000000
+#define EXSIGNBIT_SP32    0x7fffffff
+#define EXPBITS_SP32      0x7f800000
+#define MANTBITS_SP32     0x007fffff
+#define ONEEXPBITS_SP32   0x3f800000
+#define TWOEXPBITS_SP32   0x40000000
+#define HALFEXPBITS_SP32  0x3f000000
+#define IMPBIT_SP32       0x00800000
+#define QNANBITPATT_SP32  0x7fc00000
+#define INDEFBITPATT_SP32 0xffc00000
+#define PINFBITPATT_SP32  0x7f800000
+#define NINFBITPATT_SP32  0xff800000
+#define EXPBIAS_SP32      127
+#define EXPSHIFTBITS_SP32 23
+#define BIASEDEMIN_SP32   1
+#define EMIN_SP32         -126
+#define BIASEDEMAX_SP32   254
+#define EMAX_SP32         127
+#define LAMBDA_SP32       1.0e30
+#define MANTLENGTH_SP32   24
+#define BASEDIGITS_SP32   7
+
+#ifdef cl_khr_fp64
+
+#define SIGNBIT_DP64      0x8000000000000000L
+#define EXSIGNBIT_DP64    0x7fffffffffffffffL
+#define EXPBITS_DP64      0x7ff0000000000000L
+#define MANTBITS_DP64     0x000fffffffffffffL
+#define ONEEXPBITS_DP64   0x3ff0000000000000L
+#define TWOEXPBITS_DP64   0x4000000000000000L
+#define HALFEXPBITS_DP64  0x3fe0000000000000L
+#define IMPBIT_DP64       0x0010000000000000L
+#define QNANBITPATT_DP64  0x7ff8000000000000L
+#define INDEFBITPATT_DP64 0xfff8000000000000L
+#define PINFBITPATT_DP64  0x7ff0000000000000L
+#define NINFBITPATT_DP64  0xfff0000000000000L
+#define EXPBIAS_DP64      1023
+#define EXPSHIFTBITS_DP64 52
+#define BIASEDEMIN_DP64   1
+#define EMIN_DP64         -1022
+#define BIASEDEMAX_DP64   2046 /* 0x7fe */
+#define EMAX_DP64         1023 /* 0x3ff */
+#define LAMBDA_DP64       1.0e300
+#define MANTLENGTH_DP64   53
+#define BASEDIGITS_DP64   15
+
+#endif // cl_khr_fp64
+
+#define ALIGNED(x)	__attribute__((aligned(x)))
diff --git a/libclc/generic/lib/math/mix.cl b/libclc/generic/lib/math/mix.cl
new file mode 100644
index 000000000000..294f332e67f2
--- /dev/null
+++ b/libclc/generic/lib/math/mix.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <mix.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/mix.inc b/libclc/generic/lib/math/mix.inc
new file mode 100644
index 000000000000..1e8b936149bb
--- /dev/null
+++ b/libclc/generic/lib/math/mix.inc
@@ -0,0 +1,9 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mix(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE a) {
+  return mad( y - x, a, x );
+}
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE mix(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_SCALAR_GENTYPE a) {
+    return mix(x, y, (__CLC_GENTYPE)a);
+}
+#endif
diff --git a/libclc/generic/lib/math/nextafter.cl b/libclc/generic/lib/math/nextafter.cl
new file mode 100644
index 000000000000..cbe54cd4e266
--- /dev/null
+++ b/libclc/generic/lib/math/nextafter.cl
@@ -0,0 +1,12 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+_CLC_DEFINE_BINARY_BUILTIN(float, nextafter, __builtin_nextafterf, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_DEFINE_BINARY_BUILTIN(double, nextafter, __builtin_nextafter, double, double)
+
+#endif
diff --git a/libclc/generic/lib/math/pown.cl b/libclc/generic/lib/math/pown.cl
new file mode 100644
index 000000000000..f3b27d4ccab7
--- /dev/null
+++ b/libclc/generic/lib/math/pown.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+#include "../clcmacro.h"
+
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, pown, float, int)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, pown, double, int)
+#endif
diff --git a/libclc/generic/lib/math/sin.cl b/libclc/generic/lib/math/sin.cl
new file mode 100644
index 000000000000..ffc4dd1aa037
--- /dev/null
+++ b/libclc/generic/lib/math/sin.cl
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "math.h"
+#include "sincos_helpers.h"
+#include "../clcmacro.h"
+
+_CLC_OVERLOAD _CLC_DEF float sin(float x)
+{
+    int ix = as_int(x);
+    int ax = ix & 0x7fffffff;
+    float dx = as_float(ax);
+
+    float r0, r1;
+    int regn = argReductionS(&r0, &r1, dx);
+
+    float ss = sinf_piby4(r0, r1);
+    float cc = cosf_piby4(r0, r1);
+
+    float s = (regn & 1) != 0 ? cc : ss;
+    s = as_float(as_int(s) ^ ((regn > 1) << 31) ^ (ix ^ ax));
+
+    s = ax >= PINFBITPATT_SP32 ? as_float(QNANBITPATT_SP32) : s;
+
+    // Zero (and subnormals, when flushed to zero): return x to keep its sign
+    s = x == 0.0f ? x : s;
+
+    return s;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, sin, float);
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+#define __CLC_FUNCTION __clc_sin_intrinsic
+#define __CLC_INTRINSIC "llvm.sin"
+#include <clc/math/unary_intrin.inc>
+#undef __CLC_FUNCTION
+#undef __CLC_INTRINSIC
+
+_CLC_OVERLOAD _CLC_DEF double sin(double x) {
+    return __clc_sin_intrinsic(x);
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, sin, double);
+
+#endif
diff --git a/libclc/generic/lib/math/sincos.cl b/libclc/generic/lib/math/sincos.cl
new file mode 100644
index 000000000000..eace5adcf16f
--- /dev/null
+++ b/libclc/generic/lib/math/sincos.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <sincos.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/sincos.inc b/libclc/generic/lib/math/sincos.inc
new file mode 100644
index 000000000000..e97f0f9641c1
--- /dev/null
+++ b/libclc/generic/lib/math/sincos.inc
@@ -0,0 +1,11 @@
+#define __CLC_DECLARE_SINCOS(ADDRSPACE, TYPE) \
+  _CLC_OVERLOAD _CLC_DEF TYPE sincos (TYPE x, ADDRSPACE TYPE * cosval) { \
+    *cosval = cos(x); \
+    return sin(x); \
+  }
+
+__CLC_DECLARE_SINCOS(global, __CLC_GENTYPE)
+__CLC_DECLARE_SINCOS(local, __CLC_GENTYPE)
+__CLC_DECLARE_SINCOS(private, __CLC_GENTYPE)
+
+#undef __CLC_DECLARE_SINCOS
diff --git a/libclc/generic/lib/math/sincos_helpers.cl b/libclc/generic/lib/math/sincos_helpers.cl
new file mode 100644
index 000000000000..1a5f10c8e651
--- /dev/null
+++ b/libclc/generic/lib/math/sincos_helpers.cl
@@ -0,0 +1,308 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "math.h"
+#include "sincos_helpers.h"
+
+uint bitalign(uint hi, uint lo, uint shift)
+{
+        return (hi << (32 - shift)) | (lo >> shift);
+}
+
+float sinf_piby4(float x, float y)
+{
+    // Taylor series for sin(x) is x - x^3/3! + x^5/5! - x^7/7! ...
+    // = x * (1 - x^2/3! + x^4/5! - x^6/7! ...)
+    // = x * f(w)
+    // where w = x*x and f(w) = (1 - w/3! + w^2/5! - w^3/7! ...)
+    // We use a minimax approximation of (f(w) - 1) / w
+    // because this produces an expansion in even powers of x.
+
+    const float c1 = -0.1666666666e0f;
+    const float c2 = 0.8333331876e-2f;
+    const float c3 = -0.198400874e-3f;
+    const float c4 = 0.272500015e-5f;
+    const float c5 = -2.5050759689e-08f; // 0xb2d72f34
+    const float c6 = 1.5896910177e-10f;	 // 0x2f2ec9d3
+
+    float z = x * x;
+    float v = z * x;
+    float r = mad(z, mad(z, mad(z, mad(z, c6, c5), c4), c3), c2);
+    float ret = x - mad(v, -c1, mad(z, mad(y, 0.5f, -v*r), -y));
+
+    return ret;
+}
+
+float cosf_piby4(float x, float y)
+{
+    // Taylor series for cos(x) is 1 - x^2/2! + x^4/4! - x^6/6! ...
+    // = f(w)
+    // where w = x*x and f(w) = (1 - w/2! + w^2/4! - w^3/6! ...)
+    // We use a minimax approximation of (f(w) - 1 + w/2) / (w*w)
+    // because this produces an expansion in even powers of x.
+
+    const float c1 = 0.416666666e-1f;
+    const float c2 = -0.138888876e-2f;
+    const float c3 = 0.248006008e-4f;
+    const float c4 = -0.2730101334e-6f;
+    const float c5 = 2.0875723372e-09f;	 // 0x310f74f6
+    const float c6 = -1.1359647598e-11f; // 0xad47d74e
+
+    float z = x * x;
+    float r = z * mad(z, mad(z, mad(z, mad(z, mad(z, c6,  c5), c4), c3), c2), c1);
+
+    // if |x| < 0.3
+    float qx = 0.0f;
+
+    int ix = as_int(x) & EXSIGNBIT_SP32;
+
+    //  0.78125 >= |x| >= 0.3
+    float xby4 = as_float(ix - 0x01000000);
+    qx = (ix >= 0x3e99999a) & (ix <= 0x3f480000) ? xby4 : qx;
+
+    // |x| > 0.78125
+    qx = ix > 0x3f480000 ? 0.28125f : qx;
+
+    float hz = mad(z, 0.5f, -qx);
+    float a = 1.0f - qx;
+    float ret = a - (hz - mad(z, r, -x*y));
+    return ret;
+}
+
+void fullMulS(float *hi, float *lo, float a, float b, float bh, float bt)
+{
+    if (HAVE_HW_FMA32()) {
+        float ph = a * b;
+        *hi = ph;
+        *lo = fma(a, b, -ph);
+    } else {
+        float ah = as_float(as_uint(a) & 0xfffff000U);
+        float at = a - ah;
+        float ph = a * b;
+        float pt = mad(at, bt, mad(at, bh, mad(ah, bt, mad(ah, bh, -ph))));
+        *hi = ph;
+        *lo = pt;
+    }
+}
+
+float removePi2S(float *hi, float *lo, float x)
+{
+    // 72 bits of pi/2
+    const float fpiby2_1 = (float) 0xC90FDA / 0x1.0p+23f;
+    const float fpiby2_1_h = (float) 0xC90 / 0x1.0p+11f;
+    const float fpiby2_1_t = (float) 0xFDA / 0x1.0p+23f;
+
+    const float fpiby2_2 = (float) 0xA22168 / 0x1.0p+47f;
+    const float fpiby2_2_h = (float) 0xA22 / 0x1.0p+35f;
+    const float fpiby2_2_t = (float) 0x168 / 0x1.0p+47f;
+
+    const float fpiby2_3 = (float) 0xC234C4 / 0x1.0p+71f;
+    const float fpiby2_3_h = (float) 0xC23 / 0x1.0p+59f;
+    const float fpiby2_3_t = (float) 0x4C4 / 0x1.0p+71f;
+
+    const float twobypi = 0x1.45f306p-1f;
+
+    float fnpi2 = trunc(mad(x, twobypi, 0.5f));
+
+    // subtract n * pi/2 from x
+    float rhead, rtail;
+    fullMulS(&rhead, &rtail, fnpi2, fpiby2_1, fpiby2_1_h, fpiby2_1_t);
+    float v = x - rhead;
+    float rem = v + (((x - v) - rhead) - rtail);
+
+    float rhead2, rtail2;
+    fullMulS(&rhead2, &rtail2, fnpi2, fpiby2_2, fpiby2_2_h, fpiby2_2_t);
+    v = rem - rhead2;
+    rem = v + (((rem - v) - rhead2) - rtail2);
+
+    float rhead3, rtail3;
+    fullMulS(&rhead3, &rtail3, fnpi2, fpiby2_3, fpiby2_3_h, fpiby2_3_t);
+    v = rem - rhead3;
+
+    *hi = v + ((rem - v) - rhead3);
+    *lo = -rtail3;
+    return fnpi2;
+}
+
+int argReductionSmallS(float *r, float *rr, float x)
+{
+    float fnpi2 = removePi2S(r, rr, x);
+    return (int)fnpi2 & 0x3;
+}
+
+#define FULL_MUL(A, B, HI, LO) \
+    LO = A * B; \
+    HI = mul_hi(A, B)
+
+#define FULL_MAD(A, B, C, HI, LO) \
+    LO = ((A) * (B) + (C)); \
+    HI = mul_hi(A, B); \
+    HI += LO < C
+
+int argReductionLargeS(float *r, float *rr, float x)
+{
+    int xe = (int)(as_uint(x) >> 23) - 127;
+    uint xm = 0x00800000U | (as_uint(x) & 0x7fffffU);
+
+    // 224 bits of 2/PI: . A2F9836E 4E441529 FC2757D1 F534DDC0 DB629599 3C439041 FE5163AB
+    const uint b6 = 0xA2F9836EU;
+    const uint b5 = 0x4E441529U;
+    const uint b4 = 0xFC2757D1U;
+    const uint b3 = 0xF534DDC0U;
+    const uint b2 = 0xDB629599U;
+    const uint b1 = 0x3C439041U;
+    const uint b0 = 0xFE5163ABU;
+
+    uint p0, p1, p2, p3, p4, p5, p6, p7, c0, c1;
+
+    FULL_MUL(xm, b0, c0, p0);
+    FULL_MAD(xm, b1, c0, c1, p1);
+    FULL_MAD(xm, b2, c1, c0, p2);
+    FULL_MAD(xm, b3, c0, c1, p3);
+    FULL_MAD(xm, b4, c1, c0, p4);
+    FULL_MAD(xm, b5, c0, c1, p5);
+    FULL_MAD(xm, b6, c1, p7, p6);
+
+    uint fbits = 224 + 23 - xe;
+
+    // shift amount to get 2 lsb of integer part at top 2 bits
+    //   min: 25 (xe=18) max: 134 (xe=127)
+    uint shift = 256U - 2 - fbits;
+
+    // Shift by up to 134/32 = 4 words
+    int c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    p3 = c ? p2 : p3;
+    p2 = c ? p1 : p2;
+    p1 = c ? p0 : p1;
+    shift -= (-c) & 32;
+
+    c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    p3 = c ? p2 : p3;
+    p2 = c ? p1 : p2;
+    shift -= (-c) & 32;
+
+    c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    p3 = c ? p2 : p3;
+    shift -= (-c) & 32;
+
+    c = shift > 31;
+    p7 = c ? p6 : p7;
+    p6 = c ? p5 : p6;
+    p5 = c ? p4 : p5;
+    p4 = c ? p3 : p4;
+    shift -= (-c) & 32;
+
+    // bitalign cannot handle a shift of 32
+    c = shift > 0;
+    shift = 32 - shift;
+    uint t7 = bitalign(p7, p6, shift);
+    uint t6 = bitalign(p6, p5, shift);
+    uint t5 = bitalign(p5, p4, shift);
+    p7 = c ? t7 : p7;
+    p6 = c ? t6 : p6;
+    p5 = c ? t5 : p5;
+
+    // Get 2 lsb of int part and msb of fraction
+    int i = p7 >> 29;
+
+    // Scoot up 2 more bits so only fraction remains
+    p7 = bitalign(p7, p6, 30);
+    p6 = bitalign(p6, p5, 30);
+    p5 = bitalign(p5, p4, 30);
+
+    // Subtract 1 if msb of fraction is 1, i.e. fraction >= 0.5
+    uint flip = i & 1 ? 0xffffffffU : 0U;
+    uint sign = i & 1 ? 0x80000000U : 0U;
+    p7 = p7 ^ flip;
+    p6 = p6 ^ flip;
+    p5 = p5 ^ flip;
+
+    // Find exponent and shift away leading zeroes and hidden bit
+    xe = clz(p7) + 1;
+    shift = 32 - xe;
+    p7 = bitalign(p7, p6, shift);
+    p6 = bitalign(p6, p5, shift);
+
+    // Most significant part of fraction
+    float q1 = as_float(sign | ((127 - xe) << 23) | (p7 >> 9));
+
+    // Shift out bits we captured on q1
+    p7 = bitalign(p7, p6, 32-23);
+
+    // Get 24 more bits of fraction in another float; there are no long strings of zeroes here
+    int xxe = clz(p7) + 1;
+    p7 = bitalign(p7, p6, 32-xxe);
+    float q0 = as_float(sign | ((127 - (xe + 23 + xxe)) << 23) | (p7 >> 9));
+
+    // At this point, the fraction q1 + q0 is correct to at least 48 bits
+    // Now we need to multiply the fraction by pi/2
+    // This loses us about 4 bits
+    // pi/2 = C90 FDA A22 168 C23 4C4
+
+    const float pio2h = (float)0xc90fda / 0x1.0p+23f;
+    const float pio2hh = (float)0xc90 / 0x1.0p+11f;
+    const float pio2ht = (float)0xfda / 0x1.0p+23f;
+    const float pio2t = (float)0xa22168 / 0x1.0p+47f;
+
+    float rh, rt;
+
+    if (HAVE_HW_FMA32()) {
+        rh = q1 * pio2h;
+        rt = fma(q0, pio2h, fma(q1, pio2t, fma(q1, pio2h, -rh)));
+    } else {
+        float q1h = as_float(as_uint(q1) & 0xfffff000);
+        float q1t = q1 - q1h;
+        rh = q1 * pio2h;
+        rt = mad(q1t, pio2ht, mad(q1t, pio2hh, mad(q1h, pio2ht, mad(q1h, pio2hh, -rh))));
+        rt = mad(q0, pio2h, mad(q1, pio2t, rt));
+    }
+
+    float t = rh + rt;
+    rt = rt - (t - rh);
+
+    *r = t;
+    *rr = rt;
+    return ((i >> 1) + (i & 1)) & 0x3;
+}
+
+int argReductionS(float *r, float *rr, float x)
+{
+    if (x < 0x1.0p+23f)
+        return argReductionSmallS(r, rr, x);
+    else
+        return argReductionLargeS(r, rr, x);
+}
+
diff --git a/libclc/generic/lib/math/sincos_helpers.h b/libclc/generic/lib/math/sincos_helpers.h
new file mode 100644
index 000000000000..f89c19f6874c
--- /dev/null
+++ b/libclc/generic/lib/math/sincos_helpers.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+float sinf_piby4(float x, float y);
+float cosf_piby4(float x, float y);
+int argReductionS(float *r, float *rr, float x);
diff --git a/libclc/generic/lib/math/tables.cl b/libclc/generic/lib/math/tables.cl
new file mode 100644
index 000000000000..b5345a2cff1b
--- /dev/null
+++ b/libclc/generic/lib/math/tables.cl
@@ -0,0 +1,366 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "tables.h"
+
+DECLARE_TABLE(float2, LOGE_TBL, 129) = {
+    (float2)(0x0.000000p+0f, 0x0.000000p+0f),
+    (float2)(0x1.fe0000p-8f, 0x1.535882p-23f),
+    (float2)(0x1.fc0000p-7f, 0x1.5161f8p-20f),
+    (float2)(0x1.7b8000p-6f, 0x1.1b07d4p-18f),
+    (float2)(0x1.f82000p-6f, 0x1.361cf0p-19f),
+    (float2)(0x1.39e000p-5f, 0x1.0f73fcp-18f),
+    (float2)(0x1.774000p-5f, 0x1.63d8cap-19f),
+    (float2)(0x1.b42000p-5f, 0x1.bae232p-18f),
+    (float2)(0x1.f0a000p-5f, 0x1.86008ap-20f),
+    (float2)(0x1.164000p-4f, 0x1.36eea2p-16f),
+    (float2)(0x1.340000p-4f, 0x1.d7961ap-16f),
+    (float2)(0x1.51a000p-4f, 0x1.073f06p-16f),
+    (float2)(0x1.6f0000p-4f, 0x1.a515cap-17f),
+    (float2)(0x1.8c2000p-4f, 0x1.45d630p-16f),
+    (float2)(0x1.a92000p-4f, 0x1.b4e92ap-18f),
+    (float2)(0x1.c5e000p-4f, 0x1.523d6ep-18f),
+    (float2)(0x1.e26000p-4f, 0x1.076e2ap-16f),
+    (float2)(0x1.fec000p-4f, 0x1.2263b6p-17f),
+    (float2)(0x1.0d6000p-3f, 0x1.7e7cd0p-15f),
+    (float2)(0x1.1b6000p-3f, 0x1.2ad52ep-15f),
+    (float2)(0x1.294000p-3f, 0x1.52f81ep-15f),
+    (float2)(0x1.370000p-3f, 0x1.fc201ep-15f),
+    (float2)(0x1.44c000p-3f, 0x1.2b6ccap-15f),
+    (float2)(0x1.526000p-3f, 0x1.cbc742p-16f),
+    (float2)(0x1.5fe000p-3f, 0x1.3070a6p-15f),
+    (float2)(0x1.6d6000p-3f, 0x1.fce33ap-20f),
+    (float2)(0x1.7aa000p-3f, 0x1.890210p-15f),
+    (float2)(0x1.87e000p-3f, 0x1.a06520p-15f),
+    (float2)(0x1.952000p-3f, 0x1.6a73d0p-17f),
+    (float2)(0x1.a22000p-3f, 0x1.bc1fe2p-15f),
+    (float2)(0x1.af2000p-3f, 0x1.c94e80p-15f),
+    (float2)(0x1.bc2000p-3f, 0x1.0ce85ap-16f),
+    (float2)(0x1.c8e000p-3f, 0x1.f7c79ap-15f),
+    (float2)(0x1.d5c000p-3f, 0x1.0b5a7cp-18f),
+    (float2)(0x1.e26000p-3f, 0x1.076e2ap-15f),
+    (float2)(0x1.ef0000p-3f, 0x1.5b97b8p-16f),
+    (float2)(0x1.fb8000p-3f, 0x1.186d5ep-15f),
+    (float2)(0x1.040000p-2f, 0x1.2ca5a6p-17f),
+    (float2)(0x1.0a2000p-2f, 0x1.24e272p-14f),
+    (float2)(0x1.104000p-2f, 0x1.8bf9aep-14f),
+    (float2)(0x1.166000p-2f, 0x1.5cabaap-14f),
+    (float2)(0x1.1c8000p-2f, 0x1.3182d2p-15f),
+    (float2)(0x1.228000p-2f, 0x1.41fbcep-14f),
+    (float2)(0x1.288000p-2f, 0x1.5a13dep-14f),
+    (float2)(0x1.2e8000p-2f, 0x1.c575c2p-15f),
+    (float2)(0x1.346000p-2f, 0x1.dd9a98p-14f),
+    (float2)(0x1.3a6000p-2f, 0x1.3155a4p-16f),
+    (float2)(0x1.404000p-2f, 0x1.843434p-17f),
+    (float2)(0x1.460000p-2f, 0x1.8bc21cp-14f),
+    (float2)(0x1.4be000p-2f, 0x1.7e55dcp-16f),
+    (float2)(0x1.51a000p-2f, 0x1.5b0e5ap-15f),
+    (float2)(0x1.576000p-2f, 0x1.dc5d14p-16f),
+    (float2)(0x1.5d0000p-2f, 0x1.bdbf58p-14f),
+    (float2)(0x1.62c000p-2f, 0x1.05e572p-15f),
+    (float2)(0x1.686000p-2f, 0x1.903d36p-15f),
+    (float2)(0x1.6e0000p-2f, 0x1.1d5456p-15f),
+    (float2)(0x1.738000p-2f, 0x1.d7f6bap-14f),
+    (float2)(0x1.792000p-2f, 0x1.4abfbap-15f),
+    (float2)(0x1.7ea000p-2f, 0x1.f07704p-15f),
+    (float2)(0x1.842000p-2f, 0x1.a3b43cp-15f),
+    (float2)(0x1.89a000p-2f, 0x1.9c360ap-17f),
+    (float2)(0x1.8f0000p-2f, 0x1.1e8736p-14f),
+    (float2)(0x1.946000p-2f, 0x1.941c20p-14f),
+    (float2)(0x1.99c000p-2f, 0x1.958116p-14f),
+    (float2)(0x1.9f2000p-2f, 0x1.23ecbep-14f),
+    (float2)(0x1.a48000p-2f, 0x1.024396p-16f),
+    (float2)(0x1.a9c000p-2f, 0x1.d93534p-15f),
+    (float2)(0x1.af0000p-2f, 0x1.293246p-14f),
+    (float2)(0x1.b44000p-2f, 0x1.eef798p-15f),
+    (float2)(0x1.b98000p-2f, 0x1.625a4cp-16f),
+    (float2)(0x1.bea000p-2f, 0x1.4d9da6p-14f),
+    (float2)(0x1.c3c000p-2f, 0x1.d7a7ccp-14f),
+    (float2)(0x1.c8e000p-2f, 0x1.f7c79ap-14f),
+    (float2)(0x1.ce0000p-2f, 0x1.af0b84p-14f),
+    (float2)(0x1.d32000p-2f, 0x1.fcfc00p-15f),
+    (float2)(0x1.d82000p-2f, 0x1.e7258ap-14f),
+    (float2)(0x1.dd4000p-2f, 0x1.a81306p-16f),
+    (float2)(0x1.e24000p-2f, 0x1.1034f8p-15f),
+    (float2)(0x1.e74000p-2f, 0x1.09875ap-16f),
+    (float2)(0x1.ec2000p-2f, 0x1.99d246p-14f),
+    (float2)(0x1.f12000p-2f, 0x1.1ebf5ep-15f),
+    (float2)(0x1.f60000p-2f, 0x1.23fa70p-14f),
+    (float2)(0x1.fae000p-2f, 0x1.588f78p-14f),
+    (float2)(0x1.ffc000p-2f, 0x1.2e0856p-14f),
+    (float2)(0x1.024000p-1f, 0x1.52a5a4p-13f),
+    (float2)(0x1.04a000p-1f, 0x1.df9da8p-13f),
+    (float2)(0x1.072000p-1f, 0x1.f2e0e6p-16f),
+    (float2)(0x1.098000p-1f, 0x1.bd3d5cp-15f),
+    (float2)(0x1.0be000p-1f, 0x1.cb9094p-15f),
+    (float2)(0x1.0e4000p-1f, 0x1.261746p-15f),
+    (float2)(0x1.108000p-1f, 0x1.f39e2cp-13f),
+    (float2)(0x1.12e000p-1f, 0x1.719592p-13f),
+    (float2)(0x1.154000p-1f, 0x1.87a5e8p-14f),
+    (float2)(0x1.178000p-1f, 0x1.eabbd8p-13f),
+    (float2)(0x1.19e000p-1f, 0x1.cd68cep-14f),
+    (float2)(0x1.1c2000p-1f, 0x1.b81f70p-13f),
+    (float2)(0x1.1e8000p-1f, 0x1.7d79c0p-15f),
+    (float2)(0x1.20c000p-1f, 0x1.b9a324p-14f),
+    (float2)(0x1.230000p-1f, 0x1.30d7bep-13f),
+    (float2)(0x1.254000p-1f, 0x1.5bce98p-13f),
+    (float2)(0x1.278000p-1f, 0x1.5e1288p-13f),
+    (float2)(0x1.29c000p-1f, 0x1.37fec2p-13f),
+    (float2)(0x1.2c0000p-1f, 0x1.d3da88p-14f),
+    (float2)(0x1.2e4000p-1f, 0x1.d0db90p-15f),
+    (float2)(0x1.306000p-1f, 0x1.d7334ep-13f),
+    (float2)(0x1.32a000p-1f, 0x1.133912p-13f),
+    (float2)(0x1.34e000p-1f, 0x1.44ece6p-16f),
+    (float2)(0x1.370000p-1f, 0x1.17b546p-13f),
+    (float2)(0x1.392000p-1f, 0x1.e0d356p-13f),
+    (float2)(0x1.3b6000p-1f, 0x1.0893fep-14f),
+    (float2)(0x1.3d8000p-1f, 0x1.026a70p-13f),
+    (float2)(0x1.3fa000p-1f, 0x1.5b84d0p-13f),
+    (float2)(0x1.41c000p-1f, 0x1.8fe846p-13f),
+    (float2)(0x1.43e000p-1f, 0x1.9fe2f8p-13f),
+    (float2)(0x1.460000p-1f, 0x1.8bc21cp-13f),
+    (float2)(0x1.482000p-1f, 0x1.53d1eap-13f),
+    (float2)(0x1.4a4000p-1f, 0x1.f0bb60p-14f),
+    (float2)(0x1.4c6000p-1f, 0x1.e6bf32p-15f),
+    (float2)(0x1.4e6000p-1f, 0x1.d811b6p-13f),
+    (float2)(0x1.508000p-1f, 0x1.13cc00p-13f),
+    (float2)(0x1.52a000p-1f, 0x1.6932dep-16f),
+    (float2)(0x1.54a000p-1f, 0x1.246798p-13f),
+    (float2)(0x1.56a000p-1f, 0x1.f9d5b2p-13f),
+    (float2)(0x1.58c000p-1f, 0x1.5b6b9ap-14f),
+    (float2)(0x1.5ac000p-1f, 0x1.404c34p-13f),
+    (float2)(0x1.5cc000p-1f, 0x1.b1dc6cp-13f),
+    (float2)(0x1.5ee000p-1f, 0x1.54920ap-20f),
+    (float2)(0x1.60e000p-1f, 0x1.97a23cp-16f),
+    (float2)(0x1.62e000p-1f, 0x1.0bfbe8p-15f),
+};
+
+DECLARE_TABLE(float, LOG_INV_TBL, 129) = {
+    0x1.000000p+1f,
+    0x1.fc07f0p+0f,
+    0x1.f81f82p+0f,
+    0x1.f4465ap+0f,
+    0x1.f07c20p+0f,
+    0x1.ecc07cp+0f,
+    0x1.e9131ap+0f,
+    0x1.e573acp+0f,
+    0x1.e1e1e2p+0f,
+    0x1.de5d6ep+0f,
+    0x1.dae608p+0f,
+    0x1.d77b66p+0f,
+    0x1.d41d42p+0f,
+    0x1.d0cb58p+0f,
+    0x1.cd8568p+0f,
+    0x1.ca4b30p+0f,
+    0x1.c71c72p+0f,
+    0x1.c3f8f0p+0f,
+    0x1.c0e070p+0f,
+    0x1.bdd2b8p+0f,
+    0x1.bacf92p+0f,
+    0x1.b7d6c4p+0f,
+    0x1.b4e81cp+0f,
+    0x1.b20364p+0f,
+    0x1.af286cp+0f,
+    0x1.ac5702p+0f,
+    0x1.a98ef6p+0f,
+    0x1.a6d01ap+0f,
+    0x1.a41a42p+0f,
+    0x1.a16d40p+0f,
+    0x1.9ec8eap+0f,
+    0x1.9c2d14p+0f,
+    0x1.99999ap+0f,
+    0x1.970e50p+0f,
+    0x1.948b10p+0f,
+    0x1.920fb4p+0f,
+    0x1.8f9c18p+0f,
+    0x1.8d3018p+0f,
+    0x1.8acb90p+0f,
+    0x1.886e60p+0f,
+    0x1.861862p+0f,
+    0x1.83c978p+0f,
+    0x1.818182p+0f,
+    0x1.7f4060p+0f,
+    0x1.7d05f4p+0f,
+    0x1.7ad220p+0f,
+    0x1.78a4c8p+0f,
+    0x1.767dcep+0f,
+    0x1.745d18p+0f,
+    0x1.724288p+0f,
+    0x1.702e06p+0f,
+    0x1.6e1f76p+0f,
+    0x1.6c16c2p+0f,
+    0x1.6a13cep+0f,
+    0x1.681682p+0f,
+    0x1.661ec6p+0f,
+    0x1.642c86p+0f,
+    0x1.623fa8p+0f,
+    0x1.605816p+0f,
+    0x1.5e75bcp+0f,
+    0x1.5c9882p+0f,
+    0x1.5ac056p+0f,
+    0x1.58ed24p+0f,
+    0x1.571ed4p+0f,
+    0x1.555556p+0f,
+    0x1.539094p+0f,
+    0x1.51d07ep+0f,
+    0x1.501502p+0f,
+    0x1.4e5e0ap+0f,
+    0x1.4cab88p+0f,
+    0x1.4afd6ap+0f,
+    0x1.49539ep+0f,
+    0x1.47ae14p+0f,
+    0x1.460cbcp+0f,
+    0x1.446f86p+0f,
+    0x1.42d662p+0f,
+    0x1.414142p+0f,
+    0x1.3fb014p+0f,
+    0x1.3e22ccp+0f,
+    0x1.3c995ap+0f,
+    0x1.3b13b2p+0f,
+    0x1.3991c2p+0f,
+    0x1.381382p+0f,
+    0x1.3698e0p+0f,
+    0x1.3521d0p+0f,
+    0x1.33ae46p+0f,
+    0x1.323e34p+0f,
+    0x1.30d190p+0f,
+    0x1.2f684cp+0f,
+    0x1.2e025cp+0f,
+    0x1.2c9fb4p+0f,
+    0x1.2b404ap+0f,
+    0x1.29e412p+0f,
+    0x1.288b02p+0f,
+    0x1.27350cp+0f,
+    0x1.25e228p+0f,
+    0x1.24924ap+0f,
+    0x1.234568p+0f,
+    0x1.21fb78p+0f,
+    0x1.20b470p+0f,
+    0x1.1f7048p+0f,
+    0x1.1e2ef4p+0f,
+    0x1.1cf06ap+0f,
+    0x1.1bb4a4p+0f,
+    0x1.1a7b96p+0f,
+    0x1.194538p+0f,
+    0x1.181182p+0f,
+    0x1.16e068p+0f,
+    0x1.15b1e6p+0f,
+    0x1.1485f0p+0f,
+    0x1.135c82p+0f,
+    0x1.12358ep+0f,
+    0x1.111112p+0f,
+    0x1.0fef02p+0f,
+    0x1.0ecf56p+0f,
+    0x1.0db20ap+0f,
+    0x1.0c9714p+0f,
+    0x1.0b7e6ep+0f,
+    0x1.0a6810p+0f,
+    0x1.0953f4p+0f,
+    0x1.084210p+0f,
+    0x1.073260p+0f,
+    0x1.0624dep+0f,
+    0x1.051980p+0f,
+    0x1.041042p+0f,
+    0x1.03091cp+0f,
+    0x1.020408p+0f,
+    0x1.010102p+0f,
+    0x1.000000p+0f,
+};
+
+TABLE_FUNCTION(float2, LOGE_TBL, loge_tbl);
+TABLE_FUNCTION(float, LOG_INV_TBL, log_inv_tbl);
+
+#ifdef cl_khr_fp64
+
+DECLARE_TABLE(double2, LN_TBL, 65) = {
+    (double2)(0x0.0000000000000p+0, 0x0.0000000000000p+0),
+    (double2)(0x1.fc0a800000000p-7, 0x1.61f807c79f3dbp-28),
+    (double2)(0x1.f829800000000p-6, 0x1.873c1980267c8p-25),
+    (double2)(0x1.7745800000000p-5, 0x1.ec65b9f88c69ep-26),
+    (double2)(0x1.f0a3000000000p-5, 0x1.8022c54cc2f99p-26),
+    (double2)(0x1.341d700000000p-4, 0x1.2c37a3a125330p-25),
+    (double2)(0x1.6f0d200000000p-4, 0x1.15cad69737c93p-25),
+    (double2)(0x1.a926d00000000p-4, 0x1.d256ab1b285e9p-27),
+    (double2)(0x1.e270700000000p-4, 0x1.b8abcb97a7aa2p-26),
+    (double2)(0x1.0d77e00000000p-3, 0x1.f34239659a5dcp-25),
+    (double2)(0x1.2955280000000p-3, 0x1.e07fd48d30177p-25),
+    (double2)(0x1.44d2b00000000p-3, 0x1.b32df4799f4f6p-25),
+    (double2)(0x1.5ff3000000000p-3, 0x1.c29e4f4f21cf8p-25),
+    (double2)(0x1.7ab8900000000p-3, 0x1.086c848df1b59p-30),
+    (double2)(0x1.9525a80000000p-3, 0x1.cf456b4764130p-27),
+    (double2)(0x1.af3c900000000p-3, 0x1.3a02ffcb63398p-25),
+    (double2)(0x1.c8ff780000000p-3, 0x1.1e6a6886b0976p-25),
+    (double2)(0x1.e270700000000p-3, 0x1.b8abcb97a7aa2p-25),
+    (double2)(0x1.fb91800000000p-3, 0x1.b578f8aa35552p-25),
+    (double2)(0x1.0a324c0000000p-2, 0x1.139c871afb9fcp-25),
+    (double2)(0x1.1675c80000000p-2, 0x1.5d5d30701ce64p-25),
+    (double2)(0x1.22941c0000000p-2, 0x1.de7bcb2d12142p-25),
+    (double2)(0x1.2e8e280000000p-2, 0x1.d708e984e1664p-25),
+    (double2)(0x1.3a64c40000000p-2, 0x1.56945e9c72f36p-26),
+    (double2)(0x1.4618bc0000000p-2, 0x1.0e2f613e85bdap-29),
+    (double2)(0x1.51aad80000000p-2, 0x1.cb7e0b42724f6p-28),
+    (double2)(0x1.5d1bd80000000p-2, 0x1.fac04e52846c7p-25),
+    (double2)(0x1.686c800000000p-2, 0x1.e9b14aec442bep-26),
+    (double2)(0x1.739d7c0000000p-2, 0x1.b5de8034e7126p-25),
+    (double2)(0x1.7eaf800000000p-2, 0x1.dc157e1b259d3p-25),
+    (double2)(0x1.89a3380000000p-2, 0x1.b05096ad69c62p-28),
+    (double2)(0x1.9479400000000p-2, 0x1.c2116faba4cddp-26),
+    (double2)(0x1.9f323c0000000p-2, 0x1.65fcc25f95b47p-25),
+    (double2)(0x1.a9cec80000000p-2, 0x1.a9a08498d4850p-26),
+    (double2)(0x1.b44f740000000p-2, 0x1.de647b1465f77p-25),
+    (double2)(0x1.beb4d80000000p-2, 0x1.da71b7bf7861dp-26),
+    (double2)(0x1.c8ff7c0000000p-2, 0x1.e6a6886b09760p-28),
+    (double2)(0x1.d32fe40000000p-2, 0x1.f0075eab0ef64p-25),
+    (double2)(0x1.dd46a00000000p-2, 0x1.3071282fb989bp-28),
+    (double2)(0x1.e744240000000p-2, 0x1.0eb43c3f1bed2p-25),
+    (double2)(0x1.f128f40000000p-2, 0x1.faf06ecb35c84p-26),
+    (double2)(0x1.faf5880000000p-2, 0x1.ef1e63db35f68p-27),
+    (double2)(0x1.02552a0000000p-1, 0x1.69743fb1a71a5p-27),
+    (double2)(0x1.0723e40000000p-1, 0x1.c1cdf404e5796p-25),
+    (double2)(0x1.0be72e0000000p-1, 0x1.094aa0ada625ep-27),
+    (double2)(0x1.109f380000000p-1, 0x1.e2d4c96fde3ecp-25),
+    (double2)(0x1.154c3c0000000p-1, 0x1.2f4d5e9a98f34p-25),
+    (double2)(0x1.19ee6a0000000p-1, 0x1.467c96ecc5cbep-25),
+    (double2)(0x1.1e85f40000000p-1, 0x1.e7040d03dec5ap-25),
+    (double2)(0x1.23130c0000000p-1, 0x1.7bebf4282de36p-25),
+    (double2)(0x1.2795e00000000p-1, 0x1.289b11aeb783fp-25),
+    (double2)(0x1.2c0e9e0000000p-1, 0x1.a891d1772f538p-26),
+    (double2)(0x1.307d720000000p-1, 0x1.34f10be1fb591p-25),
+    (double2)(0x1.34e2880000000p-1, 0x1.d9ce1d316eb93p-25),
+    (double2)(0x1.393e0c0000000p-1, 0x1.3562a19a9c442p-25),
+    (double2)(0x1.3d90260000000p-1, 0x1.4e2adf548084cp-26),
+    (double2)(0x1.41d8fe0000000p-1, 0x1.08ce55cc8c97ap-26),
+    (double2)(0x1.4618bc0000000p-1, 0x1.0e2f613e85bdap-28),
+    (double2)(0x1.4a4f840000000p-1, 0x1.db03ebb0227bfp-25),
+    (double2)(0x1.4e7d800000000p-1, 0x1.1b75bb09cb098p-25),
+    (double2)(0x1.52a2d20000000p-1, 0x1.96f16abb9df22p-27),
+    (double2)(0x1.56bf9c0000000p-1, 0x1.5b3f399411c62p-25),
+    (double2)(0x1.5ad4040000000p-1, 0x1.86b3e59f65355p-26),
+    (double2)(0x1.5ee02a0000000p-1, 0x1.2482ceae1ac12p-26),
+    (double2)(0x1.62e42e0000000p-1, 0x1.efa39ef35793cp-25),
+};
+
+TABLE_FUNCTION(double2, LN_TBL, ln_tbl);
+
+#endif // cl_khr_fp64
diff --git a/libclc/generic/lib/math/tables.h b/libclc/generic/lib/math/tables.h
new file mode 100644
index 000000000000..925544064a50
--- /dev/null
+++ b/libclc/generic/lib/math/tables.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define TABLE_SPACE __constant
+
+#define TABLE_MANGLE(NAME) __clc_##NAME
+
+#define DECLARE_TABLE(TYPE,NAME,LENGTH) \
+    TABLE_SPACE TYPE NAME [ LENGTH ]
+
+#define TABLE_FUNCTION(TYPE,TABLE,NAME) \
+    TYPE TABLE_MANGLE(NAME)(size_t idx) { \
+        return TABLE[idx]; \
+    }
+
+#define TABLE_FUNCTION_DECL(TYPE, NAME) \
+    TYPE TABLE_MANGLE(NAME)(size_t idx);
+
+#define USE_TABLE(NAME, IDX) \
+    TABLE_MANGLE(NAME)(IDX)
+
+TABLE_FUNCTION_DECL(float2, loge_tbl);
+TABLE_FUNCTION_DECL(float, log_inv_tbl);
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+TABLE_FUNCTION_DECL(double2, ln_tbl);
+
+#endif // cl_khr_fp64
diff --git a/libclc/generic/lib/math/tan.cl b/libclc/generic/lib/math/tan.cl
new file mode 100644
index 000000000000..a447999ea8b9
--- /dev/null
+++ b/libclc/generic/lib/math/tan.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <tan.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/math/tan.inc b/libclc/generic/lib/math/tan.inc
new file mode 100644
index 000000000000..8d9d9fe24786
--- /dev/null
+++ b/libclc/generic/lib/math/tan.inc
@@ -0,0 +1,8 @@
+/*
+ * Note: tan(x) could also be computed as sin(x)/cos(x), but the final
+ *       assembly ends up twice as long for R600 (maybe for others as well).
+ */
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE tan(__CLC_GENTYPE x) {
+  __CLC_GENTYPE sinx = sin(x);
+  return sinx / sqrt( (__CLC_GENTYPE) 1.0 - (sinx*sinx) );
+}
diff --git a/libclc/generic/lib/relational/all.cl b/libclc/generic/lib/relational/all.cl
new file mode 100644
index 000000000000..607d7a9c68c4
--- /dev/null
+++ b/libclc/generic/lib/relational/all.cl
@@ -0,0 +1,29 @@
+#include <clc/clc.h>
+
+#define _CLC_ALL(v) (((v) >> ((sizeof(v) * 8) - 1)) & 0x1)
+#define _CLC_ALL2(v) (_CLC_ALL((v).s0) & _CLC_ALL((v).s1))
+#define _CLC_ALL3(v) (_CLC_ALL2((v)) & _CLC_ALL((v).s2))
+#define _CLC_ALL4(v) (_CLC_ALL3((v)) & _CLC_ALL((v).s3))
+#define _CLC_ALL8(v) (_CLC_ALL4((v)) & _CLC_ALL((v).s4) & _CLC_ALL((v).s5) \
+                                     & _CLC_ALL((v).s6) & _CLC_ALL((v).s7))
+#define _CLC_ALL16(v) (_CLC_ALL8((v)) & _CLC_ALL((v).s8) & _CLC_ALL((v).s9) \
+                                      & _CLC_ALL((v).sA) & _CLC_ALL((v).sB) \
+                                      & _CLC_ALL((v).sC) & _CLC_ALL((v).sD) \
+                                      & _CLC_ALL((v).sE) & _CLC_ALL((v).sf))
+
+
+#define ALL_ID(TYPE) \
+  _CLC_OVERLOAD _CLC_DEF int all(TYPE v)
+
+#define ALL_VECTORIZE(TYPE) \
+  ALL_ID(TYPE) { return _CLC_ALL(v); } \
+  ALL_ID(TYPE##2) { return _CLC_ALL2(v); } \
+  ALL_ID(TYPE##3) { return _CLC_ALL3(v); } \
+  ALL_ID(TYPE##4) { return _CLC_ALL4(v); } \
+  ALL_ID(TYPE##8) { return _CLC_ALL8(v); } \
+  ALL_ID(TYPE##16) { return _CLC_ALL16(v); }
+
+ALL_VECTORIZE(char)
+ALL_VECTORIZE(short)
+ALL_VECTORIZE(int)
+ALL_VECTORIZE(long)
diff --git a/libclc/generic/lib/relational/any.cl b/libclc/generic/lib/relational/any.cl
new file mode 100644
index 000000000000..4d372102021b
--- /dev/null
+++ b/libclc/generic/lib/relational/any.cl
@@ -0,0 +1,30 @@
+#include <clc/clc.h>
+
+#define _CLC_ANY(v) (((v) >> ((sizeof(v) * 8) - 1)) & 0x1)
+#define _CLC_ANY2(v) (_CLC_ANY((v).s0) | _CLC_ANY((v).s1))
+#define _CLC_ANY3(v) (_CLC_ANY2((v)) | _CLC_ANY((v).s2))
+#define _CLC_ANY4(v) (_CLC_ANY3((v)) | _CLC_ANY((v).s3))
+#define _CLC_ANY8(v) (_CLC_ANY4((v)) | _CLC_ANY((v).s4) | _CLC_ANY((v).s5) \
+                                     | _CLC_ANY((v).s6) | _CLC_ANY((v).s7))
+#define _CLC_ANY16(v) (_CLC_ANY8((v)) | _CLC_ANY((v).s8) | _CLC_ANY((v).s9) \
+                                      | _CLC_ANY((v).sA) | _CLC_ANY((v).sB) \
+                                      | _CLC_ANY((v).sC) | _CLC_ANY((v).sD) \
+                                      | _CLC_ANY((v).sE) | _CLC_ANY((v).sf))
+
+
+#define ANY_ID(TYPE) \
+  _CLC_OVERLOAD _CLC_DEF int any(TYPE v)
+
+#define ANY_VECTORIZE(TYPE) \
+  ANY_ID(TYPE) { return _CLC_ANY(v); } \
+  ANY_ID(TYPE##2) { return _CLC_ANY2(v); } \
+  ANY_ID(TYPE##3) { return _CLC_ANY3(v); } \
+  ANY_ID(TYPE##4) { return _CLC_ANY4(v); } \
+  ANY_ID(TYPE##8) { return _CLC_ANY8(v); } \
+  ANY_ID(TYPE##16) { return _CLC_ANY16(v); }
+
+ANY_VECTORIZE(char)
+ANY_VECTORIZE(short)
+ANY_VECTORIZE(int)
+ANY_VECTORIZE(long)
+
diff --git a/libclc/generic/lib/relational/isequal.cl b/libclc/generic/lib/relational/isequal.cl
new file mode 100644
index 000000000000..9d79ba6b3dbe
--- /dev/null
+++ b/libclc/generic/lib/relational/isequal.cl
@@ -0,0 +1,30 @@
+#include <clc/clc.h>
+
+#define _CLC_DEFINE_ISEQUAL(RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return (x == y); \
+} \
+
+_CLC_DEFINE_ISEQUAL(int, isequal, float, float)
+_CLC_DEFINE_ISEQUAL(int2, isequal, float2, float2)
+_CLC_DEFINE_ISEQUAL(int3, isequal, float3, float3)
+_CLC_DEFINE_ISEQUAL(int4, isequal, float4, float4)
+_CLC_DEFINE_ISEQUAL(int8, isequal, float8, float8)
+_CLC_DEFINE_ISEQUAL(int16, isequal, float16, float16)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isequal(double) returns an int, but the vector versions
+// return long.
+_CLC_DEFINE_ISEQUAL(int, isequal, double, double)
+_CLC_DEFINE_ISEQUAL(long2, isequal, double2, double2)
+_CLC_DEFINE_ISEQUAL(long3, isequal, double3, double3)
+_CLC_DEFINE_ISEQUAL(long4, isequal, double4, double4)
+_CLC_DEFINE_ISEQUAL(long8, isequal, double8, double8)
+_CLC_DEFINE_ISEQUAL(long16, isequal, double16, double16)
+
+#endif
+
+#undef _CLC_DEFINE_ISEQUAL
+\ No newline at end of file
diff --git a/libclc/generic/lib/relational/isfinite.cl b/libclc/generic/lib/relational/isfinite.cl
new file mode 100644
index 000000000000..d0658c01eacb
--- /dev/null
+++ b/libclc/generic/lib/relational/isfinite.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isfinite, __builtin_isfinite, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isfinite(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isfinite(double x) {
+  return __builtin_isfinite(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isfinite, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isgreater.cl b/libclc/generic/lib/relational/isgreater.cl
new file mode 100644
index 000000000000..79456e56d517
--- /dev/null
+++ b/libclc/generic/lib/relational/isgreater.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isgreater with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isgreater, __builtin_isgreater, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isgreater(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isgreater(double x, double y){
+	return __builtin_isgreater(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isgreater, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isgreaterequal.cl b/libclc/generic/lib/relational/isgreaterequal.cl
new file mode 100644
index 000000000000..2d5ebe5770c7
--- /dev/null
+++ b/libclc/generic/lib/relational/isgreaterequal.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isgreaterequal with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isgreaterequal, __builtin_isgreaterequal, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isgreaterequal(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isgreaterequal(double x, double y){
+	return __builtin_isgreaterequal(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isgreaterequal, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isinf.cl b/libclc/generic/lib/relational/isinf.cl
new file mode 100644
index 000000000000..1452d919cb86
--- /dev/null
+++ b/libclc/generic/lib/relational/isinf.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isinf, __builtin_isinf, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isinf(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isinf(double x) {
+  return __builtin_isinf(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isinf, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isless.cl b/libclc/generic/lib/relational/isless.cl
new file mode 100644
index 000000000000..56a3e1329b48
--- /dev/null
+++ b/libclc/generic/lib/relational/isless.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isless with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isless, __builtin_isless, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isless(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isless(double x, double y){
+	return __builtin_isless(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isless, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/islessequal.cl b/libclc/generic/lib/relational/islessequal.cl
new file mode 100644
index 000000000000..259c307da453
--- /dev/null
+++ b/libclc/generic/lib/relational/islessequal.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_islessequal with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, islessequal, __builtin_islessequal, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of islessequal(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int islessequal(double x, double y){
+	return __builtin_islessequal(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, islessequal, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/islessgreater.cl b/libclc/generic/lib/relational/islessgreater.cl
new file mode 100644
index 000000000000..fc029f35b73a
--- /dev/null
+++ b/libclc/generic/lib/relational/islessgreater.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_islessgreater with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, islessgreater, __builtin_islessgreater, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of islessgreater(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int islessgreater(double x, double y){
+	return __builtin_islessgreater(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, islessgreater, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isnan.cl b/libclc/generic/lib/relational/isnan.cl
new file mode 100644
index 000000000000..f82dc5d59da5
--- /dev/null
+++ b/libclc/generic/lib/relational/isnan.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isnan, __builtin_isnan, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isnan(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isnan(double x) {
+  return __builtin_isnan(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isnan, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isnormal.cl b/libclc/generic/lib/relational/isnormal.cl
new file mode 100644
index 000000000000..2e6b42d00178
--- /dev/null
+++ b/libclc/generic/lib/relational/isnormal.cl
@@ -0,0 +1,18 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, isnormal, __builtin_isnormal, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isnormal(double) returns an int, but the vector versions
+// return long.
+_CLC_DEF _CLC_OVERLOAD int isnormal(double x) {
+  return __builtin_isnormal(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, isnormal, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/isnotequal.cl b/libclc/generic/lib/relational/isnotequal.cl
new file mode 100644
index 000000000000..787fd8d53c20
--- /dev/null
+++ b/libclc/generic/lib/relational/isnotequal.cl
@@ -0,0 +1,23 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+#define _CLC_DEFINE_ISNOTEQUAL(RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return (x != y); \
+} \
+
+_CLC_DEFINE_ISNOTEQUAL(int, isnotequal, float, float)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(int, isnotequal, float, float)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isnotequal(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEFINE_ISNOTEQUAL(int, isnotequal, double, double)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isnotequal, double, double)
+
+#endif
+
+#undef _CLC_DEFINE_ISNOTEQUAL
diff --git a/libclc/generic/lib/relational/isordered.cl b/libclc/generic/lib/relational/isordered.cl
new file mode 100644
index 000000000000..ebda2eb72ba2
--- /dev/null
+++ b/libclc/generic/lib/relational/isordered.cl
@@ -0,0 +1,23 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+#define _CLC_DEFINE_ISORDERED(RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
+  return isequal(x, x) && isequal(y, y); \
+} \
+
+_CLC_DEFINE_ISORDERED(int, isordered, float, float)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(int, isordered, float, float)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isordered(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEFINE_ISORDERED(int, isordered, double, double)
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isordered, double, double)
+
+#endif
+
+#undef _CLC_DEFINE_ISORDERED
diff --git a/libclc/generic/lib/relational/isunordered.cl b/libclc/generic/lib/relational/isunordered.cl
new file mode 100644
index 000000000000..8bc5e3fa7f6d
--- /dev/null
+++ b/libclc/generic/lib/relational/isunordered.cl
@@ -0,0 +1,22 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+//Note: It would be nice to use __builtin_isunordered with vector inputs, but it seems to only take scalar values as
+//      input, which will produce incorrect output for vector input types.
+
+_CLC_DEFINE_RELATIONAL_BINARY(int, isunordered, __builtin_isunordered, float, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of isunordered(double, double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int isunordered(double x, double y){
+	return __builtin_isunordered(x, y);
+}
+
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(long, isunordered, double, double)
+
+#endif
diff --git a/libclc/generic/lib/relational/relational.h b/libclc/generic/lib/relational/relational.h
new file mode 100644
index 000000000000..e492750dacb3
--- /dev/null
+++ b/libclc/generic/lib/relational/relational.h
@@ -0,0 +1,117 @@
+/*
+ * Contains relational macros that have to return 1 for scalar and -1 for vector
+ * when the result is true.
+ */
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_NAME, ARG_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x){ \
+	return BUILTIN_NAME(x); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_VEC2(RET_TYPE, FUNCTION, ARG_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \
+  return (RET_TYPE)( (RET_TYPE){FUNCTION(x.lo), FUNCTION(x.hi)} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_VEC3(RET_TYPE, FUNCTION, ARG_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \
+  return (RET_TYPE)( (RET_TYPE){FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2)} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_VEC4(RET_TYPE, FUNCTION, ARG_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \
+  return (RET_TYPE)( \
+	(RET_TYPE){ \
+		FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2), FUNCTION(x.s3) \
+	} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_VEC8(RET_TYPE, FUNCTION, ARG_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \
+  return (RET_TYPE)( \
+	(RET_TYPE){ \
+		FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2), FUNCTION(x.s3), \
+		FUNCTION(x.s4), FUNCTION(x.s5), FUNCTION(x.s6), FUNCTION(x.s7) \
+	} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_VEC16(RET_TYPE, FUNCTION, ARG_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG_TYPE x) { \
+  return (RET_TYPE)( \
+	(RET_TYPE){ \
+		FUNCTION(x.s0), FUNCTION(x.s1), FUNCTION(x.s2), FUNCTION(x.s3), \
+		FUNCTION(x.s4), FUNCTION(x.s5), FUNCTION(x.s6), FUNCTION(x.s7), \
+		FUNCTION(x.s8), FUNCTION(x.s9), FUNCTION(x.sa), FUNCTION(x.sb), \
+		FUNCTION(x.sc), FUNCTION(x.sd), FUNCTION(x.se), FUNCTION(x.sf) \
+	} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(RET_TYPE, FUNCTION, ARG_TYPE) \
+_CLC_DEFINE_RELATIONAL_UNARY_VEC2(RET_TYPE##2, FUNCTION, ARG_TYPE##2) \
+_CLC_DEFINE_RELATIONAL_UNARY_VEC3(RET_TYPE##3, FUNCTION, ARG_TYPE##3) \
+_CLC_DEFINE_RELATIONAL_UNARY_VEC4(RET_TYPE##4, FUNCTION, ARG_TYPE##4) \
+_CLC_DEFINE_RELATIONAL_UNARY_VEC8(RET_TYPE##8, FUNCTION, ARG_TYPE##8) \
+_CLC_DEFINE_RELATIONAL_UNARY_VEC16(RET_TYPE##16, FUNCTION, ARG_TYPE##16)
+
+#define _CLC_DEFINE_RELATIONAL_UNARY(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG_TYPE) \
+_CLC_DEFINE_RELATIONAL_UNARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG_TYPE) \
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(RET_TYPE, FUNCTION, ARG_TYPE) \
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_NAME, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y){ \
+	return BUILTIN_NAME(x, y); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \
+  return (RET_TYPE)( (RET_TYPE){FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC2(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \
+  return (RET_TYPE)( (RET_TYPE){FUNCTION(x.lo, y.lo), FUNCTION(x.hi, y.hi)} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC3(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \
+  return (RET_TYPE)( (RET_TYPE){FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2)} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC4(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \
+  return (RET_TYPE)( \
+	(RET_TYPE){ \
+		FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2), FUNCTION(x.s3, y.s3) \
+	} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC8(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \
+  return (RET_TYPE)( \
+	(RET_TYPE){ \
+		FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2), FUNCTION(x.s3, y.s3), \
+		FUNCTION(x.s4, y.s4), FUNCTION(x.s5, y.s5), FUNCTION(x.s6, y.s6), FUNCTION(x.s7, y.s7) \
+	} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC16(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG0_TYPE x, ARG1_TYPE y) { \
+  return (RET_TYPE)( \
+	(RET_TYPE){ \
+		FUNCTION(x.s0, y.s0), FUNCTION(x.s1, y.s1), FUNCTION(x.s2, y.s2), FUNCTION(x.s3, y.s3), \
+		FUNCTION(x.s4, y.s4), FUNCTION(x.s5, y.s5), FUNCTION(x.s6, y.s6), FUNCTION(x.s7, y.s7), \
+		FUNCTION(x.s8, y.s8), FUNCTION(x.s9, y.s9), FUNCTION(x.sa, y.sa), FUNCTION(x.sb, y.sb), \
+		FUNCTION(x.sc, y.sc), FUNCTION(x.sd, y.sd), FUNCTION(x.se, y.se), FUNCTION(x.sf, y.sf) \
+	} != (RET_TYPE)0); \
+}
+
+#define _CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEFINE_RELATIONAL_BINARY_VEC2(RET_TYPE##2, FUNCTION, ARG0_TYPE##2, ARG1_TYPE##2) \
+_CLC_DEFINE_RELATIONAL_BINARY_VEC3(RET_TYPE##3, FUNCTION, ARG0_TYPE##3, ARG1_TYPE##3) \
+_CLC_DEFINE_RELATIONAL_BINARY_VEC4(RET_TYPE##4, FUNCTION, ARG0_TYPE##4, ARG1_TYPE##4) \
+_CLC_DEFINE_RELATIONAL_BINARY_VEC8(RET_TYPE##8, FUNCTION, ARG0_TYPE##8, ARG1_TYPE##8) \
+_CLC_DEFINE_RELATIONAL_BINARY_VEC16(RET_TYPE##16, FUNCTION, ARG0_TYPE##16, ARG1_TYPE##16)
+
+#define _CLC_DEFINE_RELATIONAL_BINARY(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEFINE_RELATIONAL_BINARY_SCALAR(RET_TYPE, FUNCTION, BUILTIN_FUNCTION, ARG0_TYPE, ARG1_TYPE) \
+_CLC_DEFINE_RELATIONAL_BINARY_VEC_ALL(RET_TYPE, FUNCTION, ARG0_TYPE, ARG1_TYPE)
diff --git a/libclc/generic/lib/relational/signbit.cl b/libclc/generic/lib/relational/signbit.cl
new file mode 100644
index 000000000000..ab37d2f1288c
--- /dev/null
+++ b/libclc/generic/lib/relational/signbit.cl
@@ -0,0 +1,19 @@
+#include <clc/clc.h>
+#include "relational.h"
+
+_CLC_DEFINE_RELATIONAL_UNARY(int, signbit, __builtin_signbitf, float)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+// The scalar version of signbit(double) returns an int, but the vector versions
+// return long.
+
+_CLC_DEF _CLC_OVERLOAD int signbit(double x){
+	return __builtin_signbit(x);
+}
+
+_CLC_DEFINE_RELATIONAL_UNARY_VEC_ALL(long, signbit, double)
+
+#endif
diff --git a/libclc/generic/lib/shared/clamp.cl b/libclc/generic/lib/shared/clamp.cl
new file mode 100644
index 000000000000..c79a358e00e0
--- /dev/null
+++ b/libclc/generic/lib/shared/clamp.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <clamp.inc>
+#include <clc/integer/gentype.inc>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <clamp.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/shared/clamp.inc b/libclc/generic/lib/shared/clamp.inc
new file mode 100644
index 000000000000..c918f9c499e7
--- /dev/null
+++ b/libclc/generic/lib/shared/clamp.inc
@@ -0,0 +1,9 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_GENTYPE y, __CLC_GENTYPE z) {
+  return (x > z ? z : (x < y ? y : x));
+}
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE clamp(__CLC_GENTYPE x, __CLC_SCALAR_GENTYPE y, __CLC_SCALAR_GENTYPE z) {
+  return (x > (__CLC_GENTYPE)z ? (__CLC_GENTYPE)z : (x < (__CLC_GENTYPE)y ? (__CLC_GENTYPE)y : x));
+}
+#endif
diff --git a/libclc/generic/lib/shared/max.cl b/libclc/generic/lib/shared/max.cl
new file mode 100644
index 000000000000..1c4457c82144
--- /dev/null
+++ b/libclc/generic/lib/shared/max.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <max.inc>
+#include <clc/integer/gentype.inc>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <max.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/shared/max.inc b/libclc/generic/lib/shared/max.inc
new file mode 100644
index 000000000000..75a24c077d1a
--- /dev/null
+++ b/libclc/generic/lib/shared/max.inc
@@ -0,0 +1,9 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_GENTYPE b) {
+  return (a > b ? a : b);
+}
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE max(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b) {
+  return (a > (__CLC_GENTYPE)b ? a : (__CLC_GENTYPE)b);
+}
+#endif
diff --git a/libclc/generic/lib/shared/min.cl b/libclc/generic/lib/shared/min.cl
new file mode 100644
index 000000000000..433087a1069d
--- /dev/null
+++ b/libclc/generic/lib/shared/min.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#define __CLC_BODY <min.inc>
+#include <clc/integer/gentype.inc>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define __CLC_BODY <min.inc>
+#include <clc/math/gentype.inc>
diff --git a/libclc/generic/lib/shared/min.inc b/libclc/generic/lib/shared/min.inc
new file mode 100644
index 000000000000..fe42864df257
--- /dev/null
+++ b/libclc/generic/lib/shared/min.inc
@@ -0,0 +1,9 @@
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_GENTYPE b) {
+  return (a < b ? a : b);
+}
+
+#ifndef __CLC_SCALAR
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE min(__CLC_GENTYPE a, __CLC_SCALAR_GENTYPE b) {
+  return (a < (__CLC_GENTYPE)b ? a : (__CLC_GENTYPE)b);
+}
+#endif
diff --git a/libclc/generic/lib/shared/vload.cl b/libclc/generic/lib/shared/vload.cl
new file mode 100644
index 000000000000..88972005cfa2
--- /dev/null
+++ b/libclc/generic/lib/shared/vload.cl
@@ -0,0 +1,52 @@
+#include <clc/clc.h>
+
+#define VLOAD_VECTORIZE(PRIM_TYPE, ADDR_SPACE) \
+  typedef PRIM_TYPE##2 less_aligned_##ADDR_SPACE##PRIM_TYPE##2 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##2 vload2(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&x[2*offset])); \
+  } \
+\
+  typedef PRIM_TYPE##3 less_aligned_##ADDR_SPACE##PRIM_TYPE##3 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##3 vload3(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    PRIM_TYPE##2 vec = *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&x[3*offset])); \
+    return (PRIM_TYPE##3)(vec.s0, vec.s1, x[offset*3+2]); \
+  } \
+\
+  typedef PRIM_TYPE##4 less_aligned_##ADDR_SPACE##PRIM_TYPE##4 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##4 vload4(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##4*) (&x[4*offset])); \
+  } \
+\
+  typedef PRIM_TYPE##8 less_aligned_##ADDR_SPACE##PRIM_TYPE##8 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##8 vload8(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##8*) (&x[8*offset])); \
+  } \
+\
+  typedef PRIM_TYPE##16 less_aligned_##ADDR_SPACE##PRIM_TYPE##16 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##16 vload16(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return *((const ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##16*) (&x[16*offset])); \
+  } \
+
+#define VLOAD_ADDR_SPACES(__CLC_SCALAR_GENTYPE) \
+    VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __private) \
+    VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __local) \
+    VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __constant) \
+    VLOAD_VECTORIZE(__CLC_SCALAR_GENTYPE, __global) \
+
+#define VLOAD_TYPES() \
+    VLOAD_ADDR_SPACES(char) \
+    VLOAD_ADDR_SPACES(uchar) \
+    VLOAD_ADDR_SPACES(short) \
+    VLOAD_ADDR_SPACES(ushort) \
+    VLOAD_ADDR_SPACES(int) \
+    VLOAD_ADDR_SPACES(uint) \
+    VLOAD_ADDR_SPACES(long) \
+    VLOAD_ADDR_SPACES(ulong) \
+    VLOAD_ADDR_SPACES(float) \
+
+VLOAD_TYPES()
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+    VLOAD_ADDR_SPACES(double)
+#endif
diff --git a/libclc/generic/lib/shared/vstore.cl b/libclc/generic/lib/shared/vstore.cl
new file mode 100644
index 000000000000..4777b7ea76ad
--- /dev/null
+++ b/libclc/generic/lib/shared/vstore.cl
@@ -0,0 +1,52 @@
+#include <clc/clc.h>
+
+#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
+
+#define VSTORE_VECTORIZE(PRIM_TYPE, ADDR_SPACE) \
+  typedef PRIM_TYPE##2 less_aligned_##ADDR_SPACE##PRIM_TYPE##2 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF void vstore2(PRIM_TYPE##2 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&mem[2*offset])) = vec; \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF void vstore3(PRIM_TYPE##3 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##2*) (&mem[3*offset])) = (PRIM_TYPE##2)(vec.s0, vec.s1); \
+    mem[3 * offset + 2] = vec.s2;\
+  } \
+\
+  typedef PRIM_TYPE##4 less_aligned_##ADDR_SPACE##PRIM_TYPE##4 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF void vstore4(PRIM_TYPE##4 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##4*) (&mem[4*offset])) = vec; \
+  } \
+\
+  typedef PRIM_TYPE##8 less_aligned_##ADDR_SPACE##PRIM_TYPE##8 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF void vstore8(PRIM_TYPE##8 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##8*) (&mem[8*offset])) = vec; \
+  } \
+\
+  typedef PRIM_TYPE##16 less_aligned_##ADDR_SPACE##PRIM_TYPE##16 __attribute__ ((aligned (sizeof(PRIM_TYPE))));\
+  _CLC_OVERLOAD _CLC_DEF void vstore16(PRIM_TYPE##16 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    *((ADDR_SPACE less_aligned_##ADDR_SPACE##PRIM_TYPE##16*) (&mem[16*offset])) = vec; \
+  } \
+
+#define VSTORE_ADDR_SPACES(__CLC_SCALAR___CLC_GENTYPE) \
+    VSTORE_VECTORIZE(__CLC_SCALAR___CLC_GENTYPE, __private) \
+    VSTORE_VECTORIZE(__CLC_SCALAR___CLC_GENTYPE, __local) \
+    VSTORE_VECTORIZE(__CLC_SCALAR___CLC_GENTYPE, __global) \
+
+#define VSTORE_TYPES() \
+    VSTORE_ADDR_SPACES(char) \
+    VSTORE_ADDR_SPACES(uchar) \
+    VSTORE_ADDR_SPACES(short) \
+    VSTORE_ADDR_SPACES(ushort) \
+    VSTORE_ADDR_SPACES(int) \
+    VSTORE_ADDR_SPACES(uint) \
+    VSTORE_ADDR_SPACES(long) \
+    VSTORE_ADDR_SPACES(ulong) \
+    VSTORE_ADDR_SPACES(float) \
+
+VSTORE_TYPES()
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+    VSTORE_ADDR_SPACES(double)
+#endif
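Review note on vstore.cl: each vstoreN overload above writes an N-element vector starting at element N*offset of mem, and vstore3 splits the write into one 2-wide store plus one scalar store. The following is a minimal scalar model of the vstore3 addressing in plain C, for illustration only. The name `vstore3_model` and the use of plain `int` arrays are assumptions of this sketch and are not part of libclc.

```c
#include <stddef.h>

/* Hypothetical scalar model of vstore3 for int elements (not libclc code).
 * Mirrors the addressing in vstore.cl: the vector occupies
 * mem[3*offset] through mem[3*offset + 2]. */
static void vstore3_model(const int vec[3], size_t offset, int *mem) {
    /* First two components: the macro writes these with a single 2-wide store. */
    mem[3 * offset + 0] = vec[0];
    mem[3 * offset + 1] = vec[1];
    /* Third component: the macro writes this one as a separate scalar. */
    mem[3 * offset + 2] = vec[2];
}
```

With a 9-element buffer and offset 1, the model touches only elements 3, 4 and 5, which is the behaviour the macro's `3*offset` indexing implies.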
diff --git a/libclc/generic/lib/workitem/get_global_id.cl b/libclc/generic/lib/workitem/get_global_id.cl
new file mode 100644
index 000000000000..fdd83d2953d4
--- /dev/null
+++ b/libclc/generic/lib/workitem/get_global_id.cl
@@ -0,0 +1,5 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_global_id(uint dim) {
+  return get_group_id(dim)*get_local_size(dim) + get_local_id(dim);
+}
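Review note on get_global_id.cl: the generic implementation above derives the global index from three per-target queries and does not add a global work offset. A small host-side C sketch of the same arithmetic follows. The function `global_id_from` and its plain-integer parameters are hypothetical stand-ins for get_group_id, get_local_size and get_local_id.

```c
#include <stddef.h>

/* Hypothetical host-side stand-in for the formula in get_global_id.cl:
 *   get_group_id(dim) * get_local_size(dim) + get_local_id(dim)
 * No global work offset is added, matching the generic implementation. */
static size_t global_id_from(size_t group_id, size_t local_size, size_t local_id) {
    return group_id * local_size + local_id;
}
```

For example, work-item 5 in work-group 2 with a work-group size of 64 maps to global index 2 * 64 + 5 = 133.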
diff --git a/libclc/generic/lib/workitem/get_global_size.cl b/libclc/generic/lib/workitem/get_global_size.cl
new file mode 100644
index 000000000000..5ae649e10d51
--- /dev/null
+++ b/libclc/generic/lib/workitem/get_global_size.cl
@@ -0,0 +1,5 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_global_size(uint dim) {
+  return get_num_groups(dim)*get_local_size(dim);
+}
diff --git a/libclc/ptx-nvidiacl/lib/SOURCES b/libclc/ptx-nvidiacl/lib/SOURCES
new file mode 100644
index 000000000000..7cdbd8507699
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/SOURCES
@@ -0,0 +1,5 @@
+synchronization/barrier.cl
+workitem/get_group_id.cl
+workitem/get_local_id.cl
+workitem/get_local_size.cl
+workitem/get_num_groups.cl
diff --git a/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
new file mode 100644
index 000000000000..fb36c2612be4
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/synchronization/barrier.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+_CLC_DEF void barrier(cl_mem_fence_flags flags) {
+  if (flags & CLK_LOCAL_MEM_FENCE) {
+    __builtin_ptx_bar_sync(0);
+  }
+}
+
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_group_id.cl b/libclc/ptx-nvidiacl/lib/workitem/get_group_id.cl
new file mode 100644
index 000000000000..2b35b4eaaa95
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_group_id.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_group_id(uint dim) {
+  switch (dim) {
+  case 0:  return __builtin_ptx_read_ctaid_x();
+  case 1:  return __builtin_ptx_read_ctaid_y();
+  case 2:  return __builtin_ptx_read_ctaid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_local_id.cl b/libclc/ptx-nvidiacl/lib/workitem/get_local_id.cl
new file mode 100644
index 000000000000..f0cfdc005fe8
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_local_id.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_local_id(uint dim) {
+  switch (dim) {
+  case 0:  return __builtin_ptx_read_tid_x();
+  case 1:  return __builtin_ptx_read_tid_y();
+  case 2:  return __builtin_ptx_read_tid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_local_size.cl b/libclc/ptx-nvidiacl/lib/workitem/get_local_size.cl
new file mode 100644
index 000000000000..c3f542595def
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_local_size.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_local_size(uint dim) {
+  switch (dim) {
+  case 0:  return __builtin_ptx_read_ntid_x();
+  case 1:  return __builtin_ptx_read_ntid_y();
+  case 2:  return __builtin_ptx_read_ntid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx-nvidiacl/lib/workitem/get_num_groups.cl b/libclc/ptx-nvidiacl/lib/workitem/get_num_groups.cl
new file mode 100644
index 000000000000..90bdc2e41d2c
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/workitem/get_num_groups.cl
@@ -0,0 +1,10 @@
+#include <clc/clc.h>
+
+_CLC_DEF size_t get_num_groups(uint dim) {
+  switch (dim) {
+  case 0:  return __builtin_ptx_read_nctaid_x();
+  case 1:  return __builtin_ptx_read_nctaid_y();
+  case 2:  return __builtin_ptx_read_nctaid_z();
+  default: return 0;
+  }
+}
diff --git a/libclc/ptx/lib/OVERRIDES b/libclc/ptx/lib/OVERRIDES
new file mode 100644
index 000000000000..475162c97cd2
--- /dev/null
+++ b/libclc/ptx/lib/OVERRIDES
@@ -0,0 +1,2 @@
+integer/add_sat_if.ll
+integer/sub_sat_if.ll
diff --git a/libclc/ptx/lib/SOURCES b/libclc/ptx/lib/SOURCES
new file mode 100644
index 000000000000..fb6e17fbc697
--- /dev/null
+++ b/libclc/ptx/lib/SOURCES
@@ -0,0 +1,2 @@
+integer/add_sat.ll
+integer/sub_sat.ll
+\ No newline at end of file
diff --git a/libclc/ptx/lib/integer/add_sat.ll b/libclc/ptx/lib/integer/add_sat.ll
new file mode 100644
index 000000000000..f887962c8a49
--- /dev/null
+++ b/libclc/ptx/lib/integer/add_sat.ll
@@ -0,0 +1,55 @@
+declare i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
+
+define ptx_device i8 @__clc_add_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
+
+define ptx_device i8 @__clc_add_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
+
+define ptx_device i16 @__clc_add_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
+
+define ptx_device i16 @__clc_add_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
+
+define ptx_device i32 @__clc_add_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
+
+define ptx_device i32 @__clc_add_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
+
+define ptx_device i64 @__clc_add_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
+  ret i64 %call
+}
+
+declare i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y)
+
+define ptx_device i64 @__clc_add_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y)
+  ret i64 %call
+}
diff --git a/libclc/ptx/lib/integer/sub_sat.ll b/libclc/ptx/lib/integer/sub_sat.ll
new file mode 100644
index 000000000000..1a66eb566b52
--- /dev/null
+++ b/libclc/ptx/lib/integer/sub_sat.ll
@@ -0,0 +1,55 @@
+declare i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y)
+
+define ptx_device i8 @__clc_sub_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_sub_sat_impl_s8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y)
+
+define ptx_device i8 @__clc_sub_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_sub_sat_impl_u8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y)
+
+define ptx_device i16 @__clc_sub_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_sub_sat_impl_s16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y)
+
+define ptx_device i16 @__clc_sub_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_sub_sat_impl_u16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y)
+
+define ptx_device i32 @__clc_sub_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_sub_sat_impl_s32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y)
+
+define ptx_device i32 @__clc_sub_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_sub_sat_impl_u32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y)
+
+define ptx_device i64 @__clc_sub_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_sub_sat_impl_s64(i64 %x, i64 %y)
+  ret i64 %call
+}
+
+declare i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y)
+
+define ptx_device i64 @__clc_sub_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
+  %call = call i64 @__clc_sub_sat_impl_u64(i64 %x, i64 %y)
+  ret i64 %call
+}
diff --git a/libclc/r600/lib/OVERRIDES b/libclc/r600/lib/OVERRIDES
new file mode 100644
index 000000000000..3f941d890be7
--- /dev/null
+++ b/libclc/r600/lib/OVERRIDES
@@ -0,0 +1,2 @@
+workitem/get_group_id.cl
+workitem/get_global_size.cl
diff --git a/libclc/r600/lib/SOURCES b/libclc/r600/lib/SOURCES
new file mode 100644
index 000000000000..ef23d83a5450
--- /dev/null
+++ b/libclc/r600/lib/SOURCES
@@ -0,0 +1,10 @@
+atomic/atomic.cl
+math/nextafter.cl
+workitem/get_num_groups.ll
+workitem/get_group_id.ll
+workitem/get_local_size.ll
+workitem/get_local_id.ll
+workitem/get_global_size.ll
+workitem/get_work_dim.ll
+synchronization/barrier.cl
+synchronization/barrier_impl.ll
diff --git a/libclc/r600/lib/atomic/atomic.cl b/libclc/r600/lib/atomic/atomic.cl
new file mode 100644
index 000000000000..5bfe07b94bfd
--- /dev/null
+++ b/libclc/r600/lib/atomic/atomic.cl
@@ -0,0 +1,65 @@
+#include <clc/clc.h>
+
+#define ATOMIC_FUNC_DEFINE(RET_SIGN, ARG_SIGN, TYPE, CL_FUNCTION, CLC_FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+_CLC_OVERLOAD _CLC_DEF RET_SIGN TYPE CL_FUNCTION (volatile CL_ADDRSPACE RET_SIGN TYPE *p, RET_SIGN TYPE val) { \
+	return (RET_SIGN TYPE)__clc_##CLC_FUNCTION##_addr##LLVM_ADDRSPACE((volatile CL_ADDRSPACE ARG_SIGN TYPE*)p, (ARG_SIGN TYPE)val); \
+}
+
+/* For atomic functions that don't need different bitcode depending on argument signedness */
+#define ATOMIC_FUNC_SIGN(TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+	_CLC_DECL signed TYPE __clc_##FUNCTION##_addr##LLVM_ADDRSPACE(volatile CL_ADDRSPACE signed TYPE*, signed TYPE); \
+	ATOMIC_FUNC_DEFINE(signed, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+	ATOMIC_FUNC_DEFINE(unsigned, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE)
+
+#define ATOMIC_FUNC_ADDRSPACE(TYPE, FUNCTION) \
+	ATOMIC_FUNC_SIGN(TYPE, FUNCTION, global, 1) \
+	ATOMIC_FUNC_SIGN(TYPE, FUNCTION, local, 3)
+
+#define ATOMIC_FUNC(FUNCTION) \
+	ATOMIC_FUNC_ADDRSPACE(int, FUNCTION)
+
+#define ATOMIC_FUNC_DEFINE_3_ARG(RET_SIGN, ARG_SIGN, TYPE, CL_FUNCTION, CLC_FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+_CLC_OVERLOAD _CLC_DEF RET_SIGN TYPE CL_FUNCTION (volatile CL_ADDRSPACE RET_SIGN TYPE *p, RET_SIGN TYPE cmp, RET_SIGN TYPE val) { \
+	return (RET_SIGN TYPE)__clc_##CLC_FUNCTION##_addr##LLVM_ADDRSPACE((volatile CL_ADDRSPACE ARG_SIGN TYPE*)p, (ARG_SIGN TYPE)cmp, (ARG_SIGN TYPE)val); \
+}
+
+/* For atomic functions that don't need different bitcode depending on argument signedness */
+#define ATOMIC_FUNC_SIGN_3_ARG(TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+	_CLC_DECL signed TYPE __clc_##FUNCTION##_addr##LLVM_ADDRSPACE(volatile CL_ADDRSPACE signed TYPE*, signed TYPE, signed TYPE); \
+	ATOMIC_FUNC_DEFINE_3_ARG(signed, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+	ATOMIC_FUNC_DEFINE_3_ARG(unsigned, signed, TYPE, FUNCTION, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE)
+
+#define ATOMIC_FUNC_ADDRSPACE_3_ARG(TYPE, FUNCTION) \
+	ATOMIC_FUNC_SIGN_3_ARG(TYPE, FUNCTION, global, 1) \
+	ATOMIC_FUNC_SIGN_3_ARG(TYPE, FUNCTION, local, 3)
+
+#define ATOMIC_FUNC_3_ARG(FUNCTION) \
+	ATOMIC_FUNC_ADDRSPACE_3_ARG(int, FUNCTION)
+
+ATOMIC_FUNC(atomic_add)
+ATOMIC_FUNC(atomic_and)
+ATOMIC_FUNC(atomic_or)
+ATOMIC_FUNC(atomic_sub)
+ATOMIC_FUNC(atomic_xchg)
+ATOMIC_FUNC(atomic_xor)
+ATOMIC_FUNC_3_ARG(atomic_cmpxchg)
+
+_CLC_DECL signed int __clc_atomic_max_addr1(volatile global signed int*, signed int);
+_CLC_DECL signed int __clc_atomic_max_addr3(volatile local signed int*, signed int);
+_CLC_DECL uint __clc_atomic_umax_addr1(volatile global uint*, uint);
+_CLC_DECL uint __clc_atomic_umax_addr3(volatile local uint*, uint);
+
+ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_max, atomic_max, global, 1)
+ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_max, atomic_max, local, 3)
+ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_max, atomic_umax, global, 1)
+ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_max, atomic_umax, local, 3)
+
+_CLC_DECL signed int __clc_atomic_min_addr1(volatile global signed int*, signed int);
+_CLC_DECL signed int __clc_atomic_min_addr3(volatile local signed int*, signed int);
+_CLC_DECL uint __clc_atomic_umin_addr1(volatile global uint*, uint);
+_CLC_DECL uint __clc_atomic_umin_addr3(volatile local uint*, uint);
+
+ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_min, atomic_min, global, 1)
+ATOMIC_FUNC_DEFINE(signed, signed, int, atomic_min, atomic_min, local, 3)
+ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_min, atomic_umin, global, 1)
+ATOMIC_FUNC_DEFINE(unsigned, unsigned, int, atomic_min, atomic_umin, local, 3)
diff --git a/libclc/r600/lib/math/nextafter.cl b/libclc/r600/lib/math/nextafter.cl
new file mode 100644
index 000000000000..4611c81ae91e
--- /dev/null
+++ b/libclc/r600/lib/math/nextafter.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+#include "../lib/clcmacro.h"
+
+_CLC_DEFINE_BINARY_BUILTIN(float, nextafter, __clc_nextafter, float, float)
diff --git a/libclc/r600/lib/synchronization/barrier.cl b/libclc/r600/lib/synchronization/barrier.cl
new file mode 100644
index 000000000000..6f2900b06eef
--- /dev/null
+++ b/libclc/r600/lib/synchronization/barrier.cl
@@ -0,0 +1,10 @@
+
+#include <clc/clc.h>
+
+_CLC_DEF int __clc_clk_local_mem_fence() {
+  return CLK_LOCAL_MEM_FENCE;
+}
+
+_CLC_DEF int __clc_clk_global_mem_fence() {
+  return CLK_GLOBAL_MEM_FENCE;
+}
diff --git a/libclc/r600/lib/synchronization/barrier_impl.ll b/libclc/r600/lib/synchronization/barrier_impl.ll
new file mode 100644
index 000000000000..3d8ee66bab6e
--- /dev/null
+++ b/libclc/r600/lib/synchronization/barrier_impl.ll
@@ -0,0 +1,29 @@
+declare i32 @__clc_clk_local_mem_fence() nounwind alwaysinline
+declare i32 @__clc_clk_global_mem_fence() nounwind alwaysinline
+declare void @llvm.AMDGPU.barrier.local() nounwind noduplicate
+declare void @llvm.AMDGPU.barrier.global() nounwind noduplicate
+
+define void @barrier(i32 %flags) nounwind noduplicate alwaysinline {
+barrier_local_test:
+  %CLK_LOCAL_MEM_FENCE = call i32 @__clc_clk_local_mem_fence()
+  %0 = and i32 %flags, %CLK_LOCAL_MEM_FENCE
+  %1 = icmp ne i32 %0, 0
+  br i1 %1, label %barrier_local, label %barrier_global_test
+
+barrier_local:
+  call void @llvm.AMDGPU.barrier.local() noduplicate
+  br label %barrier_global_test
+
+barrier_global_test:
+  %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence()
+  %2 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE
+  %3 = icmp ne i32 %2, 0
+  br i1 %3, label %barrier_global, label %done
+
+barrier_global:
+  call void @llvm.AMDGPU.barrier.global() noduplicate
+  br label %done
+
+done:
+  ret void
+}
diff --git a/libclc/r600/lib/workitem/get_global_size.ll b/libclc/r600/lib/workitem/get_global_size.ll
new file mode 100644
index 000000000000..ac2d08d8ee19
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_global_size.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.global.size.x() nounwind readnone
+declare i32 @llvm.r600.read.global.size.y() nounwind readnone
+declare i32 @llvm.r600.read.global.size.z() nounwind readnone
+
+define i32 @get_global_size(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.global.size.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.global.size.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.global.size.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_group_id.ll b/libclc/r600/lib/workitem/get_group_id.ll
new file mode 100644
index 000000000000..0dc86e5edfe1
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_group_id.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.tgid.x() nounwind readnone
+declare i32 @llvm.r600.read.tgid.y() nounwind readnone
+declare i32 @llvm.r600.read.tgid.z() nounwind readnone
+
+define i32 @get_group_id(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.tgid.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.tgid.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.tgid.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_local_id.ll b/libclc/r600/lib/workitem/get_local_id.ll
new file mode 100644
index 000000000000..ac5522a7822b
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_local_id.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.tidig.x() nounwind readnone
+declare i32 @llvm.r600.read.tidig.y() nounwind readnone
+declare i32 @llvm.r600.read.tidig.z() nounwind readnone
+
+define i32 @get_local_id(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.tidig.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.tidig.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.tidig.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_local_size.ll b/libclc/r600/lib/workitem/get_local_size.ll
new file mode 100644
index 000000000000..0a98de683ae4
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_local_size.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.local.size.x() nounwind readnone
+declare i32 @llvm.r600.read.local.size.y() nounwind readnone
+declare i32 @llvm.r600.read.local.size.z() nounwind readnone
+
+define i32 @get_local_size(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.local.size.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.local.size.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.local.size.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_num_groups.ll b/libclc/r600/lib/workitem/get_num_groups.ll
new file mode 100644
index 000000000000..a708f422c27e
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_num_groups.ll
@@ -0,0 +1,18 @@
+declare i32 @llvm.r600.read.ngroups.x() nounwind readnone
+declare i32 @llvm.r600.read.ngroups.y() nounwind readnone
+declare i32 @llvm.r600.read.ngroups.z() nounwind readnone
+
+define i32 @get_num_groups(i32 %dim) nounwind readnone alwaysinline {
+  switch i32 %dim, label %default [i32 0, label %x_dim i32 1, label %y_dim i32 2, label %z_dim]
+x_dim:
+  %x = call i32 @llvm.r600.read.ngroups.x() nounwind readnone
+  ret i32 %x
+y_dim:
+  %y = call i32 @llvm.r600.read.ngroups.y() nounwind readnone
+  ret i32 %y
+z_dim:
+  %z = call i32 @llvm.r600.read.ngroups.z() nounwind readnone
+  ret i32 %z
+default:
+  ret i32 0
+}
diff --git a/libclc/r600/lib/workitem/get_work_dim.ll b/libclc/r600/lib/workitem/get_work_dim.ll
new file mode 100644
index 000000000000..1220153fe2bd
--- /dev/null
+++ b/libclc/r600/lib/workitem/get_work_dim.ll
@@ -0,0 +1,8 @@
+declare i32 @llvm.AMDGPU.read.workdim() nounwind readnone
+
+define i32 @get_work_dim() nounwind readnone alwaysinline {
+  %x = call i32 @llvm.AMDGPU.read.workdim() nounwind readnone , !range !0
+  ret i32 %x
+}
+
+!0 = metadata !{ i32 1, i32 4 }
diff --git a/libclc/test/add_sat.cl b/libclc/test/add_sat.cl
new file mode 100644
index 000000000000..45c8567b4403
--- /dev/null
+++ b/libclc/test/add_sat.cl
@@ -0,0 +1,3 @@
+__kernel void foo(__global char *a, __global char *b, __global char *c) {
+  *a = add_sat(*b, *c);
+}
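Review note on test/add_sat.cl: the kernel above exercises add_sat on char operands. The OpenCL C specification defines add_sat as x + y with the result saturated to the range of the operand type. A reference model for the signed 8-bit case, written in plain C as a sketch, is shown below. The name `add_sat_s8_model` is hypothetical, and this is not how libclc implements the builtin.

```c
#include <stdint.h>

/* Hypothetical reference model of add_sat for signed 8-bit operands.
 * The sum is formed in a wider type, then clamped to [INT8_MIN, INT8_MAX]. */
static int8_t add_sat_s8_model(int8_t x, int8_t y) {
    int sum = (int)x + (int)y;        /* cannot overflow in int */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```

A model like this could serve as the expected-value oracle if the test were extended to compare kernel output against host results.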
diff --git a/libclc/test/as_type.cl b/libclc/test/as_type.cl
new file mode 100644
index 000000000000..e8fb1228d28d
--- /dev/null
+++ b/libclc/test/as_type.cl
@@ -0,0 +1,3 @@
+__kernel void foo(int4 *x, float4 *y) {
+  *x = as_int4(*y);
+}
diff --git a/libclc/test/convert.cl b/libclc/test/convert.cl
new file mode 100644
index 000000000000..928fc326b6a1
--- /dev/null
+++ b/libclc/test/convert.cl
@@ -0,0 +1,3 @@
+__kernel void foo(int4 *x, float4 *y) {
+  *x = convert_int4(*y);
+}
diff --git a/libclc/test/cos.cl b/libclc/test/cos.cl
new file mode 100644
index 000000000000..4230eb2a0e93
--- /dev/null
+++ b/libclc/test/cos.cl
@@ -0,0 +1,3 @@
+__kernel void foo(float4 *f) {
+  *f = cos(*f);
+}
diff --git a/libclc/test/cross.cl b/libclc/test/cross.cl
new file mode 100644
index 000000000000..08955cbd9af5
--- /dev/null
+++ b/libclc/test/cross.cl
@@ -0,0 +1,3 @@
+__kernel void foo(float4 *f) {
+  *f = cross(f[0], f[1]);
+}
diff --git a/libclc/test/fabs.cl b/libclc/test/fabs.cl
new file mode 100644
index 000000000000..91d42c466676
--- /dev/null
+++ b/libclc/test/fabs.cl
@@ -0,0 +1,3 @@
+__kernel void foo(float *f) {
+  *f = fabs(*f);
+}
diff --git a/libclc/test/get_group_id.cl b/libclc/test/get_group_id.cl
new file mode 100644
index 000000000000..43725cda8027
--- /dev/null
+++ b/libclc/test/get_group_id.cl
@@ -0,0 +1,3 @@
+__kernel void foo(int *i) {
+  i[get_group_id(0)] = 1;
+}
diff --git a/libclc/test/rsqrt.cl b/libclc/test/rsqrt.cl
new file mode 100644
index 000000000000..13ad216b79f4
--- /dev/null
+++ b/libclc/test/rsqrt.cl
@@ -0,0 +1,6 @@
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+__kernel void foo(float4 *x, double4 *y) {
+  x[1] = rsqrt(x[0]);
+  y[1] = rsqrt(y[0]);
+}
diff --git a/libclc/test/subsat.cl b/libclc/test/subsat.cl
new file mode 100644
index 000000000000..a83414b4dc85
--- /dev/null
+++ b/libclc/test/subsat.cl
@@ -0,0 +1,19 @@
+__kernel void test_subsat_char(char *a, char x, char y) {
+  *a = sub_sat(x, y);
+  return;
+}
+
+__kernel void test_subsat_uchar(uchar *a, uchar x, uchar y) {
+  *a = sub_sat(x, y);
+  return;
+}
+
+__kernel void test_subsat_long(long *a, long x, long y) {
+  *a = sub_sat(x, y);
+  return;
+}
+
+__kernel void test_subsat_ulong(ulong *a, ulong x, ulong y) {
+  *a = sub_sat(x, y);
+  return;
+}
+\ No newline at end of file
diff --git a/libclc/utils/prepare-builtins.cpp b/libclc/utils/prepare-builtins.cpp
new file mode 100644
index 000000000000..ee51edfee01f
--- /dev/null
+++ b/libclc/utils/prepare-builtins.cpp
@@ -0,0 +1,137 @@
+#include "llvm/Bitcode/ReaderWriter.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/ManagedStatic.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/ErrorOr.h"
+#include "llvm/Support/ToolOutputFile.h"
+#include "llvm/Config/llvm-config.h"
+
+#define LLVM_360_AND_NEWER \
+  (LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 6))
+
+#define LLVM_350_AND_NEWER \
+  (LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 5))
+
+#if LLVM_350_AND_NEWER
+#include <system_error>
+
+#define ERROR_CODE std::error_code
+#define UNIQUE_PTR std::unique_ptr
+#else
+#include "llvm/ADT/OwningPtr.h"
+#include "llvm/Support/system_error.h"
+
+#define ERROR_CODE error_code
+#define UNIQUE_PTR OwningPtr
+#endif
+
+using namespace llvm;
+
+static cl::opt<std::string>
+InputFilename(cl::Positional, cl::desc("<input bitcode>"), cl::init("-"));
+
+static cl::opt<std::string>
+OutputFilename("o", cl::desc("Output filename"),
+               cl::value_desc("filename"));
+
+int main(int argc, char **argv) {
+  LLVMContext &Context = getGlobalContext();
+  llvm_shutdown_obj Y;  // Call llvm_shutdown() on exit.
+
+  cl::ParseCommandLineOptions(argc, argv, "libclc builtin preparation tool\n");
+
+  std::string ErrorMessage;
+  std::auto_ptr<Module> M;
+
+  {
+#if LLVM_350_AND_NEWER
+    ErrorOr<std::unique_ptr<MemoryBuffer>> BufferOrErr =
+      MemoryBuffer::getFile(InputFilename);
+    std::unique_ptr<MemoryBuffer> &BufferPtr = BufferOrErr.get();
+    if (std::error_code  ec = BufferOrErr.getError())
+#else
+    UNIQUE_PTR<MemoryBuffer> BufferPtr;
+    if (ERROR_CODE ec = MemoryBuffer::getFileOrSTDIN(InputFilename, BufferPtr))
+#endif
+      ErrorMessage = ec.message();
+    else {
+#if LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR > 4)
+# if LLVM_360_AND_NEWER
+      ErrorOr<Module *> ModuleOrErr =
+	parseBitcodeFile(BufferPtr.get()->getMemBufferRef(), Context);
+# else
+      ErrorOr<Module *> ModuleOrErr = parseBitcodeFile(BufferPtr.get(), Context);
+# endif
+      if (ERROR_CODE ec = ModuleOrErr.getError())
+        ErrorMessage = ec.message();
+      else M.reset(ModuleOrErr.get()); // take the module only when parsing succeeded
+#else
+      M.reset(ParseBitcodeFile(BufferPtr.get(), Context, &ErrorMessage));
+#endif
+    }
+  }
+
+  if (M.get() == 0) {
+    errs() << argv[0] << ": ";
+    if (ErrorMessage.size())
+      errs() << ErrorMessage << "\n";
+    else
+      errs() << "bitcode didn't read correctly.\n";
+    return 1;
+  }
+
+  // Set linkage of every external definition to linkonce_odr.
+  for (Module::iterator i = M->begin(), e = M->end(); i != e; ++i) {
+    if (!i->isDeclaration() && i->getLinkage() == GlobalValue::ExternalLinkage)
+      i->setLinkage(GlobalValue::LinkOnceODRLinkage);
+  }
+
+  for (Module::global_iterator i = M->global_begin(), e = M->global_end();
+       i != e; ++i) {
+    if (!i->isDeclaration() && i->getLinkage() == GlobalValue::ExternalLinkage)
+      i->setLinkage(GlobalValue::LinkOnceODRLinkage);
+  }
+
+  if (OutputFilename.empty()) {
+    errs() << "no output file\n";
+    return 1;
+  }
+
+#if LLVM_360_AND_NEWER
+  std::error_code EC;
+  UNIQUE_PTR<tool_output_file> Out
+  (new tool_output_file(OutputFilename, EC, sys::fs::F_None));
+  if (EC) {
+    errs() << EC.message() << '\n';
+    exit(1);
+  }
+#else
+  std::string ErrorInfo;
+  UNIQUE_PTR<tool_output_file> Out
+  (new tool_output_file(OutputFilename.c_str(), ErrorInfo,
+#if (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR == 4)
+                        sys::fs::F_Binary));
+#elif LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 5)
+                        sys::fs::F_None));
+#else
+                        raw_fd_ostream::F_Binary));
+#endif
+  if (!ErrorInfo.empty()) {
+    errs() << ErrorInfo << '\n';
+    exit(1);
+  }
+#endif // LLVM_360_AND_NEWER
+
+  WriteBitcodeToFile(M.get(), Out->os());
+
+  // Declare success.
+  Out->keep();
+  return 0;
+}
+
diff --git a/libclc/www/index.html b/libclc/www/index.html
new file mode 100644
index 000000000000..bbd0dc8fcede
--- /dev/null
+++ b/libclc/www/index.html
@@ -0,0 +1,55 @@
+<html>
+<head>
+<title>libclc</title>
+</head>
+<body>
+<h1>libclc</h1>
+<p>
+libclc is an open source, BSD/MIT dual licensed
+implementation of the library requirements of the
+OpenCL C programming language, as specified by the <a
+href="http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf">OpenCL
+1.1 Specification</a>.  The following sections of the specification
+impose library requirements:
+<ul>
+<li>6.1: Supported Data Types
+<li>6.2.3: Explicit Conversions
+<li>6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
+<li>6.9: Preprocessor Directives and Macros
+<li>6.11: Built-in Functions
+<li>9.3: Double Precision Floating-Point
+<li>9.4: 64-bit Atomics
+<li>9.5: Writing to 3D image memory objects
+<li>9.6: Half Precision Floating-Point
+</ul>
+</p>
+
+<p>
+libclc is intended to be used with the <a href="http://clang.llvm.org/">Clang</a>
+compiler's OpenCL frontend.
+</p>
+
+<p>
+libclc is designed to be portable and extensible.  To this end,
+it provides generic implementations of most library requirements,
+allowing the target to override the generic implementation at the
+granularity of individual functions.
+</p>
+
+<p>
+libclc currently only supports the PTX target, but support for more
+targets is welcome.
+</p>
+
+<h2>Download</h2>
+
+<tt>svn checkout http://llvm.org/svn/llvm-project/libclc/trunk libclc</tt> (<a href="http://llvm.org/viewvc/llvm-project/libclc/trunk/">ViewVC</a>)
+<br>- or -<br>
+<tt>git clone http://llvm.org/git/libclc.git</tt>
+
+<h2>Mailing List</h2>
+
+libclc-dev@pcc.me.uk (<a href="http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev">subscribe/unsubscribe</a>, <a href="http://www.pcc.me.uk/pipermail/libclc-dev/">archives</a>)
+
+</body>
+</html>