%
% (c) The AQUA Project, Glasgow University, 1993-1998
%

\begin{code}
module AsmCodeGen ( nativeCodeGen ) where

#include "HsVersions.h"
#include "nativeGen/NCG.h"

import IO		( Handle )
import List		( intersperse )

import MachMisc
import MachRegs
import MachCode
import PprMach

import AbsCStixGen	( genCodeAbstractC )
import AbsCSyn		( AbstractC, MagicId )
import AsmRegAlloc	( runRegAllocate )
import PrimOp		( commutableOp, PrimOp(..) )
import RegAllocInfo	( mkMRegsState, MRegsState, findReservedRegs )
import Stix		( StixTree(..), StixReg(..), 
                          pprStixTrees, pprStixTree, CodeSegment(..),
                          stixCountTempUses, stixSubst,
                          NatM, initNat, mapNat,
                          NatM_State, mkNatM_State,
                          uniqOfNatM_State, deltaOfNatM_State )
import PrimRep		( isFloatingRep, PrimRep(..) )
import UniqSupply	( returnUs, thenUs, mapUs, initUs, 
                          initUs_, UniqSM, UniqSupply )
import MachMisc		( IF_ARCH_i386(i386_insert_ffrees,) )

import OrdList		( fromOL, concatOL )
import Outputable	

\end{code}

The 96/03 native-code generator has machine-independent and
machine-dependent modules (those \tr{#include}'ing \tr{NCG.h}).

This module (@AsmCodeGen@) is the top-level machine-independent
module.  It uses @AbsCStixGen.genCodeAbstractC@ to produce @StixTree@s
(defined in module @Stix@), using support code from @StixInfo@ (info
tables), @StixPrim@ (primitive operations), @StixMacro@ (Abstract C
macros), and @StixInteger@ (GMP arbitrary-precision operations).

Before entering machine-dependent land, we do some machine-independent
@genericOpt@imisations (defined below) on the @StixTree@s.

We convert to the machine-specific @Instr@ datatype with
@stmt2Instrs@, assuming an ``infinite'' supply of registers.  We then
use a machine-independent register allocator (@runRegAllocate@) to
rejoin reality.  Obviously, @runRegAllocate@ has machine-specific
helper functions (see about @RegAllocInfo@ below).

The machine-dependent bits break down as follows:
\begin{description}
\item[@MachRegs@:]  Everything about the target platform's machine
    registers (and immediate operands, and addresses, which tend to
    intermingle/interact with registers).

\item[@MachMisc@:]  Includes the @Instr@ datatype (possibly should
    have a module of its own), plus a miscellany of other things
    (e.g., @targetDoubleSize@, @smStablePtrTable@, ...)

\item[@MachCode@:]  @stmt2Instrs@ is where @Stix@ stuff turns into
    machine instructions.

\item[@PprMach@:] @pprInstr@ turns an @Instr@ into text (well, really
    an @Doc@).

\item[@RegAllocInfo@:] In the register allocator, we manipulate
    @MRegsState@s, which are @BitSet@s, one bit per machine register.
    When we want to say something about a specific machine register
    (e.g., ``it gets clobbered by this instruction''), we set/unset
    its bit.  Obviously, we do this @BitSet@ thing for efficiency
    reasons.

    The @RegAllocInfo@ module collects together the machine-specific
    info needed to do register allocation.
\end{description}

So, here we go:

\begin{code}
nativeCodeGen :: AbstractC -> UniqSupply -> (SDoc, SDoc)
nativeCodeGen absC us
   = let (stixRaw, us1) = initUs us (genCodeAbstractC absC)
         stixOpt        = map genericOpt stixRaw
         insns          = initUs_ us1 (codeGen stixOpt)
         debug_stix     = vcat (map pprStixTrees stixOpt)
     in {- trace "nativeGen: begin" -} (debug_stix, insns)
\end{code}

@codeGen@ is the top-level code-generation function:
\begin{code}
codeGen :: [[StixTree]] -> UniqSM SDoc

codeGen stixFinal
  = mapUs genMachCode stixFinal	`thenUs` \ dynamic_codes ->
    let
        fp_kludge :: [Instr] -> [Instr]
        fp_kludge = IF_ARCH_i386(i386_insert_ffrees,id)

        static_instrss :: [[Instr]]
	static_instrss = map fp_kludge (scheduleMachCode dynamic_codes)
        docs           = map (vcat . map pprInstr) static_instrss

        -- for debugging only
        docs_prealloc  = map (vcat . map pprInstr . fromOL) 
                             dynamic_codes
        text_prealloc  = vcat (intersperse (char ' ' $$ char ' ') docs_prealloc)
    in
    --trace (showSDoc text_prealloc) (
    returnUs (vcat (intersperse (char ' ' 
                                 $$ ptext SLIT("# ___stg_split_marker")
                                 $$ char ' ') 
                    docs))
    --)
\end{code}

Top level code generator for a chunk of stix code.  For this part of
the computation, we switch from the UniqSM monad to the NatM monad.
The latter carries not only a Unique, but also an Int denoting the
current C stack pointer offset in the generated code; this is needed
for creating correct spill offsets on architectures which don't offer,
or for which it would be prohibitively expensive to employ, a frame
pointer register.  Viz, x86.

The offset is measured in bytes, and indicates the difference between
the current (simulated) C stack-ptr and the value it was at the
beginning of the block.  For stacks which grow down, this value should
be either zero or negative.

Switching between the two monads whilst carrying along the same Unique
supply breaks abstraction.  Is that bad?

\begin{code}
genMachCode :: [StixTree] -> UniqSM InstrBlock

genMachCode stmts initial_us
  = let initial_st         = mkNatM_State initial_us 0
        (blocks, final_st) = initNat initial_st 
                                     (mapNat stmt2Instrs stmts)
        instr_list         = concatOL blocks
        final_us           = uniqOfNatM_State final_st
        final_delta        = deltaOfNatM_State final_st
    in
        if   final_delta == 0
        then (instr_list, final_us)
        else pprPanic "genMachCode: nonzero final delta"
                      (int final_delta)
\end{code}

The next bit does the code scheduling.  The scheduler must also deal
with register allocation of temporaries.  Much parallelism can be
exposed via the OrdList, but more might occur, so further analysis
might be needed.

\begin{code}
scheduleMachCode :: [InstrBlock] -> [[Instr]]

scheduleMachCode
  = map (runRegAllocate freeRegsState findReservedRegs)
  where
    freeRegsState = mkMRegsState (extractMappedRegNos freeRegs)
\end{code}

%************************************************************************
%*									*
\subsection[NCOpt]{The Generic Optimiser}
%*									*
%************************************************************************

This is called between translating Abstract C to its Tree and actually
using the Native Code Generator to generate the annotations.  It's a
chance to do some strength reductions.

** Remember these all have to be machine independent ***

Note that constant-folding should have already happened, but we might
have introduced some new opportunities for constant-folding wrt
address manipulations.

\begin{code}
genericOpt :: [StixTree] -> [StixTree]
genericOpt = map stixConFold . stixPeep


stixPeep :: [StixTree] -> [StixTree]

-- This transformation assumes that the temp assigned to in t1
-- is not assigned to in t2; for otherwise the target of the
-- second assignment would be substituted for, giving nonsense
-- code.  As far as I can see, StixTemps are only ever assigned
-- to once.  It would be nice to be sure!
{-
stixPeep ( t1@(StAssign pka (StReg (StixTemp u pk)) rhs)
         : t2
         : ts )
   | stixCountTempUses u t2 == 1
     && sum (map (stixCountTempUses u) ts) == 0
   = trace ("nativeGen: stixInline: " ++ showSDoc (pprStixTree rhs))
           (stixPeep (stixSubst u rhs t2 : ts))

stixPeep (t1:t2:ts) = t1 : stixPeep (t2:ts)
stixPeep [t1]       = [t1]
stixPeep []         = []
-}

-- disable stix inlining until we figure out how to fix the
-- latent bugs in the register allocator which are exposed by
-- the inliner.
stixPeep = id
\end{code}

For most nodes, just optimize the children.

\begin{code}
stixConFold :: StixTree -> StixTree

stixConFold (StInd pk addr) = StInd pk (stixConFold addr)

stixConFold (StAssign pk dst src)
  = StAssign pk (stixConFold dst) (stixConFold src)

stixConFold (StJump addr) = StJump (stixConFold addr)

stixConFold (StCondJump addr test)
  = StCondJump addr (stixConFold test)

stixConFold (StCall fn cconv pk args)
  = StCall fn cconv pk (map stixConFold args)
\end{code}

Fold indices together when the types match:
\begin{code}
stixConFold (StIndex pk (StIndex pk' base off) off')
  | pk == pk'
  = StIndex pk (stixConFold base)
    	       (stixConFold (StPrim IntAddOp [off, off']))

stixConFold (StIndex pk base off)
  = StIndex pk (stixConFold base) (stixConFold off)
\end{code}

For PrimOps, we first optimize the children, and then we try our hand
at some constant-folding.

\begin{code}
stixConFold (StPrim op args) = stixPrimFold op (map stixConFold args)
\end{code}

Replace register leaves with appropriate StixTrees for the given
target.

\begin{code}
stixConFold leaf@(StReg (StixMagicId id))
  = case (stgReg id) of
    	Always tree -> stixConFold tree
    	Save _      -> leaf

stixConFold other = other
\end{code}

Now, try to constant-fold the PrimOps.  The arguments have already
been optimized and folded.

\begin{code}
stixPrimFold
    :: PrimOp	    	-- The operation from an StPrim
    -> [StixTree]   	-- The optimized arguments
    -> StixTree

stixPrimFold op arg@[StInt x]
  = case op of
    	IntNegOp -> StInt (-x)
    	_ -> StPrim op arg

stixPrimFold op args@[StInt x, StInt y]
  = case op of
    	CharGtOp -> StInt (if x > y  then 1 else 0)
    	CharGeOp -> StInt (if x >= y then 1 else 0)
    	CharEqOp -> StInt (if x == y then 1 else 0)
    	CharNeOp -> StInt (if x /= y then 1 else 0)
    	CharLtOp -> StInt (if x < y  then 1 else 0)
    	CharLeOp -> StInt (if x <= y then 1 else 0)
    	IntAddOp -> StInt (x + y)
    	IntSubOp -> StInt (x - y)
    	IntMulOp -> StInt (x * y)
    	IntQuotOp -> StInt (x `quot` y)
    	IntRemOp -> StInt (x `rem` y)
    	IntGtOp -> StInt (if x > y  then 1 else 0)
    	IntGeOp -> StInt (if x >= y then 1 else 0)
    	IntEqOp -> StInt (if x == y then 1 else 0)
    	IntNeOp -> StInt (if x /= y then 1 else 0)
    	IntLtOp -> StInt (if x < y  then 1 else 0)
    	IntLeOp -> StInt (if x <= y then 1 else 0)
	-- ToDo: WordQuotOp, WordRemOp.
    	_ -> StPrim op args
\end{code}

When possible, shift the constants to the right-hand side, so that we
can match for strength reductions.  Note that the code generator will
also assume that constants have been shifted to the right when
possible.

\begin{code}
stixPrimFold op [x@(StInt _), y] | commutableOp op = stixPrimFold op [y, x]
\end{code}

We can often do something with constants of 0 and 1 ...

\begin{code}
stixPrimFold op args@[x, y@(StInt 0)]
  = case op of
    	IntAddOp -> x
    	IntSubOp -> x
    	IntMulOp -> y
    	AndOp  	 -> y
    	OrOp   	 -> x
    	XorOp  	 -> x
    	SllOp  	 -> x
    	SrlOp  	 -> x
    	ISllOp 	 -> x
    	ISraOp 	 -> x
    	ISrlOp 	 -> x
        IntNeOp  | is_comparison -> x
    	_	 -> StPrim op args
    where
       is_comparison
          = case x of
               StPrim opp [_, _] -> opp `elem` comparison_ops
               _                 -> False

stixPrimFold op args@[x, y@(StInt 1)]
  = case op of
    	IntMulOp  -> x
    	IntQuotOp -> x
    	IntRemOp  -> StInt 0
    	_	  -> StPrim op args
\end{code}

Now look for multiplication/division by powers of 2 (integers).

\begin{code}
stixPrimFold op args@[x, y@(StInt n)]
  = case op of
    	IntMulOp -> case exactLog2 n of
	    Nothing -> StPrim op args
    	    Just p  -> StPrim ISllOp [x, StInt p]
    	IntQuotOp -> case exactLog2 n of
	    Nothing -> StPrim op args
    	    Just p  -> StPrim ISrlOp [x, StInt p]
    	_ -> StPrim op args
\end{code}

Anything else is just too hard.

\begin{code}
stixPrimFold op args = StPrim op args
\end{code}

\begin{code}
comparison_ops
   = [ CharGtOp  , CharGeOp  , CharEqOp  , CharNeOp  , CharLtOp  , CharLeOp,
       IntGtOp   , IntGeOp   , IntEqOp   , IntNeOp   , IntLtOp   , IntLeOp,
       WordGtOp  , WordGeOp  , WordEqOp  , WordNeOp  , WordLtOp  , WordLeOp,
       AddrGtOp  , AddrGeOp  , AddrEqOp  , AddrNeOp  , AddrLtOp  , AddrLeOp,
       FloatGtOp , FloatGeOp , FloatEqOp , FloatNeOp , FloatLtOp , FloatLeOp,
       DoubleGtOp, DoubleGeOp, DoubleEqOp, DoubleNeOp, DoubleLtOp, DoubleLeOp
     ]
\end{code}