From 0831a12ea2fc73c33652eeec1adc79fa19700578 Mon Sep 17 00:00:00 2001
From: Simon Peyton Jones <simonpj@microsoft.com>
Date: Thu, 17 Jan 2013 10:54:07 +0000
Subject: Major patch to implement the new Demand Analyser

This patch is the result of Ilya Sergey's internship at MSR.  It
constitutes a thorough overhaul and simplification of the demand
analyser.  It makes a solid foundation on which we can now build.
Main changes are

* Instead of having one combined type for Demand, a Demand is
   now a pair (JointDmd) of
      - a StrDmd and
      - an AbsDmd.
   This allows strictness and absence to be though about quite
   orthogonally, and greatly reduces brain melt-down.

* Similarly in the DmdResult type, it's a pair of
     - a PureResult (indicating only divergence/non-divergence)
     - a CPRResult (which deals only with the CPR property

* In IdInfo, the
    strictnessInfo field contains a StrictSig, not a Maybe StrictSig
    demandInfo     field contains a Demand, not a Maybe Demand
  We don't need Nothing (to indicate no strictness/demand info)
  any more; topSig/topDmd will do.

* Remove "boxity" analysis entirely.  This was an attempt to
  avoid "reboxing", but it added complexity, is extremely
  ad-hoc, and makes very little difference in practice.

* Remove the "unboxing strategy" computation. This was an an
  attempt to ensure that a worker didn't get zillions of
  arguments by unboxing big tuples.  But in fact removing it
  DRAMATICALLY reduces allocation in an inner loop of the
  I/O library (where the threshold argument-count had been
  set just too low).  It's exceptional to have a zillion arguments
  and I don't think it's worth the complexity, especially since
  it turned out to have a serious performance hit.

* Remove quite a bit of ad-hoc cruft

* Move worthSplittingFun, worthSplittingThunk from WorkWrap to
  Demand. This allows JointDmd to be fully abstract, examined
  only inside Demand.

Everything else really follows from these changes.

All of this is really just refactoring, so we don't expect
big performance changes, but acutally the numbers look quite
good.  Here is a full nofib run with some highlights identified:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
         expert          -2.6%    -15.5%      0.00      0.00     +0.0%
          fluid          -2.4%     -7.1%      0.01      0.01     +0.0%
             gg          -2.5%    -28.9%      0.02      0.02    -33.3%
      integrate          -2.6%     +3.2%     +2.6%     +2.6%     +0.0%
        mandel2          -2.6%     +4.2%      0.01      0.01     +0.0%
       nucleic2          -2.0%    -16.3%      0.11      0.11     +0.0%
           para          -2.6%    -20.0%    -11.8%    -11.7%     +0.0%
         parser          -2.5%    -17.9%      0.05      0.05     +0.0%
         prolog          -2.6%    -13.0%      0.00      0.00     +0.0%
         puzzle          -2.6%     +2.2%     +0.8%     +0.8%     +0.0%
        sorting          -2.6%    -35.9%      0.00      0.00     +0.0%
       treejoin          -2.6%    -52.2%     -9.8%     -9.9%     +0.0%
--------------------------------------------------------------------------------
            Min          -2.7%    -52.2%    -11.8%    -11.7%    -33.3%
            Max          -1.8%     +4.2%    +10.5%    +10.5%     +7.7%
 Geometric Mean          -2.5%     -2.8%     -0.4%     -0.5%     -0.4%

Things to note

* Binary sizes are smaller. I don't know why, but it's good.

* Allocation is sometiemes a *lot* smaller. I believe that all the big numbers
  (I checked treejoin, gg, sorting) arise from one place, namely a function
  GHC.IO.Encoding.UTF8.utf8_decode, which is strict in two Buffers both of
  which have several arugments.  Not w/w'ing both arguments (which is what
  we did before) has a big effect.  So the big win in actually somewhat
  accidental, gained by removing the "unboxing strategy" code.

* A couple of benchmarks allocate slightly more.  This turns out
  to be due to reboxing (integrate).  But the biggest increase is
  mandel2, and *that* turned out also to be a somewhat accidental
  loss of CSE, and pointed the way to doing better CSE: see Trac
  #7596.

* Runtimes are never very reliable, but seem to improve very slightly.

All in all, a good piece of work.  Thank you Ilya!
---
 compiler/iface/TcIface.lhs | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

(limited to 'compiler/iface/TcIface.lhs')

diff --git a/compiler/iface/TcIface.lhs b/compiler/iface/TcIface.lhs
index d6acc06688..930bb1e2a2 100644
--- a/compiler/iface/TcIface.lhs
+++ b/compiler/iface/TcIface.lhs
@@ -31,8 +31,8 @@ import CoreSyn
 import CoreUtils
 import CoreUnfold
 import CoreLint
-import WorkWrap
-import MkCore( castBottomExpr )
+import WorkWrap                     ( mkWrapper )
+import MkCore                       ( castBottomExpr )
 import Id
 import MkId
 import IdInfo
@@ -1205,7 +1205,7 @@ tcIdInfo ignore_prags name ty info
     tcPrag :: IdInfo -> IfaceInfoItem -> IfL IdInfo
     tcPrag info HsNoCafRefs        = return (info `setCafInfo`   NoCafRefs)
     tcPrag info (HsArity arity)    = return (info `setArityInfo` arity)
-    tcPrag info (HsStrictness str) = return (info `setStrictnessInfo` Just str)
+    tcPrag info (HsStrictness str) = return (info `setStrictnessInfo` str)
     tcPrag info (HsInline prag)    = return (info `setInlinePragInfo` prag)
 
         -- The next two are lazy, so they don't transitively suck stuff in
@@ -1226,12 +1226,11 @@ tcUnfolding name _ info (IfCoreUnfold stable if_expr)
                     Nothing   -> NoUnfolding
                     Just expr -> mkUnfolding dflags unf_src
                                              True {- Top level -} 
-                                             is_bottoming expr) }
+                                             is_bottoming
+                                             expr) }
   where
      -- Strictness should occur before unfolding!
-    is_bottoming = case strictnessInfo info of
-                     Just sig -> isBottomingSig sig
-                     Nothing  -> False
+    is_bottoming = isBottomingSig $ strictnessInfo info              
 
 tcUnfolding name _ _ (IfCompulsory if_expr)
   = do  { mb_expr <- tcPragExpr name if_expr
@@ -1278,12 +1277,9 @@ tcIfaceWrapper name ty info arity get_worker
         = mkWwInlineRule wkr_id
                          (initUs_ us (mkWrapper dflags ty strict_sig) wkr_id) 
                          arity
-
-        -- Again we rely here on strictness info always appearing 
-        -- before unfolding
-    strict_sig = case strictnessInfo info of
-                   Just sig -> sig
-                   Nothing  -> pprPanic "Worker info but no strictness for" (ppr name)
+        -- Again we rely here on strictness info 
+        -- always appearing before unfolding
+    strict_sig = strictnessInfo info
 \end{code}
 
 For unfoldings we try to do the job lazily, so that we never type check
-- 
cgit v1.2.1