Safe Haskell

Safe Haskell safe haskell Safe Haskell is an extension to the Haskell language that is implemented in GHC as of version 7.2. It allows for unsafe code to be securely included in a trusted code base by restricting the features of GHC Haskell the code is allowed to use. Put simply, it makes the types of programs trustable. Safe Haskell is aimed to be as minimal as possible while still providing strong enough guarantees about compiled Haskell code for more advance secure systems to be built on top of it. While this is the use case that Safe Haskell was motivated by it is important to understand that what Safe Haskell is tracking and enforcing is a stricter form of type safety than is usually guaranteed in Haskell. As part of this, Safe Haskell is run during every compilation of GHC, tracking safety and inferring it even for modules that don't explicitly use Safe Haskell. Please refer to section for more details of this. This also means that there are some design choices that from a security point of view may seem strange but when thought of from the angle of tracking type safety are logical. Feedback on the current design and this tension between the security and type safety view points is welcome. The design of Safe Haskell covers the following aspects: A safe language dialect of Haskell that provides guarantees about the code. It allows types and module boundaries to be trusted. A safe import extension that specifies that the module being imported must be trusted. A definition of trust (or safety) and how it operates, along with ways of defining and changing the trust of modules and packages. Safe Haskell, however, does not offer compilation safety. During compilation time it is possible for arbitrary processes to be launched, using for example the custom pre-processor flag. This can be manipulated to either compromise a users system at compilation time, or to modify the source code just before compilation to try to alter set Safe Haskell flags. This is discussed further in section . Uses of Safe Haskell safe haskell uses Safe Haskell has been designed with two use cases in mind: Enforcing strict type safety at compile time Compiling and executing untrusted code Strict type-safety (good style) Haskell offers a powerful type system and separation of pure and effectual functions through the IO monad. There are several loop holes in the type system though, the most obvious offender being the unsafePerformIO :: IO a -> a function. The safe language dialect of Safe Haskell disallows the use of such functions. This can be useful for a variety of purposes as it makes Haskell code easier to analyze and reason about. It also codifies an existing culture in the Haskell community of trying to avoid using such unsafe functions unless absolutely necessary. As such using the safe language (through the flag) can be thought of as a way of enforcing good style, similar to the function of . Building secure systems (restricted IO Monads) secure haskell Systems such as information flow control security, capability based security systems and DSLs for working with encrypted data.. etc can be built in the Haskell language simply as a library. However they require guarantees about the properties of the Haskell language that aren't true in the general case where uses of functions like unsafePerformIO are allowed. Safe Haskell is designed to give users enough guarantees about the safety properties of compiled code so that such secure systems can be built. As an example lets define an interface for a plugin system where the plugin authors are untrusted, possibly malicious third-parties. We do this by restricting the plugin interface to pure functions or to a restricted IO monad that we have defined that only allows a safe subset of IO actions to be executed. We define the plugin interface here so that it requires the plugin module, Danger, to export a single computation, Danger.runMe, of type RIO (), where RIO is a new monad defined as follows: -- Either of the following Safe Haskell pragmas would do {-# LANGUAGE Trustworthy #-} {-# LANGUAGE Safe #-} module RIO (RIO(), runRIO, rioReadFile, rioWriteFile) where -- Notice that symbol UnsafeRIO is not exported from this module! newtype RIO a = UnsafeRIO { runRIO :: IO a } instance Monad RIO where return = UnsafeRIO . return (UnsafeRIO m) >>= k = UnsafeRIO $ m >>= runRIO . k -- Returns True iff access is allowed to file name pathOK :: FilePath -> IO Bool pathOK file = {- Implement some policy based on file name -} rioReadFile :: FilePath -> RIO String rioReadFile file = UnsafeRIO $ do ok <- pathOK file if ok then readFile file else return "" rioWriteFile :: FilePath -> String -> RIO () rioWriteFile file contents = UnsafeRIO $ do ok <- pathOK file if ok then writeFile file contents else return () We compile Danger using the new Safe Haskell flag: {-# LANGUAGE Safe #-} module Danger ( runMe ) where runMe :: RIO () runMe = ... Before going into the Safe Haskell details, lets point out some of the reasons this design would fail without Safe Haskell: The design attempts to restrict the operations that Danger can perform by using types, specifically the RIO type wrapper around IO. The author of Danger can subvert this though by simply writing arbitrary IO actions and using unsafePerformIO :: IO a -> a to execute them as pure functions. The design also relies on the Danger module not being able to access the UnsafeRIO constructor. Unfortunately Template Haskell can be used to subvert module boundaries and so could be used to gain access to this constructor. There is no way to place restrictions on the modules that the untrusted Danger module can import. This gives the author of Danger a very large attack surface, essentially any package currently installed on the system. Should any of these packages have a vulnerability then the Danger module can exploit this. The only way to stop this would be to patch or remove packages with known vulnerabilities even if they should only be used by trusted code such as the RIO module. To stop these attacks Safe Haskell can be used. This is done by compiling the RIO module with the flag and compiling the Danger module with the flag. The use of the flag to compile the Danger module restricts the features of Haskell that can be used to a safe subset. This includes disallowing unsafePerfromIO, Template Haskell, pure FFI functions, Generalized Newtype Deriving, RULES and restricting the operation of Overlapping Instances. The flag also restricts the modules can be imported by Danger to only those that are considered trusted. Trusted modules are those compiled with , where GHC provides a mechanical guarantee that the code is safe. Or those modules compiled with , where the module author claims that the module is Safe. This is why the RIO module is compiled with , to allow the Danger module to import it. The flag doesn't place any restrictions on the module like does. Instead the module author claims that while code may use unsafe features internally, it only exposes an API that can used in a safe manner. The use of by itself marks the module as trusted. There is an issue here as may be used by an arbitrary module and module author. To control the use of trustworthy modules it is recommended to use the flag. This flag adds an extra requirement to the trust check for trustworthy modules, such that for trustworthy modules to be considered trusted, and allowed to be used in compiled code, the client C compiling the code must tell GHC that they trust the package the trustworthy module resides in. This is essentially a way of for C to say, while this package contains trustworthy modules that can be used by untrusted modules compiled with , I trust the author(s) of this package and trust the modules only expose a safe API. The trust of a package can be changed at any time, so if a vulnerability found in a package, C can declare that package untrusted so that any future compilation against that package would fail. For a more detailed overview of this mechanism see . In the example, Danger can import module RIO because RIO is marked trustworthy. Thus, Danger can make use of the rioReadFile and rioWriteFile functions to access permitted file names. The main application then imports both RIO and Danger. To run the plugin, it calls RIO.runRIO Danger.runMe within the IO monad. The application is safe in the knowledge that the only IO to ensue will be to files whose paths were approved by the pathOK test. Safe Language safe language The Safe Haskell safe language guarantees the following properties: Referential transparency — Functions in the safe language are deterministic, evaluating them will not cause any side effects. Functions in the IO monad are still allowed and behave as usual. Any pure function though, as according to its type, is guaranteed to indeed be pure. This property allows a user of the safe language to trust the types. This means, for example, that the unsafePerformIO :: IO a -> a function is disallowed in the safe language. Module boundary control — Haskell code compiled using the safe language is guaranteed to only access symbols that are publicly available to it through other modules export lists. An important part of this is that safe compiled code is not able to examine or create data values using data constructors that it cannot import. If a module M establishes some invariants through careful use of its export list then code compiled using the safe language that imports M is guaranteed to respect those invariants. Because of this, Template Haskell and GeneralizedNewtypeDeriving are disabled in the safe language as they can be used to violate this property. Semantic consistency — The safe language is strictly a subset of Haskell as implemented by GHC. Any expression that compiles in the safe language has the same meaning as it does when compiled in normal Haskell. In addition, in any module that imports a safe language module, expressions that compile both with and without the safe import have the same meaning in both cases. That is, importing a module using the safe language cannot change the meaning of existing code that isn't dependent on that module. So for example, there are some restrictions placed on the Overlapping Instances extension as it can violate this property. These three properties guarantee that in the safe language you can trust the types, can trust that module export lists are respected and can trust that code that successfully compiles has the same meaning as it normally would. Lets now look at the details of the safe language. In the safe language dialect (enabled by ) we disable completely the following features: GeneralizedNewtypeDeriving — It can be used to violate constructor access control, by allowing untrusted code to manipulate protected data types in ways the data type author did not intend, breaking invariants they have established. TemplateHaskell — Is particularly dangerous, as it can cause side effects even at compilation time and can be used to access constructors of abstract data types. In the safe language dialect we restrict the following features: ForeignFunctionInterface — This is mostly safe, but foreign import declarations that import a function with a non-IO type are disallowed. All FFI imports must reside in the IO Monad. RULES — As they can change the behaviour of trusted code in unanticipated ways, violating semantic consistency, they are restricted in function. Specifically any RULES defined in a module M compiled with are dropped. RULES defined in trustworthy modules that M imports are still valid and will fire as usual. OverlappingInstances — This extension can be used to violate semantic consistency, because malicious code could redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module. The extension is not disabled for a module M compiled with but restricted. While M can define overlapping instance declarations, they can only overlap other instance declaration defined in M. If in a module N that imports M, at a call site that uses a type-class function there is a choice of which instance to use (i.e. an overlap) and the most specific instances is from M, then all the other choices must also be from M. If not, a compilation error will occur. A simple way to think of this is a same origin policy for overlapping instances defined in Safe compiled modules. Data.Typeable — We restrict Typeable instances to only derived ones (offered by GHC through the extension). Hand crafted instances of the Typeable type class are not allowed in Safe Haskell as this can easily be abused to unsafely coerce between types. Safe Imports safe imports Safe Haskell enables a small extension to the usual import syntax of Haskell, adding a safe keyword: impdecl -> import [safe] [qualified] modid [as modid] [impspec] When used, the module being imported with the safe keyword must be a trusted module, otherwise a compilation error will occur. The safe import extension is enabled by either of the , , or flags and corresponding PRAGMA's. When the flag is used, the safe keyword is allowed but meaningless, every import is required to be safe regardless. Trust and Safe Haskell Modes safe haskell trust trust The Safe Haskell extension introduces the following three language flags: -XSafe — Enables the safe language dialect, asking GHC to guarantee trust. The safe language dialect requires that all imports be trusted or a compilation error will occur. -XTrustworthy — Means that while this module may invoke unsafe functions internally, the module's author claims that it exports an API that can't be used in an unsafe way. This doesn't enable the safe language or place any restrictions on the allowed Haskell code. The trust guarantee is provided by the module author, not GHC. An import statement with the safe keyword results in a compilation error if the imported module is not trusted. An import statement without the keyword behaves as usual and can import any module whether trusted or not. -XUnsafe — Marks the module being compiled as unsafe so that modules compiled using can't import it. The procedure to check if a module is trusted or not depends on if the flag is present. The check is very similar in both cases with the presence of the flag simply enabling an extra requirement for trustworthy modules to be regarded as trusted. Trust check (<option>-fpackage-trust</option> disabled) trust check A module M in a package P is trusted by a client C if and only if: Both of these hold: The module was compiled with All of M's direct imports are trusted by C OR all of these hold: The module was compiled with All of M's direct safe imports are trusted by C The above definition of trust has an issue. Any module can be compiled with -XTrustworthy and it will be trusted regardless of what it does. To control this there is an additional definition of package trust (enabled with the flag). The point of package trusts is to require that the client C explicitly say which packages are allowed to contain trustworthy modules. That is, C establishes that it trusts a package P and its author and so trust the modules in P that use . When package trust is enabled, any modules that are considered trustworthy but reside in a package that isn't trusted are not considered trusted. A more formal definition is given in the next section. Trust check (<option>-fpackage-trust</option> enabled) trust check -fpackage-trust When the flag is enabled, whether or not a module is trusted depends on a notion of trust for packages, which is determined by the client C invoking GHC (i.e. you). A package P is trusted when one of these hold: C's package database records that P is trusted (and no command-line arguments override this) C's command-line flags say to trust P regardless of what is recorded in the package database. In either case, C is the only authority on package trust. It is up to the client to decide which packages they trust. When the flag is used a module M from package P is trusted by a client C if and only if: Both of these hold: The module was compiled with All of M's direct imports are trusted by C OR all of these hold: The module was compiled with All of M's direct safe imports are trusted by C Package P is trusted by C For the first trust definition the trust guarantee is provided by GHC through the restrictions imposed by the safe language. For the second definition of trust, the guarantee is provided initially by the module author. The client C then establishes that they trust the module author by indicating they trust the package the module resides in. This trust chain is required as GHC provides no guarantee for -XTrustworthy compiled modules. The reason there are two modes of checking trust is that the extra requirement enabled by causes the design of Safe Haskell to be invasive. Packages using Safe Haskell when the flag is enabled may or may not compile depending on the state of trusted packages on a users machine. A maintainer of a package foo that uses Safe Haskell so that security conscious Haskellers can use foo now may have other users of foo who don't know or care about Safe Haskell complaining about compilation problems they are having with foobecause a package barthat foo requires, isn't trusted on their machine. In this sense, the flag can be thought of as a flag to properly turn on Safe Haskell while without it, it's operating in a covert fashion. Having the flag also nicely unifies the semantics of how Safe Haskell works when used explicitly and how modules are inferred as safe. Example Package Wuggle: {-# LANGUAGE Safe #-} module Buggle where import Prelude f x = ...blah... Package P: {-# LANGUAGE Trustworthy #-} module M where import System.IO.Unsafe import safe Buggle Suppose a client C decides to trust package P. Then does C trust module M? To decide, GHC must check M's imports — M imports System.IO.Unsafe. M was compiled with , so P's author takes responsibility for that import. C trusts P's author, so C trusts M to only use its unsafe imports in a safe and consistent manner with respect to the API M exposes. M also has a safe import of Buggle, so for this import P's author takes no responsibility for the safety, so GHC must check whether Buggle is trusted by C. Is it? Well, it is compiled with , so the code in Buggle itself is machine-checked to be OK, but again under the assumption that all of Buggle's imports are trusted by C. Prelude comes from base, which C trusts, and is compiled with (While Prelude is typically imported implicitly, it still obeys the same rules outlined here). So Buggle is considered trusted. Notice that C didn't need to trust package Wuggle; the machine checking is enough. C only needs to trust packages that contain modules. Trustworthy Requirements trustworthy Module authors using the language extension for a module M should ensure that M's public API (the symbols exposed by its export list) can't be used in an unsafe manner. This mean that symbols exported should respect type safety and referential transparency. Package Trust package trust Safe Haskell gives packages a new Boolean property, that of trust. Several new options are available at the GHC command-line to specify the trust property of packages: -trust P — Exposes package P if it was hidden and considers it a trusted package regardless of the package database. -distrust P — Exposes package P if it was hidden and considers it an untrusted package regardless of the package database. -distrust-all-packages — Considers all packages distrusted unless they are explicitly set to be trusted by subsequent command-line options. To set a package's trust property in the package database please refer to . Safe Haskell Inference safe inference In the case where a module is compiled without one of , or being used, GHC will try to figure out itself if the module can be considered safe. This safety inference will never mark a module as trustworthy, only as either unsafe or as safe. GHC uses a simple method to determine this for a module M: If M would compile without error under the flag, then M is marked as safe. If M would fail to compile under the flag, then it is marked as unsafe. When should you use Safe Haskell inference and when should you use an explicit flag? The later case should be used when you have a hard requirement that the module be safe. That is, the use cases outlined and the purpose for which Safe Haskell is intended: compiling untrusted code. Safe inference is meant to be used by ordinary Haskell programmers. Users who probably don't care about Safe Haskell. Say you are writing a Haskell library. Then you probably just want to use Safe inference. Assuming you avoid any unsafe features of the language then your modules will be marked safe. This is a benefit as now a user of your library who may want to use it as part of an API exposed to untrusted code can use the library without change. If there wasn't safety inference then either the writer of the library would have to explicitly use Safe Haskell, which is an unreasonable expectation of the whole Haskell community. Or the user of the library would have to wrap it in a shim that simply re-exported your API through a trustworthy module, an annoying practice. Safe Haskell Flag Summary safe haskell flags In summary, Safe Haskell consists of the following three language flags: -XSafe -XSafe To be trusted, all of the module's direct imports must be trusted, but the module itself need not reside in a trusted package, because the compiler vouches for its trustworthiness. The "safe" keyword is allowed but meaningless in import statements, every import is required to be safe regardless. Module Trusted — Yes Haskell Language — Restricted to Safe Language Imported Modules — All forced to be safe imports, all must be trusted. -XTrustworthy -XTrustworthy This establishes that the module is trusted, but the guarantee is provided by the module's author. A client of this module then specifies that they trust the module author by specifying they trust the package containing the module. has no effect on the accepted range of Haskell programs or their semantics, except that they allow the safe import keyword. Module Trusted — Yes. Module Trusted ( enabled) — Yes but only if the package the module resides in is also trusted. Haskell Language — Unrestricted Imported Modules — Under control of module author which ones must be trusted. -XUnsafe -XUnsafe Mark a module as unsafe so that it can't be imported by code compiled with . Also enable the Safe Import extension so that a module can require a dependency to be trusted. Module Trusted — No Haskell Language — Unrestricted Imported Modules — Under control of module author which ones must be trusted. And one general flag: -fpackage-trust -fpackage-trust When enabled turn on an extra check for a trustworthy module M, requiring that the package M resides in is considered trusted for the M to be considered trusted. And two warning flags: -fwarn-unsafe -fwarn-unsafe Issue a warning if the module being compiled is regarded to be unsafe. Should be used to check the safety status of modules when using safe inference. -fwarn-safe -fwarn-safe Issue a warning if the module being compiled is regarded to be safe. Should be used to check the safety status of modules when using safe inference. Safe Compilation safe compilation GHC includes a variety of flags that allow arbitrary processes to be run at compilation time. One such example is the custom pre-processor flag. Another is the ability of Template Haskell to execute Haskell code at compilation time, including IO actions. Safe Haskell does not address this danger (although, Template Haskell is a disallowed feature). Due to this, it is suggested that when compiling untrusted source code that has had no manual inspection done, the following precautions be taken: Compile in a sandbox, such as a chroot or similar container technology. Or simply as a user with very reduced system access. Compile untrusted code with the flag being specified on the command line. This will ensure that modifications to the source being compiled can't disable the use of the Safe Language as the command line flag takes precedence over a source level pragma. Ensure that all untrusted code is imported as a safe import and that the flag is used with packages from untrusted sources being marked as untrusted. There is a more detailed discussion of the issues involved in compilation safety and some potential solutions on the GHC Wiki.