Compiling and Linking
----

Assume we have:

- one or more source files, *.go, perhaps in different directories
- a compiler, C. It takes one .go file and generates a .o file.
- a linker, L. It takes one or more .o files and generates a go.out (!) file.

There is a question around naming of the files. Let's avoid that
problem for now and state that if the input is X.go, the output of
the compiler is X.o, ignoring the package declaration in the file.
This is not current behavior and probably not correct behavior, but
it keeps the exposition simpler.

Let's also assume that the linker knows about the run time and we
don't have to specify bootstrap and runtime linkage explicitly.


Basics
----

Given a single file, main.go, with no dependencies, we do:

	C main.go	# compile
	L main.o	# link
	go.out	# run

Now let's say that main.go contains

	import "fmt"

and that fmt.go contains

	import "sys"

Then to build, we must compile in dependency order:

	C sys.go
	C fmt.go
	C main.go

and then link

	L main.o fmt.o sys.o

To the linker itself, the order of arguments is unimportant.

When we compile fmt.go, we need to know the details of the functions
(etc.) exported by sys.go and used by fmt.go. When we run

	C fmt.go

it discovers the import of sys, and must then read sys.o to discover
the details. We must therefore compile the exporting source file
before we can compile the importing source. Moreover, if there is
a mismatch between export and import, we can discover it during
compilation of the importing source.

To be explicit, then, what we say is, in effect

	C sys.go
	C fmt.go sys.o
	C main.go fmt.o sys.o
	L main.o fmt.o sys.o


The contents of .o files (I)
----

It's necessary to include in fmt.o the information for linking
against the functions etc. in sys.o. It's also possible to identify
sys.o explicitly inside fmt.o, so we need to say only

	L main.o fmt.o

with sys.o discovered automatically.
Iterating again, it's easy to reduce the link step to

	L main.o

with L discovering automatically the .o files it needs to process
to create the final go.out.


Automation of dependencies (I)
----

It should be possible to automate discovery of the dependencies of
main.go and therefore the order necessary to compile. Since the
source files contain explicit import statements, it is possible,
given a source file, to discover the dependency tree automatically.
(This will require rules and/or conventions about where to find
things; for now assume everything is in the same directory.)

The program that does this might possibly be a variant of the
compiler, since it must parse import statements at least, but for
clarity let's call it D for dependency. It can be a little like
make, but let's not call it make because that brings along properties
we don't want. In particular, it reads the sources to discover the
dependencies; it doesn't need a separate description such as a
Makefile.

In a directory with the source files above, including main.go, but
with no .o files, we say:

	D main.go

D reads main.go, finds the import for fmt, and in effect descends,
automatically running

	D fmt.go

which in turn invokes

	D sys.go

The file sys.go has no dependencies, so it can be compiled; D
therefore says in effect

	"compile sys.go"

and returns; then we have what we need for fmt.go since the exports
in sys.go are known (or at least the recipe to discover them is
known). So the next level says

	"compile fmt.go"

and pops up, whereupon the top D says

	"compile main.go"

The output of D could therefore be described as a script to run to
compile the source.

We could imagine that instead, D actually runs the compiler.
(Conversely, we could imagine that C uses D to make sure the
dependencies are built, but that has the danger of causing unnecessary
dependency checking and compilation; more on that later.)

To build, therefore, all we need to say is:

	D -c main.go  # -c means 'run the compiler'
	L main.o

Obviously, D at this stage could just run L. Therefore, we can
simplify further by having it do so, whereupon

	D -c main.go

can automate the complete compilation and linking process.


Automation of dependencies (II)
----

Let's say we now edit main.go without changing its imports. To
recompile, we have two options. First, we could be explicit:

	C main.go

Or we could use D to automate running the compiler, as described
in the previous section:

	D -c main.go

The D command will discover the import of fmt, but can see that fmt.o
already exists. Assuming its existence implies its currency, it need
go no further; it can invoke C to compile main.go and link as usual.
Whether it should make this assumption might be controlled by a
flag. For the purpose of discussion, let's say it makes the
assumption if the -c flag is set.

There are two implications to this scheme. First, running D when D
is going to turn around and run C anyway implies we could just run
C directly and save one command invocation. (We could decide
independently whether C should automatically invoke the linker.)

The other implication is more interesting. If we stop traversing
the dependency hierarchy as soon as we discover a .o file, then we
may not realize that fmt.o is out of date and link against a stale
binary. To fix this problem, we need to stat() or checksum the .o
and .go files to see if they need recompilation. Doing this every
time is expensive and gets us back into the make-like approach.

The great majority of compilations do not require this full check,
however; this is especially true when in the compile-debug-edit
cycle. We therefore propose splitting the model into two scenarios.

Scenario 1: General

In this scenario, we ask D to update the full dependency tree by
stat()-ing or checksumming files to check currency. The generated
go.out will always be up to date but incremental compilation will
be slower.
Typically, this will be necessary only after a major operation like
syncing or checking out code, or if there are known changes being
made to the dependencies.

Scenario 2: Fast

In this scenario, we explicitly tell D -c what has changed and have
it compile only what is required. Typically, this will mean compiling
only the single active file or maybe a few files. If an IDE is
present or there is some watcher tool, it's easy to avoid the common
mistake of forgetting to compile a changed file.

If an edit has caused skew between export and import, this will be
caught by the compiler, so it should be type-safe at least. If D is
running the compilation, it might be possible to arrange that C tells
it there is a dependency problem and have D then try to resolve it
by reevaluation.


The contents of .o files (II)
----

For scenario 2, we can make things even faster if the .o files
identify not just the files that must be imported to satisfy the
imports, but details about the imports themselves. Let's say main.go
uses only one function from fmt.go, called F. If the compiled main.o
says, in effect

	from package fmt get F

then the linker will not need to read all of fmt.o to link main.o;
instead it can extract only the necessary function.

Even better, if fmt is a package made of many files, it may be
possible to store in main.o specific information about the exact
files needed:

	from file fmtF.o get F

The linker can then not even bother opening the other .o files that
form package fmt.

The compiler should therefore be explicit and detailed within the .o
files it generates about what elements of a package are needed by
the program being compiled.

Earlier, we said that when we run

	C fmt.go

it discovers the import of sys, and must then read sys.o to discover
the details. Note that if we record the information as specified
here, when we then do

	C main.go

and it reads fmt.o, it does not in turn need to read sys.o; the
necessary information has already been pulled up into fmt.o by D.

Thus, once the dependency information is properly constructed, to
compile a program X.go we must read X.go plus N .o files, where N
is the number of packages explicitly imported by X.go. The transitive
closure need not be evaluated to compile a file, only the explicit
imports. By this result, we hope to dramatically reduce the amount
of I/O necessary to compile a Go source file.

To put this another way, if a package P imports packages Xi, the
existence of Xi.o files is all that is needed to compile P because
the Xi.o files contain the export information. This is what breaks
the transitive dependency closure.