diff options
author | gcc <gcc@7b3dc134-2b1b-0410-93df-9e9f96275f8d> | 2006-08-17 01:18:26 +0000 |
---|---|---|
committer | gcc <gcc@7b3dc134-2b1b-0410-93df-9e9f96275f8d> | 2006-08-17 01:18:26 +0000 |
commit | 15f34685e7a9b5caf761af2ebf6afa20438d440b (patch) | |
tree | dc04ce3cdf040f198743c15b64557824de174680 /libc/manual/llio.texi | |
parent | 1e848e0e775a36f6359161f5deb890942ef42ff3 (diff) | |
download | eglibc2-15f34685e7a9b5caf761af2ebf6afa20438d440b.tar.gz |
Import glibc-mainline for 2006-08-16
git-svn-id: svn://svn.eglibc.org/fsf/trunk@4 7b3dc134-2b1b-0410-93df-9e9f96275f8d
Diffstat (limited to 'libc/manual/llio.texi')
-rw-r--r-- | libc/manual/llio.texi | 3666 |
1 files changed, 3666 insertions, 0 deletions
diff --git a/libc/manual/llio.texi b/libc/manual/llio.texi new file mode 100644 index 000000000..1d088d8ee --- /dev/null +++ b/libc/manual/llio.texi @@ -0,0 +1,3666 @@ +@node Low-Level I/O, File System Interface, I/O on Streams, Top +@c %MENU% Low-level, less portable I/O +@chapter Low-Level Input/Output + +This chapter describes functions for performing low-level input/output +operations on file descriptors. These functions include the primitives +for the higher-level I/O functions described in @ref{I/O on Streams}, as +well as functions for performing low-level control operations for which +there are no equivalents on streams. + +Stream-level I/O is more flexible and usually more convenient; +therefore, programmers generally use the descriptor-level functions only +when necessary. These are some of the usual reasons: + +@itemize @bullet +@item +For reading binary files in large chunks. + +@item +For reading an entire file into core before parsing it. + +@item +To perform operations other than data transfer, which can only be done +with a descriptor. (You can use @code{fileno} to get the descriptor +corresponding to a stream.) + +@item +To pass descriptors to a child process. (The child can create its own +stream to use a descriptor that it inherits, but cannot inherit a stream +directly.) +@end itemize + +@menu +* Opening and Closing Files:: How to open and close file + descriptors. +* I/O Primitives:: Reading and writing data. +* File Position Primitive:: Setting a descriptor's file + position. +* Descriptors and Streams:: Converting descriptor to stream + or vice-versa. +* Stream/Descriptor Precautions:: Precautions needed if you use both + descriptors and streams. +* Scatter-Gather:: Fast I/O to discontinuous buffers. +* Memory-mapped I/O:: Using files like memory. +* Waiting for I/O:: How to check for input or output + on multiple file descriptors. +* Synchronizing I/O:: Making sure all I/O actions completed. +* Asynchronous I/O:: Perform I/O in parallel. +* Control Operations:: Various other operations on file + descriptors. +* Duplicating Descriptors:: Fcntl commands for duplicating + file descriptors. +* Descriptor Flags:: Fcntl commands for manipulating + flags associated with file + descriptors. +* File Status Flags:: Fcntl commands for manipulating + flags associated with open files. +* File Locks:: Fcntl commands for implementing + file locking. +* Interrupt Input:: Getting an asynchronous signal when + input arrives. +* IOCTLs:: Generic I/O Control operations. +@end menu + + +@node Opening and Closing Files +@section Opening and Closing Files + +@cindex opening a file descriptor +@cindex closing a file descriptor +This section describes the primitives for opening and closing files +using file descriptors. The @code{open} and @code{creat} functions are +declared in the header file @file{fcntl.h}, while @code{close} is +declared in @file{unistd.h}. +@pindex unistd.h +@pindex fcntl.h + +@comment fcntl.h +@comment POSIX.1 +@deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) +The @code{open} function creates and returns a new file descriptor +for the file named by @var{filename}. Initially, the file position +indicator for the file is at the beginning of the file. The argument +@var{mode} is used only when a file is created, but it doesn't hurt +to supply the argument in any case. + +The @var{flags} argument controls how the file is to be opened. This is +a bit mask; you create the value by the bitwise OR of the appropriate +parameters (using the @samp{|} operator in C). +@xref{File Status Flags}, for the parameters available. + +The normal return value from @code{open} is a non-negative integer file +descriptor. In the case of an error, a value of @math{-1} is returned +instead. In addition to the usual file name errors (@pxref{File +Name Errors}), the following @code{errno} error conditions are defined +for this function: + +@table @code +@item EACCES +The file exists but is not readable/writable as requested by the @var{flags} +argument, the file does not exist and the directory is unwritable so +it cannot be created. + +@item EEXIST +Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already +exists. + +@item EINTR +The @code{open} operation was interrupted by a signal. +@xref{Interrupted Primitives}. + +@item EISDIR +The @var{flags} argument specified write access, and the file is a directory. + +@item EMFILE +The process has too many files open. +The maximum number of file descriptors is controlled by the +@code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}. + +@item ENFILE +The entire system, or perhaps the file system which contains the +directory, cannot support any additional open files at the moment. +(This problem cannot happen on the GNU system.) + +@item ENOENT +The named file does not exist, and @code{O_CREAT} is not specified. + +@item ENOSPC +The directory or file system that would contain the new file cannot be +extended, because there is no disk space left. + +@item ENXIO +@code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags} +argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and +FIFOs}), and no process has the file open for reading. + +@item EROFS +The file resides on a read-only file system and any of @w{@code{O_WRONLY}}, +@code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument, +or @code{O_CREAT} is set and the file does not already exist. +@end table + +@c !!! umask + +If on a 32 bit machine the sources are translated with +@code{_FILE_OFFSET_BITS == 64} the function @code{open} returns a file +descriptor opened in the large file mode which enables the file handling +functions to use files up to @math{2^63} bytes in size and offset from +@math{-2^63} to @math{2^63}. This happens transparently for the user +since all of the lowlevel file handling functions are equally replaced. + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{open} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this calls to @code{open} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop + +The @code{open} function is the underlying primitive for the @code{fopen} +and @code{freopen} functions, that create streams. +@end deftypefun + +@comment fcntl.h +@comment Unix98 +@deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) +This function is similar to @code{open}. It returns a file descriptor +which can be used to access the file named by @var{filename}. The only +difference is that on 32 bit systems the file is opened in the +large file mode. I.e., file length and file offsets can exceed 31 bits. + +When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this +function is actually available under the name @code{open}. I.e., the +new, extended API using 64 bit file sizes and offsets transparently +replaces the old API. +@end deftypefun + +@comment fcntl.h +@comment POSIX.1 +@deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode}) +This function is obsolete. The call: + +@smallexample +creat (@var{filename}, @var{mode}) +@end smallexample + +@noindent +is equivalent to: + +@smallexample +open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode}) +@end smallexample + +If on a 32 bit machine the sources are translated with +@code{_FILE_OFFSET_BITS == 64} the function @code{creat} returns a file +descriptor opened in the large file mode which enables the file handling +functions to use files up to @math{2^63} in size and offset from +@math{-2^63} to @math{2^63}. This happens transparently for the user +since all of the lowlevel file handling functions are equally replaced. +@end deftypefn + +@comment fcntl.h +@comment Unix98 +@deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode}) +This function is similar to @code{creat}. It returns a file descriptor +which can be used to access the file named by @var{filename}. The only +the difference is that on 32 bit systems the file is opened in the +large file mode. I.e., file length and file offsets can exceed 31 bits. + +To use this file descriptor one must not use the normal operations but +instead the counterparts named @code{*64}, e.g., @code{read64}. + +When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this +function is actually available under the name @code{open}. I.e., the +new, extended API using 64 bit file sizes and offsets transparently +replaces the old API. +@end deftypefn + +@comment unistd.h +@comment POSIX.1 +@deftypefun int close (int @var{filedes}) +The function @code{close} closes the file descriptor @var{filedes}. +Closing a file has the following consequences: + +@itemize @bullet +@item +The file descriptor is deallocated. + +@item +Any record locks owned by the process on the file are unlocked. + +@item +When all file descriptors associated with a pipe or FIFO have been closed, +any unread data is discarded. +@end itemize + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{close} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this, calls to @code{close} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop + +The normal return value from @code{close} is @math{0}; a value of @math{-1} +is returned in case of failure. The following @code{errno} error +conditions are defined for this function: + +@table @code +@item EBADF +The @var{filedes} argument is not a valid file descriptor. + +@item EINTR +The @code{close} call was interrupted by a signal. +@xref{Interrupted Primitives}. +Here is an example of how to handle @code{EINTR} properly: + +@smallexample +TEMP_FAILURE_RETRY (close (desc)); +@end smallexample + +@item ENOSPC +@itemx EIO +@itemx EDQUOT +When the file is accessed by NFS, these errors from @code{write} can sometimes +not be detected until @code{close}. @xref{I/O Primitives}, for details +on their meaning. +@end table + +Please note that there is @emph{no} separate @code{close64} function. +This is not necessary since this function does not determine nor depend +on the mode of the file. The kernel which performs the @code{close} +operation knows which mode the descriptor is used for and can handle +this situation. +@end deftypefun + +To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead +of trying to close its underlying file descriptor with @code{close}. +This flushes any buffered output and updates the stream object to +indicate that it is closed. + +@node I/O Primitives +@section Input and Output Primitives + +This section describes the functions for performing primitive input and +output operations on file descriptors: @code{read}, @code{write}, and +@code{lseek}. These functions are declared in the header file +@file{unistd.h}. +@pindex unistd.h + +@comment unistd.h +@comment POSIX.1 +@deftp {Data Type} ssize_t +This data type is used to represent the sizes of blocks that can be +read or written in a single operation. It is similar to @code{size_t}, +but must be a signed type. +@end deftp + +@cindex reading from a file descriptor +@comment unistd.h +@comment POSIX.1 +@deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size}) +The @code{read} function reads up to @var{size} bytes from the file +with descriptor @var{filedes}, storing the results in the @var{buffer}. +(This is not necessarily a character string, and no terminating null +character is added.) + +@cindex end-of-file, on a file descriptor +The return value is the number of bytes actually read. This might be +less than @var{size}; for example, if there aren't that many bytes left +in the file or if there aren't that many bytes immediately available. +The exact behavior depends on what kind of file it is. Note that +reading less than @var{size} bytes is not an error. + +A value of zero indicates end-of-file (except if the value of the +@var{size} argument is also zero). This is not considered an error. +If you keep calling @code{read} while at end-of-file, it will keep +returning zero and doing nothing else. + +If @code{read} returns at least one character, there is no way you can +tell whether end-of-file was reached. But if you did reach the end, the +next read will return zero. + +In case of an error, @code{read} returns @math{-1}. The following +@code{errno} error conditions are defined for this function: + +@table @code +@item EAGAIN +Normally, when no input is immediately available, @code{read} waits for +some input. But if the @code{O_NONBLOCK} flag is set for the file +(@pxref{File Status Flags}), @code{read} returns immediately without +reading any data, and reports this error. + +@strong{Compatibility Note:} Most versions of BSD Unix use a different +error code for this: @code{EWOULDBLOCK}. In the GNU library, +@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter +which name you use. + +On some systems, reading a large amount of data from a character special +file can also fail with @code{EAGAIN} if the kernel cannot find enough +physical memory to lock down the user's pages. This is limited to +devices that transfer with direct memory access into the user's memory, +which means it does not include terminals, since they always use +separate buffers inside the kernel. This problem never happens in the +GNU system. + +Any condition that could result in @code{EAGAIN} can instead result in a +successful @code{read} which returns fewer bytes than requested. +Calling @code{read} again immediately would result in @code{EAGAIN}. + +@item EBADF +The @var{filedes} argument is not a valid file descriptor, +or is not open for reading. + +@item EINTR +@code{read} was interrupted by a signal while it was waiting for input. +@xref{Interrupted Primitives}. A signal will not necessary cause +@code{read} to return @code{EINTR}; it may instead result in a +successful @code{read} which returns fewer bytes than requested. + +@item EIO +For many devices, and for disk files, this error code indicates +a hardware error. + +@code{EIO} also occurs when a background process tries to read from the +controlling terminal, and the normal action of stopping the process by +sending it a @code{SIGTTIN} signal isn't working. This might happen if +the signal is being blocked or ignored, or because the process group is +orphaned. @xref{Job Control}, for more information about job control, +and @ref{Signal Handling}, for information about signals. + +@item EINVAL +In some systems, when reading from a character or block device, position +and size offsets must be aligned to a particular block size. This error +indicates that the offsets were not properly aligned. +@end table + +Please note that there is no function named @code{read64}. This is not +necessary since this function does not directly modify or handle the +possibly wide file offset. Since the kernel handles this state +internally, the @code{read} function can be used for all cases. + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{read} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this, calls to @code{read} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop + +The @code{read} function is the underlying primitive for all of the +functions that read from streams, such as @code{fgetc}. +@end deftypefun + +@comment unistd.h +@comment Unix98 +@deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset}) +The @code{pread} function is similar to the @code{read} function. The +first three arguments are identical, and the return values and error +codes also correspond. + +The difference is the fourth argument and its handling. The data block +is not read from the current position of the file descriptor +@code{filedes}. Instead the data is read from the file starting at +position @var{offset}. The position of the file descriptor itself is +not affected by the operation. The value is the same as before the call. + +When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the +@code{pread} function is in fact @code{pread64} and the type +@code{off_t} has 64 bits, which makes it possible to handle files up to +@math{2^63} bytes in length. + +The return value of @code{pread} describes the number of bytes read. +In the error case it returns @math{-1} like @code{read} does and the +error codes are also the same, with these additions: + +@table @code +@item EINVAL +The value given for @var{offset} is negative and therefore illegal. + +@item ESPIPE +The file descriptor @var{filedes} is associate with a pipe or a FIFO and +this device does not allow positioning of the file pointer. +@end table + +The function is an extension defined in the Unix Single Specification +version 2. +@end deftypefun + +@comment unistd.h +@comment Unix98 +@deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) +This function is similar to the @code{pread} function. The difference +is that the @var{offset} parameter is of type @code{off64_t} instead of +@code{off_t} which makes it possible on 32 bit machines to address +files larger than @math{2^31} bytes and up to @math{2^63} bytes. The +file descriptor @code{filedes} must be opened using @code{open64} since +otherwise the large offsets possible with @code{off64_t} will lead to +errors with a descriptor in small file mode. + +When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a +32 bit machine this function is actually available under the name +@code{pread} and so transparently replaces the 32 bit interface. +@end deftypefun + +@cindex writing to a file descriptor +@comment unistd.h +@comment POSIX.1 +@deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size}) +The @code{write} function writes up to @var{size} bytes from +@var{buffer} to the file with descriptor @var{filedes}. The data in +@var{buffer} is not necessarily a character string and a null character is +output like any other character. + +The return value is the number of bytes actually written. This may be +@var{size}, but can always be smaller. Your program should always call +@code{write} in a loop, iterating until all the data is written. + +Once @code{write} returns, the data is enqueued to be written and can be +read back right away, but it is not necessarily written out to permanent +storage immediately. You can use @code{fsync} when you need to be sure +your data has been permanently stored before continuing. (It is more +efficient for the system to batch up consecutive writes and do them all +at once when convenient. Normally they will always be written to disk +within a minute or less.) Modern systems provide another function +@code{fdatasync} which guarantees integrity only for the file data and +is therefore faster. +@c !!! xref fsync, fdatasync +You can use the @code{O_FSYNC} open mode to make @code{write} always +store the data to disk before returning; @pxref{Operating Modes}. + +In the case of an error, @code{write} returns @math{-1}. The following +@code{errno} error conditions are defined for this function: + +@table @code +@item EAGAIN +Normally, @code{write} blocks until the write operation is complete. +But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control +Operations}), it returns immediately without writing any data and +reports this error. An example of a situation that might cause the +process to block on output is writing to a terminal device that supports +flow control, where output has been suspended by receipt of a STOP +character. + +@strong{Compatibility Note:} Most versions of BSD Unix use a different +error code for this: @code{EWOULDBLOCK}. In the GNU library, +@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter +which name you use. + +On some systems, writing a large amount of data from a character special +file can also fail with @code{EAGAIN} if the kernel cannot find enough +physical memory to lock down the user's pages. This is limited to +devices that transfer with direct memory access into the user's memory, +which means it does not include terminals, since they always use +separate buffers inside the kernel. This problem does not arise in the +GNU system. + +@item EBADF +The @var{filedes} argument is not a valid file descriptor, +or is not open for writing. + +@item EFBIG +The size of the file would become larger than the implementation can support. + +@item EINTR +The @code{write} operation was interrupted by a signal while it was +blocked waiting for completion. A signal will not necessarily cause +@code{write} to return @code{EINTR}; it may instead result in a +successful @code{write} which writes fewer bytes than requested. +@xref{Interrupted Primitives}. + +@item EIO +For many devices, and for disk files, this error code indicates +a hardware error. + +@item ENOSPC +The device containing the file is full. + +@item EPIPE +This error is returned when you try to write to a pipe or FIFO that +isn't open for reading by any process. When this happens, a @code{SIGPIPE} +signal is also sent to the process; see @ref{Signal Handling}. + +@item EINVAL +In some systems, when writing to a character or block device, position +and size offsets must be aligned to a particular block size. This error +indicates that the offsets were not properly aligned. +@end table + +Unless you have arranged to prevent @code{EINTR} failures, you should +check @code{errno} after each failing call to @code{write}, and if the +error was @code{EINTR}, you should simply repeat the call. +@xref{Interrupted Primitives}. The easy way to do this is with the +macro @code{TEMP_FAILURE_RETRY}, as follows: + +@smallexample +nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count)); +@end smallexample + +Please note that there is no function named @code{write64}. This is not +necessary since this function does not directly modify or handle the +possibly wide file offset. Since the kernel handles this state +internally the @code{write} function can be used for all cases. + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{write} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this, calls to @code{write} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop + +The @code{write} function is the underlying primitive for all of the +functions that write to streams, such as @code{fputc}. +@end deftypefun + +@comment unistd.h +@comment Unix98 +@deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset}) +The @code{pwrite} function is similar to the @code{write} function. The +first three arguments are identical, and the return values and error codes +also correspond. + +The difference is the fourth argument and its handling. The data block +is not written to the current position of the file descriptor +@code{filedes}. Instead the data is written to the file starting at +position @var{offset}. The position of the file descriptor itself is +not affected by the operation. The value is the same as before the call. + +When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the +@code{pwrite} function is in fact @code{pwrite64} and the type +@code{off_t} has 64 bits, which makes it possible to handle files up to +@math{2^63} bytes in length. + +The return value of @code{pwrite} describes the number of written bytes. +In the error case it returns @math{-1} like @code{write} does and the +error codes are also the same, with these additions: + +@table @code +@item EINVAL +The value given for @var{offset} is negative and therefore illegal. + +@item ESPIPE +The file descriptor @var{filedes} is associated with a pipe or a FIFO and +this device does not allow positioning of the file pointer. +@end table + +The function is an extension defined in the Unix Single Specification +version 2. +@end deftypefun + +@comment unistd.h +@comment Unix98 +@deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) +This function is similar to the @code{pwrite} function. The difference +is that the @var{offset} parameter is of type @code{off64_t} instead of +@code{off_t} which makes it possible on 32 bit machines to address +files larger than @math{2^31} bytes and up to @math{2^63} bytes. The +file descriptor @code{filedes} must be opened using @code{open64} since +otherwise the large offsets possible with @code{off64_t} will lead to +errors with a descriptor in small file mode. + +When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a +32 bit machine this function is actually available under the name +@code{pwrite} and so transparently replaces the 32 bit interface. +@end deftypefun + + +@node File Position Primitive +@section Setting the File Position of a Descriptor + +Just as you can set the file position of a stream with @code{fseek}, you +can set the file position of a descriptor with @code{lseek}. This +specifies the position in the file for the next @code{read} or +@code{write} operation. @xref{File Positioning}, for more information +on the file position and what it means. + +To read the current file position value from a descriptor, use +@code{lseek (@var{desc}, 0, SEEK_CUR)}. + +@cindex file positioning on a file descriptor +@cindex positioning a file descriptor +@cindex seeking on a file descriptor +@comment unistd.h +@comment POSIX.1 +@deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence}) +The @code{lseek} function is used to change the file position of the +file with descriptor @var{filedes}. + +The @var{whence} argument specifies how the @var{offset} should be +interpreted, in the same way as for the @code{fseek} function, and it must +be one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or +@code{SEEK_END}. + +@table @code +@item SEEK_SET +Specifies that @var{whence} is a count of characters from the beginning +of the file. + +@item SEEK_CUR +Specifies that @var{whence} is a count of characters from the current +file position. This count may be positive or negative. + +@item SEEK_END +Specifies that @var{whence} is a count of characters from the end of +the file. A negative count specifies a position within the current +extent of the file; a positive count specifies a position past the +current end. If you set the position past the current end, and +actually write data, you will extend the file with zeros up to that +position. +@end table + +The return value from @code{lseek} is normally the resulting file +position, measured in bytes from the beginning of the file. +You can use this feature together with @code{SEEK_CUR} to read the +current file position. + +If you want to append to the file, setting the file position to the +current end of file with @code{SEEK_END} is not sufficient. Another +process may write more data after you seek but before you write, +extending the file so the position you write onto clobbers their data. +Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}. + +You can set the file position past the current end of the file. This +does not by itself make the file longer; @code{lseek} never changes the +file. But subsequent output at that position will extend the file. +Characters between the previous end of file and the new position are +filled with zeros. Extending the file in this way can create a +``hole'': the blocks of zeros are not actually allocated on disk, so the +file takes up less space than it appears to; it is then called a +``sparse file''. +@cindex sparse files +@cindex holes in files + +If the file position cannot be changed, or the operation is in some way +invalid, @code{lseek} returns a value of @math{-1}. The following +@code{errno} error conditions are defined for this function: + +@table @code +@item EBADF +The @var{filedes} is not a valid file descriptor. + +@item EINVAL +The @var{whence} argument value is not valid, or the resulting +file offset is not valid. A file offset is invalid. + +@item ESPIPE +The @var{filedes} corresponds to an object that cannot be positioned, +such as a pipe, FIFO or terminal device. (POSIX.1 specifies this error +only for pipes and FIFOs, but in the GNU system, you always get +@code{ESPIPE} if the object is not seekable.) +@end table + +When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the +@code{lseek} function is in fact @code{lseek64} and the type +@code{off_t} has 64 bits which makes it possible to handle files up to +@math{2^63} bytes in length. + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{lseek} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this calls to @code{lseek} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop + +The @code{lseek} function is the underlying primitive for the +@code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and +@code{rewind} functions, which operate on streams instead of file +descriptors. +@end deftypefun + +@comment unistd.h +@comment Unix98 +@deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence}) +This function is similar to the @code{lseek} function. The difference +is that the @var{offset} parameter is of type @code{off64_t} instead of +@code{off_t} which makes it possible on 32 bit machines to address +files larger than @math{2^31} bytes and up to @math{2^63} bytes. The +file descriptor @code{filedes} must be opened using @code{open64} since +otherwise the large offsets possible with @code{off64_t} will lead to +errors with a descriptor in small file mode. + +When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a +32 bits machine this function is actually available under the name +@code{lseek} and so transparently replaces the 32 bit interface. +@end deftypefun + +You can have multiple descriptors for the same file if you open the file +more than once, or if you duplicate a descriptor with @code{dup}. +Descriptors that come from separate calls to @code{open} have independent +file positions; using @code{lseek} on one descriptor has no effect on the +other. For example, + +@smallexample +@group +@{ + int d1, d2; + char buf[4]; + d1 = open ("foo", O_RDONLY); + d2 = open ("foo", O_RDONLY); + lseek (d1, 1024, SEEK_SET); + read (d2, buf, 4); +@} +@end group +@end smallexample + +@noindent +will read the first four characters of the file @file{foo}. (The +error-checking code necessary for a real program has been omitted here +for brevity.) + +By contrast, descriptors made by duplication share a common file +position with the original descriptor that was duplicated. Anything +which alters the file position of one of the duplicates, including +reading or writing data, affects all of them alike. Thus, for example, + +@smallexample +@{ + int d1, d2, d3; + char buf1[4], buf2[4]; + d1 = open ("foo", O_RDONLY); + d2 = dup (d1); + d3 = dup (d2); + lseek (d3, 1024, SEEK_SET); + read (d1, buf1, 4); + read (d2, buf2, 4); +@} +@end smallexample + +@noindent +will read four characters starting with the 1024'th character of +@file{foo}, and then four more characters starting with the 1028'th +character. + +@comment sys/types.h +@comment POSIX.1 +@deftp {Data Type} off_t +This is an arithmetic data type used to represent file sizes. +In the GNU system, this is equivalent to @code{fpos_t} or @code{long int}. + +If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type +is transparently replaced by @code{off64_t}. +@end deftp + +@comment sys/types.h +@comment Unix98 +@deftp {Data Type} off64_t +This type is used similar to @code{off_t}. The difference is that even +on 32 bit machines, where the @code{off_t} type would have 32 bits, +@code{off64_t} has 64 bits and so is able to address files up to +@math{2^63} bytes in length. + +When compiling with @code{_FILE_OFFSET_BITS == 64} this type is +available under the name @code{off_t}. +@end deftp + +These aliases for the @samp{SEEK_@dots{}} constants exist for the sake +of compatibility with older BSD systems. They are defined in two +different header files: @file{fcntl.h} and @file{sys/file.h}. + +@table @code +@item L_SET +An alias for @code{SEEK_SET}. + +@item L_INCR +An alias for @code{SEEK_CUR}. + +@item L_XTND +An alias for @code{SEEK_END}. +@end table + +@node Descriptors and Streams +@section Descriptors and Streams +@cindex streams, and file descriptors +@cindex converting file descriptor to stream +@cindex extracting file descriptor from stream + +Given an open file descriptor, you can create a stream for it with the +@code{fdopen} function. You can get the underlying file descriptor for +an existing stream with the @code{fileno} function. These functions are +declared in the header file @file{stdio.h}. +@pindex stdio.h + +@comment stdio.h +@comment POSIX.1 +@deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype}) +The @code{fdopen} function returns a new stream for the file descriptor +@var{filedes}. + +The @var{opentype} argument is interpreted in the same way as for the +@code{fopen} function (@pxref{Opening Streams}), except that +the @samp{b} option is not permitted; this is because GNU makes no +distinction between text and binary files. Also, @code{"w"} and +@code{"w+"} do not cause truncation of the file; these have an effect only +when opening a file, and in this case the file has already been opened. +You must make sure that the @var{opentype} argument matches the actual +mode of the open file descriptor. + +The return value is the new stream. If the stream cannot be created +(for example, if the modes for the file indicated by the file descriptor +do not permit the access specified by the @var{opentype} argument), a +null pointer is returned instead. + +In some other systems, @code{fdopen} may fail to detect that the modes +for file descriptor do not permit the access specified by +@code{opentype}. The GNU C library always checks for this. +@end deftypefun + +For an example showing the use of the @code{fdopen} function, +see @ref{Creating a Pipe}. + +@comment stdio.h +@comment POSIX.1 +@deftypefun int fileno (FILE *@var{stream}) +This function returns the file descriptor associated with the stream +@var{stream}. If an error is detected (for example, if the @var{stream} +is not valid) or if @var{stream} does not do I/O to a file, +@code{fileno} returns @math{-1}. +@end deftypefun + +@comment stdio.h +@comment GNU +@deftypefun int fileno_unlocked (FILE *@var{stream}) +The @code{fileno_unlocked} function is equivalent to the @code{fileno} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + +@cindex standard file descriptors +@cindex file descriptors, standard +There are also symbolic constants defined in @file{unistd.h} for the +file descriptors belonging to the standard streams @code{stdin}, +@code{stdout}, and @code{stderr}; see @ref{Standard Streams}. +@pindex unistd.h + +@comment unistd.h +@comment POSIX.1 +@table @code +@item STDIN_FILENO +@vindex STDIN_FILENO +This macro has value @code{0}, which is the file descriptor for +standard input. +@cindex standard input file descriptor + +@comment unistd.h +@comment POSIX.1 +@item STDOUT_FILENO +@vindex STDOUT_FILENO +This macro has value @code{1}, which is the file descriptor for +standard output. +@cindex standard output file descriptor + +@comment unistd.h +@comment POSIX.1 +@item STDERR_FILENO +@vindex STDERR_FILENO +This macro has value @code{2}, which is the file descriptor for +standard error output. +@end table +@cindex standard error file descriptor + +@node Stream/Descriptor Precautions +@section Dangers of Mixing Streams and Descriptors +@cindex channels +@cindex streams and descriptors +@cindex descriptors and streams +@cindex mixing descriptors and streams + +You can have multiple file descriptors and streams (let's call both +streams and descriptors ``channels'' for short) connected to the same +file, but you must take care to avoid confusion between channels. There +are two cases to consider: @dfn{linked} channels that share a single +file position value, and @dfn{independent} channels that have their own +file positions. + +It's best to use just one channel in your program for actual data +transfer to any given file, except when all the access is for input. +For example, if you open a pipe (something you can only do at the file +descriptor level), either do all I/O with the descriptor, or construct a +stream from the descriptor with @code{fdopen} and then do all I/O with +the stream. + +@menu +* Linked Channels:: Dealing with channels sharing a file position. +* Independent Channels:: Dealing with separately opened, unlinked channels. +* Cleaning Streams:: Cleaning a stream makes it safe to use + another channel. +@end menu + +@node Linked Channels +@subsection Linked Channels +@cindex linked channels + +Channels that come from a single opening share the same file position; +we call them @dfn{linked} channels. Linked channels result when you +make a stream from a descriptor using @code{fdopen}, when you get a +descriptor from a stream with @code{fileno}, when you copy a descriptor +with @code{dup} or @code{dup2}, and when descriptors are inherited +during @code{fork}. For files that don't support random access, such as +terminals and pipes, @emph{all} channels are effectively linked. On +random-access files, all append-type output streams are effectively +linked to each other. + +@cindex cleaning up a stream +If you have been using a stream for I/O (or have just opened the stream), +and you want to do I/O using +another channel (either a stream or a descriptor) that is linked to it, +you must first @dfn{clean up} the stream that you have been using. +@xref{Cleaning Streams}. + +Terminating a process, or executing a new program in the process, +destroys all the streams in the process. If descriptors linked to these +streams persist in other processes, their file positions become +undefined as a result. To prevent this, you must clean up the streams +before destroying them. + +@node Independent Channels +@subsection Independent Channels +@cindex independent channels + +When you open channels (streams or descriptors) separately on a seekable +file, each channel has its own file position. These are called +@dfn{independent channels}. + +The system handles each channel independently. Most of the time, this +is quite predictable and natural (especially for input): each channel +can read or write sequentially at its own place in the file. However, +if some of the channels are streams, you must take these precautions: + +@itemize @bullet +@item +You should clean an output stream after use, before doing anything else +that might read or write from the same part of the file. + +@item +You should clean an input stream before reading data that may have been +modified using an independent channel. Otherwise, you might read +obsolete data that had been in the stream's buffer. +@end itemize + +If you do output to one channel at the end of the file, this will +certainly leave the other independent channels positioned somewhere +before the new end. You cannot reliably set their file positions to the +new end of file before writing, because the file can always be extended +by another process between when you set the file position and when you +write the data. Instead, use an append-type descriptor or stream; they +always output at the current end of the file. In order to make the +end-of-file position accurate, you must clean the output channel you +were using, if it is a stream. + +It's impossible for two channels to have separate file pointers for a +file that doesn't support random access. Thus, channels for reading or +writing such files are always linked, never independent. Append-type +channels are also always linked. For these channels, follow the rules +for linked channels; see @ref{Linked Channels}. + +@node Cleaning Streams +@subsection Cleaning Streams + +On the GNU system, you can clean up any stream with @code{fclean}: + +@comment stdio.h +@comment GNU +@deftypefun int fclean (FILE *@var{stream}) +Clean up the stream @var{stream} so that its buffer is empty. If +@var{stream} is doing output, force it out. If @var{stream} is doing +input, give the data in the buffer back to the system, arranging to +reread it. +@end deftypefun + +On other systems, you can use @code{fflush} to clean a stream in most +cases. + +You can skip the @code{fclean} or @code{fflush} if you know the stream +is already clean. A stream is clean whenever its buffer is empty. For +example, an unbuffered stream is always clean. An input stream that is +at end-of-file is clean. A line-buffered stream is clean when the last +character output was a newline. However, a just-opened input stream +might not be clean, as its input buffer might not be empty. + +There is one case in which cleaning a stream is impossible on most +systems. This is when the stream is doing input from a file that is not +random-access. Such streams typically read ahead, and when the file is +not random access, there is no way to give back the excess data already +read. When an input stream reads from a random-access file, +@code{fflush} does clean the stream, but leaves the file pointer at an +unpredictable place; you must set the file pointer before doing any +further I/O. On the GNU system, using @code{fclean} avoids both of +these problems. + +Closing an output-only stream also does @code{fflush}, so this is a +valid way of cleaning an output stream. On the GNU system, closing an +input stream does @code{fclean}. + +You need not clean a stream before using its descriptor for control +operations such as setting terminal modes; these operations don't affect +the file position and are not affected by it. You can use any +descriptor for these operations, and all channels are affected +simultaneously. However, text already ``output'' to a stream but still +buffered by the stream will be subject to the new terminal modes when +subsequently flushed. To make sure ``past'' output is covered by the +terminal settings that were in effect at the time, flush the output +streams for that terminal before setting the modes. @xref{Terminal +Modes}. + +@node Scatter-Gather +@section Fast Scatter-Gather I/O +@cindex scatter-gather + +Some applications may need to read or write data to multiple buffers, +which are separated in memory. Although this can be done easily enough +with multiple calls to @code{read} and @code{write}, it is inefficient +because there is overhead associated with each kernel call. + +Instead, many platforms provide special high-speed primitives to perform +these @dfn{scatter-gather} operations in a single kernel call. The GNU C +library will provide an emulation on any system that lacks these +primitives, so they are not a portability threat. They are defined in +@code{sys/uio.h}. + +These functions are controlled with arrays of @code{iovec} structures, +which describe the location and size of each buffer. + +@comment sys/uio.h +@comment BSD +@deftp {Data Type} {struct iovec} + +The @code{iovec} structure describes a buffer. It contains two fields: + +@table @code + +@item void *iov_base +Contains the address of a buffer. + +@item size_t iov_len +Contains the length of the buffer. + +@end table +@end deftp + +@comment sys/uio.h +@comment BSD +@deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) + +The @code{readv} function reads data from @var{filedes} and scatters it +into the buffers described in @var{vector}, which is taken to be +@var{count} structures long. As each buffer is filled, data is sent to the +next. + +Note that @code{readv} is not guaranteed to fill all the buffers. +It may stop at any point, for the same reasons @code{read} would. + +The return value is a count of bytes (@emph{not} buffers) read, @math{0} +indicating end-of-file, or @math{-1} indicating an error. The possible +errors are the same as in @code{read}. + +@end deftypefun + +@comment sys/uio.h +@comment BSD +@deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) + +The @code{writev} function gathers data from the buffers described in +@var{vector}, which is taken to be @var{count} structures long, and writes +them to @code{filedes}. As each buffer is written, it moves on to the +next. + +Like @code{readv}, @code{writev} may stop midstream under the same +conditions @code{write} would. + +The return value is a count of bytes written, or @math{-1} indicating an +error. The possible errors are the same as in @code{write}. + +@end deftypefun + +@c Note - I haven't read this anywhere. I surmised it from my knowledge +@c of computer science. Thus, there could be subtleties I'm missing. + +Note that if the buffers are small (under about 1kB), high-level streams +may be easier to use than these functions. However, @code{readv} and +@code{writev} are more efficient when the individual buffers themselves +(as opposed to the total output), are large. In that case, a high-level +stream would not be able to cache the data effectively. + +@node Memory-mapped I/O +@section Memory-mapped I/O + +On modern operating systems, it is possible to @dfn{mmap} (pronounced +``em-map'') a file to a region of memory. When this is done, the file can +be accessed just like an array in the program. + +This is more efficient than @code{read} or @code{write}, as only the regions +of the file that a program actually accesses are loaded. Accesses to +not-yet-loaded parts of the mmapped region are handled in the same way as +swapped out pages. + +Since mmapped pages can be stored back to their file when physical +memory is low, it is possible to mmap files orders of magnitude larger +than both the physical memory @emph{and} swap space. The only limit is +address space. The theoretical limit is 4GB on a 32-bit machine - +however, the actual limit will be smaller since some areas will be +reserved for other purposes. If the LFS interface is used the file size +on 32-bit systems is not limited to 2GB (offsets are signed which +reduces the addressable area of 4GB by half); the full 64-bit are +available. + +Memory mapping only works on entire pages of memory. Thus, addresses +for mapping must be page-aligned, and length values will be rounded up. +To determine the size of a page the machine uses one should use + +@vindex _SC_PAGESIZE +@smallexample +size_t page_size = (size_t) sysconf (_SC_PAGESIZE); +@end smallexample + +@noindent +These functions are declared in @file{sys/mman.h}. + +@comment sys/mman.h +@comment POSIX +@deftypefun {void *} mmap (void *@var{address}, size_t @var{length},int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset}) + +The @code{mmap} function creates a new mapping, connected to bytes +(@var{offset}) to (@var{offset} + @var{length} - 1) in the file open on +@var{filedes}. A new reference for the file specified by @var{filedes} +is created, which is not removed by closing the file. + +@var{address} gives a preferred starting address for the mapping. +@code{NULL} expresses no preference. Any previous mapping at that +address is automatically removed. The address you give may still be +changed, unless you use the @code{MAP_FIXED} flag. + +@vindex PROT_READ +@vindex PROT_WRITE +@vindex PROT_EXEC +@var{protect} contains flags that control what kind of access is +permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and +@code{PROT_EXEC}, which permit reading, writing, and execution, +respectively. Inappropriate access will cause a segfault (@pxref{Program +Error Signals}). + +Note that most hardware designs cannot support write permission without +read permission, and many do not distinguish read and execute permission. +Thus, you may receive wider permissions than you ask for, and mappings of +write-only files may be denied even if you do not use @code{PROT_READ}. + +@var{flags} contains flags that control the nature of the map. +One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified. + +They include: + +@vtable @code +@item MAP_PRIVATE +This specifies that writes to the region should never be written back +to the attached file. Instead, a copy is made for the process, and the +region will be swapped normally if memory runs low. No other process will +see the changes. + +Since private mappings effectively revert to ordinary memory +when written to, you must have enough virtual memory for a copy of +the entire mmapped region if you use this mode with @code{PROT_WRITE}. + +@item MAP_SHARED +This specifies that writes to the region will be written back to the +file. Changes made will be shared immediately with other processes +mmaping the same file. + +Note that actual writing may take place at any time. You need to use +@code{msync}, described below, if it is important that other processes +using conventional I/O get a consistent view of the file. + +@item MAP_FIXED +This forces the system to use the exact mapping address specified in +@var{address} and fail if it can't. + +@c One of these is official - the other is obviously an obsolete synonym +@c Which is which? +@item MAP_ANONYMOUS +@itemx MAP_ANON +This flag tells the system to create an anonymous mapping, not connected +to a file. @var{filedes} and @var{off} are ignored, and the region is +initialized with zeros. + +Anonymous maps are used as the basic primitive to extend the heap on some +systems. They are also useful to share data between multiple tasks +without creating a file. + +On some systems using private anonymous mmaps is more efficient than using +@code{malloc} for large blocks. This is not an issue with the GNU C library, +as the included @code{malloc} automatically uses @code{mmap} where appropriate. + +@c Linux has some other MAP_ options, which I have not discussed here. +@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to +@c user programs (and I don't understand the last two). MAP_LOCKED does +@c not appear to be implemented. + +@end vtable + +@code{mmap} returns the address of the new mapping, or @math{-1} for an +error. + +Possible errors include: + +@table @code + +@item EINVAL + +Either @var{address} was unusable, or inconsistent @var{flags} were +given. + +@item EACCES + +@var{filedes} was not open for the type of access specified in @var{protect}. + +@item ENOMEM + +Either there is not enough memory for the operation, or the process is +out of address space. + +@item ENODEV + +This file is of a type that doesn't support mapping. + +@item ENOEXEC + +The file is on a filesystem that doesn't support mapping. + +@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock. +@c However mandatory locks are not discussed in this manual. +@c +@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented +@c here) is used and the file is already open for writing. + +@end table + +@end deftypefun + +@comment sys/mman.h +@comment LFS +@deftypefun {void *} mmap64 (void *@var{address}, size_t @var{length},int @var{protect}, int @var{flags}, int @var{filedes}, off64_t @var{offset}) +The @code{mmap64} function is equivalent to the @code{mmap} function but +the @var{offset} parameter is of type @code{off64_t}. On 32-bit systems +this allows the file associated with the @var{filedes} descriptor to be +larger than 2GB. @var{filedes} must be a descriptor returned from a +call to @code{open64} or @code{fopen64} and @code{freopen64} where the +descriptor is retrieved with @code{fileno}. + +When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this +function is actually available under the name @code{mmap}. I.e., the +new, extended API using 64 bit file sizes and offsets transparently +replaces the old API. +@end deftypefun + +@comment sys/mman.h +@comment POSIX +@deftypefun int munmap (void *@var{addr}, size_t @var{length}) + +@code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} + +@var{length}). @var{length} should be the length of the mapping. + +It is safe to unmap multiple mappings in one command, or include unmapped +space in the range. It is also possible to unmap only part of an existing +mapping. However, only entire pages can be removed. If @var{length} is not +an even number of pages, it will be rounded up. + +It returns @math{0} for success and @math{-1} for an error. + +One error is possible: + +@table @code + +@item EINVAL +The memory range given was outside the user mmap range or wasn't page +aligned. + +@end table + +@end deftypefun + +@comment sys/mman.h +@comment POSIX +@deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags}) + +When using shared mappings, the kernel can write the file at any time +before the mapping is removed. To be certain data has actually been +written to the file and will be accessible to non-memory-mapped I/O, it +is necessary to use this function. + +It operates on the region @var{address} to (@var{address} + @var{length}). +It may be used on part of a mapping or multiple mappings, however the +region given should not contain any unmapped space. + +@var{flags} can contain some options: + +@vtable @code + +@item MS_SYNC + +This flag makes sure the data is actually written @emph{to disk}. +Normally @code{msync} only makes sure that accesses to a file with +conventional I/O reflect the recent changes. + +@item MS_ASYNC + +This tells @code{msync} to begin the synchronization, but not to wait for +it to complete. + +@c Linux also has MS_INVALIDATE, which I don't understand. + +@end vtable + +@code{msync} returns @math{0} for success and @math{-1} for +error. Errors include: + +@table @code + +@item EINVAL +An invalid region was given, or the @var{flags} were invalid. + +@item EFAULT +There is no existing mapping in at least part of the given region. + +@end table + +@end deftypefun + +@comment sys/mman.h +@comment GNU +@deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag}) + +This function can be used to change the size of an existing memory +area. @var{address} and @var{length} must cover a region entirely mapped +in the same @code{mmap} statement. A new mapping with the same +characteristics will be returned with the length @var{new_length}. + +One option is possible, @code{MREMAP_MAYMOVE}. If it is given in +@var{flags}, the system may remove the existing mapping and create a new +one of the desired length in another location. + +The address of the resulting mapping is returned, or @math{-1}. Possible +error codes include: + +@table @code + +@item EFAULT +There is no existing mapping in at least part of the original region, or +the region covers two or more distinct mappings. + +@item EINVAL +The address given is misaligned or inappropriate. + +@item EAGAIN +The region has pages locked, and if extended it would exceed the +process's resource limit for locked pages. @xref{Limits on Resources}. + +@item ENOMEM +The region is private writable, and insufficient virtual memory is +available to extend it. Also, this error will occur if +@code{MREMAP_MAYMOVE} is not given and the extension would collide with +another mapped region. + +@end table +@end deftypefun + +This function is only available on a few systems. Except for performing +optional optimizations one should not rely on this function. + +Not all file descriptors may be mapped. Sockets, pipes, and most devices +only allow sequential access and do not fit into the mapping abstraction. +In addition, some regular files may not be mmapable, and older kernels may +not support mapping at all. Thus, programs using @code{mmap} should +have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU +Coding Standards}. + +@comment sys/mman.h +@comment POSIX +@deftypefun int madvise (void *@var{addr}, size_t @var{length}, int @var{advice}) + +This function can be used to provide the system with @var{advice} about +the intended usage patterns of the memory region starting at @var{addr} +and extending @var{length} bytes. + +The valid BSD values for @var{advice} are: + +@table @code + +@item MADV_NORMAL +The region should receive no further special treatment. + +@item MADV_RANDOM +The region will be accessed via random page references. The kernel +should page-in the minimal number of pages for each page fault. + +@item MADV_SEQUENTIAL +The region will be accessed via sequential page references. This +may cause the kernel to aggressively read-ahead, expecting further +sequential references after any page fault within this region. + +@item MADV_WILLNEED +The region will be needed. The pages within this region may +be pre-faulted in by the kernel. + +@item MADV_DONTNEED +The region is no longer needed. The kernel may free these pages, +causing any changes to the pages to be lost, as well as swapped +out pages to be discarded. + +@end table + +The POSIX names are slightly different, but with the same meanings: + +@table @code + +@item POSIX_MADV_NORMAL +This corresponds with BSD's @code{MADV_NORMAL}. + +@item POSIX_MADV_RANDOM +This corresponds with BSD's @code{MADV_RANDOM}. + +@item POSIX_MADV_SEQUENTIAL +This corresponds with BSD's @code{MADV_SEQUENTIAL}. + +@item POSIX_MADV_WILLNEED +This corresponds with BSD's @code{MADV_WILLNEED}. + +@item POSIX_MADV_DONTNEED +This corresponds with BSD's @code{MADV_DONTNEED}. + +@end table + +@code{msync} returns @math{0} for success and @math{-1} for +error. Errors include: +@table @code + +@item EINVAL +An invalid region was given, or the @var{advice} was invalid. + +@item EFAULT +There is no existing mapping in at least part of the given region. + +@end table +@end deftypefun + +@node Waiting for I/O +@section Waiting for Input or Output +@cindex waiting for input or output +@cindex multiplexing input +@cindex input from multiple files + +Sometimes a program needs to accept input on multiple input channels +whenever input arrives. For example, some workstations may have devices +such as a digitizing tablet, function button box, or dial box that are +connected via normal asynchronous serial interfaces; good user interface +style requires responding immediately to input on any device. Another +example is a program that acts as a server to several other processes +via pipes or sockets. + +You cannot normally use @code{read} for this purpose, because this +blocks the program until input is available on one particular file +descriptor; input on other channels won't wake it up. You could set +nonblocking mode and poll each file descriptor in turn, but this is very +inefficient. + +A better solution is to use the @code{select} function. This blocks the +program until input or output is ready on a specified set of file +descriptors, or until a timer expires, whichever comes first. This +facility is declared in the header file @file{sys/types.h}. +@pindex sys/types.h + +In the case of a server socket (@pxref{Listening}), we say that +``input'' is available when there are pending connections that could be +accepted (@pxref{Accepting Connections}). @code{accept} for server +sockets blocks and interacts with @code{select} just as @code{read} does +for normal input. + +@cindex file descriptor sets, for @code{select} +The file descriptor sets for the @code{select} function are specified +as @code{fd_set} objects. Here is the description of the data type +and some macros for manipulating these objects. + +@comment sys/types.h +@comment BSD +@deftp {Data Type} fd_set +The @code{fd_set} data type represents file descriptor sets for the +@code{select} function. It is actually a bit array. +@end deftp + +@comment sys/types.h +@comment BSD +@deftypevr Macro int FD_SETSIZE +The value of this macro is the maximum number of file descriptors that a +@code{fd_set} object can hold information about. On systems with a +fixed maximum number, @code{FD_SETSIZE} is at least that number. On +some systems, including GNU, there is no absolute limit on the number of +descriptors open, but this macro still has a constant value which +controls the number of bits in an @code{fd_set}; if you get a file +descriptor with a value as high as @code{FD_SETSIZE}, you cannot put +that descriptor into an @code{fd_set}. +@end deftypevr + +@comment sys/types.h +@comment BSD +@deftypefn Macro void FD_ZERO (fd_set *@var{set}) +This macro initializes the file descriptor set @var{set} to be the +empty set. +@end deftypefn + +@comment sys/types.h +@comment BSD +@deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set}) +This macro adds @var{filedes} to the file descriptor set @var{set}. + +The @var{filedes} parameter must not have side effects since it is +evaluated more than once. +@end deftypefn + +@comment sys/types.h +@comment BSD +@deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set}) +This macro removes @var{filedes} from the file descriptor set @var{set}. + +The @var{filedes} parameter must not have side effects since it is +evaluated more than once. +@end deftypefn + +@comment sys/types.h +@comment BSD +@deftypefn Macro int FD_ISSET (int @var{filedes}, const fd_set *@var{set}) +This macro returns a nonzero value (true) if @var{filedes} is a member +of the file descriptor set @var{set}, and zero (false) otherwise. + +The @var{filedes} parameter must not have side effects since it is +evaluated more than once. +@end deftypefn + +Next, here is the description of the @code{select} function itself. + +@comment sys/types.h +@comment BSD +@deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout}) +The @code{select} function blocks the calling process until there is +activity on any of the specified sets of file descriptors, or until the +timeout period has expired. + +The file descriptors specified by the @var{read-fds} argument are +checked to see if they are ready for reading; the @var{write-fds} file +descriptors are checked to see if they are ready for writing; and the +@var{except-fds} file descriptors are checked for exceptional +conditions. You can pass a null pointer for any of these arguments if +you are not interested in checking for that kind of condition. + +A file descriptor is considered ready for reading if a @code{read} +call will not block. This usually includes the read offset being at +the end of the file or there is an error to report. A server socket +is considered ready for reading if there is a pending connection which +can be accepted with @code{accept}; @pxref{Accepting Connections}. A +client socket is ready for writing when its connection is fully +established; @pxref{Connecting}. + +``Exceptional conditions'' does not mean errors---errors are reported +immediately when an erroneous system call is executed, and do not +constitute a state of the descriptor. Rather, they include conditions +such as the presence of an urgent message on a socket. (@xref{Sockets}, +for information on urgent messages.) + +The @code{select} function checks only the first @var{nfds} file +descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value +of this argument. + +The @var{timeout} specifies the maximum time to wait. If you pass a +null pointer for this argument, it means to block indefinitely until one +of the file descriptors is ready. Otherwise, you should provide the +time in @code{struct timeval} format; see @ref{High-Resolution +Calendar}. Specify zero as the time (a @code{struct timeval} containing +all zeros) if you want to find out which descriptors are ready without +waiting if none are ready. + +The normal return value from @code{select} is the total number of ready file +descriptors in all of the sets. Each of the argument sets is overwritten +with information about the descriptors that are ready for the corresponding +operation. Thus, to see if a particular descriptor @var{desc} has input, +use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns. + +If @code{select} returns because the timeout period expires, it returns +a value of zero. + +Any signal will cause @code{select} to return immediately. So if your +program uses signals, you can't rely on @code{select} to keep waiting +for the full time specified. If you want to be sure of waiting for a +particular amount of time, you must check for @code{EINTR} and repeat +the @code{select} with a newly calculated timeout based on the current +time. See the example below. See also @ref{Interrupted Primitives}. + +If an error occurs, @code{select} returns @code{-1} and does not modify +the argument file descriptor sets. The following @code{errno} error +conditions are defined for this function: + +@table @code +@item EBADF +One of the file descriptor sets specified an invalid file descriptor. + +@item EINTR +The operation was interrupted by a signal. @xref{Interrupted Primitives}. + +@item EINVAL +The @var{timeout} argument is invalid; one of the components is negative +or too large. +@end table +@end deftypefun + +@strong{Portability Note:} The @code{select} function is a BSD Unix +feature. + +Here is an example showing how you can use @code{select} to establish a +timeout period for reading from a file descriptor. The @code{input_timeout} +function blocks the calling process until input is available on the +file descriptor, or until the timeout period expires. + +@smallexample +@include select.c.texi +@end smallexample + +There is another example showing the use of @code{select} to multiplex +input from multiple sockets in @ref{Server Example}. + + +@node Synchronizing I/O +@section Synchronizing I/O operations + +@cindex synchronizing +In most modern operating systems, the normal I/O operations are not +executed synchronously. I.e., even if a @code{write} system call +returns, this does not mean the data is actually written to the media, +e.g., the disk. + +In situations where synchronization points are necessary, you can use +special functions which ensure that all operations finish before +they return. + +@comment unistd.h +@comment X/Open +@deftypefun int sync (void) +A call to this function will not return as long as there is data which +has not been written to the device. All dirty buffers in the kernel will +be written and so an overall consistent system can be achieved (if no +other process in parallel writes data). + +A prototype for @code{sync} can be found in @file{unistd.h}. + +The return value is zero to indicate no error. +@end deftypefun + +Programs more often want to ensure that data written to a given file is +committed, rather than all data in the system. For this, @code{sync} is overkill. + + +@comment unistd.h +@comment POSIX +@deftypefun int fsync (int @var{fildes}) +The @code{fsync} function can be used to make sure all data associated with +the open file @var{fildes} is written to the device associated with the +descriptor. The function call does not return unless all actions have +finished. + +A prototype for @code{fsync} can be found in @file{unistd.h}. + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{fsync} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this, calls to @code{fsync} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop + +The return value of the function is zero if no error occurred. Otherwise +it is @math{-1} and the global variable @var{errno} is set to the +following values: +@table @code +@item EBADF +The descriptor @var{fildes} is not valid. + +@item EINVAL +No synchronization is possible since the system does not implement this. +@end table +@end deftypefun + +Sometimes it is not even necessary to write all data associated with a +file descriptor. E.g., in database files which do not change in size it +is enough to write all the file content data to the device. +Meta-information, like the modification time etc., are not that important +and leaving such information uncommitted does not prevent a successful +recovering of the file in case of a problem. + +@comment unistd.h +@comment POSIX +@deftypefun int fdatasync (int @var{fildes}) +When a call to the @code{fdatasync} function returns, it is ensured +that all of the file data is written to the device. For all pending I/O +operations, the parts guaranteeing data integrity finished. + +Not all systems implement the @code{fdatasync} operation. On systems +missing this functionality @code{fdatasync} is emulated by a call to +@code{fsync} since the performed actions are a superset of those +required by @code{fdatasync}. + +The prototype for @code{fdatasync} is in @file{unistd.h}. + +The return value of the function is zero if no error occurred. Otherwise +it is @math{-1} and the global variable @var{errno} is set to the +following values: +@table @code +@item EBADF +The descriptor @var{fildes} is not valid. + +@item EINVAL +No synchronization is possible since the system does not implement this. +@end table +@end deftypefun + + +@node Asynchronous I/O +@section Perform I/O Operations in Parallel + +The POSIX.1b standard defines a new set of I/O operations which can +significantly reduce the time an application spends waiting at I/O. The +new functions allow a program to initiate one or more I/O operations and +then immediately resume normal work while the I/O operations are +executed in parallel. This functionality is available if the +@file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}. + +These functions are part of the library with realtime functions named +@file{librt}. They are not actually part of the @file{libc} binary. +The implementation of these functions can be done using support in the +kernel (if available) or using an implementation based on threads at +userlevel. In the latter case it might be necessary to link applications +with the thread library @file{libpthread} in addition to @file{librt}. + +All AIO operations operate on files which were opened previously. There +might be arbitrarily many operations running for one file. The +asynchronous I/O operations are controlled using a data structure named +@code{struct aiocb} (@dfn{AIO control block}). It is defined in +@file{aio.h} as follows. + +@comment aio.h +@comment POSIX.1b +@deftp {Data Type} {struct aiocb} +The POSIX.1b standard mandates that the @code{struct aiocb} structure +contains at least the members described in the following table. There +might be more elements which are used by the implementation, but +depending upon these elements is not portable and is highly deprecated. + +@table @code +@item int aio_fildes +This element specifies the file descriptor to be used for the +operation. It must be a legal descriptor, otherwise the operation will +fail. + +The device on which the file is opened must allow the seek operation. +I.e., it is not possible to use any of the AIO operations on devices +like terminals where an @code{lseek} call would lead to an error. + +@item off_t aio_offset +This element specifies the offset in the file at which the operation (input +or output) is performed. Since the operations are carried out in arbitrary +order and more than one operation for one file descriptor can be +started, one cannot expect a current read/write position of the file +descriptor. + +@item volatile void *aio_buf +This is a pointer to the buffer with the data to be written or the place +where the read data is stored. + +@item size_t aio_nbytes +This element specifies the length of the buffer pointed to by @code{aio_buf}. + +@item int aio_reqprio +If the platform has defined @code{_POSIX_PRIORITIZED_IO} and +@code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are +processed based on the current scheduling priority. The +@code{aio_reqprio} element can then be used to lower the priority of the +AIO operation. + +@item struct sigevent aio_sigevent +This element specifies how the calling process is notified once the +operation terminates. If the @code{sigev_notify} element is +@code{SIGEV_NONE}, no notification is sent. If it is @code{SIGEV_SIGNAL}, +the signal determined by @code{sigev_signo} is sent. Otherwise, +@code{sigev_notify} must be @code{SIGEV_THREAD}. In this case, a thread +is created which starts executing the function pointed to by +@code{sigev_notify_function}. + +@item int aio_lio_opcode +This element is only used by the @code{lio_listio} and +@code{lio_listio64} functions. Since these functions allow an +arbitrary number of operations to start at once, and each operation can be +input or output (or nothing), the information must be stored in the +control block. The possible values are: + +@vtable @code +@item LIO_READ +Start a read operation. Read from the file at position +@code{aio_offset} and store the next @code{aio_nbytes} bytes in the +buffer pointed to by @code{aio_buf}. + +@item LIO_WRITE +Start a write operation. Write @code{aio_nbytes} bytes starting at +@code{aio_buf} into the file starting at position @code{aio_offset}. + +@item LIO_NOP +Do nothing for this control block. This value is useful sometimes when +an array of @code{struct aiocb} values contains holes, i.e., some of the +values must not be handled although the whole array is presented to the +@code{lio_listio} function. +@end vtable +@end table + +When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a +32 bit machine, this type is in fact @code{struct aiocb64}, since the LFS +interface transparently replaces the @code{struct aiocb} definition. +@end deftp + +For use with the AIO functions defined in the LFS, there is a similar type +defined which replaces the types of the appropriate members with larger +types but otherwise is equivalent to @code{struct aiocb}. Particularly, +all member names are the same. + +@comment aio.h +@comment POSIX.1b +@deftp {Data Type} {struct aiocb64} +@table @code +@item int aio_fildes +This element specifies the file descriptor which is used for the +operation. It must be a legal descriptor since otherwise the operation +fails for obvious reasons. + +The device on which the file is opened must allow the seek operation. +I.e., it is not possible to use any of the AIO operations on devices +like terminals where an @code{lseek} call would lead to an error. + +@item off64_t aio_offset +This element specifies at which offset in the file the operation (input +or output) is performed. Since the operation are carried in arbitrary +order and more than one operation for one file descriptor can be +started, one cannot expect a current read/write position of the file +descriptor. + +@item volatile void *aio_buf +This is a pointer to the buffer with the data to be written or the place +where the read data is stored. + +@item size_t aio_nbytes +This element specifies the length of the buffer pointed to by @code{aio_buf}. + +@item int aio_reqprio +If for the platform @code{_POSIX_PRIORITIZED_IO} and +@code{_POSIX_PRIORITY_SCHEDULING} are defined the AIO requests are +processed based on the current scheduling priority. The +@code{aio_reqprio} element can then be used to lower the priority of the +AIO operation. + +@item struct sigevent aio_sigevent +This element specifies how the calling process is notified once the +operation terminates. If the @code{sigev_notify}, element is +@code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL}, +the signal determined by @code{sigev_signo} is sent. Otherwise, +@code{sigev_notify} must be @code{SIGEV_THREAD} in which case a thread +which starts executing the function pointed to by +@code{sigev_notify_function}. + +@item int aio_lio_opcode +This element is only used by the @code{lio_listio} and +@code{[lio_listio64} functions. Since these functions allow an +arbitrary number of operations to start at once, and since each operation can be +input or output (or nothing), the information must be stored in the +control block. See the description of @code{struct aiocb} for a description +of the possible values. +@end table + +When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a +32 bit machine, this type is available under the name @code{struct +aiocb64}, since the LFS transparently replaces the old interface. +@end deftp + +@menu +* Asynchronous Reads/Writes:: Asynchronous Read and Write Operations. +* Status of AIO Operations:: Getting the Status of AIO Operations. +* Synchronizing AIO Operations:: Getting into a consistent state. +* Cancel AIO Operations:: Cancellation of AIO Operations. +* Configuration of AIO:: How to optimize the AIO implementation. +@end menu + +@node Asynchronous Reads/Writes +@subsection Asynchronous Read and Write Operations + +@comment aio.h +@comment POSIX.1b +@deftypefun int aio_read (struct aiocb *@var{aiocbp}) +This function initiates an asynchronous read operation. It +immediately returns after the operation was enqueued or when an +error was encountered. + +The first @code{aiocbp->aio_nbytes} bytes of the file for which +@code{aiocbp->aio_fildes} is a descriptor are written to the buffer +starting at @code{aiocbp->aio_buf}. Reading starts at the absolute +position @code{aiocbp->aio_offset} in the file. + +If prioritized I/O is supported by the platform the +@code{aiocbp->aio_reqprio} value is used to adjust the priority before +the request is actually enqueued. + +The calling process is notified about the termination of the read +request according to the @code{aiocbp->aio_sigevent} value. + +When @code{aio_read} returns, the return value is zero if no error +occurred that can be found before the process is enqueued. If such an +early error is found, the function returns @math{-1} and sets +@code{errno} to one of the following values: + +@table @code +@item EAGAIN +The request was not enqueued due to (temporarily) exceeded resource +limitations. +@item ENOSYS +The @code{aio_read} function is not implemented. +@item EBADF +The @code{aiocbp->aio_fildes} descriptor is not valid. This condition +need not be recognized before enqueueing the request and so this error +might also be signaled asynchronously. +@item EINVAL +The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqpiro} value is +invalid. This condition need not be recognized before enqueueing the +request and so this error might also be signaled asynchronously. +@end table + +If @code{aio_read} returns zero, the current status of the request +can be queried using @code{aio_error} and @code{aio_return} functions. +As long as the value returned by @code{aio_error} is @code{EINPROGRESS} +the operation has not yet completed. If @code{aio_error} returns zero, +the operation successfully terminated, otherwise the value is to be +interpreted as an error code. If the function terminated, the result of +the operation can be obtained using a call to @code{aio_return}. The +returned value is the same as an equivalent call to @code{read} would +have returned. Possible error codes returned by @code{aio_error} are: + +@table @code +@item EBADF +The @code{aiocbp->aio_fildes} descriptor is not valid. +@item ECANCELED +The operation was canceled before the operation was finished +(@pxref{Cancel AIO Operations}) +@item EINVAL +The @code{aiocbp->aio_offset} value is invalid. +@end table + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is in fact @code{aio_read64} since the LFS interface transparently +replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_read64 (struct aiocb *@var{aiocbp}) +This function is similar to the @code{aio_read} function. The only +difference is that on @w{32 bit} machines, the file descriptor should +be opened in the large file mode. Internally, @code{aio_read64} uses +functionality equivalent to @code{lseek64} (@pxref{File Position +Primitive}) to position the file descriptor correctly for the reading, +as opposed to @code{lseek} functionality used in @code{aio_read}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is available under the name @code{aio_read} and so transparently +replaces the interface for small files on 32 bit machines. +@end deftypefun + +To write data asynchronously to a file, there exists an equivalent pair +of functions with a very similar interface. + +@comment aio.h +@comment POSIX.1b +@deftypefun int aio_write (struct aiocb *@var{aiocbp}) +This function initiates an asynchronous write operation. The function +call immediately returns after the operation was enqueued or if before +this happens an error was encountered. + +The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at +@code{aiocbp->aio_buf} are written to the file for which +@code{aiocbp->aio_fildes} is an descriptor, starting at the absolute +position @code{aiocbp->aio_offset} in the file. + +If prioritized I/O is supported by the platform, the +@code{aiocbp->aio_reqprio} value is used to adjust the priority before +the request is actually enqueued. + +The calling process is notified about the termination of the read +request according to the @code{aiocbp->aio_sigevent} value. + +When @code{aio_write} returns, the return value is zero if no error +occurred that can be found before the process is enqueued. If such an +early error is found the function returns @math{-1} and sets +@code{errno} to one of the following values. + +@table @code +@item EAGAIN +The request was not enqueued due to (temporarily) exceeded resource +limitations. +@item ENOSYS +The @code{aio_write} function is not implemented. +@item EBADF +The @code{aiocbp->aio_fildes} descriptor is not valid. This condition +may not be recognized before enqueueing the request, and so this error +might also be signaled asynchronously. +@item EINVAL +The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is +invalid. This condition may not be recognized before enqueueing the +request and so this error might also be signaled asynchronously. +@end table + +In the case @code{aio_write} returns zero, the current status of the +request can be queried using @code{aio_error} and @code{aio_return} +functions. As long as the value returned by @code{aio_error} is +@code{EINPROGRESS} the operation has not yet completed. If +@code{aio_error} returns zero, the operation successfully terminated, +otherwise the value is to be interpreted as an error code. If the +function terminated, the result of the operation can be get using a call +to @code{aio_return}. The returned value is the same as an equivalent +call to @code{read} would have returned. Possible error codes returned +by @code{aio_error} are: + +@table @code +@item EBADF +The @code{aiocbp->aio_fildes} descriptor is not valid. +@item ECANCELED +The operation was canceled before the operation was finished. +(@pxref{Cancel AIO Operations}) +@item EINVAL +The @code{aiocbp->aio_offset} value is invalid. +@end table + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is in fact @code{aio_write64} since the LFS interface transparently +replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_write64 (struct aiocb *@var{aiocbp}) +This function is similar to the @code{aio_write} function. The only +difference is that on @w{32 bit} machines the file descriptor should +be opened in the large file mode. Internally @code{aio_write64} uses +functionality equivalent to @code{lseek64} (@pxref{File Position +Primitive}) to position the file descriptor correctly for the writing, +as opposed to @code{lseek} functionality used in @code{aio_write}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is available under the name @code{aio_write} and so transparently +replaces the interface for small files on 32 bit machines. +@end deftypefun + +Besides these functions with the more or less traditional interface, +POSIX.1b also defines a function which can initiate more than one +operation at a time, and which can handle freely mixed read and write +operations. It is therefore similar to a combination of @code{readv} and +@code{writev}. + +@comment aio.h +@comment POSIX.1b +@deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) +The @code{lio_listio} function can be used to enqueue an arbitrary +number of read and write requests at one time. The requests can all be +meant for the same file, all for different files or every solution in +between. + +@code{lio_listio} gets the @var{nent} requests from the array pointed to +by @var{list}. The operation to be performed is determined by the +@code{aio_lio_opcode} member in each element of @var{list}. If this +field is @code{LIO_READ} a read operation is enqueued, similar to a call +of @code{aio_read} for this element of the array (except that the way +the termination is signalled is different, as we will see below). If +the @code{aio_lio_opcode} member is @code{LIO_WRITE} a write operation +is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP} +in which case this element of @var{list} is simply ignored. This +``operation'' is useful in situations where one has a fixed array of +@code{struct aiocb} elements from which only a few need to be handled at +a time. Another situation is where the @code{lio_listio} call was +canceled before all requests are processed (@pxref{Cancel AIO +Operations}) and the remaining requests have to be reissued. + +The other members of each element of the array pointed to by +@code{list} must have values suitable for the operation as described in +the documentation for @code{aio_read} and @code{aio_write} above. + +The @var{mode} argument determines how @code{lio_listio} behaves after +having enqueued all the requests. If @var{mode} is @code{LIO_WAIT} it +waits until all requests terminated. Otherwise @var{mode} must be +@code{LIO_NOWAIT} and in this case the function returns immediately after +having enqueued all the requests. In this case the caller gets a +notification of the termination of all requests according to the +@var{sig} parameter. If @var{sig} is @code{NULL} no notification is +send. Otherwise a signal is sent or a thread is started, just as +described in the description for @code{aio_read} or @code{aio_write}. + +If @var{mode} is @code{LIO_WAIT}, the return value of @code{lio_listio} +is @math{0} when all requests completed successfully. Otherwise the +function return @math{-1} and @code{errno} is set accordingly. To find +out which request or requests failed one has to use the @code{aio_error} +function on all the elements of the array @var{list}. + +In case @var{mode} is @code{LIO_NOWAIT}, the function returns @math{0} if +all requests were enqueued correctly. The current state of the requests +can be found using @code{aio_error} and @code{aio_return} as described +above. If @code{lio_listio} returns @math{-1} in this mode, the +global variable @code{errno} is set accordingly. If a request did not +yet terminate, a call to @code{aio_error} returns @code{EINPROGRESS}. If +the value is different, the request is finished and the error value (or +@math{0}) is returned and the result of the operation can be retrieved +using @code{aio_return}. + +Possible values for @code{errno} are: + +@table @code +@item EAGAIN +The resources necessary to queue all the requests are not available at +the moment. The error status for each element of @var{list} must be +checked to determine which request failed. + +Another reason could be that the system wide limit of AIO requests is +exceeded. This cannot be the case for the implementation on GNU systems +since no arbitrary limits exist. +@item EINVAL +The @var{mode} parameter is invalid or @var{nent} is larger than +@code{AIO_LISTIO_MAX}. +@item EIO +One or more of the request's I/O operations failed. The error status of +each request should be checked to determine which one failed. +@item ENOSYS +The @code{lio_listio} function is not supported. +@end table + +If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels +a request, the error status for this request returned by +@code{aio_error} is @code{ECANCELED}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is in fact @code{lio_listio64} since the LFS interface +transparently replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int lio_listio64 (int @var{mode}, struct aiocb *const @var{list}, int @var{nent}, struct sigevent *@var{sig}) +This function is similar to the @code{lio_listio} function. The only +difference is that on @w{32 bit} machines, the file descriptor should +be opened in the large file mode. Internally, @code{lio_listio64} uses +functionality equivalent to @code{lseek64} (@pxref{File Position +Primitive}) to position the file descriptor correctly for the reading or +writing, as opposed to @code{lseek} functionality used in +@code{lio_listio}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is available under the name @code{lio_listio} and so +transparently replaces the interface for small files on 32 bit +machines. +@end deftypefun + +@node Status of AIO Operations +@subsection Getting the Status of AIO Operations + +As already described in the documentation of the functions in the last +section, it must be possible to get information about the status of an I/O +request. When the operation is performed truly asynchronously (as with +@code{aio_read} and @code{aio_write} and with @code{lio_listio} when the +mode is @code{LIO_NOWAIT}), one sometimes needs to know whether a +specific request already terminated and if so, what the result was. +The following two functions allow you to get this kind of information. + +@comment aio.h +@comment POSIX.1b +@deftypefun int aio_error (const struct aiocb *@var{aiocbp}) +This function determines the error state of the request described by the +@code{struct aiocb} variable pointed to by @var{aiocbp}. If the +request has not yet terminated the value returned is always +@code{EINPROGRESS}. Once the request has terminated the value +@code{aio_error} returns is either @math{0} if the request completed +successfully or it returns the value which would be stored in the +@code{errno} variable if the request would have been done using +@code{read}, @code{write}, or @code{fsync}. + +The function can return @code{ENOSYS} if it is not implemented. It +could also return @code{EINVAL} if the @var{aiocbp} parameter does not +refer to an asynchronous operation whose return status is not yet known. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is in fact @code{aio_error64} since the LFS interface +transparently replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp}) +This function is similar to @code{aio_error} with the only difference +that the argument is a reference to a variable of type @code{struct +aiocb64}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is available under the name @code{aio_error} and so +transparently replaces the interface for small files on 32 bit +machines. +@end deftypefun + +@comment aio.h +@comment POSIX.1b +@deftypefun ssize_t aio_return (const struct aiocb *@var{aiocbp}) +This function can be used to retrieve the return status of the operation +carried out by the request described in the variable pointed to by +@var{aiocbp}. As long as the error status of this request as returned +by @code{aio_error} is @code{EINPROGRESS} the return of this function is +undefined. + +Once the request is finished this function can be used exactly once to +retrieve the return value. Following calls might lead to undefined +behavior. The return value itself is the value which would have been +returned by the @code{read}, @code{write}, or @code{fsync} call. + +The function can return @code{ENOSYS} if it is not implemented. It +could also return @code{EINVAL} if the @var{aiocbp} parameter does not +refer to an asynchronous operation whose return status is not yet known. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is in fact @code{aio_return64} since the LFS interface +transparently replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_return64 (const struct aiocb64 *@var{aiocbp}) +This function is similar to @code{aio_return} with the only difference +that the argument is a reference to a variable of type @code{struct +aiocb64}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is available under the name @code{aio_return} and so +transparently replaces the interface for small files on 32 bit +machines. +@end deftypefun + +@node Synchronizing AIO Operations +@subsection Getting into a Consistent State + +When dealing with asynchronous operations it is sometimes necessary to +get into a consistent state. This would mean for AIO that one wants to +know whether a certain request or a group of request were processed. +This could be done by waiting for the notification sent by the system +after the operation terminated, but this sometimes would mean wasting +resources (mainly computation time). Instead POSIX.1b defines two +functions which will help with most kinds of consistency. + +The @code{aio_fsync} and @code{aio_fsync64} functions are only available +if the symbol @code{_POSIX_SYNCHRONIZED_IO} is defined in @file{unistd.h}. + +@cindex synchronizing +@comment aio.h +@comment POSIX.1b +@deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp}) +Calling this function forces all I/O operations operating queued at the +time of the function call operating on the file descriptor +@code{aiocbp->aio_fildes} into the synchronized I/O completion state +(@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns +immediately but the notification through the method described in +@code{aiocbp->aio_sigevent} will happen only after all requests for this +file descriptor have terminated and the file is synchronized. This also +means that requests for this very same file descriptor which are queued +after the synchronization request are not affected. + +If @var{op} is @code{O_DSYNC} the synchronization happens as with a call +to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and +the synchronization happens as with @code{fsync}. + +As long as the synchronization has not happened, a call to +@code{aio_error} with the reference to the object pointed to by +@var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is +done @code{aio_error} return @math{0} if the synchronization was not +successful. Otherwise the value returned is the value to which the +@code{fsync} or @code{fdatasync} function would have set the +@code{errno} variable. In this case nothing can be assumed about the +consistency for the data written to this file descriptor. + +The return value of this function is @math{0} if the request was +successfully enqueued. Otherwise the return value is @math{-1} and +@code{errno} is set to one of the following values: + +@table @code +@item EAGAIN +The request could not be enqueued due to temporary lack of resources. +@item EBADF +The file descriptor @code{aiocbp->aio_fildes} is not valid or not open +for writing. +@item EINVAL +The implementation does not support I/O synchronization or the @var{op} +parameter is other than @code{O_DSYNC} and @code{O_SYNC}. +@item ENOSYS +This function is not implemented. +@end table + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is in fact @code{aio_fsync64} since the LFS interface +transparently replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp}) +This function is similar to @code{aio_fsync} with the only difference +that the argument is a reference to a variable of type @code{struct +aiocb64}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is available under the name @code{aio_fsync} and so +transparently replaces the interface for small files on 32 bit +machines. +@end deftypefun + +Another method of synchronization is to wait until one or more requests of a +specific set terminated. This could be achieved by the @code{aio_*} +functions to notify the initiating process about the termination but in +some situations this is not the ideal solution. In a program which +constantly updates clients somehow connected to the server it is not +always the best solution to go round robin since some connections might +be slow. On the other hand letting the @code{aio_*} function notify the +caller might also be not the best solution since whenever the process +works on preparing data for on client it makes no sense to be +interrupted by a notification since the new client will not be handled +before the current client is served. For situations like this +@code{aio_suspend} should be used. + +@comment aio.h +@comment POSIX.1b +@deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) +When calling this function, the calling thread is suspended until at +least one of the requests pointed to by the @var{nent} elements of the +array @var{list} has completed. If any of the requests has already +completed at the time @code{aio_suspend} is called, the function returns +immediately. Whether a request has terminated or not is determined by +comparing the error status of the request with @code{EINPROGRESS}. If +an element of @var{list} is @code{NULL}, the entry is simply ignored. + +If no request has finished, the calling process is suspended. If +@var{timeout} is @code{NULL}, the process is not woken until a request +has finished. If @var{timeout} is not @code{NULL}, the process remains +suspended at least as long as specified in @var{timeout}. In this case, +@code{aio_suspend} returns with an error. + +The return value of the function is @math{0} if one or more requests +from the @var{list} have terminated. Otherwise the function returns +@math{-1} and @code{errno} is set to one of the following values: + +@table @code +@item EAGAIN +None of the requests from the @var{list} completed in the time specified +by @var{timeout}. +@item EINTR +A signal interrupted the @code{aio_suspend} function. This signal might +also be sent by the AIO implementation while signalling the termination +of one of the requests. +@item ENOSYS +The @code{aio_suspend} function is not implemented. +@end table + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is in fact @code{aio_suspend64} since the LFS interface +transparently replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) +This function is similar to @code{aio_suspend} with the only difference +that the argument is a reference to a variable of type @code{struct +aiocb64}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this +function is available under the name @code{aio_suspend} and so +transparently replaces the interface for small files on 32 bit +machines. +@end deftypefun + +@node Cancel AIO Operations +@subsection Cancellation of AIO Operations + +When one or more requests are asynchronously processed, it might be +useful in some situations to cancel a selected operation, e.g., if it +becomes obvious that the written data is no longer accurate and would +have to be overwritten soon. As an example, assume an application, which +writes data in files in a situation where new incoming data would have +to be written in a file which will be updated by an enqueued request. +The POSIX AIO implementation provides such a function, but this function +is not capable of forcing the cancellation of the request. It is up to the +implementation to decide whether it is possible to cancel the operation +or not. Therefore using this function is merely a hint. + +@comment aio.h +@comment POSIX.1b +@deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp}) +The @code{aio_cancel} function can be used to cancel one or more +outstanding requests. If the @var{aiocbp} parameter is @code{NULL}, the +function tries to cancel all of the outstanding requests which would process +the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member +is @var{fildes}). If @var{aiocbp} is not @code{NULL}, @code{aio_cancel} +attempts to cancel the specific request pointed to by @var{aiocbp}. + +For requests which were successfully canceled, the normal notification +about the termination of the request should take place. I.e., depending +on the @code{struct sigevent} object which controls this, nothing +happens, a signal is sent or a thread is started. If the request cannot +be canceled, it terminates the usual way after performing the operation. + +After a request is successfully canceled, a call to @code{aio_error} with +a reference to this request as the parameter will return +@code{ECANCELED} and a call to @code{aio_return} will return @math{-1}. +If the request wasn't canceled and is still running the error status is +still @code{EINPROGRESS}. + +The return value of the function is @code{AIO_CANCELED} if there were +requests which haven't terminated and which were successfully canceled. +If there is one or more requests left which couldn't be canceled, the +return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error} +must be used to find out which of the, perhaps multiple, requests (in +@var{aiocbp} is @code{NULL}) weren't successfully canceled. If all +requests already terminated at the time @code{aio_cancel} is called the +return value is @code{AIO_ALLDONE}. + +If an error occurred during the execution of @code{aio_cancel} the +function returns @math{-1} and sets @code{errno} to one of the following +values. + +@table @code +@item EBADF +The file descriptor @var{fildes} is not valid. +@item ENOSYS +@code{aio_cancel} is not implemented. +@end table + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is in fact @code{aio_cancel64} since the LFS interface +transparently replaces the normal implementation. +@end deftypefun + +@comment aio.h +@comment Unix98 +@deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp}) +This function is similar to @code{aio_cancel} with the only difference +that the argument is a reference to a variable of type @code{struct +aiocb64}. + +When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this +function is available under the name @code{aio_cancel} and so +transparently replaces the interface for small files on 32 bit +machines. +@end deftypefun + +@node Configuration of AIO +@subsection How to optimize the AIO implementation + +The POSIX standard does not specify how the AIO functions are +implemented. They could be system calls, but it is also possible to +emulate them at userlevel. + +At the point of this writing, the available implementation is a userlevel +implementation which uses threads for handling the enqueued requests. +While this implementation requires making some decisions about +limitations, hard limitations are something which is best avoided +in the GNU C library. Therefore, the GNU C library provides a means +for tuning the AIO implementation according to the individual use. + +@comment aio.h +@comment GNU +@deftp {Data Type} {struct aioinit} +This data type is used to pass the configuration or tunable parameters +to the implementation. The program has to initialize the members of +this struct and pass it to the implementation using the @code{aio_init} +function. + +@table @code +@item int aio_threads +This member specifies the maximal number of threads which may be used +at any one time. +@item int aio_num +This number provides an estimate on the maximal number of simultaneously +enqueued requests. +@item int aio_locks +Unused. +@item int aio_usedba +Unused. +@item int aio_debug +Unused. +@item int aio_numusers +Unused. +@item int aio_reserved[2] +Unused. +@end table +@end deftp + +@comment aio.h +@comment GNU +@deftypefun void aio_init (const struct aioinit *@var{init}) +This function must be called before any other AIO function. Calling it +is completely voluntary, as it is only meant to help the AIO +implementation perform better. + +Before calling the @code{aio_init}, function the members of a variable of +type @code{struct aioinit} must be initialized. Then a reference to +this variable is passed as the parameter to @code{aio_init} which itself +may or may not pay attention to the hints. + +The function has no return value and no error cases are defined. It is +a extension which follows a proposal from the SGI implementation in +@w{Irix 6}. It is not covered by POSIX.1b or Unix98. +@end deftypefun + +@node Control Operations +@section Control Operations on Files + +@cindex control operations on files +@cindex @code{fcntl} function +This section describes how you can perform various other operations on +file descriptors, such as inquiring about or setting flags describing +the status of the file descriptor, manipulating record locks, and the +like. All of these operations are performed by the function @code{fcntl}. + +The second argument to the @code{fcntl} function is a command that +specifies which operation to perform. The function and macros that name +various flags that are used with it are declared in the header file +@file{fcntl.h}. Many of these flags are also used by the @code{open} +function; see @ref{Opening and Closing Files}. +@pindex fcntl.h + +@comment fcntl.h +@comment POSIX.1 +@deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{}) +The @code{fcntl} function performs the operation specified by +@var{command} on the file descriptor @var{filedes}. Some commands +require additional arguments to be supplied. These additional arguments +and the return value and error conditions are given in the detailed +descriptions of the individual commands. + +Briefly, here is a list of what the various commands are. + +@table @code +@item F_DUPFD +Duplicate the file descriptor (return another file descriptor pointing +to the same open file). @xref{Duplicating Descriptors}. + +@item F_GETFD +Get flags associated with the file descriptor. @xref{Descriptor Flags}. + +@item F_SETFD +Set flags associated with the file descriptor. @xref{Descriptor Flags}. + +@item F_GETFL +Get flags associated with the open file. @xref{File Status Flags}. + +@item F_SETFL +Set flags associated with the open file. @xref{File Status Flags}. + +@item F_GETLK +Get a file lock. @xref{File Locks}. + +@item F_SETLK +Set or clear a file lock. @xref{File Locks}. + +@item F_SETLKW +Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. + +@item F_GETOWN +Get process or process group ID to receive @code{SIGIO} signals. +@xref{Interrupt Input}. + +@item F_SETOWN +Set process or process group ID to receive @code{SIGIO} signals. +@xref{Interrupt Input}. +@end table + +This function is a cancellation point in multi-threaded programs. This +is a problem if the thread allocates some resources (like memory, file +descriptors, semaphores or whatever) at the time @code{fcntl} is +called. If the thread gets canceled these resources stay allocated +until the program ends. To avoid this calls to @code{fcntl} should be +protected using cancellation handlers. +@c ref pthread_cleanup_push / pthread_cleanup_pop +@end deftypefun + + +@node Duplicating Descriptors +@section Duplicating Descriptors + +@cindex duplicating file descriptors +@cindex redirecting input and output + +You can @dfn{duplicate} a file descriptor, or allocate another file +descriptor that refers to the same open file as the original. Duplicate +descriptors share one file position and one set of file status flags +(@pxref{File Status Flags}), but each has its own set of file descriptor +flags (@pxref{Descriptor Flags}). + +The major use of duplicating a file descriptor is to implement +@dfn{redirection} of input or output: that is, to change the +file or pipe that a particular file descriptor corresponds to. + +You can perform this operation using the @code{fcntl} function with the +@code{F_DUPFD} command, but there are also convenient functions +@code{dup} and @code{dup2} for duplicating descriptors. + +@pindex unistd.h +@pindex fcntl.h +The @code{fcntl} function and flags are declared in @file{fcntl.h}, +while prototypes for @code{dup} and @code{dup2} are in the header file +@file{unistd.h}. + +@comment unistd.h +@comment POSIX.1 +@deftypefun int dup (int @var{old}) +This function copies descriptor @var{old} to the first available +descriptor number (the first number not currently open). It is +equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}. +@end deftypefun + +@comment unistd.h +@comment POSIX.1 +@deftypefun int dup2 (int @var{old}, int @var{new}) +This function copies the descriptor @var{old} to descriptor number +@var{new}. + +If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it +does not close @var{new}. Otherwise, the new duplicate of @var{old} +replaces any previous meaning of descriptor @var{new}, as if @var{new} +were closed first. + +If @var{old} and @var{new} are different numbers, and @var{old} is a +valid descriptor number, then @code{dup2} is equivalent to: + +@smallexample +close (@var{new}); +fcntl (@var{old}, F_DUPFD, @var{new}) +@end smallexample + +However, @code{dup2} does this atomically; there is no instant in the +middle of calling @code{dup2} at which @var{new} is closed and not yet a +duplicate of @var{old}. +@end deftypefun + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_DUPFD +This macro is used as the @var{command} argument to @code{fcntl}, to +copy the file descriptor given as the first argument. + +The form of the call in this case is: + +@smallexample +fcntl (@var{old}, F_DUPFD, @var{next-filedes}) +@end smallexample + +The @var{next-filedes} argument is of type @code{int} and specifies that +the file descriptor returned should be the next available one greater +than or equal to this value. + +The return value from @code{fcntl} with this command is normally the value +of the new file descriptor. A return value of @math{-1} indicates an +error. The following @code{errno} error conditions are defined for +this command: + +@table @code +@item EBADF +The @var{old} argument is invalid. + +@item EINVAL +The @var{next-filedes} argument is invalid. + +@item EMFILE +There are no more file descriptors available---your program is already +using the maximum. In BSD and GNU, the maximum is controlled by a +resource limit that can be changed; @pxref{Limits on Resources}, for +more information about the @code{RLIMIT_NOFILE} limit. +@end table + +@code{ENFILE} is not a possible error code for @code{dup2} because +@code{dup2} does not create a new opening of a file; duplicate +descriptors do not count toward the limit which @code{ENFILE} +indicates. @code{EMFILE} is possible because it refers to the limit on +distinct descriptor numbers in use in one process. +@end deftypevr + +Here is an example showing how to use @code{dup2} to do redirection. +Typically, redirection of the standard streams (like @code{stdin}) is +done by a shell or shell-like program before calling one of the +@code{exec} functions (@pxref{Executing a File}) to execute a new +program in a child process. When the new program is executed, it +creates and initializes the standard streams to point to the +corresponding file descriptors, before its @code{main} function is +invoked. + +So, to redirect standard input to a file, the shell could do something +like: + +@smallexample +pid = fork (); +if (pid == 0) + @{ + char *filename; + char *program; + int file; + @dots{} + file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY)); + dup2 (file, STDIN_FILENO); + TEMP_FAILURE_RETRY (close (file)); + execv (program, NULL); + @} +@end smallexample + +There is also a more detailed example showing how to implement redirection +in the context of a pipeline of processes in @ref{Launching Jobs}. + + +@node Descriptor Flags +@section File Descriptor Flags +@cindex file descriptor flags + +@dfn{File descriptor flags} are miscellaneous attributes of a file +descriptor. These flags are associated with particular file +descriptors, so that if you have created duplicate file descriptors +from a single opening of a file, each descriptor has its own set of flags. + +Currently there is just one file descriptor flag: @code{FD_CLOEXEC}, +which causes the descriptor to be closed if you use any of the +@code{exec@dots{}} functions (@pxref{Executing a File}). + +The symbols in this section are defined in the header file +@file{fcntl.h}. +@pindex fcntl.h + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_GETFD +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should return the file descriptor flags associated +with the @var{filedes} argument. + +The normal return value from @code{fcntl} with this command is a +nonnegative number which can be interpreted as the bitwise OR of the +individual flags (except that currently there is only one flag to use). + +In case of an error, @code{fcntl} returns @math{-1}. The following +@code{errno} error conditions are defined for this command: + +@table @code +@item EBADF +The @var{filedes} argument is invalid. +@end table +@end deftypevr + + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_SETFD +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should set the file descriptor flags associated with the +@var{filedes} argument. This requires a third @code{int} argument to +specify the new flags, so the form of the call is: + +@smallexample +fcntl (@var{filedes}, F_SETFD, @var{new-flags}) +@end smallexample + +The normal return value from @code{fcntl} with this command is an +unspecified value other than @math{-1}, which indicates an error. +The flags and error conditions are the same as for the @code{F_GETFD} +command. +@end deftypevr + +The following macro is defined for use as a file descriptor flag with +the @code{fcntl} function. The value is an integer constant usable +as a bit mask value. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int FD_CLOEXEC +@cindex close-on-exec (file descriptor flag) +This flag specifies that the file descriptor should be closed when +an @code{exec} function is invoked; see @ref{Executing a File}. When +a file descriptor is allocated (as with @code{open} or @code{dup}), +this bit is initially cleared on the new file descriptor, meaning that +descriptor will survive into the new program after @code{exec}. +@end deftypevr + +If you want to modify the file descriptor flags, you should get the +current flags with @code{F_GETFD} and modify the value. Don't assume +that the flags listed here are the only ones that are implemented; your +program may be run years from now and more flags may exist then. For +example, here is a function to set or clear the flag @code{FD_CLOEXEC} +without altering any other flags: + +@smallexample +/* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,} + @r{or clear the flag if @var{value} is 0.} + @r{Return 0 on success, or -1 on error with @code{errno} set.} */ + +int +set_cloexec_flag (int desc, int value) +@{ + int oldflags = fcntl (desc, F_GETFD, 0); + /* @r{If reading the flags failed, return error indication now.} */ + if (oldflags < 0) + return oldflags; + /* @r{Set just the flag we want to set.} */ + if (value != 0) + oldflags |= FD_CLOEXEC; + else + oldflags &= ~FD_CLOEXEC; + /* @r{Store modified flag word in the descriptor.} */ + return fcntl (desc, F_SETFD, oldflags); +@} +@end smallexample + +@node File Status Flags +@section File Status Flags +@cindex file status flags + +@dfn{File status flags} are used to specify attributes of the opening of a +file. Unlike the file descriptor flags discussed in @ref{Descriptor +Flags}, the file status flags are shared by duplicated file descriptors +resulting from a single opening of the file. The file status flags are +specified with the @var{flags} argument to @code{open}; +@pxref{Opening and Closing Files}. + +File status flags fall into three categories, which are described in the +following sections. + +@itemize @bullet +@item +@ref{Access Modes}, specify what type of access is allowed to the +file: reading, writing, or both. They are set by @code{open} and are +returned by @code{fcntl}, but cannot be changed. + +@item +@ref{Open-time Flags}, control details of what @code{open} will do. +These flags are not preserved after the @code{open} call. + +@item +@ref{Operating Modes}, affect how operations such as @code{read} and +@code{write} are done. They are set by @code{open}, and can be fetched or +changed with @code{fcntl}. +@end itemize + +The symbols in this section are defined in the header file +@file{fcntl.h}. +@pindex fcntl.h + +@menu +* Access Modes:: Whether the descriptor can read or write. +* Open-time Flags:: Details of @code{open}. +* Operating Modes:: Special modes to control I/O operations. +* Getting File Status Flags:: Fetching and changing these flags. +@end menu + +@node Access Modes +@subsection File Access Modes + +The file access modes allow a file descriptor to be used for reading, +writing, or both. (In the GNU system, they can also allow none of these, +and allow execution of the file as a program.) The access modes are chosen +when the file is opened, and never change. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_RDONLY +Open the file for read access. +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_WRONLY +Open the file for write access. +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_RDWR +Open the file for both reading and writing. +@end deftypevr + +In the GNU system (and not in other systems), @code{O_RDONLY} and +@code{O_WRONLY} are independent bits that can be bitwise-ORed together, +and it is valid for either bit to be set or clear. This means that +@code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. A file access +mode of zero is permissible; it allows no operations that do input or +output to the file, but does allow other operations such as +@code{fchmod}. On the GNU system, since ``read-only'' or ``write-only'' +is a misnomer, @file{fcntl.h} defines additional names for the file +access modes. These names are preferred when writing GNU-specific code. +But most programs will want to be portable to other POSIX.1 systems and +should use the POSIX.1 names above instead. + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_READ +Open the file for reading. Same as @code{O_RDONLY}; only defined on GNU. +@end deftypevr + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_WRITE +Open the file for writing. Same as @code{O_WRONLY}; only defined on GNU. +@end deftypevr + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_EXEC +Open the file for executing. Only defined on GNU. +@end deftypevr + +To determine the file access mode with @code{fcntl}, you must extract +the access mode bits from the retrieved file status flags. In the GNU +system, you can just test the @code{O_READ} and @code{O_WRITE} bits in +the flags word. But in other POSIX.1 systems, reading and writing +access modes are not stored as distinct bit flags. The portable way to +extract the file access mode bits is with @code{O_ACCMODE}. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_ACCMODE +This macro stands for a mask that can be bitwise-ANDed with the file +status flag value to produce a value representing the file access mode. +The mode will be @code{O_RDONLY}, @code{O_WRONLY}, or @code{O_RDWR}. +(In the GNU system it could also be zero, and it never includes the +@code{O_EXEC} bit.) +@end deftypevr + +@node Open-time Flags +@subsection Open-time Flags + +The open-time flags specify options affecting how @code{open} will behave. +These options are not preserved once the file is open. The exception to +this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it +@emph{is} saved. @xref{Opening and Closing Files}, for how to call +@code{open}. + +There are two sorts of options specified by open-time flags. + +@itemize @bullet +@item +@dfn{File name translation flags} affect how @code{open} looks up the +file name to locate the file, and whether the file can be created. +@cindex file name translation flags +@cindex flags, file name translation + +@item +@dfn{Open-time action flags} specify extra operations that @code{open} will +perform on the file once it is open. +@cindex open-time action flags +@cindex flags, open-time action +@end itemize + +Here are the file name translation flags. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_CREAT +If set, the file will be created if it doesn't already exist. +@c !!! mode arg, umask +@cindex create on open (file status flag) +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_EXCL +If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails +if the specified file already exists. This is guaranteed to never +clobber an existing file. +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_NONBLOCK +@cindex non-blocking open +This prevents @code{open} from blocking for a ``long time'' to open the +file. This is only meaningful for some kinds of files, usually devices +such as serial ports; when it is not meaningful, it is harmless and +ignored. Often opening a port to a modem blocks until the modem reports +carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will +return immediately without a carrier. + +Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating +mode and a file name translation flag. This means that specifying +@code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode; +@pxref{Operating Modes}. To open the file without blocking but do normal +I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and +then call @code{fcntl} to turn the bit off. +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_NOCTTY +If the named file is a terminal device, don't make it the controlling +terminal for the process. @xref{Job Control}, for information about +what it means to be the controlling terminal. + +In the GNU system and 4.4 BSD, opening a file never makes it the +controlling terminal and @code{O_NOCTTY} is zero. However, other +systems may use a nonzero value for @code{O_NOCTTY} and set the +controlling terminal when you open a file that is a terminal device; so +to be portable, use @code{O_NOCTTY} when it is important to avoid this. +@cindex controlling terminal, setting +@end deftypevr + +The following three file name translation flags exist only in the GNU system. + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_IGNORE_CTTY +Do not recognize the named file as the controlling terminal, even if it +refers to the process's existing controlling terminal device. Operations +on the new file descriptor will never induce job control signals. +@xref{Job Control}. +@end deftypevr + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_NOLINK +If the named file is a symbolic link, open the link itself instead of +the file it refers to. (@code{fstat} on the new file descriptor will +return the information returned by @code{lstat} on the link's name.) +@cindex symbolic link, opening +@end deftypevr + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_NOTRANS +If the named file is specially translated, do not invoke the translator. +Open the bare file the translator itself sees. +@end deftypevr + + +The open-time action flags tell @code{open} to do additional operations +which are not really related to opening the file. The reason to do them +as part of @code{open} instead of in separate calls is that @code{open} +can do them @i{atomically}. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_TRUNC +Truncate the file to zero length. This option is only useful for +regular files, not special files such as directories or FIFOs. POSIX.1 +requires that you open the file for writing to use @code{O_TRUNC}. In +BSD and GNU you must have permission to write the file to truncate it, +but you need not open for write access. + +This is the only open-time action flag specified by POSIX.1. There is +no good reason for truncation to be done by @code{open}, instead of by +calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in +Unix before @code{ftruncate} was invented, and is retained for backward +compatibility. +@end deftypevr + +The remaining operating modes are BSD extensions. They exist only +on some systems. On other systems, these macros are not defined. + +@comment fcntl.h +@comment BSD +@deftypevr Macro int O_SHLOCK +Acquire a shared lock on the file, as with @code{flock}. +@xref{File Locks}. + +If @code{O_CREAT} is specified, the locking is done atomically when +creating the file. You are guaranteed that no other process will get +the lock on the new file first. +@end deftypevr + +@comment fcntl.h +@comment BSD +@deftypevr Macro int O_EXLOCK +Acquire an exclusive lock on the file, as with @code{flock}. +@xref{File Locks}. This is atomic like @code{O_SHLOCK}. +@end deftypevr + +@node Operating Modes +@subsection I/O Operating Modes + +The operating modes affect how input and output operations using a file +descriptor work. These flags are set by @code{open} and can be fetched +and changed with @code{fcntl}. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_APPEND +The bit that enables append mode for the file. If set, then all +@code{write} operations write the data at the end of the file, extending +it, regardless of the current file position. This is the only reliable +way to append to a file. In append mode, you are guaranteed that the +data you write will always go to the current end of the file, regardless +of other processes writing to the file. Conversely, if you simply set +the file position to the end of file and write, then another process can +extend the file after you set the file position but before you write, +resulting in your data appearing someplace before the real end of file. +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int O_NONBLOCK +The bit that enables nonblocking mode for the file. If this bit is set, +@code{read} requests on the file can return immediately with a failure +status if there is no input immediately available, instead of blocking. +Likewise, @code{write} requests can also return immediately with a +failure status if the output can't be written immediately. + +Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O +operating mode and a file name translation flag; @pxref{Open-time Flags}. +@end deftypevr + +@comment fcntl.h +@comment BSD +@deftypevr Macro int O_NDELAY +This is an obsolete name for @code{O_NONBLOCK}, provided for +compatibility with BSD. It is not defined by the POSIX.1 standard. +@end deftypevr + +The remaining operating modes are BSD and GNU extensions. They exist only +on some systems. On other systems, these macros are not defined. + +@comment fcntl.h +@comment BSD +@deftypevr Macro int O_ASYNC +The bit that enables asynchronous input mode. If set, then @code{SIGIO} +signals will be generated when input is available. @xref{Interrupt Input}. + +Asynchronous input mode is a BSD feature. +@end deftypevr + +@comment fcntl.h +@comment BSD +@deftypevr Macro int O_FSYNC +The bit that enables synchronous writing for the file. If set, each +@code{write} call will make sure the data is reliably stored on disk before +returning. @c !!! xref fsync + +Synchronous writing is a BSD feature. +@end deftypevr + +@comment fcntl.h +@comment BSD +@deftypevr Macro int O_SYNC +This is another name for @code{O_FSYNC}. They have the same value. +@end deftypevr + +@comment fcntl.h +@comment GNU +@deftypevr Macro int O_NOATIME +If this bit is set, @code{read} will not update the access time of the +file. @xref{File Times}. This is used by programs that do backups, so +that backing a file up does not count as reading it. +Only the owner of the file or the superuser may use this bit. + +This is a GNU extension. +@end deftypevr + +@node Getting File Status Flags +@subsection Getting and Setting File Status Flags + +The @code{fcntl} function can fetch or change file status flags. + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_GETFL +This macro is used as the @var{command} argument to @code{fcntl}, to +read the file status flags for the open file with descriptor +@var{filedes}. + +The normal return value from @code{fcntl} with this command is a +nonnegative number which can be interpreted as the bitwise OR of the +individual flags. Since the file access modes are not single-bit values, +you can mask off other bits in the returned flags with @code{O_ACCMODE} +to compare them. + +In case of an error, @code{fcntl} returns @math{-1}. The following +@code{errno} error conditions are defined for this command: + +@table @code +@item EBADF +The @var{filedes} argument is invalid. +@end table +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_SETFL +This macro is used as the @var{command} argument to @code{fcntl}, to set +the file status flags for the open file corresponding to the +@var{filedes} argument. This command requires a third @code{int} +argument to specify the new flags, so the call looks like this: + +@smallexample +fcntl (@var{filedes}, F_SETFL, @var{new-flags}) +@end smallexample + +You can't change the access mode for the file in this way; that is, +whether the file descriptor was opened for reading or writing. + +The normal return value from @code{fcntl} with this command is an +unspecified value other than @math{-1}, which indicates an error. The +error conditions are the same as for the @code{F_GETFL} command. +@end deftypevr + +If you want to modify the file status flags, you should get the current +flags with @code{F_GETFL} and modify the value. Don't assume that the +flags listed here are the only ones that are implemented; your program +may be run years from now and more flags may exist then. For example, +here is a function to set or clear the flag @code{O_NONBLOCK} without +altering any other flags: + +@smallexample +@group +/* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,} + @r{or clear the flag if @var{value} is 0.} + @r{Return 0 on success, or -1 on error with @code{errno} set.} */ + +int +set_nonblock_flag (int desc, int value) +@{ + int oldflags = fcntl (desc, F_GETFL, 0); + /* @r{If reading the flags failed, return error indication now.} */ + if (oldflags == -1) + return -1; + /* @r{Set just the flag we want to set.} */ + if (value != 0) + oldflags |= O_NONBLOCK; + else + oldflags &= ~O_NONBLOCK; + /* @r{Store modified flag word in the descriptor.} */ + return fcntl (desc, F_SETFL, oldflags); +@} +@end group +@end smallexample + +@node File Locks +@section File Locks + +@cindex file locks +@cindex record locking +The remaining @code{fcntl} commands are used to support @dfn{record +locking}, which permits multiple cooperating programs to prevent each +other from simultaneously accessing parts of a file in error-prone +ways. + +@cindex exclusive lock +@cindex write lock +An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access +for writing to the specified part of the file. While a write lock is in +place, no other process can lock that part of the file. + +@cindex shared lock +@cindex read lock +A @dfn{shared} or @dfn{read} lock prohibits any other process from +requesting a write lock on the specified part of the file. However, +other processes can request read locks. + +The @code{read} and @code{write} functions do not actually check to see +whether there are any locks in place. If you want to implement a +locking protocol for a file shared by multiple processes, your application +must do explicit @code{fcntl} calls to request and clear locks at the +appropriate points. + +Locks are associated with processes. A process can only have one kind +of lock set for each byte of a given file. When any file descriptor for +that file is closed by the process, all of the locks that process holds +on that file are released, even if the locks were made using other +descriptors that remain open. Likewise, locks are released when a +process exits, and are not inherited by child processes created using +@code{fork} (@pxref{Creating a Process}). + +When making a lock, use a @code{struct flock} to specify what kind of +lock and where. This data type and the associated macros for the +@code{fcntl} function are declared in the header file @file{fcntl.h}. +@pindex fcntl.h + +@comment fcntl.h +@comment POSIX.1 +@deftp {Data Type} {struct flock} +This structure is used with the @code{fcntl} function to describe a file +lock. It has these members: + +@table @code +@item short int l_type +Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or +@code{F_UNLCK}. + +@item short int l_whence +This corresponds to the @var{whence} argument to @code{fseek} or +@code{lseek}, and specifies what the offset is relative to. Its value +can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}. + +@item off_t l_start +This specifies the offset of the start of the region to which the lock +applies, and is given in bytes relative to the point specified by +@code{l_whence} member. + +@item off_t l_len +This specifies the length of the region to be locked. A value of +@code{0} is treated specially; it means the region extends to the end of +the file. + +@item pid_t l_pid +This field is the process ID (@pxref{Process Creation Concepts}) of the +process holding the lock. It is filled in by calling @code{fcntl} with +the @code{F_GETLK} command, but is ignored when making a lock. +@end table +@end deftp + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_GETLK +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should get information about a lock. This command +requires a third argument of type @w{@code{struct flock *}} to be passed +to @code{fcntl}, so that the form of the call is: + +@smallexample +fcntl (@var{filedes}, F_GETLK, @var{lockp}) +@end smallexample + +If there is a lock already in place that would block the lock described +by the @var{lockp} argument, information about that lock overwrites +@code{*@var{lockp}}. Existing locks are not reported if they are +compatible with making a new lock as specified. Thus, you should +specify a lock type of @code{F_WRLCK} if you want to find out about both +read and write locks, or @code{F_RDLCK} if you want to find out about +write locks only. + +There might be more than one lock affecting the region specified by the +@var{lockp} argument, but @code{fcntl} only returns information about +one of them. The @code{l_whence} member of the @var{lockp} structure is +set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields +set to identify the locked region. + +If no lock applies, the only change to the @var{lockp} structure is to +update the @code{l_type} to a value of @code{F_UNLCK}. + +The normal return value from @code{fcntl} with this command is an +unspecified value other than @math{-1}, which is reserved to indicate an +error. The following @code{errno} error conditions are defined for +this command: + +@table @code +@item EBADF +The @var{filedes} argument is invalid. + +@item EINVAL +Either the @var{lockp} argument doesn't specify valid lock information, +or the file associated with @var{filedes} doesn't support locks. +@end table +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_SETLK +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should set or clear a lock. This command requires a +third argument of type @w{@code{struct flock *}} to be passed to +@code{fcntl}, so that the form of the call is: + +@smallexample +fcntl (@var{filedes}, F_SETLK, @var{lockp}) +@end smallexample + +If the process already has a lock on any part of the region, the old lock +on that part is replaced with the new lock. You can remove a lock +by specifying a lock type of @code{F_UNLCK}. + +If the lock cannot be set, @code{fcntl} returns immediately with a value +of @math{-1}. This function does not block waiting for other processes +to release locks. If @code{fcntl} succeeds, it return a value other +than @math{-1}. + +The following @code{errno} error conditions are defined for this +function: + +@table @code +@item EAGAIN +@itemx EACCES +The lock cannot be set because it is blocked by an existing lock on the +file. Some systems use @code{EAGAIN} in this case, and other systems +use @code{EACCES}; your program should treat them alike, after +@code{F_SETLK}. (The GNU system always uses @code{EAGAIN}.) + +@item EBADF +Either: the @var{filedes} argument is invalid; you requested a read lock +but the @var{filedes} is not open for read access; or, you requested a +write lock but the @var{filedes} is not open for write access. + +@item EINVAL +Either the @var{lockp} argument doesn't specify valid lock information, +or the file associated with @var{filedes} doesn't support locks. + +@item ENOLCK +The system has run out of file lock resources; there are already too +many file locks in place. + +Well-designed file systems never report this error, because they have no +limitation on the number of locks. However, you must still take account +of the possibility of this error, as it could result from network access +to a file system on another machine. +@end table +@end deftypevr + +@comment fcntl.h +@comment POSIX.1 +@deftypevr Macro int F_SETLKW +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should set or clear a lock. It is just like the +@code{F_SETLK} command, but causes the process to block (or wait) +until the request can be specified. + +This command requires a third argument of type @code{struct flock *}, as +for the @code{F_SETLK} command. + +The @code{fcntl} return values and errors are the same as for the +@code{F_SETLK} command, but these additional @code{errno} error conditions +are defined for this command: + +@table @code +@item EINTR +The function was interrupted by a signal while it was waiting. +@xref{Interrupted Primitives}. + +@item EDEADLK +The specified region is being locked by another process. But that +process is waiting to lock a region which the current process has +locked, so waiting for the lock would result in deadlock. The system +does not guarantee that it will detect all such conditions, but it lets +you know if it notices one. +@end table +@end deftypevr + + +The following macros are defined for use as values for the @code{l_type} +member of the @code{flock} structure. The values are integer constants. + +@table @code +@comment fcntl.h +@comment POSIX.1 +@vindex F_RDLCK +@item F_RDLCK +This macro is used to specify a read (or shared) lock. + +@comment fcntl.h +@comment POSIX.1 +@vindex F_WRLCK +@item F_WRLCK +This macro is used to specify a write (or exclusive) lock. + +@comment fcntl.h +@comment POSIX.1 +@vindex F_UNLCK +@item F_UNLCK +This macro is used to specify that the region is unlocked. +@end table + +As an example of a situation where file locking is useful, consider a +program that can be run simultaneously by several different users, that +logs status information to a common file. One example of such a program +might be a game that uses a file to keep track of high scores. Another +example might be a program that records usage or accounting information +for billing purposes. + +Having multiple copies of the program simultaneously writing to the +file could cause the contents of the file to become mixed up. But +you can prevent this kind of problem by setting a write lock on the +file before actually writing to the file. + +If the program also needs to read the file and wants to make sure that +the contents of the file are in a consistent state, then it can also use +a read lock. While the read lock is set, no other process can lock +that part of the file for writing. + +@c ??? This section could use an example program. + +Remember that file locks are only a @emph{voluntary} protocol for +controlling access to a file. There is still potential for access to +the file by programs that don't use the lock protocol. + +@node Interrupt Input +@section Interrupt-Driven Input + +@cindex interrupt-driven input +If you set the @code{O_ASYNC} status flag on a file descriptor +(@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever +input or output becomes possible on that file descriptor. The process +or process group to receive the signal can be selected by using the +@code{F_SETOWN} command to the @code{fcntl} function. If the file +descriptor is a socket, this also selects the recipient of @code{SIGURG} +signals that are delivered when out-of-band data arrives on that socket; +see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation +where @code{select} would report the socket as having an ``exceptional +condition''. @xref{Waiting for I/O}.) + +If the file descriptor corresponds to a terminal device, then @code{SIGIO} +signals are sent to the foreground process group of the terminal. +@xref{Job Control}. + +@pindex fcntl.h +The symbols in this section are defined in the header file +@file{fcntl.h}. + +@comment fcntl.h +@comment BSD +@deftypevr Macro int F_GETOWN +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should get information about the process or process +group to which @code{SIGIO} signals are sent. (For a terminal, this is +actually the foreground process group ID, which you can get using +@code{tcgetpgrp}; see @ref{Terminal Access Functions}.) + +The return value is interpreted as a process ID; if negative, its +absolute value is the process group ID. + +The following @code{errno} error condition is defined for this command: + +@table @code +@item EBADF +The @var{filedes} argument is invalid. +@end table +@end deftypevr + +@comment fcntl.h +@comment BSD +@deftypevr Macro int F_SETOWN +This macro is used as the @var{command} argument to @code{fcntl}, to +specify that it should set the process or process group to which +@code{SIGIO} signals are sent. This command requires a third argument +of type @code{pid_t} to be passed to @code{fcntl}, so that the form of +the call is: + +@smallexample +fcntl (@var{filedes}, F_SETOWN, @var{pid}) +@end smallexample + +The @var{pid} argument should be a process ID. You can also pass a +negative number whose absolute value is a process group ID. + +The return value from @code{fcntl} with this command is @math{-1} +in case of error and some other value if successful. The following +@code{errno} error conditions are defined for this command: + +@table @code +@item EBADF +The @var{filedes} argument is invalid. + +@item ESRCH +There is no process or process group corresponding to @var{pid}. +@end table +@end deftypevr + +@c ??? This section could use an example program. + +@node IOCTLs +@section Generic I/O Control operations +@cindex generic i/o control operations +@cindex IOCTLs + +The GNU system can handle most input/output operations on many different +devices and objects in terms of a few file primitives - @code{read}, +@code{write} and @code{lseek}. However, most devices also have a few +peculiar operations which do not fit into this model. Such as: + +@itemize @bullet + +@item +Changing the character font used on a terminal. + +@item +Telling a magnetic tape system to rewind or fast forward. (Since they +cannot move in byte increments, @code{lseek} is inapplicable). + +@item +Ejecting a disk from a drive. + +@item +Playing an audio track from a CD-ROM drive. + +@item +Maintaining routing tables for a network. + +@end itemize + +Although some such objects such as sockets and terminals +@footnote{Actually, the terminal-specific functions are implemented with +IOCTLs on many platforms.} have special functions of their own, it would +not be practical to create functions for all these cases. + +Instead these minor operations, known as @dfn{IOCTL}s, are assigned code +numbers and multiplexed through the @code{ioctl} function, defined in +@code{sys/ioctl.h}. The code numbers themselves are defined in many +different headers. + +@comment sys/ioctl.h +@comment BSD +@deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{}) + +The @code{ioctl} function performs the generic I/O operation +@var{command} on @var{filedes}. + +A third argument is usually present, either a single number or a pointer +to a structure. The meaning of this argument, the returned value, and +any error codes depends upon the command used. Often @math{-1} is +returned for a failure. + +@end deftypefun + +On some systems, IOCTLs used by different devices share the same numbers. +Thus, although use of an inappropriate IOCTL @emph{usually} only produces +an error, you should not attempt to use device-specific IOCTLs on an +unknown device. + +Most IOCTLs are OS-specific and/or only used in special system utilities, +and are thus beyond the scope of this document. For an example of the use +of an IOCTL, see @ref{Out-of-Band Data}. |