Diffstat (limited to 'doc/gzip.texi')
-rw-r--r-- | doc/gzip.texi | 549 |
1 files changed, 549 insertions, 0 deletions
diff --git a/doc/gzip.texi b/doc/gzip.texi
new file mode 100644
index 0000000..1d8d100
--- /dev/null
+++ b/doc/gzip.texi
@@ -0,0 +1,549 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename gzip.info
+@documentencoding UTF-8
+@include version.texi
+@settitle GNU Gzip
+@finalout
+@setchapternewpage odd
+@c %**end of header
+@copying
+This manual is for GNU Gzip
+(version @value{VERSION}, @value{UPDATED}),
+and documents commands for compressing and decompressing data.
+
+Copyright @copyright{} 1998-1999, 2001-2002, 2006-2007, 2009-2016 Free Software
+Foundation, Inc.
+
+Copyright @copyright{} 1992, 1993 Jean-loup Gailly
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled ``GNU
+Free Documentation License''.
+@end quotation
+@end copying
+
+@dircategory Compression
+@direntry
+* Gzip: (gzip).                 General (de)compression of files (lz77).
+@end direntry
+
+@dircategory Individual utilities
+@direntry
+* gunzip: (gzip)Overview.                       Decompression.
+* gzexe: (gzip)Overview.                        Compress executables.
+* zcat: (gzip)Overview.                         Decompression to stdout.
+* zdiff: (gzip)Overview.                        Compare compressed files.
+* zforce: (gzip)Overview.                       Force .gz extension on files.
+* zgrep: (gzip)Overview.                        Search compressed files.
+* zmore: (gzip)Overview.                        Decompression output by pages.
+@end direntry
+
+@titlepage
+@title GNU gzip
+@subtitle The data compression program
+@subtitle for Gzip version @value{VERSION}
+@subtitle @value{UPDATED}
+@author by Jean-loup Gailly
+
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@contents
+
+@ifnottex
+@node Top
+@top GNU Gzip: General file (de)compression
+
+@insertcopying
+@end ifnottex
+
+@menu
+* Overview::                    Preliminary information.
+* Sample::                      Sample output from @command{gzip}.
+* Invoking gzip::               How to run @command{gzip}.
+* Advanced usage::              Concatenated files.
+* Environment::                 The @env{GZIP} environment variable.
+* Tapes::                       Using @command{gzip} on tapes.
+* Problems::                    Reporting bugs.
+* GNU Free Documentation License:: Copying and sharing this manual.
+* Concept index::               Index of concepts.
+@end menu
+
+@node Overview
+@chapter Overview
+@cindex overview
+
+@command{gzip} reduces the size of the named files using Lempel--Ziv coding
+(LZ77). Whenever possible, each file is replaced by one with the
+extension @samp{.gz}, while keeping the same ownership, modes, access and
+modification times. (The default extension is @samp{-gz} for @abbr{VMS},
+@samp{z} for @abbr{MSDOS}, @abbr{OS/2} @abbr{FAT} and Atari.)
+If no files are specified or
+if a file name is @file{-}, the standard input is compressed to the standard
+output. @command{gzip} will only attempt to compress regular files. In
+particular, it will ignore symbolic links.
+
+If the new file name is too long for its file system, @command{gzip}
+truncates it. @command{gzip} attempts to truncate only the parts of the
+file name longer than 3 characters. (A part is delimited by dots.) If
+the name consists of small parts only, the longest parts are truncated.
+For example, if file names are limited to 14 characters, @file{gzip.msdos.exe}
+is compressed to @file{gzi.msd.exe.gz}. Names are not truncated on systems
+which do not have a limit on file name length.
+
+By default, @command{gzip} keeps the original file name and time stamp in
+the compressed file. These are used when decompressing the file with the
+@option{-N} option. This is useful when the compressed file name was
+truncated or when the time stamp was not preserved after a file
+transfer. However, due to limitations in the current @command{gzip} file
+format, fractional seconds are discarded. Also, time stamps must fall
+within the range 1970-01-01 00:00:00 through 2106-02-07 06:28:15
+@abbr{UTC}, and hosts whose operating systems use 32-bit time
+stamps are further restricted to time stamps no later than 2038-01-19
+03:14:07 @abbr{UTC}. The upper bounds assume the typical case
+where leap seconds are ignored.
+
+Compressed files can be restored to their original form using @samp{gzip -d},
+@command{gunzip}, or @command{zcat}. If the original name saved in the
+compressed file is not suitable for its file system, a new name is
+constructed from the original one to make it valid.
+
+@command{gunzip} takes a list of files on its command line and replaces
+each file whose name ends with @samp{.gz}, @samp{.z},
+@samp{-gz}, @samp{-z}, or @samp{_z} (ignoring case)
+and which begins with the correct
+magic number with an uncompressed file without the original extension.
+@command{gunzip} also recognizes the special extensions @samp{.tgz} and
+@samp{.taz} as shorthands for @samp{.tar.gz} and @samp{.tar.Z}
+respectively. When compressing, @command{gzip} uses the @samp{.tgz}
+extension if necessary instead of truncating a file with a @samp{.tar}
+extension.
+
+@command{gunzip} can currently decompress files created by @command{gzip},
+@command{zip}, @command{compress} or @command{pack}. The detection of the input
+format is automatic. When using the first two formats, @command{gunzip}
+checks a 32-bit @abbr{CRC} (cyclic redundancy check). For @command{pack},
+@command{gunzip} checks the uncompressed length. The @command{compress} format
+was not designed to allow consistency checks. However, @command{gunzip} is
+sometimes able to detect a bad @samp{.Z} file. If you get an error when
+uncompressing a @samp{.Z} file, do not assume that the @samp{.Z} file is
+correct simply because the standard @command{uncompress} does not complain.
+This generally means that the standard @command{uncompress} does not check
+its input, and happily generates garbage output. The @abbr{SCO} @samp{compress
+-H} format (@abbr{LZH} compression method) does not include a @abbr{CRC} but
+also allows some consistency checks.
+
+Files created by @command{zip} can be uncompressed by @command{gzip} only if
+they have a single member compressed with the ``deflation'' method. This
+feature is only intended to help conversion of @file{tar.zip} files to
+the @file{tar.gz} format. To extract a @command{zip} file with a single
+member, use a command like @samp{gunzip <foo.zip} or @samp{gunzip -S
+.zip foo.zip}. To extract @command{zip} files with several
+members, use @command{unzip} instead of @command{gunzip}.
+
+@command{zcat} is identical to @samp{gunzip -c}. @command{zcat}
+uncompresses either a list of files on the command line or its standard
+input and writes the uncompressed data on standard output. @command{zcat}
+will uncompress files that have the correct magic number whether they
+have a @samp{.gz} suffix or not.
+
+@command{gzip} uses the Lempel--Ziv algorithm used in @command{zip} and
+@abbr{PKZIP}@.
+The amount of compression obtained depends on the size of the input and
+the distribution of common substrings. Typically, text such as source
+code or English is reduced by 60--70%. Compression is generally much
+better than that achieved by @abbr{LZW} (as used in @command{compress}), Huffman
+coding (as used in @command{pack}), or adaptive Huffman coding
+(@command{compact}).
+
+Compression is always performed, even if the compressed file is slightly
+larger than the original. The worst-case expansion is a few bytes for
+the @command{gzip} file header, plus 5 bytes every 32K block, or an expansion
+ratio of 0.015% for large files. Note that the actual number of used
+disk blocks almost never increases. @command{gzip} normally preserves the mode,
+ownership and time stamps of files when compressing or decompressing.
+
+The @command{gzip} file format is specified in P. Deutsch, GZIP file
+format specification version 4.3,
+@uref{http://www.ietf.org/rfc/rfc1952.txt, Internet @abbr{RFC} 1952} (May
+1996). The @command{zip} deflation format is specified in P. Deutsch,
+DEFLATE Compressed Data Format Specification version 1.3,
+@uref{http://www.ietf.org/rfc/rfc1951.txt, Internet @abbr{RFC} 1951} (May
+1996).
+
+@node Sample
+@chapter Sample output
+@cindex sample
+
+Here are some realistic examples of running @command{gzip}.
+
+This is the output of the command @samp{gzip -h}:
+
+@example
+Usage: gzip [OPTION]... [FILE]...
+Compress or uncompress FILEs (by default, compress FILES in-place).
+
+Mandatory arguments to long options are mandatory for short options too.
+
+  -c, --stdout      write on standard output, keep original files unchanged
+  -d, --decompress  decompress
+  -f, --force       force overwrite of output file and compress links
+  -h, --help        give this help
+  -k, --keep        keep (don't delete) input files
+  -l, --list        list compressed file contents
+  -L, --license     display software license
+  -n, --no-name     do not save or restore the original name and time stamp
+  -N, --name        save or restore the original name and time stamp
+  -q, --quiet       suppress all warnings
+  -r, --recursive   operate recursively on directories
+      --rsyncable   make rsync-friendly archive
+  -S, --suffix=SUF  use suffix SUF on compressed files
+      --synchronous synchronous output (safer if system crashes, but slower)
+  -t, --test        test compressed file integrity
+  -v, --verbose     verbose mode
+  -V, --version     display version number
+  -1, --fast        compress faster
+  -9, --best        compress better
+
+With no FILE, or when FILE is -, read standard input.
+
+Report bugs to <bug-gzip@@gnu.org>.
+@end example
+
+This is the output of the command @samp{gzip -v texinfo.tex}:
+
+@example
+texinfo.tex: 69.3% -- replaced with texinfo.tex.gz
+@end example
+
+The following command will find all regular @samp{.gz} files in the
+current directory and subdirectories (skipping file names that contain
+newlines), and extract them in place without destroying the originals,
+stopping on the first failure:
+
+@example
+find . -name '*
+*' -prune -o -name '*.gz' -type f -print |
+  sed "
+    s/'/'\\\\''/g
+    s/^\\(.*\\)\\.gz$/gunzip <'\\1.gz' >'\\1'/
+  " |
+  sh -e
+@end example
+
+@node Invoking gzip
+@chapter Invoking @command{gzip}
+@cindex invoking
+@cindex options
+
+The format for running the @command{gzip} program is:
+
+@example
+gzip [@var{option}]@dots{} [@var{file}]@dots{}
+@end example
+
+@command{gzip} supports the following options:
+
+@table @option
+@item --stdout
+@itemx --to-stdout
+@itemx -c
+Write output on standard output; keep original files unchanged.
+If there are several input files, the output consists of a sequence of
+independently compressed members. To obtain better compression,
+concatenate all input files before compressing them.
+
+@item --decompress
+@itemx --uncompress
+@itemx -d
+Decompress.
+
+@item --force
+@itemx -f
+Force compression or decompression even if the file has multiple links
+or the corresponding file already exists, or if the compressed data
+is read from or written to a terminal. If the input data is not in
+a format recognized by @command{gzip}, and if the option @option{--stdout} is also
+given, copy the input data without change to the standard output; this lets
+@command{zcat} behave as @command{cat}. If @option{-f} is not given, and
+when not running in the background, @command{gzip} prompts to verify
+whether an existing file should be overwritten.
+
+@item --help
+@itemx -h
+Print an informative help message describing the options, then quit.
+
+@item --keep
+@itemx -k
+Keep (don't delete) input files during compression or decompression.
+
+@item --list
+@itemx -l
+For each compressed file, list the following fields:
+
+@example
+compressed size: size of the compressed file
+uncompressed size: size of the uncompressed file
+ratio: compression ratio (0.0% if unknown)
+uncompressed_name: name of the uncompressed file
+@end example
+
+The uncompressed size is given as @minus{}1 for files not in @command{gzip}
+format, such as compressed @samp{.Z} files. To get the uncompressed size for
+such a file, you can use:
+
+@example
+zcat file.Z | wc -c
+@end example
+
+In combination with the @option{--verbose} option, the following fields are also
+displayed:
+
+@example
+method: compression method (deflate,compress,lzh,pack)
+crc: the 32-bit CRC of the uncompressed data
+date & time: time stamp for the uncompressed file
+@end example
+
+The @abbr{CRC} is given as ffffffff for a file not in @command{gzip} format.
+
+With @option{--verbose}, the size totals and compression ratio for all files
+are also displayed, unless some sizes are unknown. With @option{--quiet},
+the title and totals lines are not displayed.
+
+The @command{gzip} format represents the input size modulo
+@math{2^{32}}, so the uncompressed size and compression ratio are listed
+incorrectly for uncompressed files 4 GiB and larger. To work around
+this problem, you can use the following command to discover a large
+uncompressed file's true size:
+
+@example
+zcat file.gz | wc -c
+@end example
+
+@item --license
+@itemx -L
+Display the @command{gzip} license, then quit.
+
+@item --no-name
+@itemx -n
+When compressing, do not save the original file name and time stamp,
+which are saved by default. (The original name is always saved if the
+name had to be truncated.) When decompressing, do not restore the original
+file name if present (remove only the @command{gzip}
+suffix from the compressed file name) and do not restore the original
+time stamp if present (copy the time stamp of the compressed file instead). This option
+is the default when decompressing.
+
+@item --name
+@itemx -N
+When compressing, always save the original file name and time stamp; this
+is the default. When decompressing, restore the original file name and
+time stamp if present. This option is useful on systems which have
+a limit on file name length or when the time stamp has been lost after
+a file transfer.
+
+@item --quiet
+@itemx -q
+Suppress all warning messages.
+
+@item --recursive
+@itemx -r
+Travel the directory structure recursively. If any of the file names
+specified on the command line are directories, @command{gzip} will descend
+into the directory and compress all the files it finds there (or
+decompress them in the case of @command{gunzip}).
+
+@item --rsyncable
+Cater better to the @command{rsync} program by periodically resetting
+the internal structure of the compressed data stream. This lets the
+@command{rsync} program take advantage of similarities in the uncompressed
+input when synchronizing two files compressed with this flag. The cost:
+the compressed output is usually about one percent larger.
+
+@item --suffix @var{suf}
+@itemx -S @var{suf}
+Use suffix @var{suf} instead of @samp{.gz}. Any suffix can be
+given, but suffixes other than @samp{.z} and @samp{.gz} should be
+avoided to prevent confusion when files are transferred to other systems.
+A null suffix forces @command{gunzip} to try decompression on all given files
+regardless of suffix, as in:
+
+@example
+gunzip -S "" *       (*.* for MSDOS)
+@end example
+
+Previous versions of @command{gzip} used the @samp{.z} suffix. This was changed to
+avoid a conflict with @command{pack}.
+
+@item --synchronous
+Use synchronous output, by transferring output data to the output
+file's storage device when the file system supports this. Because
+file system data can be cached, without this option, if the system
+crashes around the time a command like @samp{gzip FOO} is run, the user
+might lose both @file{FOO} and @file{FOO.gz}. Unsynchronized output is the
+default with @command{gzip}, just as it is with most applications that
+move data. When this option is used, @command{gzip} is safer but can
+be considerably slower.
+
+@item --test
+@itemx -t
+Test. Check the compressed file integrity.
+
+@item --verbose
+@itemx -v
+Verbose. Display the name and percentage reduction for each file compressed.
+
+@item --version
+@itemx -V
+Version. Display the version number and compilation options, then quit.
+
+@item --fast
+@itemx --best
+@itemx -@var{n}
+Regulate the speed of compression using the specified digit @var{n},
+where @option{-1} or @option{--fast} indicates the fastest compression
+method (less compression) and @option{--best} or @option{-9} indicates the
+slowest compression method (best compression). The default
+compression level is @option{-6} (that is, biased towards high compression at
+the expense of speed).
+@end table
+
+@node Advanced usage
+@chapter Advanced usage
+@cindex concatenated files
+
+Multiple compressed files can be concatenated. In this case,
+@command{gunzip} will extract all members at once. If one member is
+damaged, other members might still be recovered after removal of the
+damaged member. Better compression can usually be obtained if all
+members are decompressed and then recompressed in a single step.
+
+This is an example of concatenating @command{gzip} files:
+
+@example
+gzip -c file1  > foo.gz
+gzip -c file2 >> foo.gz
+@end example
+
+@noindent
+Then
+
+@example
+gunzip -c foo.gz
+@end example
+
+@noindent
+is equivalent to
+
+@example
+cat file1 file2
+@end example
+
+In case of damage to one member of a @samp{.gz} file, other members can
+still be recovered (if the damaged member is removed). However,
+you can get better compression by compressing all members at once:
+
+@example
+cat file1 file2 | gzip > foo.gz
+@end example
+
+@noindent
+compresses better than
+
+@example
+gzip -c file1 file2 > foo.gz
+@end example
+
+If you want to recompress concatenated files to get better compression, do:
+
+@example
+zcat old.gz | gzip > new.gz
+@end example
+
+If a compressed file consists of several members, the uncompressed
+size and @abbr{CRC} reported by the @option{--list} option apply to
+the last member
+only. If you need the uncompressed size for all members, you can use:
+
+@example
+zcat file.gz | wc -c
+@end example
+
+If you wish to create a single archive file with multiple members so
+that members can later be extracted independently, use an archiver such
+as @command{tar} or @command{zip}. @acronym{GNU} @command{tar}
+supports the @option{-z}
+option to invoke @command{gzip} transparently. @command{gzip} is designed as a
+complement to @command{tar}, not as a replacement.
+
+@node Environment
+@chapter Environment
+@cindex Environment
+
+The obsolescent environment variable @env{GZIP} can hold a set of
+default options for @command{gzip}. These options are interpreted
+first and can be overridden by explicit command line parameters. As
+this can cause problems when using scripts, this feature is supported
+only for options that are reasonably likely to not cause too much
+harm, and @command{gzip} warns if it is used. This feature will be
+removed in a future release of @command{gzip}.
+
+You can use an alias or script instead. For example, if
+@command{gzip} is in the directory @file{/usr/bin}, you can prepend
+@file{$HOME/bin} to your @env{PATH} and create an executable script
+@file{$HOME/bin/gzip} containing the following:
+
+@example
+#! /bin/sh
+export PATH=/usr/bin
+exec gzip -9 "$@@"
+@end example
+
+On @abbr{VMS}, the name of the obsolescent environment variable is
+@env{GZIP_OPT}, to avoid a conflict with the symbol set for invocation
+of the program.
+
+@node Tapes
+@chapter Using @command{gzip} on tapes
+@cindex tapes
+
+When writing compressed data to a tape, it is generally necessary to pad
+the output with zeroes up to a block boundary. When the data is read and
+the whole block is passed to @command{gunzip} for decompression,
+@command{gunzip} detects that there is extra trailing garbage after the
+compressed data and emits a warning by default if the garbage contains
+nonzero bytes. You can use the @option{--quiet} option to suppress
+the warning.
+
+@node Problems
+@chapter Reporting Bugs
+@cindex bugs
+
+If you find a bug in @command{gzip}, please send electronic mail to
+@email{bug-gzip@@gnu.org}. Include the version number,
+which you can find by running @w{@samp{gzip -V}}. Also include in your
+message the hardware and operating system, the compiler used to compile
+@command{gzip},
+a description of the bug behavior, and the input to @command{gzip}
+that triggered
+the bug.
+
+@node GNU Free Documentation License
+@appendix GNU Free Documentation License
+
+@include fdl.texi
+
+@node Concept index
+@appendix Concept index
+
+@printindex cp
+
+@bye