author Martin Pool <mbp@samba.org> 2001-09-12 14:20:44 +0000
committer Martin Pool <mbp@samba.org> 2001-09-12 14:20:44 +0000
commit 3c6cd53b238daeeb5ba2afa87616df7cc90a429b
tree f17cae8cce6ca165a1666154dde0bd80e92dcd97 /rsync3.txt
parent 4f69fe59c7df335de04bd4409a369885eb31ab2a
Think think.
Diffstat (limited to 'rsync3.txt')
-rw-r--r-- rsync3.txt 154
1 files changed, 146 insertions, 8 deletions
diff --git a/rsync3.txt b/rsync3.txt
index 15bb7b06..21ebaf6d 100644
--- a/rsync3.txt
+++ b/rsync3.txt
@@ -1,7 +1,7 @@
-*- indented-text -*-
Notes towards a new version of rsync
-Martin Pool <mbp@samba.org>
+Martin Pool <mbp@samba.org>, September 2001.
Good things about the current implementation:
@@ -36,6 +36,12 @@ Good things about the current implementation:
- You can easily push or pull simply by switching the order of
files.
+ - The "modules" system has some neat features compared to
+ e.g. Apache's per-directory configuration. In particular, because
+ you can set a userid and chroot directory, there is strong
+ protection between different modules. I haven't seen any calls
+ for a more flexible system.
+
Bad things about the current implementation:
@@ -64,6 +70,13 @@ Bad things about the current implementation:
- Error messages can be cryptic.
+ - Default behaviour is not intuitive: in too many cases rsync will
+ happily do nothing. Perhaps -a should be the default?
+
+ - People get confused by trailing slashes, though it's hard to think
+ of another reasonable way to make this necessary distinction
+ between a directory and its contents.
+
Protocol philosophy:
@@ -115,10 +128,48 @@ Desirable features:
Unix. It might be better to try to add O_NOATIME to kernels, and
call that.
- - VFS. Useful?
-
- Unicode. Probably just use UTF-8 for everything.
+ - Open authentication system. Can we use PAM? Is SASL an adequate
+ mapping of PAM to the network, or useful in some other way?
+
+ - Resume interrupted transfers without the --partial flag. We need
+ to leave the temporary file behind, and then know to use it. This
+ leaves a risk of large temporary files accumulating, which is not
+ good. Perhaps it should be off by default.
+
+ - tcpwrappers support. Should be trivial; can already be done
+ through tcpd or inetd.
+
+ - Socks support built in. It's not clear this is any better than
+ just linking against the socks library, though.
+
+ - When run over SSH, invoke with predictable command-line arguments,
+ so that people can restrict what commands sshd will run. (Is this
+ really required?)
+
+ - Comparison mode: give a list of which files are new, gone, or
+ different. Set return code depending on whether anything has
+ changed.
+
+ - Internationalized messages (gettext?)
+
+ - Optionally use real regexps rather than globs?
+
+ - Show overall progress. Pretty hard to do, especially if we insist
+ on not scanning the directory tree up front.
+
+
+Regression testing:
+
+ - Support automatic testing.
+
+ - Have hard internal timeouts against hangs.
+
+ - Be deterministic.
+
+ - Measure performance.
+
Hard links:
@@ -131,6 +182,14 @@ Hard links:
become known.
+Command-line options:
+
+ We have rather a lot at the moment. We might get more if the tool
+ becomes more flexible. Do we need a .rc or configuration file?
+ That wouldn't really fit with its pattern of use: cp and tar don't
+ have them, though ssh does.
+
+
Scripting issues:
- Perhaps support multiple scripting languages: candidates include
@@ -144,6 +203,19 @@ Scripting issues:
it's not running in the user's own account. So we can either
disallow it, or use some kind of sandbox system.
+ - Python is a good language, but the syntax is not so good for
+ giving small fragments on the command line.
+
+ - Tcl is broken Lisp.
+
+ - Lots of sysadmins know Perl, though Perl can give some bizarre or
+ confusing errors. The built-in stat operators and regexps might
+ be useful.
+
+ - Sadly probably not enough people know Scheme.
+
+ - sh is hard to embed.
+
Scripting hooks:
@@ -159,6 +231,26 @@ Scripting hooks:
- Locking
+ - Cache
+
+ - Generating backup path/name.
+
+ - Post-processing of backups, e.g. to do compression.
+
+ - After transfer, before replacement: so that we can spit out a diff
+ of what was changed, or kick off some kind of reconciliation
+ process.
+
+
+VFS:
+
+ Rather than talking straight to the filesystem, rsyncd talks through
+ an internal API. Samba has one. Is it useful?
+
+ - Could be a tidy way to implement cached signatures.
+
+ - Keep files compressed on disk?
+
Interactive interface:
@@ -169,10 +261,14 @@ Interactive interface:
- The standalone process needs to produce output in a form easily
digestible by a calling program, like the --emacs feature some
- have.
+ have. Same goes for output: rpm outputs a series of hash symbols,
+ which are easier for a GUI to handle than "\r30% complete"
+ strings.
- Yow! emacs support. (You could probably build that already, of
- course.)
+ course.) I'd like to be able to write a simple script on a remote
+ machine that rsyncs it to my workstation, edits it there, then
+ pushes it back up.
Pie-in-the-sky features:
@@ -203,6 +299,25 @@ Pie-in-the-sky features:
with replication in place, though on some systems we will also
have to do I/O on block boundaries.
+ - Peer to peer features. Flavour of the year. Can we think about
+ ways for clients to smoothly and voluntarily become servers for
+ content they receive?
+
+
+Unlikely features:
+
+ - Allow remote source and destination. If this can be cleanly
+ designed into the protocol, perhaps with the remote machine acting
+ as a kind of echo, then it's good. It's uncommon enough that we
+ don't want to shape the whole protocol around it, though.
+
+ In fact, in a triangle of machines there are two possibilities:
+ all traffic passes from remote1 to remote2 through local, or local
+ just sets up the transfer and then remote1 talks to remote2. FTP
+ supports the second but it's not clearly good. There are some
+ security problems with being able to instruct one machine to open
+ a connection to another.
+
In favour of evolving the protocol:
@@ -274,7 +389,7 @@ Conflict resolution:
would be useful.
-Moved files:
+Moved files: <http://rsync.samba.org/cgi-bin/rsync.fom?file=44>
- There's no trivial way to detect renamed files, especially if they
move between directories.
@@ -290,6 +405,12 @@ Moved files:
Filesystem migration:
+ NFSv4 probably wants to migrate file locks, but that's not really
+ our problem.
+
+
+Atomic updates:
+
The NFSv4 working group wants atomic migration. Most of the
responsibility for this lies on the NFS server or OS.
@@ -297,8 +418,9 @@ Filesystem migration:
at the end. This ties in to having separate basis and destination
files.
- NFSv4 probably wants to migrate file locks, but that's not really
- our problem.
+ There's no way in Unix to replace a whole set of files atomically.
+ However, if we get them all onto the destination machine and then do
+ the updates quickly it would greatly reduce the window.
Scalability:
@@ -314,6 +436,8 @@ Scalability:
On the whole CPU usage is not normally a limiting factor, if only
because running over SSH burns a lot of cycles on encryption.
+ Perhaps have resource throttling without relying on rlimit.
+
Streaming:
@@ -322,3 +446,17 @@ Streaming:
pipelined. This is a problem with FTP, and NFS (at least up to
v3). NFSv4 can pipeline operations, but building on that is
probably a bit complicated.
+
+
+Related work:
+
+ - mirror.pl http://freshmeat.net/project/mirror/
+
+ - ProFTPd
+
+ - Apache
+
+ - http://freshmeat.net/search/?site=Freshmeat&q=mirror&section=projects
+
+ - BitTorrent -- p2p mirroring
+ http://bitconjurer.org/BitTorrent/
\ No newline at end of file
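
Illustration (not part of the commit): the "Comparison mode" item added under "Desirable features" asks for a list of which files are new, gone, or different, and a return code that depends on whether anything has changed. The notes do not say how this would work, so the following is only a minimal local sketch in Python. The function names (relative_files, compare_trees, exit_status, demo) are hypothetical. It assumes that "new" means present in the source but not the destination, that "gone" means present in the destination but not the source, and that "different" is decided by comparing file contents byte for byte. It does not use the rsync protocol or checksums.

```python
import filecmp
import os
import tempfile


def relative_files(root):
    """Return the set of regular-file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_trees(src, dst):
    """Classify files as new, gone, or different (assumed definitions).

    new:       in src but not in dst
    gone:      in dst but not in src
    different: in both, but contents differ byte for byte
    """
    src_files = relative_files(src)
    dst_files = relative_files(dst)
    new = sorted(src_files - dst_files)
    gone = sorted(dst_files - src_files)
    different = sorted(
        rel
        for rel in src_files & dst_files
        if not filecmp.cmp(
            os.path.join(src, rel), os.path.join(dst, rel), shallow=False
        )
    )
    return new, gone, different


def exit_status(new, gone, different):
    """Return 0 if nothing changed, 1 otherwise (an assumed convention)."""
    return 1 if (new or gone or different) else 0


def demo():
    """Build two small temporary trees and compare them."""
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as dst:

        def write(root, name, text):
            with open(os.path.join(root, name), "w") as handle:
                handle.write(text)

        write(src, "same.txt", "a")
        write(dst, "same.txt", "a")
        write(src, "changed.txt", "old")
        write(dst, "changed.txt", "new")
        write(src, "new.txt", "x")
        write(dst, "gone.txt", "y")
        result = compare_trees(src, dst)
        return result, exit_status(*result)
```

A caller could pass the result of exit_status to sys.exit so that scripts can branch on whether the two trees match, in the same way they do with cmp or diff.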