Wednesday, March 15, 2017

parsync

"rsync is a fabulous data mover. Possibly more bytes have been moved (or have been prevented from being moved) by rsync than by any other application.

So what’s not to love?

First, when transferring large, deep file trees, rsync will pause while it generates lists of files to process. Since Version 3, it does this pretty fast, but on sluggish filesystems, it can take hours or even days before it starts to actually exchange data.

Second, due to various bottlenecks, rsync will tend to use less than the available bandwidth on high speed networks. Starting multiple instances of rsync can improve this significantly. However, on such transfers, it is also easy to overload the available bandwidth, so it would be nice to both limit the bandwidth used if necessary and also to limit the load on the system.

parsync tries to satisfy all these conditions and more by:
  • using the kdirstat-cache-writer utility from the beautiful kdirstat directory browser, which can produce lists of files very rapidly.
  • allowing re-use of the cache files so generated.
  • doing crude load balancing of the number of active rsyncs, suspending and un-suspending the processes as necessary.
  • using rsync’s own bandwidth limiter (--bwlimit) to throttle the total bandwidth.
  • making rsync’s own vast option selection available as a pass-thru (tho limited to those options compatible with --files-from).

    The main use case for parsync is really only very large data transfers thru fairly fast network connections (>1Gb/s). Below this speed, a single rsync can saturate the connection, so there’s little reason to use parsync. In fact, the overhead of checking for and starting additional rsyncs tends to make it slightly slower than rsync alone on small transfers."

    http://moo.nac.uci.edu/~hjm/parsync/
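
To make the idea in the quoted list concrete, here is a minimal Python sketch of the general technique: split a file list into size-balanced chunks, then give each chunk its own rsync with --files-from and a share of the total --bwlimit. This is not parsync's actual code. The function names, the greedy balancing, and the equal split of the bandwidth cap are all assumptions made for illustration; only the rsync options -a, --bwlimit and --files-from are real. The sketch only builds the command lines and does not run anything.

```python
import shlex


def split_by_size(files, n_chunks):
    """Greedily assign (path, size) pairs to whichever chunk is lightest so far.

    Hypothetical helper for illustration; parsync's own chunking may differ.
    """
    chunks = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    # Place the largest files first so the greedy assignment balances better.
    for path, size in sorted(files, key=lambda item: item[1], reverse=True):
        lightest = totals.index(min(totals))
        chunks[lightest].append(path)
        totals[lightest] += size
    return chunks, totals


def build_rsync_commands(list_files, src_root, dest, total_bwlimit):
    """Build one rsync argument list per chunk list file.

    rsync applies --bwlimit per process, so the total cap is divided
    equally among the processes (an assumption of this sketch).
    """
    per_process = max(1, total_bwlimit // len(list_files))
    return [
        ["rsync", "-a", f"--bwlimit={per_process}",
         f"--files-from={list_file}", src_root, dest]
        for list_file in list_files
    ]


if __name__ == "__main__":
    # Made-up file names and sizes, purely for demonstration.
    files = [("a.dat", 900), ("b.dat", 500), ("c.dat", 400), ("d.dat", 100)]
    chunks, totals = split_by_size(files, 2)
    print(chunks, totals)
    # The list file names, source root and destination are placeholders.
    for cmd in build_rsync_commands(["list.0", "list.1"], "/data/",
                                    "host:/backup/", 10000):
        print(shlex.join(cmd))
```

In real use, each chunk would be written to its list file (one path per line, relative to the source root) and each command started as a separate process. The suspending and un-suspending of rsyncs described in the quote is left out here.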
