The tar format is reasonably simple. We can stream it ourselves with the TXR Lisp program below.
Caveat: this does not handle long paths; it puts out only one header block for each object.
The backup list consists of a mixture of paths and command entries.
Commands are executed, and their output is chopped into 4K pieces, which become numbered files. These are deleted as we go, so nothing accumulates.
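Even though we are writing our own tar implementation, we still have to do this, because the format requires the size of every object to be known in advance and placed into the header. There is no way to stream arbitrarily long command output as a single tar member.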
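To illustrate the size-up-front constraint outside of TXR Lisp, here is a minimal Python sketch that uses the standard `tarfile` module in stream mode. It is not part of the program below: the function names `add_stream_in_chunks` and `demo` are made up for this illustration, and the chunks are held in memory rather than written to temporary files. Each chunk becomes its own member because its size must be set in the header before any data is written.

```python
import io
import tarfile

CHUNK_SIZE = 4096  # same split size the program in this answer uses


def add_stream_in_chunks(archive, stream, prefix, chunk_size=CHUNK_SIZE):
    """Add the contents of `stream` to `archive` as numbered members.

    Hypothetical helper: every chunk gets its own header, because the
    header must carry the member size before the data follows.
    """
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += 1
        info = tarfile.TarInfo(name=f"{prefix}{count:04d}")
        info.size = len(chunk)  # size is known only once the chunk is read
        archive.addfile(info, io.BytesIO(chunk))
    return count


def demo():
    """Archive 10000 bytes of stand-in 'command output' into memory."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w|") as archive:
        n = add_stream_in_chunks(archive, io.BytesIO(b"x" * 10000), "cpuinfo")
    return n, out.getvalue()


n, data = demo()
print(n, "members,", len(data), "bytes of archive")
```

In real use the `stream` argument would be something like the `stdout` pipe of a `subprocess.Popen` object instead of an in-memory buffer.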
    (defvar backup-list
      '("/etc"
        "/root"
        (:cmd "cat /proc/cpuinfo" "cpuinfo")
        (:cmd "lspci" "lspci")))

    (defsymacro splsize 4096) ;; split size for commands
    (defsymacro blsize 512)   ;; tar block size: written in stone

    (typedef tarheader
      (struct tarheader
        (name (array 100 zchar))
        (mode (array 8 zchar))
        (uid (array 8 zchar))
        (gid (array 8 zchar))
        (size (array 12 zchar))
        (mtime (array 12 zchar))
        (chksum (array 8 char))
        (typeflag char)
        (linkname (array 100 zchar))
        (magic (array 6 char))
        (version (array 2 char))
        (uname (array 32 zchar))
        (gname (array 32 zchar))
        (devmajor (array 8 zchar))
        (devminor (array 8 zchar))
        (prefix (array 155 zchar))))

    (defmacro octfill (slot expr)
      ^(fmt "~,0*o" (pred (sizeof tarheader.,slot)) ,expr))

    ;; Dump an object into the archive.
    ;; Form a correct header, calculate the checksum,
    ;; put out a header block and for regular files,
    ;; put out data blocks.
    (defun tar-dump-object (file-in stream : stat)
      (let* ((file (trim-path-seps file-in))
             (s (or stat (stat file)))
             (tf (ecaseql* (logand s.mode s-ifmt)
                   (s-ifreg #\0)
                   (s-iflnk #\2)
                   (s-ifchr #\3)
                   (s-ifblk #\4)
                   (s-ifdir #\5)
                   (s-ififo #\6)))
             (h (new tarheader
                     name (let* ((n (cond
                                      ((equal "/" file) ".")
                                      ((starts-with "/" file) [file 1..:])
                                      (t file))))
                            (if (eql tf #\5) `@n/` n))
                     mode (octfill mode (logand s.mode #o777))
                     uid (octfill uid s.uid)
                     gid (octfill gid s.gid)
                     size (octfill size (if (eql tf #\0) s.size 0))
                     mtime (octfill mtime s.mtime)
                     chksum (load-time (str 8))
                     typeflag tf
                     linkname (if (eql tf #\2) (readlink file) "")
                     magic "ustar "
                     version " "
                     uname (or (getpwuid s.uid).?name "")
                     gname (or (getgrgid s.gid).?name "")
                     devmajor (if (meql tf #\3 #\4)
                                (octfill devmajor (major s.rdev)) "")
                     devminor (if (meql tf #\3 #\4)
                                (octfill devminor (minor s.rdev)) "")
                     prefix ""))
             (hb (ffi-put h (ffi tarheader)))
             (ck (logand (sum hb) #x1FFFF))
             (bl (make-buf blsize))
             (nb (trunc (+ s.size blsize -1) blsize)))
        (set h.chksum (fmt "~,06o\xdc00 " ck))
        (ffi-put-into bl h (ffi tarheader))
        (put-buf bl 0 stream)
        (if (eql tf #\0)
          (with-stream (in (open-file file "rb"))
            (each ((i 0..nb))
              (fill-buf-adjust bl 0 in)
              (buf-set-length bl blsize)
              (put-buf bl 0 stream))))))

    ;; Output two zero-filled blocks to terminate archive.
    (defun tar-finish-archive (: (stream *stdout*))
      (let ((bl (make-buf (* 2 blsize))))
        (put-buf bl 0 stream)))

    ;; Dump an object into the archive, recursing
    ;; if it is a directory.
    (defun tar-dump-recursive (path : (stream *stdout*))
      (ftw path (lambda (path type stat . rest)
                  (tar-dump-object path stream stat))))

    ;; Dump a command to the archive by capturing the
    ;; output into numbered temporary split files.
    (defun tar-dump-command (command prefix : (stream *stdout*))
      (let ((bl (make-buf splsize))
            (i 0))
        (with-stream (s (open-command command "rb"))
          (while (plusp (fill-buf-adjust bl 0 s))
            (let ((name (pic `@{prefix}0###` (inc i))))
              (file-put-buf name bl)
              (tar-dump-object name stream)
              (remove-path name))))))

    ;; main: process the backup list to stream out the archive
    ;; on standard output, then terminate it.
    (each ((item backup-list))
      (match-ecase item
        ((:cmd @cmd @prefix) (tar-dump-command cmd prefix))
        (`@file` (tar-dump-recursive file))))

    (tar-finish-archive)
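I don't have a regression test suite for this. I tested it manually by archiving individual objects of various types and comparing hex dumps of the result against GNU tar's output, and then by unpacking directory trees archived by this implementation and running a recursive diff against the original tree.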
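One check that could be automated is the header checksum. The sketch below is an illustration in Python rather than part of the program above, and the helper names are made up. It recomputes the checksum of a 512-byte header block the way the tar format defines it: the sum of all header bytes, with the 8-byte `chksum` field at offset 148 counted as spaces. The sample header comes from Python's own `tarfile` module; a header block read from an archive written by the TXR Lisp program could be passed in the same way.

```python
import tarfile

BLOCK = 512
CHKSUM_START, CHKSUM_END = 148, 156  # position of the chksum field


def stored_checksum(header):
    """Parse the octal checksum recorded in the header block."""
    field = header[CHKSUM_START:CHKSUM_END].strip(b" \0")
    return int(field.decode("ascii"), 8)


def computed_checksum(header):
    """Sum all header bytes, treating the chksum field as 8 spaces."""
    blanked = header[:CHKSUM_START] + b" " * 8 + header[CHKSUM_END:BLOCK]
    return sum(blanked)


def header_checksum_ok(header):
    """True if `header` is one full block with a consistent checksum."""
    return (len(header) == BLOCK
            and stored_checksum(header) == computed_checksum(header))


# Sample header produced by Python's tarfile module, for demonstration only.
sample = tarfile.TarInfo("etc/passwd").tobuf(format=tarfile.USTAR_FORMAT)
print(header_checksum_ok(sample))
```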
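However, I wonder whether the backup service you are using really cannot handle concatenated archives. If it can, then you can simply invoke `tar` multiple times to produce the stream, and none of these process coordination issues arise.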
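For a tar consumer to handle concatenated archives, it only has to ignore all-zero blocks (rather than treating them as the end of the archive) and keep reading.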
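As an illustration of what such a consumer looks like, Python's `tarfile` module has an `ignore_zeros` option with this behavior (GNU tar offers `-i` / `--ignore-zeros` for reading). The sketch below builds two small archives in memory, concatenates them, and lists the result both ways. The helper names are made up for the example, and it says nothing about how any particular backup service behaves.

```python
import io
import tarfile


def make_archive(name, payload):
    """Return a complete tar archive (bytes) holding one small file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(data, ignore_zeros):
    """List member names, optionally skipping all-zero blocks."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as archive:
        return archive.getnames()


# Two archives back to back, each with its own zero-block trailer.
combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

print(list_members(combined, ignore_zeros=False))  # ['a.txt']
print(list_members(combined, ignore_zeros=True))   # ['a.txt', 'b.txt']
```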
If the backup service works like this, then you can basically proceed along these lines:
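    (tar cf - /etc
     tar cf - /root
     # "-" makes split read stdin; with --filter, each piece arrives on the
     # filter's stdin, so the filter has to save it to $FILE itself
     dump_program_1 | \
       split -b 4K \
         --filter 'cat > "$FILE"; tar cf - "$FILE"; rm "$FILE"' \
         - /var/backup/PREFIX_DUMP_1_
     ...) | ... into backup service ...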
I can't see any option in GNU Tar for not writing the terminating zero blocks. It would be possible to write a filter that gets rid of them:
    tar cf - file | remove-zero-blocks
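The yet-to-be-written `remove-zero-blocks` filter would read 512-byte blocks through a block-oriented FIFO that is long enough to cover the trailing zero blocks that `tar` emits. It would place each newly read buffer into one end of the FIFO and write out the oldest buffer, bumped from the other end. When EOF is encountered, the FIFO would be flushed, except that all trailing blocks consisting of 512 zero bytes would be omitted.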
This should defeat a backup service that refuses to ignore the zero blocks.
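One possible shape for such a filter, as a hedged Python sketch rather than a tested tool: it follows the FIFO description above, with the FIFO depth of 40 blocks chosen as an assumption (GNU tar, as far as I know, pads its output to 20-block records by default, so the run of trailing zero blocks should fit comfortably). The names are illustrative. Note one limitation: if the last file in the archive itself ends in all-zero 512-byte blocks, those are stripped as well.

```python
import sys
from collections import deque

BLOCK = 512
FIFO_DEPTH = 40  # assumption: enough to hold tar's trailing zero blocks


def remove_zero_blocks(src, dst, depth=FIFO_DEPTH):
    """Copy src to dst in 512-byte blocks, dropping trailing zero blocks."""
    zero = bytes(BLOCK)
    fifo = deque()
    while True:
        block = src.read(BLOCK)
        if not block:
            break
        fifo.append(block)           # newest block enters one end
        if len(fifo) > depth:
            dst.write(fifo.popleft())  # oldest block leaves the other end
    # EOF: discard all-zero blocks from the tail, then flush the rest.
    while fifo and fifo[-1] == zero:
        fifo.pop()
    for block in fifo:
        dst.write(block)


def main():
    """Filter standard input to standard output."""
    remove_zero_blocks(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()

# To use this as the remove-zero-blocks command, add a final call to main().
```

A FIFO that is too short does not lose data; it only lets some of the trailing zero blocks through, since those have already been written by the time EOF is seen.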