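The tar format is fairly simple. We can do the streaming ourselves with this TXR Lisp program.
Caveat: this doesn't handle long paths. It emits exactly one header block per object, so names longer than the 100-byte name field are not supported (the ustar prefix field is left empty and no GNU long-name blocks are generated).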
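The backup list consists of a mixture of paths and command items.
Commands are executed, and their output is chopped into 4K pieces, which become numbered files. These are deleted as we go, so nothing accumulates.
We still have to do this even though we are writing our own tar implementation, because the format requires each object's size to be known up front and recorded in its header. There is no way to stream command output of arbitrary length as a tar stream.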
(defvar backup-list
  '("/etc"
    "/root"
    (:cmd "cat /proc/cpuinfo" "cpuinfo")
    (:cmd "lspci" "lspci")))

(defsymacro splsize 4096) ;; split size for commands
(defsymacro blsize 512)   ;; tar block size: written in stone

(typedef tarheader
  (struct tarheader
    (name (array 100 zchar))
    (mode (array 8 zchar))
    (uid (array 8 zchar))
    (gid (array 8 zchar))
    (size (array 12 zchar))
    (mtime (array 12 zchar))
    (chksum (array 8 char))
    (typeflag char)
    (linkname (array 100 zchar))
    (magic (array 6 char))
    (version (array 2 char))
    (uname (array 32 zchar))
    (gname (array 32 zchar))
    (devmajor (array 8 zchar))
    (devminor (array 8 zchar))
    (prefix (array 155 zchar))))

(defmacro octfill (slot expr)
  ^(fmt "~,0*o" (pred (sizeof tarheader.,slot)) ,expr))
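To make the numeric fields concrete: octfill renders a number as zero-padded octal one digit shorter than its field, leaving the last byte for the terminating NUL. The shell sketch below mimics that with printf (octfield is a hypothetical helper name, not part of the program) and compares it with the size field the system tar writes at byte offset 124 of the first header. It assumes a tar that emits the standard ustar/GNU header layout.

```sh
#!/bin/sh
# octfield WIDTH VALUE: hypothetical helper mirroring octfill, i.e. VALUE
# in octal, zero-padded to WIDTH-1 digits (the field's last byte is left
# for the terminating NUL).
octfield() {
    printf "%0$(( $1 - 1 ))o" "$2"
}

tmp=$(mktemp -d) || exit 1
printf 'hello' > "$tmp/f"                  # a 5-byte regular file
tar cf "$tmp/f.tar" -C "$tmp" f

# The 12-byte size field starts at offset 124 of the header block;
# keep only the octal digits (drop the NUL/space terminator).
field=$(dd if="$tmp/f.tar" bs=1 skip=124 count=12 2>/dev/null | tr -dc '0-7')

echo "size field in archive: $field"
echo "octfield 12 5:         $(octfield 12 5)"
echo "octfield 8 420:        $(octfield 8 420)"   # 420 decimal = mode 0644
rm -rf "$tmp"
```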
;; Dump an object into the archive.
;; Form a correct header, calculate the checksum,
;; put out a header block and for regular files,
;; put out data blocks.
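(defun tar-dump-object (file-in stream : stat)
  (let* ((file (trim-path-seps file-in))
         ;; lstat rather than stat, so that a symlink is archived
         ;; as a link and not as the object it points to
         (s (or stat (lstat file)))
         ;; ustar type flags: 0 regular file, 2 symlink,
         ;; 3 char device, 4 block device, 5 directory, 6 FIFO
         (tf (ecaseql* (logand s.mode s-ifmt)
               (s-ifreg #\0)
               (s-iflnk #\2)
               (s-ifchr #\3)
               (s-ifblk #\4)
               (s-ifdir #\5)
               (s-ififo #\6)))
         (h (new tarheader
                 ;; drop the leading slash; directories get a trailing one
                 name (let* ((n (cond
                                  ((equal "/" file) ".")
                                  ((starts-with "/" file) [file 1..:])
                                  (t file))))
                        (if (eql tf #\5) `@{n}/` n))
                 ;; permission bits plus setuid/setgid/sticky
                 mode (octfill mode (logand s.mode #o7777))
                 uid (octfill uid s.uid)
                 gid (octfill gid s.gid)
                 ;; only regular files carry data blocks
                 size (octfill size (if (eql tf #\0) s.size 0))
                 mtime (octfill mtime s.mtime)
                 ;; the checksum is computed with this field set to 8 spaces
                 chksum (load-time (str 8))
                 typeflag tf
                 linkname (if (eql tf #\2) (readlink file) "")
                 magic "ustar "
                 version " "
                 uname (or (getpwuid s.uid).?name "")
                 gname (or (getgrgid s.gid).?name "")
                 devmajor (if (meql tf #\3 #\4)
                            (octfill devmajor (major s.rdev)) "")
                 devminor (if (meql tf #\3 #\4)
                            (octfill devminor (minor s.rdev)) "")
                 prefix ""))
         (hb (ffi-put h (ffi tarheader)))
         ;; unsigned byte sum of the header, kept within 6 octal digits
         (ck (logand (sum hb) #o777777))
         (bl (make-buf blsize))
         (nb (trunc (+ s.size blsize -1) blsize)))
    (set h.chksum (fmt "~,06o\xdc00 " ck))
    (ffi-put-into bl h (ffi tarheader))
    (put-buf bl 0 stream)
    (if (eql tf #\0)
      (with-stream (in (open-file file "rb"))
        (each ((i 0..nb))
          (fill-buf-adjust bl 0 in)
          (buf-set-length bl blsize)
          (put-buf bl 0 stream))))))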
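The checksum rule used in tar-dump-object (sum of all header bytes as unsigned values, with the chksum field counted as eight spaces) can be checked against an archive written by the system tar. A minimal shell sketch, assuming POSIX od and awk; header_checksum and stored_checksum are hypothetical helper names:

```sh
#!/bin/sh
# header_checksum ARCHIVE: sum the 512 bytes of the first header as
# unsigned values, counting the 8-byte chksum field (offsets 148..155)
# as ASCII spaces (32).
header_checksum() {
    dd if="$1" bs=512 count=1 2>/dev/null |
        od -An -v -tu1 |
        awk '{ for (i = 1; i <= NF; i++) {
                   s += (n >= 148 && n < 156) ? 32 : $i
                   n++
               } }
             END { print s }'
}

# stored_checksum ARCHIVE: the field's octal digits, minus NUL/space.
stored_checksum() {
    dd if="$1" bs=1 skip=148 count=8 2>/dev/null | tr -dc '0-7'
}

tmp=$(mktemp -d) || exit 1
printf 'hello' > "$tmp/f"
tar cf "$tmp/f.tar" -C "$tmp" f
computed=$(header_checksum "$tmp/f.tar")
stored=$(stored_checksum "$tmp/f.tar")
echo "computed: $computed (decimal), stored: $stored (octal)"
rm -rf "$tmp"
```

The two values should agree once the stored octal string is converted to decimal.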
;; Output two zero-filled blocks to terminate archive.
(defun tar-finish-archive (: (stream *stdout*))
  (let ((bl (make-buf (* 2 blsize))))
    (put-buf bl 0 stream)))
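For comparison, this sketch inspects the end of an archive written by the system tar: the last two 512-byte blocks are all zero. GNU tar additionally pads the archive to a whole record (10240 bytes with its default blocking factor of 20); tar-finish-archive above writes only the two terminating blocks.

```sh
#!/bin/sh
tmp=$(mktemp -d) || exit 1
printf 'hello' > "$tmp/f"
tar cf "$tmp/f.tar" -C "$tmp" f

size=$(( $(wc -c < "$tmp/f.tar") ))
# Count non-zero bytes among the final 1024 bytes (the last two blocks).
nonzero=$(( $(tail -c 1024 "$tmp/f.tar" | tr -d '\000' | wc -c) ))

echo "archive size: $size bytes ($(( size / 512 )) blocks)"
echo "non-zero bytes in the last two blocks: $nonzero"
rm -rf "$tmp"
```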
;; Dump an object into the archive, recursing
;; if it is a directory.
(defun tar-dump-recursive (path : (stream *stdout*))
  ;; ftw-phys: report symlinks as links instead of following them
  (ftw path (lambda (path type stat . rest)
              (tar-dump-object path stream stat))
       ftw-phys))
;; Dump a command to the archive by capturing the
;; output into numbered temporary split files.
(defun tar-dump-command (command prefix : (stream *stdout*))
  (let ((bl (make-buf splsize))
        (i 0))
    (with-stream (s (open-command command "rb"))
      (while (plusp (fill-buf-adjust bl 0 s))
        ;; chunk name: the prefix followed by a zero-padded counter
        (let ((name (pic `@{prefix}0###` (inc i))))
          (file-put-buf name bl)
          (tar-dump-object name stream)
          (remove-path name))))))
;; main: process the backup list to stream out the archive
;; on standard output, then terminate it.
(each ((item backup-list))
  (match-ecase item
    ((:cmd @cmd @prefix) (tar-dump-command cmd prefix))
    (`@file` (tar-dump-recursive file))))

(tar-finish-archive)
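I don't have a regression test suite. I tested it by hand: I archived individual objects of various types and compared hex dumps of the output against GNU tar's, and I unpacked directory trees archived by this implementation and ran a recursive diff against the original trees.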
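The round-trip part of that manual test can be scripted. This is a sketch, not the author's harness: roundtrip is a hypothetical function, and its ARCHIVER argument stands for any command that writes a tar stream of its argument to standard output (for example, a wrapper that runs the TXR program on one path). Here it is exercised with the system tar so that it runs anywhere.

```sh
#!/bin/sh
# roundtrip ARCHIVER DIR: archive DIR with ARCHIVER, unpack the stream
# with the system tar into a scratch directory, compare with diff -r.
roundtrip() {
    archiver=$1 dir=$2
    base=$(basename "$dir")
    out=$(mktemp -d) || return 1
    # $archiver is unquoted on purpose so "tar cf -" splits into words
    (cd "$(dirname "$dir")" && $archiver "$base") | tar xf - -C "$out" &&
        diff -r "$dir" "$out/$base"
    status=$?
    rm -rf "$out"
    return "$status"
}

# Build a small tree with a subdirectory and a symlink.
src=$(mktemp -d) || exit 1
mkdir -p "$src/tree/sub"
printf 'a\n' > "$src/tree/a.txt"
printf 'b\n' > "$src/tree/sub/b.txt"
ln -s a.txt "$src/tree/link"
if roundtrip "tar cf -" "$src/tree"; then echo "round trip OK"; fi
```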
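However, I wonder whether the backup service you're using really can't handle concatenated archives. If it can, you can simply invoke tar multiple times to produce the stream, and none of these process-coordination issues arise.
For a tar consumer to handle concatenated archives, it only has to ignore zero blocks (rather than treating them as the end of the archive) and keep reading.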
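GNU tar behaves this way when given its --ignore-zeros (-i) option. A small demonstration, assuming GNU tar or another tar that accepts --ignore-zeros:

```sh
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'one\n' > one.txt
printf 'two\n' > two.txt

# Two complete archives, each with its own zero-block terminator,
# written back to back into one file.
{ tar cf - one.txt; tar cf - two.txt; } > both.tar

plain=$(tar -t -f both.tar)                    # stops at the first terminator
ignoring=$(tar -t --ignore-zeros -f both.tar)  # skips zero blocks, reads on
echo "without --ignore-zeros: $plain"
echo "with --ignore-zeros:    $ignoring"
```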
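If the backup service works that way, you can basically proceed along these lines (GNU split's --filter option feeds each chunk to a shell command on standard input, with $FILE set to the chunk's name):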
(tar cf - /etc
 tar cf - /root
 dump_program_1 | \
   split -b 4K --filter='cat > "$FILE"; tar cf - "$FILE"; rm "$FILE"' \
     - /var/backup/PREFIX_DUMP_1_
 ...) | ... into backup service ...
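A runnable miniature of the split step, assuming GNU split (for --filter) and reading the result back with GNU tar's --ignore-zeros. Here seq stands in for the hypothetical dump_program_1, and chunk_ is an arbitrary prefix:

```sh
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# seq's output (13893 bytes) is cut into 4K chunks. For each chunk, split
# runs the filter with the chunk on stdin and $FILE set to its name; the
# filter saves it to disk, tars it to stdout, then deletes it.
seq 1 3000 |
    split -b 4K --filter='cat > "$FILE"; tar cf - "$FILE"; rm -f "$FILE"' \
        - chunk_ > stream.tar

# stream.tar now holds several concatenated archives; list them all.
tar -t --ignore-zeros -f stream.tar
```

This should list chunk_aa through chunk_ad, one archive per chunk.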
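I don't see any option in GNU tar to suppress the terminating zero blocks. A filter could be written to get rid of them:
tar cf - file | remove-zero-blocks
The as-yet-unwritten remove-zero-blocks filter would read 512-byte blocks through a block-oriented FIFO long enough to hold all of tar's trailing zero blocks: the two end-of-archive blocks plus the padding that fills out the last record (with GNU tar's default blocking factor of 20, that is at most 21 blocks).
It puts each newly read block into one end of the FIFO and writes out the oldest block as it overflows the other end. When EOF is reached, the FIFO is flushed, except that the trailing run of all-zero 512-byte blocks is omitted.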
That should get around a backup service that refuses to ignore zero blocks.
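A sketch of such a filter as a shell function (remove_zero_blocks is a hypothetical stand-in for the remove-zero-blocks command). It makes one deliberate simplification relative to the FIFO described above: since every block it holds back is all zeros, it only counts them instead of storing them, so there is no FIFO length to size against the blocking factor. It assumes GNU dd for iflag=fullblock. One caveat, which applies to the FIFO design too: if the last member's data itself ends in whole blocks of zero bytes, those are stripped as well and the stream becomes corrupt. A robust filter would parse the headers to find where the data ends.

```sh
#!/bin/sh
# remove_zero_blocks: copy stdin to stdout in 512-byte blocks, dropping
# the run of all-zero blocks (if any) that ends the input.
remove_zero_blocks() {
    blk=$(mktemp) || return 1
    pending=0                       # zero blocks held back so far
    while :; do
        # read exactly one block (fewer bytes only at EOF)
        dd bs=512 count=1 iflag=fullblock of="$blk" 2>/dev/null
        [ "$(( $(wc -c < "$blk") ))" -eq 0 ] && break   # EOF: drop pending zeros
        if [ "$(( $(tr -d '\000' < "$blk" | wc -c) ))" -eq 0 ]; then
            pending=$(( pending + 1 ))                  # all zero: defer it
        else
            # a non-zero block: the held-back zeros were not trailing
            if [ "$pending" -gt 0 ]; then
                dd if=/dev/zero bs=512 count="$pending" 2>/dev/null
                pending=0
            fi
            cat "$blk"
        fi
    done
    rm -f "$blk"
}
```

With it, { tar cf - a | remove_zero_blocks; tar cf - b; } | tar tf - should list both a and b without needing --ignore-zeros on the reading side.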