Reading a file one expression at a time in Common Lisp (limited memory)

state generator common-lisp closures file-io

Gwang-Jin Kim · 6 years ago

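I am looking for a way to read several files one s-expression (data list) at a time.

The problem is that the files are large: hundreds of megabytes, or even gigabytes. I need the RAM for the computations.

For the output files, I use: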

(defun add-to-file (process-result file-path)
  (with-open-file (os file-path :direction :output
                                :if-exists :append
                                :if-does-not-exist :create)
    (print process-result os)))

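This does the job well, appending each result string or s-expression on its own line. (I don't know; maybe it is not the most efficient way?)

Some time ago I asked for a macro that opens as many files as desired with with-open-file, so that the body can access all the files through stream variables I create and supply. However, since the number of open input and output files varies, I think it might be much easier to design a caller for each file that opens it, moves to the right position, writes or reads, and then closes it again.

For output, the function above does the task. For input, however, I would like a function that reads the next Lisp expression (s-expression) each time I call it, and that has a kind of memory of where in the file it last read. On each call it would reopen the file, know where to continue reading, and return the value; the next call reads and returns the next value, and so on. This is similar to a Python generator over an iterable, which yields the next value in the sequence.

I want to process the file expression by expression, to minimize memory usage.

How would you attack such a task? Or do you have a good strategy?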

1 answer

Gwang-Jin Kim · 6 years ago

Sketch:

Create a structure or class that stores the position of the last read.

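;; Holds the file's path and the position just after the last expression read.
(defstruct myfile
  path
  (last-position 0))

;; Reopen the file, seek to the stored position, read one s-expression,
;; and remember where reading stopped. READ signals END-OF-FILE when
;; nothing is left to read.
(defmethod next-expression ((mf myfile))
  (with-open-file (s (myfile-path mf) :direction :input)
    (file-position s (myfile-last-position mf))
    (prog1
        (read s)
      (setf (myfile-last-position mf) (file-position s)))))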

Usage example:

(defparameter *mf1* (make-myfile :path (pathname "/foo/bar.sexp")))

(print (next-expression *mf1*)) ;; get the first s-expr from the file
;; do something else
(myfile-last-position *mf1*)  ;; check the current position
;; do something else
(print (next-expression *mf1*)) ;; gives the next s-expr from the file

Then write a method that checks whether a new s-expression is available, and so on.
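One possible way to fold that availability check into the reader itself is a closure that keeps the position and reports end of file through a second return value. This is only a sketch, not part of the struct-based code above: the names `make-expression-reader` and `process-all` are hypothetical, and binding `*read-eval*` to NIL is an added assumption that the files hold plain data.

```lisp
;; Hypothetical helper: returns a generator-like closure over PATH.
(defun make-expression-reader (path)
  "Return a function that yields the next s-expression in PATH on each call.
The second return value is NIL once the file is exhausted."
  (let ((pos 0)
        (eof (gensym "EOF")))   ; uninterned marker, cannot be read from a file
    (lambda ()
      (with-open-file (s path :direction :input)
        ;; Continue where the previous call stopped.
        (file-position s pos)
        ;; Assumption: the file is data only, so #. evaluation is disabled.
        (let ((expr (let ((*read-eval* nil))
                      (read s nil eof))))
          (if (eq expr eof)
              (values nil nil)
              (progn
                (setf pos (file-position s))
                (values expr t))))))))

;; Hypothetical driver: call FN on every expression in PATH, one at a time.
(defun process-all (path fn)
  (let ((next (make-expression-reader path)))
    (loop
      (multiple-value-bind (expr more-p) (funcall next)
        (unless more-p (return))
        (funcall fn expr)))))
```

For example, `(process-all "/foo/bar.sexp" #'print)` would print each expression in turn while holding only one in memory. Reopening the file and seeking on every call is simple but adds overhead per expression; if only a few files are read at once, keeping the stream open for the whole loop may be cheaper.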