
Is it possible to iterate through documents stored in a Lucene index?

lucene.net lucene

22

Eugeniu Torica · Tech Community · 16 years ago

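I have some documents stored in a Lucene index with a docId field. I want to get all the docIds stored in the index. There is one more problem: the index holds about 300,000 documents, so I would prefer to fetch the docIds in chunks of 500. Is that possible?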

5 answers | last active 10 years ago

1

46

bajafresh4life 16 years ago

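// Lucene 3.x and earlier. Document numbers run from 0 to maxDoc() - 1
// and include slots that belong to deleted documents.
IndexReader reader = // create IndexReader
for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i))
        continue; // skip documents marked as deleted

    Document doc = reader.document(i);
    String docId = doc.get("docId");

    // do something with docId here...
}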
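
The loop above visits every document in one pass, while the question asks for batches of 500. The following is a minimal sketch of one way to batch such a scan, not something taken from the thread. It deliberately avoids the Lucene classes so that it runs on its own: `DocIdChunker`, `forEachChunk`, and the parameters `isDeleted`, `idOf`, and `sink` are hypothetical names standing in for `reader.isDeleted(i)`, `reader.document(i).get("docId")`, and whatever processing each batch needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

/** Hypothetical helper, not part of Lucene: batches IDs from a document-number scan. */
final class DocIdChunker {

    private DocIdChunker() {}

    /**
     * Walks document numbers 0..maxDoc-1, skips deleted ones, and hands the
     * remaining IDs to sink in lists of at most chunkSize entries.
     */
    static void forEachChunk(int maxDoc, IntPredicate isDeleted, IntFunction<String> idOf,
                             int chunkSize, Consumer<List<String>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> current = new ArrayList<>(chunkSize);
        for (int i = 0; i < maxDoc; i++) {
            if (isDeleted.test(i)) {
                continue; // a deleted document still occupies its document number
            }
            current.add(idOf.apply(i));
            if (current.size() == chunkSize) {
                sink.accept(current);
                // start a fresh list because the sink may keep the old one
                current = new ArrayList<>(chunkSize);
            }
        }
        if (!current.isEmpty()) {
            sink.accept(current); // final chunk, possibly shorter than chunkSize
        }
    }

    public static void main(String[] args) {
        // Stand-in for an index: 7 slots, slot 3 deleted, IDs "doc-0".."doc-6".
        forEachChunk(7, i -> i == 3, i -> "doc-" + i, 3, System.out::println);
    }
}
```

With the reader from the answer above, `maxDoc` would be `reader.maxDoc()`, `chunkSize` would be 500, and the `idOf` lambda would need to wrap the checked `IOException` that `reader.document(i)` can throw.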

2

15

bcoughlan 12 years ago

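// Lucene 4.x: IndexReader.isDeleted() no longer exists; use the live-docs bits instead.
Bits liveDocs = MultiFields.getLiveDocs(reader);
for (int i = 0; i < reader.maxDoc(); i++) {
    // liveDocs is null when the index has no deletions
    if (liveDocs != null && !liveDocs.get(i))
        continue;

    Document doc = reader.document(i);
}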

https://lucene.apache.org/core/4_0_0/MIGRATE.html

3

5

Chunliang Lyu 10 years ago

Use MatchAllDocsQuery:

// getIndexSearcher() and RESULT_LIMIT are the poster's own helper and constant
Query query = new MatchAllDocsQuery();
TopDocs topDocs = getIndexSearcher().search(query, RESULT_LIMIT);

4

2

Yaroslav 16 years ago

5

0

andreyro 11 years ago

IndexReader reader = // create IndexReader
for (int i=offset; i<offset + 10; i++) {
    if (reader.isDeleted(i))
        continue;

    Document doc = reader.document(i);
    String docId = doc.get("docId");
}

// Alternative: page through search results instead of document numbers (Lucene 3.x API).
IndexSearcher is = getIndexSearcher(); // new IndexSearcher(indexReader)
// get all results without any conditions attached
Term term = new Term([[any mandatory field name]], "*");
Query query = new WildcardQuery(term);

TopScoreDocCollector topCollector = TopScoreDocCollector.create([[int max hits to get]], true);
is.search(query, topCollector);

// return "count" hits starting at "offset"
TopDocs topDocs = topCollector.topDocs(offset, count);
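
The first loop in this answer reads document numbers `offset` through `offset + 9` and skips the deleted ones, so a page can come back with fewer than 10 documents, and the offset itself counts deleted slots. Below is a minimal sketch of a page lookup that counts only live documents. It is an illustration rather than the poster's code: `LiveDocPager`, `page`, and the `isDeleted` predicate are hypothetical names, with the predicate standing in for `reader.isDeleted(i)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper, not part of Lucene: pages over live document numbers only. */
final class LiveDocPager {

    private LiveDocPager() {}

    /**
     * Returns up to count document numbers, starting at the offset-th live
     * document. Deleted documents are not counted toward offset or count.
     */
    static List<Integer> page(int maxDoc, IntPredicate isDeleted, int offset, int count) {
        if (offset < 0 || count < 0) {
            throw new IllegalArgumentException("offset and count must be non-negative");
        }
        List<Integer> page = new ArrayList<>();
        int liveSeen = 0; // live documents encountered so far
        for (int i = 0; i < maxDoc && page.size() < count; i++) {
            if (isDeleted.test(i)) {
                continue; // deleted slot: does not advance the live counter
            }
            if (liveSeen >= offset) {
                page.add(i);
            }
            liveSeen++;
        }
        return page;
    }
}
```

The scan restarts from document 0 on every call, so the cost grows with the offset. For a one-off export of all IDs, a single sequential pass is likely the cheaper choice.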