Indexing: Stream Size and File Size Settings Explained

  • 7020660
  • 24-Mar-2014
  • 07-Aug-2017

Environment


Retain 2.x - 3.x

Situation

Under Server Configuration | Index, how do the stream size and file size limit settings affect the indexing behavior of documents?

Resolution


Java uses the term "stream" for the document, so, generally speaking, the stream size is the limit on how much of a document will be indexed. Once that limit is exceeded, indexing of the document stops at that point.

The "file size" setting determines whether the stream size setting is considered at all.

These settings behave differently based on the indexer used:
Exalead
The stream size limit is not used. If the file size limit is exceeded, the document is not indexed. Think of it as "all or nothing".
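
The Exalead rule can be sketched as a small function. This is only an illustrative model, not Retain code: the function name, the parameter names, and the use of bytes as the unit are all assumptions, and a document exactly at the limit is assumed to still be indexed.

```python
def exalead_indexed_bytes(doc_size: int, file_size_limit: int) -> int:
    """Hypothetical model: how many bytes of a document Exalead would index."""
    # The stream size limit plays no part in this decision.
    if doc_size > file_size_limit:
        return 0          # over the limit: nothing is indexed
    return doc_size       # within the limit: the whole document is indexed
```

For example, with a limit of 10, a 5-byte document is indexed in full and an 11-byte document is skipped entirely.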

Lucene
Retain looks at the file size limit first. If the document is larger than that limit, Retain does NOT index any of the document, regardless of the stream size setting.

However, if the file is under the file size limit, Retain indexes the file up to the stream size limit.

With Retain's default settings, where the stream size and file size limits have matching values, a file over the file size limit is not indexed at all, and a file under the file size limit is indexed completely. Thus, by default, Lucene is also all or nothing unless those settings are changed.
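
The Lucene rules described above can be sketched as follows. This is a simplified model for illustration, not Retain's actual implementation: the function name, the parameter names, and the byte units are assumptions, and a document exactly at the file size limit is assumed to pass the check.

```python
def lucene_indexed_bytes(doc_size: int, file_size_limit: int,
                         stream_size_limit: int) -> int:
    """Hypothetical model: how many bytes of a document Lucene would index."""
    # The file size limit is checked first.
    if doc_size > file_size_limit:
        return 0  # too large: nothing is indexed, whatever the stream size
    # Otherwise the document is indexed up to the stream size limit.
    return min(doc_size, stream_size_limit)


# Matching limits (the default situation): all or nothing.
print(lucene_indexed_bytes(8, 10, 10))    # whole document
print(lucene_indexed_bytes(12, 10, 10))   # nothing

# File size limit raised above the stream size: partial indexing is possible.
print(lucene_indexed_bytes(15, 20, 10))   # only the first 10 bytes
```

Partial indexing only appears in this model when the file size limit is set higher than the stream size limit. When the two match, every document that passes the file size check is already within the stream size.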

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2272.