Partial reading of binary documents

Author: Mikhail Ponikarov

The forthcoming OCCT 7.6.0 release improves reading and writing of OCAF documents in binary file formats. It is now possible to load only a part of a document, or to append a part of the file to an already loaded document.

This functionality can be used, for example, to implement a preview of the document content or to postpone reading of the parts of a document that are not needed at the moment. Another approach is to load the document structure without the shapes inside and, after some analysis of this structure, load only the shapes that are needed.

For that purpose, use the PCDM_ReaderFilter class. Define in it which attributes must be loaded or omitted, or define one or several entries of the sub-trees that must be loaded. The following example opens the document aDocument but reads only the content of the "0:1:2" label with all its sub-labels, and only the TDataStd_Name attributes on them.

Handle(PCDM_ReaderFilter) aFilter = new PCDM_ReaderFilter("0:1:2");
aFilter->AddRead("TDataStd_Name");
anApplication->Open("example.cbf", aDocument, aFilter);

Using the same filter interface, a part of the file can be appended to a document that was already loaded from the same file. For example, to read into the previously opened aDocument all attributes except TDataStd_Name and TDataStd_Integer:

Handle(PCDM_ReaderFilter) aFilter2 = new PCDM_ReaderFilter(PCDM_ReaderFilter::AppendMode_Protect);
aFilter2->AddSkipped("TDataStd_Name");
aFilter2->AddSkipped("TDataStd_Integer");
anApplication->Open("example.cbf", aDocument, aFilter2);

In PCDM_ReaderFilter, the AppendMode_Protect mode means that if the loading algorithm finds an attribute that already exists in the document, it is not overwritten by the attribute from the file being loaded. If the existing attributes need to be replaced, use the AppendMode_Overwrite reading mode instead.
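The difference between the two modes can be pictured with a toy model. This is not OCCT code: a label is reduced to a map from attribute type name to a string value, and ToyLabel, ToyMode and appendAttributes are hypothetical names introduced only for this illustration.

```cpp
#include <map>
#include <string>

// Toy model only: a "label" is a map from attribute type name to a value.
// ToyLabel, ToyMode and appendAttributes are hypothetical, not OCCT API.
using ToyLabel = std::map<std::string, std::string>;

enum class ToyMode { Protect, Overwrite };

// Merges the attributes read from a file into an already loaded label.
void appendAttributes(ToyLabel& loaded, const ToyLabel& fromFile, ToyMode mode)
{
  for (const auto& attr : fromFile)
  {
    if (mode == ToyMode::Overwrite)
      loaded[attr.first] = attr.second;  // replace an existing attribute or add a new one
    else
      loaded.insert(attr);               // add only if absent; an existing value is kept
  }
}
```

In this sketch, a value edited in memory survives an append in the Protect mode, while the Overwrite mode restores the value stored in the file.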

The AddRead and AddSkipped methods for attributes should not be used in the same filter. If both are used, the attributes registered with AddSkipped are ignored during reading, and only the AddRead list takes effect.
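A minimal sketch of how such a precedence rule could be evaluated, assuming the read list simply takes priority whenever it is non-empty. ToyAttrFilter and its members are hypothetical and only mimic the behavior described here; they are not the OCCT class.

```cpp
#include <set>
#include <string>

// Toy model of the rule above; ToyAttrFilter is hypothetical, not the OCCT class.
struct ToyAttrFilter
{
  std::set<std::string> read;     // type names registered by an AddRead-like call
  std::set<std::string> skipped;  // type names registered by an AddSkipped-like call

  // Returns true if an attribute of the given type should be loaded.
  bool passes(const std::string& type) const
  {
    if (!read.empty())
      return read.count(type) != 0;   // a non-empty read list wins; skipped is ignored
    return skipped.count(type) == 0;  // otherwise load everything that is not skipped
  }
};
```

With only a skipped list, everything except the listed types passes. Once a single type is added to the read list, the skipped list no longer has any effect.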

Content of an already loaded file may be appended to the document several times, with the same or different parts of the file. For that purpose, the filter reading mode must be AppendMode_Protect or AppendMode_Overwrite, which enables the "append" mode when the document is opened. If the filter is empty, null, or omitted from the arguments, the document is opened with the "append" mode disabled and without any loading limitations.

To make reading of a part of the document fast, a new file format (number 12) was introduced in OCAF. It has the following features:

  • Geometrical and topological information is stored directly in the area of the corresponding attribute in the file, not in a special section.
  • The size of each label (including the content of its sub-elements) and of each attribute is stored, which makes it possible to skip any part of a file.
  • Several objects in the document are stored in a more compact way, which keeps the stored file size down despite the newly added data.
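The benefit of stored sizes can be shown with a small, self-contained sketch. The record layout used here (one tag byte, a 4-byte size, then the payload) and the names writeRecord and readOnly are assumptions made for illustration; this is not the actual OCAF binary format.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record layout for illustration: [1-byte tag][4-byte size][payload].
// This is not the actual OCAF binary format.
void writeRecord(std::ostream& out, char tag, const std::string& payload)
{
  const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
  out.put(tag);
  out.write(reinterpret_cast<const char*>(&size), sizeof(size));
  out.write(payload.data(), size);
}

// Returns the payload of the first record with the wanted tag, jumping over
// every other record by its stored size instead of parsing its content.
std::string readOnly(std::istream& in, char wanted)
{
  char tag = 0;
  std::uint32_t size = 0;
  while (in.get(tag) && in.read(reinterpret_cast<char*>(&size), sizeof(size)))
  {
    if (tag == wanted)
    {
      std::string payload(size, '\0');
      in.read(&payload[0], size);
      return payload;
    }
    in.seekg(size, std::ios::cur);  // skip the payload without reading it
  }
  return std::string();  // no record with this tag was found
}
```

A reader that wants only a small record placed after a large one performs one seek per skipped record, so the cost no longer depends on how heavy the skipped content is.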

As a result, files in this format become slightly smaller, but writing becomes 10-15% slower because it is done in two passes: the main file content is written first, and the sizes of the blocks are written after these sizes have been measured.
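The two-pass idea can be sketched as size back-patching: reserve room for the size, write the content, then return and fill in the measured value. The function writeSizedBlock and the 4-byte size field are hypothetical choices for this illustration and do not reproduce the actual OCAF writer.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical sketch of size back-patching; not the actual OCAF writer.
// Writes a block as [4-byte size][content] when the size becomes known
// only after the content has been produced.
void writeSizedBlock(std::ostream& out, const std::string& part1, const std::string& part2)
{
  const std::ostream::pos_type sizePos = out.tellp();
  std::uint32_t size = 0;
  out.write(reinterpret_cast<const char*>(&size), sizeof(size));  // placeholder for the size

  out << part1 << part2;                                          // first pass: the content

  const std::ostream::pos_type endPos = out.tellp();
  size = static_cast<std::uint32_t>(
    endPos - sizePos - static_cast<std::streamoff>(sizeof(size)));
  out.seekp(sizePos);                                             // second pass: go back
  out.write(reinterpret_cast<const char*>(&size), sizeof(size));  // and patch the real size
  out.seekp(endPos);                                              // continue after the block
}
```

The extra seeks and the second write are the kind of overhead that a one-pass writer does not have, which is consistent with the slowdown mentioned above.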

However, reading of the whole document becomes 5-15% faster, since the file is smaller and no seek operation is needed in this case. In any case, these results depend heavily on the environment: reading from network disks, slow devices, or files cached in RAM. The numbers given here are averages.

Partial reading may also be applied to documents stored in older formats; however, the new format is much better suited to it. Below we provide some statistics on reading different parts of a medium-size document (about 8 MB) located locally on a medium SSD: