Review of the workflow of AIS_InteractiveContext

Roman Lygin
Sat, 12/06/2014 - 23:27

The attached materials review a drawback of the current visualization workflow which has to block the app GUI thread and thus decreases GUI responsiveness.

Performance data have been collected to estimate potential gain of the improved workflow.

The materials are based on discussions with Open CASCADE visualization expert Sergey Anikin.

Attachments:

occ_visualization_workflow.pdf

Sergey Anikin
Thu, 12/11/2014 - 13:04

Hello Roman,

Thank you very much for the detailed and complete analysis!
Now it is much easier to understand what actually should be improved in OCCT visualization in the considered context.

Please, correct me if my understanding is wrong, but I interpret it as follows:

1. Iterative scene updates are no longer considered the major "N-square" performance bottleneck that you reported several times before.
2. Pre-calculation of primitive arrays for custom interactive objects that do not use standard OCCT presentation tools is already possible at the application level, so no OCCT change is needed here.
3. If a custom interactive object relies on the StdPrs_ShadedShape tool, most of the time is spent on triangulation, which can be suppressed using the workaround described in your document. So no OCCT code refactoring is needed in order to move the heaviest parts of StdPrs_ShadedShape::Add() outside the rendering thread.
4. A fix for OCCT Mantis issue 23200 is still desirable to avoid the above-mentioned workaround.

Best regards,
Sergey

Roman Lygin
Thu, 12/11/2014 - 18:03

Hi Sergey,

Thank you for your interest.
Let me clarify your points:

1. Not quite. Even if you move everything else (i.e. tessellation, precomputation) to worker threads, the update itself obviously remains the key hotspot of the GUI/rendering thread.
It effectively defines the throughput of the application. If a scene update takes 1-2 seconds, i.e. 0.5-1 FPS (this is what we observe on large models), then it does not make sense to request scene updates at a higher frequency, even if worker threads can precompute parts faster (e.g. 5 parts per second).
That is what I had to do: parts arriving from worker threads are just sent to AIS_InteractiveContext::Display(false) and do not trigger a view update until the time elapsed since the previous view update is about the same as the time it takes to update the view.
That is, about 5-10 parts can be skipped, and only the next one triggers a view update that redraws the entire scene.
This keeps the UI responsive rather than swamped by updates happening after each part.
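
The throttling rule described above can be sketched roughly as follows. This is only an illustration: UpdateThrottle and its methods are hypothetical names, not part of OCCT, and the actual application code may differ.

```cpp
#include <chrono>

// Hypothetical helper (not an OCCT class): decides whether a newly
// displayed part should trigger a full view update.
class UpdateThrottle {
public:
  using Clock = std::chrono::steady_clock;

  // True if no update has happened yet, or if the time elapsed since the
  // previous update finished is at least as long as that update took.
  bool ShouldUpdate(Clock::time_point theNow) const {
    return !myHasUpdated || (theNow - myLastEnd) >= myLastCost;
  }

  // Records when the update just performed started and finished.
  void RecordUpdate(Clock::time_point theStart, Clock::time_point theEnd) {
    myLastCost = theEnd - theStart;
    myLastEnd = theEnd;
    myHasUpdated = true;
  }

private:
  bool myHasUpdated = false;
  Clock::time_point myLastEnd{};
  Clock::duration myLastCost{};
};
```

In the GUI thread, every arriving part would still be displayed without a view update, and the view would be updated (and timed with RecordUpdate()) only when ShouldUpdate() returns true. A final update is needed after the last part so that skipped parts become visible.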

Thus, it is still an O(N^2) problem, which the app developer has to address.
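
A rough cost model shows why skipping updates helps but does not remove the quadratic behaviour. The assumptions here are illustrative, not measured: each redraw of a scene with k parts costs about k*c, and N is a multiple of the batch size m.

```latex
% Update after every part: redraw k parts at step k.
\begin{align*}
T_1 &= \sum_{k=1}^{N} k\,c = \frac{c\,N(N+1)}{2} = O(N^2) \\
% Update after every m-th part: redraw j*m parts at update j.
T_m &= \sum_{j=1}^{N/m} j\,m\,c = \frac{c\,N(N+m)}{2m} \approx \frac{T_1}{m}
\end{align*}
```

Under this model, skipping 5-10 parts between updates cuts the total redraw time by roughly that factor, but the total still grows as N^2.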

I used Amplifier XE to profile the scene update. Redraw(), which is now the only computation happening in the GUI thread, breaks down into OpenGl_PrimitiveArray::DrawArray() - 89.8% and OpenGl_PrimitiveArray::BuildVBO() - 7.3%.
See enclosed screenshot of Amplifier*.
(http://s29.postimg.org/isrooa9uf/axe_redraw.png)

With that, I can only see two ways to increase the current OCC throughput:
Improve the efficiency of the existing Graphic3d_ArrayOfPrimitives, or design new ones that would allow more efficient display. For instance, Graphic3d_ArrayOfSegments requires adding each vertex twice. So if you need to draw a polyline of n vertices, you need to call AddVertex() about 2*n times. I tried to use Graphic3d_ArrayOfPolylines, which only requires one call per vertex, but the final performance of polylines was lower than that of segments, so I had to drop that.
I am not sure how much room for improvement there is here, or how much you would be constrained by the OpenGL API.
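
To make the vertex duplication concrete, here is a minimal standalone sketch. It does not use OCCT: Point3 and ExpandToSegments() are hypothetical names, and the function only mimics what an application has to do before feeding a segment-style array. For an open polyline of n vertices it produces 2*(n-1) entries, which is the "about 2*n" mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain point type used only for this illustration.
struct Point3 { double x, y, z; };

// Expands an open polyline p0, p1, ..., p(n-1) into independent segments
// (p0,p1), (p1,p2), ... Every interior vertex is emitted twice.
std::vector<Point3> ExpandToSegments(const std::vector<Point3>& thePolyline) {
  std::vector<Point3> aSegments;
  if (thePolyline.size() < 2) {
    return aSegments; // fewer than two points: nothing to draw
  }
  aSegments.reserve(2 * (thePolyline.size() - 1));
  for (std::size_t i = 0; i + 1 < thePolyline.size(); ++i) {
    aSegments.push_back(thePolyline[i]);     // segment start
    aSegments.push_back(thePolyline[i + 1]); // segment end
  }
  return aSegments;
}
```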

2. Correct

3. No, not quite. Even if you can move tessellation to the worker threads, you still leave 95% of AIS_IC::Compute() to the rendering thread - see slide 10.
Thus, you will pay this cost upon the first call to Compute(). If one could move this 95% to a worker thread, that would still be beneficial, as it would decrease the serial part.

4. Correct. It is "highly" desirable given that the work-around suggested in the presentation has a global impact (due to using a singleton factory).

* Is there a way to attach files to interim posts? Please advise.

Sergey Anikin
Thu, 12/11/2014 - 18:11

Hello Roman,

* Is there a way to attach files to interim posts? Please advise.

You should edit the original post of the thread, add the necessary attachment and later you can add a reference to it in any post.

Best regards,
Sergey

Roman Lygin
Thu, 12/11/2014 - 18:19

To follow up on #1 (more efficient use of primitives) - it is not a matter of how efficient it is to populate the array of primitives. This happens only once and can be done in a worker thread.

It is a matter of which primitives are used and how efficiently they can be drawn upon *every* redraw. For instance, if a primitive allowed sending 50% less data to the GPU each time, it could be more efficient. Data locality, or whatever the use of glDrawElements() or glDrawArrays() might suggest, may matter as well...
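
One way to reason about the amount of data per primitive is to compare duplicated vertices (the glDrawArrays() style) with shared vertices plus an index buffer (the glDrawElements() style). The sketch below only counts bytes. It does not call OpenGL or OCCT, all names in it are hypothetical, and whether the smaller layout is actually faster on a given driver is not established here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-precision vertex, as typically stored in a vertex buffer.
struct Vertex3f { float x, y, z; };

// Shared vertices plus two indices per segment.
struct IndexedSegments {
  std::vector<Vertex3f> vertices;
  std::vector<std::uint32_t> indices;
};

// Builds the indexed form of an open polyline: each vertex is stored once,
// and segment i refers to vertices i and i+1.
IndexedSegments MakeIndexedSegments(const std::vector<Vertex3f>& thePolyline) {
  IndexedSegments aResult;
  aResult.vertices = thePolyline;
  for (std::size_t i = 0; i + 1 < thePolyline.size(); ++i) {
    aResult.indices.push_back(static_cast<std::uint32_t>(i));
    aResult.indices.push_back(static_cast<std::uint32_t>(i + 1));
  }
  return aResult;
}

// Bytes needed when every segment stores both of its end points.
std::size_t DuplicatedBytes(std::size_t theNbVertices) {
  return theNbVertices < 2 ? 0 : 2 * (theNbVertices - 1) * sizeof(Vertex3f);
}

// Bytes needed for the shared vertices and the index buffer.
std::size_t IndexedBytes(const IndexedSegments& theData) {
  return theData.vertices.size() * sizeof(Vertex3f)
       + theData.indices.size() * sizeof(std::uint32_t);
}
```

For position-only vertices the saving is modest, because an index is not much smaller than a vertex. It grows when vertices also carry normals or colors, or when 16-bit indices are sufficient.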

Sergey Anikin
Thu, 12/11/2014 - 19:38

I perfectly understand what you said.
So basically we can consider two approaches to point 1:

1. Make the most of the regular redraw procedure - here we have some room for potential improvements, e.g. reorganizing OpenGL data in a more efficient way, decreasing the number of OpenGL calls, etc. Most likely, this will be done in any case, because all OCCT users will benefit from this improvement.
2. Incremental scene redrawing - i.e. only newly added objects are drawn each time, and the framebuffer is not cleared between updates. This approach is suitable for one very specific situation: when a lot of objects are added to the scene one by one, and all objects are static (not moving) - this is exactly your situation. Implementing it looks feasible, however it poses at least two technical problems:
- how to update the viewing frustum (and depth range) each time an object is added, or rather, how to set it up initially to avoid any updates in the future - because updating the viewing frustum (and depth range) requires redrawing the complete scene?
- OCCT design issue - how to inform the renderer that it should draw the specified presentation only, without resetting the framebuffer? Currently, presentations are added to the scene one by one, but rendering always traverses the complete list of presentations. Perhaps immediate mode could be used... Have you ever experimented with immediate mode for this purpose?

Kirill Gavrilov
Thu, 12/11/2014 - 21:06

You might be interested in reading the investigation for issue #23519.

Note that there are natural limits to how far the rendering of a single primitive array can be optimized, so better performance in the CPU-limited case might be achieved only by grouping independent objects with the same properties into a single primitive array. This would, however, introduce additional problems for selection and highlighting of parts.
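
The grouping idea can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not OCCT code: Part, PartRange, Batch and GroupByMaterial() are hypothetical names, and a string stands in for "the same properties". The per-part ranges are kept precisely because selection and highlighting would still need to find an individual part inside a merged array.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input: one independent object with its display property.
struct Part {
  int id;
  std::string material;      // stands in for "same properties"
  std::vector<float> coords; // flat x,y,z list
};

// Where a part's data ended up inside the merged array.
struct PartRange {
  int partId;
  std::size_t first; // offset of the first float in Batch::coords
  std::size_t count; // number of floats belonging to the part
};

// One merged array per property value, drawable with a single call.
struct Batch {
  std::vector<float> coords;
  std::vector<PartRange> ranges;
};

// Merges parts that share a material into one array each, remembering
// the range of every part for later selection or highlighting.
std::map<std::string, Batch> GroupByMaterial(const std::vector<Part>& theParts) {
  std::map<std::string, Batch> aBatches;
  for (const Part& aPart : theParts) {
    Batch& aBatch = aBatches[aPart.material];
    aBatch.ranges.push_back({aPart.id, aBatch.coords.size(), aPart.coords.size()});
    aBatch.coords.insert(aBatch.coords.end(),
                         aPart.coords.begin(), aPart.coords.end());
  }
  return aBatches;
}
```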

Sergey Anikin
Fri, 12/12/2014 - 18:40

Hello Roman,

4. Correct. It is "highly" desirable given that the work-around suggested in the presentation has a global impact (due to using a singleton factory).

It would be great if you could help us to prepare 1-2 test cases for reproducing the remaining problems related to #23200 with the current OCCT Git master. I have already asked the issue reporter for the same.
Without suitable test cases, this issue is likely to remain in a suspended state forever.
Thanks a lot in advance!

Best regards,
Sergey

Sergey Anikin
Fri, 02/13/2015 - 18:18

Hello Roman,

It might be interesting for you to have a look at the latest changes to issue #23200.
We hope that the implemented solution will give full control over tessellation to application developers.

Best regards,
Sergey