
Sat, 12/06/2014 - 23:27
The attached materials review a drawback of the current visualization workflow: it has to block the application's GUI thread, which reduces GUI responsiveness.
Performance data have been collected to estimate the potential gain of an improved workflow.
The materials are based on discussions with Open CASCADE visualization expert Sergey Anikin.
Thu, 12/11/2014 - 13:04
Hello Roman,
Thank you very much for the detailed and complete analysis!
Now it is much easier to understand what actually should be improved in OCCT visualization in the considered context.
Please correct me if my understanding is wrong, but I interpret it as follows:
Best regards,
Sergey
Thu, 12/11/2014 - 18:03
Hi Sergey,
Thank you for your interest.
Let me clarify your points:
1. Not quite. Even if you move everything else (i.e. tessellation, precomputation) to worker threads, the update itself remains the key hotspot of the GUI/rendering thread.
It effectively defines the throughput of the application. If a scene update takes 1-2 seconds, i.e. 0.5-1 FPS (this is what we observe on large models), then it does not make sense to request scene updates at a higher frequency, even if worker threads can precompute parts faster (e.g. 5 parts per second).
That is what I had to do: parts arriving from worker threads are just sent to AIS_InteractiveContext::Display(false) and do not trigger a view update until the time elapsed since the previous view update is about the same as the time it takes to update the view.
That is, about 5-10 parts can be skipped, and only the next one triggers a view update that redraws the entire scene.
This keeps the UI responsive and not swamped by updates happening after each part.
Still, every update redraws the entire scene, so it remains an O(N^2) problem that the app developer has to address.
I used Amplifier XE to profile the scene update. Redraw(), which is now the only computation happening in the GUI thread, breaks down into OpenGl_PrimitiveArray::DrawArray() (89.8%) and OpenGl_PrimitiveArray::BuildVBO() (7.3%).
See the enclosed Amplifier screenshot*.
(http://s29.postimg.org/isrooa9uf/axe_redraw.png)
With that, I can see only two ways to increase the current OCC throughput:
Improve the efficiency of the existing Graphic3d_ArrayOfPrimitives classes, or design a new one, to allow more efficient display. For instance, Graphic3d_ArrayOfSegments requires adding each vertex twice, so to draw a line of n vertices you need to call AddVertex() about 2*n times. I tried Graphic3d_ArrayOfPolylines, which requires only 1 call per vertex, but the final performance of polylines was lower than that of segments, so I had to drop that.
I am not sure how much room for improvement there is here and how much you would be constrained by the OpenGL API.
2. Correct
3. No, not quite. Even if you move tessellation to the worker threads, you still leave 95% of AIS_IC::Compute() to the rendering thread (see slide 10).
Thus, you will pay this cost upon the first call to Compute(). If one could move this 95% to a worker thread, that would still be beneficial, as it would decrease the serial part.
4. Correct. It is "highly" desirable given that the work-around suggested in the presentation has a global impact (due to using a singleton factory).
* Is there a way to attach files to interim posts? Please advise.
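The throttling described in point 1 can be sketched roughly as follows. This is only an illustration under assumptions: RedrawThrottle and its methods are hypothetical names, not OCCT classes, and time stamps are passed in as plain seconds so that the logic stays independent of any particular clock.

```cpp
#include <cassert>

// Hypothetical helper (not part of OCCT): decides whether a part that has just
// been displayed without a view update should now trigger a full view update.
class RedrawThrottle
{
public:
  // Returns true if a view update should be triggered at time theNow (seconds).
  // The very first part always triggers an update; later ones only when the
  // time elapsed since the previous update is at least as long as that update took.
  bool ShouldUpdate (double theNow) const
  {
    return !myHasUpdated || (theNow - myLastEnd) >= myLastDuration;
  }

  // Records a finished view update that ran from theStart to theEnd (seconds).
  void RecordUpdate (double theStart, double theEnd)
  {
    assert (theEnd >= theStart);
    myLastDuration = theEnd - theStart;
    myLastEnd      = theEnd;
    myHasUpdated   = true;
  }

private:
  bool   myHasUpdated   = false;
  double myLastEnd      = 0.0;
  double myLastDuration = 0.0;
};
```

In the GUI thread, each arriving part would still be displayed without a view update; only when ShouldUpdate() returns true would the application update the view, time that update, and pass the result to RecordUpdate(). Parts arriving in between are picked up by the next full redraw.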
Thu, 12/11/2014 - 18:11
Hello Roman,
You should edit the original post of the thread, add the necessary attachment and later you can add a reference to it in any post.
Best regards,
Sergey
Thu, 12/11/2014 - 18:19
To follow up on #1 (more efficient use of primitives) - it is not a matter of how efficient it is to populate the array of primitives. This happens only once and can be done in a worker thread.
It is a matter of which primitives are used and how efficiently they can be drawn upon *every* redraw. For instance, if a primitive allowed sending 50% less data to the GPU each time, this could be more efficient. Maybe data locality, etc.; whatever the use of glDrawElements() or glDrawArrays() might suggest...
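To give a feeling for the data volumes involved, here is a small back-of-the-envelope sketch. It assumes 12 bytes per vertex (three 32-bit floats) and 32-bit indices; the function names are made up for illustration and do not correspond to OCCT or OpenGL calls. It compares one open line through n points stored as independent segments, as indexed segments, and as a single strip.

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes, for illustration only: 3 x 32-bit float per vertex, 32-bit indices.
constexpr std::size_t THE_VERTEX_BYTES = 12;
constexpr std::size_t THE_INDEX_BYTES  = 4;

// Independent segments without indices: every inner point is stored twice.
constexpr std::size_t SegmentsBytes (std::size_t theNbPoints)
{
  return theNbPoints < 2 ? 0 : 2 * (theNbPoints - 1) * THE_VERTEX_BYTES;
}

// Segments with shared vertices plus one index pair per segment.
constexpr std::size_t IndexedSegmentsBytes (std::size_t theNbPoints)
{
  return theNbPoints < 2 ? 0
       : theNbPoints * THE_VERTEX_BYTES + 2 * (theNbPoints - 1) * THE_INDEX_BYTES;
}

// A single strip: each point is stored once, no indices.
constexpr std::size_t StripBytes (std::size_t theNbPoints)
{
  return theNbPoints < 2 ? 0 : theNbPoints * THE_VERTEX_BYTES;
}

// Fraction of data saved relative to unindexed segments.
double SavingVsSegments (std::size_t theBytes, std::size_t theNbPoints)
{
  assert (theNbPoints >= 2); // otherwise the baseline is zero
  return 1.0 - double (theBytes) / double (SegmentsBytes (theNbPoints));
}
```

Under these assumptions a strip carries about half the bytes of unindexed segments, while indexed segments save noticeably less. Data volume is clearly not the whole story, though: as reported earlier in this thread, polylines measured slower than segments in practice.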
Thu, 12/11/2014 - 19:38
I perfectly understand what you said.
So basically we can consider two approaches to point 1:
- how to update the viewing frustum (and depth range) each time an object is added, or better to say, how to set it up initially to avoid any updates in the future - because updating the viewing frustum (and depth range) requires redrawing the complete scene?
- OCCT design issue - how to inform the renderer that it should draw the specified presentation only, without resetting the framebuffer? Currently, presentations are added to the scene one by one, but rendering always traverses the complete list of presentations. Perhaps immediate mode could be used... Have you ever experimented with immediate mode for this purpose?
Thu, 12/11/2014 - 21:06
You might be interested in reading the investigation for issue #23519.
Note that there are natural limits to how far the rendering of a single primitive array can be optimized, so better performance in the CPU-limited case can be achieved only by grouping independent objects with the same properties into a single primitive array. However, this would create additional problems for selection and highlighting of parts.
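A minimal sketch of the grouping idea, for illustration only. Part, PartRange, Batch and GroupByMaterial are hypothetical names, not OCCT classes, and a plain integer material id stands in for "the same properties". The per-part ranges kept in each batch hint at the extra bookkeeping that selection and highlighting of individual parts would need.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical types for illustration; none of them are OCCT classes.
struct Part
{
  int                materialId; // stands in for "same properties"
  std::vector<float> xyz;        // packed x, y, z coordinates
};

struct PartRange
{
  std::size_t partIndex;   // index of the source part
  std::size_t firstVertex; // first vertex of the part inside the batch
  std::size_t nbVertices;  // number of vertices the part contributes
};

struct Batch
{
  std::vector<float>     xyz;    // merged vertex data of all parts in the batch
  std::vector<PartRange> ranges; // where each part lives inside the batch
};

// Merges parts sharing a material into one batch per material.
std::map<int, Batch> GroupByMaterial (const std::vector<Part>& theParts)
{
  std::map<int, Batch> aBatches;
  for (std::size_t aPartIter = 0; aPartIter < theParts.size(); ++aPartIter)
  {
    const Part& aPart = theParts[aPartIter];
    assert (aPart.xyz.size() % 3 == 0); // coordinates must come in triples
    Batch& aBatch = aBatches[aPart.materialId];
    aBatch.ranges.push_back ({ aPartIter, aBatch.xyz.size() / 3, aPart.xyz.size() / 3 });
    aBatch.xyz.insert (aBatch.xyz.end(), aPart.xyz.begin(), aPart.xyz.end());
  }
  return aBatches;
}
```

With such batches the number of primitive arrays to draw depends on the number of distinct property sets rather than on the number of parts, at the price of having to map a picked vertex range back to its part.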
Fri, 12/12/2014 - 18:40
Hello Roman,
It would be great if you could help us to prepare 1-2 test cases for reproducing the remaining problems related to #23200 with the current OCCT Git master. I have already asked the issue reporter for the same.
Without suitable test cases, this issue is likely to remain in a suspended state forever.
Thanks a lot in advance!
Best regards,
Sergey
Fri, 02/13/2015 - 18:18
Hello Roman,
It might be interesting for you to have a look at the latest changes to issue #23200.
We hope that the implemented solution will give full control over tessellation to application developers.
Best regards,
Sergey