Performance issues rendering with AIS_Triangulation

Hello,
I am working on a new industrial machining simulation project and evaluating various solutions at this stage. In this simulator, meshes are first-class citizens (solids are meshed). My company has been using OCCT for decades. To implement a graphics engine + renderer in OCCT, I extended the Poly_Triangulation class to support transformations applied in bulk to every vertex, as I want rigid transformations (at least) to be efficient. So I implemented Transform(), Translate() and Rotate() methods, taking great care to transform each vertex in place. With no extra copies (and multi-threading, which is another topic), I managed to outperform a similar approach done in MachineWorks by a factor of 10. But the bad news comes when it comes to rendering.
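For readers who want to picture such an in-place bulk transform, here is a minimal standalone sketch. It assumes the nodes are stored as a flat array of doubles (x, y, z per node) and uses a hypothetical RigidTransform struct instead of gp_Trsf, so it illustrates the technique and is not Marco's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rigid transform: 3x3 rotation R (row-major) plus translation t,
// applied as p' = R * p + t.
struct RigidTransform {
  std::array<double, 9> R{1, 0, 0, 0, 1, 0, 0, 0, 1};
  std::array<double, 3> t{0, 0, 0};
};

// Transforms packed nodes (x0, y0, z0, x1, y1, z1, ...) in place.
// One tight loop over contiguous memory: no allocation, no per-node calls.
void TransformNodesInPlace(std::vector<double>& xyz, const RigidTransform& T) {
  assert(xyz.size() % 3 == 0);
  const auto& R = T.R;
  for (std::size_t i = 0; i < xyz.size(); i += 3) {
    const double x = xyz[i], y = xyz[i + 1], z = xyz[i + 2];
    xyz[i]     = R[0] * x + R[1] * y + R[2] * z + T.t[0];
    xyz[i + 1] = R[3] * x + R[4] * y + R[5] * z + T.t[1];
    xyz[i + 2] = R[6] * x + R[7] * y + R[8] * z + T.t[2];
  }
}
```

Since every node is transformed independently, the loop range can be split across threads without any synchronization beyond joining them at the end.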
The attached GIF shows a sample demo animation involving a very limited number of items. Each frame takes 20 to 80 ms to render. Apart from the variance, which is just as bad, the bottleneck is in AIS_Triangulation::Compute().
To save time, I first cached
Handle(Graphic3d_ArrayOfTriangles) anArray = new Graphic3d_ArrayOfTriangles (myNbNodes, myNbTriangles * 3, hasVNormals, hasVColors, Standard_False);

by making it a class member (myArray), which I create in the constructor. So in Compute() only myArray->SetVertice(i, nodes(i)); is called for each vertex, instead of recreating a temporary anArray and calling anArray->AddVertex(nodes(i)). But this saves little time, as AddVertex/SetVertice end up in a plethora of nested calls (matryoshka-style), with 'if's all over, extra copies and function-call overhead: from a performance point of view, it is all a mess.
So my question is very simple: the original triangulation (Poly_Triangulation) is basically a contiguous set of doubles in memory. It is kept as a pointer in AIS_Triangulation, so it is still contiguous there. It gets fragmented when it is transformed into a Graphic3d_something for rendering. So is there a way to do this with a single bulk operation, instead of splitting it into per-vertex operations, each with its own overhead? This is overkill, and <20 fps for that scene is really poor.

If this requires some hack, I am happy to patch the sources. If the performance cannot meet such relaxed requirements as 40-60 fps, I'll have to evaluate another approach.

Thanks in advance,
regards.
Marco Cecchi.

Kirill Gavrilov:

> It gets fragmented when it gets transformed into a Graphic3d_something for rendering.

I don't understand what you mean by "fragmented" here: Graphic3d_ArrayOfPrimitives is just a couple of contiguous arrays of vertex attributes and indices, very close to Poly_Triangulation, just packed in single precision. Graphic3d_ArrayOfPrimitives is an auxiliary interface for filling in Graphic3d_Buffer + Graphic3d_IndexBuffer, which are passed on to construct Vertex Buffer Objects (VBOs) at the lower graphics level. You may fill in Graphic3d_Buffer directly if you find the per-vertex function-call overhead significant in your case.
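Filling the vertex data directly boils down to one tight loop over contiguous memory instead of one API call per vertex. The standalone sketch below assumes positions and normals are available as flat double arrays and writes a plain std::vector<float> in an interleaved position + normal layout. It only illustrates the shape of the loop; it does not use the actual Graphic3d_Buffer accessors, which should be checked in the OCCT reference for your version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed interleaved layout: per vertex [px py pz nx ny nz] as floats,
// i.e. a stride of 6 floats (24 bytes), a common VBO layout.
constexpr std::size_t kFloatsPerVertex = 6;

// Converts contiguous double-precision nodes and normals into one interleaved
// float buffer in a single pass. Calling it again with the same node count
// reuses the existing allocation (resize() to the same size does not reallocate).
void FillInterleaved(const std::vector<double>& nodes,    // x,y,z per node
                     const std::vector<double>& normals,  // nx,ny,nz per node
                     std::vector<float>& out) {
  assert(nodes.size() == normals.size() && nodes.size() % 3 == 0);
  const std::size_t nbNodes = nodes.size() / 3;
  out.resize(nbNodes * kFloatsPerVertex);
  float* dst = out.data();
  for (std::size_t i = 0; i < nbNodes; ++i, dst += kFloatsPerVertex) {
    const double* p = &nodes[3 * i];
    const double* n = &normals[3 * i];
    dst[0] = static_cast<float>(p[0]);
    dst[1] = static_cast<float>(p[1]);
    dst[2] = static_cast<float>(p[2]);
    dst[3] = static_cast<float>(n[0]);
    dst[4] = static_cast<float>(n[1]);
    dst[5] = static_cast<float>(n[2]);
  }
}
```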

> I extended the Poly_Triangulation class to support transformations applied in bulk to every vertex, as I want rigid transformations (at least) to be efficient. So I implemented Transform(), Translate() and Rotate() methods, taking great care to transform each vertex in place.

I doubt that transforming vertices is what you really need here, unless your simulation computes deformations of water, soft bodies or something similar. Normally, you would be better off splitting a mesh into individually transformed pieces. Assigning a local transformation to an AIS object is a much cheaper operation than computing per-vertex transformations on your own.
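The cost difference can be illustrated with a hypothetical standalone class: the vertices stay in the piece's own frame and are uploaded once, while moving the piece only replaces a 4x4 matrix that the GPU applies. In OCCT the corresponding operation is assigning a location to the presented object (for example via AIS_InteractiveContext::SetLocation()); the class and method names below are illustrative only and do not reproduce OCCT's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major, as consumed by OpenGL

// Hypothetical stand-in for an object with a local transformation.
class RigidPiece {
 public:
  explicit RigidPiece(std::vector<float> localXyz) : myXyz(std::move(localXyz)) {
    assert(myXyz.size() % 3 == 0);
  }
  // O(1): vertex data is untouched, so nothing has to be re-uploaded.
  void SetLocalTransformation(const Mat4& m) { myTrsf = m; }
  // O(N): only needed when the geometry itself changes (e.g. material removal).
  void SetGeometry(std::vector<float> localXyz) {
    myXyz = std::move(localXyz);
    ++myGeometryRevision;  // signals that the vertex buffer must be re-uploaded
  }
  std::size_t GeometryRevision() const { return myGeometryRevision; }
  // CPU equivalent of what the vertex shader computes: p_world = M * p_local.
  std::array<float, 3> WorldPoint(std::size_t i) const {
    assert(3 * i + 2 < myXyz.size());
    const float x = myXyz[3 * i], y = myXyz[3 * i + 1], z = myXyz[3 * i + 2];
    const Mat4& m = myTrsf;
    return {m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14]};
  }

 private:
  std::vector<float> myXyz;
  Mat4 myTrsf{1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1};
  std::size_t myGeometryRevision = 0;
};
```

With one such piece per machine unit, a frame updates one matrix per piece regardless of vertex count.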

Note that OCCT doesn't yet implement an interface for skeletal animation, because it is rarely useful in CAD applications. Such a mechanism would allow applying per-vertex transformations to the mesh without re-uploading the entire mesh data, as the necessary transformations would be done by a dedicated GLSL vertex shader. If this is what you are trying to implement, it could be done by improving OCCT or by using custom GLSL programs.
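For reference, the usual math behind skeletal animation is linear blend skinning: each vertex is moved by a weighted sum of bone transforms, p' = sum_k w_k (B_k p). The C++ function below is a CPU reference of what such a vertex shader typically evaluates per vertex; the types and names are hypothetical, and OCCT offers no such interface, as noted above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Affine bone transform, row-major 3x4: p' = B * [x y z 1]^T.
using Affine = std::array<float, 12>;

struct SkinnedVertex {
  std::array<float, 3> pos;         // bind-pose position
  std::array<std::size_t, 4> bone;  // indices into the bone palette
  std::array<float, 4> weight;      // expected to sum to 1
};

// Linear blend skinning for one vertex. On the GPU only the (small) bone
// palette changes per frame; the mesh itself stays in video memory.
std::array<float, 3> SkinVertex(const SkinnedVertex& v, const std::vector<Affine>& bones) {
  std::array<float, 3> out{0.f, 0.f, 0.f};
  for (std::size_t k = 0; k < 4; ++k) {
    const float w = v.weight[k];
    if (w == 0.f) continue;  // unused influence slot
    assert(v.bone[k] < bones.size());
    const Affine& B = bones[v.bone[k]];
    for (std::size_t r = 0; r < 3; ++r) {
      out[r] += w * (B[4 * r] * v.pos[0] + B[4 * r + 1] * v.pos[1] +
                     B[4 * r + 2] * v.pos[2] + B[4 * r + 3]);
    }
  }
  return out;
}
```

A rigid machine unit is the degenerate case of a single bone with weight 1, which reduces to a plain local transformation.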

> Apart from the variance, which is just as bad, the bottleneck is in AIS_Triangulation::Compute().

::Compute() usually creates primitive arrays in a format supported by the graphics driver. As you already figured out, you may create this array on your own beforehand and avoid extra copies of the same data, like AIS_PointCloud::Compute(), which just puts myPoints into the graphics group. Note that apart from ::Compute(), there is also ::ComputeSelection(), which also takes some time if your object is intended to be selectable, but this is a different topic; I assume that your objects are not selectable here (displayed with -1 selection mode).

What comes after ::Compute() is uploading these arrays into GPU memory, which might take a while for large amounts of data. If your transformations are straightforward, they could be done by implementing a custom GLSL program. If the calculations aren't that simple, but the mesh is only partially updated, then the Graphic3d_Buffer::InvalidatedRange() mechanism could be useful to re-upload the modified portion of Graphic3d_ArrayOfPrimitives without recomputing the entire presentation. Alternatively, you may split a large mesh into smaller pieces and recompute only the changed parts as individual AIS objects.
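The bookkeeping behind a partial re-upload can be sketched without OCCT: track the lowest and highest vertex modified since the last upload and send only that byte range. The class below is hypothetical and is not the Graphic3d_Buffer::InvalidatedRange() API; glBufferSubData is named in a comment only as the typical OpenGL call for updating a sub-range of an existing buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical vertex store remembering which vertices changed since the
// last upload, so that only [lower, upper] has to be sent to the GPU.
class DirtyTrackedPositions {
 public:
  explicit DirtyTrackedPositions(std::size_t nbVertices) : myXyz(3 * nbVertices, 0.f) {}

  void SetVertex(std::size_t i, float x, float y, float z) {
    assert(3 * i + 2 < myXyz.size());
    myXyz[3 * i] = x;
    myXyz[3 * i + 1] = y;
    myXyz[3 * i + 2] = z;
    myLower = std::min(myLower, i);
    myUpper = std::max(myUpper, i);
  }
  bool IsDirty() const { return myLower <= myUpper; }
  // Offset and size of the range to upload (e.g. with glBufferSubData);
  // only meaningful when IsDirty() is true.
  std::size_t DirtyOffsetBytes() const { return myLower * 3 * sizeof(float); }
  std::size_t DirtySizeBytes() const { return (myUpper - myLower + 1) * 3 * sizeof(float); }
  void MarkUploaded() { myLower = kNone; myUpper = 0; }

 private:
  static constexpr std::size_t kNone = std::numeric_limits<std::size_t>::max();
  std::vector<float> myXyz;
  std::size_t myLower = kNone;
  std::size_t myUpper = 0;
};
```

A single [lower, upper] range is cheap to maintain but re-sends everything in between; when a whole mesh moves rigidly, every vertex is dirty anyway, which is why a local transformation pays off more in that case.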

Marco Cecchi:

Hi Kirill,

thanks for your reply. Yes, I disabled selection right away to give it a try, but it did not make much difference.

>Graphic3d_ArrayOfPrimitives is just a couple of continuous arrays of vertex attributes and indices
OK, so I'll look deeper into it. As far as I could see, even if the array is contiguous, each vertex is processed individually by the AIS_ object, with various overheads in between. This should not be related to moving data to the GPU: it happens before that and causes a huge CPU processing time (GPU time is negligible). So if the memory is not fragmented, I will try implementing a 'bulk' way of moving all the data. Maybe I can set OCCT to work with float instead of double as the basic floating-point type, so that I can fill Graphic3d_Buffer with a single memcpy? Is this some kind of CMake parameter? Otherwise, the GLSL vertex shader approach you mention should be super-fast; I need to look into the details.

Each of my meshes is a single machine unit and moves as a whole when it needs to be transformed. Transforming each vertex is very fast, and there is actually no other way (apart from syntactic sugar), as at each time frame the whole mesh is rototranslated. After this evaluation, workpiece machining will cause these meshes to be 'soft transformed', also changing the mesh structure and so on. But for now, I expect the rendering of rigid meshes to be fast, which is not what I am getting. Also, I don't think this simple scenario should require moving to OCCT/VTK interoperability, which I'd prefer to avoid.

Thanks again and regards.
Marco

Kirill Gavrilov:

> Maybe I can set OCCT to work with float instead of double as the basic floating-point type, so that I can fill Graphic3d_Buffer with a single memcpy? Is this some kind of CMake parameter?

You may configure Graphic3d_Buffer in whatever manner you want, as long as it is compatible with a VBO. The vertex array is defined as interleaved attributes, but it is also possible to define a non-interleaved buffer (see Graphic3d_AttribBuffer::SetInterleaved()). The buffer is always in single floating-point precision, as GPUs don't like double precision, so be careful not to ruin your data through error accumulated over sequential transformations.
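To make the warning about accumulated error concrete, the sketch below compares two hypothetical strategies for a point on the unit circle rotated by a small angle every frame: re-rotating the previous single-precision result each frame, versus rebuilding the float copy from double-precision reference data and the total angle. It uses no OCCT API, and since the exact drift depends on the platform, no numbers are quoted here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Drifting strategy: every frame rotates the previous *float* result again,
// so rounding errors (including those in the float cos/sin) accumulate.
// Returns |radius - 1| after 'steps' rotations by 'dAngle'.
double DriftedRadiusError(std::size_t steps, double dAngle) {
  assert(std::isfinite(dAngle));
  const float c = static_cast<float>(std::cos(dAngle));
  const float s = static_cast<float>(std::sin(dAngle));
  float x = 1.f, y = 0.f;
  for (std::size_t i = 0; i < steps; ++i) {
    const float nx = c * x - s * y;
    const float ny = s * x + c * y;
    x = nx;
    y = ny;
  }
  return std::abs(std::hypot(static_cast<double>(x), static_cast<double>(y)) - 1.0);
}

// Robust strategy: keep the pristine point and the total transformation in
// double, and convert to float only once per frame.
double RecomputedRadiusError(std::size_t steps, double dAngle) {
  assert(std::isfinite(dAngle));
  const double total = static_cast<double>(steps) * dAngle;
  const float x = static_cast<float>(std::cos(total));
  const float y = static_cast<float>(std::sin(total));
  return std::abs(std::hypot(static_cast<double>(x), static_cast<double>(y)) - 1.0);
}
```

Applied to the original question, this suggests keeping the untransformed nodes in double precision, composing the per-frame rigid transformations in double, and regenerating the float vertex data from the total transformation rather than repeatedly transforming the float data.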

Poly_Triangulation stores nodes in double precision, but it can be asked to store them in single precision instead (see Poly_Triangulation::IsDoublePrecision()). However, I guess you don't need Poly_Triangulation at all if you are going into low-level stuff.
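Regarding the memcpy idea: a single memcpy is only possible when the source already holds floats with the same stride as the destination, for example single-precision nodes copied into the position block of a non-interleaved buffer. With double-precision nodes, a conversion loop is unavoidable. The Vec3f type and the function below are hypothetical stand-ins for such a layout, not OCCT types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical single-precision node: three tightly packed floats.
struct Vec3f { float x, y, z; };
static_assert(sizeof(Vec3f) == 3 * sizeof(float), "Vec3f must be tightly packed");
static_assert(std::is_trivially_copyable<Vec3f>::value, "memcpy requires trivially copyable data");

// Copies all node positions into the position block of a non-interleaved
// buffer (all positions first, other attributes in separate blocks) with one
// memcpy. Valid only because element type and stride match exactly.
void CopyPositionsBulk(const std::vector<Vec3f>& nodes, std::vector<float>& positionBlock) {
  positionBlock.resize(3 * nodes.size());
  if (!nodes.empty()) {
    std::memcpy(positionBlock.data(), nodes.data(), nodes.size() * sizeof(Vec3f));
  }
  assert(positionBlock.size() == 3 * nodes.size());
}
```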

Marco Cecchi:

>Poly_Triangulation stores double precision, but could be asked to store single precision Poly_Triangulation::IsDoublePrecision() instead, but I guess you
>don't need Poly_Triangulation at all, if you are going into low-level stuff.

True. Still, it is an abstraction that comes in handy for setting materials and colors, dealing with interactivity and so on in its AIS counterpart. Indeed, it can be extended to push the AIS_/Poly triangulation vertex data (given that it's there, it's worth sticking with it) directly into video memory, as you suggest.

Thanks for your valuable advice.

Marco Cecchi:

For the record: among other things, one major bottleneck is the recalculation of the bounding box in Graphic3d_Group::AddPrimitiveArray(), i.e. at each ::Redisplay().
Given that every AIS_ object is meant for rendering, I suggest having a stateful Graphic3d_BndBox4 that is kept in sync with the creation/editing of the basic AIS elements.
I created a new pair of classes, Poly_RigidTriangulation/AIS_RigidTriangulation, in which nodes and triangles can only be set at creation time and afterwards only transformed, as described above. Each transform updates the bounding box as well. In Compute(), only the vertex coordinates are fast-copied into Graphic3d_Vec3 data; then:
auto bb = myTriangulation->GetBB();
TheGroup->SetMinMaxValues(bb.CornerMin().x(), bb.CornerMin().y(), bb.CornerMin().z(),
bb.CornerMax().x(), bb.CornerMax().y(), bb.CornerMax().z());
TheGroup->AddPrimitiveArray(myArray->Type(), myArray->Indices(), myArray->Attributes(), myArray->Bounds(), false);

Note the final false argument: since SetMinMaxValues() above already sets the bounding box in a single shot, AddPrimitiveArray() is told not to recompute it.
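Keeping the bounding box up to date can be folded into the transform loop itself, so it costs a few comparisons per vertex instead of a separate scan. The sketch below is standalone; Aabb and TransformAndBound are hypothetical helpers standing in for whatever GetBB() returns in the classes described above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical axis-aligned bounding box stored next to the mesh.
struct Aabb {
  std::array<float, 3> lo{std::numeric_limits<float>::max(),
                          std::numeric_limits<float>::max(),
                          std::numeric_limits<float>::max()};
  std::array<float, 3> hi{std::numeric_limits<float>::lowest(),
                          std::numeric_limits<float>::lowest(),
                          std::numeric_limits<float>::lowest()};
  void Add(float x, float y, float z) {
    lo = {std::min(lo[0], x), std::min(lo[1], y), std::min(lo[2], z)};
    hi = {std::max(hi[0], x), std::max(hi[1], y), std::max(hi[2], z)};
  }
};

// Applies an affine transform (row-major 3x4: rotation | translation) to
// packed x,y,z floats in place and rebuilds the box in the same pass.
Aabb TransformAndBound(std::vector<float>& xyz, const std::array<float, 12>& m) {
  assert(xyz.size() % 3 == 0);
  Aabb box;
  for (std::size_t i = 0; i < xyz.size(); i += 3) {
    const float x = xyz[i], y = xyz[i + 1], z = xyz[i + 2];
    xyz[i]     = m[0] * x + m[1] * y + m[2] * z + m[3];
    xyz[i + 1] = m[4] * x + m[5] * y + m[6] * z + m[7];
    xyz[i + 2] = m[8] * x + m[9] * y + m[10] * z + m[11];
    box.Add(xyz[i], xyz[i + 1], xyz[i + 2]);
  }
  return box;
}
```

The resulting corners are what SetMinMaxValues() expects. A cheaper but looser alternative for purely rigid motion is to transform the eight corners of the untransformed box, which is O(1) per frame but can overestimate the extent after rotations.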

With all this, I get trivial rendering times instead of the O(milliseconds) that I was complaining about.