Is BRepFilletAPI_MakeFillet reentrant?

Hello,

I am experiencing crashes with BRepFilletAPI_MakeFillet in a multi-threaded context.
My program runs N threads, each performing a fillet on some shape with its own BRepFilletAPI_MakeFillet instance.
I expect BRepFilletAPI_MakeFillet to be reentrant, in the sense of the Qt definition:
« A class is said to be reentrant if its member functions can be called safely from multiple threads, as long as each thread uses a different instance of the class. »

If I protect the code section using BRepFilletAPI_MakeFillet with a mutex, the program does not crash.
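To illustrate, here is a minimal sketch of the pattern (the box shape, fillet radius, and thread count are made up for illustration; my real shapes differ). Uncommenting the lock_guard line is the mutex workaround that makes the crashes disappear:

    #include <mutex>
    #include <thread>
    #include <vector>
    #include <BRepFilletAPI_MakeFillet.hxx>
    #include <BRepPrimAPI_MakeBox.hxx>
    #include <TopExp_Explorer.hxx>
    #include <TopoDS.hxx>

    static std::mutex g_filletMutex;

    void makeFilletOnOwnShape()
    {
        // Each thread builds its own shape and its own MakeFillet instance.
        TopoDS_Shape box = BRepPrimAPI_MakeBox(10., 10., 10.).Shape();

        // Workaround: uncommenting this lock makes the crashes disappear.
        // std::lock_guard<std::mutex> lock(g_filletMutex);

        BRepFilletAPI_MakeFillet fillet(box);
        for (TopExp_Explorer ex(box, TopAbs_EDGE); ex.More(); ex.Next())
            fillet.Add(1.0, TopoDS::Edge(ex.Current()));
        fillet.Build();
        if (fillet.IsDone()) {
            TopoDS_Shape result = fillet.Shape();
            (void)result; // the result would be consumed here
        }
    }

    int main()
    {
        std::vector<std::thread> workers;
        for (int i = 0; i < 4; ++i)
            workers.emplace_back(makeFilletOnOwnShape);
        for (std::thread& t : workers)
            t.join();
        return 0;
    }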

Does the OpenCascade team confirm that BRepFilletAPI_MakeFillet is not reentrant? If so, is there any plan to make it reentrant?

Roman Lygin

Hi Hugues,

First off, great to get in touch with you again ;-).
Most likely, BRepFilletAPI_MakeFillet is *NOT* re-entrant (yet). Recently, while expanding parallelism in CAD Exchanger, I had to fix a few core modules of Open CASCADE that participate in multiple algorithms - intersections, approximations, etc. Please see id23952 in the Mantis tracker. It is very likely that the affected functions (in math, Approx2Var, ...) are also used in fillets.

You could experiment by picking up the fix and seeing its impact.
Of course, I would recommend that you use a thread-checking tool to verify there are no data races. I can recommend Intel Inspector for that.

Hope this helps.
Roman

Forum supervisor

Dear Hugues,
I confirm that the BRepFilletAPI_MakeFillet algorithm is not certified for multi-threading and cannot be considered re-entrant. Try the hint suggested by Roman. In any case, you are welcome to contribute to the thread-safety of OCCT.
Regards

Hugues Delorme

Hello,

Roman, I might try your patch, but even if it works I won't be confident about thread-safety ...
Anyway, it's great to see you still so involved in OpenCascade, working to make it better!

I was wondering if this workaround solution is of interest:
* Instead of running the code (which uses non-reentrant OCC classes) in a thread, run it inside a separate "heavy" process.
* Create a memory segment shared by all the worker processes, into which the input data is copied.
* Make each process output its result in the shared memory segment.
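Something like this minimal POSIX sketch is what I have in mind (the segment name, slot size, and worker count are invented for illustration, and serializing real OCC shapes into the segment, e.g. via BRepTools::Write, is left out):

    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        const int    nWorkers = 4;
        const size_t slotSize = 1024;               // per-worker result buffer
        const size_t segSize  = nWorkers * slotSize;

        // Shared segment visible to the parent and the forked workers.
        int fd = shm_open("/occ_results", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, segSize);
        char* seg = static_cast<char*>(
            mmap(nullptr, segSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
        close(fd);

        for (int i = 0; i < nWorkers; ++i) {
            if (fork() == 0) {                      // child: one worker process
                // ... run the non-reentrant OCC code here, single-threaded ...
                std::snprintf(seg + i * slotSize, slotSize,
                              "result of worker %d", i);
                _exit(0);
            }
        }
        for (int i = 0; i < nWorkers; ++i)
            wait(nullptr);                          // parent: wait for all workers

        for (int i = 0; i < nWorkers; ++i)
            std::printf("%s\n", seg + i * slotSize);

        munmap(seg, segSize);
        shm_unlink("/occ_results");
        return 0;
    }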

The solution is heavier than threading, but it should work. What is your opinion about it?
What would the OpenCascade team recommend?

Roman Lygin

Hi Hugues,

Going from shared to distributed memory (from multi-threading to multi-processing) is a common pattern to overcome thread-safety issues, so this will certainly work for the case you describe. I myself considered it for a use case where third-party code was involved, but it did not materialize.

However, you should understand the implications of this versus multi-threading, for instance:
- Higher start-up costs: forking processes for each parallel region would introduce too much overhead and could kill your performance gain. An alternative is to use dedicated process managers (they are usually part of MPI implementations). Another aspect is that each worker might need to load/receive data before starting to process it.
- Higher memory footprint: each process will create its own set of temporary working data (e.g. OCC allocates some memory even if it may never use it). Depending on your algorithm, you might have to read the entire input model while processing only a chunk of it.
- Higher development costs: you will have to implement a communication scheme between the process ranks. You can either use a standard solution (MPI) or design your own IPC (inter-process communication).
- Poorer work balance: unlike dynamic scheduling (e.g. as brought by TBB), you will likely have to implement static scheduling - distributing 1/n of the work space to each process (see the sketch after the lists below). If the input model is imbalanced (which is likely), you will encounter imbalance across the processes: the master and other workers may wait for one worker crunching a more complex piece of work.

On the upside:
- Greater confidence of no data races: if your processes are single-threaded, there are no data races inside them.
- Greater scalability (from a single node to a cluster): once you have made the transition to multi-processing (especially with an MPI-based solution), you get scalability beyond a single node. Using fast network technologies (InfiniBand) will give you scalability beyond SHM (shared memory segments on a single node).
- Ability to address greater problem sizes: going multi-node can let you address data sizes which cannot fit into memory on a single node.
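To make the static-scheduling point concrete, here is a minimal MPI sketch (the item count and per-item work are placeholders, not actual fillet code): rank r simply processes every size-th item starting at r.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int nItems = 100;                     // total pieces of work
        double local = 0.0;                         // e.g. per-rank statistic
        for (int i = rank; i < nItems; i += size) { // static round-robin split
            // ... process item i single-threaded (fillet one sub-shape, etc.) ...
            local += 1.0;
        }

        double total = 0.0;                         // gather on the master rank
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            std::printf("%d ranks processed %g items\n", size, total);

        MPI_Finalize();
        return 0;
    }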

So you will have to decide on these and maybe other factors. If you wish, drop me an email at roman dot lygin at gmail dot com, so we can dive into more details.

Hope this helps.
Roman

P.S. MPI is the Message Passing Interface, the industry-standard programming model for multi-processing communication.