Parallelizing BOPs

Posted by Pawel

By now, the performance of Boolean operations (BOPs) in OCCT has frequently been discussed and compared with other systems (e.g. CATIA). Although OCCT BOPs surely have their advantages over other systems (e.g. precision), they are usually much slower than the competitors.

At this point I'd like to suggest parallelizing BOPs (where possible) using multiple processor cores (or maybe even GPU cores) to accelerate Boolean operations.

Do you think this task can be incorporated into the road map?

Thanks in advance!

Pawel

Posted by ifv

Parallelizing BOPs

Dear Pawel, thank you for the suggestion. Parallelizing BOPs is included in the road map. You can visit the updated Modeling page to see all planned actions for improving the performance and reliability of BOPs and other modeling algorithms.

Best regards,
Igor

Posted by Pawel

Parallelizing BOPs

Dear Igor,

Yes, thank you for letting me know.

I have read the updated version of the road map and I'm very glad you're going to focus on parallelism too!

Pawel

Posted by mach22

BOP future work

When do you plan to make these new developments in BOPs? Will they be available in the next public release?

Posted by abv

Parallel BOPs are being implemented

Hello,

This development is in progress. It is being implemented gradually: some parallel processing is provided in OCCT 6.7.1 (see Release Notes, issue 24157), though a more complete implementation will come in 6.8.0. We are going to share more details in a separate thread.

Andrey

Posted by Timo

Parallel BOPs in OCCT 6.7.1

As far as I can see, in OCCT 6.7.1 the Boolean operations provided by the package BRepAlgoAPI do not run in parallel mode by default. Parallelism can be switched on if a BOPAlgo_PaveFiller is used, but this affects only the intersection part. How can I switch on parallelism for the building part? Should I bypass BRepAlgoAPI and go one level deeper instead? Will BRepAlgoAPI provide a parallelism switch in OCCT 6.8.0?

What is the state of parallel BOPs in OCCT 6.7.1? That is, can parallel mode be used in production, or are there any known issues?

Regards,
Timo

Posted by abv

Parallel BOPs are not yet ready for production

Hello Timo,

At the moment, parallel execution of Boolean operations is disabled by default, and the corresponding option is available only at the BOPAlgo level (see BOPAlgo_Algo::SetRunParallel()). So far, testing has been limited to non-regression testing in sequential mode and testing of selected cases in parallel mode. There are no known issues; however, more careful testing is needed to be sure it works well in all circumstances. We are going to complete this testing during the summer so that this functionality can be included in OCCT 6.8.0. The option to enable or disable parallel processing should then be provided at the API level (BRepAlgoAPI).

Andrey

Posted by Timo

Switch for parallel BOPs

In OCCT 6.8.0 it was possible to set a default parallelism mode for BOPs using the static function BOPAlgo_Algo::SetParallelMode.
It was still possible to change the setting for individual BOPs via BOPAlgo_Algo::SetRunParallel().

In OCCT 6.9.0 these functions still exist, but the default parallelism mode has no effect when BRepAlgoAPI is used, because the constructor BRepAlgoAPI_Algo::BRepAlgoAPI_Algo() sets myRunParallel to false. So, every time BRepAlgoAPI is used, the parallel mode has to be set explicitly via BRepAlgoAPI_Algo::SetRunParallel().

Wouldn't it be better to set the myRunParallel flag in BRepAlgoAPI_Algo::BRepAlgoAPI_Algo() to BOPAlgo_Algo::GetParallelMode() instead of false?
In this way, you could define a default parallel mode for your application via BOPAlgo_Algo::SetParallelMode(), and only in the rare cases when you really want a different mode for a particular Boolean operation would you have to set it explicitly via BRepAlgoAPI_Algo::SetRunParallel().
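A minimal, self-contained sketch of the two behaviours being compared may make the proposal concrete. The class names AlgoDefaults, ApiAlgoCurrent and ApiAlgoProposed are hypothetical stand-ins, not OCCT classes; only the member name myRunParallel and the SetParallelMode / GetParallelMode / SetRunParallel naming mirror the functions mentioned above.

```cpp
#include <atomic>

// Hypothetical stand-in for the process-wide default that BOPAlgo_Algo keeps
// (SetParallelMode / GetParallelMode); not the real OCCT class.
class AlgoDefaults
{
public:
  static void SetParallelMode (bool theIsParallel) { ourParallelMode.store (theIsParallel); }
  static bool GetParallelMode() { return ourParallelMode.load(); }

private:
  static std::atomic<bool> ourParallelMode; // shared by every algorithm instance
};

std::atomic<bool> AlgoDefaults::ourParallelMode (false);

// Behaviour described for 6.9.0: the API-level constructor hard-codes false,
// so the process-wide default is ignored.
class ApiAlgoCurrent
{
public:
  ApiAlgoCurrent() : myRunParallel (false) {}
  void SetRunParallel (bool theIsParallel) { myRunParallel = theIsParallel; }
  bool RunParallel() const { return myRunParallel; }

private:
  bool myRunParallel;
};

// Proposed behaviour: the per-instance flag starts from the process-wide
// default and can still be overridden for an individual operation.
class ApiAlgoProposed
{
public:
  ApiAlgoProposed() : myRunParallel (AlgoDefaults::GetParallelMode()) {}
  void SetRunParallel (bool theIsParallel) { myRunParallel = theIsParallel; }
  bool RunParallel() const { return myRunParallel; }

private:
  bool myRunParallel;
};
```

After AlgoDefaults::SetParallelMode (true), an ApiAlgoProposed object starts in parallel mode while an ApiAlgoCurrent object does not. The trade-off is that one global call silently changes the behaviour of every operation constructed afterwards, which is the concern raised in the reply below.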

Additionally, if this change were accepted, it might be possible to simplify the BOPTest commands, which currently hold their own flag in BOPTest_Objects::myRunParallel. But I might be wrong here; perhaps that flag is needed as it is.

Regards

Posted by kgv

Wouldn't it be better to set

> Wouldn't it be better to set the myRunParallel flag in BRepAlgoAPI_Algo::BRepAlgoAPI_Algo() to BOPAlgo_Algo::GetParallelMode() instead of false?

We have discussed this question (internally) in the scope of BRepMesh_IncrementalMesh and BRepMesh_IncrementalMesh::SetParallelDefault(), and have concluded that it would be a bad idea to change the default behavior of an algorithm in this way, using a global variable.

Posted by pkv

Current Progress

Parallelization of the Boolean Operations Algorithm

1. Preface

Improving performance is a pressing problem for a CAD kernel, which has to process huge amounts of data. The Boolean Operations Algorithm (BOA) is one of the most needed algorithms in any CAD system; at the same time, it is one of the most time-consuming ones. Thus, improving the performance of the BOA is crucial for OCCT.

Nowadays, computers without multiple processor cores have become relatively rare. Parallel programming models that exploit multi-core systems can increase performance drastically.

These facts lead to the task: to improve the performance of the BOA by exploiting multi-core systems.

To achieve this goal, the Intel Threading Building Blocks (Intel TBB) library was chosen.

Intel TBB makes it possible to specify parallelism far more conveniently than with raw threads, while improving performance, portability, and scalability.

2. Developments and Implementations

The work consists of two tasks:

2.1 Design a basic schema that allows one to:

  • Separate TBB internals and features from the existing OCC code. This makes it possible to use another threading library (if necessary) without rewriting the OCC code.
  • Turn parallelism on or off on the fly, independently of the underlying threading library. This allows parallel processing to be switched according to the current processing conditions.

The schema was designed as a set of templates that encapsulate all the TBB types, template classes, etc. in use, so the OCC code is separated from the internals of the parallelization library. The templates can be instantiated directly inside the code to be parallelized, without any mention of the parallelization driver. The driver can be changed easily: it is enough to provide the necessary interface for the corresponding template classes.

The schema allows parallel processing to be switched on or off according to the current conditions. This feature is very convenient for exploration, customization, tuning, avoiding overheads, debugging, etc.

The implementation of the schema is general and can be applied to any code that is to be parallelized.

The implementation of the schema can be extended (if desired) using other features of TBB or of other threading libraries.
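The schema described above can be illustrated with a minimal sketch in standard C++. It is not the OCCT code: the function name ParallelFor is hypothetical, and std::thread stands in for TBB so that the example needs only the standard library. Code to be parallelized passes a functor and a flag; the threading backend stays hidden inside the template and could be swapped for TBB (or another library) without touching callers, while the flag switches parallelism on or off at run time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical driver-neutral "parallel for": calls theFunctor (i) for every
// index i in [theBegin, theEnd). Callers never see the threading backend
// (std::thread here; TBB or another library could be substituted inside).
// theFunctor must be safe to call concurrently for different indices.
template <typename Functor>
void ParallelFor (std::size_t theBegin, std::size_t theEnd,
                  const Functor& theFunctor, bool theIsParallel)
{
  assert (theBegin <= theEnd);
  const std::size_t aNbItems = theEnd - theBegin;

  // Switched-off path: a plain loop, useful for debugging and for small
  // inputs where starting threads would cost more than it saves.
  if (!theIsParallel || aNbItems < 2)
  {
    for (std::size_t i = theBegin; i < theEnd; ++i)
    {
      theFunctor (i);
    }
    return;
  }

  const unsigned aNbCores = std::max (1u, std::thread::hardware_concurrency());
  const std::size_t aNbThreads = std::min<std::size_t> (aNbCores, aNbItems);

  std::vector<std::thread> aThreads;
  aThreads.reserve (aNbThreads);
  for (std::size_t t = 0; t < aNbThreads; ++t)
  {
    // Static round-robin split: thread t handles indices
    // theBegin + t, theBegin + t + aNbThreads, ...
    aThreads.emplace_back ([=, &theFunctor]()
    {
      for (std::size_t i = theBegin + t; i < theEnd; i += aNbThreads)
      {
        theFunctor (i);
      }
    });
  }
  for (std::thread& aThread : aThreads)
  {
    aThread.join();
  }
}
```

A call such as ParallelFor (0, aFaces.size(), aBuildFace, isParallel), with a hypothetical per-face functor, gives identical results in both modes as long as each index writes only to its own output slot. A real TBB backend would distribute work dynamically (work stealing) rather than with this fixed split.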

2.2 Implement parallelism in the BOA using the schema
The Boolean Operations Algorithm consists of two parts: the Intersection Part and the Building Part. The implementation of parallelism was divided into the same two parts. Within each part, levels of parallelization were defined, and the schema was then applied to each level.
In particular, for the Intersection Part almost all high-level steps have been parallelized:

  • intersection of bounding boxes of source shapes
  • computation of Vertex/Edge interferences
  • computation of Edge/Edge interferences
  • computation of Vertex/Face interferences
  • computation of Edge/Face interferences
  • computation of Face/Face interferences
  • computation of Split Edges
  • computation of p-curves

Furthermore, due to the recursive nature of the Intersection Part of the Algorithm, all post-treatment steps of the Edge/Edge, Edge/Face and Face/Face interferences have been parallelized automatically.
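To illustrate the first step in the list above, here is a minimal sketch of a parallel bounding-box pre-filter. It is a simplification under stated assumptions, not OCCT's implementation: Box, Overlap and CandidatePairs are hypothetical names, every argument box is tested against every tool box (a real kernel would typically use a spatial index), and std::async stands in for TBB. Only pairs that survive this cheap test would go on to the expensive interference computations.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical axis-aligned bounding box.
struct Box
{
  double XMin, YMin, ZMin, XMax, YMax, ZMax;
};

// Two boxes interfere if their extents overlap along every axis.
bool Overlap (const Box& theA, const Box& theB)
{
  return theA.XMin <= theB.XMax && theB.XMin <= theA.XMax
      && theA.YMin <= theB.YMax && theB.YMin <= theA.YMax
      && theA.ZMin <= theB.ZMax && theB.ZMin <= theA.ZMax;
}

using IndexPair = std::pair<std::size_t, std::size_t>;

// Returns all pairs (i, j) such that theArgs[i] overlaps theTools[j].
// The argument boxes are split into contiguous chunks, one asynchronous task
// per chunk; each task fills its own vector, so no locking is needed.
std::vector<IndexPair> CandidatePairs (const std::vector<Box>& theArgs,
                                       const std::vector<Box>& theTools,
                                       std::size_t theNbTasks)
{
  const std::size_t aNbTasks = std::max<std::size_t> (1, std::min (theNbTasks, theArgs.size()));
  const std::size_t aChunk = (theArgs.size() + aNbTasks - 1) / aNbTasks;

  std::vector<std::future<std::vector<IndexPair>>> aTasks;
  for (std::size_t aBegin = 0; aBegin < theArgs.size(); aBegin += aChunk)
  {
    const std::size_t anEnd = std::min (aBegin + aChunk, theArgs.size());
    aTasks.push_back (std::async (std::launch::async,
      [&theArgs, &theTools, aBegin, anEnd]()
      {
        std::vector<IndexPair> aLocal;
        for (std::size_t i = aBegin; i < anEnd; ++i)
          for (std::size_t j = 0; j < theTools.size(); ++j)
            if (Overlap (theArgs[i], theTools[j]))
              aLocal.emplace_back (i, j);
        return aLocal;
      }));
  }

  // Merge in task order so the result matches the single-task (serial) run.
  std::vector<IndexPair> aResult;
  for (std::future<std::vector<IndexPair>>& aTask : aTasks)
  {
    std::vector<IndexPair> aPart = aTask.get();
    aResult.insert (aResult.end(), aPart.begin(), aPart.end());
  }
  return aResult;
}
```

Because each task writes only to its own vector and the parts are merged in order, CandidatePairs (anArgs, aTools, 1) and CandidatePairs (anArgs, aTools, 4) return the same list.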

For the Building Part of the Algorithm, the most time-consuming high-level steps have been parallelized:

  • building split faces
  • building same-domain faces
  • building split solids
  • checking the results

and, one level deeper:

  • splitting wires for faces
  • splitting shells for solids

3. Results and Summary

3.1 Environment
The following environment was used to obtain the results:
Processor: Intel(R) Core(TM) i5-3450 CPU @ 3.10 GHz
Installed memory (RAM): 16 GB
System type: 64-bit
Operating system: Windows 7 Service Pack 1
Compiler: Microsoft Visual Studio 2008 (version 9)
OpenCASCADE: 6.7.1 dev, optimized mode
Time: CPU user time, sec

3.2 Results
The table below shows the speedup for real cases from the grid test database, comparing the parallel and serial modes.

No | Test (group grid case)     | Parallel, s | Serial, s | Speedup
---|----------------------------|-------------|-----------|--------
 1 | boolean bcommon_complex C1 | 0.19        | 0.28      | 1.47
 2 | boolean bcut_complex G1    | 0.71        | 2.04      | 2.87
 3 | boolean bcut_complex G2    | 1.56        | 5.38      | 3.45
 4 | boolean bcut_complex G3    | 0.16        | 0.27      | 1.69
 5 | boolean bcut_complex G7    | 0.65        | 2.08      | 3.20
 6 | boolean bcut_complex L6    | 0.83        | 1.65      | 1.99
 7 | boolean bcut_complex M2    | 0.39        | 0.92      | 2.36
 8 | boolean bcut_complex N9    | 1.45        | 3.63      | 2.50
 9 | boolean bcut_complex Q9    | 6.99        | 13.98     | 2.00
10 | boolean bfuse_complex E3   | 0.15        | 0.51      | 3.40
11 | boolean bfuse_complex E6   | 0.52        | 1.14      | 2.19
12 | boolean bfuse_complex F1   | 1.39        | 4.65      | 3.35
13 | boolean bfuse_complex F8   | 0.89        | 2.73      | 3.07
14 | boolean bfuse_complex J6   | 0.17        | 0.52      | 3.06
15 | boolean bfuse_complex K7   | 0.23        | 0.49      | 2.13
16 | boolean bfuse_complex N2   | 0.42        | 1.08      | 2.57
17 | boolean bfuse_complex O9   | 0.56        | 0.97      | 1.73
18 | boolean bfuse_complex P8   | 1.04        | 1.96      | 1.88
19 | boolean bfuse_complex R9   | 0.91        | 2.38      | 2.62
20 | boolean bopcut_complex C7  | 3.07        | 5.78      | 1.88
21 | boolean bopcut_complex N4  | 1.67        | 4.21      | 2.52
22 | boolean bopcut_complex P8  | 1.71        | 4.57      | 2.67
23 | boolean bopsection D4      | 1.13        | 3.32      | 2.94
24 | boolean bsection H8        | 1.18        | 3.54      | 3.00
25 | boolean bsection J7        | 1.51        | 4.09      | 2.71
26 | bugs modalg_1 bug10160_1   | 7.21        | 15.90     | 2.21
27 | bugs modalg_1 bug12257     | 2.16        | 4.60      | 2.13
28 | bugs modalg_2 bug23100     | 12.73       | 22.71     | 1.78
   | Total                      | 51.58       | 115.38    | 2.24

Speedup = Serial / Parallel. The speedup in the Total row is the overall speedup, i.e. Total Serial / Total Parallel (115.38 / 51.58).
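The figures in the table follow simple arithmetic: each per-test speedup is the serial time divided by the parallel time, and the final 2.24 figure is the ratio of the two totals (115.38 / 51.58 ≈ 2.237), not the arithmetic mean of the per-test speedups, which would come out higher (about 2.5) because short tests with large speedups count as much as long ones. A small sketch makes the distinction explicit; the Timing struct and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One row of the table: CPU user time in seconds.
struct Timing
{
  double Parallel;
  double Serial;
};

// Per-test speedup, e.g. 13.98 / 6.99 = 2.00 for bcut_complex Q9.
double Speedup (const Timing& theRun)
{
  assert (theRun.Parallel > 0.0);
  return theRun.Serial / theRun.Parallel;
}

// Ratio of the total times: how the table's final 2.24 is obtained.
double OverallSpeedup (const std::vector<Timing>& theRuns)
{
  double aParallel = 0.0, aSerial = 0.0;
  for (const Timing& aRun : theRuns)
  {
    aParallel += aRun.Parallel;
    aSerial   += aRun.Serial;
  }
  assert (aParallel > 0.0);
  return aSerial / aParallel;
}

// Arithmetic mean of the per-test speedups; every test weighs the same
// regardless of how long it runs.
double MeanSpeedup (const std::vector<Timing>& theRuns)
{
  assert (!theRuns.empty());
  double aSum = 0.0;
  for (const Timing& aRun : theRuns)
  {
    aSum += Speedup (aRun);
  }
  return aSum / static_cast<double> (theRuns.size());
}
```

With just the two rows Q9 (6.99 s / 13.98 s) and E3 (0.15 s / 0.51 s), OverallSpeedup gives about 2.03 while MeanSpeedup gives 2.70, which shows how much the choice of averaging matters when reporting a single number.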

The results show a considerable performance improvement of the parallel version of the BOA compared with the serial version.
The results also demonstrate that the proposed implementation schema is workable and applicable.
Nonetheless, there are cases where performance can still be improved using deeper levels of parallelization.
This is a matter of analysis and improvement for the future.

www.opencascade.com

Copyright 2011-2017
OPEN CASCADE SAS