Fri, 01/08/2016 - 11:06
I have found a strange behaviour of Open CASCADE 6.8.0 (but using the community edition) and I would like to share my measurements with you and ask for your experiences.
My code is doing computations on shells by iterating over their faces and evaluating points and derivatives (D0 and D1). I have been working with a shell created with CATIA Freestyle where these computations were particularly slow. My profiler states that most of the time is spend in the following methods:
If I read the results correctly, almost the whole time is spent in these functions.
I had realized, that the shell has only one face, even though it has a quite complex, curved form. Therefore I have split the shell into more faces by trimming it in CATIA and recombining the results. If the same shell is stored with nine faces instead of one face and read in from a STEP file, the computations are quicker by a factor of four.
Is this something general? Or is this a bug? Have you ever experienced something like this? Do you even know a better workaround than manually splitting shells into more faces? Thank you very much for sharing your knowledge!
Fri, 01/08/2016 - 12:10
I bet you are observing an issue of locality instead of "number of faces". It all depends on the number of spans per each face and how you evaluate values and derivatives on the original face - whether you evict or reuse pre-computed cache. Please refer to this discussion - http://www.opencascade.com/content/bsplclibbohm - that explains the root-cause.
Note that with upcoming 7.0 things can get worse out of the box. If you just use Geom_BSplineSurface::Dx() then performance will drop as no cache will be used at all. You will need to use GeomAdaptor_Surface which will now hold the cache. On the other hand, if/when the multi-cache feature discussed here (http://dev.opencascade.org/index.php?q=node/1138, see end of the thread) is implemented, even a single Face can give a better result than in 6.9.x.
Hope this helps.
Fri, 01/08/2016 - 17:40
thank you very much for the explanation! It helped very much!
Mon, 01/11/2016 - 15:14
I wonder whether it is possible to enlarge the cache to have some speed-up until there is a final solution. Is this possible?
Mon, 01/11/2016 - 20:31
Let me ask my question more precisely: am I right that the current cache only stores the last knot span? Therefore non-locality makes recomputations necessary and it is not easily possible to enlarge the cache somehow to reduce the locality problem. Perhaps thing like that will be possible from OCCT 7.0 on. Is this correct?
Thank you very much!
Thu, 01/14/2016 - 12:53
I would advise you (and any community member interested in the resolution of this issue) to make a personal contribution into this bug by either
a) developing a corresponding source code (via the Development Portal) or
b) by using our support services.
This could radically speed up the processing of this issue.
Tue, 01/12/2016 - 11:23
Currently this is not available in OCC (only the last computed span is cached). The referred discussion on GeomAdaptor is exactly about that. Until that is implemented in OCC you could try to implement this in your own code.
Tue, 01/12/2016 - 10:01
yes, I think you are right. In the cache, only the recently evaluated knot span is stored. There was a discussion about this a couple of weeks ago here:
According to the mysterious Bohm's approach (actually, I think, it is rather Böhm's approach), a B-Spline is essentially a polynomial in every knot span, therefore on the long run it is faster to compute its Taylor expansion knotspan-wise and apply Horner's scheme to evaluate the polynom in the knot span. The Taylor coeffiecents are cached, so that when many points are evaluated in a span, they can be reused. However, when you jump to a new span, the coefficients are thrown away, and they need to be recomputed.
You can try it out yourself: create a B-Spline surface with 2x2 knot spans, and evaluate e.g. 1000000 points in two different ways: a), you evaluate the points randomly: this should take longer, because there will be a lot of jumps between different spans. b) you evaluate 250000 points in the span (0,0), then 250000 in span (0,1) and so on. This will be significantly faster, because the BSplCLib::Bohm will be called much less.
I wrote an example to demonstrate this, I will upload in when I will find some time,