Wed, 10/07/2015 - 14:34

Dear All,

I have been trying to figure out how the following function work in TKMath, in the package for BSplines:

void BSplCLib::Bohm(const Standard_Real U,

const Standard_Integer Degree,

const Standard_Integer N,

Standard_Real& Knots,

const Standard_Integer Dimension,

Standard_Real& Poles)

According to the manual, it uses Böhm's algorithm to compute the value and derivatives up to order N. When I search for this algorithm, I find references to knot insertion. Are the two somehow connected? Is there some kind of a guide that explains what is happening in this algorithm?

I found the paper of W. Böhm from 1983 about "Efficient evaluation of B-Splines", but have so far been unable to identify the similarities...

Thank you for you help!

László

Wed, 10/07/2015 - 15:34

Dear Laszlo,

I suggest you to have a look at the message (if you are registered on development portal) http://tracker.dev.opencascade.org/view.php?id=26308#c42862.

Best regards

Sergey

Wed, 10/07/2015 - 17:59

Dear Sergey,

thank you for the reply. I had a look on the issue that you have linked in Mantis. To be honest, I don't see how this is linked to the Böhm algorithm. Obviously, the author of the issue wants to evaluate a spline surface outside it's parameter range, which is not well supported by the B-Spline theory, I guess. The cited Wikipedia article points mentions the Cox de Boor algorithm, but does not mention the one I asked.

Maybe I tell you more about the problem I want to solve: recently I have discovered that in my application, a lot of time is spent in this one function (BSplCLib::Bohm). Mostly because I have some BSpline surfaces whose value is evaluated a lot. I was tracking down how things are being called there, and I realized that at some point, RebuildCache is called behind the scenes, and this calls Bohm. I would expect that evaluating the value of a BSpline surface involves de Boor or something similar. I just simply don't understand why is Bohm called?

Thanks for your answer!

László

Wed, 10/07/2015 - 18:03

Oh, and if it has to be called, I am completely lost when looking at the function and trying to link it to things I already know about B-Splines. Variables names like "*pole, coef, *tbis, *psav, *psDD, *psDDmDim", for loops with some weird pointer arithmetic makes me feel unsecrue about what is going on there... Could you point me to the paper or the concept which is followed by this function.

Sorry if this is a silly question...

Wed, 10/07/2015 - 19:20

Hello Laszlo,

You may refer to the paper dedicated to performance analysis of NURBS evaluation - please see https://software.intel.com/en-us/articles/applying-vectorization-techniq.... It contains brief explanation of the used algorithms and refer to other papers.

BSplCLib::Bohm() is often a hotspot in many compute-intensive algorithms (projections, intersections, triangulation, etc). The primary root-cause for that is usually poor locality of the points being evaluated along the B-Spline (especially surfaces). You could likely reproduce that effect, if, for instance, you randomly evaluate 1M (u,v) points on a NURBS surface. In this case, the above method will likely be a hotspot. However if you choose a tile-like approach and evaluate the same points in chunks grouped into local groups then the performance profile will likely change as your cache hit ratio will be improved and Bohm() will be called much fewer times.

This effect has exact same nature as cache miss when working with any data in RAM. If you can reuse data in cache then you can substantially speed up your computations comparing to random access in RAM.

Hope this will give you some food for thought.

Good luck!

Roman

Tue, 10/13/2015 - 10:22

Dear Roman,

thank you for your answer! I read your paper about the optimization issues, looks interesting! Are the changes incorporated in newer versions of OpenCASCADE? I use 6.7.0, and as far as I saw, the corresponding PLib functions are still the "old fashioned" versions, with for example colum-wise array access..

Actually, before posting I did exactly the same study that you suggested, with the randomized point evaluation. Bohm really becomes a bottleneck in these cases. I am wondering if I could improve it somehow, so that others could also benefit from it. It seems that the bottleneck right now for me is trying to understand what is happening there. :) That's why I was asking for the paper that explains the theory behind.

Anyways, if I manage to improve something there, I will post the corresponding changes here.

László

Thu, 10/15/2015 - 16:59

Dear Laszlo,

Original modifications were not incorporated due to IP (Intellectual Property) issues - original OCC CLA (Contributor License Agreement) could not be accepted by Intel legal to hand over copyright, a few other options did not work out either, and I left Intel before we could find any other solution. Unfortunately that happens in large companies.

Version 6.9.0 introduced OCC own template-based implementation of the ideas discussed in the paper. See #24285 (http://tracker.dev.opencascade.org/view.php?id=24285).

For any algorithm heavily exploiting B-Spline evaluation, the first effort should be made on improving data locality rather than to rely on Bohm() improvements. Just like you should improve data locality when accessing RAM than expecting that RAM access improves in hardware over time.

Of course, one does not exclude the other but I don't remember if there is anything else significant left in Bohm() itself. If you discover something I would definitively be interested to discuss your research findings.

Good luck!

Roman

Sun, 10/25/2015 - 14:45

Hello all,

after some debugging and call tree investigation, I concluded the following about the caching issue and Bohms algorithm. It would be nice if someone experienced could confirm the following findings:

- Every B-Spline curve and surface is essentially a polynomial inside the knot spans, therefore the cheapest way to evaluate them is to compute their Taylor expansion in the knot span and apply Horners scheme for evaluating the Taylor polynomial. The task of ::Bohm is to gather the Taylor coefficients together in an efficient way.

- When a curve/surface has to be evaluated frequently, the coefficients of the Taylor expansion can be reused to speed up the computation. Therefore, when e.g. BSplineSurface::D0 is called with a pair of parameters (u_1,v_1), a check is done to decide whether they lie in the knot span where the Taylor coefficients have been computed. If it is so, CacheD0 evaluates the polynomial. If it is not, the cache has to be rebuilt for the knot span where (u_1,v_1) is located. This again calls Bohm.

-The first two points imply that it is much cheaper to evaluate a set of parameters on b-Splines knotspan-wise, instead of in a random order. This can be confirmed by a test suggested by Roman, where e.g a million poins are evaluated in a random order vs a million points are evaluated sequentially. According to my own tests, a sequential evalutaion is significantly faster. Obviously, as Roman suggests, many things can be made faster when the sequence of evalutation is taken into account, but the thing is that many high level algorithms (e.g. BRepClass3d_SClassifier, or many Boolean operations) do not allow for such optimizations.

Now comes my question:

Would it be possible that we cache the Taylor coefficients for every knot span? This could be done at the construction time of the b-spline surface once, and then Bohm could be avoided later on by just returning the already cached coefficients.

I think I am gonna try this out and give an update if I have some results on this.

László

Mon, 10/26/2015 - 15:21

Dear László,

Please note that the current implementation of B-Spline cache will be seriously revised in OCCT 7.0.

As you can see in the current OCCT Git master, the cache has been separated from curve and surface classes to dedicated classes, BSplCLib_Cache and BSPLSLib_Cache.

This has been done to reduce memory footprint of data classes, and to avoid data races when B-Splines are evaluated in parallel from several threads.

Thus please consider using current master (rather than previous releases) for your experiments and developments, to be inline with OCCT evolution.

In the current implementation, coefficient for only one knot span of B-Spline are cached (just as before).

Your idea of caching coefficients of all spans is quite reasonable, and we appreciate your intention to try this development.

We suggest that you share your progress and results on the development portal http://dev.opencascade.org/index.php?q=home/forum.

That forum is specifically intended for discussion of OCCT developments.

Best regards

FSR