Using Grand Central Dispatch where available

Istvan Csanady's picture

Apple released GCD two years ago. GCD is basically a parallel programming toolkit with very efficient, lightweight asynchronous tools such as dispatch queues, semaphores and asynchronous I/O. Here is the reference:
https://developer.apple.com/library/Mac/DOCUMENTATION/Performance/Refere...

I have created a patch that uses dispatch_semaphore_t instead of pthread_mutex_t in Standard_Mutex, and it seems to give some performance improvement over the pthread_mutex implementation. I have attached the patch for further examination.
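As a rough illustration of the idea (this is not the attached patch), a binary dispatch semaphore can stand in for a mutex roughly as shown below. The names SketchMutex and incrementGuarded are hypothetical and are not part of OCCT. The non-Apple branch simply falls back to pthread_mutex_t so that the sketch builds elsewhere too. It assumes plain C++, where dispatch_release has to be called by hand.

```cpp
#include <cassert>
#include <mutex>  // std::lock_guard only
#ifdef __APPLE__
#include <dispatch/dispatch.h>
#else
#include <pthread.h>
#endif

// Hypothetical class for illustration; not the real Standard_Mutex.
class SketchMutex {
public:
  SketchMutex() {
#ifdef __APPLE__
    // A semaphore with an initial value of 1 behaves like an unlocked mutex.
    sem_ = dispatch_semaphore_create(1);
    assert(sem_ != nullptr);
#else
    const int rc = pthread_mutex_init(&mutex_, nullptr);
    assert(rc == 0);
    (void)rc;
#endif
  }

  ~SketchMutex() {
#ifdef __APPLE__
    // The semaphore must be back at its initial value (unlocked) here.
    dispatch_release(sem_);
#else
    pthread_mutex_destroy(&mutex_);
#endif
  }

  SketchMutex(const SketchMutex&) = delete;
  SketchMutex& operator=(const SketchMutex&) = delete;

  void lock() {
#ifdef __APPLE__
    dispatch_semaphore_wait(sem_, DISPATCH_TIME_FOREVER);
#else
    pthread_mutex_lock(&mutex_);
#endif
  }

  void unlock() {
#ifdef __APPLE__
    dispatch_semaphore_signal(sem_);
#else
    pthread_mutex_unlock(&mutex_);
#endif
  }

private:
#ifdef __APPLE__
  dispatch_semaphore_t sem_;
#else
  pthread_mutex_t mutex_;
#endif
};

// Hypothetical helper: increments a counter under the lock.
inline void incrementGuarded(SketchMutex& mutex, int& counter) {
  std::lock_guard<SketchMutex> guard(mutex);
  ++counter;
}
```

One caveat with this approach: a binary semaphore is not reentrant, so a thread that locks it twice blocks forever. If the mutex being replaced is recursive, this sketch would not be a drop-in replacement.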

Another place where GCD can be useful is as a replacement for Intel TBB's parallel_for_each. An equivalent is easy to implement on top of dispatch queues, which would reduce OCCT's third-party dependencies. I have also attached a simple implementation. It uses std::function, which is a C++11 feature, but Apple has supported C++11 since 10.7 and iOS 5.0, so this should not be a problem. Otherwise, it can be rewritten to work just like TBB's parallel_for_each.
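To make the idea concrete, here is a minimal sketch of a parallel_for_each built on dispatch_apply_f. It is an illustration written under my own assumptions, not the attached implementation: the names parallelForEach, runIndexed and trampoline are hypothetical, it requires random-access iterators, and outside Apple platforms it just runs a sequential loop where a real build would presumably call TBB.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#ifdef __APPLE__
#include <dispatch/dispatch.h>
#endif

namespace sketch {

using IndexBody = std::function<void(std::size_t)>;

// Callback with the shape dispatch_apply_f expects: (context, iteration index).
inline void trampoline(void* context, std::size_t index) {
  const IndexBody* body = static_cast<const IndexBody*>(context);
  (*body)(index);
}

// Calls body(0) ... body(count - 1); concurrently on Apple platforms.
inline void runIndexed(std::size_t count, const IndexBody& body) {
  void* context = const_cast<IndexBody*>(&body);
#ifdef __APPLE__
  // dispatch_apply_f returns only after every iteration has finished.
  dispatch_apply_f(count,
                   dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   context, &trampoline);
#else
  // Assumption: sequential fallback so the sketch stays portable.
  for (std::size_t i = 0; i < count; ++i) {
    trampoline(context, i);
  }
#endif
}

// Applies func to every element of [first, last).
// func must be safe to call concurrently on different elements.
template <typename RandomIt, typename Func>
void parallelForEach(RandomIt first, RandomIt last, Func func) {
  typedef typename std::iterator_traits<RandomIt>::difference_type Diff;
  const Diff distance = std::distance(first, last);
  if (distance <= 0) {
    return;
  }
  const IndexBody body = [&first, &func](std::size_t i) {
    func(*(first + static_cast<Diff>(i)));
  };
  runIndexed(static_cast<std::size_t>(distance), body);
}

}  // namespace sketch
```

A call would look like sketch::parallelForEach(shapes.begin(), shapes.end(), process), where shapes and process are placeholders. The function-pointer variant dispatch_apply_f is used instead of blocks so that the code does not depend on the blocks language extension.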

These should not be considered tested and reviewed code. They are rather ideas and suggestions for using this state-of-the-art technology where it is available.

PS: I could not attach the file, because only "jpg jpeg gif png txt doc xls pdf ppt pps odt ods odp" files are allowed (why?), so here it is:
https://www.cubby.com/pl/OCCT/_917a934c23ef42b0a1de1adedd637747

tpaviot's picture

Re: Using Grand Central Dispatch where available

Hi Istvan,

This looks very interesting. Feel free to submit this experimental work to OCE as well; we have a few OS X users who might be helpful.

Thomas

Istvan Csanady's picture

Parallelization of FillDS part of BO

I have noticed that the mentioned project is going well, but I have a remark on the flexible_for construction. Since Intel TBB is already used directly (without any kind of wrapping) in the Mesh framework, it would be worth moving the flexible_for and flexible_range constructions to the TKernel framework and using them in the Mesh framework just as they are now used in the BO framework. One advantage is that it would be much easier to provide an alternative parallel_for(_each) implementation, as I am doing now. I have successfully implemented flexible_for and flexible_range using Apple GCD, and it works great. It would be even better if I only had to maintain these Intel TBB alternatives in a single place, especially since you seem to be working hard on multithreading features in OCC, so we can expect much more parallelisation code in OCC.
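To show what such a single wrapper could look like, here is a small sketch of a range-based parallel loop with a swappable backend. It is not the actual flexible_for or flexible_range code, whose interfaces are not shown in this thread: IndexRange, RangeBody, runChunk and parallelFor are hypothetical names, loosely modelled on tbb::parallel_for with a blocked range. The GCD backend is used on Apple platforms, and a sequential loop marks the place where the TBB call would go elsewhere.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#ifdef __APPLE__
#include <dispatch/dispatch.h>
#endif

namespace sketch {

// Half-open index range [begin, end).
struct IndexRange {
  std::size_t begin;
  std::size_t end;
};

// The loop body receives one sub-range at a time.
using RangeBody = std::function<void(const IndexRange&)>;

namespace detail {

struct ChunkJob {
  IndexRange whole;
  std::size_t grain;
  const RangeBody* body;
};

// Runs one chunk of the job; has the callback shape dispatch_apply_f expects.
inline void runChunk(void* context, std::size_t chunk) {
  const ChunkJob* job = static_cast<const ChunkJob*>(context);
  const std::size_t lo = job->whole.begin + chunk * job->grain;
  const std::size_t hi = std::min(lo + job->grain, job->whole.end);
  (*job->body)(IndexRange{lo, hi});
}

}  // namespace detail

// Splits `range` into chunks of at most `grain` indices and runs `body` on each.
// Callers see only this function; the backend is chosen at compile time.
inline void parallelFor(const IndexRange& range, std::size_t grain,
                        const RangeBody& body) {
  if (range.end <= range.begin) {
    return;
  }
  if (grain == 0) {
    grain = 1;
  }
  const std::size_t count = range.end - range.begin;
  const std::size_t chunks = (count + grain - 1) / grain;
  detail::ChunkJob job{range, grain, &body};
#ifdef __APPLE__
  // GCD backend: blocks until all chunks have completed.
  dispatch_apply_f(chunks,
                   dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   &job, &detail::runChunk);
#else
  // Assumption: placeholder backend; a TBB build would dispatch the chunks here.
  for (std::size_t c = 0; c < chunks; ++c) {
    detail::runChunk(&job, c);
  }
#endif
}

}  // namespace sketch
```

With a facade like this, the Mesh and BO code would call parallelFor only, and each alternative to TBB would live in one translation unit.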

PS: Of course I will publish this GCD-based implementation as soon as possible.

abv's picture

Good idea

Hello Istvan,

Thank you for your comments and ideas, and sorry we have not commented on your previous message.

I agree it is a very good idea to have a wrapper over the tools we use for parallelization, so that we can support different implementations and are not bound to TBB. Feel free to register an issue in Mantis and, as soon as you have something at hand, to submit your changes to Git for review.

Regarding use of GCD on Mac, do you really find it anyhow better for mutexes than pthreads? I guess both libraries should be equally readily available on Mac OS X and iOS, and difference in performance should be negligible, at least for simple cases we have in OCCT now. If you have real experience of your code running faster with GCD-based mutexes than with pthreads ones, or if it is easier to build with GCD, please share some details.

Andrey

Istvan Csanady's picture

"Regarding use of GCD on Mac, do you really find it anyhow better for mutexes than pthreads? I guess both libraries should be equally readily available on Mac OS X and iOS, and difference in performance should be negligible, at least for simple cases we have in OCCT now. "

Yes, since OCCT usually does not require more than 2-3 threads, the difference is negligible. However, GCD multithreading can be orders of magnitude faster when the application requires a lot of threads (considering only the overhead added by synchronisation). But this is not the case in OCCT.
But I think the GCD-based parallel_for_each implementation does deserve attention. It is native, requires no third-party dependencies, and is very efficient. It has already been integrated into OCE. I will submit the parallel_for implementation as well.

Istvan Csanady's picture

I have created the ticket:
