An example of why Parallel.ForEach on renderer is bad...


Moderator

Joined: Wed Nov 28, 2007
Posts: 1526
Location: Vancouver, British Columbia
Country: Canada (ca)
Posted: Sat Apr 28, 2012 4:24 am
I'm currently experimenting with .NET 4.0 and the Task Parallel Library (TPL). One of my adventures taught me that you should never attempt to parallelize anything that works through a native pointer. A good example of this is the renderer, since it uses OGRE.

[Attachment: Screenshot002.png, "Parallel.ForEach Renderer.dll results..."]

The things parallelism is good for seem to be limited to purely managed functions. One part of the game engine that is mostly written in managed code is the GUI presentation system. So if you are looping through controls and have lots of drawing functions, TPL will be a godsend, because you can do things like this:

BEFORE:
            // close all windows
            foreach (Control control in this.controlManager.Controls)
            {
                control.SetShouldDetach();
            }


AFTER:
            // close all windows
            Parallel.ForEach(this.controlManager.Controls, control => control.SetShouldDetach());


The code looks similar and stays readable, and best of all it tells the task scheduler built into .NET 4.0 and Mono 2.10.x that it may spread this job across additional worker threads if the main one becomes cluttered. The task scheduler already balances queued work automatically, but with a little of this 'hand-holding' it is possible to steer it toward optimizing the things we care more about.

More experiments are needed, but I thought my findings so far were worth mentioning. There is good potential for serious speed increases using TPL that will work across Windows, Mac and Linux.

_________________
I'm NOT paid or work for NeoAxis. I only help moderate these boards. My opinions are not the opinions of NeoAxis Group LTD.
 

Joined: Tue Nov 01, 2011
Posts: 59
Country: Cuba (cu)
Posted: Wed May 02, 2012 9:51 pm
Maxwolf wrote:
Things that parallel are good for seem to be limited to purely managed functions.


Um, I don't agree. Anyway, I will check later. I think you found a special case where running things in parallel behaves incorrectly.

First, native C++ on Windows also has an equivalent of the C# TPL. Second, running things in a Parallel.ForEach call does not guarantee that they will execute in parallel, nor that they will perform better. It all depends on how the processors are currently scheduled and on the code you are trying to run in parallel. To be sure, you always need to test whether it is viable. And be careful, because running code that is not thread-safe in parallel can have ugly consequences.
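
To make the "ugly consequences" concrete, here is a minimal sketch that is not taken from the engine. `CounterDemo` and its methods are made-up names for illustration. The unsafe version may lose updates because `count++` is a read-modify-write; the safe version uses `Interlocked.Increment`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Not thread-safe: "count++" reads, adds and writes back, so parallel
    // iterations can overwrite each other's updates.
    public static int UnsafeCount(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, i => { count++; });
        return count; // may come out lower than 'iterations'
    }

    // Thread-safe: Interlocked.Increment performs the update atomically.
    public static int SafeCount(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref count); });
        return count; // equals 'iterations'
    }

    public static void Main()
    {
        const int n = 1000000;
        Console.WriteLine("unsafe: " + UnsafeCount(n));
        Console.WriteLine("safe:   " + SafeCount(n));
    }
}
```

Whether the unsafe count actually comes out short depends on the machine and the run, which is exactly why this kind of bug tends to show up late.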

Maxwolf wrote:
There is good potential for serious speed increases using TPL that will work across Windows, Mac and Linux.


I agree here.

 
Moderator

Joined: Wed Nov 28, 2007
Posts: 1526
Location: Vancouver, British Columbia
Country: Canada (ca)
Posted: Wed May 02, 2012 9:58 pm
@YandyZM I agree my methods are not the most sophisticated; I'm experimenting :)

There is also Parallel.Invoke, which runs a fixed set of actions and only returns once all of them have finished (it does not guarantee the order in which they run), but I agree with you that much planning is required!
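
For reference, a minimal sketch of Parallel.Invoke with made-up names (`InvokeDemo`, `Run`). As far as I know it makes no promise about the order in which the actions run, only that all of them have finished when the call returns, so any ordering has to come from putting dependent work after the call.

```csharp
using System;
using System.Threading.Tasks;

public static class InvokeDemo
{
    // Hypothetical two-stage update: stage 1 fills two independent arrays,
    // stage 2 combines them and therefore must run after stage 1.
    public static int[] Run(int size)
    {
        int[] a = new int[size];
        int[] b = new int[size];
        int[] sum = new int[size];

        // Stage 1: the two actions may run in either order or at the same time.
        Parallel.Invoke(
            () => { for (int i = 0; i < size; i++) a[i] = i; },
            () => { for (int i = 0; i < size; i++) b[i] = 2 * i; });

        // Parallel.Invoke has returned, so both arrays are complete here.
        for (int i = 0; i < size; i++)
            sum[i] = a[i] + b[i];

        return sum;
    }

    public static void Main()
    {
        int[] result = Run(5);
        Console.WriteLine(string.Join(", ", result)); // prints "0, 3, 6, 9, 12"
    }
}
```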


Joined: Tue Nov 01, 2011
Posts: 59
Country: Cuba (cu)
Posted: Wed May 02, 2012 10:03 pm
OK. Anyway, I'm happy to see people writing code that can make effective use of today's multicore CPUs.

 
Source License

Joined: Sun Sep 14, 2008
Posts: 689
Country: Norway (no)
Posted: Thu May 03, 2012 12:12 am
I'm also stumbling in the dark trying to gain some advantage with the Parallel approach.

The thing is that a huge chunk of the core calls end up in a low-level call to Ogre. Keeping the threads from bumping into each other is very difficult. At times I thought I had hit the nail on the head and got some major part running in parallel, but then suddenly, 3 minutes into the simulation, the fun is over: threads bump heads in the same memory space... At other times I successfully fork something into parallel execution, but then it tends to be some minor part that is sufficiently isolated and doesn't give any advantage. Very often there is actually a disadvantage, since the TPL has considerable initialization overhead.

In my project the heaviest CPU consumer outside of Ogre's main rendering is MapObject transformations. Notably, nearly all of them end up in the method OnSetTransform(...). Here is a snippet of some testing I've done with this... (hope Ivan doesn't mind me posting this...)
protected virtual void OnSetTransform( ref Vec3 pos, ref Quat rot, ref Vec3 scl )
{
    //Added by John
    /*if ( !TPLTasksMapSystem.OnSetTransformQueue.ContainsKey( this ) )
    {
        TPLTasksMapSystem.DataFormTransform data;
        data.pos = pos;
        data.rot = rot;
        data.scl = scl;

        TPLTasksMapSystem.OnSetTransformQueue.Add( this, data );
    }*/
    //End Added

    if( lastTickTime != Entities.Instance.TickTime )
        OnUpdateOldTransform();

    position = pos;
    rotation = rot;
    scale = scl;

    //foreach( MapObjectAttachedObject attachedObject in attachedObjects )
    //    attachedObject.OnSetTransform(); //Disabled by John

    for ( int i = 0; i < attachedObjects.Length; i++ )
        attachedObjects[ i ].OnSetTransform(); //Added by John

    CalculateMapBounds();

    if( allowAddToMapNodes )
        Map.Instance.UpdateObjectIntoNodes( this );

    if( TransformChange != null )
        TransformChange( this );
}

//Added by John
/*public void OnSetTransformFromTPL( Vec3 pos, Quat rot, Vec3 scl )
{
    if ( lastTickTime != Entities.Instance.TickTime )
        OnUpdateOldTransform();

    position = pos;
    rotation = rot;
    scale = scl;

    if ( !TPLTasksMapSystem.OnSetTransformAttachedObjectsQueue.ContainsKey( this ) )
        TPLTasksMapSystem.OnSetTransformAttachedObjectsQueue.Add( this, attachedObjects );

    CalculateMapBounds();

    if ( allowAddToMapNodes && !TPLTasksMapSystem.UpdateObjectsIntoNodes.ContainsKey( this ) )
        TPLTasksMapSystem.UpdateObjectsIntoNodes.Add( this, this );

    if ( TransformChange != null )
        TransformChange( this );
}*/
//End Added


Here I'm working on a backend TPL class. The main idea was to queue the calls and then execute them in parallel, thus preventing memory corruption. This actually worked from a stability point of view. Unfortunately I didn't see much performance advantage, and what's worse (and kind of expected), the random execution order in parallel causes "transformation" artifacts.
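
One possible way to keep the order deterministic, sketched under assumptions and not taken from the TPLTasksMapSystem class above: do only the pure math in parallel, write each result to its own slot, and apply the results on one thread in queue order. `TwoPhaseDemo`, `Request`, `Compute` and `Process` are hypothetical stand-ins, not engine API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TwoPhaseDemo
{
    // Hypothetical stand-in for a queued transform request.
    public struct Request
    {
        public int ObjectId;
        public double X;
    }

    // Phase 1 work: pure math with no shared state, so it is safe in parallel.
    static double Compute(Request r)
    {
        return Math.Sqrt(r.X) * 2.0;
    }

    // Returns the object ids in the order their results were applied.
    public static List<int> Process(Request[] queue, double[] world)
    {
        double[] results = new double[queue.Length];

        // Phase 1: each iteration writes only to its own slot in 'results'.
        Parallel.For(0, queue.Length, i => { results[i] = Compute(queue[i]); });

        // Phase 2: apply on the calling thread, strictly in queue order.
        List<int> applied = new List<int>();
        for (int i = 0; i < queue.Length; i++)
        {
            world[queue[i].ObjectId] = results[i];
            applied.Add(queue[i].ObjectId);
        }
        return applied;
    }

    public static void Main()
    {
        Request[] queue = new Request[3];
        queue[0] = new Request { ObjectId = 2, X = 4.0 };
        queue[1] = new Request { ObjectId = 0, X = 9.0 };
        queue[2] = new Request { ObjectId = 2, X = 16.0 }; // later request for object 2 wins
        double[] world = new double[3];
        List<int> order = Process(queue, world);
        Console.WriteLine(string.Join(", ", order));
        Console.WriteLine(world[2]);
    }
}
```

The catch is the same one described above: this only pays off if the parallel phase is heavy enough to outweigh the cost of starting it, and the apply phase is still single-threaded.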

I fully admit I don't understand this properly yet and thus probably miss the better opportunities, but the hope of finding that silver bullet is fading...

 
Administrator

Joined: Wed Oct 11, 2006
Posts: 4908
Location: Kazan
Country: Russia (ru)
Posted: Thu May 03, 2012 1:31 am
Entity System methods are not designed for multithreading. Use multithreading at your own risk :)

_________________
Ivan Efimov.
Founder of NeoAxis Group Ltd.
News on NeoAxis Blogs, Twitter, Google+, Facebook, VK
 
Indie License

Joined: Sat May 22, 2010
Posts: 403
Location: London, UK
Country: United Kingdom (uk)
Posted: Thu May 03, 2012 8:26 am
@Ivan: are there any future plans to use TPL inside the engine?

_________________
Currently working on a first person space rpg - Test Server for Demo SDK 1.32 (Village Map): 199.195.214.95
 
Administrator

Joined: Wed Oct 11, 2006
Posts: 4908
Location: Kazan
Country: Russia (ru)
Posted: Thu May 03, 2012 10:45 am
Multithreading for the Entity System and Map System is impossible. If we made all methods of the libraries thread-safe, they would run slowly.
It is a good idea to add multithreaded features to the rendering and physics components. These components take the most CPU time.
For entity classes it is possible to make multithreaded components, such as a path finding system, which can be calculated separately from the world.
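
A rough sketch of what "calculated separately from the world" could look like for path finding. It assumes a simple grid and a breadth-first search; `PathDemo`, `ShortestSteps` and `RequestPath` are made-up names, not NeoAxis API. The task works on a copy of the grid, so it never touches data the world may change during a tick.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PathDemo
{
    // Breadth-first search over a grid; true cells are walls.
    // Returns the number of steps from start to goal, or -1 if unreachable.
    public static int ShortestSteps(bool[,] walls, int sx, int sy, int gx, int gy)
    {
        int w = walls.GetLength(0), h = walls.GetLength(1);
        int[,] dist = new int[w, h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                dist[x, y] = -1;

        Queue<int[]> open = new Queue<int[]>();
        dist[sx, sy] = 0;
        open.Enqueue(new[] { sx, sy });
        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };

        while (open.Count > 0)
        {
            int[] c = open.Dequeue();
            if (c[0] == gx && c[1] == gy)
                return dist[gx, gy];
            for (int d = 0; d < 4; d++)
            {
                int nx = c[0] + dx[d], ny = c[1] + dy[d];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                if (walls[nx, ny] || dist[nx, ny] != -1) continue;
                dist[nx, ny] = dist[c[0], c[1]] + 1;
                open.Enqueue(new[] { nx, ny });
            }
        }
        return -1;
    }

    // Copies the grid so the background task only reads its own snapshot.
    public static Task<int> RequestPath(bool[,] walls, int sx, int sy, int gx, int gy)
    {
        bool[,] snapshot = (bool[,])walls.Clone();
        return Task.Factory.StartNew(() => ShortestSteps(snapshot, sx, sy, gx, gy));
    }

    public static void Main()
    {
        bool[,] walls = new bool[3, 3];
        walls[1, 0] = true;
        walls[1, 1] = true; // wall with a gap at (1, 2)
        Task<int> pending = RequestPath(walls, 0, 0, 2, 0);
        // A game loop would keep ticking here and poll pending.IsCompleted.
        Console.WriteLine(pending.Result); // prints 6
    }
}
```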

MaxWolf's way is the wrong way. It can't add significant benefits, and it will also add potential bugs.

Indie License

Joined: Sat May 22, 2010
Posts: 403
Location: London, UK
Country: United Kingdom (uk)
Posted: Thu May 03, 2012 11:37 am
OK, good news. From what I saw, using this method produced huge benefits in my project. I will post a demo.

Also, with Maxwolf's example here, he was trying to show that this is a bad use of multithreading with TPL... In other areas it has helped me hugely.

Administrator

Joined: Wed Oct 11, 2006
Posts: 4908
Location: Kazan
Country: Russia (ru)
Posted: Thu May 03, 2012 11:44 am
Quote:
also with Maxwolfs example here, he was trying to show that this is a bad use of multi threading tpl... in other areas it has helped me hugely,

Crazy scientist in action! :)

Source License

Joined: Sun Sep 14, 2008
Posts: 689
Country: Norway (no)
Posted: Thu May 03, 2012 3:09 pm
@Nethrix
Quote:
From what i saw it using this method produced huge benefits to my project. i will post a demo.


Interesting. Is this for methods executed at OnTick speed or faster? I mean, with methods like "CalculateProceduralWorld(...)" obviously we get huge benefits, but with smaller methods running often I cannot find much improvement...

 

Joined: Tue Nov 01, 2011
Posts: 59
Country: Cuba (cu)
Posted: Thu May 03, 2012 8:41 pm
betauser wrote:
If we will make all methods of libraries mutithreaded, then this will works slow.


That's right. Sometimes it is useful to document the concurrency strategy of the API methods, e.g.
- Sequential: must not be called from different threads, or you can expect incorrect behavior
- Concurrent: can be called effectively from simultaneous threads because they were designed for that
- Guarded: they are thread-safe, but only one thread can execute them at a time; the rest will be blocked
What betauser means is that adding locks to some methods for thread safety is not going to improve performance. On the contrary, it will make the whole engine slower, because sequential calls will now carry the unnecessary overhead of locks. And if you don't need multithreading, you shouldn't have to pay for it. Good candidates for guarded methods are those that are called very infrequently, like a Connect function. If the majority of the functions of a class are concurrent, then adding locks to the rest, provided they are called infrequently, will make the whole class thread-safe.
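
The three strategies above can be sketched side by side. This is only an illustration with invented class names (`SequentialStats`, `GuardedStats`, `ConcurrentStats`); nothing here comes from the engine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sequential: no synchronization, cheapest, for single-threaded callers only.
public class SequentialStats
{
    private int hits;
    public void Record() { hits++; }
    public int Hits { get { return hits; } }
}

// Guarded: thread-safe, but callers wait their turn behind one lock.
public class GuardedStats
{
    private readonly object gate = new object();
    private int hits;
    public void Record() { lock (gate) { hits++; } }
    public int Hits { get { lock (gate) { return hits; } } }
}

// Concurrent: built on a collection designed for simultaneous callers.
public class ConcurrentStats
{
    private readonly ConcurrentDictionary<string, int> hits =
        new ConcurrentDictionary<string, int>();

    public void Record(string key)
    {
        hits.AddOrUpdate(key, 1, (k, old) => old + 1);
    }

    public int HitsFor(string key)
    {
        int value;
        return hits.TryGetValue(key, out value) ? value : 0;
    }
}

public static class StrategyDemo
{
    public static void Main()
    {
        GuardedStats guarded = new GuardedStats();
        ConcurrentStats concurrent = new ConcurrentStats();
        Parallel.For(0, 10000, i =>
        {
            guarded.Record();
            concurrent.Record(i % 2 == 0 ? "even" : "odd");
        });
        Console.WriteLine(guarded.Hits);               // prints 10000
        Console.WriteLine(concurrent.HitsFor("even")); // prints 5000
    }
}
```

The sequential class is deliberately left out of the parallel loop: calling it from there would be exactly the misuse the first bullet warns about.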

betauser wrote:
Good idea to add multithreaded features for rendering...


On the other hand, there are things that can be done to parallelize some tasks internally or to create concurrent methods. One of them, as betauser points out, is rendering, and also physics, though those are more complicated. Another is background loading of adjacent regions in parallel while the user moves through a large world, together with unloading of unused regions. This gives the user the impression that a very big world is loaded in memory when in truth only a small region is, and it avoids the uncomfortable loading screens. Many other internal algorithms can also be parallelized, including external libraries such as network and file access. These are only some ideas. The sum of these things will make for a better engine.
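
As a sketch of the background loading idea, assuming a world split into square regions addressed by integer coordinates. `RegionStreamer`, `LoadRegion` and `OnPlayerMoved` are hypothetical names, and the loader just fabricates two bytes where a real one would read from disk.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class RegionStreamer
{
    // Region key -> load task. A finished task doubles as the cached data.
    private readonly ConcurrentDictionary<string, Task<byte[]>> regions =
        new ConcurrentDictionary<string, Task<byte[]>>();

    // Hypothetical loader; a real one would read from disk or the network.
    private static byte[] LoadRegion(int x, int y)
    {
        return new byte[] { unchecked((byte)x), unchecked((byte)y) };
    }

    private static string Key(int x, int y) { return x + "," + y; }

    // Call from the main thread: starts loading the 3x3 block around the
    // player in the background and evicts every region outside that block.
    public void OnPlayerMoved(int px, int py)
    {
        HashSet<string> wanted = new HashSet<string>();
        for (int x = px - 1; x <= px + 1; x++)
        {
            for (int y = py - 1; y <= py + 1; y++)
            {
                int cx = x, cy = y; // copies captured by the lambda below
                wanted.Add(Key(cx, cy));
                regions.GetOrAdd(Key(cx, cy),
                    k => Task.Factory.StartNew(() => LoadRegion(cx, cy)));
            }
        }

        // Keys returns a snapshot, so removing while iterating is safe.
        foreach (string key in regions.Keys)
        {
            if (!wanted.Contains(key))
            {
                Task<byte[]> dropped;
                regions.TryRemove(key, out dropped);
            }
        }
    }

    public int LoadedCount { get { return regions.Count; } }

    // Blocks until the region is ready; returns null if it is not tracked.
    public byte[] Get(int x, int y)
    {
        Task<byte[]> task;
        return regions.TryGetValue(Key(x, y), out task) ? task.Result : null;
    }
}

public static class StreamDemo
{
    public static void Main()
    {
        RegionStreamer streamer = new RegionStreamer();
        streamer.OnPlayerMoved(0, 0);
        streamer.OnPlayerMoved(5, 5);
        Console.WriteLine(streamer.LoadedCount); // prints 9
    }
}
```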

 
Moderator

Joined: Wed Nov 28, 2007
Posts: 1526
Location: Vancouver, British Columbia
Country: Canada (ca)
Posted: Thu May 03, 2012 9:17 pm
Smaller methods would offer no speed increase; in fact, the smaller the method, the more the added overhead matters. Ivan is correct and so is Goto10. I did not post this as some silver bullet solution to make all your projects go faster, but more as a general "this is what I am up to" in case others are interested in messing around with it as well.

Personally I use TPL in my archive system. Our game projects don't use ZipArchive.dll, because on Mono it takes forever to seek in the archives. Instead we use a huge binary blob of data and just stream what we need from it. Using TPL on this system helped a great deal.


Joined: Tue Nov 01, 2011
Posts: 59
Country: Cuba (cu)
Posted: Thu May 03, 2012 9:54 pm
Sorry again,

A concurrent method, no matter its size, even a single line instruction that can be executed in parallel utilizing available hardware is a great plus.

Remember, this is not about the size, but about how often the operation takes place.

 
Source License

Joined: Sun Sep 14, 2008
Posts: 689
Country: Norway (no)
Posted: Fri May 04, 2012 12:31 am
Quote:
A concurrent method, no matter its size, even a single line instruction that can be executed in parallel utilizing available hardware is a great plus.


By definition that sounds correct, but the emphasis should be on whether you are able to get more work done per unit of time. For example, a short loop with little content that gets called very often usually performs better single-threaded, even if you throw an unlimited number of cores at it using TPL methods. Reason: in the case of foreach vs Parallel.ForEach, the latter can cost more cycles just to initialize than the former needs to complete, and you will simply never catch up.
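
Anyone who wants to check this on their own machine could start from something like the sketch below. The collection size and call count are arbitrary assumptions, `OverheadDemo` is a made-up name, and I'm deliberately not quoting timings because they vary by hardware and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class OverheadDemo
{
    // Plain loop: no setup cost, runs on the calling thread.
    public static long SumSequential(int[] items)
    {
        long total = 0;
        foreach (int item in items)
            total += item;
        return total;
    }

    // Parallel loop: same result, but every call pays for partitioning,
    // scheduling and an atomic add per element.
    public static long SumParallel(int[] items)
    {
        long total = 0;
        Parallel.ForEach(items, item => { Interlocked.Add(ref total, item); });
        return total;
    }

    public static void Main()
    {
        int[] items = new int[16]; // deliberately tiny workload
        for (int i = 0; i < items.Length; i++)
            items[i] = i;

        const int calls = 100000; // "short loop that gets called very often"

        Stopwatch watch = Stopwatch.StartNew();
        for (int c = 0; c < calls; c++)
            SumSequential(items);
        long sequentialMs = watch.ElapsedMilliseconds;

        watch.Restart();
        for (int c = 0; c < calls; c++)
            SumParallel(items);
        long parallelMs = watch.ElapsedMilliseconds;

        Console.WriteLine("foreach:          " + sequentialMs + " ms");
        Console.WriteLine("Parallel.ForEach: " + parallelMs + " ms");
    }
}
```

Raising the array size or making the per-item work heavier is the interesting experiment: at some point the parallel version should start to win, and where that happens is the number worth knowing for your own project.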

 