The purpose of this article is to discuss how to incorporate GPU accelerated effects into a project without negatively impacting the overall performance and framerate of a game.


The key to making this work is to run all visual effects independently of the main game thread and rendering pipeline.  While the concept is simple, as always, the devil is in the details and there are a lot of things to consider to make this work properly.

This presentation will be from the perspective of a developer planning to use the PhysX 3.x SDK and optionally the APEX SDK as well.

The following is a short video clip showing GPU particle effects running inside of the game ‘Planetside 2’ from Sony Online Entertainment.  This video clip demonstrates all of the techniques discussed in this article working in a current generation MMO game engine.

Most game integrations create a single physics scene (PxScene) where the bulk of the physics and collision detection processing occurs.  This scene typically contains the world representation and is stepped each frame by the amount of time which has passed since the last one.  A game engine will usually enforce fairly strict requirements on how the simulation is stepped.  Consider the example where an application uses a fixed time step of 60hz but the current frame rate is 30hz.  In this case the physics will be simulated twice during each render frame to make sure that the physical simulation is caught up with ‘real time’.  Many games today feature a multiplayer component, so keeping the simulation time extremely close to actual physical time is especially important to prevent the various clients from diverging appreciably.  For games which experience highly varying frame rates, such as an MMO, keeping all of the connected players resolved within the same frame of reference becomes even more critical.
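As a concrete point of reference, the fixed-step example above boils down to a simple accumulator loop.  The sketch below is illustrative only; the function structure is application code, while ‘simulate’ and ‘fetchResults’ are the standard PxScene calls:

// Sketch of a fixed-time-step update loop for the primary scene.
// 'scene' is the primary PxScene; 'elapsedSeconds' is the real time since the last frame.
void stepPrimaryScene(physx::PxScene &scene, physx::PxF32 elapsedSeconds)
{
    const physx::PxF32 fixedStep = 1.0f / 60.0f;   // 60hz fixed time step
    static physx::PxF32 accumulator = 0.0f;

    accumulator += elapsedSeconds;
    while (accumulator >= fixedStep)               // at a 30hz frame rate this loop runs twice
    {
        scene.simulate(fixedStep);
        scene.fetchResults(true);                  // blocking fetch, kept simple for illustration
        accumulator -= fixedStep;
    }
}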
 

The Well of Despair

Care has to be taken to avoid the ‘well of despair’: a state where more CPU is consumed to keep up with ‘real-time’, which in turn causes even more delay to be introduced.  This can continue up to the point where the simulation can no longer ‘keep up’ with real-time.  This creates a kind of death spiral where more and more time is spent simply running the simulation in sub-steps trying to catch up to the real-world clock.  Eventually something has to give, so tuning the amount of CPU consumed by the physics simulation, even during low frame-rate situations, is a challenge many game designs have to take into account.  Using continuous collision detection and a variable time step is often a practical way to help deal with this problem.  Some game engines use this approach, some do not, and some use a hybrid solution which incorporates a little bit of both within certain heuristic limits.
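One common mitigation, sketched here as a variation of the accumulator loop shown earlier, is to cap the number of sub-steps per frame and discard the remaining time debt rather than chase the real-world clock indefinitely.  The cap of four is an arbitrary example value:

// Variant of the fixed-step loop which caps the number of sub-steps per frame.
// Capping trades a small amount of temporal accuracy for a bounded per-frame cost.
void stepPrimarySceneCapped(physx::PxScene &scene, physx::PxF32 elapsedSeconds)
{
    const physx::PxF32 fixedStep   = 1.0f / 60.0f;
    const physx::PxU32 maxSubSteps = 4;            // arbitrary example cap
    static physx::PxF32 accumulator = 0.0f;
    physx::PxU32 subSteps = 0;

    accumulator += elapsedSeconds;
    while (accumulator >= fixedStep && subSteps < maxSubSteps)
    {
        scene.simulate(fixedStep);
        scene.fetchResults(true);
        accumulator -= fixedStep;
        ++subSteps;
    }
    if (subSteps == maxSubSteps)
        accumulator = 0.0f;                        // drop the remaining time debt; accept the drift
}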


The PhysX SDK itself enforces no specific mechanism for stepping the simulation over time; that decision is left up to the application, with each approach having certain advantages and disadvantages that need to be taken into consideration.
 

Simulation Timing Requirements

The first time a game developer wants to add GPU accelerated particle effects to their title, their natural instinct will be to create the particle simulation in the same primary scene as everything else.  This makes sense, since the particles need to interact with the rest of the game environment and the primary scene already has all of that set up.


The problem with this approach is that most games require a specific synchronization point between the primary physics scene and a single logical frame for both game logic and graphics rendering.  This synchronization point is used to allow all game objects in the world to get their graphics transforms updated with their current dynamic state.

The problem with trying to run GPU physics effects in the primary scene is that they will inherit the same timing and synchronization requirements as everything else in that scene.  Let’s say the primary scene uses a 60hz fixed time step and the game is running at 30hz, causing the simulation to run twice in one frame.  Since the GPU effects are in the same scene, they too would be simulated twice in the same frame, which is a use case we would never want to occur.  There should never be more than a single synchronization point per frame with GPU effects and, in fact, the goal is to introduce no explicit synchronization point at all.  Since GPU effects are just that, ‘effects’ which typically exist in the game environment only to enhance a visual element and would never directly affect gameplay, they do not have the same timing requirements as the rest of the dynamic objects in the scene.

What is particularly problematic with this approach is that if the game begins sub-stepping which, in turn, causes the GPU simulation to sub-step, then rather than getting all of that extra GPU work ‘for free’, a potentially massive delay can be introduced while the simulation sits around waiting for multiple sub-steps of GPU work to complete.  This can cause the ‘well of despair’ to rear its ugly head and send the whole simulation into a tailspin trying to ‘catch up’.  These issues are exacerbated in games which have highly varying frame rates.  For games which never drop below 60hz, under any condition, most of these issues are not as big of a concern.  However, for large-scale MMOs where the frame rate can vary dramatically depending on whether there is one dynamic object in the scene or thousands, this can be a real problem.

Another point to consider is that using a hard synchronization point to force CUDA compute work to be completed by the end of each graphics frame can create additional delays. Code which runs fine in an environment with limited graphics may exhibit larger and more variable latency under heavy graphics load.
 

The Mirror Scene Class

The first step in removing these dependencies is to create an additional physics scene (PxScene) where all of the GPU effects simulation takes place.  This scene will not perform any sub-stepping (instead using a variable time step) and should use a minimum and maximum simulation time (for example, no greater than 60hz and no less than 10hz).  While it may be extremely important for your primary scene to ‘keep up with real-time’, that is never the case for the effects scene.  Consider the pathological case where the frame rate has dropped to something horrible like, say, five frames per second.  Rather than passing a simulation time of 1/5th of a second to the effects scene, you might pass 1/15th of a second instead.  The only visual artifact that would arise from this is that the GPU particle effects might briefly appear to be simulating in slow motion; however, if your baseline frame rate is 5fps, you probably have much greater concerns than some slowed-down particle effects.
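In code, that clamping amounts to something like the following sketch, where ‘effectsScene’ and ‘realElapsedSeconds’ stand in for your own engine’s scene pointer and frame timer, and the 10hz/60hz bounds are the example values used above:

// Variable time step for the effects scene, clamped to the example range [1/60s, 1/10s].
// At 5fps the real elapsed time is 0.2s; the clamp caps the step at the chosen maximum,
// so the effects briefly appear to run in slow motion instead of sub-stepping.
physx::PxF32 effectsDt = realElapsedSeconds;
if (effectsDt > 1.0f / 10.0f) effectsDt = 1.0f / 10.0f;
if (effectsDt < 1.0f / 60.0f) effectsDt = 1.0f / 60.0f;
effectsScene->simulate(effectsDt);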


The first problem you will encounter when you create your effects scene is that none of the geometry for your environment will exist within it.  You may be able to create GPU effects, but they will have nothing to collide with until a copy of the world geometry exists in the effects scene as well.  Depending on your game environment you might accomplish this in one of two ways.  First, you might simply insert each physics object you create in the primary scene into the effects scene as well.  Alternatively, you can mirror just a subset of the primary scene into the effects scene.

This article provides an example implementation of a helper class called ‘NxMirrorScene’ which automates the process of mirroring the contents of a primary physics scene into a secondary effects scene.  

To create the ‘NxMirrorScene’ class, you pass the following parameters:
 

NxMirrorScene *createMirrorScene(physx::PxScene &primaryScene,
                                 physx::PxScene &mirrorScene,
                                 NxMirrorScene::MirrorFilter &mirrorFilter,
                                 physx::PxF32 mirrorStaticDistance,
                                 physx::PxF32 mirrorDynamicDistance,
                                 physx::PxF32 mirrorRefreshDistance);


The first two parameters specify the primary physics scene and the effects scene which will be used to mirror a portion of the primary scene.  Next, the application provides a callback interface class called ‘MirrorFilter’.  The MirrorFilter class is used to decide which actors and shapes should or should not be mirrored into the effects scene.  This callback class also provides an opportunity for the application to make modifications to the actors and shapes which are to be mirrored prior to their being added to the effects scene.  Finally, there are three heuristic parameters passed which define the radius to mirror static objects, the radius to mirror dynamic objects and a refresh distance.
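A hypothetical usage sketch is shown below.  ‘MyMirrorFilter’ stands in for whatever implementation of NxMirrorScene::MirrorFilter your application provides (its virtual methods are declared in NxMirrorScene.h and are not shown here), and the distance values are purely illustrative:

// Application-defined filter deciding which actors and shapes get mirrored.
MyMirrorFilter mirrorFilter;

NxMirrorScene *mirror = createMirrorScene(
    *primaryScene,      // the main game scene
    *effectsScene,      // the GPU effects scene
    mirrorFilter,
    50.0f,              // mirror static geometry within 50 units of the camera
    20.0f,              // mirror dynamic actors within 20 units of the camera
    5.0f);              // rebuild the trigger spheres after the camera has moved 5 units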


Each frame, when the primary scene has completed its simulation step, the application calls the method ‘synchronizePrimaryScene’, passing in the current camera/origin location around which objects should be mirrored.  Each time the camera has moved past the user-specified ‘mirrorRefreshDistance’, the mirror scene class will update two trigger sphere shapes.  Whenever objects enter or leave these spheres (which reflect the radii used to mirror static and dynamic actors), the mirror scene class will be notified via the trigger event callback and will create events to mirror those objects into the effects scene.  This is all done in a thread-safe way without creating any dependencies between the two scenes.  In addition to updating the trigger shapes and processing trigger events, the call to ‘synchronizePrimaryScene’ will also build a thread-safe command list containing the current position of each mirrored dynamic object.  The next time the effects scene simulation step completes, this command list will be processed so that the kinematically mirrored dynamic actors are updated to their current primary scene locations.

The effects scene should be simulated from a completely separate thread than the primary scene, to achieve as much parallelism as possible.  It should also be stepped in a non-blocking fashion so that there are no artificially introduced synchronization points.  When the effects scene has finished a simulation step, the application should call the mirror scene method ‘synchronizeMirrorScene’.  At this point, any objects which need to be inserted into or removed from the mirrored scene, and which were previously posted during the ‘synchronizePrimaryScene’ call, are unspooled and processed from the separate thread.
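Putting those two calls in context, the two threads might interact roughly as sketched below.  Error handling, shutdown, and the render-buffer copy are omitted; ‘computeEffectsTimeStep’ is a stand-in for the clamped variable time step described earlier; the exact parameter type taken by ‘synchronizePrimaryScene’ is defined in NxMirrorScene.h (a PxVec3 camera position is assumed here); and the ordering of the calls follows the description above:

#include <chrono>
#include <thread>

// Primary/game thread: called once per frame after the primary scene's fetchResults.
void updatePrimary(NxMirrorScene &mirror, const physx::PxVec3 &cameraPos)
{
    mirror.synchronizePrimaryScene(cameraPos);
}

// Effects thread: steps the effects scene in a non-blocking fashion.
void effectsThreadLoop(physx::PxScene &effectsScene, NxMirrorScene &mirror)
{
    effectsScene.simulate(computeEffectsTimeStep());         // kick off the first step
    for (;;)
    {
        if (effectsScene.checkResults(false))                // non-blocking completion poll
        {
            mirror.synchronizeMirrorScene();                 // unspool queued insert/remove/move commands
            effectsScene.fetchResults(true);                 // completes immediately; results are ready
            // ...copy particle state into the back render buffer here (see the rendering section below)...
            effectsScene.simulate(computeEffectsTimeStep()); // fire off the next step
        }
        else
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));   // yield rather than spin
        }
    }
}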

This article includes sample code demonstrating how to implement a mirrored physics scene.  This sample is provided more as a learning tool than as source you should just put directly into your engine.  The source is provided as the following four files which compile against the PhysX 3.2 SDK.  Some minor changes may be required to get it to compile against different versions of the SDK.

MirrorScene.cpp : Contains the implementation of the NxMirrorScene class.
MirrorScene.h : Contains the header file for the implementation class.
NxMirrorScene.h : This is the public header file defining the pure virtual NxMirrorScene abstract interface.
PlatformConfig.h : This header file contains macros which define system specific data types for memory allocation, mutex, and container classes.  This should be modified to reflect your own platform requirements.  The default header provided is configured for the Windows platform.

The implementation of the mirror scene class makes heavy use of an STL hash map container to rapidly translate between the base actor and shape pointers in the primary scene to their corresponding versions in the mirror scene.

An important note for the developer: this helper class assists in the process of mirroring actors between the primary scene and the effects scene only.  It does not, however, handle the mechanism for releasing triangle meshes, convex hulls, and heightfields.  This logic must be handled by your own game engine.  Imagine the use case where you release an actor in the primary scene and then immediately release a triangle mesh which that actor had been using.  If you are using the mirror scene class, this would cause undefined behavior, most likely a crash, because the mirrored scene actor would still hold a reference to that triangle mesh.  If you want to release triangle meshes, convex hulls, or heightfields, you will need to put them on a queue ‘to be released’ once the mirror scene has had a chance to be simulated and the mirrored actor has been deleted as well.
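One possible shape for that queue is sketched below.  The ‘wait two synchronization passes’ rule is a deliberately conservative illustration rather than a requirement of the class, and only triangle meshes are shown; convex hulls and heightfields would be handled the same way:

#include <vector>

// Deferred release of cooked resources.  An entry is only released after the effects
// thread has completed at least two synchronizeMirrorScene() passes since it was queued,
// by which time the mirrored actor referencing it has been removed.
struct PendingMeshRelease { physx::PxTriangleMesh *mesh; unsigned queuedAtPass; };

static std::vector<PendingMeshRelease> gPendingRelease;  // protect with your own mutex
static unsigned gSyncPass = 0;

void queueMeshRelease(physx::PxTriangleMesh &mesh)       // call this instead of mesh.release()
{
    PendingMeshRelease entry = { &mesh, gSyncPass };
    gPendingRelease.push_back(entry);
}

void drainMeshReleases()                                 // call right after synchronizeMirrorScene()
{
    ++gSyncPass;
    for (size_t i = 0; i < gPendingRelease.size(); )
    {
        if (gSyncPass >= gPendingRelease[i].queuedAtPass + 2)
        {
            gPendingRelease[i].mesh->release();
            gPendingRelease[i] = gPendingRelease.back();
            gPendingRelease.pop_back();
        }
        else
        {
            ++i;
        }
    }
}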

Removing Dependencies

The following steps should be taken to make sure there are no dependencies between the primary scene and the effects scene:

  • The primary scene should not use a CudaContextManager since it won’t be doing any GPU accelerated effects.
  • The CudaContextManager used by the effects scene should not be shared with any other object (see the sketch after this list).
  • As previously discussed, the effects scene should use a single ‘simulate’ call with a variable time step.  It should have a minimum and maximum time step duration; something along the lines of no greater than 60hz and no less than 10hz would work well.
  • The effects scene should be simulated in a non-blocking manner.  The application should call ‘checkResults’ to see if the simulation is complete and only synchronize the mirrored scene, invoke fetchResults, and fire off the next simulate step at that time.  Whether or not the simulate step on the effects scene takes more or less time than the primary scene is irrelevant.  In fact, this is the key element that makes this technique work.  We consider the effects scene to be done whenever it is done, and if it takes more or less time than the primary logical render frame it doesn’t matter.
  • All access to the effects scene should happen from a different thread than the primary scene or rendering threads.
  • The application should disable continuous collision detection for the effects scene, as all dynamic objects in the effects scene are simply kinematically moved copies from the primary scene and this feature is unnecessary.
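As a rough sketch of how the first two points might be set up (following the PhysX 3.3-era API; in PhysX 3.2 the CUDA context manager lives in the pxtask namespace, and ‘foundation’, ‘physics’, ‘profileZoneManager’, and ‘effectsCpuDispatcher’ are assumed to already exist in your engine):

// The effects scene gets its own, unshared CUDA context manager; the primary scene's
// PxSceneDesc simply never sets a gpuDispatcher.
physx::PxCudaContextManagerDesc cudaDesc;
physx::PxCudaContextManager *cudaCtx =
    PxCreateCudaContextManager(*foundation, cudaDesc, profileZoneManager);

physx::PxSceneDesc effectsDesc(physics->getTolerancesScale());
effectsDesc.gravity       = physx::PxVec3(0.0f, -9.8f, 0.0f);
effectsDesc.cpuDispatcher = effectsCpuDispatcher;             // not shared with the primary scene
effectsDesc.filterShader  = physx::PxDefaultSimulationFilterShader;
if (cudaCtx && cudaCtx->contextIsValid())
    effectsDesc.gpuDispatcher = cudaCtx->getGpuDispatcher();  // GPU work only for this scene
physx::PxScene *effectsScene = physics->createScene(effectsDesc);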

Rendering Considerations

The previous section focused on the need to run the effects simulation entirely independently of the primary game loop, which typically consists of the pattern ‘simulate, get transforms, render world’.  Obviously the GPU simulated effects still need to be rendered every frame, and since the underlying simulation runs independently of the rendering pipeline you might wonder how this is done.

To implement the effects rendering pipeline, your engine must support a double-buffered and thread-safe approach.  Each time the effects scene simulation step completes, the current state of the simulated data should be copied to a relevant rendering resource (vertex buffer, texture buffer, transform buffer, etc.).  Two buffers are created for each relevant render resource; the effects scene thread is always writing its results to one buffer while the rendering thread is displaying the contents of the previous frame’s simulated results from the other.
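A minimal sketch of such a double buffer is shown below, with particle positions standing in for whatever per-frame data your engine actually copies.  std::mutex is used here for brevity; a real engine would use its own platform primitives, much as PlatformConfig.h does in the sample code:

#include <mutex>
#include <vector>

// Double-buffered particle positions shared between the effects thread and the render thread.
struct ParticleRenderBuffer
{
    std::vector<physx::PxVec3> positions[2];
    int        frontIndex;     // buffer the render thread currently reads
    std::mutex lock;

    ParticleRenderBuffer() : frontIndex(0) {}

    // Effects thread: called after fetchResults(); fills the back buffer, then flips.
    void publish(const physx::PxVec3 *src, physx::PxU32 count)
    {
        int back = 1 - frontIndex;                 // only this thread ever changes frontIndex
        positions[back].assign(src, src + count);
        std::lock_guard<std::mutex> guard(lock);
        frontIndex = back;
    }

    // Render thread: copies the front buffer (into a locked vertex buffer, for example)
    // while holding the lock, so the effects thread cannot recycle it mid-copy.
    void copyForRender(std::vector<physx::PxVec3> &out)
    {
        std::lock_guard<std::mutex> guard(lock);
        out = positions[frontIndex];
    }
};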

An important note here is that this rendering data will typically end up being copied to something like a locked Direct3D vertex buffer and, therefore, your graphics pipeline and wrapper classes must provide thread-safe access to these resources.

Rendering Artifacts

One rendering artifact which arises from these techniques comes from the fact that the kinematic mirrored objects the GPU effects interact with are always slightly time delayed relative to their primary scene versions.  In most cases this is not an issue.  If particles hit dynamic objects, even slightly time-delayed ones, it is rare that you will notice any discontinuity.  However, if particles come to rest on a moving dynamic object, this slight time separation between the transform of the primary scene graphics object and the version the particles are simulated against can appear as a jittering effect: the dynamic object in the primary scene moves, and it takes a frame for the particles to react to that motion.

Below is an example of APEX GPU turbulence particles interacting with dynamic objects.


Summary

This article has presented a series of techniques which, combined, allow a developer to simulate a great number of GPU particle effects while minimizing the frame rate impact on the rest of the game.  These techniques introduce a certain amount of complexity: a mirrored effects scene, double-buffered rendering resources, and a lot of thread-safe code.  However, the results can be worth the extra effort when you can run simulations comprising hundreds of thousands of particles interacting with your game environment with minimal impact on overall frame rate, even when the game engine is under heavy load.

Notes

If you would like to discuss this blog post, please join us on this forum thread on our new DevTalk forums.