Last Updated: 07 / 18 / 2008

FAQ: PhysX Tips and Tricks

Introduction

The following PhysX tips and tricks were collected from internal and external feedback and questions. The collection is written in a free-form, blog-like style since several people contributed to it.

Tip Topics:

Particles

  • Fluids simulate collisions by loading “packets” of collision data. The size of these packets is determined by kernelRadiusMultiplier, restParticlesPerMeter and packetSizeMultiplier. The packet size can be calculated as follows:
        PacketSize = kernelRadiusMultiplier * (1/restParticlesPerMeter) * packetSizeMultiplier
  • These packets are loaded into broadphase, which maintains a list of all of the shapes needed for collision. The problem is that the broadphase list can be huge and can, for example, grow each time a building is destroyed. The broadphase insert of the packets is done for each axis (x, y, z) and for the min and max points of the bounding box, so this can get expensive fast if there are a large number of shapes in broadphase. If this results in a major performance bottleneck, the user can disable fluid collision for small objects (e.g. based on a size threshold).
  • The particle simulation uses a configurable packet size per fluid emitter. The packet size defines the size of the spatial packet in which particles are simulated. If a packet contains too few or too many particles, particle performance decreases considerably. For example, in the Warmonger game the fluids used very small packet sizes (something like 8), so an enormous number of packets was in use (around 700 for AD-Siege_1), and several were deleted and created each frame. This made broadphase take a long time, which drove up CPU time and slowed down the game. The solution is to increase the packet size for the fluids by tweaking the values in the equation above. It was suggested to leave packetSizeMultiplier at 16 and tweak kernelRadiusMultiplier instead. The following settings gave us a packet size of 16 and dropped the number of packets from about 700 to around 30:
    • packetSizeMultiplier = FPSM_16
    • restParticlesPerMeter = 5.0
    • kernelRadiusMultiplier = 5.0
    • motionLimitMultiplier = 0.9
    • collisionDistanceMultiplier = 0.12
  • In order for two different fluids to share the same packets and hence the same collision geometry they must have the following matching parameters. We should try to match these across as many fluids as possible to achieve good particle performance.
    • packetSizeMultiplier
    • restParticlesPerMeter
    • kernelRadiusMultiplier
    • motionLimitMultiplier
    • collisionDistanceMultiplier
  • Always choose the maximum particle count for the fluid particle system wisely to ensure that fluids won't exceed the maximum particle number and start slowing down the whole game during certain interactions.
  • To maximize particle performance, a FIFO can be used to limit the maximum number of particles and automatically reuse the oldest particles for newly spawned ones. This is achieved by defining a particle reserve equal to the maximum number of particles that can be generated in a frame. For example, if the user wants a maximum of 1000 active particles and wants to generate up to 100 particles each frame, the fluid's max particles should be set to 1100, the reserve (NxFluid::setNumReserveParticles()) to 100, and the NX_FF_PRIORITY_MODE flag should be set. Now, if adding particles would cause more than 1000 particles to be active, the oldest particles are deactivated prematurely. See Guide->Fluids->Usage->Particle Priority Mode in the PhysX SDK documentation.
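
The packet-size formula and the priority-mode budget above are plain arithmetic, so they can be checked before touching any fluid descriptor. The following is a minimal Python sketch, not SDK code: the function names are hypothetical helpers, and it assumes packetSizeMultiplier is passed as its numeric value (16 for FPSM_16).

```python
def packet_size(kernel_radius_multiplier, rest_particles_per_meter,
                packet_size_multiplier):
    """Packet size per the formula in this section:
    kernelRadiusMultiplier * (1 / restParticlesPerMeter) * packetSizeMultiplier.
    (Hypothetical helper, not a PhysX SDK call.)"""
    if rest_particles_per_meter <= 0:
        raise ValueError("restParticlesPerMeter must be positive")
    return (kernel_radius_multiplier / rest_particles_per_meter
            * packet_size_multiplier)


def priority_mode_budget(max_active_particles, max_spawned_per_frame):
    """Return (max_particles, reserve_particles) for the FIFO setup above.

    The reserve equals the most particles that can be spawned in one frame,
    and the fluid's maximum is the active budget plus that reserve.
    (Hypothetical helper, not a PhysX SDK call.)"""
    reserve = max_spawned_per_frame
    return max_active_particles + reserve, reserve


# Settings quoted above: kernelRadiusMultiplier = 5.0,
# restParticlesPerMeter = 5.0, packetSizeMultiplier = 16.
print(packet_size(5.0, 5.0, 16))        # 16.0
print(priority_mode_budget(1000, 100))  # (1100, 100)
```

Because fluids only share packets when all five parameters match, computing the packet size in one place like this also makes it easier to keep the values identical across emitters.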

Deformables (Cloth)

  • Cloth behavior is highly dependent on the timestep. The smaller the timestep, the fewer iterations are required to get decent behavior.
  • Don’t use variable simulation timesteps with cloth because behavior will be completely erratic.
  • Cloth behavior depends on the spacing of the vertices. A cloth mesh that has some areas with many vertices and small triangles and other areas with few vertices and large triangles will behave inconsistently across those areas. Therefore, try to keep a regular spacing of vertices in your cloth mesh, even for irregular shapes.
  • Fast-moving vertices have very long simulation times. Increasing the “relativeGridSpacing” parameter might help a bit if this occurs. The validBounds parameter also can help limit the exposure to a single frame, since after one frame the vertex will be outside of the bounds. However, validBounds only works in limited circumstances where one can keep the cloth confined to one area of the level. There is no SDK parameter to limit velocity, so ultimately one has to design their level to ensure that no large forces can affect the cloth enough to move any vertices at a high velocity.
  • Manually moving a vertex (using setPosition, for example) to a distant location has the same bad effects as a fast-moving vertex. So don’t do that.
  • Ortho bending is more expensive computationally than normal bending and tends to give worse behavior, so don’t use it.
  • Self collision is done based on vertices, not triangles, so don’t expect self collision to work perfectly for meshes with triangles much larger than the cloth thickness.
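
One way to catch irregular vertex spacing before a mesh reaches the simulation is to compare the longest and shortest triangle edges. The sketch below is a hypothetical offline check in Python, not an SDK feature; the function name and the mesh layout (a vertex list plus index triples) are assumptions, and the ratio at which behavior becomes a problem is not stated here and would have to be found by experiment.

```python
import math


def edge_length_ratio(vertices, triangles):
    """Return longest_edge / shortest_edge over all triangle edges.

    vertices:  list of (x, y, z) tuples
    triangles: list of (i, j, k) index triples into vertices
    (Hypothetical helper; a ratio near 1.0 means regular spacing.)
    """
    lengths = []
    for i, j, k in triangles:
        for a, b in ((i, j), (j, k), (k, i)):
            lengths.append(math.dist(vertices[a], vertices[b]))
    if not lengths:
        raise ValueError("mesh has no triangles")
    shortest = min(lengths)
    if shortest == 0.0:
        raise ValueError("mesh contains a degenerate (zero-length) edge")
    return max(lengths) / shortest


# A unit quad split into two triangles: edges are 1 or sqrt(2) long.
quad_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
quad_triangles = [(0, 1, 2), (0, 2, 3)]
ratio = edge_length_ratio(quad_vertices, quad_triangles)
```

For the regular quad the ratio is sqrt(2); a mesh that mixes dense and sparse regions would report a much larger value.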

Network Synchronization

  • In general, using additional physics in networked games requires network synchronization between the different clients. This is especially true for gameplay physics, e.g. rigid bodies that can block an opponent. There are different approaches to network synchronization, e.g. the server-client model or an improved authority-based model.
  • In the server-client model, the server collects information from the clients, performs the physics simulation and updates the clients. Since this approach can introduce large latency, the client generally performs its own physics simulation (client-side prediction) and updates its internal state with the parameters received from the server. In general this works fine. If the client's simulation of, say, a rigid body is too far off from the server's simulation, the client snaps the rigid body back into the proper location. If the delta is small enough, this adjustment can also be performed gradually. This approach works fairly well as long as the number of synchronized events is relatively small.
  • For a large synchronized rigid body world, an authority scheme can yield much better results with regard to synchronization problems (snapping). In the authority-managed model, the server keeps track of game updates (default authority). To avoid snapping, a client can take authority over the objects it interacts with. The server then accepts state changes for those objects from the client. If required, the server can overwrite the updates from the client to make sure that all state changes are properly in sync.
  • Most non-rigid-body simulation can be run completely on the client or only requires state changes to be synced (e.g. particle emitter state and position).
  • Small rigid bodies: as long as the rigid bodies do not affect gameplay, they don’t need to be synced.
  • Particles: As long as the particles are used purely as effects particles, no synchronization is required and the simulation can run completely on the client. If the particles can cause a major change in visibility, the emitter location as well as the emitter rate can be synced if required. This will ensure that all clients have to deal with almost the same visibility problems.
  • Cloth: As long as cloth is not affecting gameplay, a client-based simulation is enough. As with particles, there can be slight visible differences between clients, but overall those should be negligible since cloth is moving all the time. In case of cloth tearing, the torn cloth pieces (vertices) can be synced to ensure that they look similar on all clients (the cloth simulation itself is not synchronized). In general it is not required to run the cloth simulation on the server, but for metal cloth with longer-lasting visible changes it is advantageous to run the metal cloth on the server to ensure proper visibility propagation to all clients. The client's metal cloth simulation might be a little off compared to the server's simulation, but in general this should be negligible.
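
The snap-or-adjust-gradually correction described for client-side prediction can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not part of the PhysX SDK or any particular engine: `reconcile`, `snap_distance` and `blend` are hypothetical names, and real code would also correct orientation and velocity.

```python
import math


def reconcile(client_pos, server_pos, snap_distance, blend=0.1):
    """Correct a predicted client position toward the server position.

    Errors larger than snap_distance are snapped in one step; smaller
    errors are reduced by the fraction `blend` on each call.
    (snap_distance and blend are assumed tuning values.)
    """
    error = math.dist(client_pos, server_pos)
    if error > snap_distance:
        # Too far off: snap straight to the server state.
        return tuple(server_pos)
    # Close enough: ease toward the server state to hide the correction.
    return tuple(c + (s - c) * blend
                 for c, s in zip(client_pos, server_pos))


# Large error: the body is snapped to the server position.
snapped = reconcile((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), snap_distance=1.0)
# Small error: the body moves half of the way with blend=0.5.
eased = reconcile((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), snap_distance=1.0, blend=0.5)
```

Calling the function once per received server update (or once per frame with the latest update) gives the gradual adjustment; the snap branch only triggers when prediction has diverged badly.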

Wheelshapes / Vehicles

  • Sort contacts along ray and only keep the first one
  • Recommended flag settings:
    • NX_WF_AXLE_SPEED_OVERRIDE : false
    • NX_WF_EMULATE_LEGACY_WHEEL : false
    • NX_WF_INPUT_LAT_SLIPVELOCITY : false
    • NX_WF_CLAMPED_FRICTION : true
  • Tricks against toppling vehicles:
    • Set vehicle's center of mass low
    • Use angular damping on the vehicle
  • Problem of vehicles jumping when driving over steps:
    • Smaller wheels work better with the wheelshape model
    • Use a convex to simulate the wheel's collision shape and move it with the suspension, so it will collide with the step and lift the vehicle before the raycast hits the step. (Note: updating a compound actor every frame can be expensive!)
  • A lot of tuning is always necessary
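
Keeping only the first contact along the wheel ray amounts to picking the contact with the smallest distance from the ray origin, which gives the same result as sorting and taking the first element. A minimal Python sketch, assuming contacts arrive as hypothetical `(distance_along_ray, shape_id)` tuples rather than any SDK type:

```python
def first_contact(contacts):
    """Return the contact closest to the ray origin, or None if there are none.

    contacts: iterable of (distance_along_ray, shape_id) tuples.
    (The tuple layout is an assumption made for this sketch.)
    """
    return min(contacts, key=lambda c: c[0], default=None)


hits = [(0.8, "step"), (0.3, "ground"), (1.2, "wall")]
nearest = first_contact(hits)   # (0.3, "ground")
```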

Joints

  • Keep the mass ratio between objects small
  • Mass ordering in joint chains/trees is important (heavy-middle-light => ok, light-heavy-light => bad)
  • Keep inertia tensors symmetric
  • Check iteration count

Ragdolls

  • See Joints*
  • Good to have same mass for all body parts
  • The default of 4 solver iterations is not much (8 is better)
  • Tune ragdolls without CCD and projection
  • Use drives (not too strong) to prevent bad situations
  • Skin width needs to be right (collision geometry has to be bigger than graphics representation)
  • Projection:
    • Can help with strong forces (explosions, ...)
    • Use with care
    • Problems:
      • Order of projection is not known
      • Can break CCD
  • Use a short animation to init the simulation of the ragdoll
  • Avoid soft limits on linear motion when the jointed actors have a fixed distance, as in a head-neck joint
  • For ragdolls, try to use the following structure (from head to toes):
    • spine2
    • pelvis
    • root
    • L-leg1 R-leg1

CCD

  • Use a single vertex as CCD skeleton for small objects
  • Make sure skeleton is smaller than the normal collision mesh, so the normal collision algorithm can do its work
  • CCDMotionThreshold: use the smallest diameter of the object (half of that may also work, but then the response can be wrong)
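
As a rough illustration of the threshold rule, the Python sketch below derives a candidate value from an object's bounding-box size. It rests on two assumptions that are not stated above: that the "smallest diameter" can be approximated by the smallest bounding-box dimension, and that the threshold is compared against the distance travelled in one timestep. Both function names are hypothetical.

```python
def ccd_motion_threshold(extents, scale=1.0):
    """Suggest a CCD motion threshold from an object's bounding-box size.

    extents: full (x, y, z) size of the box. scale=1.0 uses the smallest
    dimension itself; scale=0.5 is the more aggressive "half" variant.
    (Approximating the smallest diameter by the box size is an assumption.)
    """
    return min(extents) * scale


def exceeds_threshold(speed, timestep, threshold):
    """True if the object travels farther than threshold in one step."""
    return speed * timestep > threshold


threshold = ccd_motion_threshold((0.2, 0.5, 1.0))       # 0.2
fast = exceeds_threshold(30.0, 1.0 / 60.0, threshold)   # 0.5 per step
slow = exceeds_threshold(1.0, 1.0 / 60.0, threshold)
```

With these numbers, the object moving at 30 units per second crosses more than its own smallest dimension in a 60 Hz step, while the one moving at 1 unit per second does not.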

Timing

  • Trade-offs with fixed timesteps:
    • A lot of substeps: physics can be the bottleneck
    • Few substeps: "moon gravity" effects (the simulation falls behind real time, so objects appear to fall slowly)
  • Switch to variable substeps in certain cases
    • Problems with determinism and behaviour changes (e.g. stiffness of softbodies and cloth, different effect of applied forces, ...)
  • It's best to call simulate with an exact multiple of maxTimestep
  • Size of timestep has influence on stability (e.g. important for ragdolls)
  • Some values from experience:
    • 30Hz (33.3ms) for simple collision
    • 60Hz (16.6ms) for stacks
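
The advice to simulate in exact multiples of the maximum timestep can be illustrated with a generic fixed-timestep accumulator. This Python sketch is not the SDK's internal timing scheme; `FixedStepper`, `max_timestep` and `max_substeps` are hypothetical names, and dropping leftover time when the substep cap is hit is just one possible policy.

```python
class FixedStepper:
    """Accumulates frame time and hands out whole fixed substeps."""

    def __init__(self, max_timestep=1.0 / 60.0, max_substeps=8):
        self.max_timestep = max_timestep
        self.max_substeps = max_substeps
        self.accumulator = 0.0

    def advance(self, frame_dt):
        """Return (substeps, simulated_time) for this frame.

        simulated_time is always substeps * max_timestep, so the caller
        can pass it to the simulation as an exact multiple of the step.
        """
        self.accumulator += frame_dt
        substeps = int(self.accumulator / self.max_timestep)
        if substeps > self.max_substeps:
            # Too much time to catch up on: cap the work and drop the rest,
            # which trades accuracy in time for a bounded physics cost.
            substeps = self.max_substeps
            self.accumulator = 0.0
        else:
            # Keep the remainder for the next frame.
            self.accumulator -= substeps * self.max_timestep
        return substeps, substeps * self.max_timestep


stepper = FixedStepper(max_timestep=0.25, max_substeps=8)
first = stepper.advance(0.625)    # two whole steps, 0.125 carried over
second = stepper.advance(0.125)   # carried time completes one more step
capped = stepper.advance(10.0)    # long hitch: capped at max_substeps
```

The cap is where the trade-off above shows up: a high `max_substeps` lets physics dominate a slow frame, while a low one makes simulated time lag behind real time.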

AI

  • For UT3 we pre-calculated all possible paths while ignoring dynamic blocking objects. Then at runtime, when a dynamic object fell asleep, we gathered all the paths near the object (AABB query of the octree) and did a capsule sweep of those paths against the dynamic object. If there was an intersection, we marked the path as disabled and added the blocking object to a list of actors blocking that path. When an object wakes up again, we remove it as a blocker from all associated paths; if any of those paths then has no blockers, we enable that path again. This technique works most of the time, but it's prone to many edge cases. For instance, if you have lots of dynamic objects in the scene, it's easy to get "islands" of pathnodes, which results in bots getting stuck: even if there is a seemingly obvious way for the AI to escape, it won't take it unless there is a pre-calculated path. As a result, you may end up creating a much denser path mesh than you normally would, which means higher memory costs (especially since each node contains a list of blocking actors) and higher pathfinding cost at runtime. But it's easily implemented in UE3-based games without changing the engine.
    • For best results, it's recommended that you avoid making very small objects block. Stick to the relatively few large objects in your scene or static breakables. This helps prevent something like a soda can from confusing the AI.
    • Also, if an AI can't "see" an object, it will likely run into it and get stuck. So players/bots should not collide with any object that can't block a path; for those small objects you just have to let the AI go through them.
  • We also built some levels for UT3 that had breakable walls and used the technique above for pathfinding around them. But this isn't exactly the behavior you want all the time. Real players will knowingly shoot objects they know are breakable in order to gain a shortcut. So we wanted the same behavior in UT3, but we also didn't want the bots to go through and break all the walls in the first 5 seconds of the game. The best solution we came up with (best in that it was easy for level designers to control) was to place trigger volumes near breakable objects and walls along paths where a bot was likely to want to go through the object. When the trigger was hit, we would signal the bot to attack a specific object (in UE3 this is all done in Kismet). As a result, bots would periodically shoot at breakable walls and go through them without getting overly excited and breaking everything. This of course is NOT an ideal solution! But it's what worked for us in UT3 without modifying the engine.
    • A better approach would be to give each path a heuristic value for how "difficult" it is to cross. For instance, a normal path with no occluders might be 0.0, whereas a path blocked by an unbreakable object might be 1.0, and a path occluded by a breakable object might be 0.5. This will cause the bots to avoid breaking through walls unless it significantly reduces their travel time. Then, for the actual shooting, we simply gather a list of occluding objects along the bot's currently chosen path, and as the objects come into visible range the AI starts shooting at them.
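
The sleep/wake bookkeeping described for UT3 boils down to a map from each path to the set of objects currently blocking it. The following Python sketch illustrates only that bookkeeping and is not UE3 or PhysX code: `PathBlockers` and its methods are hypothetical, and `intersecting_paths` stands in for the result of the AABB query and capsule sweep.

```python
class PathBlockers:
    """Tracks which sleeping dynamic objects block which precomputed paths."""

    def __init__(self):
        self.blockers = {}   # path_id -> set of blocking object ids

    def on_object_sleep(self, obj_id, intersecting_paths):
        """Register obj_id as a blocker of every path it intersects.

        intersecting_paths is assumed to come from the spatial query
        and sweep test, which are outside the scope of this sketch.
        """
        for path_id in intersecting_paths:
            self.blockers.setdefault(path_id, set()).add(obj_id)

    def on_object_wake(self, obj_id):
        """Remove obj_id everywhere; paths left with no blockers re-enable."""
        for path_id in list(self.blockers):
            self.blockers[path_id].discard(obj_id)
            if not self.blockers[path_id]:
                del self.blockers[path_id]

    def is_enabled(self, path_id):
        """A path is usable only while no object is registered as blocking it."""
        return path_id not in self.blockers


tracker = PathBlockers()
tracker.on_object_sleep("crate", ["A", "B"])
tracker.on_object_sleep("barrel", ["B"])
tracker.on_object_wake("crate")
# Path A is enabled again; path B is still blocked by the barrel.
```

Storing blockers per path is also where the extra memory cost mentioned above comes from, since every blocked path carries its own set.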

Profiling

  • To ensure repeatable/automatic performance profiling the user is encouraged to use the UE3 flythrough performance system. Internally we used the same mechanism coupled with scripts to extract the important timing information from the UE3 engine. This provides a quick overview of bottlenecks within the game.
  • Additional PhysX specific information can be extracted using the AGPerfmon tool, which allows more in depth investigation of bottlenecks.
  • AGPerfHUD was useful for finding hotspots in real time. The tool lets the user see performance bottlenecks while playing a particular level.
  • For rendering bottlenecks we highly recommend NVPerfHUD, which is very useful to identify and fix rendering bottlenecks by simplifying materials and tweaking distance culling values.
  • While profiling for performance it is important to consider the broken state, especially if a destructible object was used as an occluder.
