Last Updated: 03/18/2009
FAQ: PhysX Tips and Tricks
Introduction
The following collection of PhysX tips and tricks was gathered from
internal and external feedback and questions. It is written in a free-form,
blog-like style because several people contributed to it.
Tip Topics:
- Fluids simulate collisions by loading “packets” of collision
data. The size of these packets is determined by the KernelRadiusMultiplier,
the RestParticlesPerMeter and the packetSizeMultiplier. In order to
calculate the packet size you can do the following:
PacketSize = KernelRadiusMultiplier *
(1/RestParticlesPerMeter) *
packetSizeMultiplier
- These packets are loaded into the broadphase, which maintains a list of
all of the shapes needed for collision. The problem is that the broadphase
list can be huge and can grow, e.g. each time a building is destroyed.
The broadphase insert of the packets is done for each axis (x, y, z) and
for the min and max points of the bounding box, so it gets expensive
quickly if there are a large number of shapes in the broadphase. If this
results in a major performance bottleneck, the user can disable fluid
collision for small objects (e.g. based on a threshold).
- The current particle simulation makes use of a configurable packet size
per fluid emitter. The packet size defines the size of a spatial packet
in which particles are simulated. If a packet contains too few or too
many particles, particle performance decreases considerably. For example,
in the Warmonger game the fluids used very small packet sizes (something
like 8), so an enormous number of packets was in use (around 700 for
AD-Siege_1) and several were deleted and created each frame. This made
the broadphase take a long time, which drove up CPU time and slowed down
the game. The solution is to increase the packet size for the fluids by
tweaking the values in the equation above. It was suggested to leave the
packetSizeMultiplier at 16 and tweak the KernelRadiusMultiplier instead.
The following settings gave us a packet size of 16 and dropped the number
of packets to around 30 rather than 700.
- packetSizeMultiplier = FPSM_16
- restParticlesPerMeter = 5.0
- kernelRadiusMultiplier = 5.0
- motionLimitMultiplier = 0.9
- collisionDistanceMultiplier = 0.12
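As a quick sanity check of the packet size equation, here is a minimal C++ sketch that evaluates it for the settings listed above. The struct and function names are hypothetical helpers for illustration, not part of the PhysX SDK, and the numeric value 16 for FPSM_16 is taken from the text.

```cpp
#include <cassert>

// Hypothetical container for the three inputs of the packet size equation.
struct FluidPacketParams {
    double kernelRadiusMultiplier;
    double restParticlesPerMeter;
    double packetSizeMultiplier;  // numeric value behind FPSM_16, i.e. 16
};

// PacketSize = KernelRadiusMultiplier * (1 / RestParticlesPerMeter) * packetSizeMultiplier
double packetSize(const FluidPacketParams& p) {
    assert(p.restParticlesPerMeter > 0.0);  // avoid division by zero
    return p.kernelRadiusMultiplier * (1.0 / p.restParticlesPerMeter) * p.packetSizeMultiplier;
}
```

With kernelRadiusMultiplier = 5.0, restParticlesPerMeter = 5.0 and packetSizeMultiplier = 16 this gives 5.0 * 0.2 * 16 = 16, which matches the packet size quoted for these settings.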
- In order for two different fluids to share the same packets and hence the
same collision geometry they must have the following matching parameters.
We should try to match these across as many fluids as possible to achieve
good particle performance.
- packetSizeMultiplier
- restParticlesPerMeter
- kernelRadiusMultiplier
- motionLimitMultiplier
- collisionDistanceMultiplier
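The matching rule can be checked at content-validation time with something like the following sketch. The struct is a hypothetical plain copy of the five parameters listed above (not an SDK type), and the exact floating point comparison is an assumption about what "matching" means here.

```cpp
// Hypothetical plain copy of the five parameters that must match.
struct FluidCollisionParams {
    unsigned packetSizeMultiplier;
    float restParticlesPerMeter;
    float kernelRadiusMultiplier;
    float motionLimitMultiplier;
    float collisionDistanceMultiplier;
};

// Returns true only if all five parameters are identical, which is the
// condition described above for two fluids to share the same packets.
bool canSharePackets(const FluidCollisionParams& a, const FluidCollisionParams& b) {
    return a.packetSizeMultiplier == b.packetSizeMultiplier
        && a.restParticlesPerMeter == b.restParticlesPerMeter
        && a.kernelRadiusMultiplier == b.kernelRadiusMultiplier
        && a.motionLimitMultiplier == b.motionLimitMultiplier
        && a.collisionDistanceMultiplier == b.collisionDistanceMultiplier;
}
```

Running such a check over all fluid assets in a level is one way to spot emitters that accidentally use their own packet set.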
- Always choose the maximum particle count for the fluid particle system
wisely to ensure that fluids won't exceed the maximum particle number and
start slowing down the whole game during certain interactions.
- In order to maximize the particle performance, a FIFO can be used to
limit the maximum number of particles and automatically reuse the oldest
particles for newly spawned ones. This can be achieved by defining a
particle reserve equal to the maximum number of particles which can be
generated in a frame. E.g. if the user wants a maximum of 1000 active
particles and wants to generate up to 100 particles each frame, the
fluid max particles should be set to 1100, the reserve
(NxFluid::setNumReserveParticles()) to 100 and the NX_FF_PRIORITY_MODE
flag should be set. Now if adding particles would cause more than 1000
particles to be active, the oldest particles will be prematurely
deactivated. See Guide->Fluids->Usage->Particle Priority Mode in the
PhysX SDK documentation.
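The arithmetic behind the FIFO setup can be sketched as follows. This is only a model of the behavior described above, not SDK code: ParticleBudget, makeFifoBudget and particlesToRetire are hypothetical names, and the two resulting values are what would be passed on as the fluid's max particle count and its reserve.

```cpp
// Hypothetical pair of values to hand to the fluid: the maximum particle
// count and the reserve used together with priority mode.
struct ParticleBudget {
    unsigned maxParticles;
    unsigned numReserveParticles;
};

// Reserve = max particles spawned per frame; max = active limit + reserve.
ParticleBudget makeFifoBudget(unsigned maxActive, unsigned maxSpawnedPerFrame) {
    return ParticleBudget{maxActive + maxSpawnedPerFrame, maxSpawnedPerFrame};
}

// Models the described behavior: how many of the oldest particles get
// deactivated early so that the active count stays within the limit.
unsigned particlesToRetire(const ParticleBudget& b, unsigned active, unsigned spawning) {
    unsigned activeLimit = b.maxParticles - b.numReserveParticles;
    unsigned wanted = active + spawning;
    return wanted > activeLimit ? wanted - activeLimit : 0u;
}
```

For the example in the text, makeFifoBudget(1000, 100) yields a max of 1100 and a reserve of 100. Spawning 100 particles while 950 are active would retire the 50 oldest ones.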
- Cloth behavior depends strongly on the timestep.
The smaller the timestep, the fewer iterations are required to get
decent behavior.
- Don’t use variable simulation timesteps with cloth because behavior
will be completely erratic.
- Cloth behavior is dependent on the spacing of the vertices.
A cloth mesh that has some areas with many vertices and small
triangles and other areas with few vertices and large triangles
will behave inconsistently across the different parts of the
cloth. Therefore, try to keep a regular spacing of
vertices in your cloth mesh, even for irregular shapes.
- Fast-moving vertices have very long simulation times. Increasing the
“relativeGridSpacing” parameter might help a bit if
this occurs. The validBounds parameter can also help limit the exposure
to a single frame, since after one frame the vertex will be outside of
the bounds. However, validBounds only works in limited circumstances
where the cloth can be kept confined to one area of the level.
There is no SDK parameter to limit velocity, so ultimately the level has
to be designed so that no forces are large enough to move any cloth
vertices at a high velocity.
- Manually moving a vertex (using setPosition for example) to a distant
location will have the same bad effects as a fast-moving vertex. So
don’t do that.
- Ortho bending is more expensive computationally than normal bending and
tends to give worse behavior, so don’t use it.
- Self collision is done based on vertices, not triangles, so don’t
expect self collision to work perfectly for meshes with triangles much
larger than the cloth thickness.
- In general the use of additional physics for networked games requires
network synchronization between the different clients. This is especially
required for game play physics, e.g. rigid bodies which can block an
opponent. There are different approaches to network synchronization, e.g.
the server-client based model or an improved authority based model.
- In the server-client based model, the server collects information
from the clients, performs the physics simulation and updates the
clients. Since this approach can cause large latency, the client in
general performs its own physics simulation (client side prediction) and
updates its internal state with the parameters received from the
server. In general this works fine. If the client based physics
simulation, e.g. of a rigid body, is too far off from the server based
simulation, the client snaps the rigid body back into its proper location.
If the delta is small enough, this adjustment can also be performed
gradually. This approach works fairly well as long as the number of
synchronized events is relatively small.
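The snap-or-blend correction described above can be sketched as follows. Vec3, reconcilePosition, snapThreshold and blendFactor are hypothetical names and tuning values chosen for illustration. A real implementation would also correct orientation and velocities.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Euclidean distance between two positions.
static float dist3(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Corrects the client's predicted position toward the server's state.
// Large errors snap to the server position; small errors are blended in
// gradually by moving a fraction (blendFactor in [0, 1]) of the way per update.
Vec3 reconcilePosition(const Vec3& predicted, const Vec3& server,
                       float snapThreshold, float blendFactor) {
    if (dist3(predicted, server) > snapThreshold) {
        return server;  // too far off: snap
    }
    return Vec3{predicted.x + (server.x - predicted.x) * blendFactor,
                predicted.y + (server.y - predicted.y) * blendFactor,
                predicted.z + (server.z - predicted.z) * blendFactor};
}
```

Calling this every time a server update arrives keeps small prediction errors invisible, and only the rare large error shows up as a visible snap.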
- In case of a large synchronized rigid body world, an authority scheme can
yield much better results with regard to synchronization problems
(snapping). In the authority managed model, the server keeps track of
game updates (default authority). In order to avoid snapping, a client
can take authority over objects it interacts with. The server in this
case accepts state changes for those objects from the client. If
required, the server can overwrite the updates from the client to make
sure that all state changes are properly in sync.
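A minimal sketch of the bookkeeping for such an authority scheme is shown below. The class and its methods are hypothetical and only illustrate the rule "the server is the default authority, a client can claim an object it interacts with, and the server may always overwrite". How and when authority is released is a design decision left open here.

```cpp
#include <unordered_map>

using ClientId = int;
constexpr ClientId kServer = 0;  // assumed id for the default authority

class AuthorityTable {
public:
    // A client takes authority over an object it interacts with.
    void claim(int objectId, ClientId client) { owner_[objectId] = client; }

    // Authority falls back to the server.
    void release(int objectId) { owner_.erase(objectId); }

    ClientId ownerOf(int objectId) const {
        auto it = owner_.find(objectId);
        return it == owner_.end() ? kServer : it->second;
    }

    // The server accepts a state change from the current authority, and its
    // own updates are always accepted so it can overwrite client state.
    bool acceptUpdate(int objectId, ClientId sender) const {
        return sender == kServer || ownerOf(objectId) == sender;
    }

private:
    std::unordered_map<int, ClientId> owner_;
};
```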
- Most non rigid body simulation can run completely on the
client or only requires state changes to be synced (e.g. particle emitter
state and position).
- Small rigid bodies: as long as the rigid bodies do not affect gameplay,
they don’t need to be synced.
- Particles: As long as the particles are used purely as effects particles,
no synchronization is required and the simulation can run completely on
the client. If the particles can cause a major change in visibility, the
emitter location as well as the emitter rate can be synced if required.
This will ensure that all clients have to deal with almost the same
visibility problems.
- Cloth: As long as cloth is not affecting gameplay, a client based
simulation is enough. Similar to particles, there can be slight visible
differences between clients, but overall those should be negligible
since cloth is moving all the time. In case of cloth tearing, the torn
cloth pieces (vertices) can be synced to ensure that the torn cloth
pieces look similar (the cloth simulation itself is not
synchronized). In general it is not required to run the cloth simulation
on the server, but in case of metal cloth with longer lasting visible
changes it is advantageous to run the metal cloth on the server to ensure
proper visibility propagation to all clients. The client metal cloth
simulation might be a little off compared to the server based simulation,
but in general this should be negligible.
- Sort contacts along ray and only keep the first one
- Recommended flag settings:
- NX_WF_AXLE_SPEED_OVERRIDE : false
- NX_WF_EMULATE_LEGACY_WHEEL : false
- NX_WF_INPUT_LAT_SLIPVELOCITY : false
- NX_WF_CLAMPED_FRICTION : true
- Tricks against toppling vehicles:
- Set vehicle's center of mass low
- Use angular damping on the vehicle
- Problem of jumping vehicle when driving over steps:
- Smaller wheels work better with the wheelshape model
- Use a convex to simulate the wheel's collision shape and move it with
the suspension, so it will collide with the step and lift the vehicle
before the raycast hits the step. (Note: updating a compound actor
every frame can be expensive!)
- A lot of tuning is always necessary
- Keep the mass ratio between objects small
- Mass ordering in joint chains/trees is important (heavy-middle-light => ok, light-heavy-light => bad)
- Keep inertia tensors symmetric
- Check iteration count
- See Joints*
- Good to have same mass for all body parts
- Default solver iterations of 4 is not much (8 is better)
- Tune ragdolls without CCD and projection
- Use drives (not too strong) to prevent bad situations
- Skin width needs to be right (collision geometry has to be bigger than
graphics representation)
- Projection:
- Can help with strong forces (explosions, ...)
- Use with care
- Problems:
- Order of projection is not known
- Can break CCD
- Use a short animation to init the simulation of the ragdoll
- Avoid soft limits on linear motion when the jointed actors have a fixed
distance, as in the head-neck joint
- For ragdolls, try to use the following structure (from head to toes):
-
- spine2
- pelvis
- root
- L-leg1 R-leg1
- Use a single vertex as CCD skeleton for small objects
- Make sure skeleton is smaller than the normal collision mesh, so the
normal collision algorithm can do its work
- CCDMotionThreshold: smallest diameter of object (maybe half, but then
response can be wrong)
- Trade off fixed timesteps:
- A lot of substeps: physics can be the bottleneck
- Few substeps: moon gravity effects
- Switch to variable substeps in certain cases
- Problems with determinism and behaviour changes (e.g. stiffness of
softbodies and cloth, different effect of applied forces, ...)
- It's best to call simulate with an exact multiple of maxTimestep
- Size of timestep has influence on stability (e.g. important for ragdolls)
- Some values from experience:
- 30Hz (33.3ms) for simple collision
- 60Hz (16.7ms) for stacks
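One way to always hand the simulation an exact multiple of maxTimestep is an accumulator, sketched below. FixedStepper and StepPlan are hypothetical helpers, not SDK types. The sketch assumes that excess time is simply dropped once the substep cap is hit, which is one plausible source of the slow-motion ("moon gravity") look mentioned above.

```cpp
// Result of planning one frame: how many fixed substeps to run and the
// exact amount of time (substeps * maxTimestep) to pass to the simulation.
struct StepPlan {
    int substeps;
    double simulatedTime;
};

class FixedStepper {
public:
    FixedStepper(double maxTimestep, int maxSubsteps)
        : maxTimestep_(maxTimestep), maxSubsteps_(maxSubsteps), accumulator_(0.0) {}

    // Adds the frame time and returns a whole number of substeps. Time that
    // does not fill a whole substep is carried over to the next frame.
    StepPlan advance(double frameDt) {
        accumulator_ += frameDt;
        int n = static_cast<int>(accumulator_ / maxTimestep_);
        if (n > maxSubsteps_) {
            n = maxSubsteps_;
            accumulator_ = 0.0;  // assumption: drop the backlog instead of catching up
        } else {
            accumulator_ -= n * maxTimestep_;
        }
        return StepPlan{n, n * maxTimestep_};
    }

    double leftover() const { return accumulator_; }

private:
    double maxTimestep_;
    int maxSubsteps_;
    double accumulator_;
};
```

The cap trades accuracy for a bounded physics cost per frame. Raising it moves toward the "physics can be the bottleneck" side of the trade-off.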
- For UT3 we pre-calculated all possible paths while ignoring dynamic
blocking objects. Then at runtime, when a dynamic object fell asleep, we
gathered all the paths that were near the object (AABB query of the
octree) and did a capsule sweep of all nearby paths against the
dynamic object. If there was an intersection we marked the path as
disabled and added the blocking object to a list of actors that are
blocking said path. When an object wakes up again we remove it as a
blocker from all associated paths; if any of those paths then has no
blockers we enable that path again. This technique works most of the time
but it's prone to many edge cases. For instance, if you have lots of
dynamic objects in the scene it's easy to get "islands" of pathnodes, which
will result in bots getting stuck: even if there is a seemingly obvious
way for the AI to escape, it won't take it if there isn't a pre-calculated
path. The result is that you may end up creating a much denser path mesh
than you normally would, which results in higher memory costs (especially
since each node contains a list of blocking actors) and higher path
finding cost at runtime. But it's easily implemented in UE3 based games
without changing the engine.
- For best results it's recommended that you avoid making very small
objects block. Stick to the relatively few large objects in your scene or
static breakables. This helps prevent something like a soda can from
confusing the AI.
- Also, if an AI can't "see" an object it will likely run into it and
get stuck. So players/bots should not collide with any object that
can't block a path; for those small objects you have to
just let the AI go through them.
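The sleep/wake bookkeeping described above can be sketched like this. PathBlockers is a hypothetical class, not UE3 or PhysX code. It assumes the caller has already done the spatial work (AABB query plus capsule sweep) and passes in the ids of the paths the sleeping object intersects.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

class PathBlockers {
public:
    // Called when a dynamic object falls asleep. hitPaths holds the ids of
    // the paths whose capsule sweep intersected the object.
    void onObjectSleep(int objectId, const std::vector<int>& hitPaths) {
        onObjectWake(objectId);  // clear stale entries if it was registered before
        for (int path : hitPaths) {
            blockersOfPath_[path].insert(objectId);
        }
        pathsOfObject_[objectId] = hitPaths;
    }

    // Called when the object wakes up: remove it as a blocker everywhere.
    // A path with no blockers left becomes enabled again.
    void onObjectWake(int objectId) {
        auto entry = pathsOfObject_.find(objectId);
        if (entry == pathsOfObject_.end()) return;
        for (int path : entry->second) {
            auto blockers = blockersOfPath_.find(path);
            if (blockers == blockersOfPath_.end()) continue;
            blockers->second.erase(objectId);
            if (blockers->second.empty()) blockersOfPath_.erase(blockers);
        }
        pathsOfObject_.erase(entry);
    }

    bool isPathEnabled(int pathId) const {
        return blockersOfPath_.find(pathId) == blockersOfPath_.end();
    }

private:
    std::unordered_map<int, std::unordered_set<int>> blockersOfPath_;
    std::unordered_map<int, std::vector<int>> pathsOfObject_;
};
```

Keeping a blocker set per path is also where the extra memory cost mentioned above comes from.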
- We also built some levels for UT3 that had breakable walls and used the
technique above for path finding around breakable walls. But this isn't
exactly the behavior you want all the time. Real players will knowingly
shoot objects they know are breakable in order to gain a shortcut. So we
wanted the same behavior in UT3, but we also didn't want the bots to go
through and break all the walls in the first 5 seconds of the game. The
best solution we came up with (as in, it was easy for level designers to
control) was to place some trigger volumes near breakable objects and
walls along paths where it was likely the bot might want to go through
said object. When the trigger was hit we would signal the bot (in UE3 this
is all done in Kismet) to attack a specific object. The result is that
bots would periodically shoot at breakable walls and go through
them without getting overly excited and breaking everything. This of
course is NOT an ideal solution! But it's what worked for us in UT3
without modifying the engine.
- A better approach would be to give each path a heuristic for how
"difficult" it is to cross. For instance a normal path with no
occluders might be 0.0, whereas a path that is blocked by an
unbreakable object might be 1.0, and a path occluded by a breakable
object might be 0.5. This will cause the bots to avoid breaking through
walls unless it significantly reduces their travel time. Then for the
actual shooting we simply gather a list of occluding objects along the
bot's currently chosen path, and as the objects come into visible range
the AI will start shooting at them.
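The text does not say how the difficulty value would enter the path cost, so the following is only one possible sketch. It assumes that a difficulty of 1.0 means "impassable" and that lower values scale the path length by a tunable penalty. Occluder, difficulty, edgeCost and breakPenalty are hypothetical names.

```cpp
#include <limits>

enum class Occluder { None, Breakable, Unbreakable };

// Difficulty values taken from the example above.
double difficulty(Occluder o) {
    switch (o) {
        case Occluder::None:        return 0.0;
        case Occluder::Breakable:   return 0.5;
        case Occluder::Unbreakable: return 1.0;
    }
    return 1.0;  // unreachable for valid enum values
}

// Assumed cost model: unbreakable occluders make the path unusable, and
// other paths cost length * (1 + breakPenalty * difficulty).
double edgeCost(double length, Occluder o, double breakPenalty) {
    double d = difficulty(o);
    if (d >= 1.0) {
        return std::numeric_limits<double>::infinity();
    }
    return length * (1.0 + breakPenalty * d);
}
```

With breakPenalty = 4, a 10 unit shortcut through a breakable wall costs 30, so a bot would still prefer a clear 25 unit detour but would break through the wall if the detour were 50 units long.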
- To ensure repeatable/automatic performance profiling the user is
encouraged to use the UE3 flythrough performance system. Internally we
used the same mechanism coupled with scripts to extract the important
timing information from the UE3 engine. This provides a quick overview of
bottlenecks within the game.
- Additional PhysX specific information can be extracted using the
AGPerfmon tool, which allows more in depth investigation of
bottlenecks.
- AGPerfHUD was useful to find hotspots in real time. The tool allows the
user to see performance bottlenecks in real time while playing a
particular level.
- For rendering bottlenecks we highly recommend NVPerfHUD, which is very
useful to identify and fix rendering bottlenecks by simplifying materials
and tweaking distance culling values.
- While profiling for performance it is important to consider the broken
state, especially if a destructible object was used as an occluder.