Wednesday, March 31, 2010

Utils3D.projectVectors Slow?

If you have tried to implement 3D stuff in Flash without using one of the existing libraries, then it's likely that you are using Utils3D.projectVectors somewhere in your code. This function takes 3D vertices packed in a single Vector<Float> and writes the projected points to a second Vector<Float>, applying the projection and transformation defined by a 4x4 matrix.

It is logical to assume that this function is faster than a manual implementation of a 3D projection, but this doesn't seem to be the case in the current version of Flash Player. 

I implemented the following perspective projection class:

class Projector
{
    // Affine transform applied to every vertex before the perspective divide.
    public var m : Matrix4A;
    // Scale used in the perspective divide.
    public var focusDistance : Float;

    public function new()
    {
        focusDistance = 1000;
    }

    // Reads packed [x, y, z, ...] triples from src and writes
    // packed [x, y, ...] screen pairs into dstVerts.
    public inline function projectVecList(
        src : flash.Vector < Float > ,
        dstVerts : flash.Vector < Float > ) : Void
    {
        var sindex = 0;
        var dindex = 0;
        var n = src.length;

        while ( sindex < n ) {
            var srcx = src[sindex++];
            var srcy = src[sindex++];
            var srcz = src[sindex++];

            // focusDistance divided by the transformed z.
            var zInv = focusDistance /
                (srcx*m.m13 + srcy*m.m23 + srcz*m.m33 + m.m43);

            dstVerts[dindex++] =
                (srcx*m.m11 + srcy*m.m21 + srcz*m.m31 + m.m41) * zInv;

            dstVerts[dindex++] =
                (srcx*m.m12 + srcy*m.m22 + srcz*m.m32 + m.m42) * zInv;
        }
    }
}

Matrix4A represents an affine 4x4 matrix (which means one line of the matrix is always 0, 0, 0, 1: the bottom row in the usual column-vector layout). Multiplication between affine matrices can be optimized to skip a lot of multiplications, and you also save the space of four Float members.
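To make the saving concrete, here is a minimal sketch, written in TypeScript rather than haXe so it can run outside Flash Player. Matrix4A's real internals are not shown in this post, so the `Affine` type and the `mulAffine`, `transformPoint`, `scaling` and `translation` helpers are hypothetical names. The sketch assumes the same m11..m43 naming and row-vector layout that Projector uses, where the fixed 0, 0, 0, 1 is the implied fourth column.

```typescript
// Hypothetical stand-in for Matrix4A: only the 12 free entries are stored.
// Row-vector convention as in Projector: x' = x*m11 + y*m21 + z*m31 + m41,
// so the implied fourth column is (0, 0, 0, 1).
interface Affine {
  m11: number; m12: number; m13: number;
  m21: number; m22: number; m23: number;
  m31: number; m32: number; m33: number;
  m41: number; m42: number; m43: number; // translation
}

// Returns the matrix that applies a first, then b.
// 36 multiplications, versus 64 for a general 4x4 product.
function mulAffine(a: Affine, b: Affine): Affine {
  return {
    m11: a.m11 * b.m11 + a.m12 * b.m21 + a.m13 * b.m31,
    m12: a.m11 * b.m12 + a.m12 * b.m22 + a.m13 * b.m32,
    m13: a.m11 * b.m13 + a.m12 * b.m23 + a.m13 * b.m33,
    m21: a.m21 * b.m11 + a.m22 * b.m21 + a.m23 * b.m31,
    m22: a.m21 * b.m12 + a.m22 * b.m22 + a.m23 * b.m32,
    m23: a.m21 * b.m13 + a.m22 * b.m23 + a.m23 * b.m33,
    m31: a.m31 * b.m11 + a.m32 * b.m21 + a.m33 * b.m31,
    m32: a.m31 * b.m12 + a.m32 * b.m22 + a.m33 * b.m32,
    m33: a.m31 * b.m13 + a.m32 * b.m23 + a.m33 * b.m33,
    // The translation row also picks up b's translation (a's implied m44 is 1).
    m41: a.m41 * b.m11 + a.m42 * b.m21 + a.m43 * b.m31 + b.m41,
    m42: a.m41 * b.m12 + a.m42 * b.m22 + a.m43 * b.m32 + b.m42,
    m43: a.m41 * b.m13 + a.m42 * b.m23 + a.m43 * b.m33 + b.m43,
  };
}

// Transforms one point by an affine matrix (9 multiplications).
function transformPoint(m: Affine, x: number, y: number, z: number): [number, number, number] {
  return [
    x * m.m11 + y * m.m21 + z * m.m31 + m.m41,
    x * m.m12 + y * m.m22 + z * m.m32 + m.m42,
    x * m.m13 + y * m.m23 + z * m.m33 + m.m43,
  ];
}

// Uniform scale by s.
function scaling(s: number): Affine {
  return { m11: s, m12: 0, m13: 0, m21: 0, m22: s, m23: 0,
           m31: 0, m32: 0, m33: s, m41: 0, m42: 0, m43: 0 };
}

// Translation by (tx, ty, tz).
function translation(tx: number, ty: number, tz: number): Affine {
  return { m11: 1, m12: 0, m13: 0, m21: 0, m22: 1, m23: 0,
           m31: 0, m32: 0, m33: 1, m41: tx, m42: ty, m43: tz };
}
```

For example, `mulAffine(scaling(2), translation(1, 2, 3))` scales first and then moves, so the point (1, 1, 1) ends up at (3, 4, 5).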

Granted, this isn't exactly what the Utils3D projection function is doing.

Utils3D.projectVectors uses the Flash 10 Matrix3D class, which is a full 4x4 matrix (without the constraint of being affine). This makes it slower but also gives it a bit of extra power, since it can express the projection itself inside the matrix. I'm not completely in touch with the mathemagics of homogeneous coordinates, but I think there are some other useful tricks in there that might make 4x4 matrices worth it in some cases.

What my Projector class does is first transform the vertices using its matrix, and then scale the resulting x and y coordinates by focusDistance divided by the resulting z.
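The two approaches described above can be compared on a single point. This is a rough sketch in TypeScript (so it runs outside Flash Player), not the actual Utils3D or Matrix3D implementation: `Mat4`, `projectExplicit`, `projectHomogeneous` and `withPerspective` are made-up names, and the matrix is assumed to be 16 numbers in row-major order with the row-vector layout Projector uses.

```typescript
// 4x4 matrix as 16 numbers, row-major, row-vector convention:
// [x y z 1] * M = [x' y' z' w'].
type Mat4 = number[];

// Projector-style: affine transform, then an explicit scale by focusDistance / z'.
function projectExplicit(m: Mat4, focusDistance: number,
                         x: number, y: number, z: number): [number, number] {
  const tx = x * m[0] + y * m[4] + z * m[8] + m[12];
  const ty = x * m[1] + y * m[5] + z * m[9] + m[13];
  const tz = x * m[2] + y * m[6] + z * m[10] + m[14];
  const zInv = focusDistance / tz;
  return [tx * zInv, ty * zInv];
}

// Full 4x4 style: the matrix also produces w', and x' and y' are divided by it.
function projectHomogeneous(m: Mat4, x: number, y: number, z: number): [number, number] {
  const tx = x * m[0] + y * m[4] + z * m[8] + m[12];
  const ty = x * m[1] + y * m[5] + z * m[9] + m[13];
  const tw = x * m[3] + y * m[7] + z * m[11] + m[15];
  return [tx / tw, ty / tw];
}

// Bakes the perspective into the matrix: the fourth column becomes the
// third column divided by focusDistance, so that w' = z' / focusDistance.
function withPerspective(affine: Mat4, focusDistance: number): Mat4 {
  const m = affine.slice();
  m[3] = affine[2] / focusDistance;
  m[7] = affine[6] / focusDistance;
  m[11] = affine[10] / focusDistance;
  m[15] = affine[14] / focusDistance;
  return m;
}

// Example: identity rotation, camera pushed back 1000 units along z.
const view: Mat4 = [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 0, 1000, 1,
];
const explicitResult = projectExplicit(view, 1000, 100, 50, 1000);
const homogeneousResult = projectHomogeneous(withPerspective(view, 1000), 100, 50, 1000);
```

With these numbers the transformed z is 2000, so the explicit version scales by 0.5 and returns (50, 25). The homogeneous version reaches the same point, up to rounding, with the divide folded into the matrix.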

I timed how long it takes to project a vector filled with 100k vertices, first with Utils3D.projectVectors and then with the Projector class:

Utils3D.projectVectors: 1031ms

Projector.projectVecList: 755ms
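The timing code itself is not shown in this post. A harness along the following lines would produce comparable numbers; it is only a sketch in TypeScript (so it runs outside Flash Player), and `timeMs` and `makePackedVertices` are hypothetical helpers, not part of the original test.

```typescript
// Runs fn `runs` times and returns the elapsed wall-clock time in milliseconds.
function timeMs(fn: () => void, runs: number): number {
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    fn();
  }
  return Date.now() - start;
}

// Builds a packed [x, y, z, ...] list holding `count` pseudo-random vertices.
function makePackedVertices(count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    out.push(Math.random() * 200 - 100); // x
    out.push(Math.random() * 200 - 100); // y
    out.push(Math.random() * 200 + 1);   // z, kept positive
  }
  return out;
}
```

Each implementation under test is then wrapped in a closure and passed to `timeMs` with the same vertex list and the same number of runs, so the only difference between measurements is the projection code.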

Projector is faster than the native function! You might say that this shows nothing since it is not doing exactly the same stuff as Utils3D, but the point is that the same projection algorithm implemented in C/C++ should beat my projector by far.

Also, as Andre Michelle has pointed out in this blog entry, the Utils3D.projectVectors API could use a few changes. If the API isn't really faster than writing it yourself, then there's no reason to wait for Adobe to implement such changes.

Anyway, I never really liked having the vertices packed in a Float vector, as required by the Utils3D.projectVectors function. Since I was going to be using my own projection now, I decided to see how performance would be affected if I used Vector<Vec3D> instead of Vector<Float> (packed vertices) as the source of the projection.

So instead of: [ x1, y1, z1, x2, y2, z2, ... ] now I use: [ v1, v2, ... ].
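The Vec3D version of the loop is not shown in this post. A sketch of what reading from vertex objects instead of packed floats might look like follows, in TypeScript so it runs outside Flash Player. The `Vec3D` shape (plain x, y, z fields), the `AffineMatrix` type, and the `projectVecObjects` and `unpack` functions are assumptions for illustration.

```typescript
// Assumed shape of a vertex object; the real Vec3D class is not shown in the post.
interface Vec3D { x: number; y: number; z: number; }

// Assumed stand-in for Matrix4A, with the field names Projector reads.
interface AffineMatrix {
  m11: number; m12: number; m13: number;
  m21: number; m22: number; m23: number;
  m31: number; m32: number; m33: number;
  m41: number; m42: number; m43: number;
}

// Same math as Projector.projectVecList, but the source is [v1, v2, ...]
// rather than [x1, y1, z1, x2, y2, z2, ...]. The output is still packed pairs.
function projectVecObjects(m: AffineMatrix, focusDistance: number,
                           src: Vec3D[], dst: number[]): void {
  let d = 0;
  for (let i = 0; i < src.length; i++) {
    const v = src[i];
    const zInv = focusDistance /
      (v.x * m.m13 + v.y * m.m23 + v.z * m.m33 + m.m43);
    dst[d++] = (v.x * m.m11 + v.y * m.m21 + v.z * m.m31 + m.m41) * zInv;
    dst[d++] = (v.x * m.m12 + v.y * m.m22 + v.z * m.m32 + m.m42) * zInv;
  }
}

// Converts a packed float list into vertex objects, e.g. to migrate old data.
function unpack(packed: number[]): Vec3D[] {
  const out: Vec3D[] = [];
  for (let i = 0; i + 2 < packed.length; i += 3) {
    out.push({ x: packed[i], y: packed[i + 1], z: packed[i + 2] });
  }
  return out;
}
```

The object layout means one indexed read per vertex followed by three field reads, instead of three indexed reads. Whether that explains the timing difference measured here is a guess; the sketch only shows the change in data layout.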

Old times were:
Utils3D.projectVectors: 1031ms
Projector.projectVecList Vector<Float>: 755ms

New time is:
Projector.projectVecList Vector<Vec3D>: 631ms

It's even faster! It takes roughly 60% of the time Utils3D needs to finish.

So in conclusion:

From now on I'll project the vertices manually, and I'll do it by receiving a Vector of vertices and not a Vector of Floats with vertices packed in them.

Run the test here.


Update:

I made some tests using haXe's flash.Memory API. I pack the source vertices in a ByteArray, much as they are packed in the Vector<Float> version.

I tested storing the vertex components as Double (8 bytes) and as Float (4 bytes). The result with Double storage was a bit faster than the Vector<Float> implementation but slower than the Vector<Vec3D> one.

Storing the components as Float turned out to be faster than any of the other methods, although the difference between it and the Vector<Vec3D> version was very small, and a lot of precision is lost in the conversion to 4-byte values.
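The precision loss is easy to see by round-tripping a value through 4-byte storage. The sketch below is TypeScript (so it runs outside Flash Player) and uses a DataView in place of a ByteArray, assuming both store IEEE 754 single and double precision values. `roundTrip32`, `roundTrip64` and `maxError32` are made-up helper names.

```typescript
// Writes v into a 4-byte IEEE 754 slot and reads it back.
function roundTrip32(v: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, v);
  return view.getFloat32(0);
}

// Writes v into an 8-byte IEEE 754 slot and reads it back (lossless for doubles).
function roundTrip64(v: number): number {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, v);
  return view.getFloat64(0);
}

// Largest absolute error that 4-byte storage introduces over a packed vertex list.
function maxError32(packed: number[]): number {
  let worst = 0;
  for (let i = 0; i < packed.length; i++) {
    const err = Math.abs(roundTrip32(packed[i]) - packed[i]);
    if (err > worst) {
      worst = err;
    }
  }
  return worst;
}
```

For instance, `roundTrip64(0.1)` returns 0.1 unchanged, while `roundTrip32(0.1)` comes back as 0.10000000149011612. Integers above 2^24 are affected too: `roundTrip32(16777217)` returns 16777216.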


Utils3D.projectVectors: 1071ms
Projector Vector<Float>: 766ms
Projector Vector<Vec3D>: 642ms
ByteArray Double: 733ms
ByteArray Float: 599ms

I think I'll stay with the Vector<Vec3D> implementation for now, since I'm not very keen on sacrificing numeric precision or usability for such a small increase in performance.

1 comment:

  1. 7k+ for projectVectors.
    Others never finished(15 sec restriction).
