Properly scaling Point Sprites in DirectX

When rendering point sprites with DirectX, many people deals with the issue of point sprite scaling. There are many options regarding this, but in the 99% of the cases, people just wants one of the following two things:

1.- Point Sprites with a constant screen-space size

Ver imagen en tamaño completo

This sprites are rendered at the 3D position (projected to the viewport of course), but with a constant size, in pixels. So, if you move the camera towards the point sprite, it feels like if it shrinks, and if you move the camera away from the sprite, it will look like if “growing”.

This effect is sometimes desirable, specially if you are using point sprites to simulate 2D effects, post-processing or camera effects like Lens Flares (on your left).

To setup this kind of Point Sprite rendering in DirectX, you just need to disable PointSprite scaling. This way, you will be able to specify point sizes both with a default value or in each point vertex information. These are the render states you need to modify:

Disable point scaling: D3DRS_POINTSCALEENABLE = false

Set default point size (remember, in pixels): D3DRS_POINTSIZE = XX

2.- Point Sprites with a real 3D size

By contrary to what’s shown in the previous chapter, sometimes you need point sprites to behave like real 3D objects, i.e. when using them to simulate a particle system, or that kind of stuff.

In this case, you don’t want sprites to remain at the same size if you move away from them, as real 3D objects get smaller as we get farther from them and vice versa. It is obvious that we need to change the size of sprites (remember, it’s a 2D size, in pixels), accordingly to our 3D movements, to give the illusion of 3D behavior.

DirectX is already prepared for that, with a built-in system to handle point sprite scaling. You just need to activate it setting the render state D3DRS_POINTSCALEENABLE to true, and setting up some other parameters. If you take a look at the docs, you will find the following:

Point Size Computations

Point size is determined by D3DRS_POINTSCALEENABLE. If this value is set to FALSE, the application-specified point size is used as the screen-space (post-transformed) size. Vertices that are passed to Direct3D in screen space do not have point sizes computed; the specified point size is interpreted as screen-space size.

If D3DRS_POINTSCALEENABLE is TRUE, Direct3D computes the screen-space point size according to the following formula. The application-specified point size is expressed in camera space units.

S s = Vh * S i * sqrt(1/(A + B * D e + C *( D e2 )))

In this formula, the input point size, S i, is either per-vertex or the value of the D3DRS_POINTSIZE render state. The point scale factors, D3DRS_POINTSCALE_A, D3DRS_POINTSCALE_B, and D3DRS_POINTSCALE_C, are represented by the points A, B, and C. The height of the viewport, V h, is the D3DVIEWPORT9 structure's Height member representing the viewport. D e, the distance from the eye to the position (the eye at the origin), is computed by taking the eye space position of the point (Xe, Ye, Ze) and performing the following operation.

D e = sqrt (Xe2 + Y e2 + Z e2)

The maximum point size, Pmax, is determined by taking the smaller of either the D3DCAPS9 structure's MaxPointSize member or the D3DRS_POINTSIZE_MAX render state. The minimum point size, Pmin, is determined by querying the value of D3DRS_POINTSIZE_MIN. Thus the final screen-space point size, S, is determined in the following manner.

  • If Ss > Pmax, then S = P max
  • if S < Pmin, then S = P min
  • Otherwise, S = S s

Well, that´s all. Clear ugh? Just kidding…

Setting the A,B,C scaling parameters for a correct, 3D, point scaling

In order to achieve the behavior you want, you just have to set the following configuration:

D3DRS_POINTSCALE_A = 0.0f
D3DRS_POINTSCALE_B = 0.0f
D3DRS_POINTSCALE_C = 1.0f
D3DRS_POINTSIZE_MIN = 0.0f
D3DRS_POINTSIZE_MAX = 9999.0f

Et voilá ! 3D point sprites…

Ultimas apariciones en prensa de Simax

SimaxReflect_256x128Simax ha aparecido últimamente en prensa y radio, a raíz del galardón en los premios Bancaja, e interesándose especialmente por nuestros productos para formación en conducción ecológica. Ahí van los links:

La razón:

http://www.larazon.es/noticia/conduccion-ecologica-virtual

Punto Radio (con audio de la entrevista):

http://actual.lasprovincias.es/jovenesemprendedores/etiquetas/inaki-ayucar/

Saludos.

Dell XPS 630i, the BT Mini-Receiver, download managers, and other crap…

If you are one of the owners of a Dell computer, you´ll probably be happy with it, like me. However, nothing´s perfect, and of course dell is not either.

One of the things I hate most on Earth is that too-usual way of managing download from websites. Every company has it’s own download manager, and in a few months you end up with a dozen of them installed on your computer or on your browser: Adobe Update Manager, Dell Download Manager, Apple Update Manager, Google updater, Aaaaarrgggllll !!!! I just want to download the f**ing file !!!!! Stop selling me your crap !!!!

Well, back to business. If you own an XPS 630i (and probably other Dell models), and you don’t install all the crap that comes with your system (Dell tune up utility, Dell application, Dell recovery, Dell diagnostics, Dell sandwich maker and the Dell fantastasystem), or if you simply lost your original CD drivers, you will probably miss one: the driver for the BlueTooth Mini-Receiver.

It is not initially listed in the default downloads list at www.dell.com for the 630i (a very deficient list, I must say. Is it that difficult to include a small description of the purpose of each application or update?), and believe me, it’s not easy to find.

I also tried to use one of the magnificent Dell helper-applications to find it, but the one it suggested didn’t work. Finally, I found it here, thanks to the people of My630i.

Thanks a lot!

Ramblings about your own, fast C# Maths library

Why not building your own C# Maths library instead of using the ones included in APIs like DirectX or XNA? That would give you some advantages when developing a game or something similar. Keep reading…
It is late at night. Very late. In fact it is so late, that I should say it’s almost early in the morning… but I´m immersed in a task I have delayed for too much time: extend my own Math library, directly written in C#. What you said? That cannot be fast enough? ha ha… Some FAQs first:

Intro

Why to write my own library?

Because I think it’s good to depend on the Graphics API as less as possible. Why?
  • Because APIs are deprecated from time to time (as happened with Managed DirectX), or evolve, or change, or re-appear in some form (as it seems that is going to happen with some parts of Managed DirectX, via the Windows 7 Code Pack), or you just decide to use another API, or you want your code to be compatible both with XNA and with DirectX.
  • Because I want to be able to debug the inners of my Maths.
  • Because writing your own library (even if you copy most parts of it), you will learn a lot.
  • Because seeing the internals of things, you will realize how much work some tasks take, and you will think it twice before calling them…
Do you need any more reasons?

Wait a minute. Faster code, using no API?

For CPU-related stuff, YES, it is possible.
XNA (or DirectX) makes no magic. They just calculate things as you will do. But of course you must choose the best algorithms for each task.If you do so, you will end up with a code as fast as the one written in the API, but the point is, as you will know how things work in the inners, you will have a much more detailed idea of the cost of things, and you will optimize your code. In addition to that, you will be able to tune things a bit for your specific needs, increasing the speed a bit more.
However, keep in mind that companies like Microsoft have hundreds or thousands of the best engineers working in their products, so if there is a better method of doing something, the API will have it. You need to be that good too!
Let’s see an example of the kind of optimization you need to achieve to make your own Maths library as fast (or faster) than the ones included in the APIs. We will include some matrix operations and make some speed tests with different algorithms.

A note about Affine Transformation Matrices

In 3D graphics applications, transformations are typically built from the combination of elementary matrices (being this elementary matrices: translations, rotations, scales and the identity matrix). Matrices built this way, always result in Affine transformations (matrices with the right column of the matrix being 0,0,0,1 –if using row representation of matrices-).
There are some algorithm optimizations in this article that are only valid for Affine Transformation matrices. You should avoid using them for non-affine matrices. It is the case of Perspective Projection matrices, which DO NOT BELONG to the affine transformations group.

A note about JIT compilation and .NET maths performance

This is a very interesting question: Are local variables faster than member variables?
At first, one could say that they aren’t, as they have to be re-created each time the method is called. BUT, as this interesting blog post points out, while the JIT compiler can en-register local variables, it cannot do the same with member fields. So we should say that YES, THEY ARE, when talking about math calculations. As you can see the in the test, it makes a remarkable difference. So:
If you are developing in .Net and you have a performance critical math. calculation method, you should use local copy of variables to speed up calculations
In fact, if you get a glimpse inside the XNA code, every non-static method inside the Matrix class, which is performance critical, makes a local copy of each member field.

Part #1 Matrix Determinant Calculation

The general Method (Based on Graphics Gems I)

The following is a pretty straight-forward algorithm for calculating the determinant of a matrix. It is mostly designed for non-realtime applications, as it quite slow and uses double-precission. It calculates a 4x4 determinant basing on the 3x3 method, then in the 2x2 and so on. It is a based in an algorithm used in Graphics Gems I for matrix inversion.
        public double Determinant4x4()
        {
            double ans;
            double a1, a2, a3, a4, b1, b2, b3, b4, c1, c2, c3, c4, d1, d2, d3, d4;
 
            a1 = this.M11; b1 = this.M12;
            c1 = this.M13; d1 = this.M14;
 
            a2 = this.M21; b2 = this.M22;
            c2 = this.M23; d2 = this.M24;
 
            a3 = this.M31; b3 = this.M32;
            c3 = this.M33; d3 = this.M34;
 
            a4 = this.M41; b4 = this.M42;
            c4 = this.M43; d4 = this.M44;
 
            ans = a1 * Determinant3x3(b2, b3, b4, c2, c3, c4, d2, d3, d4)
                - b1 * Determinant3x3(a2, a3, a4, c2, c3, c4, d2, d3, d4)
                + c1 * Determinant3x3(a2, a3, a4, b2, b3, b4, d2, d3, d4)
                - d1 * Determinant3x3(a2, a3, a4, b2, b3, b4, c2, c3, c4);
            return ans;
        }      
        public double Determinant3x3(double  a1,double  a2,double  a3,double  b1,double  b2,double  b3,double  c1,double  c2,double  c3 )
        {
            double ans;
            ans = a1 * Determinant2x2(b2, b3, c2, c3)
                - b1 * Determinant2x2(a2, a3, c2, c3)
                + c1 * Determinant2x2(a2, a3, b2, b3);
            return ans;
        }
        public double Determinant2x2( double a,double  b, double c, double d)
        {
            double ans;
            ans = a * d - b * c;
            return ans;
        }

The general Method (real-time applications optimized)

This second version is also a full, general method, but optimized for real-time applications. It is much faster, because of the calculation and because of the grouping of duplicated mults in numXX variables, saving multiplications and adds.
       public float Determinant()
        {
            float num18 = (M33 * M44) - (M34 * M43);
            float num17 = (M32 * M44) - (M34 * M42);
            float num16 = (M32 * M43) - (M33 * M42);
            float num15 = (M31 * M44) - (M34 * M41);
            float num14 = (M31 * M43) - (M33 * M41);
            float num13 = (M31 * M42) - (M32 * M41);
            return ((((M11 * (((M22 * num18) - (M23 * num17)) + (M24 * num16))) - (M12 * (((M21 * num18) -
                   (M23 * num15)) + (M24 * num14)))) + (M13 * (((M21 * num17) - (M22 * num15)) +
                   (M24 * num13)))) - (M14 * (((M21 * num16) - (M22 * num14)) + (M23 * num13))));
        }
This is the one used in XNA or SlimDX. It is the standard de-facto.

Optimization I (faster Memory Accessing)

As stated in the intro, about the performance of local variables vs. member fields, we´ll make an optimization of the above method using this concept:
    public float Determinant()
        {
            float L11 = this.M11;
            float L12 = this.M12;
            float L13 = this.M13;
            float L14 = this.M14;
            float L21 = this.M21;
            float L22 = this.M22;
            float L23 = this.M23;
            float L24 = this.M24;
            float L31 = this.M31;
            float L32 = this.M32;
            float L33 = this.M33;
            float L34 = this.M34;
            float L41 = this.M41;
            float L42 = this.M42;
            float L43 = this.M43;
            float L44 = this.M44;
            float C1 = (L33 * L44) - (L34 * L43);
            float C2 = (L32 * L44) - (L34 * L42);
            float C3 = (L32 * L43) - (L33 * L42);
            float C4 = (L31 * L44) - (L34 * L41);
            float C5 = (L31 * L43) - (L33 * L41);
            float C6 = (L31 * L42) - (L32 * L41);
            return ((((L11 * (((L22 * C1) - (L23 * C2)) + (L24 * C3))) - (L12 * (((L21 * C1) -
                   (L23 * C4)) + (L24 * C5)))) + (L13 * (((L21 * C2) - (L22 * C4)) + (L24 * C6)))) -
                   (L14 * (((L21 * C3) - (L22 * C5)) + (L23 * C6))));
        }
See the test for performance results.

Optimization II (Based on Graphics Gems II, for Affine transformations only)

The following algorithm is a direct translation of a piece of code found in a Affine Matrix Inversion algorithm, inside Graphics Gems II (see in the next post for the full algorithm). It makes only 12 mults and 7 adds. Keep in mind that this method is only valid for affine transformations (see intro for further information):
        public float Determinant()
        {
            return (this.M11 * this.M22 * this.M33) +
                   (this.M12 * this.M23 * this.M31) +
                   (this.M13 * this.M21 * this.M32) -
                   (this.M13 * this.M22 * this.M31) -
                   (this.M12 * this.M21 * this.M33) -
                   (this.M11 * this.M23 * this.M32);
        }

 

The test

So, it’s time to measure this little boys.

Performance

The test was made calculation a 4x4 affine matrix determinant 100 Million times, in a Intel Core 2 Quad Q9450 2.66Ghz with 3 Gb (non multi-threading).
General method (Graphics Gems I, non real-time)
1 minute: 0.9 secs
General method (real-time)
2.6832 secs
Optimization I (memory access)
2.496 secs
Optimization II (affine transformations)
1.404 secs

Reliability

The reliability test was made with two different matrices.There is no surprises here, all the algorithms work well. The following is the matrices used for the tests (affine transformations) and the results (the determinant) calculated by each algorithm.
Matrix 1:
0.7071068 -0.235702276 0.6666667 0.0
0.0 0.9428091 0.33333334 0.0
-0.7071068 -0.235702276 0.6666667 0.0
0.0 0.0 -15.0 1
General method (Graphics Gems I)
1.0000001641611829
General method (real-time)
1.00000012
Optimization I (memory access)
1.00000012
Optimization II (affine transformations)
1.00000012

Matrix 2:
9.848019 0.10061159 0.0 0.0
-3.50578356 0.282625765 0.0 0.0
0.0 0.0 94.22 0.0
0.0 0.0 0.0 1
General method (Graphics Gems I)
295.47639778127973
General method (real-time)
295.4764
Optimization I (memory access)
295.4764
Optimization II (affine transformations)
295.4764

Part #2: Matrix Inverse Calculation

One of the most expensive Matrix operation is calculating the inverse of a matrix. Will start with the general algorithm for this, and study a couple of optimizations later:

The general Method

This algorithm (explained here), is something like this:
        public Matrix Inverse()
        {
            float determinant = this.Determinant();
            if (determinant == 0)
                throw new System.InvalidOperationException("Matrix has zero determinant (singular matrix). Inverse doesn´t exist");
 
            Matrix b = new Matrix();
            b.M11 = M22 * M33 * M44 + M23 * M34 * M42 + M24 * M32 * M43 - M22 * M34 * M43 - M23 * M32 * M44 - M24 * M33 * M42;
            b.M12 = M12 * M34 * M43 + M13 * M32 * M44 + M14 * M33 * M42 - M12 * M33 * M44 - M13 * M34 * M42 - M14 * M32 * M43;
            b.M13 = M12 * M23 * M44 + M13 * M24 * M42 + M14 * M22 * M43 - M12 * M24 * M43 - M13 * M22 * M44 - M14 * M23 * M42;
            b.M14 = M12 * M24 * M33 + M13 * M22 * M34 + M14 * M23 * M32 - M12 * M23 * M34 - M13 * M24 * M32 - M14 * M22 * M33;
 
            b.M21 = M21 * M34 * M43 + M23 * M31 * M44 + M24 * M33 * M41 - M21 * M33 * M44 - M23 * M34 * M41 - M24 * M31 * M43;
            b.M22 = M11 * M33 * M44 + M13 * M34 * M41 + M14 * M31 * M43 - M11 * M34 * M43 - M13 * M31 * M44 - M14 * M33 * M41;
            b.M23 = M11 * M24 * M43 + M13 * M21 * M44 + M14 * M23 * M41 - M11 * M23 * M44 - M13 * M24 * M41 - M14 * M21 * M43;
            b.M24 = M11 * M23 * M34 + M13 * M24 * M31 + M14 * M21 * M33 - M11 * M24 * M33 - M13 * M21 * M34 - M14 * M23 * M31;
 
            b.M31 = M21 * M32 * M44 + M22 * M34 * M41 + M24 * M31 * M42 - M21 * M34 * M42 - M22 * M31 * M44 - M24 * M32 * M41;
            b.M32 = M11 * M34 * M42 + M12 * M31 * M44 + M14 * M32 * M41 - M11 * M32 * M44 - M12 * M34 * M41 - M14 * M31 * M42;
            b.M33 = M11 * M22 * M44 + M12 * M24 * M41 + M14 * M21 * M42 - M11 * M24 * M42 - M12 * M21 * M44 - M14 * M22 * M41;
            b.M34 = M11 * M24 * M32 + M12 * M21 * M34 + M14 * M22 * M31 - M11 * M22 * M34 - M12 * M24 * M31 - M14 * M21 * M32;
 
            b.M41 = M21 * M33 * M42 + M22 * M31 * M43 + M23 * M32 * M41 - M21 * M32 * M43 - M22 * M33 * M41 - M23 * M31 * M42;
            b.M42 = M11 * M32 * M43 + M12 * M33 * M41 + M13 * M31 * M42 - M11 * M33 * M42 - M12 * M31 * M43 - M13 * M32 * M41;
            b.M43 = M11 * M23 * M42 + M12 * M21 * M43 + M13 * M22 * M41 - M11 * M22 * M43 - M12 * M23 * M41 - M13 * M21 * M42;
            b.M44 = M11 * M22 * M33 + M12 * M23 * M31 + M13 * M21 * M32 - M11 * M23 * M32 - M12 * M21 * M33 - M13 * M22 * M31;
 
            // This last operation involves a float * Matrix multiplication
            return (1f / determinant) * b;
        }
You can use any of the methods presented in Part 1 of this article for calculating the Determinant.

Optimization I (memory access and some operation savings)

This method applies the same concept, but using local variables and packing duplicated calculations in local variables too:
        public static Matrix InvertXNALocalValues(Matrix matrix)
        {
            Matrix outMatrix = new Matrix();
            float L1 = matrix.M11;
            float L2 = matrix.M12;
            float L3 = matrix.M13;
            float L4 = matrix.M14;
            float L5 = matrix.M21;
            float L6 = matrix.M22;
            float L7 = matrix.M23;
            float L8 = matrix.M24;
            float L9 = matrix.M31;
            float L10 = matrix.M32;
            float L11 = matrix.M33;
            float L12 = matrix.M34;
            float L13 = matrix.M41;
            float L14 = matrix.M42;
            float L15 = matrix.M43;
            float L16 = matrix.M44;
            float L17 = (L11 * L16) - (L12 * L15);
            float L18 = (L10 * L16) - (L12 * L14);
            float L19 = (L10 * L15) - (L11 * L14);
            float L20 = (L9 * L16) - (L12 * L13);
            float L21 = (L9 * L15) - (L11 * L13);
            float L22 = (L9 * L14) - (L10 * L13);
            float L23 = ((L6 * L17) - (L7 * L18)) + (L8 * L19);
            float L24 = -(((L5 * L17) - (L7 * L20)) + (L8 * L21));
            float L25 = ((L5 * L18) - (L6 * L20)) + (L8 * L22);
            float L26 = -(((L5 * L19) - (L6 * L21)) + (L7 * L22));
            float L27 = 1f / ((((L1 * L23) + (L2 * L24)) + (L3 * L25)) + (L4 * L26));
            float L28 = (L7 * L16) - (L8 * L15);
            float L29 = (L6 * L16) - (L8 * L14);
            float L30 = (L6 * L15) - (L7 * L14);
            float L31 = (L5 * L16) - (L8 * L13);
            float L32 = (L5 * L15) - (L7 * L13);
            float L33 = (L5 * L14) - (L6 * L13);
            float L34 = (L7 * L12) - (L8 * L11);
            float L35 = (L6 * L12) - (L8 * L10);
            float L36 = (L6 * L11) - (L7 * L10);
            float L37 = (L5 * L12) - (L8 * L9);
            float L38 = (L5 * L11) - (L7 * L9);
            float L39 = (L5 * L10) - (L6 * L9);
            outMatrix.M11 = L23 * L27;
            outMatrix.M21 = L24 * L27;
            outMatrix.M31 = L25 * L27;
            outMatrix.M41 = L26 * L27;
            outMatrix.M12 = -(((L2 * L17) - (L3 * L18)) + (L4 * L19)) * L27;
            outMatrix.M22 = (((L1 * L17) - (L3 * L20)) + (L4 * L21)) * L27;
            outMatrix.M32 = -(((L1 * L18) - (L2 * L20)) + (L4 * L22)) * L27;
            outMatrix.M42 = (((L1 * L19) - (L2 * L21)) + (L3 * L22)) * L27;         
            outMatrix.M13 = (((L2 * L28) - (L3 * L29)) + (L4 * L30)) * L27;
            outMatrix.M23 = -(((L1 * L28) - (L3 * L31)) + (L4 * L32)) * L27;
            outMatrix.M33 = (((L1 * L29) - (L2 * L31)) + (L4 * L33)) * L27;
            outMatrix.M43 = -(((L1 * L30) - (L2 * L32)) + (L3 * L33)) * L27;         
            outMatrix.M14 = -(((L2 * L34) - (L3 * L35)) + (L4 * L36)) * L27;
            outMatrix.M24 = (((L1 * L34) - (L3 * L37)) + (L4 * L38)) * L27;
            outMatrix.M34 = -(((L1 * L35) - (L2 * L37)) + (L4 * L39)) * L27;
            outMatrix.M44 = (((L1 * L36) - (L2 * L38)) + (L3 * L39)) * L27;
            return outMatrix;
        }

Optimization II (for Affine Transformations only -Based on the Kevin Wu article, Graphics Gems II-)

The Graphics Gems books series are a must in any graphics or game developer’s bookshelves. You can by the the book I’m talking about here, or get more info about the Graphics Gems series here. Google have some excerpts of this book in the following link, but one page out of each two is missing: Graphics gems II‎ - Página 342.
So, the algorithm purposed by Mr. Wu looks like the following (ported to C#, original in C++):
   public static bool Inverse(Matrix min, Matrix mout)
        {
            float det_1;
            float pos, neg, temp;
            double PRECISION_LIMIT = 1.0e-15;
            // Calculate determinand
            pos = neg = 0.0f;
            temp = min.M11 * min.M22 * min.M33;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = min.M12 * min.M23 * min.M31;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = min.M13 * min.M21 * min.M32;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = -min.M13 * min.M22 * min.M31;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = -min.M12 * min.M21 * min.M33;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = -min.M11 * min.M23 * min.M32;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            det_1 = pos + neg;
 
            // Is the submatrix A singular? then Matrix M has no inverse
            if ((det_1 == 0.0) || (Math.Abs(det_1 / (pos - neg)) < PRECISION_LIMIT))
                throw new System.InvalidOperationException("Matrix has zero determinant (singular matrix). Inverse doesn´t exist");
 
            // Calculate inverse(A) = adj(A) / det(A)
            det_1 = 1.0f / det_1;
            mout.M11 = (min.M22 * min.M33 - min.M23 * min.M32) * det_1;
            mout.M21 = -(min.M21 * min.M33 - min.M23 * min.M31) * det_1;
            mout.M31 = (min.M21 * min.M32 - min.M22 * min.M31) * det_1;
            mout.M12 = -(min.M12 * min.M33 - min.M13 * min.M32) * det_1;
            mout.M22 = (min.M11 * min.M33 - min.M13 * min.M31) * det_1;
            mout.M32 = -(min.M11 * min.M32 - min.M12 * min.M31) * det_1;
            mout.M13 = (min.M12 * min.M23 - min.M13 * min.M22) * det_1;
            mout.M23 = -(min.M11 * min.M23 - min.M13 * min.M21) * det_1;
            mout.M33 = (min.M11 * min.M22 - min.M12 * min.M21) * det_1;
 
            /* Calculate -C * inverse(A) */
            mout.M41 = -(min.M41 * mout.M11 + min.M42 * mout.M21 + min.M43 * mout.M31);
            mout.M42 = -(min.M41 * mout.M12 + min.M42 * mout.M22 + min.M43 * mout.M32);
            mout.M43 = -(min.M41 * mout.M13 + min.M42 * mout.M23 + min.M43 * mout.M33);
 
            /* Fill in last column */
            mout.M14 = mout.M24 = mout.M34 = 0.0f;
            mout.M44 = 1.0f;
            return true;
        }
Please keep in mind that this algorithm is only valid for affine transformations. Of course, some memory accessing optimization could be done here, using local variables as explained in Part 1.

Optimization III (for matrices composed by Rotations and Translations only)

If you know that your matrix is composed by rotations and translations only, you can apply this (even faster) method, which avoids calculating the determinant, and takes advantage of some properties of this kind of matrices. You can read all the info here.
        public Matrix Inverse()
        {
            Matrix r = new Matrix();
            r.M11 = this.M11;
            r.M12 = this.M21;
            r.M13 = this.M31;
            r.M14 = 0f;
 
            r.M21 = this.M12;
            r.M22 = this.M22;
            r.M23 = this.M32;
            r.M24 = 0f;
 
            r.M31 = this.M13;
            r.M32 = this.M23;
            r.M33 = this.M33;
            r.M34 = 0f;
 
            r.M41 = -(r.M11 * this.M41 + r.M21 * this.M42 + r.M31 * this.M43);
            r.M42 = -(r.M12 * this.M41 + r.M22 * this.M42 + r.M32 * this.M43);
            r.M43 = -(r.M13 * this.M41 + r.M23 * this.M42 + r.M33 * this.M43);
            r.M44 = 1f;
            return r;
        }

Additional Optimizations

If you need it, other optimizations could be included in this algorithms, detecting special matrices whose inverse can be calculated even faster. It is the case of diagonal matrices (where only the diagonal should be inverted using 1/value entries), or Orthogonal matrices (whose inverse is equal to its transpose).

The Test

We made a very simple test, with 50.000.000 (50 Million) inverse calculations of affine transformations, in an Intel Core 2 Quad Q9450 2.66Ghz with 3 Gb (non multi-threading).
General method
11.523 secs
Optimization I (memory accessing and op. savings)
5.4912 secs
Optimization II (affine transformations only)
2.7144 secs
Optimization III (rotations and translations only)
1.326 secs

Conclusions

Searching a bit, it is indeed possible to reach the performance of your graphics API and as mentioned above, the advantages of having (and building) your own library are countless.
Take care!

2D Circle Packing algorithm ported to C#

Yesterday, a friend asked me if I knew of any C# implementation of a Circle Packing algorithm. In fact, I didn’t, but searching a bit I found this algorithm, and this Java Applet implementation
Packing of different circles into a 2D space, trying to minimize the unused space, is a typical geometrical problem. Humans can solve it quite quickly, but it is very hard to find a solution mathematically. The most frequent implementations use numeric algorithms that get closer to the solution in each iteration.
It is the case of the above algorithm. It doesn’t implement the limits of the available space, but reorders the circles around a center point quite well. The “flowing” nature of numerical algorithms like these fit specially well if you want to use them for some kind of graphical interface, as you can change the adaptation (or reordering) speed, giving a very nice animated result, or you can even interact with your circles, as moving some of them will make the algorithm to react adapting others (you can try the above applet).
So, with all that material, it was easy to port it to C#. The result is the following:
image
The code for a very easy Circle class (please note that I´m using the XNA framework for the maths –Vector2, etc-):
public class Circle
    {
        public Vector2 mCenter;
        public float mRadius;
 
        /// <summary>
        ///
        /// </summary>
        /// <param name="iCenter"></param>
        /// <param name="Radius"></param>
        public Circle(Vector2 iCenter, float Radius)
        {
            mCenter = iCenter;
            mRadius = Radius;
        }
        /// <summary>
        ///
        /// </summary>
        /// <returns></returns>
        public override string ToString()
        {
            return "Rad: " + mRadius + " _ Center: " + mCenter.ToString();
        }     
    }
And the important class. CirclePacker:
This class holds a list of circles that will be re-arranged around a “PackingCenter” point, with a “MinSeparation” between them. You only need to iteratively call the OnFrameMove method, passing an “iterationCounter” parameter, that will hold the damping on the adaptation speed (the bigger value, the slower adaptation). Reset this parameter to 1 always you want to reset speed (never set this parameter to 0).
  public class CirclePacker
    {
        public List<Circle> mCircles = new List<Circle>();
        public Circle mDraggingCircle = null;
        protected Vector2 mPackingCenter;
        public float mMinSeparation = 1f;
 
        /// <summary>
        /// Generates a number of Packing circles in the constructor.
        /// Random distribution is linear
        /// </summary>
        public CirclePacker(Vector2 pPackingCenter, int pNumCircles,
                            double pMinRadius, double pMaxRadius)
        {
            this.mPackingCenter = pPackingCenter;
 
            // Create random circles
            this.mCircles.Clear();
            Random Rnd = new Random(System.DateTime.Now.Millisecond);
            for (int i = 0; i < pNumCircles; i++)
            {
                Vector2 nCenter = new Vector2((float)(this.mPackingCenter.X +
                                                      Rnd.NextDouble() * pMinRadius),
                                              (float)(this.mPackingCenter.Y +
                                                      Rnd.NextDouble() * pMinRadius));
                float nRadius = (float)(pMinRadius + Rnd.NextDouble() *
                                       (pMaxRadius - pMinRadius));
                this.mCircles.Add(new Circle(nCenter, nRadius));
            }
        }
        /// <summary>
        ///
        /// </summary>
        /// <param name="?"></param>
        /// <returns></returns>
        private float DistanceToCenterSq(Circle pCircle)
        {
            return (pCircle.mCenter - mPackingCenter).LengthSquared();
        }
        /// <summary>
        ///
        /// </summary>
        private int Comparer(Circle p1, Circle P2)
        {
            float d1 = DistanceToCenterSq(p1);
            float d2 = DistanceToCenterSq(P2);
            if (d1 < d2)
                return 1;
            else if (d1 > d2)
                return -1;
            else return 0;
        }
        /// <summary>
        ///
        /// </summary>
        public void OnFrameMove(long iterationCounter)
        {
            // Sort circles based on the distance to center
            mCircles.Sort(Comparer);
 
            float minSeparationSq = mMinSeparation * mMinSeparation;
            for (int i = 0; i < mCircles.Count - 1; i++)
            {
                for (int j = i + 1; j < mCircles.Count; j++)
                {
                    if (i == j)
                        continue;
 
                    Vector2 AB = mCircles[j].mCenter - mCircles[i].mCenter;
                    float r = mCircles[i].mRadius + mCircles[j].mRadius;
 
                    // Length squared = (dx * dx) + (dy * dy);
                    float d = AB.LengthSquared() - minSeparationSq;
                    float minSepSq = Math.Min(d, minSeparationSq);
                    d -= minSepSq;
 
                    if (d < (r * r) - 0.01 )
                    {
                        AB.Normalize();
 
                        AB *= (float)((r - Math.Sqrt(d)) * 0.5f);
 
                        if (mCircles[j] != mDraggingCircle)
                            mCircles[j].mCenter += AB;
                        if (mCircles[i] != mDraggingCircle)
                            mCircles[i].mCenter -= AB;
                    }
 
                }
            }
 
 
            float damping = 0.1f / (float)(iterationCounter);
            for (int i = 0; i < mCircles.Count; i++)
            {
                if (mCircles[i] != mDraggingCircle)
                {
                    Vector2 v = mCircles[i].mCenter - this.mPackingCenter;
                    v *= damping;
                    mCircles[i].mCenter -= v;
                }
            }
        }       
        /// <summary>
        ///
        /// </summary>
        public void OnMouseDown(MouseEventArgs e)
        {
            Vector2 center = new Vector2(e.X, e.Y);
            mDraggingCircle = null;
            foreach (Circle circle in mCircles)
            {
                float dist = (circle.mCenter - center).LengthSquared();
                if (dist < (circle.mRadius * circle.mRadius))
                {
                    mDraggingCircle = circle;
                    break;
                }
            }           
        }
        /// <summary>
        ///
        /// </summary>
        public void OnMouseMove(MouseEventArgs e)
        {
            if (mDraggingCircle == null)
                return;
 
            mDraggingCircle.mCenter = new Vector2(e.X, e.Y);
        }
    }

 

To do

A variable speed system could be done, and a way to interactively change the PackingCenter point would be nice too. It would also be necessary to implement the limits of the available space, changing the radius of the circles proportionally if they don´t fit.
Cheers!