Writing a correct Main Loop for your application

Have you ever wondered how can Windows run more than one application at a time, even in a single-processor machine? The answer is easy. It doesn’t. The point is that it switches from one application to another so fast that you don´t notice the change, and it seems both applications are running in parallel.
To allow this, running Windows applications should not always be running.
It´d be a waste of resources to massively use the processor just to find out that nothing had to be done, don’t you think?. That´d be kind of a psychotic behavior. Windows has it´s protocols and procedures to avoid this, letting applications or threads enter idle states, even for a few milliseconds, and giving then priority to other, non-idling processes.
So, what would happen if your application never enters idle mode? It would bring down the multi-tasking capability of Windows, slowing down every other application than yours. That´s why I say that running applications should not always be running.
Under this point of view, applications can be divided into three groups:
  1. Psychotic applications that should be updated always, i.e. intense computing applications (not so frequent, nowadays)
  2. Applications that should be updated only as a response to some user interaction (most Windows Forms applications)
  3. Applications that sometimes need to be “always updated”, and sometimes not (like a video-game)
As you know, when an application is run, the Main method is invoked, and the O.S gives the application access to the processor and some memory. Apart from thread priority policies, application´s behavior will be one or another depending on what we do in this Main() method:

Case # 1. Compute, compute, compute

The Main method structure is simple: update as fast as you can, and consume as much processor as possible. The Main() method in pseudo-code:
            while (Application is Alive)
            {
                Compute()
            }
This is the typical behavior of old applications in non multi-tasking environments. They consume as much processing power as is available.

Case # 2. Don´t disturb unless absolutely necessary

What should be done here is easy: if events are pending, process them (DoEvents). If not, just relax and sleep, yielding processor to other threads or applications. The exact implementation of such a procedure can be more complex, as we should decide when to enter sleep modes and when not. Luckily, .Net does that for us. The C# implementation you should use is:
            System.Windows.Forms.Application.Run(mForm);
The Run() method takes care of all the things above mentioned (yielding and many other things), and it does it using the standard .Net procedures and protocols.

Case # 3: The mixed approach

Other situations need a mixed solution. For example, a Windows Forms application that sometimes needs to enter a “running” state, where everything has to be updated, like a Windows Forms-based video-game, or a 3D editor where you want to run an animation. Here, updates will be applied when the application is in Running state, but not when it´s not, saving this way CPU (and batteries, if running in a laptop).
So, we want to combine the efficiency of Case # 2, with the possibility of switching to Case # 1 when needed. The correct implementation for this, in C#, is:
       ...
            System.Windows.Forms.Application.Idle += new EventHandler(Application_Idle);
            System.Windows.Forms.Application.Run(mForm);
        }
        static void Application_Idle(object sender, EventArgs e)
        {
            if(Application.RunningState)
            {
                 while(ApplicationStillIdle)
                      Compute();
            }
        }
The Application_Idle event will be fired each time the application is about to enter in Idle state (where all the pending messages for the windows has been processed).
This way, interaction between our application and others (running at the same time) will be correct: when we decide it (activate running state), we will have all the computing power for ourselves, but when it´s not needed (RunningState is off), we will yield it.

How to check if application is still Idle

The Application_Idle event is fired only once, when the application is about to enter idle state. Here, we will have to update computations while it is still Idle, giving priority to Windows Messages if there are some. How to check if the application is still Idle? Easy, we can use a property like the following:
        /// <summary>Checks to see if the application is still idle</summary>
        private bool AppStillIdle
        {
            get
            {
                NativeMethods.Message msg;
                return !NativeMethods.PeekMessage(out msg, IntPtr.Zero, 0, 0, 0);
            }
        }

An extra option to make things even better

We have seen how to activate/deactivate the psychotic (processor consuming) behavior of an application, basing on a state variable, but sometimes it is interesting to also deactivate it when our application does not have the focus (it is not the active application in the O.S.).
Windows controls and forms have a “Focused” property, but that doesn’t help much in this case, as we want to check if the entire application has focus, not an specific form (the main form of the application will loose focus often, when we open a child window, for instance).
Then, how to check if our application is the active application in Windows? Unfortunately, it seems that there is no C# managed code to check that, but we can easily use DLLImport to invoke the “GetForegroundWindow” method, comparing its result to our main form handle:
[System.Runtime.InteropServices.DllImport("user32.dll")]
static extern IntPtr GetForegroundWindow();
 
public bool MyAppFocused
{
    get { return GetForegroundWindow() == this.Handle; }
}
This way, we can put an additional condition to the Compute method, deactivating it also when the application is not focused:
static void Application_Idle(object sender, EventArgs e)
{
   if(Application.RunningState && MyAppFocused)
   {
       while(ApplicationStillIdle)
            Compute();
   }
     }

What should not be done

Some people tend to write the following code when need the mixed solution mentioned:
while (mForm.Created)
{
      System.Windows.Forms.Application.DoEvents();
 
      if(Application.RunningState)
             Compute();
}

Though it works, this approach is not correct, as it doesn’t yield processor to other threads (your application won´t enter idle mode). Use the above Idle event approach instead.

Conclusion

Quality software use the processor when really need it. Adding a few extra lines of code, you software will better share resources with other applications, and will improve laptop or handheld device´s battery life.
Take Care!

AutoBild estudia los efectos del cansancio y del alcohol al volante con simuladores Simax

En el nĆŗmero de esta semana de AutoBild (nĀŗ 214, del 18 al 24 de Diciembre de 2009), se publica la primera parte de un estudio realizado en colaboraciĆ³n con Toyota EspaƱa y Simax, en el que se analizan los efectos del cansancio y del consumo de alcohol en la conducciĆ³n.

PortadaAutobild¿Y quĆ© mejor manera para hacerlo que realmente analizar la conducciĆ³n de voluntarios en condiciones de fatiga reales, y despuĆ©s de haber bebido? Y no nos referimos a 10 minutos de test, sino a 2 horas seguidas conduciendo en un entorno monĆ³tono como una autovĆ­a, para incrementar todavĆ­a mĆ”s el cansancio y el peligro. AdemĆ”s, y para obtener la mayor cantidad de informaciĆ³n posible, hemos comparado sus resultados con los de otros voluntarios, descansados y sobrios.

Por razones obvias, este estudio no puede hacerse en la vĆ­a pĆŗblica, y un circuito cerrado no sirve, ya que tendrĆ­an que estar 2 horas dando vueltas a un circuito, sin trĆ”fico, y que nada tiene que ver con una carretera real.

La soluciĆ³n diseƱada por AutoBild ha sido utilizar simuladores Simax, cedidos por Toyota EspaƱa. Nos propusieron la idea y desde el principio nos encantĆ³, asĆ­ que nos pusimos manos a la obra.

AdemƔs de permitir a los conductores circular por una vƭa real (autopista), e interactuar con trƔfico, desde Simax diseƱamos un ejercicio en el que, a lo largo de 2 horas, los conductores irƭan enfrentƔndose a incidencias de trƔfico de dificultad creciente. Desde una simple balsa de agua en la calzada, hasta un desprendimiento de rocas.

Los simuladores Simax monitorizan en todo momento todas las reacciones de los conductores y los vehƭculos, lo que nos ha permitido extraer conclusiones de lo mƔs interesantes. TenƩis vuestro ejemplar de AutoBild en el quiosco.

¡No os lo perdĆ”is!

La semana que viene se publica la segunda parte del estudio, y las conclusiones finales.

My ListView won´t show groups! Why?

This is a well documented issue, but just in case it helps someone.

If your ListView doesn't show groups, ensure the following (starting with the most obvious, and ending with the probable cause):

  1. Ensure your ListView has, at least, one column
  2. Ensure you have created Groups correctly
  3. Properly assign a group to each Item, through their “Group” property
  4. Set listView.ShowGroups property = True
  5. Ensure you have called System.Windows.Forms.Application.EnableVisualStyles() method at the beginning of your application

ListView groups rely on VisualStyles of the application, what is only available on Windows XP systems and above. So, groups won´t work on anything below Windows XP.

Here, you can find some explanations, like the following:

“ListView groups are available only on Windows XP and Windows Server 2003 when your application calls the Application.EnableVisualStyles method. On earlier operating systems, any code relating to groups has no effect and the groups will not appear. As a result, any code that depends on the grouping feature might not work correctly.”

Cheers!

Combos de controles en DotNetManĆ­a

El nĆŗmero 64 de la gran revista DotNetManĆ­a publica un artĆ­culo que he escrito sobre creaciĆ³n de combos de controles en Windows Forms con un solo click, ademĆ”s de muchas otras cosas muy muy interesantes. Espero os resulte de ayuda. Saludos!

  portadagr

http://www.dotnetmania.com/Articulos/064/index.html

Autonomous car driving by V.A.I.L

Autoblog has published today a very interesting video on autonomous driving. It surely worth the time you´ll invest watching it…

 

During the first seconds, you can see some Java code. It appears there because that’s the “yesterday” part of the video. I´m sure that they were just about putting some .Net code in the “tomorrow” chapter. They probably run out of time… ;)

Specially beautiful the close look at the accelerating wheel, at the end of the video.

Hope you enjoy it.

Simax ECO-Driving goes U.K.

Simax is starting to distribute the ECO-Experience Simax Simulator in the U.K. Of course, they drive on the left, so we had to develop new environments, with a British look.

Want it real? Check this…

IQ1lond7lond1  lond8

Simax ECO-Experience allows drivers to improve their driving skills, with a special focus on ecological driving concepts. Simax ECO-Experience has been installed in several 1st line Motor Shows around Europe, where thousands of drivers learned how to save tons of CO2 emissions. To know more:

www.simaxvirt.com

The real simulation experience

Usuario y Password para configuraciĆ³n del router ADSL de timofĆ³nica

Si, como a mi, no te dieron el usuario y password para configurar tu router de telefĆ³nica no desesperes. Todos suelen tener los mismos. Aqui va una recopilaciĆ³n:

  • Usuario: admin, Password: admin
  • Usuario: adminttd, Password: adminttd
  • Usuario: 1234, Password: 1234
  • Usuario: user, Password: admin
  • Usuario: admin, Password: 1234
  • etc

Nota: Para acceder a la config de tu router tienes que ir en el navegador web (IE) a una direcciĆ³n IP concreta, que suele ser: http://192.168.1.1

Espero que te sirva…

Properly scaling Point Sprites in DirectX

When rendering point sprites with DirectX, many people deals with the issue of point sprite scaling. There are many options regarding this, but in the 99% of the cases, people just wants one of the following two things:

1.- Point Sprites with a constant screen-space size

Ver imagen en tamaƱo completo

This sprites are rendered at the 3D position (projected to the viewport of course), but with a constant size, in pixels. So, if you move the camera towards the point sprite, it feels like if it shrinks, and if you move the camera away from the sprite, it will look like if “growing”.

This effect is sometimes desirable, specially if you are using point sprites to simulate 2D effects, post-processing or camera effects like Lens Flares (on your left).

To setup this kind of Point Sprite rendering in DirectX, you just need to disable PointSprite scaling. This way, you will be able to specify point sizes both with a default value or in each point vertex information. These are the render states you need to modify:

Disable point scaling: D3DRS_POINTSCALEENABLE = false

Set default point size (remember, in pixels): D3DRS_POINTSIZE = XX

2.- Point Sprites with a real 3D size

By contrary to what’s shown in the previous chapter, sometimes you need point sprites to behave like real 3D objects, i.e. when using them to simulate a particle system, or that kind of stuff.

In this case, you don’t want sprites to remain at the same size if you move away from them, as real 3D objects get smaller as we get farther from them and vice versa. It is obvious that we need to change the size of sprites (remember, it’s a 2D size, in pixels), accordingly to our 3D movements, to give the illusion of 3D behavior.

DirectX is already prepared for that, with a built-in system to handle point sprite scaling. You just need to activate it setting the render state D3DRS_POINTSCALEENABLE to true, and setting up some other parameters. If you take a look at the docs, you will find the following:

Point Size Computations

Point size is determined by D3DRS_POINTSCALEENABLE. If this value is set to FALSE, the application-specified point size is used as the screen-space (post-transformed) size. Vertices that are passed to Direct3D in screen space do not have point sizes computed; the specified point size is interpreted as screen-space size.

If D3DRS_POINTSCALEENABLE is TRUE, Direct3D computes the screen-space point size according to the following formula. The application-specified point size is expressed in camera space units.

S s = Vh * S i * sqrt(1/(A + B * D e + C *( D e2 )))

In this formula, the input point size, S i, is either per-vertex or the value of the D3DRS_POINTSIZE render state. The point scale factors, D3DRS_POINTSCALE_A, D3DRS_POINTSCALE_B, and D3DRS_POINTSCALE_C, are represented by the points A, B, and C. The height of the viewport, V h, is the D3DVIEWPORT9 structure's Height member representing the viewport. D e, the distance from the eye to the position (the eye at the origin), is computed by taking the eye space position of the point (Xe, Ye, Ze) and performing the following operation.

D e = sqrt (Xe2 + Y e2 + Z e2)

The maximum point size, Pmax, is determined by taking the smaller of either the D3DCAPS9 structure's MaxPointSize member or the D3DRS_POINTSIZE_MAX render state. The minimum point size, Pmin, is determined by querying the value of D3DRS_POINTSIZE_MIN. Thus the final screen-space point size, S, is determined in the following manner.

  • If Ss > Pmax, then S = P max
  • if S < Pmin, then S = P min
  • Otherwise, S = S s

Well, that´s all. Clear ugh? Just kidding…

Setting the A,B,C scaling parameters for a correct, 3D, point scaling

In order to achieve the behavior you want, you just have to set the following configuration:

D3DRS_POINTSCALE_A = 0.0f
D3DRS_POINTSCALE_B = 0.0f
D3DRS_POINTSCALE_C = 1.0f
D3DRS_POINTSIZE_MIN = 0.0f
D3DRS_POINTSIZE_MAX = 9999.0f

Et voilĆ” ! 3D point sprites…

Ultimas apariciones en prensa de Simax

SimaxReflect_256x128Simax ha aparecido Ćŗltimamente en prensa y radio, a raĆ­z del galardĆ³n en los premios Bancaja, e interesĆ”ndose especialmente por nuestros productos para formaciĆ³n en conducciĆ³n ecolĆ³gica. AhĆ­ van los links:

La razĆ³n:

http://www.larazon.es/noticia/conduccion-ecologica-virtual

Punto Radio (con audio de la entrevista):

http://actual.lasprovincias.es/jovenesemprendedores/etiquetas/inaki-ayucar/

Saludos.

Dell XPS 630i, the BT Mini-Receiver, download managers, and other crap…

If you are one of the owners of a Dell computer, you´ll probably be happy with it, like me. However, nothing´s perfect, and of course dell is not either.

One of the things I hate most on Earth is that too-usual way of managing download from websites. Every company has it’s own download manager, and in a few months you end up with a dozen of them installed on your computer or on your browser: Adobe Update Manager, Dell Download Manager, Apple Update Manager, Google updater, Aaaaarrgggllll !!!! I just want to download the f**ing file !!!!! Stop selling me your crap !!!!

Well, back to business. If you own an XPS 630i (and probably other Dell models), and you don’t install all the crap that comes with your system (Dell tune up utility, Dell application, Dell recovery, Dell diagnostics, Dell sandwich maker and the Dell fantastasystem), or if you simply lost your original CD drivers, you will probably miss one: the driver for the BlueTooth Mini-Receiver.

It is not initially listed in the default downloads list at www.dell.com for the 630i (a very deficient list, I must say. Is it that difficult to include a small description of the purpose of each application or update?), and believe me, it’s not easy to find.

I also tried to use one of the magnificent Dell helper-applications to find it, but the one it suggested didn’t work. Finally, I found it here, thanks to the people of My630i.

Thanks a lot!

Ramblings about your own, fast C# Maths library

Why not building your own C# Maths library instead of using the ones included in APIs like DirectX or XNA? That would give you some advantages when developing a game or something similar. Keep reading…
It is late at night. Very late. In fact it is so late, that I should say it’s almost early in the morning… but I´m immersed in a task I have delayed for too much time: extend my own Math library, directly written in C#. What you said? That cannot be fast enough? ha ha… Some FAQs first:

Intro

Why to write my own library?

Because I think it’s good to depend on the Graphics API as less as possible. Why?
  • Because APIs are deprecated from time to time (as happened with Managed DirectX), or evolve, or change, or re-appear in some form (as it seems that is going to happen with some parts of Managed DirectX, via the Windows 7 Code Pack), or you just decide to use another API, or you want your code to be compatible both with XNA and with DirectX.
  • Because I want to be able to debug the inners of my Maths.
  • Because writing your own library (even if you copy most parts of it), you will learn a lot.
  • Because seeing the internals of things, you will realize how much work some tasks take, and you will think it twice before calling them…
Do you need any more reasons?

Wait a minute. Faster code, using no API?

For CPU-related stuff, YES, it is possible.
XNA (or DirectX) makes no magic. They just calculate things as you will do. But of course you must choose the best algorithms for each task.If you do so, you will end up with a code as fast as the one written in the API, but the point is, as you will know how things work in the inners, you will have a much more detailed idea of the cost of things, and you will optimize your code. In addition to that, you will be able to tune things a bit for your specific needs, increasing the speed a bit more.
However, keep in mind that companies like Microsoft have hundreds or thousands of the best engineers working in their products, so if there is a better method of doing something, the API will have it. You need to be that good too!
Let’s see an example of the kind of optimization you need to achieve to make your own Maths library as fast (or faster) than the ones included in the APIs. We will include some matrix operations and make some speed tests with different algorithms.

A note about Affine Transformation Matrices

In 3D graphics applications, transformations are typically built from the combination of elementary matrices (being this elementary matrices: translations, rotations, scales and the identity matrix). Matrices built this way, always result in Affine transformations (matrices with the right column of the matrix being 0,0,0,1 –if using row representation of matrices-).
There are some algorithm optimizations in this article that are only valid for Affine Transformation matrices. You should avoid using them for non-affine matrices. It is the case of Perspective Projection matrices, which DO NOT BELONG to the affine transformations group.

A note about JIT compilation and .NET maths performance

This is a very interesting question: Are local variables faster than member variables?
At first, one could say that they aren’t, as they have to be re-created each time the method is called. BUT, as this interesting blog post points out, while the JIT compiler can en-register local variables, it cannot do the same with member fields. So we should say that YES, THEY ARE, when talking about math calculations. As you can see the in the test, it makes a remarkable difference. So:
If you are developing in .Net and you have a performance critical math. calculation method, you should use local copy of variables to speed up calculations
In fact, if you get a glimpse inside the XNA code, every non-static method inside the Matrix class, which is performance critical, makes a local copy of each member field.

Part #1 Matrix Determinant Calculation

The general Method (Based on Graphics Gems I)

The following is a pretty straight-forward algorithm for calculating the determinant of a matrix. It is mostly designed for non-realtime applications, as it quite slow and uses double-precission. It calculates a 4x4 determinant basing on the 3x3 method, then in the 2x2 and so on. It is a based in an algorithm used in Graphics Gems I for matrix inversion.
        public double Determinant4x4()
        {
            double ans;
            double a1, a2, a3, a4, b1, b2, b3, b4, c1, c2, c3, c4, d1, d2, d3, d4;
 
            a1 = this.M11; b1 = this.M12;
            c1 = this.M13; d1 = this.M14;
 
            a2 = this.M21; b2 = this.M22;
            c2 = this.M23; d2 = this.M24;
 
            a3 = this.M31; b3 = this.M32;
            c3 = this.M33; d3 = this.M34;
 
            a4 = this.M41; b4 = this.M42;
            c4 = this.M43; d4 = this.M44;
 
            ans = a1 * Determinant3x3(b2, b3, b4, c2, c3, c4, d2, d3, d4)
                - b1 * Determinant3x3(a2, a3, a4, c2, c3, c4, d2, d3, d4)
                + c1 * Determinant3x3(a2, a3, a4, b2, b3, b4, d2, d3, d4)
                - d1 * Determinant3x3(a2, a3, a4, b2, b3, b4, c2, c3, c4);
            return ans;
        }      
        public double Determinant3x3(double  a1,double  a2,double  a3,double  b1,double  b2,double  b3,double  c1,double  c2,double  c3 )
        {
            double ans;
            ans = a1 * Determinant2x2(b2, b3, c2, c3)
                - b1 * Determinant2x2(a2, a3, c2, c3)
                + c1 * Determinant2x2(a2, a3, b2, b3);
            return ans;
        }
        public double Determinant2x2( double a,double  b, double c, double d)
        {
            double ans;
            ans = a * d - b * c;
            return ans;
        }

The general Method (real-time applications optimized)

This second version is also a full, general method, but optimized for real-time applications. It is much faster, because of the calculation and because of the grouping of duplicated mults in numXX variables, saving multiplications and adds.
       public float Determinant()
        {
            float num18 = (M33 * M44) - (M34 * M43);
            float num17 = (M32 * M44) - (M34 * M42);
            float num16 = (M32 * M43) - (M33 * M42);
            float num15 = (M31 * M44) - (M34 * M41);
            float num14 = (M31 * M43) - (M33 * M41);
            float num13 = (M31 * M42) - (M32 * M41);
            return ((((M11 * (((M22 * num18) - (M23 * num17)) + (M24 * num16))) - (M12 * (((M21 * num18) -
                   (M23 * num15)) + (M24 * num14)))) + (M13 * (((M21 * num17) - (M22 * num15)) +
                   (M24 * num13)))) - (M14 * (((M21 * num16) - (M22 * num14)) + (M23 * num13))));
        }
This is the one used in XNA or SlimDX. It is the standard de-facto.

Optimization I (faster Memory Accessing)

As stated in the intro, about the performance of local variables vs. member fields, we´ll make an optimization of the above method using this concept:
    public float Determinant()
        {
            float L11 = this.M11;
            float L12 = this.M12;
            float L13 = this.M13;
            float L14 = this.M14;
            float L21 = this.M21;
            float L22 = this.M22;
            float L23 = this.M23;
            float L24 = this.M24;
            float L31 = this.M31;
            float L32 = this.M32;
            float L33 = this.M33;
            float L34 = this.M34;
            float L41 = this.M41;
            float L42 = this.M42;
            float L43 = this.M43;
            float L44 = this.M44;
            float C1 = (L33 * L44) - (L34 * L43);
            float C2 = (L32 * L44) - (L34 * L42);
            float C3 = (L32 * L43) - (L33 * L42);
            float C4 = (L31 * L44) - (L34 * L41);
            float C5 = (L31 * L43) - (L33 * L41);
            float C6 = (L31 * L42) - (L32 * L41);
            return ((((L11 * (((L22 * C1) - (L23 * C2)) + (L24 * C3))) - (L12 * (((L21 * C1) -
                   (L23 * C4)) + (L24 * C5)))) + (L13 * (((L21 * C2) - (L22 * C4)) + (L24 * C6)))) -
                   (L14 * (((L21 * C3) - (L22 * C5)) + (L23 * C6))));
        }
See the test for performance results.

Optimization II (Based on Graphics Gems II, for Affine transformations only)

The following algorithm is a direct translation of a piece of code found in a Affine Matrix Inversion algorithm, inside Graphics Gems II (see in the next post for the full algorithm). It makes only 12 mults and 7 adds. Keep in mind that this method is only valid for affine transformations (see intro for further information):
        public float Determinant()
        {
            return (this.M11 * this.M22 * this.M33) +
                   (this.M12 * this.M23 * this.M31) +
                   (this.M13 * this.M21 * this.M32) -
                   (this.M13 * this.M22 * this.M31) -
                   (this.M12 * this.M21 * this.M33) -
                   (this.M11 * this.M23 * this.M32);
        }

 

The test

So, it’s time to measure this little boys.

Performance

The test was made calculation a 4x4 affine matrix determinant 100 Million times, in a Intel Core 2 Quad Q9450 2.66Ghz with 3 Gb (non multi-threading).
General method (Graphics Gems I, non real-time)
1 minute: 0.9 secs
General method (real-time)
2.6832 secs
Optimization I (memory access)
2.496 secs
Optimization II (affine transformations)
1.404 secs

Reliability

The reliability test was made with two different matrices.There is no surprises here, all the algorithms work well. The following is the matrices used for the tests (affine transformations) and the results (the determinant) calculated by each algorithm.
Matrix 1:
0.7071068 -0.235702276 0.6666667 0.0
0.0 0.9428091 0.33333334 0.0
-0.7071068 -0.235702276 0.6666667 0.0
0.0 0.0 -15.0 1
General method (Graphics Gems I)
1.0000001641611829
General method (real-time)
1.00000012
Optimization I (memory access)
1.00000012
Optimization II (affine transformations)
1.00000012

Matrix 2:
9.848019 0.10061159 0.0 0.0
-3.50578356 0.282625765 0.0 0.0
0.0 0.0 94.22 0.0
0.0 0.0 0.0 1
General method (Graphics Gems I)
295.47639778127973
General method (real-time)
295.4764
Optimization I (memory access)
295.4764
Optimization II (affine transformations)
295.4764

Part #2: Matrix Inverse Calculation

One of the most expensive Matrix operation is calculating the inverse of a matrix. Will start with the general algorithm for this, and study a couple of optimizations later:

The general Method

This algorithm (explained here), is something like this:
        public Matrix Inverse()
        {
            float determinant = this.Determinant();
            if (determinant == 0)
                throw new System.InvalidOperationException("Matrix has zero determinant (singular matrix). Inverse doesn´t exist");
 
            Matrix b = new Matrix();
            b.M11 = M22 * M33 * M44 + M23 * M34 * M42 + M24 * M32 * M43 - M22 * M34 * M43 - M23 * M32 * M44 - M24 * M33 * M42;
            b.M12 = M12 * M34 * M43 + M13 * M32 * M44 + M14 * M33 * M42 - M12 * M33 * M44 - M13 * M34 * M42 - M14 * M32 * M43;
            b.M13 = M12 * M23 * M44 + M13 * M24 * M42 + M14 * M22 * M43 - M12 * M24 * M43 - M13 * M22 * M44 - M14 * M23 * M42;
            b.M14 = M12 * M24 * M33 + M13 * M22 * M34 + M14 * M23 * M32 - M12 * M23 * M34 - M13 * M24 * M32 - M14 * M22 * M33;
 
            b.M21 = M21 * M34 * M43 + M23 * M31 * M44 + M24 * M33 * M41 - M21 * M33 * M44 - M23 * M34 * M41 - M24 * M31 * M43;
            b.M22 = M11 * M33 * M44 + M13 * M34 * M41 + M14 * M31 * M43 - M11 * M34 * M43 - M13 * M31 * M44 - M14 * M33 * M41;
            b.M23 = M11 * M24 * M43 + M13 * M21 * M44 + M14 * M23 * M41 - M11 * M23 * M44 - M13 * M24 * M41 - M14 * M21 * M43;
            b.M24 = M11 * M23 * M34 + M13 * M24 * M31 + M14 * M21 * M33 - M11 * M24 * M33 - M13 * M21 * M34 - M14 * M23 * M31;
 
            b.M31 = M21 * M32 * M44 + M22 * M34 * M41 + M24 * M31 * M42 - M21 * M34 * M42 - M22 * M31 * M44 - M24 * M32 * M41;
            b.M32 = M11 * M34 * M42 + M12 * M31 * M44 + M14 * M32 * M41 - M11 * M32 * M44 - M12 * M34 * M41 - M14 * M31 * M42;
            b.M33 = M11 * M22 * M44 + M12 * M24 * M41 + M14 * M21 * M42 - M11 * M24 * M42 - M12 * M21 * M44 - M14 * M22 * M41;
            b.M34 = M11 * M24 * M32 + M12 * M21 * M34 + M14 * M22 * M31 - M11 * M22 * M34 - M12 * M24 * M31 - M14 * M21 * M32;
 
            b.M41 = M21 * M33 * M42 + M22 * M31 * M43 + M23 * M32 * M41 - M21 * M32 * M43 - M22 * M33 * M41 - M23 * M31 * M42;
            b.M42 = M11 * M32 * M43 + M12 * M33 * M41 + M13 * M31 * M42 - M11 * M33 * M42 - M12 * M31 * M43 - M13 * M32 * M41;
            b.M43 = M11 * M23 * M42 + M12 * M21 * M43 + M13 * M22 * M41 - M11 * M22 * M43 - M12 * M23 * M41 - M13 * M21 * M42;
            b.M44 = M11 * M22 * M33 + M12 * M23 * M31 + M13 * M21 * M32 - M11 * M23 * M32 - M12 * M21 * M33 - M13 * M22 * M31;
 
            // This last operation involves a float * Matrix multiplication
            return (1f / determinant) * b;
        }
You can use any of the methods presented in Part 1 of this article for calculating the Determinant.

Optimization I (memory access and some operation savings)

This method applies the same concept, but using local variables and packing duplicated calculations in local variables too:
        public static Matrix InvertXNALocalValues(Matrix matrix)
        {
            Matrix outMatrix = new Matrix();
            float L1 = matrix.M11;
            float L2 = matrix.M12;
            float L3 = matrix.M13;
            float L4 = matrix.M14;
            float L5 = matrix.M21;
            float L6 = matrix.M22;
            float L7 = matrix.M23;
            float L8 = matrix.M24;
            float L9 = matrix.M31;
            float L10 = matrix.M32;
            float L11 = matrix.M33;
            float L12 = matrix.M34;
            float L13 = matrix.M41;
            float L14 = matrix.M42;
            float L15 = matrix.M43;
            float L16 = matrix.M44;
            float L17 = (L11 * L16) - (L12 * L15);
            float L18 = (L10 * L16) - (L12 * L14);
            float L19 = (L10 * L15) - (L11 * L14);
            float L20 = (L9 * L16) - (L12 * L13);
            float L21 = (L9 * L15) - (L11 * L13);
            float L22 = (L9 * L14) - (L10 * L13);
            float L23 = ((L6 * L17) - (L7 * L18)) + (L8 * L19);
            float L24 = -(((L5 * L17) - (L7 * L20)) + (L8 * L21));
            float L25 = ((L5 * L18) - (L6 * L20)) + (L8 * L22);
            float L26 = -(((L5 * L19) - (L6 * L21)) + (L7 * L22));
            float L27 = 1f / ((((L1 * L23) + (L2 * L24)) + (L3 * L25)) + (L4 * L26));
            float L28 = (L7 * L16) - (L8 * L15);
            float L29 = (L6 * L16) - (L8 * L14);
            float L30 = (L6 * L15) - (L7 * L14);
            float L31 = (L5 * L16) - (L8 * L13);
            float L32 = (L5 * L15) - (L7 * L13);
            float L33 = (L5 * L14) - (L6 * L13);
            float L34 = (L7 * L12) - (L8 * L11);
            float L35 = (L6 * L12) - (L8 * L10);
            float L36 = (L6 * L11) - (L7 * L10);
            float L37 = (L5 * L12) - (L8 * L9);
            float L38 = (L5 * L11) - (L7 * L9);
            float L39 = (L5 * L10) - (L6 * L9);
            outMatrix.M11 = L23 * L27;
            outMatrix.M21 = L24 * L27;
            outMatrix.M31 = L25 * L27;
            outMatrix.M41 = L26 * L27;
            outMatrix.M12 = -(((L2 * L17) - (L3 * L18)) + (L4 * L19)) * L27;
            outMatrix.M22 = (((L1 * L17) - (L3 * L20)) + (L4 * L21)) * L27;
            outMatrix.M32 = -(((L1 * L18) - (L2 * L20)) + (L4 * L22)) * L27;
            outMatrix.M42 = (((L1 * L19) - (L2 * L21)) + (L3 * L22)) * L27;         
            outMatrix.M13 = (((L2 * L28) - (L3 * L29)) + (L4 * L30)) * L27;
            outMatrix.M23 = -(((L1 * L28) - (L3 * L31)) + (L4 * L32)) * L27;
            outMatrix.M33 = (((L1 * L29) - (L2 * L31)) + (L4 * L33)) * L27;
            outMatrix.M43 = -(((L1 * L30) - (L2 * L32)) + (L3 * L33)) * L27;         
            outMatrix.M14 = -(((L2 * L34) - (L3 * L35)) + (L4 * L36)) * L27;
            outMatrix.M24 = (((L1 * L34) - (L3 * L37)) + (L4 * L38)) * L27;
            outMatrix.M34 = -(((L1 * L35) - (L2 * L37)) + (L4 * L39)) * L27;
            outMatrix.M44 = (((L1 * L36) - (L2 * L38)) + (L3 * L39)) * L27;
            return outMatrix;
        }

Optimization II (for Affine Transformations only -Based on the Kevin Wu article, Graphics Gems II-)

The Graphics Gems books series are a must in any graphics or game developer’s bookshelves. You can by the the book I’m talking about here, or get more info about the Graphics Gems series here. Google have some excerpts of this book in the following link, but one page out of each two is missing: Graphics gems II‎ - PĆ”gina 342.
So, the algorithm purposed by Mr. Wu looks like the following (ported to C#, original in C++):
   public static bool Inverse(Matrix min, Matrix mout)
        {
            float det_1;
            float pos, neg, temp;
            double PRECISION_LIMIT = 1.0e-15;
            // Calculate determinand
            pos = neg = 0.0f;
            temp = min.M11 * min.M22 * min.M33;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = min.M12 * min.M23 * min.M31;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = min.M13 * min.M21 * min.M32;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = -min.M13 * min.M22 * min.M31;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = -min.M12 * min.M21 * min.M33;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            temp = -min.M11 * min.M23 * min.M32;
            if (temp >= 0.0)
                pos += temp;
            else
                neg += temp;
            det_1 = pos + neg;
 
            // Is the submatrix A singular? then Matrix M has no inverse
            if ((det_1 == 0.0) || (Math.Abs(det_1 / (pos - neg)) < PRECISION_LIMIT))
                throw new System.InvalidOperationException("Matrix has zero determinant (singular matrix). Inverse doesn´t exist");
 
            // Calculate inverse(A) = adj(A) / det(A)
            det_1 = 1.0f / det_1;
            mout.M11 = (min.M22 * min.M33 - min.M23 * min.M32) * det_1;
            mout.M21 = -(min.M21 * min.M33 - min.M23 * min.M31) * det_1;
            mout.M31 = (min.M21 * min.M32 - min.M22 * min.M31) * det_1;
            mout.M12 = -(min.M12 * min.M33 - min.M13 * min.M32) * det_1;
            mout.M22 = (min.M11 * min.M33 - min.M13 * min.M31) * det_1;
            mout.M32 = -(min.M11 * min.M32 - min.M12 * min.M31) * det_1;
            mout.M13 = (min.M12 * min.M23 - min.M13 * min.M22) * det_1;
            mout.M23 = -(min.M11 * min.M23 - min.M13 * min.M21) * det_1;
            mout.M33 = (min.M11 * min.M22 - min.M12 * min.M21) * det_1;
 
            /* Calculate -C * inverse(A) */
            mout.M41 = -(min.M41 * mout.M11 + min.M42 * mout.M21 + min.M43 * mout.M31);
            mout.M42 = -(min.M41 * mout.M12 + min.M42 * mout.M22 + min.M43 * mout.M32);
            mout.M43 = -(min.M41 * mout.M13 + min.M42 * mout.M23 + min.M43 * mout.M33);
 
            /* Fill in last column */
            mout.M14 = mout.M24 = mout.M34 = 0.0f;
            mout.M44 = 1.0f;
            return true;
        }
Please keep in mind that this algorithm is only valid for affine transformations. Of course, some memory accessing optimization could be done here, using local variables as explained in Part 1.

Optimization III (for matrices composed by Rotations and Translations only)

If you know that your matrix is composed by rotations and translations only, you can apply this (even faster) method, which avoids calculating the determinant, and takes advantage of some properties of this kind of matrices. You can read all the info here.
        public Matrix Inverse()
        {
            Matrix r = new Matrix();
            r.M11 = this.M11;
            r.M12 = this.M21;
            r.M13 = this.M31;
            r.M14 = 0f;
 
            r.M21 = this.M12;
            r.M22 = this.M22;
            r.M23 = this.M32;
            r.M24 = 0f;
 
            r.M31 = this.M13;
            r.M32 = this.M23;
            r.M33 = this.M33;
            r.M34 = 0f;
 
            r.M41 = -(r.M11 * this.M41 + r.M21 * this.M42 + r.M31 * this.M43);
            r.M42 = -(r.M12 * this.M41 + r.M22 * this.M42 + r.M32 * this.M43);
            r.M43 = -(r.M13 * this.M41 + r.M23 * this.M42 + r.M33 * this.M43);
            r.M44 = 1f;
            return r;
        }

Additional Optimizations

If you need it, other optimizations could be included in this algorithms, detecting special matrices whose inverse can be calculated even faster. It is the case of diagonal matrices (where only the diagonal should be inverted using 1/value entries), or Orthogonal matrices (whose inverse is equal to its transpose).

The Test

We made a very simple test, with 50.000.000 (50 Million) inverse calculations of affine transformations, in an Intel Core 2 Quad Q9450 2.66Ghz with 3 Gb (non multi-threading).
General method
11.523 secs
Optimization I (memory accessing and op. savings)
5.4912 secs
Optimization II (affine transformations only)
2.7144 secs
Optimization III (rotations and translations only)
1.326 secs

Conclusions

Searching a bit, it is indeed possible to reach the performance of your graphics API and as mentioned above, the advantages of having (and building) your own library are countless.
Take care!

2D Circle Packing algorithm ported to C#

Yesterday, a friend asked me if I knew of any C# implementation of a Circle Packing algorithm. In fact, I didn’t, but searching a bit I found this algorithm, and this Java Applet implementation
Packing of different circles into a 2D space, trying to minimize the unused space, is a typical geometrical problem. Humans can solve it quite quickly, but it is very hard to find a solution mathematically. The most frequent implementations use numeric algorithms that get closer to the solution in each iteration.
It is the case of the above algorithm. It doesn’t implement the limits of the available space, but reorders the circles around a center point quite well. The “flowing” nature of numerical algorithms like these fit specially well if you want to use them for some kind of graphical interface, as you can change the adaptation (or reordering) speed, giving a very nice animated result, or you can even interact with your circles, as moving some of them will make the algorithm to react adapting others (you can try the above applet).
So, with all that material, it was easy to port it to C#. The result is the following:
image
The code for a very easy Circle class (please note that I´m using the XNA framework for the maths –Vector2, etc-):
public class Circle
    {
        public Vector2 mCenter;
        public float mRadius;
 
        /// <summary>
        ///
        /// </summary>
        /// <param name="iCenter"></param>
        /// <param name="Radius"></param>
        public Circle(Vector2 iCenter, float Radius)
        {
            mCenter = iCenter;
            mRadius = Radius;
        }
        /// <summary>
        ///
        /// </summary>
        /// <returns></returns>
        public override string ToString()
        {
            return "Rad: " + mRadius + " _ Center: " + mCenter.ToString();
        }     
    }
And the important class. CirclePacker:
This class holds a list of circles that will be re-arranged around a “PackingCenter” point, with a “MinSeparation” between them. You only need to iteratively call the OnFrameMove method, passing an “iterationCounter” parameter, that will hold the damping on the adaptation speed (the bigger value, the slower adaptation). Reset this parameter to 1 always you want to reset speed (never set this parameter to 0).
  public class CirclePacker
    {
        public List<Circle> mCircles = new List<Circle>();
        public Circle mDraggingCircle = null;
        protected Vector2 mPackingCenter;
        public float mMinSeparation = 1f;
 
        /// <summary>
        /// Generates a number of Packing circles in the constructor.
        /// Random distribution is linear
        /// </summary>
        public CirclePacker(Vector2 pPackingCenter, int pNumCircles,
                            double pMinRadius, double pMaxRadius)
        {
            this.mPackingCenter = pPackingCenter;
 
            // Create random circles
            this.mCircles.Clear();
            Random Rnd = new Random(System.DateTime.Now.Millisecond);
            for (int i = 0; i < pNumCircles; i++)
            {
                Vector2 nCenter = new Vector2((float)(this.mPackingCenter.X +
                                                      Rnd.NextDouble() * pMinRadius),
                                              (float)(this.mPackingCenter.Y +
                                                      Rnd.NextDouble() * pMinRadius));
                float nRadius = (float)(pMinRadius + Rnd.NextDouble() *
                                       (pMaxRadius - pMinRadius));
                this.mCircles.Add(new Circle(nCenter, nRadius));
            }
        }
        /// <summary>
        ///
        /// </summary>
        /// <param name="?"></param>
        /// <returns></returns>
        private float DistanceToCenterSq(Circle pCircle)
        {
            return (pCircle.mCenter - mPackingCenter).LengthSquared();
        }
        /// <summary>
        ///
        /// </summary>
        private int Comparer(Circle p1, Circle P2)
        {
            float d1 = DistanceToCenterSq(p1);
            float d2 = DistanceToCenterSq(P2);
            if (d1 < d2)
                return 1;
            else if (d1 > d2)
                return -1;
            else return 0;
        }
        /// <summary>
        ///
        /// </summary>
        public void OnFrameMove(long iterationCounter)
        {
            // Sort circles based on the distance to center
            mCircles.Sort(Comparer);
 
            float minSeparationSq = mMinSeparation * mMinSeparation;
            for (int i = 0; i < mCircles.Count - 1; i++)
            {
                for (int j = i + 1; j < mCircles.Count; j++)
                {
                    if (i == j)
                        continue;
 
                    Vector2 AB = mCircles[j].mCenter - mCircles[i].mCenter;
                    float r = mCircles[i].mRadius + mCircles[j].mRadius;
 
                    // Length squared = (dx * dx) + (dy * dy);
                    float d = AB.LengthSquared() - minSeparationSq;
                    float minSepSq = Math.Min(d, minSeparationSq);
                    d -= minSepSq;
 
                    if (d < (r * r) - 0.01 )
                    {
                        AB.Normalize();
 
                        AB *= (float)((r - Math.Sqrt(d)) * 0.5f);
 
                        if (mCircles[j] != mDraggingCircle)
                            mCircles[j].mCenter += AB;
                        if (mCircles[i] != mDraggingCircle)
                            mCircles[i].mCenter -= AB;
                    }
 
                }
            }
 
 
            float damping = 0.1f / (float)(iterationCounter);
            for (int i = 0; i < mCircles.Count; i++)
            {
                if (mCircles[i] != mDraggingCircle)
                {
                    Vector2 v = mCircles[i].mCenter - this.mPackingCenter;
                    v *= damping;
                    mCircles[i].mCenter -= v;
                }
            }
        }       
        /// <summary>
        ///
        /// </summary>
        public void OnMouseDown(MouseEventArgs e)
        {
            Vector2 center = new Vector2(e.X, e.Y);
            mDraggingCircle = null;
            foreach (Circle circle in mCircles)
            {
                float dist = (circle.mCenter - center).LengthSquared();
                if (dist < (circle.mRadius * circle.mRadius))
                {
                    mDraggingCircle = circle;
                    break;
                }
            }           
        }
        /// <summary>
        ///
        /// </summary>
        public void OnMouseMove(MouseEventArgs e)
        {
            if (mDraggingCircle == null)
                return;
 
            mDraggingCircle.mCenter = new Vector2(e.X, e.Y);
        }
    }

 

To do

A variable speed system could be done, and a way to interactively change the PackingCenter point would be nice too. It would also be necessary to implement the limits of the available space, changing the radius of the circles proportionally if they don´t fit.
Cheers!