06/2002 by Zadig.

Tools needed:

  • g++
  • a disassembler

Introduction:

You have probably noticed that Object Oriented Programming is used more and more often, most of the time to reuse objects (and ease development), and sometimes just to do the same as the neighbours.
Anyway, a C++ deadlisting is no longer an exception. This makes things a little harder for the poor reversers that we are, because C++ compilers generate files much bigger than C ones. Moreover, the instruction and data flows are very different between C and C++. But as you will see, there are also some interesting things about this language.

This tutorial is intended to be a generic analysis of the C++ structures you will find in an asm deadlisting, so it should be useful on any OS. Everything you will find here is the result of studying apps compiled with g++. I don't think there are many differences with other compilers, but there may be, so keep it in mind.

At the end of this essay you will find the C++ source that is used throughout this text. It is a very basic (and useless) app, but it contains most of the basic C++ features. If you don't understand it, don't go any further, because you probably won't understand anything!
Moreover, to better understand how object functionality can be handled with a structured language, you should read and understand this text, which explains how to do Object Oriented Programming in C.

I- Methods.

1- Methods and functions.

2- Overloading.

II- Constructor.

1- Simple class.

2- Derived class.

III- Destructor.

IV- exceptions.

1- Raising an exception.

2- Exception handling.

V- Conclusion.

ANNEXE 1

I- Methods.

1- Methods and functions.

Methods are the root of C++: they are what you always use when you write a C++ app. But what is the difference between a method and a function? Well, none... from the asm point of view, of course! A method is just a function with parameters, like an asm or C one. OK, that's fine, but I can hear some of you say: "Hey dude, it can't be just a function, because a method is linked to the object it refers to. How can the assembler know which object the method was called on?". That is a very good question, with a very simple answer: the object handle is passed as a hidden parameter. It is the first parameter of every function that implements a (non-static) C++ method, and inside the method you can reach it as "this". For example, let's take this method:

Motorbike::Motorbike(int nb_cylinder, int engine_size);

From the C++ point of view, the first param is nb_cylinder and the second is engine_size, that's all. During compilation this is translated to something like this:

Motorbike(Motorbike* object_handle, int nb_cylinder, int engine_size);

This is the first very important thing to know: the first parameter of a (non-static) method is always its object handle. From now on I will use the term "method" to refer to a C++ method, and "function" to refer to its low-level code.

2- Overloading.

Now let's continue with another C++-specific feature: method overloading. Overloading means that the same method name can have several different prototypes. Here is its definition from the C++ standard (ISO/IEC 14882):

-1- When two or more different declarations are specified for a single name in 
    the same scope, that name is said to be overloaded. By extension, two 
    declarations in the same scope that declare the same name but with 
    different types are called overloaded declarations. Only function 
    declarations can be overloaded; object and type declarations cannot be 
    overloaded.

-2- When an overloaded function name is used in a call, which overloaded 
    function declaration is being referenced is determined by comparing the 
    types of the arguments at the point of use with the types of the parameters
    in the overloaded declarations that are visible at the point of use. This 
    function selection process is called overload resolution...

At first this seems trivial to translate to asm. A function is just an address after all, so the 2 functions simply have to be mapped at different addresses, no? That's right for methods that are internal to the app you are writing.

But what happens if you write a C++ add-on (dynamic library)? The apps that use your add-on need to find your classes' methods. To do this, the dynamic linker searches for the function name at runtime and binds the name to its function address. This is perfect for C or asm routines, because you can't have several functions with the same name, but C++ allows this with methods.
This is where a wonderful thing comes into play: name mangling. Name mangling is the way the compiler modifies method names so that no two functions end up with the same name. For example, look at the two definitions of the Motorbike constructor:

Motorbike::Motorbike(int nb_cylinder, int engine_size);
Motorbike::Motorbike(int engine_size);

These two methods have the same name (i.e. Motorbike), but they must not have the same name once the app is compiled. To achieve this, compilers build the new name from the name of the class, the name of the method, and the types of the arguments. Since you can't have 2 overloads with the same number and types of arguments, this works well. Here are the mangled names of our two constructors and of the "PrintInfos" method:

__9Motorbikeii
__9Motorbikei
PrintInfos__9Motorbike

The first part of the function's name is the method's name (it is empty for constructors). Then you have "__". After it, the number 9 is the length of the class's name ("Motorbike"). Finally, each parameter type is represented by one or more letters: in the first constructor we have 2 integers (nb_cylinder and engine_size), hence "ii", and in the second there is only one integer, hence "i". So now I think you see why this is so interesting for us: if you have the mangled name of a method, you know exactly how many parameters it takes and what their types are, thanks to C++. Since most of the time symbols are not stripped from binaries, you will get the prototypes of all methods in 90% of the cases.
One last thing on name mangling: it is compiler specific, so don't learn the meaning of each letter, because it won't be the same with another compiler (there are tools to do this for you ;). Name mangling differs from one compiler to another, and this is actually a good thing: it prevents cross-linking that would produce broken binaries. For example, say you write a library with the Metrowerks compiler and use it in an application compiled with g++. If the names were the same and you linked the 2 parts together, you might end up with a broken binary, because some C++ features are compiler specific (templates...). This is a problem for developers who want to write portable add-ons: they must export their declarations with C linkage (extern "C"), that is to say use C functions.

II- Constructor.

1- Simple class.

Now we will continue with constructors. In fact they are just special methods used to initialize an object. Let's have a look at the "Wheel" constructor:

Function Wheel::Wheel(int, int)

000013d0:                    55     push   %ebp
000013d1:                  89e5     mov    %esp,%ebp
000013d3:                8b5508     mov    0x8(%ebp),%edx  ; param 0 (Object handle)
000013d6:                8b450c     mov    0xc(%ebp),%eax  ; param 1 (int size)
000013d9:                  8902     mov    %eax,(%edx)     ; wheel_size = param 1
000013db:                8b4510     mov    0x10(%ebp),%eax ; param 2 (int type)
000013de:                894204     mov    %eax,0x4(%edx)  ; tyre_type = param 2
000013e1:                  89d0     mov    %edx,%eax       ; return value = object handle
000013e3:                  eb00     jmp    13e5 

Referenced by (conditionnal) jump(s) at Address(es):
	000013E3  
000013e5:                  89ec     mov    %ebp,%esp
000013e7:                    5d     pop    %ebp
000013e8:                    c3     ret    

There are several things to see in this very small function. As we saw in the previous section, the first param is the object handle. All that is done here is initializing the class members, at addresses 0x13d9 and 0x13de. Note also that the object handle is returned in %eax (0x13e1).
The 2nd point is that the memory for the object is already allocated when this function is called. This means that memory allocation is not done in the constructor, but by the caller: this is what happens when you use the "new" operator. To confirm this, let's look at the "main" function, which creates Motorbike objects:

00001b5d:                  6a0c     push   $0xc

Reference to function "__builtin_new"
00001b5f:            e8fcf5ffff     call   1160 
00001b64:                83c404     add    $0x4,%esp
00001b67:                  89c0     mov    %eax,%eax       ; eax = pointer to allocated memory
00001b69:                  89c6     mov    %eax,%esi
00001b6b:              c645eb01     movb   $0x1,0xffffffeb(%ebp)
00001b6f:            6894030000     push   $0x394          ; 0x394 == 916
00001b74:                    56     push   %esi            ; object handle

Reference to function "Motorbike::Motorbike(int)"
00001b75:            e8f2faffff     call   166c 

You can see that the object is allocated before the constructor is called. This piece of code also shows that the code is really not optimized: I think that the instruction at 0x1b67 (mov %eax,%eax, which does nothing) deserves to be removed! Anyway, I just want to say that most apps you will find won't be compiled with optimization options, which means that you will have to deal with even more rubbish code. This line was a simple example, but you will see later that you can find much more useless code.
Finally, there is a very interesting thing in this object allocation: the "__builtin_new" call tells us the memory size of a class object. Basically, "__builtin_new" is just a malloc, so looking at its parameter gives us the size it allocates, which is 12 (0xc) for our Motorbike class. So where does this come from?

Classes are made of 2 kinds of members: function members and data members. We already saw how function members (the methods) are defined. Now let's see how data members are stored. A class definition mixes data and function declarations. As we saw with the previous call, the methods are called directly when needed; they are not function pointers initialized when a new object is created (this is true for non-virtual methods, the only kind used in this sample; virtual methods add a hidden pointer to each object). This simply means that all method calls are resolved by the compiler and are not part of the class object. So all that is left in our class are the data members, and tell me, what do you call several data members grouped together? Yes, a structure. From the memory point of view, a class is just a structure containing all its data members. This means that the size of our class is the sum of the sizes of its data members (plus any alignment padding the compiler adds). Let's check this with the Motorbike class:

class Motorbike
{
   public:
      Motorbike(int nb_cylinder, int engine_size);
      Motorbike(int engine_size);
      ~Motorbike(void);
      void PrintInfos(void);
   private:
      Engine   *h_Engine;
      Wheel    *h_FrontWheel;
      Wheel    *h_RearWheel;
};

We can ignore the methods, because they are handled by the compiler, which leaves 3 handles (pointers). On an IA32 architecture (Intel 32 bits), addresses are coded on 32 bits, i.e. 4 bytes. Thus our class size is 4*3, i.e. 12 bytes, exactly what we found by looking at the "__builtin_new" call.

2- Derived class.

Here is the definition from the C++ standard:

A class is said to be (directly or indirectly) derived from its (direct or 
indirect) base classes. [...] Unless redefined in the derived class, members 
of a base class are also considered to be members of the derived class. The 
base class members are said to be inherited by the derived class. Inherited 
members can be referred to in expressions in the same manner as other members 
of the derived class, unless their names are hidden or ambiguous [...]. 

NOTE: the scope resolution operator :: [...] can be used to refer to a direct 
      or indirect base member explicitly. This allows access to a name that has
      been redefined in the derived class. A derived class can itself serve as 
      a base class subject to access control [...]. A pointer to a derived 
      class can be implicitly converted to a pointer to an accessible 
      unambiguous base class [...]. An lvalue of a derived class type can be 
      bound to a reference to an accessible unambiguous base class.
Now let's look at the Trike constructor to see how this feature is handled. The Trike class derives from Motorbike:

Function Trike::Trike(int, int)

         ... rubbish code ...
00001a35:                8b7508     mov    0x8(%ebp),%esi  ; param 0 (Object handle)
00001a38:                8b4510     mov    0x10(%ebp),%eax ; param 2 ( int)
00001a3b:                    50     push   %eax            ; param 2 ( int)
00001a3c:                8b450c     mov    0xc(%ebp),%eax  ; param 1 (int)
00001a3f:                    50     push   %eax            ; param 1 (int)
00001a40:                    56     push   %esi            ; param 0 (Object handle)

Reference to function "Motorbike::Motorbike(int, int)"
00001a41:            e87afaffff     call   14c0 
00001a46:                83c40c     add    $0xc,%esp
00001a49:                  6a08     push   $0x8

Reference to function "__builtin_new"                      ; wheel = new(8)
00001a4b:            e810f7ffff     call   1160 
00001a50:                83c404     add    $0x4,%esp
00001a53:                  89c0     mov    %eax,%eax       ; very powerful optimization...
00001a55:                  89c7     mov    %eax,%edi       ; edi = wheel handle
00001a57:              c645ff01     movb   $0x1,0xffffffff(%ebp)
00001a5b:                  6a00     push   $0x0
00001a5d:                  6a10     push   $0x10
00001a5f:                    57     push   %edi

Reference to function "Wheel::Wheel(int, int)"
00001a60:            e86bf9ffff     call   13d0 
00001a65:                83c40c     add    $0xc,%esp
00001a68:                  89c0     mov    %eax,%eax
00001a6a:              c645ff00     movb   $0x0,0xffffffff(%ebp)
00001a6e:                89460c     mov    %eax,0xc(%esi)  ; h_RearWheel2 (trike handle + 12) = wheel handle
00001a71:                  eb05     jmp    1a78 

Referenced by (conditionnal) jump(s) at Address(es):
	00001AC0  

Reference to function "__throw"
00001a73:            e828f6ffff     call   10a0 

Referenced by (conditionnal) jump(s) at Address(es):
	00001A71  
00001a78:              807dff00     cmpb   $0x0,0xffffffff(%ebp)
00001a7c:                  7412     je     1a90 
00001a7e:                    57     push   %edi

Reference to function "__builtin_delete"
00001a7f:            e8ccf6ffff     call   1150 

As you can see, we first call the Motorbike constructor on the Trike object handle itself, so the Motorbike part is located at the beginning of the Trike object. After that, a new wheel is allocated at 0x1a4b and initialized at 0x1a60. This means that, from the asm point of view, class derivation looks the same as having a data member of the base class type in the derived class (this is of course completely wrong when writing C++): the base class sits at the beginning of the derived class memory space, just before its other data members.
So what is the size of this class? We know that the Motorbike part uses 12 bytes (this is confirmed at 0x1a6e, where the new wheel handle is stored at offset 12), plus 1 handle for the extra wheel. This means that the class size is 12 + 4 = 16 bytes. You can check this in the "main" function. So our derived class is just its base class followed by its new data members.

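To finish with this section, I'd like to show you something very common and sometimes disappointing: useless code. You probably noticed that this function is not optimized at all! Just have a look at 0x1a53 and you will see a line that does nothing (mov %eax,%eax). This is "very strange", because this file was not compiled with debug options, so we might expect the asm code to be somewhat optimized. That was just one line, but look at 0x1a73 and 0x1a7e: in normal operation these parts of the function are never executed. I searched for some time to find what they were doing, I even thought about polymorphic code! They are most likely exception cleanup code for the "new Wheel" expression. The byte at 0xffffffff(%ebp) is set to 1 just before Wheel::Wheel is called (0x1a57) and cleared right after it returns (0x1a6a), so when the normal flow reaches 0x1a78 the flag is 0 and the __builtin_delete call is skipped. Only if the Wheel constructor raised an exception would the memory be freed and the exception passed on, probably through the __throw at 0x1a73, which is only reached from 0x1ac0. This matches a C++ rule: if the constructor called by a new expression throws, the allocated memory must be released. Compile the example with optimization options and these lines just disappear, probably because the compiler can then see that Wheel::Wheel never throws. So in this sample, about half of the function never does anything useful. You will often have to deal with such code, which is in fact a way to hide a protection: add 50% of useless code to your protection function and it will be more "secure" because incomprehensible (maybe application sizes grow every year because their rubbish-code protection improves ;).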

III- Destructor.

After constructors, let's see how destructors are handled. This time it will be much simpler. All that a destructor needs to do is possibly call some final functions, and free the memory used by the class object. Here is the Wheel destructor:

      ... some code ...
000013fd:                8b7508     mov    0x8(%ebp),%esi   ; param 0 (Object handle)
00001400:                8b450c     mov    0xc(%ebp),%eax   ; param 1 (flags)
00001403:                83e001     and    $0x1,%eax        ; keep only bit 0
00001406:                  85c0     test   %eax,%eax
00001408:                  740b     je     1415 
0000140a:                    56     push   %esi   ; param 0 (Object handle)

Reference to function "__builtin_delete"
0000140b:            e840fdffff     call   1150 
00001410:                83c404     add    $0x4,%esp
00001413:                  eb00     jmp    1415 

Referenced by (conditionnal) jump(s) at Address(es):
	00001408  00001413  
00001415:                8d65f8     lea    0xfffffff8(%ebp),%esp
00001418:                    5b     pop    %ebx
00001419:                    5e     pop    %esi
0000141a:                  89ec     mov    %ebp,%esp
0000141c:                    5d     pop    %ebp
0000141d:                    c3     ret    

The interesting thing here is that this function takes 2 parameters: the object handle as usual, and another one that is read at 0x1400. If this parameter is odd (bit 0 set), the memory space is freed. Looking at all destructor calls, you will see that this value is always 3. This probably means that bit 0 is not the only one used by the destructor. Maybe more complex destructors use the other bits?
This also means that freeing the object's memory is handled by the destructor, and not by the caller as was the case for allocation with the constructor.

IV- exceptions.

The last point discussed here is exceptions. Here is the list of functions that are used when an exception is raised:

  • __eh_alloc
  • __throw
  • the type_info function for int (the type of the thrown object)
  • __cp_push_exception
  • __eh_rtime_match
  • __start_cp_handler
  • __cp_pop_exception

That's quite a lot! When I disassembled the sample code, I thought that exceptions were only a jump. I was far from what is really done. In fact exceptions are an advanced feature that does a lot of work before jumping to the exception handler. The fact that exception handling is a gcc "core feature" (you don't need to write your own exception handling when writing a new language frontend for gcc) probably adds some complexity to it. Here is a partial summary, taken from the gcc mailing list, of what is done. You will find these functions called in "Motorbike::PrintInfos" and "main".

1- Raising an exception.

"__eh_alloc" allocates memory for the exception object. This is C++ specific, as information of arbitrary size may travel with an exception in C++.
Then the exception object is created in the allocated space, and the "type_info" function is called. I still don't know exactly what this last function does, but it is often used in C++ applications. All I know is that it is part of RTTI (Run-Time Type Information), which means that it probably describes the type of the exception object we are about to send, so that a matching handler can be found later.
After that, "__cp_push_exception" is used to push the exception into the exception handling runtime, and finally "__throw" is called to raise it. When this point is reached, the normal instruction flow is interrupted, and the next instruction of the application to be executed is in the corresponding exception handler.

2- Exception handling.

When an exception is raised, you land in your exception handler just before a call to "__eh_rtime_match" (whose name suggests a run-time match between the thrown type and the handler's type). Then "__start_cp_handler" is called, which probably initializes the exception object. After that you will find the application's own handler code (the body of the catch block), until a call to "__cp_pop_exception", which marks the end of the exception handling routine.
Due to the complexity of the instruction flow at this point, I don't know how to find, at first sight, which exception handler will be called when an exception is raised. The most efficient solution is probably to put breakpoints at the beginning of all the exception handlers that may be used, and see where the debugger stops.

V- Conclusion.

Here we are, that's all for this time. I hope you will now feel more comfortable when traveling through a C++ deadlisting. If you wonder about a specific function, you should search the gcc mailing list archive for it. Someone has probably already asked how it works and what it does :)

See you later,
Zadig.




ANNEXE 1.

#include <iostream>
using namespace std;

/***************************
 *    We need 2 wheels     *
 **************************/
class Wheel
{
   public:
      Wheel(int size, int type);
      ~Wheel(void);
      int GetWheelSize(void);
      int GetTyreType(void);
   private:
      int wheel_size;
      int tyre_type;
};

Wheel::Wheel(int size, int type)
{
   wheel_size = size;
   tyre_type = type;
}

Wheel::~Wheel(void)
{
}

int Wheel::GetWheelSize(void)
{
   return(wheel_size);
}

int Wheel::GetTyreType(void)
{
   return(tyre_type);
}

/************************
 *    and an engine     *
 ************************/
class Engine
{
   public:
      Engine(int nb_cylinder, int engine_size);
      ~Engine(void);
      int GetNbCylinders(void);
      int GetEngineSize(void);
   private:
      int nb_cylinders;
      int size;
};

Engine::Engine(int nb_cylinder, int engine_size)
{
   nb_cylinders = nb_cylinder;
   size = engine_size;
}

Engine::~Engine(void)
{
}

int Engine::GetNbCylinders(void)
{
   return(nb_cylinders);
}

int Engine::GetEngineSize(void)
{
   return(size);
}

/***************************************
 *    Now we can make what we want     *
 ***************************************/

class Motorbike
{
   public:
      Motorbike(int nb_cylinder, int engine_size);
      Motorbike(int engine_size);
      ~Motorbike(void);
      void PrintInfos(void);
   private:
      Engine   *h_Engine;
      Wheel    *h_FrontWheel;
      Wheel    *h_RearWheel;
};

Motorbike::Motorbike(int nb_cylinder, int engine_size)
{
   h_Engine = new Engine(nb_cylinder, engine_size);
   h_FrontWheel = new Wheel(16, 0);
   h_RearWheel = new Wheel(16, 0);
}

Motorbike::Motorbike(int engine_size)
{
   h_Engine = new Engine(2, engine_size);
   h_FrontWheel = new Wheel(16, 0);
   h_RearWheel = new Wheel(16, 0);
}

Motorbike::~Motorbike(void)
{
   delete h_Engine;
   delete h_FrontWheel;
   delete h_RearWheel;
}

void Motorbike::PrintInfos(void)
{
   if(h_Engine->GetNbCylinders() >= 4)
      throw 0;

   cout << h_Engine->GetNbCylinders() << " cylinders, " <<
             h_Engine->GetEngineSize() << "cm3\n";
   cout << "Front wheel is " << h_FrontWheel->GetWheelSize() << " inches\n";
   cout << "Rear wheel is " << h_RearWheel->GetWheelSize() << " inches\n\n";
}

/************************
 *    and even more     *
 ************************/
class Trike : public Motorbike
{
   public:
      Trike(int nb_cylinder, int engine_size);
      ~Trike(void);
   private:
      Wheel    *h_RearWheel2;
};

Trike::Trike(int nb_cylinders, int engine_size) : Motorbike(nb_cylinders, engine_size)
{
   h_RearWheel2 = new Wheel(16, 0);
}

Trike::~Trike(void)
{
   delete h_RearWheel2;
}


/********************
 *  Now let's ride  *
 ********************/
int main(void)
{
   Motorbike *h_MonsterS4, *h_SpeedTriple;
   Trike     *h_MyTrike;

   h_MonsterS4 = new Motorbike(916);
   h_SpeedTriple = new Motorbike(3, 955);
   h_MyTrike = new Trike(4, 1300);

   try
   {
      cout << "Monster S4 infos:\n";
      h_MonsterS4->PrintInfos();
      
      cout << "Speed triple infos:\n";
      h_SpeedTriple->PrintInfos();
   }

   catch(int)
   {
      cout << "Seems that something's wrong...\n";
   }

   delete h_MonsterS4;
   delete h_SpeedTriple;
   delete h_MyTrike;
   return(0);
}