
Deep Dive to Windows, Part 1


This series looks at the internals of the Windows OS. I bet you’ll love the way it has been designed and how it has evolved. You may run into issues while using the OS, but whether an application was designed the way Windows expects is a matter of debate, and this is not the place for that debate. The aim is also not to teach anyone Windows, but to share some facts and figures, a little stack analysis, and more.
This is my first attempt at a bit of user-mode debugging (I’m not experienced at debugging without source code).

Note: Mark’s work (Mark Russinovich, widely known for his work on Windows Internals) has always been an inspiration for me.

Let’s start the journey:

In this series target environment is x86.
In this series, I’d like to cover:

1.       Thread Environment Block (TEB): a data structure that stores information about the currently running thread of a process.

2.       What it looks like and why it is significant.
Let’s have a look at the structure. To do that, I’ve written a small C++ hello world application with the following code:

#include <iostream>
#include <tchar.h>   // declares _tmain and _TCHAR

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
       char ch;
       cout << "Hello World" << endl;
       cin >> ch;
       return 0;
}


I compiled it in debug mode using VS 2005.
Then I opened the executable in WinDbg and set the symbol path to the Microsoft symbol server.

The MS symbol server path is:

After opening the executable through WinDbg (it launches the process and attaches to it), you might see a first-chance exception like the one below. It is nothing to worry about; just continue debugging:
(1080.10e8): Break instruction exception - code 80000003 (first chance).
In (1080.10e8), the first number is the process ID and the second (10e8) is the ID of the thread that caused the first-chance exception. Both are in hexadecimal.
Next I wanted to see the modules loaded by the app, so I issued the lm command.
0:000> lm
start    end        module name
00400000 0041c000   Sample_Hello C (private pdb symbols)  F:\ForDebugging\Sample_Hello\debug\Sample_Hello.pdb
68fc0000 690e1000   MSVCR80D   (deferred)            
6a510000 6a60e000   MSVCP80D   (deferred)            
75a90000 75ba0000   kernel32   (deferred)            
75f90000 7603c000   msvcrt     (deferred)            
76260000 762a6000   KERNELBASE   (deferred)            
76ea0000 77020000   ntdll      (pdb symbols)          f:\fordebugging\sample_hello\debug\wntdll.pdb\D74F79EB1F8D4A45ABCD2F476CCABACC2\wntdll.pdb

The first line (Sample_Hello) is my application.

Next I wanted to see the stack trace, so I used the kb command, which shows the call stack of the current thread.

0:000> kb
ChildEBP RetAddr  Args to Child             
0018fb34 76f21383 7efdd000 7efde000 76fa206c ntdll!LdrpDoDebuggerBreak+0x2c
0018fcb0 76ee52d6 0018fd24 76ea0000 76af36a2 ntdll!LdrpInitializeProcess+0x12cc
0018fd00 76ed9e79 0018fd24 76ea0000 00000000 ntdll!_LdrpInitialize+0x78
0018fd10 00000000 0018fd24 76ea0000 00000000 ntdll!LdrInitializeThunk+0x10

At this point I don’t know how many threads my app (Sample_Hello) has, so I issued the ~* command.

0:000> ~*
.  0  Id: 1080.10e8 Suspend: 1 Teb: 7efdd000 Unfrozen
      Start: *** WARNING: Unable to verify checksum for Sample_Hello.exe
Sample_Hello!ILT+145(_wmainCRTStartup) (00411096)
      Priority: 0  Priority class: 32  Affinity: 3

In this case, it has only one thread.
Then I used !teb to get the Thread Environment Block (TEB) information:

0:000> !teb
TEB at 7efdd000
    ExceptionList:        0018fb24
    StackBase:            00190000
    StackLimit:           0018e000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7efdd000
    EnvironmentPointer:   00000000
    ClientId:             00001080 . 000010e8
    RpcHandle:            00000000
    Tls Storage:          7efdd02c
    PEB Address:          7efde000
    LastErrorValue:       0
    LastStatusValue:      0
    Count Owned Locks:    0
    HardErrorMode:        0

This is handy for calculating the current stack size (per thread). It also leads to another question: what stack size does Windows provide per thread by default?

To calculate the current stack size, I used the ? command to evaluate StackBase - StackLimit (the stack grows downward, so the base is the higher address). With the values from !teb, that is ? 00190000 - 0018e000.

WinDbg reports 8192 bytes, which is 8 KB or two 4 KB pages:

0:000> ? 00190000 - 0018e000
Evaluate expression: 8192 = 00002000

Next, I’d like to know the default stack size Windows provides per thread of a running process.

The !teb output starts with the line TEB at 7efdd000, so I executed dd 7efdd000 + e0c L1.

The e0c isn’t a magic number; I’ll show what it is later. L1 limits the output to one object, which for dd is a single DWORD.

So, the output is:

0:000> dd 7efdd000 + e0c L1
7efdde0c  00090000

The value 00090000 is the significant part. I then executed the command ? 00190000 - 00090000.

00190000 is the value for stack base.

The result revealed by WinDbg is:

0:000> ? 00190000 - 00090000
Evaluate expression: 1048576 = 00100000

Hence the default stack size in Windows is 1 MB. This is the size reserved for the thread; as computed earlier, only 8 KB of it is in use so far.

So what is the magic number e0c? To find out, I issued the command below:

0:000> dt nt!_TEB -r @$teb
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
      +0x000 ExceptionList    : 0x0018fb24 _EXCEPTION_REGISTRATION_RECORD
         +0x000 Next             : 0x0018fcf0 _EXCEPTION_REGISTRATION_RECORD
         +0x004 Handler          : 0x76f171d5           _EXCEPTION_DISPOSITION  ntdll!_except_handler4+0
      +0x004 StackBase        : 0x00190000 Void
      +0x008 StackLimit       : 0x0018e000 Void
      +0x00c SubSystemTib     : (null)
      +0x010 FiberData        : 0x00001e00 Void
      +0x010 Version          : 0x1e00
      +0x014 ArbitraryUserPointer : (null)
      +0x018 Self             : 0x7efdd000 _NT_TIB
         +0x000 ExceptionList    : 0x0018fb24 _EXCEPTION_REGISTRATION_RECORD
         +0x004 StackBase        : 0x00190000 Void
         +0x008 StackLimit       : 0x0018e000 Void
         +0x00c SubSystemTib     : (null)
         +0x010 FiberData        : 0x00001e00 Void
         +0x010 Version          : 0x1e00
         +0x014 ArbitraryUserPointer : (null)
         +0x018 Self             : 0x7efdd000 _NT_TIB

------------- Trimmed for Readability purpose --------------------- 

      +0x210 FlsListHead      : _LIST_ENTRY [ 0x7efde210 - 0x7efde210 ]
         +0x000 Flink            : 0x7efde210 _LIST_ENTRY [ 0x7efde210 - 0x7efde210 ]
         +0x004 Blink            : 0x7efde210 _LIST_ENTRY [ 0x7efde210 - 0x7efde210 ]
      +0x218 FlsBitmap        : 0x76fa4238 Void
      +0x21c FlsBitmapBits    : [4] 1
      +0x22c FlsHighIndex     : 0
      +0x230 WerRegistrationData : (null)
      +0x234 WerShipAssertPtr : (null)
      +0x238 pContextData     : 0x001b0000 Void
      +0x23c pImageHeaderHash : (null)
      +0x240 TracingFlags     : 0
      +0x240 HeapTracingEnabled : 0y0
      +0x240 CritSecTracingEnabled : 0y0
      +0x240 SpareTracingBits : 0y000000000000000000000000000000 (0)
   +0x034 LastErrorValue   : 0
   +0x038 CountOfOwnedCriticalSections : 0
   +0x03c CsrClientThread  : (null)
   +0x040 Win32ThreadInfo  : (null)
   +0x044 User32Reserved   : [26] 0
   +0x0ac UserReserved     : [5] 0
   +0x0c0 WOW32Reserved    : 0x73132320 Void
   +0x0c4 CurrentLocale    : 0x409
   +0x0c8 FpSoftwareStatusRegister : 0
   +0x0cc SystemReserved1  : [54] (null)
   +0x1a4 ExceptionCode    : 0n0
   +0x1a8 ActivationContextStackPointer : 0x005b07e0 _ACTIVATION_CONTEXT_STACK
      +0x000 ActiveFrame      : (null)
      +0x004 FrameListCache   : _LIST_ENTRY [ 0x5b07e4 - 0x5b07e4 ]
         +0x000 Flink            : 0x005b07e4 _LIST_ENTRY [ 0x5b07e4 - 0x5b07e4 ]
         +0x004 Blink            : 0x005b07e4 _LIST_ENTRY [ 0x5b07e4 - 0x5b07e4 ]
      +0x00c Flags            : 0
      +0x010 NextCookieSequenceNumber : 1
      +0x014 StackId          : 0x4deb2a
   +0x1ac SpareBytes       : [36]  ""
   +0x1d0 TxFsContext      : 0xfffe
   +0x1d4 GdiTebBatch      : _GDI_TEB_BATCH
      +0x000 Offset           : 0
      +0x004 HDC              : 0
      +0x008 Buffer           : [310] 0
   +0x6b4 RealClientId     : _CLIENT_ID
      +0x000 UniqueProcess    : 0x00001080 Void
      +0x004 UniqueThread     : 0x000010e8 Void
   +0x6bc GdiCachedProcessHandle : (null)
   +0x6c0 GdiClientPID     : 0
   +0x6c4 GdiClientTID     : 0
   +0x6c8 GdiThreadLocalInfo : (null)
   +0x6cc Win32ClientInfo  : [62] 0
   +0x7c4 glDispatchTable  : [233] (null)
   +0xb68 glReserved1      : [29] 0
   +0xbdc glReserved2      : (null)
   +0xbe0 glSectionInfo    : (null)
   +0xbe4 glSection        : (null)
   +0xbe8 glTable          : (null)
   +0xbec glCurrentRC      : (null)
   +0xbf0 glContext        : (null)
   +0xbf4 LastStatusValue  : 0
   +0xbf8 StaticUnicodeString : _UNICODE_STRING ""
      +0x000 Length           : 0
      +0x002 MaximumLength    : 0x20a
      +0x004 Buffer           : 0x7efddc00  ""
   +0xc00 StaticUnicodeBuffer : [261]  ""
   +0xe0c DeallocationStack : 0x00090000 Void
   +0xe10 TlsSlots         : [64] (null)
   +0xf10 TlsLinks         : _LIST_ENTRY [ 0x0 - 0x0 ]
      +0x000 Flink            : (null)
      +0x004 Blink            : (null)
.....................   
.....................
This part has been trimmed for readability.

The +0xe0c line is DeallocationStack, which is where the offset e0c comes from. The maximum extent of the stack (the lowest address of the region reserved for it) is stored here.

Happy debugging
