Shellcode

In this blog we will learn how to write position-independent code (PIC) in C. Writing shellcode traditionally meant battling raw assembly and dealing with massive, unreadable byte arrays. But once we know how the Windows Portable Executable (PE) format works, we can write fully functional shellcode in C, compile it directly into our binary, and extract it at runtime. This method is used in my project YetAnotherReflectiveLoader (https://github.com/Oorth/YetAnotherReflectiveLoader) and documented in Reflective DLL Injection, where you can watch it in action.

C
x64dbg

Setup

Everything we are going to talk about was done on the latest Windows and Defender versions, which at the time of writing are:

Windows OS

  • Edition: Windows 11 Pro
  • Version: 25H2
  • OS Build: 26200.7840

Defender Engine

  • Client: 4.18.26010.5
  • Engine: 1.1.26010.1
  • AV / AS: 1.445.222.0

Environment

Everything is built and tested against modern security, with all security features turned ON:

✓ Real-time protection

✓ Tamper Protection

✓ Memory integrity

✓ Memory access protection

✓ Microsoft Vulnerable Driver Blocklist

Warning

This is not just another project built to run in a vulnerable environment with security features turned off. This is serious work, and it is published strictly for education and research purposes.

The Landscape

Writing Position Independent Code (PIC) in C is an interesting thing.

Standard applications are pampered by the operating system. They rely on the CRT (C Runtime) to set up their environment, absolute memory addresses to find their variables, and the IAT (Import Address Table) to call Windows APIs.

Our Shellcode has none of that. It will be blindly injected into a foreign process, and must survive entirely on its own without crashing the host.

STANDARD EXE / DLL

  • Relies on C Runtime (CRT)
  • Uses Import Address Table
  • Variables scattered in .data
  • Code isolated in .text

RAW SHELLCODE (PIC)

  • NO CRT Initialization
  • NO OS API Resolution
  • 100% Self-Sufficient
  • Merged into a Single Data Blob

The Compiler Fights

By default, whenever we compile a program, the toolchain acts like a filing cabinet: it sorts different parts of our program into different PE sections. To understand what we are fighting against, let's look at a standard C snippet. When we build it, the compiler and linker pull it apart and distribute the pieces across the executable according to the memory protections each piece needs:

Default Compiler Behavior

int g_counter = 5;              // ➔ mapped to .data (RW)
int g_uninitialized;            // ➔ mapped to .bss (RW)

void ExecutePayload() {         // ➔ mapped to .text (RX)
    const char* msg = "Hello";  // ➔ string literal mapped to .rdata (R)
    g_counter++;
}

If we extract just the .text section of this compiled binary to use as our shellcode, it will crash as soon as it touches its data, because the instructions in .text still reference variables and strings that were left behind in the .data and .rdata sections. The result is a cute EXCEPTION_ACCESS_VIOLATION.

To successfully write shellcode, our main goal is to force the compiler to stuff everything (code, strings, and variables) into a single PE section.

The Compiler Helps

Luckily, the MSVC compiler is very accommodating and does listen to us. We can embed special commands directly in the source code that tell the compiler how to compile a specific piece of code, rather than telling the program what to do at runtime.

These are called Compiler Directives or Compiler Attributes. We will use these to force everything our shellcode requires into a single PE section.

#pragma Directives

Pragmas are standard C mechanisms that allow compilers to offer machine-specific or operating-system-specific features.

#pragma code_seg(push, ".stub")

Tells the MSVC compiler to place the machine code of the functions that follow into a custom section of the executable named .stub. The section name is passed as a string literal.

#pragma code_seg(pop)

Pops the section name off the internal compiler stack, restoring whichever code section was active before the push (.text by default).


We will wrap our whole shellcode in these, so the linker places all of our shellcode in the .stub section of the PE.

__declspec Attributes

This is a Microsoft-specific keyword used to specify storage class information for variables and functions. (Note: GCC and Clang use __attribute__((...)) for the same purpose.)

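__declspec(allocate(".stub"))

Tells the compiler to place a specific variable or data object into the .stub section instead of the default .data or .rdata sections. The section name must already have been declared with a pragma such as #pragma section or #pragma code_seg before it is used here.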

__declspec(noinline)

Forces the compiler to never inline a particular function, even if the compiler's optimization algorithms think it would make the program run faster.
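
To see how these directives fit together, here is a minimal sketch. The names STUB_DATA, STUB_NOINLINE, kDemoGreeting and DemoStubLength are made up for illustration and are not taken from the project. The MSVC branch is my reading of the documented pragmas and has not been checked against every toolchain version. On other compilers the macros simply fall back to defaults so the file still builds.

```cpp
#include <cstddef>

#ifdef _MSC_VER
  // Declare the custom section first so that allocate(".stub") is allowed to use it.
  #pragma section(".stub", read, execute)
  #define STUB_DATA     __declspec(allocate(".stub"))
  #define STUB_NOINLINE __declspec(noinline)
  // Every function defined from here on is emitted into .stub.
  #pragma code_seg(push, ".stub")
#else
  // Assumption: outside MSVC we skip section placement entirely.
  #define STUB_DATA
  #define STUB_NOINLINE __attribute__((noinline))
#endif

// Hypothetical constant that lives next to the code instead of in .rdata.
STUB_DATA const char16_t kDemoGreeting[] = u"hello";

// Hypothetical helper: a hand-rolled length loop, because no CRT is available.
STUB_NOINLINE std::size_t DemoStubLength(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != u'\0') ++n;
    return n;
}

#ifdef _MSC_VER
  // Restore the previous code section (.text by default).
  #pragma code_seg(pop)
#endif
```

The data is declared const because the section is only readable and executable, so anything the shellcode needs to modify at runtime has to live on the stack or in memory it allocates itself.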

The Shellcode

Writing C shellcode is a game of extreme isolation. We cannot rely on standard library functions, we cannot call Windows APIs through imports, and we cannot let the compiler inline or rearrange our functions freely. Since this shellcode will be injected into another process, we will instead walk the host process's PEB and use the copies of the Windows API that are already loaded there.

Locating the PEB

A segment register (FS on 32-bit, GS on 64-bit) always points to the current thread's environment block, the TEB. On x64, offset 0x60 inside the TEB holds the pointer to the PEB; on x86 the same pointer sits at offset 0x30.

GS register ➔ TEB (Thread Environment Block)

0x00 NtTib
...
0x60 ProcessEnvironmentBlock (pointer to the PEB)
...

We can read it like this:
entry_point.cpp
#ifdef _M_IX86
PEB* pPEB = (PEB*) __readfsdword(0x30);
#else
PEB* pPEB = (PEB*) __readgsqword(0x60);
#endif

On x64 we read GS at offset 0x60, and on x86 we read FS at offset 0x30.

The next step is to find the base addresses (module handles) of the libraries the process has already loaded, most importantly ntdll.dll and kernelbase.dll. To do that, we walk the InLoadOrderModuleList. Windows stores loaded modules in a circular doubly-linked list (LIST_ENTRY).

However, the pointers in this list do not point to the structures as such. They point to the LIST_ENTRY field embedded inside each structure. Here is how this chain looks in memory:

PEB_LDR_DATA
  0x00 Length
  0x04 Initialized
  0x08 SsHandle
  0x10 InLoadOrder...        .Flink ━━➔ InLoadOrderLinks of the first entry

LDR_DATA_TABLE_ENTRY (first module)
  0x00 InLoadOrderLinks      .Flink ━━➔ InLoadOrderLinks of the next entry
  0x10 InMemoryOrder...
  0x20 InInitOrder...
  0x30 DllBase
  0x58 Name: "host.exe"

LDR_DATA_TABLE_ENTRY (next module)
  0x00 InLoadOrderLinks      .Flink ... eventually loops back to the list head
  0x10 InMemoryOrder...
  0x20 InInitOrder...
  0x30 DllBase
  0x58 Name: "ntdll.dll"

Here is the exact implementation using the CONTAINING_RECORD macro to safely extract the data during our traversal loop:

list_traversal.cpp
MY_PEB_LDR_DATA* pLdr = (MY_PEB_LDR_DATA*)pPEB->Ldr;

// Grab the head of the list
auto head = &pLdr->InLoadOrderModuleList;

// The first Flink points to the first actual loaded module (the host EXE itself)
auto current = head->Flink;

// Walk the circular load-order list
while(current != head)
{
    // Find the base of the struct using the LIST_ENTRY pointer!
    auto entry = CONTAINING_RECORD(current, LDR_DATA_TABLE_ENTRY, InLoadOrderLinks);

    // ===========================================================================
    // (We will extract the BaseDllName and DllBase from 'entry' here...)
    // ===========================================================================

    // Move to the next module in the chain
    current = current->Flink;
}

WHY CONTAINING_RECORD?
The current pointer does not point to the LDR_DATA_TABLE_ENTRY as such. It points to the InLoadOrderLinks field embedded inside the structure.

The CONTAINING_RECORD macro performs the reverse pointer math: it takes the address of the link we are currently pointing at, subtracts the offset of that field within the structure, and hands us back a typed pointer to the top (0x00) of the entry. InLoadOrderLinks happens to sit at offset 0x00, so here the subtraction is zero and a plain cast would produce the same address. If we walked InMemoryOrderModuleList instead, the link field would sit at offset 0x10, and reading BaseDllName without the correction would read the wrong memory and likely crash. Using the macro keeps the code correct regardless of which list we walk.
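
To make the pointer math concrete, here is a small portable sketch that does not touch any Windows structure. ListEntry, DemoModule, DEMO_CONTAINING_RECORD and the helper functions are stand-ins invented for this example. The demo deliberately walks the list through a link field that is not at offset zero, so the subtraction actually matters.

```cpp
#include <cstddef>

// Stand-in for LIST_ENTRY.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

// Stand-in for CONTAINING_RECORD: step back from a field to its enclosing struct.
#define DEMO_CONTAINING_RECORD(address, type, field) \
    (reinterpret_cast<type*>(reinterpret_cast<char*>(address) - offsetof(type, field)))

// Mock module entry with two embedded links, like the real loader entries.
struct DemoModule {
    ListEntry LoadOrderLinks;    // offset 0
    ListEntry MemoryOrderLinks;  // offset sizeof(ListEntry), NOT zero
    int       Id;
};

// Append a node to the tail of a circular doubly-linked list.
inline void DemoInsertTail(ListEntry* head, ListEntry* node) {
    node->Flink = head;
    node->Blink = head->Blink;
    head->Blink->Flink = node;
    head->Blink = node;
}

// Build a three-element list, walk it, and encode the visit order as digits.
inline int DemoWalkOrder() {
    ListEntry head;
    head.Flink = head.Blink = &head;  // empty circular list points at itself

    DemoModule mods[3] = {};
    for (int i = 0; i < 3; ++i) {
        mods[i].Id = i + 1;
        DemoInsertTail(&head, &mods[i].MemoryOrderLinks);
    }

    int order = 0;
    for (ListEntry* cur = head.Flink; cur != &head; cur = cur->Flink) {
        // cur points into the middle of a DemoModule; recover the top of it.
        DemoModule* m = DEMO_CONTAINING_RECORD(cur, DemoModule, MemoryOrderLinks);
        order = order * 10 + m->Id;
    }
    return order;
}
```

Casting cur directly to DemoModule* in this loop would read Id from the wrong address, which is exactly the bug the macro protects us from.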

The Libraries

Once we have safely resolved the pointer to the top of the LDR_DATA_TABLE_ENTRY, we need to check if the module we are currently looking at is one of the core OS libraries we need to survive (ntdll.dll, kernel32.dll, or kernelbase.dll).

STEP 1: Extracting the Unicode Name

The BaseDllName is stored as a UNICODE_STRING. Because our shellcode is isolated, we cannot rely on the standard C Runtime to parse or clean up this string. We extract the raw buffer and pass it to our own custom parsing function.

list_traversal.cpp
if(entry->BaseDllName.Buffer)
{
    const WCHAR* namePtr;
    SIZE_T nameLen;

    // Custom function to strip away any file paths and leave just the raw DLL name
    HelperSplitFilename(entry->BaseDllName.Buffer, entry->BaseDllName.Length / sizeof(WCHAR), &namePtr, &nameLen);

The Length field of a UNICODE_STRING is a byte count, so we divide it by sizeof(WCHAR) to get the character count, then pass it to HelperSplitFilename. This ensures we are only comparing the actual file name (e.g., "ntdll.dll") and ignoring any full-path weirdness that might exist in memory.
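
The body of HelperSplitFilename is not shown in this post, so the following is only a guess at what such a helper could look like. DemoSplitFilename and DemoBaseNameLength are hypothetical names, and char16_t stands in for WCHAR so the sketch builds outside Windows as well.

```cpp
#include <cstddef>

// Hypothetical stand-in for HelperSplitFilename: given a path of 'len' characters,
// return a pointer to the part after the last path separator and its length.
inline void DemoSplitFilename(const char16_t* path, std::size_t len,
                              const char16_t** name, std::size_t* nameLen) {
    std::size_t start = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (path[i] == u'\\' || path[i] == u'/') {
            start = i + 1;  // remember the position just after the separator
        }
    }
    *name = path + start;
    *nameLen = len - start;
}

// Convenience wrapper for null-terminated input: returns only the file name length.
inline std::size_t DemoBaseNameLength(const char16_t* path) {
    std::size_t len = 0;
    while (path[len] != u'\0') ++len;  // hand-rolled length, no CRT

    const char16_t* name = nullptr;
    std::size_t nameLen = 0;
    DemoSplitFilename(path, len, &name, &nameLen);
    return nameLen;
}
```

The helper takes an explicit length instead of relying on a terminator because a UNICODE_STRING buffer is not guaranteed to be null-terminated.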

STEP 2: Custom String Matching

We cannot use wcscmp or strcmp because they live in the C runtime, which our shellcode is not linked against and which may not even be loaded in the host. Instead, we use a custom isSameW function to match the length and raw bytes of the string to our target libraries.

list_traversal.cpp
    // Check for user32.dll
    SIZE_T k32len = sizeof(kUsr32)/sizeof(WCHAR) - 1;
    if(nameLen == k32len && isSameW(namePtr, kUsr32, k32len))
        sLibs.hUsr32 = (HMODULE)entry->DllBase;

    // Check for kernelbase.dll
    k32len = sizeof(hKernelbase)/sizeof(WCHAR) - 1;
    if(nameLen == k32len && isSameW(namePtr, hKernelbase, k32len))
        sLibs.hKERNELBASE = (HMODULE)entry->DllBase;

    // Check for ntdll.dll
    k32len = sizeof(kNtdll)/sizeof(WCHAR) - 1;
    if(nameLen == k32len && isSameW(namePtr, kNtdll, k32len))
        sLibs.hHookedNtdll = (HMODULE)entry->DllBase;
}
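
The isSameW helper itself is not listed here, so below is a rough sketch of one way to write such a comparison. DemoIsSameW, DemoToLowerAscii and kDemoNtdll are illustrative names. The ASCII case folding is my own addition and an assumption: the original is described as comparing raw bytes, but module names in memory can differ in case.

```cpp
#include <cstddef>

// Fold ASCII upper-case letters to lower-case; everything else is left unchanged.
inline char16_t DemoToLowerAscii(char16_t c) {
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + (u'a' - u'A')) : c;
}

// Compare exactly 'len' characters of two buffers, ignoring ASCII case.
// The caller is expected to have checked that both strings have the same length.
inline bool DemoIsSameW(const char16_t* a, const char16_t* b, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        if (DemoToLowerAscii(a[i]) != DemoToLowerAscii(b[i])) {
            return false;
        }
    }
    return true;
}

// Same length idiom as in the snippet above: array size in characters,
// minus one for the terminating null.
static const char16_t kDemoNtdll[] = u"ntdll.dll";
static const std::size_t kDemoNtdllLen = sizeof(kDemoNtdll) / sizeof(char16_t) - 1;
```

Checking the length before calling the comparison, as the traversal code does, avoids reading past the end of the shorter buffer.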

API Resolution

Now that we have the HMODULE base addresses of kernelbase.dll and the other core DLLs, our shellcode is no longer blind. But we cannot simply call GetProcAddress to find functions, because we don't have its address yet. Instead, we use a custom function (ShellcodeFindExportAddress) that manually parses the export directory, including the Export Address Table (EAT), of any given module. So, now we can simply:

api_resolution.cpp
// Resolve Debug Logging (Optional, but incredibly useful for dev)
my_OutputDebugStringW = (pfnOutputDebugStringW)ShellcodeFindExportAddress(
    sLibs.hKERNELBASE, cOutputDebugStringWFunction, my_LoadLibraryA);
if(my_OutputDebugStringW == NULL) __debugbreak();
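
ShellcodeFindExportAddress is not shown in this post, so here is only a simplified model of the lookup idea, run against a tiny synthetic buffer rather than a real module. DemoExportDir keeps just the fields that matter for a name lookup (the real IMAGE_EXPORT_DIRECTORY has more), every Demo* name is made up, and forwarded exports and bounds checks are left out on purpose.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reduced model of an export directory. All addresses are RVAs (offsets from the image base).
struct DemoExportDir {
    std::uint32_t NumberOfNames;
    std::uint32_t AddressOfFunctions;     // array of uint32 function RVAs
    std::uint32_t AddressOfNames;         // array of uint32 RVAs to name strings
    std::uint32_t AddressOfNameOrdinals;  // array of uint16 indices into the function array
};

// Read a value at base + rva without assuming alignment.
template <typename T>
inline T DemoRead(const unsigned char* base, std::uint32_t rva) {
    T value;
    std::memcpy(&value, base + rva, sizeof(T));
    return value;
}

// Minimal string equality, since the real thing cannot use strcmp.
inline bool DemoStrEq(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// name -> index in the name table -> ordinal index -> function RVA. Returns 0 if not found.
inline std::uint32_t DemoFindExportRva(const unsigned char* base, std::uint32_t dirRva,
                                       const char* wanted) {
    const DemoExportDir dir = DemoRead<DemoExportDir>(base, dirRva);
    for (std::uint32_t i = 0; i < dir.NumberOfNames; ++i) {
        const std::uint32_t nameRva = DemoRead<std::uint32_t>(base, dir.AddressOfNames + i * 4);
        if (!DemoStrEq(reinterpret_cast<const char*>(base + nameRva), wanted)) continue;
        const std::uint16_t idx = DemoRead<std::uint16_t>(base, dir.AddressOfNameOrdinals + i * 2);
        return DemoRead<std::uint32_t>(base, dir.AddressOfFunctions + idx * 4u);
    }
    return 0;
}

// Build a synthetic "image" with two named exports and look one of them up.
inline std::uint32_t DemoLookup(const char* wanted) {
    unsigned char image[0x90] = {};
    const DemoExportDir dir = { 2, 0x40, 0x50, 0x60 };
    const std::uint32_t functions[3] = { 0x1000, 0x2000, 0x3000 };
    const std::uint32_t names[2]     = { 0x70, 0x78 };  // "Alpha", "Beta"
    const std::uint16_t ordinals[2]  = { 2, 0 };        // Alpha -> functions[2], Beta -> functions[0]
    std::memcpy(image + 0x10, &dir, sizeof(dir));
    std::memcpy(image + 0x40, functions, sizeof(functions));
    std::memcpy(image + 0x50, names, sizeof(names));
    std::memcpy(image + 0x60, ordinals, sizeof(ordinals));
    std::memcpy(image + 0x70, "Alpha", 6);
    std::memcpy(image + 0x78, "Beta", 5);
    return DemoFindExportRva(image, 0x10, wanted);
}
```

The point of the model is the double indirection: the position of a name in the name table does not index the function array directly; it first selects an entry in the ordinal array.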

Afterword

Our shellcode is now ready: it is a fully functional program capable of executing complex Windows API calls, spawning threads, and deploying payloads. The shellcode resides in the .stub section of the injector and can simply be copied into an RWX region in the target.

Warning

This shellcode requires definitions of some Windows internal structures, such as _LDR_DATA_TABLE_ENTRY and PEB_LDR_DATA. Their layouts are specific to the Windows version, and the shellcode will crash if they change in an update :)

This method of creating shellcode is used in my project YetAnotherReflectiveLoader and is documented in more detail in Reflective DLL Injection. Give it a read if you haven't :)

Future improvements

Stack Strings

Currently we use __declspec(allocate(".stub")) to store strings in memory. We can instead use stack strings to hide the intentions of the shellcode a little better.

API Hashing

Instead of comparing raw strings, we can calculate a hash (like DJB2 or MurmurHash) of the namePtr buffer and compare it against hardcoded hash values.
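
For reference, DJB2 itself is only a few lines. The sketch below is the textbook variant (start at 5381, multiply by 33, add the character) applied to a length-delimited wide buffer. DemoDjb2 and DemoFoldAscii are illustrative names, and the ASCII case folding is an assumption on my part so that differently cased names hash the same.

```cpp
#include <cstddef>
#include <cstdint>

// Fold ASCII upper-case letters to lower-case before hashing.
inline char16_t DemoFoldAscii(char16_t c) {
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + (u'a' - u'A')) : c;
}

// Textbook DJB2 over 'len' characters: hash = hash * 33 + c, starting from 5381.
// Unsigned 32-bit arithmetic wraps around on overflow, which is intended.
inline std::uint32_t DemoDjb2(const char16_t* s, std::size_t len) {
    std::uint32_t hash = 5381u;
    for (std::size_t i = 0; i < len; ++i) {
        hash = hash * 33u + static_cast<std::uint32_t>(DemoFoldAscii(s[i]));
    }
    return hash;
}
```

With this in place, a comparison would become a single integer check against a constant computed ahead of time. Hash collisions are possible, so a comparison like this is a trade-off rather than a strict replacement for the string match.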
