This article explores how dynamic linking and symbol resolution work in Mach-O binaries.

Let’s compile and inspect this minimal program.

% echo 'print("Hello!")' > hello.swift
% swiftc hello.swift
% ./hello
Hello!

nm is a utility that lists the symbols in an executable, both defined and undefined. Swift symbol names are mangled: the compiler encodes the module, type, and signature into the name. We can pipe the output through swift demangle to convert them into a more readable form:

% nm hello | swift demangle

Dissecting the Compiled Output

A simple print("Hello!") has turned into the following.

  1. This is how Swift internally creates string literals from raw memory.
    Swift.String.init(
      _builtinStringLiteral: Builtin.RawPointer, 
      utf8CodeUnitCount: Builtin.Word, 
      isASCII: Builtin.Int1
    ) -> Swift.String
    
  2. Internal array functions that allocate the array passed as print's variadic Any... argument.
    Swift.Array._endMutation() -> ()
    Swift._allocateUninitializedArray<A>(Builtin.Word) -> ([A], Builtin.RawPointer)
    Swift._finalizeUninitializedArray<A>(_owned [A]) -> [A]
    
  3. A call to the print function.
    Swift.print(
      _: Any..., 
      separator: Swift.String, 
      terminator: Swift.String
    ) -> ()
    
  4. Metainformation and infrastructure:
    • __swift_reflection_version: Version information for runtime reflection
    • __mh_execute_header: Mach-O header information
    • _main: The program’s entry point
    • _swift_bridgeObjectRelease: Memory management for bridged objects

This reveals that even printing a short string requires quite a bit of supporting infrastructure and runtime functionality.

External Symbols

The letter in front of each symbol name (U, T, s, and so on) indicates the symbol type. An uppercase letter marks a global (external) symbol, while a lowercase letter marks a local (non-external) one.

  • U: The symbol is undefined in this file and must be supplied by another library.
  • T/t: The symbol is in the text (code) section.
  • D/d: The symbol is in the data section.
  • B/b: The symbol is in the bss (uninitialized data) section.
  • S/s: The symbol is in a section other than those above.

The symbols marked U therefore reference code and data outside our executable.
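The letter rules above can be sketched as a small classifier for nm output. This is a minimal illustration, not part of any Apple tool: the function name classify is hypothetical, and the addresses in the sample lines are made up for the example.

```python
# Sketch: classify one line of `nm` output by its symbol-type letter.
# Assumption: lines look like "<address> <letter> <name>", and undefined
# symbols have no address column ("U <name>").

SECTION_BY_LETTER = {
    "U": "undefined",
    "T": "text (code) section",
    "D": "data section",
    "B": "bss (uninitialized data) section",
    "S": "other section",
}


def classify(line):
    """Return (name, scope, section) for one line of nm output."""
    first, rest = line.strip().split(maxsplit=1)
    if len(first) == 1:
        # No address column: the first token is already the type letter.
        letter, name = first, rest
    else:
        # Address column present: the type letter is the next token.
        letter, name = rest.split(maxsplit=1)
    if letter == "U":
        scope = "undefined"
    elif letter.isupper():
        scope = "global"
    else:
        scope = "local"
    section = SECTION_BY_LETTER.get(letter.upper(), "unknown")
    return name, scope, section


# Illustrative lines (addresses are invented for this example).
for sample in (
    "0000000100003f24 T _main",
    "                 U _swift_bridgeObjectRelease",
    "0000000100003fa0 s ___swift_reflection_version",
):
    print(classify(sample))
```

Splitting only on the first one or two spaces keeps demangled Swift names intact, since those contain spaces of their own.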

All these undefined symbols belong to two libraries:

  • libSystem.B.dylib: the core C library on macOS, providing system calls, threading, and basic I/O
  • libswiftCore.dylib: the Swift standard library, with String, Array, and print

We could open those libraries, inspect their symbol tables, and find each of these symbols defined there. For example, libswiftCore.dylib is at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/macosx/libswiftCore.dylib.

% nm $(find /Applications/Xcode.app -name libswiftCore.dylib | head -n 1) | swift demangle | grep Swift.print

We’ll see the same signature, this time preceded by an uppercase T. That marks a global symbol defined in the text section, which is the section containing code.

The dynamic linker connects these calls to the appropriate library functions. Here’s how this process works.

Launching

The Mach-O loader, responsible for preparing the executable in memory, processes the load commands (LC): directives embedded in the binary that tell the system how to load and link the program.

  • Some load commands set up the memory regions for the executable and map most of its contents into them.
  • Others are handled by the dynamic linker (dyld), which maps the libraries into the process space and resolves symbols.

Even when libraries are loaded into memory, the actual addresses of external symbols (functions, variables, etc.) used by the executable are not fully resolved. This means the executable does not yet know the exact memory addresses of these external symbols.

We’ll see linking in the next sections, but first, an interesting takeaway from what we’ve already seen: every executable knows what libraries it needs. This means we can query it, and here is an example.

The output below says “load the dynamic linker, then use it to load libSystem.B.dylib and libswiftCore.dylib”.

% otool -l hello | grep -E "LC_LOAD_DYLINKER|LC_LOAD_DYLIB| name"
     cmd LC_LOAD_DYLINKER
    name /usr/lib/dyld (offset 12)
     cmd LC_LOAD_DYLIB
    name /usr/lib/libSystem.B.dylib (offset 24)
     cmd LC_LOAD_DYLIB
    name /usr/lib/swift/libswiftCore.dylib (offset 24)
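To make the otool output less magical, here is a rough sketch of reading the same two kinds of load commands straight from the file bytes. It assumes a thin (single-architecture) 64-bit little-endian Mach-O file; universal binaries start with a different header and are not handled. The function name linked_names is hypothetical, while the numeric constants are the values defined in <mach-o/loader.h>.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # magic number of a 64-bit Mach-O header
LC_LOAD_DYLIB = 0xC        # "load this dynamic library"
LC_LOAD_DYLINKER = 0xE     # "use this dynamic linker"
HEADER_SIZE = 32           # size of struct mach_header_64 (eight 32-bit fields)


def linked_names(data):
    """Yield (command, path) for each dylinker/dylib load command in data."""
    fields = struct.unpack_from("<8I", data, 0)
    magic, ncmds = fields[0], fields[4]
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O file")
    offset = HEADER_SIZE
    for _ in range(ncmds):
        # Every load command starts with its type and its total size.
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd in (LC_LOAD_DYLIB, LC_LOAD_DYLINKER):
            # The third field is the offset of a NUL-terminated path,
            # counted from the start of this command. These are the
            # "offset 12" and "offset 24" values that otool prints.
            (name_offset,) = struct.unpack_from("<I", data, offset + 8)
            raw = data[offset + name_offset : offset + cmdsize]
            path = raw.split(b"\0", 1)[0].decode()
            label = "LC_LOAD_DYLIB" if cmd == LC_LOAD_DYLIB else "LC_LOAD_DYLINKER"
            yield label, path
        offset += cmdsize
```

On a Mac, something like `list(linked_names(open("hello", "rb").read()))` should report the same paths as the otool command above, provided hello is a thin binary.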

You can also inspect the minimum operating system version the program requires, and which SDK it was built with.

% vtool -show hello
hello:
Load command 10
      cmd LC_BUILD_VERSION
  cmdsize 32
 platform MACOS
    minos 15.0
      sdk 15.1
   ntools 1
     tool LD
  version 1115.7.3
Load command 11
      cmd LC_SOURCE_VERSION
  cmdsize 16
  version 0.0
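Inside the binary, the minos, sdk, and tool version fields of LC_BUILD_VERSION are each stored as a single 32-bit integer, with the major version in the upper 16 bits and the minor and patch numbers in one byte each. A minimal sketch of the decoding, assuming that layout (the function name decode_version is hypothetical):

```python
def decode_version(packed):
    """Unpack an X.Y.Z version stored as a 32-bit xxxx.yy.zz value."""
    major = packed >> 16
    minor = (packed >> 8) & 0xFF
    patch = packed & 0xFF
    # Mirror the output above, which omits a zero patch number.
    if patch:
        return f"{major}.{minor}.{patch}"
    return f"{major}.{minor}"


print(decode_version(0x000F0000))  # 15.0
print(decode_version(0x000F0100))  # 15.1
```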

Dynamic linking for Mach-O binaries

Remember the symbols marked with U when we ran nm hello? These are undefined symbols that the executable expects to find in dynamic libraries at runtime. Before the program can use them, each reference must be bound to the address of the symbol in its library.

Years ago, Apple toolchains used lazy pointers (__DATA,__la_symbol_ptr) for dynamic symbol resolution: each pointer was bound the first time its function was called rather than at load time.

However, this has changed in modern toolchains for security and performance reasons. Now, the __DATA_CONST,__got section is used instead. The pointers in __DATA_CONST,__got are resolved by the dynamic linker (dyld) during program load and then marked as read-only, eliminating runtime modifications.

Similar to modern macOS, many modern Linux distributions have moved away from lazy binding as the default behavior. They pass the -z now flag to the linker, which forces all symbols to be resolved at load time. You can still enable lazy binding if needed.

By the time the executable starts running, all of these pointers have been resolved, and dyld is out of the picture for calls that go through them.

The main executable code is in the __text section of the __TEXT segment. The naming convention is to write segment,section so this section is labeled __TEXT,__text.

Then for each undefined symbol (like print) the Mach-O format contains:

  • A Symbol Stub (__TEXT,__stubs section): These are small code snippets that act as placeholders for the actual functions. Each stub typically contains a jump instruction that transfers control to an address stored in a corresponding symbol pointer.
  • A Symbol Pointer in the Global Offset Table (GOT) (__DATA_CONST,__got section): These pointers hold the addresses of external symbols, which are resolved by the dynamic linker (dyld) during program loading.

Here is the full sequence, from program load to the call itself:

  1. During program loading, the dynamic linker (dyld) resolves the addresses of the undefined symbols and writes them into the GOT entries in the __DATA_CONST,__got section.
  2. The GOT is then made read-only, so the symbol pointers remain constant during program execution.
  3. When the main executable code calls an undefined symbol, the call goes through the stub in the __TEXT,__stubs section.
  4. The stub loads the address from the corresponding symbol pointer in the __DATA_CONST,__got section and jumps to it.
  5. The actual function is executed.

Binding at load time adds a small delay to launch, but every call, including the first, then reaches the actual function without the overhead of resolving the symbol at runtime.
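The relationship between stubs and the GOT can be modeled in a few lines. This is only a toy simulation of the idea, not how dyld is implemented: LIBRARY_EXPORTS stands in for a dynamic library, and bind_at_load and make_stub are hypothetical names invented for the sketch.

```python
from types import MappingProxyType

# Stand-in for the symbols a dynamic library exports (toy implementation).
LIBRARY_EXPORTS = {
    "Swift.print": lambda text: f"printed {text}",
}


def bind_at_load(undefined_symbols, exports):
    """Play the role of dyld: resolve every symbol once, then freeze the table."""
    got = {}
    for name in undefined_symbols:
        if name not in exports:
            # A missing symbol stops the launch instead of failing mid-run.
            raise LookupError(f"Symbol not found: {name}")
        got[name] = exports[name]
    # Read-only view of the table, standing in for a read-only GOT.
    return MappingProxyType(got)


def make_stub(got, name):
    """A stub stores no address of its own; it reads its GOT slot on each call."""
    def stub(*args):
        return got[name](*args)
    return stub


got = bind_at_load(["Swift.print"], LIBRARY_EXPORTS)
print_stub = make_stub(got, "Swift.print")
print(print_stub("Hello!"))  # printed Hello!
```

The compiled code only ever refers to the stub, so the library can move to a different address on every launch without the code changing. Only the table has to be filled in.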

There are several reasons why the code uses symbol stubs and the GOT:

  • Performance: By resolving symbols at load time (non-lazy binding), the program avoids the overhead of resolving symbols during execution, leading to better runtime performance.
  • Security: Placing the GOT in a read-only segment (__DATA_CONST,__got) prevents symbol pointers from being modified at runtime, enhancing security by mitigating attacks that attempt to overwrite function pointers.
  • Flexibility: This mechanism allows dynamic libraries to be updated independently of the executable, enabling updates and patches without recompiling the application.

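To see both tables, run the following. The index column refers to the entry in the symbol table, so a stub and a GOT slot with the same index belong to the same symbol. Swift.print appears at index 11 in both.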

% otool -v -I hello | swift demangle
hello:

Indirect symbols for (__TEXT,__stubs) 5 entries
address            index name
0x0000000100003f44     7 Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.String
0x0000000100003f50     9 type metadata accessor for Swift.Array
0x0000000100003f5c    10 Swift._allocateUninitializedArray<A>(Builtin.Word) -> ([A], Builtin.RawPointer)
0x0000000100003f68    11 Swift.print(_: Any..., separator: Swift.String, terminator: Swift.String) -> ()
0x0000000100003f74    13 _swift_bridgeObjectRelease

Indirect symbols for (__DATA_CONST,__got) 7 entries
address            index name
0x0000000100004000     7 Swift.String.init(_builtinStringLiteral: Builtin.RawPointer, utf8CodeUnitCount: Builtin.Word, isASCII: Builtin.Int1) -> Swift.String
0x0000000100004008     8 type metadata for Swift.String
0x0000000100004010     9 type metadata accessor for Swift.Array
0x0000000100004018    10 Swift._allocateUninitializedArray<A>(Builtin.Word) -> ([A], Builtin.RawPointer)
0x0000000100004020    11 Swift.print(_: Any..., separator: Swift.String, terminator: Swift.String) -> ()
0x0000000100004028    12 type metadata for Any
0x0000000100004030    13 _swift_bridgeObjectRelease
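To pair each stub with the GOT slot that shares its symbol index, the output can be parsed with a short script. This is a sketch that assumes the text layout shown above; SAMPLE is an abridged copy of that output, and parse_indirect and stub_to_got are hypothetical helper names.

```python
import re

# Abridged copy of the otool output above (entry counts left as printed).
SAMPLE = """\
Indirect symbols for (__TEXT,__stubs) 5 entries
address            index name
0x0000000100003f68    11 Swift.print(_: Any..., separator: Swift.String, terminator: Swift.String) -> ()
0x0000000100003f74    13 _swift_bridgeObjectRelease

Indirect symbols for (__DATA_CONST,__got) 7 entries
address            index name
0x0000000100004020    11 Swift.print(_: Any..., separator: Swift.String, terminator: Swift.String) -> ()
0x0000000100004028    12 type metadata for Any
0x0000000100004030    13 _swift_bridgeObjectRelease
"""

HEADER = re.compile(r"Indirect symbols for \((\S+?)\)")
ENTRY = re.compile(r"(0x[0-9a-f]+)\s+(\d+)\s+(.+)")


def parse_indirect(text):
    """Map section name -> {symbol index: (address, name)}."""
    tables, current = {}, None
    for line in text.splitlines():
        header = HEADER.match(line)
        entry = ENTRY.match(line)
        if header:
            current = tables.setdefault(header.group(1), {})
        elif entry and current is not None:
            address, index, name = entry.groups()
            current[int(index)] = (int(address, 16), name)
    return tables


def stub_to_got(tables):
    """Map symbol name -> (stub address, GOT slot address) via the shared index."""
    stubs = tables["__TEXT,__stubs"]
    got = tables["__DATA_CONST,__got"]
    return {
        name: (hex(address), hex(got[index][0]))
        for index, (address, name) in stubs.items()
        if index in got
    }


for name, (stub, slot) in stub_to_got(parse_indirect(SAMPLE)).items():
    print(f"{stub} -> {slot}  {name}")
```

Entries such as type metadata for Any appear only in the GOT: they are data references, so no stub is needed for them.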

Static vs Dynamic Linking

Static linking copies the library code directly into the final executable at build time, during the link step. The input is typically a .a archive, which is a collection of raw object files. Static linking works with both Address Space Layout Randomization (ASLR) and Position-Independent Code (PIC).

Dynamic libraries (.dylib) cannot be statically linked because they contain metadata specifically designed for the dynamic linker (dyld), such as:

  • Load commands
  • Export tables
  • Relocation information

Many libraries are distributed only as dylibs because dynamic linking has several benefits:

  • Libraries can be updated independently without recompiling applications
  • Multiple applications can share the same library in memory
  • Applications have smaller file sizes

However, some libraries support both linking methods through mergeable libraries. A mergeable library is a dynamic library that carries extra metadata, so the linker can either link against it dynamically or merge its contents into the final binary, much as it would with a static library.

Conclusion

While arcane, this information can help diagnose missing symbols and even improve launch performance. For example, Emerge Tools offers a feature that reorders symbols to reduce launch time by up to 18%. The tool generates an order file that decreases page faults by grouping launch-related symbols together. Michael Eisel demonstrated how to achieve similar results manually in his article Improving App Performance with Order Files.

Though I had a vague understanding of these concepts, writing this article has helped me develop a clearer mental model. For more insights into Mach-O and beyond, check out this article index.