On this page:

  1. Mach-O
  2. Header
    1. File Type
    2. Magic Number
    3. Dumping bytes
  3. Load Commands
  4. Conclusion
  5. References

Every executable binary on an Apple computer is a Mach-O file. In this article we’ll peek behind the curtain and get a glimpse of its internal structure.

Mach-O

Mac OS X, released in 2001, is built on the XNU kernel, which derives from the Mach kernel, a project started in 1985 for operating system research. This explains why Apple devices use the Mach-O (Mach object) format to structure binaries.

Here’s a brief overview of the Mach-O file structure:

  • Header: Contains metadata about the file (CPU type, file type: exec, dylib, etc.), with a fixed size for a given word size (32 bytes in 64-bit binaries).
  • Load Commands: Describe the memory layout and linking information, with a variable size.
  • Data: segments to be loaded into memory, notably these three:
    • __TEXT: immutable code and read-only data.
    • __DATA: mutable data.
    • __LINKEDIT: dyld instructions to relocate the library, import functions it needs, and export functions it implements.

Mach-O Header

The header of every 64-bit binary is mach_header_64, defined in <mach-o/loader.h>. The comments in that file document it better than the mach_header_64 API docs or the translated Swift version, which you can read by command-clicking this code:

import Darwin
let header = mach_header_64()

With this struct we can read the header of a Mach-O binary from Swift, although in practice it’s faster to use the otool command from the terminal.
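As a rough illustration of what reading the header involves, the sketch below decodes the eight 32-bit fields of mach_header_64 from raw bytes. It is only a sketch under stated assumptions: MachHeader64, readUInt32LE and parseHeader are hypothetical names that mirror the fields of the C struct so the code runs without Darwin, the input is assumed to be a little-endian 64-bit file, and the sample bytes are made up.

```swift
// Hypothetical mirror of the C struct mach_header_64 (eight 32-bit fields, 32 bytes).
struct MachHeader64 {
    var magic: UInt32
    var cputype: Int32
    var cpusubtype: Int32
    var filetype: UInt32
    var ncmds: UInt32
    var sizeofcmds: UInt32
    var flags: UInt32
    var reserved: UInt32
}

// Reads a little-endian UInt32 starting at `offset`.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 {
        value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
    }
    return value
}

// Returns nil when the buffer is too short or does not start with 0xfeedfacf.
func parseHeader(_ bytes: [UInt8]) -> MachHeader64? {
    guard bytes.count >= 32 else { return nil }
    let f = (0..<8).map { readUInt32LE(bytes, at: $0 * 4) }
    guard f[0] == 0xfeedfacf else { return nil }
    return MachHeader64(magic: f[0],
                        cputype: Int32(bitPattern: f[1]),
                        cpusubtype: Int32(bitPattern: f[2]),
                        filetype: f[3], ncmds: f[4], sizeofcmds: f[5],
                        flags: f[6], reserved: f[7])
}

// Made-up header bytes for illustration, not dumped from a real file.
let sample: [UInt8] = [
    0xcf, 0xfa, 0xed, 0xfe,  // magic      0xfeedfacf
    0x0c, 0x00, 0x00, 0x01,  // cputype    0x0100000c
    0x00, 0x00, 0x00, 0x00,  // cpusubtype 0
    0x01, 0x00, 0x00, 0x00,  // filetype   0x1
    0x09, 0x00, 0x00, 0x00,  // ncmds      9
    0x00, 0x03, 0x00, 0x00,  // sizeofcmds 0x300
    0x00, 0x20, 0x00, 0x00,  // flags      0x2000
    0x00, 0x00, 0x00, 0x00   // reserved
]

if let header = parseHeader(sample) {
    print("filetype:", header.filetype, "ncmds:", header.ncmds)
}
```

On an Apple platform the real struct from import Darwin would replace the hand-written mirror; the mirror is used here only to keep the example self-contained.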

To see the file type of a Mach-O product, dump the header with otool -h:

Mach-O File Type

Note the filetype column; it corresponds to the types defined in <mach-o/loader.h>. Here are the ones you’ll find during development:

  • MH_OBJECT 0x1: Relocatable object file (.o)
  • MH_EXECUTE 0x2: Executable
  • MH_DYLIB 0x6: Dynamic library (.dylib)
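  • Static library (.a): has no filetype of its own, because it is not a Mach-O file but an ar archive (signature !<arch>) whose members are MH_OBJECT files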
  • (…) core dumps, legacy formats, and others.
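To make the filetype column easier to read, a small lookup like the following could translate the number into the constant name. This is a minimal sketch: fileTypeName is a hypothetical helper, and it covers only a handful of the values defined in <mach-o/loader.h>.

```swift
// Maps a filetype value to the name of its constant in <mach-o/loader.h>.
// Only a subset of the defined values is listed here.
func fileTypeName(_ filetype: UInt32) -> String {
    switch filetype {
    case 0x1: return "MH_OBJECT"
    case 0x2: return "MH_EXECUTE"
    case 0x4: return "MH_CORE"
    case 0x6: return "MH_DYLIB"
    case 0x8: return "MH_BUNDLE"
    case 0xa: return "MH_DSYM"
    default:  return "unknown (0x" + String(filetype, radix: 16) + ")"
    }
}

print(fileTypeName(0x2))  // MH_EXECUTE
print(fileTypeName(0x6))  // MH_DYLIB
```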

The following don’t have specific types:

  • Frameworks and xcframeworks: not a Mach-O but a bundle that contains a static or dynamic library.
  • Mergeable libraries: use the dylib file type (MH_DYLIB) despite the additional metadata.
  • TBD: a text file that works as a placeholder for a dylib. How so? To link a program, it is enough to know which symbols the library defines. The dylib itself is not needed until the program is executed, in an environment where the dylib is present.

Mach-O Magic number

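The magic number identifies the file as Mach-O and indicates whether it is a 32-bit, a 64-bit, or a universal binary. The CPU architecture itself is stored in the cputype field.

These values are the macOS equivalent of the 0x7F 'E' 'L' 'F' signature of ELF files or the MZ signature of Windows Portable Executables. On macOS these are the three values you will usually meet (byte-swapped variants of the last two also exist):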

  • 0xcafebabe (cafe babe) Universal binary
  • 0xfeedface (feed face) for 32-bit
  • 0xfeedfacf (feed face f) for 64-bit
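The three values above could be told apart with a simple switch. This is a minimal sketch: MachOKind and classify are hypothetical names, and the function assumes the magic has already been converted to a number in the right byte order.

```swift
// Hypothetical classification of the three magic numbers listed above.
enum MachOKind {
    case universal
    case thin32
    case thin64
    case notMachO
}

func classify(magic: UInt32) -> MachOKind {
    switch magic {
    case 0xcafebabe: return .universal
    case 0xfeedface: return .thin32
    case 0xfeedfacf: return .thin64
    default:         return .notMachO
    }
}

print(classify(magic: 0xfeedfacf))  // thin64
```

A magic number alone is only a hint. For example, Java class files also start with 0xcafebabe, so a robust tool would validate the rest of the header too.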

A universal binary (0xcafebabe) contains several architectures, so it has a “fat header” that lists one entry per architecture.

% otool -f UseHello
Fat headers
fat_magic 0xcafebabe
nfat_arch 2
architecture 0
    cputype 16777223
    cpusubtype 3
    capabilities 0x0
    offset 4096
    size 43400
    align 2^12 (4096)
architecture 1
    cputype 16777228
    cpusubtype 0
    capabilities 0x0
    offset 49152
    size 63792
    align 2^14 (16384)
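The cputype values in this output are easier to read in hexadecimal: 16777223 is 0x01000007 and 16777228 is 0x0100000c. The high byte carries the 64-bit ABI flag (CPU_ARCH_ABI64, 0x01000000) and the low bits carry the CPU family (7 for x86, 12 for ARM), as defined in <mach/machine.h>. A minimal sketch of that decoding, where cpuName is a hypothetical helper that knows only these two families:

```swift
let cpuArchABI64: Int32 = 0x0100_0000  // CPU_ARCH_ABI64: set for 64-bit ABIs

// Splits a cputype into its family and 64-bit flag and names the result.
func cpuName(_ cputype: Int32) -> String {
    let is64Bit = (cputype & cpuArchABI64) != 0
    let family = cputype & 0x00ff_ffff  // drop the architecture flag byte
    switch (family, is64Bit) {
    case (7, false):  return "i386"
    case (7, true):   return "x86_64"
    case (12, false): return "arm"
    case (12, true):  return "arm64"
    default:          return "unknown"
    }
}

print(cpuName(16777223))  // x86_64
print(cpuName(16777228))  // arm64
```

So the universal binary above holds an x86_64 slice at offset 4096 and an arm64 slice at offset 49152.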

Dumping bytes

A binary is just a data structure, so we can read it by dumping bytes from certain positions. Because the magic number is at the very beginning, let’s dump the first 4 bytes (no offset needed).

% xxd -l 4 UseHello   
00000000: cffa edfe

Note that the bytes appear reversed (cffa edfe instead of feed facf) because the header is stored in the byte order of the target CPU. ARM CPUs use little endian: the least significant byte comes first.
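The effect of byte order can be reproduced by assembling the four dumped bytes into a number both ways. This is a minimal sketch whose variable names are only illustrative:

```swift
// The four bytes in file order, exactly as xxd printed them.
let dumped: [UInt8] = [0xcf, 0xfa, 0xed, 0xfe]

// Little endian: the first byte is the least significant one.
var littleEndian: UInt32 = 0
for (i, byte) in dumped.enumerated() {
    littleEndian |= UInt32(byte) << UInt32(8 * i)
}

// Big endian: the first byte is the most significant one.
var bigEndian: UInt32 = 0
for byte in dumped {
    bigEndian = (bigEndian << 8) | UInt32(byte)
}

print(String(littleEndian, radix: 16))  // feedfacf
print(String(bigEndian, radix: 16))     // cffaedfe
```

Only the little-endian reading yields the expected magic number, 0xfeedfacf.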

As you may guess, xxd is a utility that translates bytes to hex digits. But do you know why I read exactly 4 bytes?

  • Each hex digit represents 4 bits (since 16 = 2^4)
  • 0xfeedface has 8 hex digits
  • 8 digits × 4 bits = 32 bits = 4 bytes

Well, either make that calculation or go to the header and see that the magic number is a uint32_t.

Mach-O Load Commands

In a Mach-O binary, the header is followed by a list of load commands (LC) that instruct the kernel and the dynamic linker how to handle the binary, including where each segment of data sits in the file and where it goes in memory. Here are the load commands of the object file from the previous article, annotated with what each one does:

% otool -l Hello.o | grep LC_
      cmd LC_SEGMENT_64     - maps sections into the process address space
      cmd LC_BUILD_VERSION  - sets the minimum OS and SDK version
      cmd LC_SYMTAB         - location and structure of the symbol table and string table
      cmd LC_DYSYMTAB       - same for the dynamic symbol table
      cmd LC_LINKER_OPTION  - load swiftSwiftOnoneSupport
      cmd LC_LINKER_OPTION  - load swiftCore
      cmd LC_LINKER_OPTION  - load swift_Concurrency
      cmd LC_LINKER_OPTION  - load swift_StringProcessing
      cmd LC_LINKER_OPTION  - load objc runtime interoperability

Executable files may have additional load commands like LC_LOAD_DYLIB, LC_MAIN, etc. All possible load commands are defined in <mach-o/loader.h>.
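Every load command starts with the same two 32-bit fields, cmd and cmdsize, which is what lets a tool like otool walk the list without understanding every command. The following sketch shows that walk over a synthetic buffer. The helper names (le, readUInt32LE, listLoadCommands) are hypothetical, the buffer is assumed to be little endian and to start at the first load command, and only a few command numbers from <mach-o/loader.h> are named.

```swift
// Encodes a UInt32 as four little-endian bytes.
func le(_ value: UInt32) -> [UInt8] {
    return (0..<4).map { UInt8(truncatingIfNeeded: value >> UInt32(8 * $0)) }
}

// Reads a little-endian UInt32 starting at `offset`.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 {
        value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
    }
    return value
}

// A few command numbers from <mach-o/loader.h>.
let commandNames: [UInt32: String] = [
    0x2: "LC_SYMTAB", 0xb: "LC_DYSYMTAB", 0xc: "LC_LOAD_DYLIB",
    0xe: "LC_LOAD_DYLINKER", 0x19: "LC_SEGMENT_64",
    0x2d: "LC_LINKER_OPTION", 0x32: "LC_BUILD_VERSION"
]

// Walks `count` load commands, using cmdsize to jump to the next one.
func listLoadCommands(_ bytes: [UInt8], count: Int) -> [String] {
    var result: [String] = []
    var offset = 0
    for _ in 0..<count {
        guard offset + 8 <= bytes.count else { break }
        let cmd = readUInt32LE(bytes, at: offset)
        let cmdsize = Int(readUInt32LE(bytes, at: offset + 4))
        guard cmdsize >= 8, offset + cmdsize <= bytes.count else { break }
        result.append(commandNames[cmd] ?? ("0x" + String(cmd, radix: 16)))
        offset += cmdsize
    }
    return result
}

// Synthetic data: two 24-byte commands with zeroed payloads.
var commands: [UInt8] = []
commands += le(0x32)
commands += le(24)
commands += [UInt8](repeating: 0, count: 16)
commands += le(0x2)
commands += le(24)
commands += [UInt8](repeating: 0, count: 16)

print(listLoadCommands(commands, count: 2))
```

In a real file, the number of commands comes from the ncmds field of the header, and the list starts right after the header.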

LC_SEGMENT_64

LC_SEGMENT_64 tells the kernel how to map a segment into virtual memory so that it is ready for execution: the segment name, its address and size in memory, its offset and size in the file, and its memory protection. There are segment commands for program code, program data, and the symbol tables used by the linker.

LC_LINKER_OPTION

The LC_LINKER_OPTION commands show that this code asks the linker to link several libraries. Some of them are:

  • swiftSwiftOnoneSupport is a support library for Swift’s “Onone” optimization level, which is typically used for debug builds.
  • String Processing provides declarative string processing APIs.
  • The Objective-C runtime system is still included by default on Apple platforms. Some features of the Swift language like dynamic dispatch and reflection use the Objective-C runtime. It’s also needed to interact with Apple frameworks written in Objective-C.

LC_BUILD_VERSION

As for LC_BUILD_VERSION, there is a quicker way to look at it: vtool -show Hello.o. It shows the minimum operating system version required and the SDK used to compile.

% vtool -show Hello.o 
Hello.o:
Load command 1
      cmd LC_BUILD_VERSION
  cmdsize 24
 platform MACOS
    minos 15.0
      sdk 15.0
   ntools 0
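In the file, minos and sdk are each a single 32-bit number. According to the comments in <mach-o/loader.h>, a version X.Y.Z is packed as xxxx.yy.zz: 16 bits for the major version and 8 bits each for the minor and patch versions. A minimal sketch of the decoding, where decodeVersion is a hypothetical helper and dropping a zero patch number is an assumption chosen to match the output above:

```swift
// Unpacks a version stored as xxxx.yy.zz (16-bit major, 8-bit minor, 8-bit patch).
func decodeVersion(_ packed: UInt32) -> String {
    let major = packed >> 16
    let minor = (packed >> 8) & 0xff
    let patch = packed & 0xff
    // Assumption: omit the patch number when it is zero, as in "15.0".
    return patch == 0 ? "\(major).\(minor)" : "\(major).\(minor).\(patch)"
}

print(decodeVersion(0x000f0000))  // 15.0
print(decodeVersion(0x000e0201))  // 14.2.1
```

The cmdsize of 24 also adds up: the command holds six 32-bit fields (cmd, cmdsize, platform, minos, sdk, ntools), and with ntools 0 nothing follows them.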

LC_LOAD_DYLIB

An executable has many more load commands; try, for instance, otool -l UseHello | grep LC_. It is a lot of information, but it’s not impossible to decipher. It just takes time.

Load commands can help you see the dependencies of a program. For instance, here is the UseHello executable telling the kernel to load the dynamic loader (dyld) and then look for symbols in a number of libraries.

 % otool -l UseHello | grep -E "LC_LOAD_DYLINKER|LC_LOAD_DYLIB| name"
          cmd LC_LOAD_DYLINKER
         name /usr/lib/dyld (offset 12)
          cmd LC_LOAD_DYLIB
         name @rpath/Hello.framework/Hello (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libSystem.B.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libc++.1.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name /usr/lib/libobjc.A.dylib (offset 24)
          cmd LC_LOAD_DYLIB
         name @rpath/libswiftCore.dylib (offset 24)
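The "(offset 24)" and "(offset 12)" annotations are the position of the name string, counted from the start of the load command. In <mach-o/loader.h>, dylib_command has six 32-bit fields before the string (24 bytes) and dylinker_command has three (12 bytes); in both, the third field stores that offset. The sketch below rebuilds an LC_LOAD_DYLIB command and reads the name back. All helper names are hypothetical, the bytes are synthetic and little endian, and padding the string to a multiple of 8 bytes is an assumption about how the linker lays it out.

```swift
// Encodes a UInt32 as four little-endian bytes.
func le(_ value: UInt32) -> [UInt8] {
    return (0..<4).map { UInt8(truncatingIfNeeded: value >> UInt32(8 * $0)) }
}

// Reads a little-endian UInt32 starting at `offset`.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<4 {
        value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
    }
    return value
}

// Builds a synthetic LC_LOAD_DYLIB command with zeroed timestamp and versions.
func makeDylibCommand(name: String) -> [UInt8] {
    var nameBytes: [UInt8] = Array(name.utf8) + [0]          // NUL-terminated
    while nameBytes.count % 8 != 0 { nameBytes.append(0) }   // assumed padding
    var command: [UInt8] = []
    command += le(0xc)                                       // cmd: LC_LOAD_DYLIB
    command += le(UInt32(24 + nameBytes.count))              // cmdsize
    command += le(24)                                        // offset of the name
    command += le(0)                                         // timestamp
    command += le(0)                                         // current_version
    command += le(0)                                         // compatibility_version
    command += nameBytes
    return command
}

// Reads the name whose offset is stored in the third 32-bit field.
func commandName(_ command: [UInt8]) -> String? {
    guard command.count >= 12 else { return nil }
    let nameOffset = Int(readUInt32LE(command, at: 8))
    guard nameOffset < command.count else { return nil }
    let nameBytes = command[nameOffset...].prefix(while: { $0 != 0 })
    return String(decoding: nameBytes, as: UTF8.self)
}

let dylibCommand = makeDylibCommand(name: "/usr/lib/libobjc.A.dylib")
print(commandName(dylibCommand) ?? "no name")  // /usr/lib/libobjc.A.dylib
```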

Conclusion

This knowledge may seem arcane, but it allows us to peek into executables to examine architectures and dependencies. Here, I discussed some load commands; on the next page I shed some light on the linking process.

References

Mach-O

Several books mention Mach-O