Porting free: Linux-like memory statistics tool for macOS

2024-01-18

Info

I published this tool as a full port of the Linux free command for macOS. You can find the source code and installation instructions on the project repository on GitHub: free-mac.

Introduction and Motivation

As a graduate student, my work often involves running complex simulations and data analysis tasks. Typically, these tasks are executed on powerful university clusters or on CERN machines, which boast robust hardware capabilities, particularly in terms of memory. However, when working locally on my poor MacBook, I encountered a challenge: macOS lacks a direct equivalent of Linux’s memory reporting tools, which are integral to my workflow.

To address this gap, I set out to develop a free, Mac-compatible tool that replicates the memory statistics functionality of Linux. This blog post chronicles my journey, from the initial motivation to the technical intricacies of the tool’s development.

Memory management on macOS is known for its efficiency, often attributed to the sharing of large blocks of read-only memory between applications. Linux, on the other hand, has a complex memory management system with many configurable settings, accessible via the /proc file system and adjustable using sysctl. Despite these differences, I needed a tool that could give me a clear and comprehensive view of memory usage on my Mac, similar to what I was accustomed to on Linux. The free -h command was my go-to tool for this purpose, and I wanted to replicate its functionality on macOS because I couldn’t find a suitable alternative.

The tool I developed is a command-line program that fetches and displays memory statistics in a human-readable format. It’s designed to provide detailed information about total, used, free, cached, application, and wired memory, along with swap usage for macOS systems. The program leverages the Mach API and sysctl to gather memory statistics, which are not as straightforward to access on macOS as they are on Linux. Keep in mind that macOS and Linux handle cached memory differently, so the cached figure is only roughly comparable to what free reports on Linux.

Technical details

The Mach API allows us to interact with the low-level features of macOS, which is different from the Linux approach where memory information is typically read from files in the /proc directory. In my program, I use the mach_host_self() function to get the Mach port¹ for the host, and host_page_size() to determine the page size², which is essential for calculating memory usage.

I then fetch the total physical memory using host_info()³ with HOST_BASIC_INFO⁴, and detailed memory statistics using host_statistics64()⁵ with HOST_VM_INFO64. These functions provide a wealth of information about the system’s memory, which I then use to calculate the different components of memory usage.

Now I have the building blocks to calculate most of the memory types. Still, there is one particular type that I could not find a way to calculate from this information, which is swap. I will talk about how we can get an estimate for the swap usage later, when I get to the implementation.

Implementation

First we need to include the necessary headers.

#include <stdio.h>
#include <mach/mach.h>
#include <sys/types.h>
#include <sys/sysctl.h>

The first header is stdio.h, which is needed for the printf and snprintf functions. The second header is mach/mach.h, which is needed for the Mach API functions. The third header is sys/types.h, which is needed for the size_t type. The last header is sys/sysctl.h, which is needed for the sysctl function.

I needed to have the same format as free -h because I’m so used to how it looks. So I needed a way to emulate that. I wrote a function called formatBytes that converts the number of bytes into a human-readable format with appropriate suffixes like KB, MB, GB, etc. This function is crucial for presenting the data in a way that’s easy to understand at a glance.

void formatBytes(unsigned long long bytes, char *buffer, int bufferSize, int human) {
    if(human == 0) {
        snprintf(buffer, bufferSize, "%llu", bytes);
        return;
    }

    const char *suffixes[] = {"B", "KB", "MB", "GB", "TB"};
    int suffixIndex = 0;
    double result = bytes;

    // Loop to determine the appropriate suffix and reduce the bytes accordingly
    // Stop at the last suffix (index 4, "TB") so the array is never indexed out of bounds
    while (result >= 1024 && suffixIndex < 4) {
        result /= 1024.0;
        suffixIndex++;
    }

    // Format the result with the determined suffix
    snprintf(buffer, bufferSize, "%.2f %s", result, suffixes[suffixIndex]);
}

In the formatBytes function, bytes are converted into a human-readable format. It uses an integer flag, human, to decide whether to format the data. If human is 0, it simply prints the byte count. Otherwise, it selects the appropriate unit (B, KB, MB, GB, TB) by dividing the byte count by 1024 repeatedly until the count is small enough for the unit. This loop both finds the right unit and reduces the byte count to a human-friendly size. The result is then formatted into a string using snprintf, which respects the buffer size to prevent overflow.

Now let’s get back to the swap problem. Unlike Linux, where swap usage can be read from /proc/swaps, macOS does not provide a simple file to read it from. So I had to find another way to get an estimate for the swap usage, and I figured out that it can be done by calling sysctl with CTL_VM and VM_SWAPUSAGE.

Now we are ready for a main function to rule them all. If I were a better programmer, I would have written a function for each of the memory types. But I’m not, so I wrote a single function that calculates all the memory types.

I start the main function by defining some structs and variables that will be used later.


int main() {
    // Initialize Mach port and page size variables
    mach_port_t host_port = mach_host_self();
    vm_size_t page_size;
    host_page_size(host_port, &page_size);

    // Fetch total physical memory using host basic info
    host_basic_info_data_t hostInfo;
    mach_msg_type_number_t info_count = HOST_BASIC_INFO_COUNT;
    if (host_info(host_port, HOST_BASIC_INFO, (host_info_t)&hostInfo, &info_count) != KERN_SUCCESS) {
        fprintf(stderr, "Failed to get total memory\n");
        return 1;
    }

    // Fetch detailed memory statistics using VM statistics
    vm_statistics64_data_t vm_stat;
    mach_msg_type_number_t host_size = sizeof(vm_statistics64_data_t) / sizeof(integer_t);
    if (host_statistics64(host_port, HOST_VM_INFO64, (host_info64_t)&vm_stat, &host_size) != KERN_SUCCESS) {
        fprintf(stderr, "Failed to get memory statistics\n");
        return 1;
    }

}

I start with the initialization of a mach_port_t variable, named host_port, by calling the mach_host_self function, which returns the Mach port for the host. Following this, a vm_size_t variable named page_size is declared and filled with the page size in bytes by invoking the host_page_size function with host_port and the address of page_size. Subsequently, a host_basic_info_data_t structure named hostInfo is declared to hold basic information about the host, and a mach_msg_type_number_t variable, info_count, is initialized with HOST_BASIC_INFO_COUNT, representing the number of elements in hostInfo.

The host_info function is then used with these parameters to populate hostInfo with the host’s basic information. If the host_info function call does not succeed, meaning it returns a value other than KERN_SUCCESS, an error message is printed and 1 is returned to indicate an error occurred.

After that, a vm_statistics64_data_t structure named vm_stat is declared to hold detailed virtual memory statistics. The mach_msg_type_number_t variable host_size is then initialized with the size of vm_stat divided by the size of integer_t. The host_statistics64 function is called with these parameters to fill vm_stat with the host’s detailed virtual memory statistics. If the host_statistics64 function call does not succeed, which means it returns a value other than KERN_SUCCESS, an error message is printed and 1 is returned to indicate that an error occurred.

Now it’s time to get the swap information from sysctl.

Here is the code that does that:

// Get swap information using sysctl
struct xsw_usage swapinfo;
size_t swapinfo_sz = sizeof(swapinfo);
int mib[2] = {CTL_VM, VM_SWAPUSAGE};
if (sysctl(mib, 2, &swapinfo, &swapinfo_sz, NULL, 0) != 0) {
    perror("sysctl");
    return 1;
}

The snippet begins by declaring a structure of type xsw_usage named swapinfo to hold swap usage information. It also declares a size_t variable swapinfo_sz and initializes it with the size of swapinfo. An integer array mib of size 2 is declared and initialized with the constants CTL_VM and VM_SWAPUSAGE, which specify the information to be retrieved. The sysctl function is then called with these parameters to fill swapinfo with the swap usage information.

The sysctl function reads and/or writes kernel parameters and in this case, it is used to read the swap usage information. If the sysctl function call fails (i.e., it returns a value other than 0), it prints an error message using the perror function and returns 1 to indicate that an error occurred.

Now every piece of information is ready for us to begin our calculations. The first thing to do is to declare some variables for the different memory types.

// Declare variables for different memory types
unsigned long long total_memory = hostInfo.max_mem;
unsigned long long free_memory = (unsigned long long)(vm_stat.free_count - vm_stat.speculative_count) * page_size;
unsigned long long wired_memory = (unsigned long long)vm_stat.wire_count * page_size;
unsigned long long app_memory = (unsigned long long)(vm_stat.internal_page_count - vm_stat.purgeable_count) * page_size;
unsigned long long cached_memory = (unsigned long long)(vm_stat.purgeable_count + vm_stat.external_page_count) * page_size;
unsigned long long used_memory = total_memory - free_memory - cached_memory;

The total_memory variable is set to the maximum memory available on the host, which is obtained from the max_mem field of the hostInfo structure. The free_memory variable is calculated by subtracting the speculative page count (speculative_count) from the free page count (free_count), both obtained from the vm_stat structure. The result is then multiplied by the page size (page_size) to convert the count into bytes. The wired_memory variable is calculated by multiplying the wired page count (wire_count from vm_stat) by the page size. Wired memory is memory that can’t be paged out to disk.

The app_memory variable is calculated by subtracting the purgeable page count (purgeable_count from vm_stat) from the internal page count (internal_page_count from vm_stat). The result is then multiplied by the page size. The cached_memory variable is calculated by adding the purgeable page count and the external page count (external_page_count from vm_stat), and then multiplying the result by the page size. And finally, the used_memory variable is calculated by subtracting the free memory and the cached memory from the total memory. This gives the amount of memory that is currently being used.

All that is left is some formatting and a few print statements to show the output.

// Formatting memory sizes for human-readable output
char totalStr[20], usedStr[20], freeStr[20], cachedStr[20], appStr[20], wiredStr[20];
char swapTotalStr[20], swapUsedStr[20], swapFreeStr[20];

// Convert memory and swap statistics into human-readable format
formatBytes(total_memory, totalStr, sizeof(totalStr), 1);
formatBytes(used_memory, usedStr, sizeof(usedStr), 1);
formatBytes(free_memory, freeStr, sizeof(freeStr), 1);
formatBytes(cached_memory, cachedStr, sizeof(cachedStr), 1);
formatBytes(app_memory, appStr, sizeof(appStr), 1);
formatBytes(wired_memory, wiredStr, sizeof(wiredStr), 1);
formatBytes(swapinfo.xsu_total, swapTotalStr, sizeof(swapTotalStr), 1);
formatBytes(swapinfo.xsu_used, swapUsedStr, sizeof(swapUsedStr), 1);
formatBytes(swapinfo.xsu_avail, swapFreeStr, sizeof(swapFreeStr), 1);

// Printing formatted results
printf("%20s %14s %14s %14s %14s %14s\n", "total", "used", "free", "cached", "app", "wired");
printf("Mem: %15s %14s %14s %14s %14s %14s\n", totalStr, usedStr, freeStr, cachedStr, appStr, wiredStr);
printf("Swap: %14s %14s %14s\n", swapTotalStr, swapUsedStr, swapFreeStr);

return 0;

This code formats the previously calculated memory statistics into a human-readable form and prints them. First, it declares several character arrays (totalStr, usedStr, freeStr, cachedStr, appStr, wiredStr, swapTotalStr, swapUsedStr, swapFreeStr) to hold the formatted memory sizes. Then, it calls the formatBytes function for each memory statistic. As shown earlier, this function converts the memory sizes from bytes into a human-readable format (like KB, MB, GB, etc.) and stores the result in the corresponding character array.

The sizeof operator is used to pass the size of each array to the formatBytes function. Finally, it prints the formatted memory statistics using the printf function. The %20s, %14s, etc. are format specifiers that indicate that a string should be printed with a specific width. The printf function is called three times to print the headers (“total”, “used”, “free”, etc.), the memory statistics, and the swap statistics.

Usage

Warning

Don’t ever run any code I write. If you have to, then please be careful and check things.

Our program is ready to be used. To compile it, save the full file as free.c, then we can use the following command:

gcc -o free free.c

To run it, we can use the following command:

./free

The output should look like this:

./free
               total           used           free         cached            app          wired
Mem:        16.00 GB       12.63 GB       46.88 MB        3.33 GB        5.81 GB        1.95 GB
Swap:         0.00 B         0.00 B         0.00 B

Developing this tool, which I did because I wanted to be more aware of memory usage on my poor Mac, was a fun experience. I learned a lot about how memory is managed on macOS and how to use the Mach API (for the first time). This is a simple tool that anyone could probably write in a better way, and it will probably never be used by anyone. And to be honest, I always advise everyone not to run any of the code I write and not to bother relying on it. But I just did it and wanted to write about it.