For months now, I have been scratching my head over a small but persistent number of “crash reports” affecting a few of my apps. The issue is most prevalent in MarsEdit, where I have a handful of users who run into the issue multiple times per day. Luckily, one of these users is my good friend and colleague, Manton Reece. I’ve been peppering him with questions about the issue for weeks, while he stoically puts up with the behavior.
Even with the assistance of a highly technical friend who can reproduce the issue at will, I had thrown my hands up in despair several times. I put “crash reports” in quotes above because, although my in-app crash reporter notices that the app abruptly terminates, the system doesn’t create any obvious artifacts. No crash or hang reports. No “Quit Unexpectedly” dialog. The app is just … gone. I wrote a question in the Apple Developer Forums, which turned into a kind of de facto diary as I pursued the issue.
When I started to feel bad about asking Manton to try this, that, and the other thing, I finally asked if he could send me a “sysdiagnose” report. If you’re curious, the easiest way to grab one of these on any Mac is to simply press the Control, Option, Command, Shift, and “.” (period) keys at once. You’ll see the screen flash, an indication that the system is starting to collect the reports. A few minutes later the report will be revealed in the Finder: a probably quite large compressed archive. Open it up and see the wealth of information about nearly every aspect of the system.
Yet even with this wealth of information, I was stymied. It wasn’t until I chanced upon the delightfully pertinent nuggets of information in “/var/log/com.apple.xpc.launchd/launchd.log” that I got my first whiff of a clue:
2022-05-03 09:15:22.088718 (gui/501/application.com.red-sweater.marsedit4.384452971.384452977 ): exited with exit reason (namespace: 15 code: 0xbaddd15c) - OS_REASON_RUNNINGBOARD | <RBSTerminateContext| code:0xBADDD15C explanation:CacheDeleteAppContainerCaches requesting termination assertion for com.red-sweater.marsedit4
Here we have a message asserting that MarsEdit was terminated, on purpose, and better still, it includes an explanation! As far as explanations go, “CacheDeleteAppContainerCaches” is not much of one, but it did give me something to go on. Searching for the term yielded pertinent results like this post about Apple Mail and Safari “suddenly quitting.” Unfortunately, they all seem to be scratching their heads as much as I am.
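To make the anatomy of that log entry concrete, here is a minimal Python sketch that pulls out the namespace, code, reason, and explanation. The regular expression is written against the single entry quoted above; I'm assuming other launchd.log entries follow the same shape, which may not hold across macOS releases. The names `EXIT_REASON` and `parse_exit_reason` are made up for illustration.

```python
import re

# Pattern written against the one entry quoted above. The general
# launchd.log format is an assumption and may differ between releases.
EXIT_REASON = re.compile(
    r"exited with exit reason \(namespace: (?P<namespace>\d+) "
    r"code: (?P<code>0x[0-9a-fA-F]+)\) - (?P<reason>\w+)"
    r"(?:.*?explanation:(?P<explanation>\S+))?"
)

def parse_exit_reason(entry):
    """Return the exit-reason fields of a log entry, or None if absent."""
    match = EXIT_REASON.search(entry)
    if match is None:
        return None
    return {
        "namespace": int(match.group("namespace")),
        "code": int(match.group("code"), 16),
        "reason": match.group("reason"),
        # First word after "explanation:", or None if there is none.
        "explanation": match.group("explanation"),
    }

entry = (
    "(gui/501/application.com.red-sweater.marsedit4.384452971.384452977 ): "
    "exited with exit reason (namespace: 15 code: 0xbaddd15c) - "
    "OS_REASON_RUNNINGBOARD | <RBSTerminateContext| code:0xBADDD15C "
    "explanation:CacheDeleteAppContainerCaches requesting termination "
    "assertion for com.red-sweater.marsedit4"
)
info = parse_exit_reason(entry)
print(info["reason"], hex(info["code"]), info["explanation"])
```

Filtering a whole log for a bundle identifier and running each matching line through a function like this would surface every deliberate termination of the app, not just the one I happened to spot.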
The other thing that jumped out at me from the log was the term “OS_REASON_RUNNINGBOARD”. Searching for this results in only a few scant links, all related to Apple’s open source Darwin kernel. However, searching instead for just “RunningBoard” offered a glimmer of hope. A post on Howard Oakley’s blog, “RunningBoard: a new subsystem in Catalina to detect errors”, includes a particularly succinct description of the eponymous OS subsystem (emphasis mine):
Catalina brings several new internal features, a few of which have been documented, but others seem to have slipped past silently. Among the latter is an active subsystem to replace an old service assertiond, which can cause apps to unexpectedly terminate – to you and me, crash – in both macOS 10.15 and iOS 13: RunningBoard.
Unexpected termination. Yep. To you and me? Crashing. At this point in the story I’m going to elide several hours of long, tedious, and yet still somehow fun work, wherein I disabled System Integrity Protection on my Mac, so that I could attach to the pertinent system daemons and try to make sense of how, and when, they might decide to unilaterally terminate an app like MarsEdit. While digging deeper into the issue, I remembered that “explanation” from the log, CacheDeleteAppContainerCaches, and it reminded me of system maintenance software like CleanMyMac. I normally shy away from these kinds of apps because they are historically known to be overly aggressive in what they decide to delete. In the name of science, however, I decided to run it, with care, on my Mac.
Boom! After running CleanMyMac once, MarsEdit, along with Numbers, was suddenly not running anymore. I had finally reproduced the issue on my own Mac for the first time. Anybody who has fixed software bugs, either for a living or as a passion, knows this is the critical first step to really addressing an issue. With some tinkering, I was able to narrow down the reproduction steps to running the “Free Up Purgeable Space” action. It turns out this invokes a system API responsible for trying to delete caches, etc., from a Mac. Normally the system only does this when disk space is critically low, but CleanMyMac gives you the option to exercise the behavior at any time.
That single log line quoted above turns out to hold another gem of information. The “code:0xBADDD15C” looks like it could be an arbitrary hexadecimal value, but it’s an example of an error code designed to both uniquely identify and suggest a mnemonic clue to the underlying issue. Apple documents many of these codes, which include 0xc00010ff (cool off), 0xdead10cc (deadlock), and 0xbaadca11 (bad call). I searched the system frameworks for this code and found it in the disassembly of “/System/Library/PrivateFrameworks/CacheDelete.framework”, specifically in an internal function called “assert_group_cache_deletion”. It was only after I explored the issue in the forums that Quinn explained that the code in this scenario is a mnemonic for “bad disk”. I guess it was easier to spell out than trying to represent “full disk”.
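As a small illustration of how these codes can be turned back into something readable, here is a Python sketch of a lookup table. It contains only the four codes and mnemonics mentioned in this post, so it is nowhere near a complete list, and the names `MNEMONICS` and `describe` are my own invention.

```python
# Codes and mnemonics exactly as mentioned in this post. Illustrative
# only: this is not a complete list of termination codes.
MNEMONICS = {
    0xC00010FF: "cool off",
    0xDEAD10CC: "deadlock",
    0xBAADCA11: "bad call",
    0xBADDD15C: "bad disk",
}

def describe(code):
    """Format a termination code together with its mnemonic, if known."""
    name = MNEMONICS.get(code, "unknown")
    return f"0x{code:08X} ({name})"

# The code as it appears in the log entry is a hexadecimal string.
print(describe(int("0xbaddd15c", 16)))  # prints "0xBADDD15C (bad disk)"
```

Keying the table on integers rather than strings sidesteps the mixed upper and lower case spellings that show up within the very same log entry.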
Equipped with all this new information, what can we do about the unexpected terminations? Well, nothing. I do wish Apple’s framework would try asking nicely if the app would quit, before summarily terminating it, but I guess the thinking is that this functionality should typically only be reached in extreme circumstances. After learning more about the issue, I confirmed with Manton that his Mac did have low disk space, so I guess it was just the system trying its best to free up space that caused the issue for him.
The one thing I plan and hope to do as a follow-up is to amend my built-in crash reporter so that it will not prompt the user or report a crash when the app terminates for this reason. I think it should be possible to detect the codes alluded to above, and simply let “0xBADDD15C” terminations happen without fanfare.
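The decision itself could be as simple as the following Python sketch. This is not my crash reporter's actual code. It assumes the reporter can somehow learn the exit reason namespace and code of the previous run, which is the hard part and is not shown here. The namespace value 15 is taken from the log entry quoted earlier, and the names `RUNNINGBOARD_NAMESPACE`, `QUIET_CODES`, and `should_report` are hypothetical.

```python
# Namespace 15 is what the quoted log entry pairs with
# OS_REASON_RUNNINGBOARD. How a crash reporter would obtain these
# values for the previous run is an open question, not shown here.
RUNNINGBOARD_NAMESPACE = 15

# Terminations the system requested on purpose and that should not
# be presented to the user as a crash.
QUIET_CODES = {0xBADDD15C}

def should_report(namespace, code):
    """Return False for deliberate, system-requested terminations."""
    if namespace == RUNNINGBOARD_NAMESPACE and code in QUIET_CODES:
        return False
    return True

print(should_report(15, 0xBADDD15C))  # prints "False"
print(should_report(15, 0xDEAD10CC))  # prints "True"
```

Checking the namespace as well as the code keeps the filter narrow, so the same numeric value arriving from some other subsystem would still be reported.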