Yes, I am fully aware that I just told everyone to update to Mac OS X 10.4.9, and I know this may be construed as me telling people not to.
When you see the "Optimizing System Performance" phase of a software update, Mac OS X is really just running update_prebinding. Updating prebinding has a very, very nasty bug in it (look at _dyld_update_prebinding): if multiple processes are updating prebinding at the same time, a system file can be completely zeroed out. Basically, all of the data in the file is deleted and replaced with nothing. This bug is usually triggered when updating Mac OS X, and every update to Mac OS X has the potential to render your system unbootable, depending on whether the "right" file gets zeroed out. If you launch an application that links to libraries that are not yet prebound, dyld automatically redoes the prebinding on those libraries, and there is a chance one of them will be zeroed out.
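To make the failure mode concrete, here is a generic sketch of how two overlapping in-place rewrites can leave a file empty. It is a hypothetical model, not the actual code behind _dyld_update_prebinding: the function names racy_update and atomic_update are made up, the "rewrite" just copies bytes unchanged, and the two updaters' steps are run in one fixed, unlucky interleaving so the outcome is deterministic.

```shell
# POSIX sh. Hypothetical model of a lost-update race; NOT dyld's real code.
# Two updaters (A and B) each "reprebind" the same library. Here the
# rewrite copies the bytes unchanged, and the steps of A and B are
# interleaved in one fixed, unlucky order.

racy_update() {
    lib=$1
    cp "$lib" "$lib.a"      # A reads the library into its buffer
    : > "$lib"              # A truncates the library to rewrite it in place
    cp "$lib" "$lib.b"      # B reads the library now: it is already empty
    cp "$lib.a" "$lib"      # A writes its complete copy back
    cp "$lib.b" "$lib"      # B writes back what it read: zero bytes
    rm -f "$lib.a" "$lib.b"
}

atomic_update() {
    lib=$1
    cp "$lib" "$lib.a.tmp"  # A builds its copy in a separate temp file
    cp "$lib" "$lib.b.tmp"  # B reads the untouched original
    mv "$lib.a.tmp" "$lib"  # rename(2) swaps A's complete copy in atomically
    mv "$lib.b.tmp" "$lib"  # B's complete copy replaces it the same way
}
```

In racy_update the library ends up zero-length even though neither updater "meant" to empty it. The second function shows the usual way to avoid this class of bug: each writer builds its copy in a temporary file in the same directory and mv renames it into place, which replaces the file atomically, so nobody ever reads an empty or half-written library.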
I've been tracking this particular bug for about 18 months now. Most of the real "random" failures reported on various Mac OS X "troubleshooting" sites after a user has installed an Apple software update are actually manifestations of this bug. By "real" I mean problems that aren't imagined, and that aren't long-standing problems the user has only just noticed and is wrongly attributing to the recent update (a classic post hoc fallacy). Yes, this nasty prebinding bug has been reported to Apple, and yes, it is 100% reproducible if you want to reproduce it.
Every single time you install an update to Mac OS X, whether it's an iTunes update, a QuickTime update, a daylight saving time update, a security update, an AirPort update, or an actual Mac OS X update, you can be hit by this bug. To keep from being smacked in the face by it, follow this simple rule: when "Optimizing System Performance" appears during the update process, do not touch your computer and definitely do not launch any applications. Just back away from your computer as if it were a swarm of bees. Yes, this does mean that if you install the Mac OS X 10.4.9 update, you may get hit by the bug.
I think it's important to note that APE's (Application Enhancer's) use of update_prebinding at login on the ICBMs is not affected, because of when APE runs it. For the nasty prebinding bug to manifest, multiple prebinding operations (explicit or implicit) must be going on at the same time. If some implicit prebinding needed to be done, it has already been done by the login window application (the one that shows that bar graph when you boot) on any shared libraries it links to. And once you log in, even forcing a prebinding operation won't actually cause any files to be reprebound, since that work was just done.
Symptoms and Signs
The worst sign you've been hit by this bug is an inability to boot after installing a Mac OS X update. Sometimes the little wheel will just keep on spinning. Other times you'll get to the point where you should see your desktop, but all you see is a blue screen (because loginwindow is repeatedly crashing due to a missing library). The "easiest" sign is an application crashing either at launch or when you do a specific action, with console.log (viewable in /Applications/Utilities/Console) or a crash log spewing out a dyld message that says "Reason: no suitable image found." and sometimes "file to short" [sic]. The file is too short because it is zero-length. There is an example of someone being hit by this on the internets.
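If you can still get to a Terminal (or have mounted the busted Mac's disk on another machine), one quick way to look for suspects is to list zero-length files in the system library directories. This is a minimal sketch, not an Apple tool: find_zeroed is a made-up helper name, and since some files on a healthy system are legitimately empty, every hit is only a candidate to compare against a good Mac.

```shell
# POSIX sh. Hypothetical helper: list zero-length regular files under the
# given directories. A shared library or framework binary that shows up
# here is a prime suspect for dyld's "no suitable image found" error.
find_zeroed() {
    # -size 0 counts 512-byte blocks, rounded up, so it matches only
    # files that are completely empty. Errors from unreadable
    # directories are discarded.
    find "$@" -type f -size 0 -print 2>/dev/null
}
```

For example, find_zeroed /usr/lib /System/Library/Frameworks /System/Library/PrivateFrameworks scans the usual places shared libraries and frameworks live. On a disk mounted from another Mac, prefix those paths with the volume's mount point under /Volumes.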
Sadly, most people suffering from this bug never get a chance to see an error message telling them which file was zeroed out, especially when it prevents booting. One of the things that can help a lot in troubleshooting the problem is booting in verbose mode. Setting this option is really quite easy. Open Terminal (/Applications/Utilities/Terminal) and type:
sudo nvram boot-args="-v"
Then just enter your password. From then on, every time you reboot, you'll be in verbose mode. You can also hold Command-V at boot time to boot into verbose mode for that boot only.
Verbose mode basically just replaces the Apple logo (the OEM logo on the ICBMs) at boot time with text output, so you get to see all the gritty, ugly details of booting Mac OS X. Note: you will not understand everything that's going on when booting in verbose mode. You don't need to. All you need to do is recognize what is normal and what isn't. It also really helps to boot into verbose mode after an upgrade, because the first reboot can take an exceedingly long time (up to 10 minutes), and the scrolling text lets you know your Mac hasn't frozen. (Seeing "diskarb not ready" is normal, especially after a system update.)
Preventative Measures
Well, I already gave it: when "Optimizing System Performance" appears during the update process, do not touch your computer and definitely do not launch any applications. Just back away from your computer as if it were a swarm of bees.
I should note that this bug seems a lot more likely to happen on an ICBM, because the prebinding operation has to work on both sides of a fat file (the x86 side and the PowerPC side). That makes it take longer, which gives you more time to trigger the bug.
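A fat (universal) file is a small header followed by one complete Mach-O image per architecture. The sketch below, assuming only od(1) and a POSIX shell, classifies a file by its first four bytes (its magic number); macho_kind is a hypothetical helper name. A zeroed-out library has no header at all, which is why dyld finds no suitable image in it.

```shell
# POSIX sh. Hypothetical helper: classify a file by its first four bytes.
# Mach-O magic is stored in the CPU's byte order (0xfeedface / 0xfeedfacf),
# while the fat header magic 0xcafebabe is always stored big-endian.
macho_kind() {
    if [ ! -f "$1" ]; then
        echo "missing"
        return 1
    fi
    # Dump the first 4 bytes as hex and strip spaces/newlines,
    # e.g. "cafebabe". An empty file produces an empty string.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case $magic in
        "")       echo "empty (zero-length)" ;;
        cafebabe) echo "fat (universal)" ;;
        feedface) echo "32-bit big-endian Mach-O (PowerPC)" ;;
        cefaedfe) echo "32-bit little-endian Mach-O (Intel)" ;;
        feedfacf) echo "64-bit big-endian Mach-O (PowerPC)" ;;
        cffaedfe) echo "64-bit little-endian Mach-O (Intel)" ;;
        *)        echo "not Mach-O" ;;
    esac
}
```

On a Mac, lipo -info on the same file will also report which architectures a fat file contains. Keep in mind that 0xcafebabe is also the magic number of Java class files, so a "fat" result only means the header looks right.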
Solving the Problem if You're Hit
Recovering from this bug is not the easiest thing in the world. If it's a simple crash, you can copy a "good" version of the file from a Mac running the exact same build as the busted one, or you can just run the most recent combo updater (assuming the zeroed-out file is in the combo updater). If it's really bad, you'll have to boot the busted Mac into FireWire target disk mode (TDM) and try to run the combo updater on it from a working machine. TDM is not an option if the other Mac has a different processor architecture or is running a newer version of Mac OS X.
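If you go the copy-a-good-file route, it's worth guarding against copying in the wrong direction. Here is a minimal sketch with a hypothetical helper, restore_zeroed, that only overwrites the target when it really is zero-length and the source isn't. It assumes you have already confirmed that both Macs report the same build with sw_vers -buildVersion; it does not check that for you.

```shell
# POSIX sh. Hypothetical helper: replace a zeroed-out file with a
# known-good copy from a Mac running the exact same build.
# Usage: restore_zeroed GOOD_FILE ZEROED_FILE
restore_zeroed() {
    good=$1
    bad=$2
    # Never clobber a file that still has data in it.
    if [ -s "$bad" ]; then
        echo "refusing: $bad is not empty" >&2
        return 1
    fi
    # Never "restore" from an empty or missing source.
    if [ ! -s "$good" ]; then
        echo "refusing: $good is empty or missing" >&2
        return 1
    fi
    # -p keeps the mode and timestamps (and the owner, when run as root).
    cp -p "$good" "$bad"
}
```

With the busted Mac's disk mounted in target disk mode, the first argument would be the file on the working Mac and the second the same path under the mounted volume in /Volumes. Run it as root (for example after sudo -s) so cp -p can keep the original owner.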
What Will Apple Do?
Who knows. This bug has been filed with Apple, along with steps to reproduce it 100% of the time (at least in my testing). It was marked as a duplicate, which means the bug was already in Apple's system before I filed it. And since it is a duplicate, I don't know what is going on with it. Yes, before anyone mentions it, I know prebinding is deprecated. However, Mac OS X still does it when installing Apple updates. After all, it doesn't matter that it is deprecated if it still happens.
Even if prebinding goes away completely in Mac OS X 10.5, that doesn't solve the problem for Mac OS X 10.4.x users. Security updates will continue to be released for Mac OS X 10.4.x until Mac OS X 10.6 is released (which could be many, many years from now), and until this bug is fixed, every one of those updates carries the risk of "destroying" the Mac of the person who installs it.
Note: This post only applies to Mac OS X 10.4.9 on the ICBMs. Since PowerPC-based Macs don't run Rosetta, none of the following applies to them.
Good news, everyone. Mac OS X 10.4.9 has finally been released. This release fixes a major bug in Rosetta that caused PowerPC applications to crash. The crashes were especially common if APE was installed: although APE didn't cause the bug, it did trigger it the vast majority of the time. It should also be mentioned that this bug happened on some people's Macs without APE ever having been installed.
But Mac OS X 10.4.9 fixes the bug, so it's cool.
Why is the path unclear when we know home is near?
It's kind of funny, really. This bug was introduced sometime between Mac OS X 10.4.5 and 10.4.6 for the ICBMs. We thought that Mac OS X 10.4.7 would fix it, based on some rumors on the internets. We were actually holding back a lot of releases hoping that the bug would be fixed. Sadly, it did not get fixed when Mac OS X 10.4.7 was finally released. I think we all cried in a circle or something.
When Mac OS X 10.4.7 was released and it didn't fix the bug, we had to make a decision. Customers were complaining about applications running under Rosetta crashing. We had an ugly temporary fix (that involved killing translated), but it was too ugly. So we made our choice.
Understand we'll go hand-in-hand while we walk alone in fear.
The "horrible" workaround we had was suggested by one of our users: calling update_prebinding fixed the problem. Somehow, running update_prebinding causes dyld (the craptastic dynamic linker that Mac OS X uses) to eventually call new_system_shared_regions or something like it. That call seems to have the side effect of making Rosetta ignore its global cache, possibly because there are now two maps of shared libraries on the system. The cache was the root of the problem with Rosetta and APE, and it's why killing translated would temporarily fix the issue: it disconnected Rosetta from its cache (and luckily the cache isn't needed for Rosetta to work or to be fast).
dyld is the only thing that is allowed to call new_system_shared_regions. Updating the prebinding was one of the few easy, non-evil ways we could find to get dyld to call it.
So why was update_prebinding horrible? Well, forcibly updating prebinding had a few side effects. It caused a massive number of files to be added to the disk cache, which meant the amount of free memory listed in top would be significantly lower than it would otherwise be during a normal login. While this didn't have any actual negative effect on performance, a lot of people believed it did (since most people treat the amount of free RAM as a measure of performance). The files added to the disk cache were immediately marked as stale and would be evicted as soon as that memory was needed by other processes, so their presence in the cache didn't prevent any other task from using that memory. Mac OS X has extremely aggressive disk caching.
If you open Terminal (/Applications/Utilities/Terminal) and run du -s / (which has the side effect of emptying the disk cache), you can confirm that the memory isn't actually "used up". Note that this command can actually decrease performance, because it forces all of the files in the disk cache to be evicted, which means they have to be read from disk again the next time they are accessed. Reading from disk is slow.
Where do we go from here?
Well, first download Application Enhancer 2.0.3. Then run Software Update and download and install the Mac OS X 10.4.9 update. Upon reboot, APE will no longer run update_prebinding.
We will be updating most of our haxies to include APE 2.0.3 over the next few weeks. If a haxie is one of the less popular ones and has no other small fixes pending, we may just do a "silent" update. All we'd do is download the disk image, convert it to read/write, replace the APE installer, convert it back to compressed read-only, and upload it back to the server without changing the version number.
We highly recommend everyone download Application Enhancer 2.0.3 and Mac OS X 10.4.9.
The curtains close on a kiss. God knows, we can tell the end is here.
Here's looking forward to 10.4.10!