October 21, 2002
Mach-O ABI
Here I go again, ranting about the bowels of Mac OS X. This time I want to talk about particular implementation details of the Mach-O runtime ABI (Application Binary Interface). Before you get too confused, there are two different things under the 'Mach-O' name:
The latter is not what I want to talk about today; the first is what puzzles me most. I admit I am just a "small programmer" with no relationship with the powers-that-be at Apple (that is, no insider contacts who can explain the reasoning behind particular design decisions to me), so the impressions, judgements and guesses expressed in this article may be slightly or totally off the mark. Like many other developers who have dug deep into the implementation of such things, however, I can see obvious drawbacks and oddities in the Mach-O ABI, and this is what I am going to talk about.

Mach-O originates from NeXTstep, an operating system created at NeXT for its NeXTstation machines, and later expanded to x86 hardware with OpenStep. NeXTstations were originally based on Motorola 68k CPUs, just like old Macintoshes. Mac OS (classic), on the other hand, used an ABI for PowerPC which followed the principles defined in a document by IBM/Motorola for PPC processors. As you may already know, m68k and x86 are CISC architectures; the PowerPC used in all new Macs is RISC. To make a long story short, Mac OS X uses an ABI designed for CISC processors, mostly ignoring RISC design principles.

What do I mean by that? The Mach-O ABI we now see in Mac OS X is more or less a direct port of NeXT's Mach-O designed for m68k: it relies on the PC (program counter) register to perform various manipulations with data (for the geeks: PC-relative addressing). There's nothing wrong with that, as it's an effective and common practice, except for one little thing: the PowerPC has no programmatically accessible PC register. That is not a show-stopper though. Mach-O for PowerPC just takes one of the 32 general-purpose registers and turns it into a program-counter-style register on which to base all offset calculations. That works well, as you can see, since all Mac OS X applications (except for the ones compiled with Carbon/CFM) use the Mach-O ABI.
That approach works well, except for one small thing: global/static data access adds about 7 cycles of overhead per function, and about triple that for cross-context calls (on a G4-class processor), compared to the old Mac OS Classic ABI (excuse me for the geek talk). The Mac OS Classic CFM ABI, in comparison, needed almost 0 cycles for static data access and about 5 for cross-context calls. To rephrase: applications in Mac OS X could be faster if the Mach-O ABI followed the principles set for the PowerPC chip, and not the ones created over a decade ago for CISC ones.

This brings us to the question, "How much faster would applications be if the ABI were done right?" According to some tests done by my friends on a Macintosh Development IRC channel, the speed gain would be 10-30%, depending on the particular application (how often it calls functions). Realistically, the speed gain would be around 10 to 12 per cent (how I got these numbers is explained below).

So why did Apple use an outdated ABI for a modern operating system? Frankly, I don't know the reason. The best one I have heard is that it saved Apple a few months of Mac OS X development time, because they didn't have to do massive updates to the NeXT-derived tool chain. There are signs of change though: the recent update to GCC, the compiler shipped with OS X, allows it to perform the so-called -mdynamic-no-pic optimization, which hard-codes the data addresses in the code, so the result is roughly equivalent to the CFM ABI used in Mac OS Classic. GCC itself, compiled with that optimization, is 10% faster. Applications need to be recompiled to take advantage of that, so it doesn't help the 80% of titles already shipped for Mac OS X. Then again, the optimization only works for executables and not shared libraries. Either way, there is no way to change the ABI now, as it would break all existing applications, which is obviously not what Apple (or we) would want.
And after all, who cares about a 10% speed loss? You can always get a faster Mac, right?

Update 10/21: Warning, even more technical stuff follows! Somebody on /. pointed out you can easily add a new ABI to the system. This is true, but if you do it, you run into three problems:
The downside is that (1) adds back a lot of the CPU hit that you were trying to get rid of, whereas (2) trades the CPU hit for a lot of extra memory and complexity (i.e., having two copies of the code), and Mac OS X already uses so much memory that it is a problem.

I've also been told OPENSTEP runs on RISC processors (non-PowerPC). I have not investigated how the Mach-O ABI works there. Quite possibly it follows PowerPC-style guidelines on those chips, although I am pretty sure it does the same as on PowerPC.

Mac OS X was originally not moved to the PPC ABI because the NeXT toolchain used the Mach-O ABI and the move would have delayed the release of OS X by a few months. Now Apple is spending a few months speeding up OS X in ways that might not have been necessary had they gone with the PPC ABI in the first place.

Thanks to the regulars of the Macintosh development channel and an anonymous Apple engineer for helping me with this article.
Comments
What impresses me about this article is that I was able to follow you. I couldn't begin to explain it to anyone else, but that was insightful stuff m'man.
Posted by: Josh on October 19, 2002 9:39 PM

I suspect it may have to do with keeping the OS portable across many chip architectures, as NeXTSTEP and OpenStep were.
Posted by: Mason on October 21, 2002 5:53 AM

Nice write-up! Heh.. you've been slashdotted as well.
Posted by: TigerKR on October 21, 2002 9:46 AM

Your thoughts and insights are valuable. Why do you post them in nearly unreadable grey type on a white background? My eyes hurt after just a few seconds reading your blog. 12% - just think of how many hours I could reclaim with that 12% ;)
Posted by: rstevens on October 21, 2002 10:45 AM

With Apple promoting the use of bundles instead of one-file monolithic applications, it seems like you could just beef up the application file sizes so they would be self-contained (i.e. have the proper ABI-version-compliant libraries) and only run the right code. This just means we're going through another "FAT" binary experience, except this time it's even easier to pull out your unwanted code, as contextually clicking (control-click) on any bundle lets you open the bundle and manipulate the constituent files. With today's huge hard disk sizes, this is probably the quickest and easiest solution.
Posted by: TM Lutas on October 21, 2002 11:51 AM

I would agree with Josh on this one. I think this aspect of OS X was intended to align with OpenStep's design goals, that is, portability being of greater importance than performance. I would also agree with the point in the article that it was done initially to get OS X to market quicker. It was the path of least resistance for Apple at the time.
Posted by: rory on October 21, 2002 12:01 PM

I posted this on slashdot, but figured I'd post it here as well: It just so happens that a friend of mine has a copy of "PowerPC Microprocessor Family: Programming Environments for 32-bit Microprocessors" sitting on his desk, which I grabbed. Here is how PowerPC processors branch (from section 4.2.4.1 of said dead-tree document):

1. Branch relative addressing mode - the immediate displacement operand is sign extended and added to the current instruction address to produce the branch target address. So, PC-relative addressing. There is no need for a programmatically accessible program counter because this is all done by the branch execution unit. Single 32-bit instruction.
2. Branch conditional to relative addressing mode - same as branch relative addressing, except that the branch is only executed if the proper condition codes are set. Single 32-bit instruction.
3. Branch to absolute addressing - the operand address is sign extended and used as the branch target. As the name implies, this is absolute addressing. Only problem is, the operand address is only 23 bits wide in a 32-bit implementation, and with the zero pad, it gives only 25 bits of absolute address (word alignment required). So, if you absolute address anything, you can only absolute address 25 bits' worth of the address space.
4. Branch conditional to absolute - same as regular absolute addressing, except that you have to encode condition codes, so the operand address is now only 13 bits if I read the diagrams correctly, meaning that you can only absolutely address 15 bits of address space with the zero pad.
5. Branch conditional to link register - if you clobber the link register, you can branch to a 32-bit address. Of course, you have to clobber the link register, so I would think this would be most helpful in returning from a function call, not going to it, since the link register holds the return address. And if you use it forward instead of returning, you have to load the link register.
6. Branch conditional to count register - same as link register branching as above.

All of that said, the reason that the Mac OS ABI uses PC-relative addressing is that the only way to fully address a 32-bit address space is to do PC-relative addressing. According to this book, there is no two-instruction-width branch, e.g. a branch instruction which encodes an entire 32-bit absolute address in two 32-bit words (one word for branch encoding and condition codes, one word for the whole 32-bit address). This leads me to believe that there is no way to do all absolute addressing on PowerPC unless you implement new instructions (which will take more time to get to the processor, and to decode) or limit yourself to 15 or 25 bits of the address space. So, the short version is that there is no way for the Mac OS ABI to do absolute addressing.
Posted by: nadador on October 21, 2002 12:28 PM

Sorry to clog the page, but there is this as well. I realize that we're also talking about data addressing, not just branching.

> It's not about branching. It's about data references using PC relative addressing. The PowerPC has no PC relative data addressing modes.

Point taken. Section 4.2.3.1 of the same book is "Integer load and store address generation".

1. Register indirect with immediate index addressing for integer loads and stores - in this case, you get a 16-bit index in the instruction added to the value in a general-purpose register, which is used to compute the effective address.
2. Register indirect with index addressing for integer loads and stores - this is the same as above, except that two registers are used and there is no encoded index.
3. Register indirect addressing for integer loads and stores - use just one general-purpose register as an address for a load or store.

So, the point is that in every case, some form of relative addressing is used.
In order to make relocatable code, i.e. code that can be linked happily with other binary objects, you have to have some sort of reference address, and PC-relative addressing is the only way to do this. Even though there is no PC-relative addressing mode, the only way to guarantee that the relative addresses used in different object files won't clash is to do PC-relative. The fact that this is not easy on the PowerPC doesn't make it any less necessary.
Posted by: nadador on October 21, 2002 12:41 PM

Hello, in response to nadador who said "This leads me to believe that there is no way to do all absolute addressing on PowerPC unless you implement new instructions (which will take more time to get to the processor, and to decode) or limit yourself to 15 or 25 bits of the address space." I have been programming PowerPC assembly for a few months now, and can assure you that absolute addressing is possible. We do this by first loading a 32-bit value into a register (over two instructions) and then branching to the address in the register. For example, to branch to the label go_here:

lis r4, go_here@h # load top 16 bits into register r4

OK, so it clobbers a register. Life isn't perfect. -andrew
Posted by: Andrew de los Reyes on October 21, 2002 12:50 PM

Cool, cool. Yeah, I think that I misunderstood both the question and my response. I shouldn't post things without drinking more coffee. What my point should have been is that if you want relocatable code, you have to do some sort of relative addressing, PC-relative being the most common. And since relocatable code is generally preferable because it makes linking more fun, it's a good thing. Sorry for all the posting, when just that might have done.
Posted by: nadador on October 21, 2002 1:42 PM

Isn't there a solution to this along the lines of the PPC/68K processor switch? Use the native and automatically switch to non-native when necessary?
This seems to be what's mentioned in the update, but what's different that this _can't_ be done? Or perhaps, what's different that developers wouldn't want to take advantage of this, as they have for the 68K to PPC switch, or the OS 9 to OS X switch?
Posted by: Bob Terrell on October 21, 2002 1:45 PM

>With today's huge hard disk sizes, this is probably the quickest and easiest solution.

That would be nice if it were true, but the concept is NextStep's. And I do get the creepy feeling about this architecture. It seems to me that some hacker is going to figure out a good way to exploit this someday. R.

>I suspect it may have to do with keeping the OS portable across many chip architectures, as NeXTSTEP and OpenStep were.

Portability and addressing the specific architecture your system is going to run on... Yes, OS was for a number of platforms, all CISC I think. But this is not supposed to be OpenStep. Remember Portable Data Objects?
Posted by: Rixstep on October 21, 2002 7:34 PM

Apple should go back to using PEF/CFM. What's the alternative, creating a 3rd ABI with a separate set of shim libraries? Crack smokers...
Posted by: strobe on October 21, 2002 7:35 PM

In the past (MacOS
What I just described is the standard PowerPC ABI for handling global data & functions.
Posted by: rincewind on October 21, 2002 7:40 PM

Why create a new ABI at all? Why can't we use the existing PowerPC ABI, which from reading the comments uses a Global Pointer (remember the complaints from x86 Linux users when everything went ELF because it wasted a register for the GP?) to address global data items. Personally it seems silly to make everything (even data) PC-relative. Would it be possible to generate new code with the PowerPC ABI, and have the linker handle things that are linked with Mach-O objects? Since we're at it, why can't (I know this is dangerous territory) we change to ELF for new objects, executables, and libraries?
And force the linker to play tricks when it has to link against old stuff? Since there'd still be a need to use old Mach-O objects, is there a way to preserve compatibility? I confess I haven't looked too deep at the Mach-O runtime conventions, but I'll remedy that shortly.

(Note: I am a member of this Macintosh Development IRC Channel. I also have a very good guess who the anonymous Apple engineer is.)
Relatively similar code is generated by each compiler, but GCC has
Just a quick primer for terminology, just to make sure we're on same
PEF has some, but not much, overhead... Omni is NextStep trained, and
For a normal macho function call, the PIC can be up to 9 cycles
Case in point: the _msgSend routine of ObjC is in /usr/lib/libobjc so
Similar penalties apply to global and static data... MachO will impose a
These numbers come from a friend of mine, who is an Apple engineer. He

iTunes recompiled by GCC/MachO was 30% slower at MP3 decompression than with CodeWarrior/PEF.

Calling through a local function pointer is much faster in Mach-O. The code below is 20% faster when compiled to Mach-O (5.12 sec for CFM, 4 sec for Mach-O).

#include
int TestFunc (int i) { return (i+1); }
int main()
clock_t start = clock();

I think Apple knows this problem very well. But the real answer is portability. If Moto has to close its PowerPC business because it is losing a lot of money, Apple can do nothing but change to an x86 processor.
Posted by: Zoltan on October 22, 2002 3:35 AM

Has anyone checked out the comparative cycle counts (is the info even available yet) for PEF vs. Mach-O accesses on the IBM 970? It seems, given trends in CPU design, that this situation would only worsen over time. I doubt portability was really the primary factor in the decision. Apple _had_ to know that the result of this situation would intrinsically cause porters to favor PEF's performance.
Since anything that would direct design towards PEF/CFM is automatically non-portable, they would have been explicitly choosing portability at the expense of virtually ALL of their big ISVs' apps. It seems FAR more likely they just didn't have time/resources to make the release schedules if they'd had to change the basic ABI for so much of Rhapsody/OS X's toolchain/frameworks.

Unfortunately, now they appear utterly stuck. The only solutions readily available (thunking, etc.) would all cause significant slowdowns of existing code. Recompiling with a modified toolchain might solve the Apple-internal switchover, but getting ALL existing Mach-O-using ISVs to recompile their commercial apps (using a new, probably buggy, modified toolchain) is a VERY hard sell. So...where can Apple go from here? Mach-O's performance penalties will likely worsen over time on new PPC CPUs. What can they do?

To the portability folks: _Apple_ might entertain the notion of hauling everyone over to x86 or such at some future point, but unless they can sell it to Adobe, Macromedia, etc. there'd be no point. I cannot see that as something Apple even _could_ sell, given the pain those ISVs would receive in turn from Microsoft. While that portability is a nice rhetorical concept (and occasional 'big stick' in PPC vendor negotiations), it just doesn't seem practically useful. Punishing ALL existing apps for it does not seem like a "good" strategy for Apple in the long run.
Posted by: JFW on October 22, 2002 10:54 AM

Miklos, I've seen this discussed openly by Apple engineers on various mailing lists in MachO vs. CFM discussions. Apple engineers are well aware of this problem, but for whatever reason the directive from the top is to move away from CFM.
Posted by: Trillan on October 22, 2002 4:03 PM

Just to clarify the portability of NEXTSTEP: Version 3.3 runs on Intel, m68k, SPARC and PA-RISC hardware (see, for example, http://www.nleymann.de/Nextstep/index.htm).
The latter two platforms are, of course, RISC.
Posted by: Toby Thain on October 22, 2002 7:49 PM

The big problem as I see it is the schism between the PEF/CFM and Mach-O development camps. Apple has been pretty unsuccessful at convincing its major ISVs (Adobe, Macromedia, etc.) to move to Mach-O. One less obvious side effect of this is that all the plug-in markets associated with those big ISVs are thus also stuck in the PEF/CFM camp. It would also seem that any ISV who needs maximum performance has no choice but to go to the PEF/CFM camp. Add those up, and that's a _big_ chunk of the total Mac ISVs.

Meanwhile, Apple is clearly putting the vast majority of its efforts and evangelism towards developing for Mach-O using the PB-GCC toolchain, a toolchain which cannot be used for PEF/CFM development. Apple's old "free" CFM toolchain, MPW, is deprecated, and cannot produce Carbon PEF/CFM, IIRC. This means that the only available toolchain for PEF/CFM development is CodeWarrior. While CW is a rather nice platform (and capable of Mach-O output as well), it's expensive. That has the effect of inhibiting development from small ISVs and shareware/freeware developers, and in turn inhibiting the growth of the rather important "plug-in" markets of those big ISVs' PEF/CFM apps. It also "single sources" the toolchain for most Mac application development on an _outside_ company, which is never a good thing (and makes ISVs twitchy).

So, where does that leave things? Apple cannot seem to get its big ISVs to migrate "past" Carbon/PEF/CFM, partly I'm sure on performance grounds. Yet Carbon itself was supposed to be a "transition mechanism", and while Apple's done quite a bit to reduce Carbon apps' "second-class" status in OS X, use of "Cocoa in Carbon" is still a pretty big performance hit. Carbon/PEF/CFM is likely here to stay.
While Apple will undoubtedly continue pushing Mach-O, it seems they also need to resurrect MPW or such so that the Carbon/PEF/CFM toolchain is NOT single-sourced outside Apple. Further, it calls into question the direction Apple should take OS X in the future. Is the answer that Apple needs to refocus on making Carbon the "first-class" OS X API, and start de-emphasizing Cocoa usage? It certainly would seem very difficult for them to redirect Cocoa towards PEF/CFM. Yet it's equally unlikely that the big ISVs will tolerate their apps being relegated to second-class status when it comes to Apple's tuning of OS X's performance and features. In the short to mid term, performance is very much a "critical concern" to Apple marketing, but so is retaining those big ISVs whose apps drive a big percentage of Mac sales.

What to do? Opening PEF and subsidizing adding PEF generation to GCC would certainly be a good start. They also need to figure out how to maximize Carbon/PEF/CFM performance, even if it's at the expense of Cocoa apps (perhaps even moving Cocoa to PEF/CFM, long-term). Will it happen? Probably not. I think there's simply too much Apple exec ego invested in their current direction. And in the end, the users lose.

I just have a hard time seeing how Apple can continue driving OS X tuning and development away from the API and ABI platform used by all its big ISVs. Someone's got to blink here, and whether Avie and Steve like it or not, it's probably got to be them. In the long term, they need to realize that MacOS is for the ISVs and users, not the other way around. Currently, they seem a wee bit murky on that particular concept. The big ISVs can leave the Mac and survive (and the hit they'd take in doing so is shrinking over time). Maybe it's time Apple set egos aside, examined whether it can survive the big ISVs leaving, and acted accordingly. "Portability" is meaningless if they lose their critical ISVs.
Posted by: JFW on October 22, 2002 10:57 PM

By the way, it's also very much worth noting that Metrowerks, the producer of the toolchain used by virtually all of the major Mac ISVs, is now owned by Motorola. Apple migrating off Motorola, even if it's just to IBM, will impact their relationship with Motorola. I think Apple's need for an in-house Carbon/CFM/PEF toolchain is going to become a critical issue regardless of how they address any CFM/PEF versus Mach-O issues. And it's probably going to become a critical issue sooner as opposed to later.
Posted by: JFW on October 22, 2002 11:11 PM

ARGH!!!! First of all, this couldn't be strictly for portability reasons, because the x86 doesn't have PC-relative addressing, nor does it have a program-accessible PC! (On x86, though, it's called the IP, or EIP in 32-bit mode.) x86 relies on code being written into explicit locations; if it's PIC (position-independent code), most of it is then done through a faked IP also (same as PPC). Fact is that if you're doing PIC, you need to have IP/PC-relative addressing, or at least have a faked register/memory location which holds a base of the program. The x86 shouldn't need this at all really, when you think about it, because you've all been taught that globals are bad. Well, to tell you the truth, globals _AND_ statics are bad. If you're holding information outside of the function in the heap, then you're doing something wrong. The x86 (and I imagine the PPC, also) is significantly faster using local variables and parameters (single %ebp-relative addressing mode) rather than globals (absolute addressing in PDC (position-dependent code) and worse in PIC). The solution in both? Probably to avoid using globals, and just use locals/parameters, especially since the PPC could easily deal with those with a single register indirection. Just to make it clear, I'm an x86 assembly guru, and know close to zilch about PPC assembly.
Posted by: Daniel Foesch on October 23, 2002 1:20 AM

Andrew de los Reyes’ example with the “b r4” PowerPC instruction is wrong. The only branch-to-register instructions available in PowerPC are blr (branch to link register) and bctr (branch to counter register), as well as conditional variants of these. Other possibilities are forbidden because of the complications they would cause to the instruction pipeline.

This issue confirms my impression of the whole OS X effort as a triumph of politics over business sense, or even technological sense. The NeXT folks had been trying unsuccessfully for over 10 years to find a market for their technologies, until they managed to take over the running of things at Apple. Now they’re trying to force the Apple market to adopt their way of doing things, even if it kills them.
Posted by: Lawrence D'Oliveiro on October 23, 2002 2:22 AM

JFW says that CodeWarrior is the only tool available for Carbon/PEF/CFM development. This is incorrect. MPW can certainly do it as well. I agree with JFW’s other comments: Cocoa will have to be phased out at some point. It’s quite clear from developer adoption rates that Carbon is the only OS-X-native API with a long-term future.
Posted by: Lawrence D’Oliveiro on October 23, 2002 2:30 AM

By the way, I never really liked that “official” ptrglue convention for doing cross-TOC calls on PowerPC. My ComponentGluePPC tool generates the following all-inline sequence instead:

lwz r12, TheProc(RTOC)

I figure this saves a branch over using ptrglue, so would it use less than the 5 cycles that slava and Feanor estimate?
Posted by: Lawrence D’Oliveiro on October 23, 2002 2:39 AM

Re: "Apple's need for a in-house Carbon/CFM/PEF toolchain"

They have one! MPW works perfectly, today, for building Carbon and Classic PEF binaries (and, since Metrowerks' pathetic decision to drop 68K in CW7, it's still good for that platform as a bonus).
Posted by: Toby Thain on October 23, 2002 4:26 AM

So Metrowerks dropped 68k - So what?
If you're still selling 68k software, then I feel for ya man =).

Additionally, Carbon never was, nor ever will be, a transitional technology. Carbon was developed because software vendors demanded a C interface to the GUI, and to tell you the truth, regardless of the advantages of Cocoa, it is good that we have one. And in the end, that's all Cocoa/Carbon are: GUI builders. Sure, they both have some additional application support in them, but for the most part the really interesting stuff in the system is outside of their primary usage range.

There is a framework called CarbonCore, a REALLY bad misnomer that I think is only there to make it easy to spot. What CarbonCore really is is a high-level interface to the low-level BSD APIs that adds functionality that makes all programs better. And on top of it all, CarbonCore is relied on by BOTH Carbon AND Cocoa. So I don't think it's going away anytime soon.

At WWDC Apple stated that currently you can do Cocoa in Carbon (and Carbon in Cocoa). They also stated that in the future they anticipate you being able to mix and match at the widget level. That is, you can take a Cocoa edit text box and a Carbon popup menu and put them in the same window. Well, the easiest way that I see that being done is if all of the Cocoa widgets are rehosted on top of Carbon widgets (or a 3rd widget library is created that both use). And then Cocoa becomes just like all other application frameworks on MacOS, just implemented in a different language (Obj-C instead of (generally) C++).

Oh, and I wrote an app last night to look at how fast internal/external calls are on CFM vs MachO. Thus far it looks like internal CFM calls are faster than internal MachO calls; however, function pointer calls are slower in CFM. Jury's still out on the MachO results however, since some of the data looks a little funny.
Posted by: Rincewind on October 23, 2002 12:41 PM

First of all, my apologies for stating Carbon/PEF/CFM wasn't possible with MPW; that was (kind of) an error. It's possible to do so, but it requires using tools which Apple has declared deprecated and end-of-lifed (notably the in-house compilers, MrC and friends). That clearly is a decision which needs to be revisited, particularly if Apple's about to really tick off Motorola (and by extension, Metrowerks). I'm not sure it's the best approach, however, particularly if the OS itself no longer uses MPW (which is the case, IIRC).

Personally, I think there's a lot to be said for Apple just "buying" (permanent source license, whatever) the CW toolchain, akin to what Be did with their CW environment. I doubt anyone would say that the MPW environment (and particularly, the tools) are substantially better than the tools in, say, CW8. "Buying" CodeWarrior would give Apple a toolset with longer legs, which most of its developers are using, without leaving Apple at the mercy of Motorola/Metrowerks' future directional changes. It would certainly act to minimize any developer concerns about the single-sourcing of their primary toolchain outside Apple.
Posted by: JFW on October 23, 2002 1:51 PM

Does CarbonCore talk to the BSD layer, or does it talk to the Mach primitives underneath directly? I don't recall seeing much documentation of how it works internally. Also, while CarbonCore may be used by some Cocoa parts, there is still a _substantial_ infrastructure in OS X (including the low-level stuff, such as drivers, etc.) which is living in Obj-C/Mach-O land. Those parts are probably the most impacted by inefficiencies in Mach-O, and also (unfortunately) include components of the OS which are among the most performance-critical. The drivers and driver-level APIs are tied up in IOKit (an intrinsically NEXTSTEP/Obj-C/Mach-O construct). The "recommended" user-space client hardware access APIs are also reliant on Obj-C/Mach-O.
Even the microkernel itself is designed around and relies upon the Mach-O/dyld environment. It's very hard to see how Apple could make major changes to those pieces without having _severe_ performance impacts on the system (or incurring massive amounts of work). Having hardware developers rewrite all their drivers, or having Apple do it, is also a huge amount of work. All of this is almost certainly why Apple didn't do it in the first place. Unfortunately, what was convenient before is now (as usual) becoming a problem, functionally and politically.
Posted by: JFW on October 23, 2002 2:44 PM

to post a message accurately explaining the issue:
Don't think so.

CFM vs. Mach-O the Ultimate Fight

Skimming through the various posts on the site given above, it appears that
So, let's compare both ABIs:
CFM is TOC based. This means that every fragment like a shared library comes
A transition vector is a simple data structure which contains both a pointer
It's very important to understand that whenever you want to call a function
So a cross-fragment or cross-TOC call (i.e. app calls an OS function) must
Apple's standard code for this looks like this:
bl moo_glue ;call the cross-fragment glue
moo_glue:
As you can see, a cross-TOC call requires 7 instructions overhead whereby 5
Now, basically the same code sequence is necessary for functions which are
Accessing exported variables (global vars) is very easy in a TOC based
Thus a call from an application function to the OS function looks like this:
bl moo_glue ;call the cross-module glue
moo_glue:
(__DATA,__la_symbol_ptr) section
The above code requires 5 instructions and 1 memory access.
This is 2 instructions and 4 memory accesses LESS than in the CFM case.
Now, in real life we have actually two important kinds of executables:
The following example should make this clear:
Consider framework A. Dyld loads it into memory at physical address 1000.
One way to achieve this is to use PC relative addressing for both
This is why a cross-module function call in the case of a framework or
bl moo_glue ;call the cross-module glue
moo_glue:
(__DATA,__la_symbol_ptr) section
In this case we need 8 instructions and 1 memory access.
This is 1 instruction more than the CFM case but still 4 memory accesses
None of the above glue code is necessary for functions which are called
Accessing exported variables (global vars) is quite expensive in Mach-O. It
This is 5 instructions more than in the CFM case.
Mach-O has as its most important advantage, compared to CFM, the fact that

This is very important for OO languages like C++ or ObjC. Let's consider

Every ObjC method call requires a jump to the objc_msgSend() function in the

Now, in the case of Mach-O things are simple. The method cache contains the

In the case of CFM, things would be more complicated. It would be necessary

Remember, method implementations come from different libraries like

In short: objc_msgSend() requires 1 cross-module call in the case of Mach-O;

The biggest disadvantage of Mach-O is that it is, compared to CFM, more

First of all it would have been necessary to adapt the whole tool chain to

Next is the kernel. It would have been necessary to make the Kernel aware of

Don't think that it would be a good idea to make the Kernel dependent on a

Next in line is the dynamic linking server. "Porting" the existing CFM from

But wait, we're not finished yet. Next in line is the ObjC runtime which also

CFM must relocate all TOC entries after the TOC has been read from disk.

CFM must aggressively resolve all import transition vectors in the TOC at

Mach-O, because it has no TOC, doesn't have to do these things.

PEF requires in its current incarnation both a data and a resource fork per

Mach-O is strictly a single fork file format. Though, it still manages to be

CFM is deployed on a system where 90% of all APIs are exported by a single

Mach-O is deployed on a system which comes with 125 frameworks and 20 shared
Dietmar Planitzer

Above and beyond that, those who have been testing the performance have been cheating!!

Dietmar Planitzer seems to think that having a kernel that knows about resource forks means tying it to one particular file system. This is not true. Not counting HFS and HFS Plus, there is already a widely-used filesystem that has integrated support for MacOS resource forks and Finder info; indeed, it is fully extensible to cater for any conceivable kind of metadata. Soon it will probably be the most widely-used filesystem in the world. I’m talking about Microsoft NTFS. The idea that all files should be “flat” (just a data fork, without support for extensible metadata) is a concept more suited to the 1960s than the 21st century.
Posted by: Lawrence D’Oliveiro on October 24, 2002 6:29 AM

Actually, outside of the Cocoa Frameworks nothing in the system depends on Cocoa. IOKit at its lowest level is C and the drivers themselves are C++. I can't give you a direct implementation of CarbonCore (haven't disassembled THAT much of it =p) but I can tell you that for the most part it is implemented on the BSD APIs (after all, anything above them would cause infinite recursion =p). To add behavior parity between Cocoa & Carbon, Cocoa is moving towards using CarbonCore where it makes sense (e.g. in 10.0 -> 10.1 Cocoa's document handling moved from file paths to FSRefs & Aliases).
Getting back to the CFM/MachO bit. The performance hit at startup from CFM having to resolve all of its function/data imports is small in a program that actually uses those imports a lot. As an example, just imagine traversing a relatively large directory structure. You may call into the library a few hundred thousand times just for that, with each call paying its own overhead. Depending on that overhead, it may make sense to amortize it up front if you can.
Posted by: rincewind on October 24, 2002 8:12 AM

Wow, this article sucks.
I've noticed lots of articles lately from people trying to be the first to uncover some nasty plot, or flaw, or side effects in OS X. Please do research first. Or just keep your mouth shut when you don't know what you are talking about.
How in the world could the MachO ABI carry over CISC principles from the x86 when the x86 doesn't even support PC-relative addressing? The x86 has PIC overhead as well, as you must execute extra instructions to obtain the current PC. One could say that the processors are at fault for poor PIC support, but superscalar and superpipelined processors can't easily maintain PC information for every in-flight instruction.
I can't help laughing at your naive insinuations that the ABI ignores RISC principles. You completely ignore the fact that the ABI specifies register usage and other components which are CPU specific. The PowerPC has 32 registers, and MachO allocates 8 of them for parameter passing, one for the stack, some as callee-saved, some as caller-saved, etc.
Before criticizing an ABI, first look into the issues. Study the different ABIs and their uses. They always make compromises. The PowerPC has a variety of ABI standards: SVR4, embedded SVR4, AIX, MachO, and whatever OS 9 was using. But making code into a shared library is always difficult and has overhead. Also consider ABIs from other platforms to help put the issues into perspective. They may take different approaches that help clarify the trade-offs and compromises.
But really, just keep quiet. You don't know what you are talking about.
Posted by: wapentake on October 24, 2002 8:16 AM

> Cocoa will have to be phased out at some point. It’s quite clear from developer adoption rates that Carbon is the only OS-X-native API with a long-term future.
My friend, that is simply not going to happen. Read what you yourself wrote about the NS people. If they can bulldoze a CISC ABI onto Apple with the latter's RISC CPU, do you really think someone else is going to budge them on Cocoa?
They're not going to let that happen, Sente is not going to let that happen, Misc and Omni are not going to let that happen, etc etc etc. Besides - toss aside the possible politics here and give the learning curve some time to take effect, and you have the most powerful development environment on the planet. Carbon is waaaaay too confused with all the quiltwork legacy APIs. There is just no way they can hold all that together. I can see them scratching Carbon, but never Cocoa. It's a mess of a mess whatever way you look at it, but no one is ever going to sacrifice the love child.
R.
Posted by: Rixster on October 26, 2002 8:06 PM

Again, neither API is going away. Carbon is not a quiltwork of legacy APIs. Yes, there is legacy in Carbon, but that part is quite deprecated. Anyone developing a new app based on legacy APIs is setting themselves up for failure. But if Apple hadn't left in that legacy, then MacOS X would have been a HUGE failure. Mark my words, Carbon in 2 or 3 years will be as different from the MacOS 9 toolbox as the MacOS 9 toolbox is different from the System 1 toolbox.
Posted by: Rincewind on October 27, 2002 5:47 PM

"And after all, who cares about a 10% speed loss? You can always get a faster Mac, right?"
That 10% is significant. Ten percent of 1 GHz is 100 MHz, which is more bandwidth than computers of only a few years ago had. That 100 MHz is useful for multitasking. Without it, less gets done.
Posted by: Golem on November 19, 2002 12:40 PM

Decisions are not made in a vacuum. The next-generation Mac OS had been tried several times, and all had failed up to this point. I can completely understand them wanting to get something working out the door now, rather than spend 6 months fixing things that were already working fine. Especially when you consider at that point, the G4 was all they had. One of the goals of Mac OS X, AFAICT, was to make them less dependent on the PPC.
The G4 wasn't the hottest chip on the block any more, and Mac OS X isn't terribly G4-specific (rumor has it there's an internal project called "Marklar" that has an x86 Mac OS X port). You can mock them for saving a few months early on at the cost of a few months of optimizing later, but if they had needed to wait a few months more to ship anything at all, would there even be a Mac OS X today? Would some companies have simply given up if Apple had said "Yeah, *this* next-generation Mac OS is going to be really great, and we're really going to ship it this time. It's done, but we're reworking the ABI to be completely incompatible with what's running today, so you can't have it for another 6 months -- but trust us"?
For each part of any design, you can find something that's not optimal. If they waited to make everything completely optimal before shipping, it never would have shipped. I, for one, would rather have a good Mac OS X today than a perfect one never.
Posted by: on November 11, 2004 4:02 PM

Hindsight certainly seems to be 20/20 with OS X on X86 coming out. :)
Posted by: Mikey on July 9, 2005 10:27 AM

> Hindsight certainly seems to be 20/20 with OS X on X86 coming out. :
Yes, and many of these comments look positively stupid considering where Apple is with OS X as of 2006. It's amazing the kind of zealotry shown here, with people advocating RISC over CISC, Carbon over Cocoa, platform-specific resource forks and PEF over Mach-O, just because it was stuff from NeXT and not "Apple". Just, Amazing.
Posted by: Ken on May 7, 2006 1:13 AM

