Sometimes you just don't realize how good you have it
[info]chanson
Yesterday I was listening to the latest episode of Hanselminutes, Building a Developer PC, and Scott started talking about how he needed multiple CPU cores so he could take advantage of the new parallel build support in Orcas.

"Orcas" is Microsoft Visual Studio 2008.

Yes, 2008.

Say what you will about Xcode, but at least it's been able to build multiple files at once on multi-CPU machines for the better part of a decade.

(Also, how could anyone build the ultimate developer PC? You can't run Mac OS X on a PC you build, only on a Mac. Therefore it can't be ultimate. But if he were to get, say, an 8-core Mac Pro, he could run both Mac OS X and Windows on it. And those 8-way machines are pretty damn ultimate...)

Tyler doesn't suck at threads
[info]chanson
R. Tyler Ballance, normalcy:
it's kind of depressing to realize that I am like 95% of the developers out there, in that:

i suck at threads.
What Tyler is forgetting is why most developers suck at threads. (And it's not just another example of Sturgeon's Law either, though that has something to do with it.)

Most developers suck at multithreading because they approach learning and working with a new technology using a very straightforward but, in some cases, naïve process:
  1. Skim the API.
  2. Skim the API documentation.
  3. Write some trivial test code.
  4. Use in a real-world application.
Unfortunately, when it comes to concurrent programming, this is grossly insufficient. And furthermore, due to the very nature of concurrent programming, there's no way to make it sufficient.

The upshot of this is that a lot of people think they can write threaded code because they've read their platform's threaded API docs. Then they start running into all of the textbook concurrency issues (race conditions, deadlocks, crashes, and so on) and they flail while trying to solve them. For example, they put invocations of sleep or usleep in threaded code to ensure that a dependent operation completes, instead of using a monitor. Or they declare a variable and check its value in a loop rather than using a platform-provided lock (whether a blocking or spinning variant).
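To make the contrast concrete, here is a minimal sketch in C with POSIX threads of waiting on a condition variable (the pthreads building block for a monitor) instead of sleeping or polling a flag. The names result_box, box_publish, box_wait and run_example are made up for illustration; only the pthread_* calls are real API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a value plus a flag saying it is ready. */
typedef struct {
    pthread_mutex_t lock;        /* guards ready and value */
    pthread_cond_t  ready_cond;  /* signaled when ready becomes true */
    bool            ready;
    int             value;
} result_box;

void box_init(result_box *box) {
    pthread_mutex_init(&box->lock, NULL);
    pthread_cond_init(&box->ready_cond, NULL);
    box->ready = false;
    box->value = 0;
}

/* Producer side: publish the value and wake up a waiter. */
void box_publish(result_box *box, int value) {
    pthread_mutex_lock(&box->lock);
    box->value = value;
    box->ready = true;
    pthread_cond_signal(&box->ready_cond);
    pthread_mutex_unlock(&box->lock);
}

/* Consumer side: block until the value is ready. No sleep(), no spinning. */
int box_wait(result_box *box) {
    pthread_mutex_lock(&box->lock);
    while (!box->ready) {  /* the loop guards against spurious wakeups */
        pthread_cond_wait(&box->ready_cond, &box->lock);
    }
    int value = box->value;
    pthread_mutex_unlock(&box->lock);
    return value;
}

void *producer_main(void *arg) {
    box_publish((result_box *)arg, 42);
    return NULL;
}

/* Starts a producer thread and waits for the value it publishes. */
int run_example(void) {
    result_box box;
    pthread_t producer;
    box_init(&box);
    int rc = pthread_create(&producer, NULL, producer_main, &box);
    assert(rc == 0);
    (void)rc;
    int value = box_wait(&box);
    pthread_join(producer, NULL);
    pthread_cond_destroy(&box.ready_cond);
    pthread_mutex_destroy(&box.lock);
    return value;
}
```

The waiter works whether the producer runs before or after it starts waiting, because the flag is only ever read and written with the mutex held. A sleep-based version can only guess how long the producer will take.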

Fundamentally, then, someone who understands that they don't know enough to do a good job with a technology like multithreading is ahead of someone who thinks they understand it but really doesn't. The person who knows that they don't understand is likely to search the literature, seek out and apply best practices, and in general not try to solve every problem themselves. This is better for everyone.

By the way, the Wikipedia article on threads isn't a bad place to start to learn about some of the issues involved in writing concurrent code and how to address them.

Joe Duffy of Microsoft's Developer Division also has an excellent weblog where he discusses concurrent programming in a lot of detail. Anyone creating reusable frameworks — whether just for use within their apps, or for wider use — must read his extremely thorough post Concurrency and the Impact on Reusable Libraries. It covers a lot of the issues that every developer will need to address in writing reusable concurrent code; sure it's written from a .NET viewpoint, but virtually all of the concepts translate to any platform you happen to use.

Cooperative User Threads vs. Preemptive Kernel Threads
[info]chanson
James Robertson, Cooperative Threading:
Well, in Cincom Smalltalk, this model gives you predictability - you know exactly what a thread is going to do. The issue with runaway threads rarely comes up for a simple reason - most processes end up pausing for I/O (user input, db access, file access, sockets - what have you). That wait for I/O state is what prevents a problem from arising.
This is a classic problem and I'm honestly surprised to find out that Cincom Smalltalk implements cooperative user-level threads rather than supporting preemptive kernel threads.

Here's what I posted in response to James, unattributed thanks to the torturous comment interface on his blog:
One issue with cooperative threads relative to preemptive OS-supplied threads is that you get far less opportunity for true concurrency within an application. In an era when multi-core processors are becoming significantly more common, this is becoming exceptionally important to application developers. It's not just about doing I/O concurrently with other operations or allowing an application to perform multiple tasks at once; it's about allowing a task to be completed faster because more efficient use is being made of machine resources. This is why I take an extremely skeptical view of user-level threading packages, especially in software built on platforms that have reasonable kernel-level threading.
You'll note that the various threading APIs in Mac OS X are all built on kernel threads.

Furthermore, the Mach microkernel schedules exclusively in terms of threads. The microkernel doesn't even have a conception of processes! It only knows about collections of resources — tasks — such as address spaces and IPC ports, and flows of control — threads — that it can schedule on processors.
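
As a rough illustration of the "completed faster" point, here is a sketch in C with POSIX threads that splits one task, summing an array, across several threads. On a platform where pthreads are backed by kernel threads, the slices can run on separate cores at the same time. The names sum_slice and parallel_sum and the fixed WORKER_COUNT are assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed worker count; a real program would ask the OS how many cores it has. */
enum { WORKER_COUNT = 4 };

/* One worker's share of the array, plus a slot for its result. */
typedef struct {
    const int *values;
    size_t     begin;
    size_t     end;          /* one past the last index in this slice */
    long       partial_sum;
} sum_slice;

/* Thread entry point: sums its own slice and touches no shared state. */
void *sum_slice_main(void *arg) {
    sum_slice *slice = arg;
    long total = 0;
    for (size_t i = slice->begin; i < slice->end; i++) {
        total += slice->values[i];
    }
    slice->partial_sum = total;
    return NULL;
}

long parallel_sum(const int *values, size_t count) {
    pthread_t threads[WORKER_COUNT];
    sum_slice slices[WORKER_COUNT];
    size_t chunk = count / WORKER_COUNT;

    for (int w = 0; w < WORKER_COUNT; w++) {
        slices[w].values = values;
        slices[w].begin = (size_t)w * chunk;
        /* The last worker also takes whatever is left over. */
        slices[w].end = (w == WORKER_COUNT - 1) ? count : (size_t)(w + 1) * chunk;
        slices[w].partial_sum = 0;
        int rc = pthread_create(&threads[w], NULL, sum_slice_main, &slices[w]);
        assert(rc == 0);
        (void)rc;
    }

    /* Joining a thread makes its partial sum safe to read from here. */
    long total = 0;
    for (int w = 0; w < WORKER_COUNT; w++) {
        pthread_join(threads[w], NULL);
        total += slices[w].partial_sum;
    }
    return total;
}
```

Each worker writes only to its own slice, so no locking is needed until the results are combined after the joins. With cooperative user-level threads the same code would still be correct, but the slices would take turns on a single core.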

Mac OS X Developer Mailing Lists
[info]chanson

I just figured I would take some time out for a public-service announcement.

If you're developing Mac OS X software, you should really be on the appropriate developer mailing lists. Apple hosts a number of mailing lists at lists.apple.com and hosts archives there too. The lists are managed with GNU Mailman and have its convenient subscription and administration interface.

In my opinion, the principal lists for application developers are:

carbon-dev: Developing software with the Carbon framework.
cocoa-dev: Developing software with the Cocoa framework.
java-dev: Developing software for Mac OS X in Java.
webobjects-dev: Developing web applications with WebObjects and EOF.
xcode-users: Using the Xcode Tools suite for Mac OS X development.

There are also a whole lot of task- and technology-specific lists, including lists for developers implementing AppleScript support in applications, developers working with networking, developers creating device drivers, developers writing multithreaded applications, developers working with Xgrid...

Be sure to look through the list of lists to see what else you can take part in.


Why can't I do that in a thread?
[info]chanson
Threads are a very powerful concept, but there's a lot of confusion about what is and isn't thread-safe in Cocoa. Just this morning there was a question on the Cocoa-Dev list about how to append to an NSTextStorage from a non-main thread.

Cocoa is a framework, not just a class library. The distinction is subtle but important: A class library provides a set of classes you can use to build software. The C++ Standard Template Library is a class library. On the other hand, a framework is something that your application plugs into to build software. In other words, a framework is like Hollywood: Don't call us, we'll call you.

Furthermore, Cocoa does all of its event handling and drawing on the main thread, the first thread created in your application. This means that no matter what you're doing on another thread, Cocoa may try to process user events or do some drawing. And since Cocoa is in control, not your code, just because you add locks around all of the non-thread-safe functionality in your application doesn't mean that Cocoa will use them.

So, for example, if you want to append to an NSTextStorage you need to do so from the main thread. If you want to reload an NSTableView you need to do so from the main thread. If you want to update or access the value of any control, you need to do so from the main thread.

What's more, this kind of thing can happen as a side-effect now as a result of Key-Value Observing and Cocoa bindings. If you change a property using Key-Value Coding — or even using the property's accessors when automatic observer notifications are enabled — and there are observers, value-changed notifications will be sent to those observers immediately. In other words, on the same thread where the value was changed.

Don't assume thread safety.
[info]chanson
Is a random API thread-safe or callable from threads other than the main thread?

Think about it: It takes effort on the part of the creator to make an API thread-safe. You need to make sure it's properly managing access to any internal shared state, which means more work during design, more time during development, and more testing.
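
To give a feel for that effort, here is a minimal sketch in C with POSIX threads of a tiny API that is deliberately made thread-safe. ticket_counter and its functions are hypothetical names invented for this example. Even for a single integer of shared state, the author has to add a lock, take it around every access, and write a stress test.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical API: hands out unique, increasing ticket numbers. */
typedef struct {
    pthread_mutex_t lock;         /* guards next_ticket */
    long            next_ticket;
} ticket_counter;

void ticket_counter_init(ticket_counter *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->next_ticket = 0;
}

/* Documented as callable from any thread: the read-modify-write
   of next_ticket happens entirely under the lock. */
long ticket_counter_take(ticket_counter *c) {
    pthread_mutex_lock(&c->lock);
    long ticket = c->next_ticket++;
    pthread_mutex_unlock(&c->lock);
    return ticket;
}

enum { THREADS = 4, TAKES_PER_THREAD = 10000 };

void *take_many(void *arg) {
    for (int i = 0; i < TAKES_PER_THREAD; i++) {
        ticket_counter_take(arg);
    }
    return NULL;
}

/* Stress helper: hammers one counter from several threads and
   returns how many tickets were issued in total. */
long stress_ticket_counter(void) {
    ticket_counter counter;
    pthread_t threads[THREADS];
    ticket_counter_init(&counter);
    for (int t = 0; t < THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, take_many, &counter);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < THREADS; t++) {
        pthread_join(threads[t], NULL);
    }
    long issued = counter.next_ticket;  /* safe: every worker has been joined */
    pthread_mutex_destroy(&counter.lock);
    return issued;
}
```

Take the lock out of ticket_counter_take and the stress helper can come up short of THREADS * TAKES_PER_THREAD, though not on every run, which is exactly why this kind of bug survives casual testing.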

The upshot of this? You shouldn't assume, in the absence of general or specific documentation to the contrary, that any particular API is thread-safe. Period. I'm not talking about any particular platform or API here either — this applies everywhere. It's just one of those assumptions you shouldn't make. And it applies both when calling the API from multiple threads at once and when calling it from any thread other than the main thread.

The corollary is that you also can't assume that you can make an API thread-safe by surrounding it with locks. Just because you're ensuring your own code only ever invokes that API from one thread at a time doesn't mean you're catching every invocation of it; a framework you plug into may be calling it too. Better to play it safe.

What?! Threads aren't rocket science, guys!
[info]chanson
Chris Brumme, Apartments and Pumping in the CLR [cbrumm's WebLog]:
The MTA [Multiple-Threaded Apartment] is effectively a free-threaded model.  (It’s not quite a free-threaded model, because STA [Single-Threaded Apartment] threads aren’t strictly allowed to call on MTA objects directly).  From an efficiency point of view, it is the best threading model.  Also, it imposes the least semantics on the application, which is also desirable.  The main drawback with the MTA is that humans can’t reliably write free-threaded code.

Well, a few developers can write this kind of code if you pay them lots of money and you don’t ask them to write very much.  And if you code review it very carefully.  And you test it with thousands of machine hours, under very stressful conditions, on high-end MP machines like 8-ways and up.  And you’re still prepared to chase down a few embarrassing race conditions once you’ve shipped your product.
Uh, what?

Look. I'm generally the first to point out, quite vocally at times, that 90% of software developers are crap. That's just Sturgeon's Law, exacerbated by a management culture that continues to treat software development like a manufacturing activity rather than an R&D activity.

But writing threaded code isn't bloody rocket science. You just have to follow a few simple rules, like "Use a synchronization mechanism to manage access to shared data structures," and "Don't assume an API is thread-safe unless it's documented as such," and learn a couple of simple synchronization primitives. Are so many developers so terrible that it's more worthwhile to develop APIs that hide threads from them than to keep a clean and general framework and expect them to be able to figure it out?

Then again, the exceptionally poor quality of most developers is a good justification for my high consulting rate.

Objective-C Language Enhancements
[info]chanson
There have been some Objective-C language enhancements in GCC 3.3 that are available to anyone working on Mac OS X Panther (10.3). Unfortunately they require support from the Objective-C runtime, so while they integrate correctly with code developed on earlier operating systems, code that uses them won't run on earlier operating systems. Hopefully these changes will also be supported soon by the GNU Objective-C runtime used by GNUstep; they seem very useful.

There are two major enhancements described in the GCC 3.3 release notes included with the August 2003 GCC update: A Java-style exception handling system, and Java-style synchronization blocks. Previous versions of Cocoa used preprocessor macros based on setjmp and longjmp for exception handling, and neither the language nor the framework had built-in thread synchronization primitives. (Cocoa did have several classes providing synchronization functionality though.)

The new exception handling system is modeled very closely on Java's. In the existing macro-based system, you write code like this to handle exceptions:

  NS_DURING
  {
    ...
    [NSException raise:NSInvalidArgumentException
                format:@"Invalid argument '%@' to method 'blah'", argument];
    ...
  }
  NS_HANDLER
  {
    if ([[localException name] isEqualToString:NSInvalidArgumentException]) {
      // handle the exception
    } else {
      [localException raise]; // re-raise the exception
    }
  }
  NS_ENDHANDLER

Now, you would write the same thing as:

  @try {
    ...
    @throw [[[MyInvalidArgumentException alloc] init] autorelease];
    ...
  }
  
  @catch (MyInvalidArgumentException *invalidArgumentException) {
    // handle the exception
    
    @throw; // re-throw the exception if it wasn't handled
  }
  
  @catch (id allOtherExceptions) {
    // all other exceptions will be caught by this general construct,
    // or if it isn't used, they'll be caught higher up the call chain
    // just like in Java or C++
  }
  
  @finally {
    // do any cleanup that needs to be done regardless of whether
    // an exception was thrown
  }

This will lead to much cleaner code in the long run, and it's a much closer match to what many developers new to Cocoa and Objective-C are used to as well. Everybody wins!

The other feature added is Java-style synchronization. It's not a complete implementation of Java's synchronized keyword; the only thing supported is a block synchronized on a single object. But that's plenty because it means you no longer have to maintain your own NSLock for simple synchronization:

  NSMutableArray *sharedQueue = [[NSMutableArray alloc] initWithCapacity:5];
  
  ...
  
  @synchronized (sharedQueue) {
    // add an element to the shared queue
    [sharedQueue addObject:object];
  }
  
  ...
  
  @synchronized (sharedQueue) {
    // remove an element from the shared queue
    if ([sharedQueue count] > 0) {
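      // removeObjectAtIndex: returns void, so grab the element first and
      // keep it alive past its removal from the array
      element = [[[sharedQueue objectAtIndex:0] retain] autorelease];
      [sharedQueue removeObjectAtIndex:0];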
    }
  }
See? Much more straightforward than managing a separate NSLock. (Of course, for shared work queues you're generally better off using an NSConditionLock but this is just an illustration...)

Hopefully Objective-C will continue to evolve and do so in the right direction. There are some interesting discussions taking place now about the language's future in various places like the GNUstep discussion list and the comp.lang.objective-c Usenet newsgroup.

Mac OS X developer frameworks
[info]chanson
Anyone doing Mac OS X development should check out OCUnit from Sen:te and Log4Cocoa by Bob Frank.

OCUnit is a unit testing framework modeled on the Smalltalk unit testing framework written by Kent Beck, father of eXtreme Programming. I've raved about unit testing here before. OCUnit makes it extremely easy to integrate unit testing into the build cycle for your Cocoa applications.

Log4Cocoa is a straight port of Log4J to Objective-C and the Cocoa frameworks. Bob did a lot of good, hard work and now more developers need to pick up the ball and run with it. For the uninitiated, you use Log4Cocoa to add logging to all the various interesting parts of your code. And that logging can be controlled at run time. It can be useful for debugging hard-to-find bugs, like timing-dependent bugs and multithreading bugs.

(Oh, and if you haven't checked them out yet, be sure to look at BDControl and BDRuleEngine.)