MultiAgent Systems

I was looking at my copy of An Introduction to MultiAgent Systems by Michael Wooldridge. It was woefully out of date. My second, and much expanded, edition arrived in the mail today.

I guess I’ve got some reading to do…

Pattern Matching Book Titles

Lately I’ve noticed a trend in the names of technical books coming out. It seems we’re running out of titles for books and publishers are now using an algorithm.

How many technical book titles do you know that fit the following pattern?

[ Beautiful | Effective | Programming ] TOPIC [ Driven | Oriented ] [ Analysis | Design | Development ]

My Best Technical Books

I recently went through my collection of technical books with the intent of pulling out my favourites. These are books that I consider to be the most important ones in my collection.

I have well over a hundred technical books. I use most of them and like many of them that are not on this list, but this list consists of the few that I revisit much more than others. These books may have an important idea in them that I use regularly in my work, or they may be of a reference sort that I refer to frequently.

I might not like all of the book or agree with the author on all points. However, if there’s a gem of an idea in there then I want to use it.

This is the list of those 22 favourite books. As I was making up the list I realized that there are several important “honourable mentions”. Perhaps that will be a posting for another time, though I will mention a few here and there.

As I complete the list below and look back over the books, I realize that most of them have a common theme to them, though this was not obvious to me as I constructed the list. If you remove most of the language specific books from the list (but even then keeping most of them), you’ll notice that all of the books in some way extract common patterns of importance to their subject area. Some are clearly written for this purpose (“Design Patterns”) but others do this as part of an agenda along the way (“Extreme Programming Explained” describes important patterns in a software development process, but really wants to promote Extreme Programming).

The ability to explain the common patterns clearly is probably why these books are on my list, now that I think about it.


Project Management and Estimation

Project Management and Software Estimation are hard, despite what the latest fad in either of these areas hawks. There’s no end to the theories on how things should be done correctly and generally my experience has been that software people (including myself) are poorly trained in this area. We have to learn by doing, making mistakes, mentoring with people we know and reading solid material.

There are many other good books like “Rapid Development” and “Software Project Survival Guide” that aren’t on this list. Those two are part of my honourable mentions.


Software Project Management, by Walker Royce
Addison-Wesley, 1998, ISBN 0-201-30958-0


This book is a great foundation in how a software project should be structured overall. It tends to avoid the process based questions (agile, spiral, waterfall, etc) and instead focuses on where your major milestones are, what you need to pay attention to in those milestones and who you need to answer to when you’re running your project.


A Guide to the Project Management Body of Knowledge (PMBOK) Third Edition, by The Project Management Institute
Project Management Institute, 2004, ISBN 1930699-45-X


Software projects have some unique aspects to them, but when you step away from the technology all projects run according to common principles. You have issues of risk, cost, quality, scope, communications and several more. These core principles are discussed in the PMBOK and you are provided with a standardized process for dealing with them. It needs to be customized for your department and project (see the previous book), but this book is essential to remind you of all the parts of a project of which you should be aware.


Function Point Analysis, by David Garmus and David Herron
Addison-Wesley, 2001, ISBN 0-201-69944-3


Function Point Analysis (FPA) isn’t very popular anymore. We have moved on from our COBOL days when everything was done with database queries to mainframes. Then again, have we? Is a web application connected to a database really that different at a high level? Are two layers of software talking to each other all that different from a request/query based system? (I just threw those in to stir the pot.)

Whether to use FPA or Source Lines of Code (SLOC) count is a point of endless religious debate. Personally I think that both approaches are flawed if you apply them from a purist perspective. I do like the FPA principles of responsibilities and relative difficulty as assigned to code blocks, since they seem more natural (and estimatable) to me. I use a responsibility/difficulty approach for my software estimates and so often refer back to this book for core principles.


Software Development Process

Software seems rife with hard-nosed positions and heated debates. I believe that the question of which software development process to use is the most heated of them all by far (just ahead of which computer language to use). In reality, different software development processes have their advantages and disadvantages, but these differences are not as stark as they may seem. Often very fine points of detail are wrongly made out to be of infinite importance.

That being said, as with the previous section, all software development processes have a core of commonality to them. The books in this section do support a specific process, but they do a good job of touching on the commonality of processes as well.


The Unified Software Development Process, by Ivar Jacobson, et al.
Addison-Wesley, 1999, ISBN 0-201-57169-2


This book came out as part of the trilogy of UML books by the three amigos (Booch, Rumbaugh, Jacobson) when UML was first being presented to the public on a large scale. Each amigo took an area of expertise for the books but they collectively worked on the trilogy. Jacobson took the process part, probably because of all his overall work on Object Oriented Software Engineering (OOSE), whereas Booch and Rumbaugh tended to focus more on design (classes and domains, respectively).

I like this book because it demonstrates a novel, scalable way of describing a software development process. It uses a lot of pictures, which is very important to me because I am big on clear visualization. It is scalable because you can describe your process in broad brushstrokes, or go down to infinite detail, like the Rational Unified Process (which derives from this book and is similarly scalable in its own way).

You can even abandon the process entirely and use the notation to describe any software development process, a feature I have used more than once in my career.


Extreme Programming Explained, by Kent Beck
Addison-Wesley, 2000, ISBN 0-201-61641-6


You always know where Kent Beck stands. I haven’t ever heard him present, but I don’t get the sense that he’s a subtle guy. This book was written in the early days of XP and can be at times heavy handed and evangelistic.

That being said, this book is very good for reminding you of what is important when developing software. Clear lines and basic principles. For example: “you have to deliver regularly”. Not “you have to deliver at the end of the project” or worse “you have to deliver sometime”. This is a reminder to structure your work so you can deliver. Projects can forget this and find it difficult to deliver at the end.

I turn to this book when I need to be reminded of the essence of a process. You must deliver well. You must clearly show progress. You must define something clearly so you can clearly complete it. You must communicate that. This book reminds you to think of those issues and more.

Beck’s solution to these problems is the implementation of XP. Whether or not to use XP is not a simple matter, but Beck’s explanations of the principles are well done. I use this book to remind me of those principles.


Test Driven Development, by Kent Beck
Addison-Wesley, 2003, ISBN 0-321-14653-0


Another Beck book. Test Driven Development (TDD) is pretty much a gussied up version of more traditional testing techniques, a skin that makes it look more “Agile” than before. If you include unit testing as an important focus in your software development, you’re probably doing a lot of TDD already.

This book does a good job of reminding me how to think about testing properly. It is not enough to write code for two weeks and then toss off a couple of five line unit tests (not that I ever did that, but you get the point). Rather you need to think about testing as integrated in with your software, because it is your software. If you ignore testing then all the follow-on ugly details (like delivering) get a lot uglier.

I usually don’t have to go back to this book much but I often recommend it to people to highlight these principles.


Requirements

There are many ways to write requirements: requirements statements, use cases, user stories, wireframes and so on. Ultimately, they all aspire to achieve the same thing: describe a goal. The books in this section aren’t really exclusively about use cases, they’re about how to describe a goal (requirement) well.


Use Case Driven Object Modeling with UML, by Doug Rosenberg with Kendall Scott
Addison-Wesley, 1999, ISBN 0-201-43289-7


This book served as my first introduction to Robustness Analysis, which was developed by Jacobson. Since I first saw it, I have designed my architectures almost exclusively using Robustness Diagrams. It is so much a part of my thinking that I’m not sure I can imagine an architecture in a different way anymore, even when I’m not using use cases. This book has also strongly affected my practice of software estimation, in combination with “Function Point Analysis” mentioned earlier and “Software Reuse” mentioned later.

It also has the benefit of answering the question “How can I do a use case based implementation?” in about 150 pages. It’s not a perfect answer, but it’s a pretty good one.


Patterns for Effective Use Cases, by Steve Adolph, et al
Addison-Wesley, 2003, ISBN 0-201-72184-8


What do you need to pay attention to when writing requirements, in whatever form you prefer? This book answers that question.

Here’s an example from the book: “Developing use cases [or requirements] in a single pass is difficult and can make it expensive to incorporate new information into them. Even worse, it can delay the discovery of risk factors.” The section then goes on to explain this axiom and what it means for your project. It covers everything from developing the requirements team to the nuts and bolts of writing a requirement.

I refer to this book often to be reminded of these principles when writing and reviewing requirements.


Architecture

Software Architecture is a funny thing. There seems to be a lot of “magic” around architecture, with the stereotype of people who say things that on the surface sound reasonable but which quickly fall apart under direct inspection.





Ok, so I’m being flippant.

I do have major concerns about the way we describe architecture, though. As an industry, we’re not clear about architecture. We don’t do enough of it in the right places. We don’t describe it in a way that mere mortal developers can understand.

For me an architecture needs to be clear and understandable. It does not need to be simple, but great care should be taken to avoid unnecessary complexity. These books help me work towards those goals.


The Art of Systems Architecting Second Edition, by Mark W. Maier and Eberhardt Rechtin
CRC Press, 2000, ISBN 0-8493-0440-7


This book does the best job I’ve seen of explaining how to do a view-based architecture. There are other books that profess other types of views, but I like the clarity of the ones in this book. Better yet, I can explain the views to people in a few minutes and they get it. It doesn’t profess any particular process or notation, which means it can be easily customized.


Software Reuse, by Ivar Jacobson, et al
Addison-Wesley, 1997, ISBN 0-201-92476-5


I found this book after reading “Use Case Driven Object Modeling with UML” in the Requirements section. It goes into more detail on how to do Robustness Analysis, which makes up a major portion of how I describe architecture. It also shows some key diagrams that explain how use cases translate into classes, but can be generalized into any kind of requirement. It promotes a responsibility-based approach to design.


The Algorithm Design Manual Second Edition, by Steven S. Skiena
Springer-Verlag, 2008, ISBN 978-1-84800-070-4


This is the newcomer to my “best” collection. I have only used it a little, but I really like what I have used. This book is an encyclopedia of sorts, divided into three sections. I use all parts of the book differently.

The first section describes common algorithm areas (Sorting and Searching, Graph Theory, Heuristics). The text is a little dry but keeps the hard lingo to a minimum. I use this to research an approach or technique.

The second section is tiny, but very helpful. It contains three pages of questions you should ask yourself when designing a new algorithm. It is very to the point and asks you hard questions. If you can’t answer these questions then you don’t know how to design your algorithm. This is very helpful and not to be underestimated.

The third section is a gold mine catalogue of problems. You find yourself with a problem. How do you solve it? If you generally know the algorithm area (from the first section), you can browse the problems, and the book will point you to particular algorithms and warn you of pitfalls. This section uses a bit more lingo and makes you work somewhat, but the reward is finding an approach to solving your problem.


Design

I think I can safely claim that software is awash in design books. I’m looking at my shelves right now and I see a lot of them. Perhaps design is the most written about area of software, aside from programming languages themselves. When I separate the wheat from the chaff I get the following books.


Design Patterns, by Erich Gamma, et al
Addison-Wesley, 1995, ISBN 0-201-63361-2


This book has a lot of problems. The examples aren’t at all clear sometimes. You have to read something three times to figure out what the authors mean and even then sometimes you’re not sure.

That being said, when I want to know about a pattern, I open this book. When I need to understand the risks of the Visitor pattern, for example, I open this book. When I need to figure out how one pattern compares to another, I open this book.

So, despite all its flaws, this book is on this list.


Pattern Oriented Software Architecture, by Frank Buschmann, et al
John Wiley & Sons, 1996, ISBN 0-471-95869-7


Instead of design patterns described in the previous book, this book describes architecture patterns (or design patterns at the architecture level). For example, the Model View Controller (MVC) pattern is covered in this book. It’s well written and I open it as much as I open the design patterns book.

There is an excellent second volume on concurrent and network objects, but I don’t use it as much so it’s not on the list.


Real Time Design Patterns, by Bruce Powel Douglass
Addison-Wesley, 2003, ISBN 0-201-69956-7


This is turning into a repetitive, eh, pattern. This book covers design patterns for real time software. If you need to know anything about resource management, locking, concurrency and so on, this is the book for you.

It also has value anywhere you need to deal with shared resources, even if your software isn’t real time.


Refactoring, by Martin Fowler
Addison-Wesley, 1999, ISBN 0-201-48567-2


Refactoring has pretty much entered the common software vernacular. It’s the process of reworking existing code to a set of principles. Those principles might be lowering coupling and increasing cohesion, reworking interfaces, or anything that needs cleanup. The book does have a second section with respect to specific types of refactoring problems and how to fix them.

However, for me the most valuable section is the first part of the book that deals with the principles of refactoring and the things you need to consider whilst you are doing this. We all refactor in our jobs. This book reminds me how to do that more effectively. I also tend to dip into this book to read a random section for a refresher.


Prefactoring, by Ken Pugh
O’Reilly, 2005, ISBN 0-596-00874-0


I stumbled across this neat little book purely by accident. Prefactoring is the act of constructing a design so it will evolve well in the future. Or, I suppose you could say prefactoring is designing so that refactoring is easy to do in the future. It covers three areas called “extreme abstraction”, “extreme separation” and “extreme readability”. These three areas have short statements reminding you of a certain principle, like “Figure out how to migrate before you migrate: Considering the migration path might help you discover additional considerations in other areas of the design”.

This book is written for the beginner designer. This is the book I wish I had been given when I started designing. I recommend it for every person starting in design.

I use it to remind myself of important things. I tend to dip into it by opening a page randomly or reading a section based on what I’m doing right now. Sometimes we forget basic principles. My friend Dan always says “Make it work, then make it better.” Pugh concurs: “Get something working: Create something basic before adding refinements.”


The following sections cover books for specific languages. In some cases they teach essential principles that transcend the language. This makes them doubly valuable. There isn’t a lot to say about these books, though. They tend to be need-specific.

Language: C++

Effective C++, by Scott Meyers
Addison-Wesley, 1997, ISBN 0-201-92488-9


Meyers talks a lot about the pitfalls of C++, but much of this is important for implementing in any object oriented language.


The C++ Standard Library, by Nicolai M. Josuttis
Addison-Wesley, 1999, ISBN 0-201-37926-0


Contains clear information on the arcane area that is the Standard Template Library (STL). There are lots of dragons here, which you discover once you wield the STL. This book helps you slay them.


C++ Templates, by David Vandevoorde and Nicolai M. Josuttis
Addison-Wesley, 2003, ISBN 0-201-73484-2


Templates in C++ are harder than I think they should be. Maybe it’s me, I don’t know. When I scratch my head because a template isn’t doing what I want, I open this book.


Language: Python

Python in a Nutshell Second Edition, by Alex Martelli
O’Reilly, 2006, ISBN 0-596-10046-9


I like having the greater part of Python in one book. My copy of this is pretty dog eared. I would have liked a better index, though. I’ve started annotating the index myself.


Language: C

The C Programming Language Second Edition, by Brian W. Kernighan and Dennis M. Ritchie
Prentice Hall, 1988, ISBN 0-13-110362-8


My oldest technology book still in use. (My oldest technology book not in use is probably a Commodore 64 book or the one on Xanadu.) The K&R is still in use because it’s still good. I can also find things quickly in it.


Technology: XML

XML in a Nutshell Third Edition, by Elliotte Rusty Harold and W. Scott Means
O’Reilly, 2004, ISBN 0-596-00764-7


When I want to know about XML, I look here. That’s about all there is to say.

I hope you’ve enjoyed reading the list. It was fun putting the list together and writing about all these great books.

If you have other favourites, feel free to send me a message about them.

Thinking about Data Types

Data types are the unsung heroes of any programming language. They quietly serve you in the background while you get on with the important bits of programming and meeting deadlines. In fact, they’re often so quiet that many programmers don’t think of them much at all. That is, until problems strike.

In this article I want to discuss the types of problems that can appear around using data types. I also argue for some up-front planning when using key data types in your application. Finally, I introduce protobuf, the cross-language data type generator from Google.

Data types vary from language to language and implementation to implementation but they generally consist of the following variations:
  • Simple types like integers (signed, unsigned, bytes, 16-bit, 32-bit, etc.)
  • “Sort of” simple types like characters and strings (8-bit or Unicode). I say “sort of” because “here be dragons” when going back and forth between 8-bit and Unicode.
  • Enumerated types
  • Collections of multiple types using keys (dictionaries and maps) or no keys (structures and unions)
  • Arrays (sequences of multiple items of a specific type)

As long as you code in one particular language at a time, the language does the heavy lifting for you with respect to data types. You might need to spend a moment to explicitly define them (C++, C, Java, C#, ActionScript) or you might get away without defining them at all and let the language work them out for you at run time (Python, Lua). In all these cases you only need to worry about data types during edge conditions, like signed vs. unsigned conversions, bit manipulations, or math precision.

So generally data types don’t need to enter your awareness too much. This can lead to a, shall we say, “organic” growth of the use of data types in your application. That is, you create them as you need them while programming and may not necessarily plan out their long-term use. “Long-term” applies to data that has longevity in your application as opposed to temporary variables.

Problems occur when these data types need to be marshalled or unmarshalled. Marshalling is the process of converting types from one form to another (and unmarshalling converts them back), usually for transmission of some sort.




There are many examples of marshalling in regular programming:

  • Saving and retrieving data to/from files.
  • Communication boundaries between two languages. A very common example of this is the regular conversion between C++ and C-style strings. Other examples exist when connecting to a scripting language (like Lua scripting as an enhancement to applications like video games).
  • Communication through pipes or sockets.
  • Communication between processes or threads.
  • Client/Server communication.
  • Conversion to/from XML.
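To make the idea concrete, here is a minimal sketch of marshalling a small record to bytes and back with Python’s standard struct module. The record layout (a 32-bit id, a 16-bit level and a UTF-8 name) and the function names are assumptions made up for this illustration, not a format used by any real application.

```python
import struct

# Hypothetical wire layout: big-endian unsigned 32-bit id, unsigned 16-bit
# level, unsigned 16-bit name length, followed by the name as UTF-8 bytes.
HEADER = struct.Struct(">IHH")


def marshal_player(player_id, level, name):
    """Convert the in-memory values to a byte string for transmission."""
    encoded_name = name.encode("utf-8")
    return HEADER.pack(player_id, level, len(encoded_name)) + encoded_name


def unmarshal_player(data):
    """Convert the byte string back into in-memory values."""
    player_id, level, name_length = HEADER.unpack_from(data, 0)
    name_bytes = data[HEADER.size:HEADER.size + name_length]
    return player_id, level, name_bytes.decode("utf-8")


wire_bytes = marshal_player(42, 7, "Zoë")
round_trip = unmarshal_player(wire_bytes)
```

Once the values are in wire_bytes, nothing stops another program from changing or truncating them, which is where the problems described next come from.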

Marshalling is a messy process. It’s messy because you’re dealing with unprotected data from untrusted sources. The data is unprotected in that the compiler no longer has control of the data once it has been marshalled (turned into the cloud below). Normally a compiler prevents invalid data from being inserted into data types (with more or less protection depending on the language).

The data is untrusted because once it crosses your application boundary you cannot know what happens to it in transit. If something happens to the data the unmarshalling process may not work and cause exceptions. Due to all this uncertainty, you need to build some form of verification into the unmarshalling process to make sure you have good data coming into your application.
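As a rough sketch of what that verification might look like, the function below unmarshals a JSON message and checks every field before handing it to the rest of the application. The message shape (a sensor name and a numeric value), the accepted range and the UnmarshalError name are all assumptions invented for the example.

```python
import json


class UnmarshalError(ValueError):
    """Raised when incoming data fails verification (hypothetical name)."""


def unmarshal_reading(text):
    """Parse a JSON sensor reading and verify it before the application uses it."""
    try:
        raw = json.loads(text)
    except ValueError as error:
        raise UnmarshalError("not valid JSON: %s" % error)
    if not isinstance(raw, dict):
        raise UnmarshalError("expected a JSON object")

    sensor = raw.get("sensor")
    value = raw.get("value")
    if not isinstance(sensor, str) or not sensor:
        raise UnmarshalError("'sensor' must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise UnmarshalError("'value' must be a number")
    # Assumed plausible range for this made-up sensor.
    if not -50.0 <= value <= 150.0:
        raise UnmarshalError("'value' is out of the expected range")
    return {"sensor": sensor, "value": float(value)}
```

The point of the design is that bad data is rejected at the boundary with one well-defined exception, instead of surfacing later as a confusing failure deep inside the application.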



You also have versioning issues. You may decide to change the format of the data or add extra properties (for example, adding extra items to a dictionary). You then need to deal with the older format of the data which may exist in older deployed applications or files. Even if you intend to upgrade all these interfaces quickly there is still a transition period where you need to deal with two different formats.
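One common way to cope with that transition period is to tag the data with a version number and upgrade older data as it is read. The sketch below assumes a made-up settings format in which version 1 stored a single volume and version 2 splits it in two; the field names and functions are hypothetical.

```python
import json

CURRENT_VERSION = 2


def unmarshal_settings(text):
    """Read settings saved in either format and return the current form."""
    raw = json.loads(text)
    # Assumption: version 1 data was written before a version field existed.
    version = raw.get("version", 1)
    if version == 1:
        # Version 1 stored one "volume"; version 2 has two separate volumes.
        return {"music_volume": raw["volume"], "effects_volume": raw["volume"]}
    if version == 2:
        return {
            "music_volume": raw["music_volume"],
            "effects_volume": raw["effects_volume"],
        }
    raise ValueError("unsupported settings version: %r" % version)


def marshal_settings(settings):
    """Always write the newest format, tagged with its version number."""
    return json.dumps(dict(settings, version=CURRENT_VERSION))
```

Older files keep loading, new files are always written in the latest format, and data from a newer version than the code understands is refused rather than misread.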

There are many other problems as well:

  • Endian issues, where multi-byte numbers are stored in a different byte order on different types of processors.
  • Numbers may be different byte sizes on each side of the marshalling.
  • Strings may only be 8-bit on one side of the marshalling and Unicode on the other.
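Each of these three problems can be shown in a few lines with Python’s standard struct module and string encoding. This is only an illustration; the values are arbitrary.

```python
import struct

value = 0x12345678

# The same 32-bit number laid out for two different kinds of processor.
little_endian = struct.pack("<I", value)  # least significant byte first
big_endian = struct.pack(">I", value)     # most significant byte first

# Reading bytes with the wrong byte order silently gives a different number.
misread = struct.unpack(">I", little_endian)[0]

# A value that fits in 32 bits may not fit a 16-bit field on the other side.
try:
    struct.pack("<H", 70000)
    fits_in_16_bits = True
except struct.error:
    fits_in_16_bits = False

# Text that is fine as Unicode may have no 8-bit representation at all.
name = "Zoë ☺"
utf8_bytes = name.encode("utf-8")
try:
    name.encode("latin-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False
```

Note that only the size and string problems raise an error here. The byte order mistake produces a perfectly valid but wrong number, which is why the byte order needs to be part of the agreed format.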

All of this highlights that you need a strategy for dealing with interface-critical data types. Some of this is handled by web service protocols and formats like SOAP or JSON, but these aren’t always available to you (or you may not wish to use them), especially when working with embedded applications.

A relatively recent alternative comes from Google. Protobuf (Protocol Buffers) is an open source, language-independent format for defining data types. A .proto file specifies the data types. Protobuf then generates the data types in the language of your choice.




Protobuf officially supports Python, C++ and Java. There are a host of third party extensions for other languages, including C#, C, and ActionScript.

Protobuf features versioning as well as simple data verification during marshalling. Because the format is language independent, you can define your data types once and continue to use them should you need to expand into a different language in the future. The Google documentation has a good explanation of protobuf’s marshalling efficiency as well.

I encourage you to check out protobuf for your next project. It solves many problems you may not have known were there and would rather not encounter. You honestly do not want to build these things from scratch.

You have better things to do.

You may like to see the mindmap that was used to write the draft of this entry. Please click on the map below for a bigger picture.



Python Factories

All of the code generation examples on my codegen page are meant to show the principles of how to use codegen. As such, I try to distil things down to their barest essentials.

That principle doesn’t always work well when you run into issues of scale. Maintainability starts to break down at those stages and a different implementation is needed.

I’ve been writing an open source codegen framework to address those issues. In this article I want to talk about the Factory class I use for the framework. It also happens to be a versatile Factory class that you can repurpose for anything else you happen to be writing in Python.

First, a quick reminder of what goes into a Factory implementation. The Gang of Four book describes the “Factory Method” design pattern as a mechanism for creating an object where the subclasses decide which class to instantiate. In this way you can feed in data of your choosing and get out an object that was created based on that data.
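For readers who haven’t met the pattern before, a bare-bones Factory Method might look like the sketch below. All the class names here are hypothetical and have nothing to do with the codegen framework; they only show the shape of the pattern.

```python
class Page:
    kind = "generic"


class ReportPage(Page):
    kind = "report"


class ResumePage(Page):
    kind = "resume"


class Document:
    """Creator: knows when a page is needed, but not which kind."""

    def create_page(self):
        # The factory method. Subclasses decide what gets instantiated.
        raise NotImplementedError

    def build(self, page_count):
        return [self.create_page() for _ in range(page_count)]


class Report(Document):
    def create_page(self):
        return ReportPage()


class Resume(Document):
    def create_page(self):
        return ResumePage()


report_pages = Report().build(2)
resume_pages = Resume().build(1)
```

The build method never names a concrete page class, so new kinds of document can be added without touching it.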

In my codegen framework, I store the data in XML. Each XML node contains some portion of the codegen information and you need to perform a variety of activities at each node level. Using the car alarm example from my codegen page, there are several levels of nodes as shown in the diagram below. The codegen needs to act (and generate different types of code) at each level.




You may recognize this description as an implementation of the strategy pattern. In this way you can perform custom implementations with a minimal amount of overhead.

Factories need to be able to instantiate objects based on some criteria. They then return custom objects that the client can use. The implementation of this part of the Factory is very important from a maintenance perspective. If you have a small amount of codegen then you can get away with a simpler implementation. In this article I argue for a more generic implementation so you don’t have to worry about issues of scale afterwards.

Example Files

Please download the example file codegenExample.py to provide an example for running the Factories. I also have created an example using the Car Alarm scenario from my codegen page. This may be found in the file stateExample.py. You will also need the car_alarm.xml file to run it.

Simpler Factory using Dictionaries

The first example in codegenExample.py implements a factory using a dictionary (the implementation is in codegenUtilitiesWithDictionary.py). The name of the XML node is tied to a class. When this XML node is reached, the Factory returns an object based on the matching class.

The easiest implementation in Python is to create a dictionary where the key is the XML node name and the value is a reference to the class (shown below). This code is from the SampleCodegenPhaseWithDictionary class in codegenExample.py.

    nodeLookup = {
        'aa' : CodegenHandler_Node_Sample,
        'ab' : CodegenHandler_Node_Sample,
        'ba' : CodegenHandler_Node_Sample,
        'bc' : CodegenHandler_Node_Sample,
        'cb' : CodegenHandler_Node_Sample,
        'cc' : CodegenHandler_Node_Sample,
    }

This implementation has the advantage that the lookup table is in one place and you can incrementally add classes as necessary as your codegen grows.
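To show how a lookup table like this is typically consumed, here is a small self-contained sketch. It is not the code from codegenUtilitiesWithDictionary.py: the DictionaryFactory class, its create method and the stand-in handler are assumptions written for this illustration.

```python
import xml.etree.ElementTree as ET


class CodegenHandler_Node_Sample:
    """Stand-in handler for this sketch only."""

    def __init__(self, node):
        self.node = node

    def generate(self):
        return "// code for <%s>" % self.node.tag


class DictionaryFactory:
    """Hypothetical factory that maps XML node names to handler classes."""

    def __init__(self, node_lookup):
        self.node_lookup = node_lookup

    def create(self, node):
        try:
            handler_class = self.node_lookup[node.tag]
        except KeyError:
            raise LookupError("no handler registered for <%s>" % node.tag)
        return handler_class(node)


nodeLookup = {
    "aa": CodegenHandler_Node_Sample,
    "ab": CodegenHandler_Node_Sample,
}
factory = DictionaryFactory(nodeLookup)

root = ET.fromstring("<aa><ab/><zz/></aa>")
generated = [
    factory.create(node).generate()
    for node in root.iter()
    if node.tag in nodeLookup
]
```

In this sketch the zz node has no entry in the table, so asking the factory for it raises an error. That is exactly the kind of missed hookup that becomes more likely as the table grows.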

I’d argue that this isn’t the best implementation though.

A problem is created as you add more classes to your codegen. You have to remember to keep going back to this lookup table and updating it. It may not sound like a lot of effort but it can be easily forgotten. Another factor to consider is the use of phases, which I will explain in another article. The relevant point right now is that for each phase you need another set of classes for each of the XML nodes. I use 4 or 5 phases in my codegen, so I need 4 or 5 sets of classes and the same number of dictionaries.

Going forward in the maintenance schedule, the likelihood that I (or others) will forget to hook up the classes in the dictionary is pretty high.

But what if Python could do this for you? The answer is that it can.

Factory with Automatic Class Lookup

The second example, in codegenUtilitiesAutomatic.py, uses a slightly different Factory. This Factory takes advantage of Python’s introspective capabilities and builds the list of classes automatically. The dictionary isn’t needed and so there is no possibility of hookup errors.

I learned this trick from my colleague Kevin, who learned it from the Lex/Yacc implementation in Python.

Basically, you raise an exception and immediately catch it. The traceback frame from the exception includes a reference to the globals() namespace. The globals() namespace is a dictionary that maps every module-level name, including the names of the available classes, to the object it refers to. That gives us all the information we need to duplicate the dictionary from the simple Factory above.

One small catch, though. The traceback frame is nested, so you need to get the proper parent frame so you can access the right globals() namespace, otherwise you won’t be able to look up the classes.

I’ve written a custom exception for this purpose in the codegenUtilitiesAutomatic.py file:

    import sys

    class CodegenException(Exception):
        "General error during the codegen processing."
        def __init__(self, *args):
            Exception.__init__(self, *args)
            # Record any exception that was being handled at creation time.
            self.wrapped_exc = sys.exc_info()

When an exception is raised, the traceback frame will point to the exception. The exception is contained within the Factory, so the factory is the parent traceback frame. The codegen classes are one level above that, so you need to go through two parent traceback frames to get at the globals() namespace for the classes, as shown below:




The code to do this in the Factory:

    try:
        raise CodegenException
    except CodegenException:
        # Get the traceback information for the exception namespace
        (ignore, ignore, traceBack) = sys.exc_info()
        exceptionTraceBackFrame = traceBack.tb_frame
        # Get the traceback information for the codegenUtilities parent
        parentTraceBackFrame = exceptionTraceBackFrame.f_back
        # Get the traceback information for the codegen parent
        parentTraceBackFrame = parentTraceBackFrame.f_back
        # Save the parent's globals() namespace that contains the list of
        # classes that can be used by the factory.
        self.nodeLookup = parentTraceBackFrame.f_globals

The benefit from a maintainer perspective is that you don’t have to know about any of this. As you add classes to expand your codegen you don’t have to touch any of this and it is flexible enough to handle any changes you might make.
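If you want to experiment with the trick without downloading the example files, the sketch below squeezes the idea into a single module. It is not the framework’s code: the caller_globals helper, the AutomaticFactory class and the CodegenHandler_Node_ naming convention used to find a class from a node name are all assumptions made for this illustration.

```python
import sys


class CodegenHandler_Node_aa:
    def describe(self):
        return "handler for aa"


class CodegenHandler_Node_ab:
    def describe(self):
        return "handler for ab"


def caller_globals(levels_up):
    """Return the globals() of a calling frame via the raise-and-catch trick."""
    try:
        raise Exception
    except Exception:
        # The traceback's frame is the frame of this function.
        frame = sys.exc_info()[2].tb_frame
    for _ in range(levels_up):
        frame = frame.f_back  # step out to each caller in turn
    return frame.f_globals


class AutomaticFactory:
    PREFIX = "CodegenHandler_Node_"  # assumed naming convention

    def __init__(self):
        # One level up is this __init__; two levels up is whoever
        # created the factory, and that is the namespace we want.
        self.node_lookup = caller_globals(levels_up=2)

    def create(self, node_name):
        handler_class = self.node_lookup.get(self.PREFIX + node_name)
        if not isinstance(handler_class, type):
            raise LookupError("no handler class for node %r" % node_name)
        return handler_class()


factory = AutomaticFactory()
handler = factory.create("aa")
```

Adding a new handler here means writing a class with the right name and nothing else. In CPython, sys._getframe() or inspect.currentframe() can reach the same frames without raising an exception, though both rely on interpreter support for stack frames.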

I recommend taking a look at the examples I mentioned before. They will show you how this code works and hopefully give you some ideas about how it can be adapted for other purposes.