Einstein once said, "Everything should be made as simple as possible, but not simpler." This remark refers to the problem of oversimplification, and how it is not to be tolerated. Modularization, after the manner of Modula, C, Ada, etc., causes the structure of a program to APPEAR simple when viewed from the top, but this just relegates the spaghetti to the linker, making noodle pudding out of it and saving it for dessert. The top-down modular design methodology is a hierarchical approach. The tree structure of a hierarchy mirrors the structure of an organization: its purpose is to permit the view from the top to be a comprehensible representation of the situation, even if it is oversimplified to the point of incorrectness. This view from the top is, of course, the view that management wants to see.
In reality, this structure is only simple from the top until about halfway down, after which any given individual is likely to be performing several distinct job functions. For instance, let's say that the person who answers the phone is also the person who keeps the coffee supplies in stock. If an answering service is hired, and the switchboard operator is subsequently let go, then the office will run out of coffee. You wouldn't normally expect an improvement in the telephone service to result in a degradation of the coffee service: the functions should have been totally unrelated. The hierarchical tree-structured organizational chart becomes a tangled mass of spaghetti at the lower levels.
This same kind of situation may be found in programming: a low level module is used in many different places in the same program. When a maintenance programmer examines the XYZ module to perform a bug-fix or enhancement, he may find that the best place to make the change is in the ABC module that XYZ calls. So he edits ABC to modify it for the desired function, compiles, links, and tests his change. It works beautifully, so he releases the code to the customer. But the ABC module was also called by the QRP module, whose existence the maintenance programmer was not even aware of. His change that worked so well for the XYZ module totally destroyed the function of the QRP module.
This problem occurs in other engineering disciplines also. A mechanical designer friend of mine explained to me that only 10 percent of drafting is drawing; the other 90 percent is bookkeeping, organizing, and checking. When a drawing is modified, the parts used in the drawing may change. The change to fix a problem with a part may be to switch from a split ring lockwasher to an interior star lockwasher. Now the designer is faced with the same problem the programmer was faced with: whether to create a new drawing for the interior star lockwasher, or to edit the drawing for the split ring lockwasher to change it into an interior star lockwasher. This is not a trivial problem! To create a new drawing means to increase the number of parts that must be inventoried and tracked, but to edit the old drawing requires him to make sure that all the other uses of the split ring lockwasher can be met equally well by the interior star lockwasher. Maybe the other uses will be improved by the new washer; maybe they won't work at all.
The familiar indented bill of materials found in other engineering disciplines is just an outline representation of a tree structure. We are back to the old hierarchy view again. The bill of materials only shows what parts are used to build each sub-assembly, not what other sub-assemblies those same parts are used in, much less what top-level assemblies those sub-assemblies are used in. The problem is compounded when several different jobs draw from the same inventory, just as when many programs link from the same set of libraries, and when library routines call other library routines. It is important to understand that each instance of a part in the bill of materials represents a separate and distinct object. Even if two occurrences have the same part number, they represent separate instances of this type of part in different places in the product.
In programming, we are in an even worse situation, because the same exact piece of code can be used in several different modules. Unlike the mechanical assembly, it's as though in software you had only one lockwasher routine, and that same instance of a lockwasher routine were in all those different places at the same time. If you change the lockwasher routine for one use, you must either change it for all the other uses, or add a second lockwasher routine to your programs' inventory of modules. Now you have two different instances of a lockwasher routine where you used to have only one, and you must decide, for each place where the old lockwasher routine was used, whether it should still use the old version or the new design.
Many programmers would make a copy of the split ring lockwasher source file, edit it into an interior star lockwasher source file, compile it, and add it to the object library. This approach, while frequently rationalized as getting the job done quickest, is doomed to long-term disaster. If a bug is later discovered in the split ring lockwasher routine, it probably exists in the interior star lockwasher routine as well, since they came from the same bloodline, so to speak. Nonetheless, the bug will probably be fixed in the split ring routine, only to be discovered later in the interior star routine. If the programmer who fixed the first bug is not the programmer looking into the second one, the new guy will have to debug the same problem all over again. This is an expensive waste of a programmer's time and talents.
If you are a truly fastidious programmer, you will examine the old version and the new version, factor out the stuff that is common between the two, and make a proto-lockwasher out of that. Then you will make two different "shells" to implement the "visible" lockwashers. Each of these shells will have a little bit of interface code and will then call the proto-lockwasher to perform the bulk of the operation. In this way, if a bug is later discovered in the common, proto-lockwasher code, it gets fixed in one place, once and for all. Incidentally, anything I say about fixing bugs applies equally well to adding enhancements, upgrading to a new operating system or new hardware, etc. All of these operations effect a change in the implementation.
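In Forth, the pattern might look something like this (a minimal sketch with made-up names; a toy percentage scaler stands in for the lockwasher):

: (SCALE)    ( n percent -- n' )  * 100 / ;            \ the common core, factored out once
: DISCOUNT   ( price -- sale-price )     90 (SCALE) ;  \ thin shell: 10 percent off
: SURCHARGE  ( price -- billed-price )  115 (SCALE) ;  \ thin shell: 15 percent added

A bug fixed in (SCALE) is fixed for both visible words at once.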
Factoring causes changes to appear in a single place, rather than being distributed all over the place. It's just the same as high school algebra. Remember the distributive law? ax+bx=(a+b)x. This segregates x in one place, so if we choose to substitute (that is, change) x=(3z+17), then we can write (a+b)(3z+17) instead of a(3z+17)+b(3z+17). The worst implementation of all would be a3z+a17+b3z+b17. Given this last expression, how long would it take you to arrive at the simpler, factored one? Now complicate the issue by commuting some of the additions and multiplications to produce 17a+3az+17b+3bz. That's even harder to factor, even though it is in what is generally called "simple" form. In algebra we factor to get at the roots of an equation. Factoring also gets at the roots of the software design problem.
The operation that Forth programmers call factoring is performed using the unification algorithm that was described in the BRIE article in the April 1986 DDJ. A programmer, however, must factor across equivalence classes that are determined by the nature of the operations being performed. Consider the Forth code:
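Something like the following pair illustrates the point (a sketch; the particular arithmetic is an assumption chosen for illustration):

: Word1 ( x y z -- n )       + * ;    \ computes x*(y+z)
: Word2 ( x y z -- n )  SWAP + * ;    \ computes x*(z+y)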
Word1 and Word2 are equivalent operations when X, Y, and Z are integers. The inference rules of demodulation and paramodulation can transform one of these members of an equivalence class into the other, given rules for commutativity, associativity, etc., but applying these kinds of transformations to all the expressions in a large program could choke a Cray until the start of the next Ice Age!
The general problem of factoring, whether it be of large numbers into primes, or of polynomials into products of sums, or of programs into fundamental invariant units, is extremely difficult: no efficient general algorithm is known for any of these. The factoring of large integers is of great interest in cryptography; codes are hard to break precisely because factoring is hard. The excellent symbolic mathematics packages muMATH (developed by Dr. Albert Rich of the Soft Warehouse for the IBM-PC) and its big brother MACSYMA (developed by Dr. Joel Moses and others at MIT for Symbolics and other Lisp machines) perform factoring of mathematical expressions only within certain limits, or under human guidance. This is because a general algorithm to factor an arbitrary expression can potentially lead to a recursive black hole, the AI version of the endless loop. The general problem of applying resolution to prove theorems, as in the execution of a Prolog program, is intractable for the same reasons.
Compiler optimization schemes are based on the ability to factor out expressions that are invariant at run time and pre-compute them at compile time, or expressions that are invariant inside of a loop and pre-compute them before entry to the loop. The reason so much emphasis is placed on factoring in Forth is because Forth definitions are generally small; therefore, they are easily factored to find sub-parts that are common with other definitions. Most desirably, they are built from these factors in the first place, so that no factoring is needed when modification is performed. The success of Forth in the factoring problem is that not only is the distinction between code and data hidden, but the distinction between what happens at compile time and run time is also hidden. A word that was a compiling or defining word can be rewritten to be an interpreting word, or vice versa.
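By way of illustration, here is loop-invariant factoring done by hand in Forth (a toy sketch; the computation is made up):

: SLOW ( -- )  100 0 DO  360 24 /  I * DROP  LOOP ;            \ recomputes 360/24 on every pass
: FAST ( -- )  360 24 /  100 0 DO  DUP I * DROP  LOOP  DROP ;  \ invariant hoisted out of the loop

An optimizing compiler performs the second transformation automatically; a Forth programmer simply writes it that way in the first place.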
George Eberhardt of Computer Innovations tells me that he has run his new optimizing C86+ compiler in a mode where the compile-time optimization was so extensive that it was able to compile the sieve benchmark with the entire contents of the sieve pre-computed at compile time! This is possible because all the terms that go into computing the sieve are known at compile time, so the entire computation is predetermined. Simple (in principle!) factoring out of loop-invariant expressions recursively reduces the problem to an array that is initialized statically. This produces a sieve benchmark that runs in zero time. George said, "I think we've finally put the sieve to rest!"
However, this kind of stunt is not always desirable. For one thing, the compile time must have been horrendous. Another consideration is that the compiler has no way of adjusting its optimizing efforts to allow for a situation such as occurs when a C program accesses a status flag in a memory-mapped I/O controller, such as a hard disk drive has. If the program tests the flag in a loop, and the compiler determines that the program does not modify the flag, it will factor the flag test out of the loop, since the compiler does not know that this particular memory location has the peculiar ability to change its value independently of the program's operation.
Similar problems occur in a multitasking situation. I have used an earlier version of C86 together with Hunter & Ready's VRTX multitasking executive on a four-CPU real-time vehicular control system where the fancy optimization maneuvers of the new compiler would have rendered my program inoperable. For these reasons, George is not delivering the optimizer configured to carry things quite this far.
Lisp is a language that has the ability to control the compiler, by virtue of macros, as does assembler. C permits a very limited capability of this sort with #define, but the facilities provided are really quite paltry. C++ is supposed to help in this regard, but I don't see how the flexibility found in Forth or Lisp can ever be attained with such a rigid syntax. The generics of Ada, together with packages, are a step in the right direction, but the syntax is too complex, and the granularity is way too large to be tractable at the human level for effective factoring to be achieved. In Forth, the programmer is expected to extend the language by additions and alterations to the compiler. Not to do so ignores much of the power of the language.
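For instance, a single defining word extends the compiler to create a whole family of array words, with its compile-time and run-time behavior both spelled out by the programmer (a sketch using the standard CREATE ... DOES> mechanism):

: ARRAY ( n "name" -- )                \ compile time: allot n cells under a new name
   CREATE CELLS ALLOT
   DOES> ( i -- addr ) SWAP CELLS + ;  \ run time: index into the named array

10 ARRAY SCORES    \ defines a 10-cell array called SCORES
42 3 SCORES !      \ store 42 in element 3
 3 SCORES @ .      \ fetch it back and print: 42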
In Lisp, the typing mechanism is dynamic: the type of an object may change during the course of execution of a program. Predicates exist in the language to determine the type of an object. The path of execution of a routine can be controlled by the types of the parameters passed to it. Thus one could write a concatenation routine that appended one list to another if the types were "list", or concatenated two character strings together into one string if the types were "string". Likewise, if concatenation were a meaningless operation to perform on the types passed, an error message could be displayed and the debugger invoked. Ada tackles the problem of passing the same routine different types of parameters with generics. Generic packages must be "elaborated" at compile time, much like assembly language macros, to generate a separate executable routine for each combination of types that the rest of the program passes it. This mechanism generates more efficient run-time code than Lisp, but uses more run-time memory and is more difficult to debug.
A significant distinction between Forth and Lisp is that even the module interconnect, or linkage, or calling sequence, or parameter passing, or whatever you want to call it, is also hidden, by virtue of the parameter stack. The Forth coding convention of the stack effect comment shows the before and after pictures of the stack. A stack effect comment is coded as follows:
: word-name  ( before stack picture -- after stack picture )
   words making up the definition of word-name ;
The "( before - after)" is the stack effect. Comments in Forth are enclosed in parenthesese, since parentheses serve no other useful purpose in the reverse Polish notation of Forth. These stack effect comments are just that: comments. They are in no way actually processed by the Forth compiler. The important thing to keep in mind here is that the "after" stack picture of a word must match up with the "before" stack picture of the word that follows it. Take for example the search of an index to a database and the subsequent reading of a record to get a value:
: lookup     ( vendor part# -- filename record# )
   words to perform lookup operation ;
: retrieve   ( filename record# -- buffer-ptr )
   words to read the record ;
: get-price  ( vendor part# -- price )
   lookup retrieve price @ ;
If a later change in the database implementation causes all the information to be held in the same file, the filename parameter that comes out of "lookup" and goes into "retrieve" is no longer needed. The "lookup" and "retrieve" routines may be re-written to adjust their stack effects as follows:
: lookup     ( vendor part# -- record# )
   new code for lookup ;
: retrieve   ( record# -- buffer-ptr )
   new code for retrieve ;
The very significant lesson to be learned from this little example is that "get-price" does not need to be modified at all, even though the parameters passed between some of the routines that it calls have changed. This is because the composite stack effect of "lookup retrieve" has not changed; only what gets passed between the two routines has. Likewise, "delete" may always be preceded by "lookup". If so, then the definition of "delete" would need to be updated, but no changes would need to be made to occurrences of any "lookup delete" sequences elsewhere in the code. In most other languages, all calls to these routines would need to be edited, because the parameters passed in and out of them would have changed. Forth is the only language I know of where even the parameters passed between routines are hidden at the level where those routines are called.
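To make the "delete" case concrete (a sketch; "delete" and its caller are hypothetical):

: delete     ( record# -- )      \ was ( filename record# -- ) before the change
   new code to remove the record ;
: drop-part  ( vendor part# -- )
   lookup delete ;               \ unchanged: the composite stack effect still matches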
If the definitions are kept suitably small, a single component's flow of control can be held in the human mind as one chunk. Furthermore, the black-box function of a word can be held in the feeble human mind without recourse to how it performs that function. The object oriented programming extensions to Forth that allow the implementation of Ada-like packages and the hiding of private definitions are a help, but they also prevent a level of sharing of code that could otherwise help keep object program memory size down. With the decrease in memory prices and the increase in programmers' salaries, the trade-off is in favor of the packaged object oriented approach.
The obvious next step is classes and inheritance, after the manner of Smalltalk, LOOPS, Flavors, Units, Actors, Methods, Scheme, and XLisp. With a good inheritance network object browser, programming becomes almost a graphic art, or even a video game. But here, the inheritance network is the spaghetti! Each class or instance is small, concise, and easily understood, but the inheritance network and the message passing protocols take up more and more of the programmer's time and attention as the size of the program grows.
In most object oriented languages, assignment is legal to just about anything in the symbol table. Here is an interesting and paradoxical problem, represented in AML, an object oriented language for robot control developed at the IBM T. J. Watson Research Center, but it could have been written in Lisp or a dialect of Forth that supports forward referencing (such as the LMI Metacompiler or UR/Forth) just as well:
foo : new SUBR(n)      /* define a factorial function */
    RETURN( IF n EQ 0 THEN 1 ELSE n*foo(n-1) );
END;

foo(3)                 /* try it out */
6                      /* it works */

bar : new foo;         /* assign a copy of foo to bar */
bar(3)                 /* try it out */
6                      /* it works too */

foo : new SUBR(n)      /* a new definition for foo */
    RETURN( n+1 );
END;

foo(3)                 /* try it out */
4                      /* it works fine */
bar(3)                 /* try bar again */
9                      /* @#$%*%! argh! */
bar(4)                 /* try another */
16                     /* Moby Foo !!! */
Why did the operation of bar change? My instinct tells me that it should have remained a factorial function; after all, I never did anything to bar since it last worked. Certainly this is not the kind of desirable and expected side effect that promoters of object oriented programming systems desire! The definition of foo, being self-referential, goes through the symbol table to make the recursive call. When bar is assigned the definition of foo, somewhat akin to a MOVD in muLISP, everything works as expected. But later on, when the definition of the original foo is changed, bar sees the change in its recursive call, which still goes through the current dynamic binding of foo: the ELSE clause RETURNs n*foo(n-1), not n*bar(n-1). But foo(z) is now just z+1, so bar returns n*((n-1)+1), or n*n; this explains the squaring behavior of the clobbered bar (bar(3) = 3*foo(2) = 3*3 = 9, and bar(4) = 4*foo(3) = 4*4 = 16). What is needed is a means of making the foo that appears in the recursive call inside the SUBR definition of bar refer not to the symbol foo's run-time binding, but rather to the binding in effect at the time the copying assignment was performed. The resulting expression should be self-referential without passing through the dynamic binding of any symbol. Here the spaghetti is distributed in the time dimension rather than the data or program space dimension, but it's spaghetti nevertheless!
This same situation can also occur in Lisp, but due to the unity between the representation of code and data, we have the option of manipulating the list structure that represents the function definition so that the self reference through the symbol table, or OBLIST as it is called in Lisp, is implemented by a circularly linked list, thereby bypassing the OBLIST in the self reference. We can write a function that takes the S-expression for a function that has already been defined and performs an RPLACA on it to make it circular without going through the symbol table, but there has got to be a more elegant (and readable) way. The situation becomes even more convoluted when compound recursion is involved, such as in the unification algorithm, where A calls B and B calls A. How could we ever display such a circular beast in human-readable form on the screen? The listing would repeat forever as the pretty printer traversed the circular structure. It is also conceivable that in some cases a reference actually should pass through the symbol table so that it really could be modified later on.
Conventional Forths do not suffer from this difficulty, since they permit references only to words that are already defined, and they permit redefinition while keeping the old early binding of previously defined words. Likewise, the Scheme dialect of Lisp solves the problem by implementing closures to retain early bindings. Late binding languages, such as conventional Lisp, AML/X, or forward-referencing Forths, suffer when unintentional redefinition occurs. It should be possible to specify, in a language such as Forth, whether early or late binding is desired for any words being compiled into the definition of another word. Currently, the choice must be made uniform across all definitions in a program, and that choice is exercised by choosing the dialect of Forth one is working with: either it is an early binding implementation, such as conventional Forth, or it is a late binding one that supports forward referencing.
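Conventional Forth's early binding can be demonstrated in a few lines (a minimal sketch):

: GREET       ( -- ) ." hello " ;
: GREET-TWICE ( -- ) GREET GREET ;
: GREET       ( -- ) ." goodbye " ;   \ redefinition: the old binding survives below
GREET-TWICE                           \ still prints: hello hello
GREET                                 \ prints: goodbye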
The similar problem of QUOTEing and unQUOTEing has been elegantly solved in Lisp by the reader macros ' for quote and ` for backquote, with , to unquote within a backquoted expression. Perhaps a similar convention could be devised for the elimination of symbolic references to other functions in an object oriented environment. With it, one could define an object like "foo" above and assign it to other objects, confident in the knowledge that subsequent assignments to sub-functions used in its original definition would not alter the behavior of objects assigned to before the sub-function modification. This is effectively a packaging of the functional object, so that it is not affected by revisions to its component parts performed after that object has already been fabricated. The idea is to keep a finished pile of spaghetti in a dish so it won't be affected if someone later changes the recipe by substituting some of the ingredients: only spaghetti cooked later should see this change of ingredients.
In the Forth language, the words [ and ] are used to control whether words will be compiled into the dictionary or interpreted immediately. This is a concept similar to quote and unquote in Lisp.
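For instance, a definition can drop into the interpreter to pre-compute a value at compile time (a sketch using the standard words [ , ] , and LITERAL):

: SECONDS/DAY ( -- n ) [ 60 60 * 24 * ] LITERAL ;   \ 86400 is computed once, at compile time

The RECURSE word is usually compiled as: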
: RECURSE LAST , ; IMMEDIATE
This compiles an absolute address, but if RECURSE were implemented as a relatively addressed call rather than a conventional absolutely addressed call, then copying the function definition would make the new copy's recursive call refer to itself rather than to the old word. This would require a new CODE word definition to effect a relatively addressed call to a word, but with this implementation, RECURSE would answer the need for a copyable and assignable self-referential definition. This makes a nice dish to keep the spaghetti in!
In Ray Duncan's LMI Metacompiler for Forth, an additional problem arises: there are two machines involved, the Host and the Target. Definitions may need to exist on the target, especially if a Forth interpreter is being ported to a new CPU, but some definitions also need to exist on the host, especially defining and compiling words. If the target is a stand-alone embedded system, as it frequently is in metacompiler applications, some definitions may be needed only on the target for interpretation, some only on the host to generate the target system image, and some may be needed on both. The spaghetti is now being served on two plates!
Ray has solved this problem in a restricted way by providing a metacompiler prepass, but doubly deferred defining words will not interpret correctly at metacompile time. One solution would be to perform multiple pre-passes, but the pre-pass idea still places the defining word definitions that are used only to build the target image into the target's dictionary as well as the host's. They are actually needed only in the host. A dictionary-switching set of words, such as HOST, TARGET, and BOTH, could be implemented to select which dictionary a definition should be compiled into, but how far should this sort of thing be allowed to go? It is beginning to look like a view down the hall of mirrors at the carnival!
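The first two of those words fall out of ordinary vocabularies (a rough sketch in Forth-83 terms; BOTH, which must thread one definition into two dictionaries, is the hard part and is not shown):

VOCABULARY HOST-WORDS
VOCABULARY TARGET-WORDS
: HOST   ( -- )  HOST-WORDS   DEFINITIONS ;   \ subsequent definitions go into the host dictionary
: TARGET ( -- )  TARGET-WORDS DEFINITIONS ;   \ subsequent definitions go into the target dictionary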
Object oriented programming is a young discipline, and much experimentation needs to be done on these topics. One thing seems clear, however: if a deductive inference capability forms the "mind" of an AI program, then an object modelling capability forms its "body" and its "world". In a robotic system, the vision and tactile sensors, together with the servo systems and mechanical manipulators, are really only I/O devices: for the robot to interact effectively with its environment, it must have a mental picture of what is going on. This is the object oriented representation. To be truly autonomous, it must be able to plan and reason. This is the job of the inference engine.
The use of database systems has revolutionized many areas of applications programming. The hierarchical database has given way to the full network database as in the CODASYL standard. Programs to access and manipulate the data in these databases can be simple and easily understood, but the spaghetti is in the data!
Relational databases are even nicer in many theoretical respects, and they lead the way to logic programming and Prolog. What could be nicer than a Prolog program with very little depth of nesting in the source code, short concise predicate definitions, the ability to freely interchange code and data, and no pointers or side effects, or any of that bad stuff? The whole program runs by matching patterns. Well, if you were to draw a diagram of what patterns matched with what, you would end up with one big pile of spaghetti. The spaghetti is in the relations!
The Lisp advocates are not immune either; in fact, they are probably the worst of the lot. They consider themselves very high-brow and academic by calling the spaghetti "property lists", "semantic nets", or "augmented transition networks" or whatever, but Fettuccine Alfredo Verde is just spaghetti, only dressed up!
Forth programmers like to brag about how neat and modular everything is in Forth, but they blush and hide their faces with shame when asked to explain something like this:
: Stack-Fiddle ( his stack -- my stack ) ROT TOTE DUP SWOOP SWAP DROP FLOP PLOP ! ;
There are usually several gyrations like this somewhere in any Forth application. They serve as adapters to match the output stack effect of one word with the input stack effect of another word. In this respect, they are not unlike the adapter plugs, jacks, and cables used with audio equipment, or a null-modem RS-232 cable. Their analog in strongly typed languages is type conversion and casts. A good Forth programmer will bury this stuff at the bottom-most levels of his code, but the novice to Forth will have these perversions all through his source. The top portion of the stack writhes and twists like the serpents on the head of Medusa. The spaghetti is on the stack!
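In fairness, most of these gyrations are tiny, honest stack permutations. The ubiquitous -ROT is typical (a minimal sketch, for Forths that lack it):

: -ROT ( a b c -- c a b )  ROT ROT ;   \ 1 2 3 -ROT leaves 3 1 2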
The very nature of threaded interpretive code implies something even worse than spaghetti. The colon definition of a Forth word is really the same thing as a CDR-coded Lisp function definition, such as in Zeta-Lisp on a Lisp machine, or in the D-code of muLISP on a PC. If you were to try drawing a diagram of all those threads, you would see a veritable cobweb of interconnections, but we don't even mind this. Indeed, we glory in it, because it is hidden from our sight by our normal way of thinking about these programs.
W. D. Hillis describes his Connection Machine as a device with active data structures, where the information is in the connections between the processing nodes. In a rather Einstein-like perspective, he said, "Memory is just a wire turned sideways in time." In his unique architecture, one of the primary computing devices is the "router", which controls the connections between the processors. These connections can change as a result of the execution of the program, and the machine takes its name from this fact. In this case the spaghetti is alive! What we have is a veritable can of worms! Nonetheless, the Xector extensions to the Lisp programming language that he proposes for dealing with this kind of parallelism are extremely enlightening. The reason is that the spaghetti is hidden inside an elegant programming metaphor, and whole gobs of the stuff can be manipulated with a single expression, and in very few machine cycles on this fine-grained parallel architecture.
When parallelism is desired, it is usually necessary to specify explicitly which things can occur in parallel. In these cases, the parameter list mechanism of Lisp is probably preferable to the stack oriented parameter passing mechanism of Forth, since several sets of parameters may be in the process of being passed between several concurrently executing operations at the same time. The idea of streams and dataflow graphs is useful in these cases, since they provide a multidimensional analog to the one-dimensional stack structure found in Forth. What is really needed is a cabling system to connect individual software functions, much like the cables that connect individual modules of hardware functions. Indeed, one of the main design considerations of Ada was the module interconnect part of the language.
Board level systems integrators are constantly confronted with the problem of interconnecting circuit boards, sensors, and actuators of differing manufacture. Cables and connectors are the highest-cost and most failure-prone items in electronic systems today. Even with supposedly foolproof ribbon cable and stake-on connectors, confusion abounds. You can always stake the connector on upside down, and two like-sexed connectors don't produce the same pin-out as differently sexed ones do.
I remember a friend of mine who was faced with the problem of making a ribbon cable extender for testing the circuit boards in the water tight computer chassis of a robot submarine. He figured circuit board edge connectors could be used to join two female ribbon cable connectors together, so he drew up the artwork for a simple printed circuit board that simply tied two board edge connectors together with straight traces. The idea was to extend the original cable by attaching it to another short cable with the little circuit board. The only trouble was that the ribbon cables together with the edge connectors interchanged the odd and even conductors of the cable. The spaghetti was in the stake-on connectors!
Another ribbon cable confusion arises from the fact that the cable vendors number the wires one way, the connector vendors number the connectors another way, and the board vendors use all sorts of diverse contact numbering schemes. The conversion of a board edge connector pin number to a cable wire number to a cable connector pin number would turn an Einstein into a blithering idiot. The spaghetti is in the module interconnections in hardware as well as in software.
Anyone who has worked with prototype hardware has seen circuit bread-boards. If these boards are the only specimens of their kind in the known physical universe, they are probably constructed from standard off the shelf integrated circuits, but the interconnections between them are hand wired with a wire-wrap gun. That's the spaghetti, the stuff that makes those boards different from other boards that use the same integrated circuits. When the boards go into mass production, the interconnections will be made by copper foil traces on a printed circuit board, thus hiding the spaghetti. The CAD tools that are used in a modern printed circuit board design shop to produce the circuit board artwork masters are fancy sophisticated interactive graphics systems. All this horsepower is needed because large amounts of spaghetti are hard to handle.
We as programmers also need high-powered tools to deal with the complexities of the interconnections in our software. If you have ever used a big-time Lisp machine, or Digitalk's Smalltalk for the IBM-PC (which provides a similar sort of picture with its Tree display, though it does not properly handle sharing of low-level structure), you have seen the wonderful graphic displays that show the interconnections between the structures of a large artificial intelligence application. Expert system shells such as KEE, from IntelliCorp, generate fabulous graphic pointer diagrams that may be explored using the mouse, moving around and zooming in on detail, or backing out to see the big picture. It is immediately obvious who uses what, and what depends on this, or what components are invoked by that, etc. Artificial intelligence researchers and knowledge engineers developing complicated expert systems depend on this ability to "see" the data structures in order to keep a larger view of the system in their heads at once.