William Newman of SBCL fame recently recommended Martin Fowler's book Refactoring. After reading it, I immediately found successful uses for his techniques. I took the opportunity to look back at some of my early Lisp code to see how refactoring could improve that code.
I found a few recurring themes that I was able to improve:
- Long FunctionsThese are functions that do more than one thing breaking Chris Riesbeck's Cardinal rule of Functions: One Function to A Function. One of the main problem with long functions is that are they are hard to read in review because they do too many things. Another problem is that by having all of the code in one function, the subcomponents of that function can't be used by other functions. A clue that such function do too many things, besides their length, is that they may contain comments or white space separating the various tasks. This problem is fixed by splitting the function into multiple functions, each with descriptive names. Then, comments splitting the sections are no longer needed, but rather, are embeded in the function names themselves.
- Duplicated CodeEfficiently removing duplication of code is one area where Common Lisp excels. As an example, in UMLisp there is a file which reads data from the a large number SQL tables and creates various CLOS objects from that data. My first pass at the code resulted in a large number of functions with lots of duplicated code. Their structure was in the form of:
WITH MUTEX SQL CONNECTION CREATE A SQL QUERY via FORMAT statement SUBMIT QUERY LOOP EACH ROW DESTRUCTURING-BIND FIELDS COLLECT MAKE-INSTANCE of CLOS OBJECTTo refactor this code, I created a macro to handle to generation of the SQL query and iteration on each row of the results. The new code looked like:COLLECT-FROM-SQL-QUERY [query parameters] MAKE-INSTANCE of CLOS OBJECT
In this design, the all of the field names required for the query were passed to the macro which created the query string and also setup the destructuring-bind. This refactoring resulted in my file shrinking 50%. Up to this point, other languages besides Common Lisp could have approximated this refactoring. But, the below shows Common Lisp brilliantly shining.
After the first pass, benchmark tests showed doubling in runtime. I tracked this performance decrease to the overhead of creating the SQL query string. In the original code, the SQL query string was hard-coded whereas in the new version it is generated dynamically from the input parameters. By modifying my new macro, I was able to move nearly all of the query string generation to compile-time rather than run-time. That is impossible with most other languages. With this change, my new code was only 10% slower than my original code.
In the process of micro-optimizing this centralized macro, I found a place where I could change a (format nil "~D" number) to (write-to-string number). With that one change, my new code was now 10% faster than my original code. This was complete success! Looking back, I could haven take that one optimization and applied it to my existing code, but that would have required duplicating the modification in 40 functions. With elimination of the code duplication, that optimization was only coded once.
I believe this is a recurring pattern: by refactoring to centralize a commonly used block of code, the centralized function may have increased overhead to support the many ways that it is used. However, optimizations can then be employed to improve the efficiency of that centralized function and improve the overall performance of the software.
- Not taking advantage of combined iteration/element-testing functionsWhen I first starting programming in Common Lisp, I was not aware of the very useful functions that iterate over a sequence while applying an predicate and testing the result. Employing such functions as some, nonany, and every allowed me to greatly simplify some code. For example,
(defun any-element-invalid-p (list) (dolist (elem list) (when (invalid-p elem) (return-from any-element-invalid-p t))) nil)simplifies to(defun any-element-invalid-p (list) (some #'invalid-p list))
- Consing inefficienciesIn my early Lisp code, I didn't pay as much attention to consing overhead when writing code. As a result, I inadvently created a few bottlenecks. A potent example is when I naively wrote a function like:
(defun list-to-comma-delimited-string (list) (let ((output (format nil "~A" (car list)))) (dolist (elem (cdr list)) (setq output (concatenate 'string output "," (format nil "~A" elem)))))That highly-consing code was found to be a bottleneck in a system when it was called with a list of 15,000 elements. A simple rewrite suggested by Larry Hunter uses a feature of the powerful FORMAT function and ran 100 times faster:(defun list-to-comma-delimited-string (list) (format nil "~{~A~^,~}" list))So while atrocities like the above example can be a bottleneck, I've learned when the overhead of consing may result in better code. In a rewrite of another function, I found a function that avoided consing and looked like:(let ((lst (partially-processed-elems))) (dotimes (i (length lst)) (declare (fixnum i)) (let ((elem (nth i lst))) ...lots of elem testing/processing... (when (not-fully-processed elem) (setf (nth i lst) (fully-process elem))) lst)However, this function could never be a bottleneck as it was only called once at the initialization of a software system. I was able to simplify the code with eliminating the list index and shrink the function at the cost of creating a new list:(loop for elem (partially-processed-elems) collect (...lots of elem processing... (if (not-fully-processed elem) (fully-process elem) elem)))

Comments (1)
You are absolutely right.
Posted by Fowler real estate | August 30, 2005 1:45 PM
Posted on August 30, 2005 13:45