August 2012
My job recently had a code golf challenge: the shortest code wins. The task was to write k-means: take a set of data and reduce it to k points. There are many algorithms to do this, but the specific algorithm for this challenge was to, until you had few enough data points, find the two closest points and merge them. First, my winning 227-character entry:
(def d(p q)(apply +(map(fn(m n)(expt(- m n)2))p q)))(def m(k s)(if(len> s k)(m k(let(r t)(best(fn(p q)(<(d p.0 p.1)(d q.0 q.1)))(apply join(map[map(fn(l)`(,_,l))(cdr(mem _ s))]s)))(cons(map(fn e avg.e)r t)(rem r(rem t s)))))s))
Ok, that's unreadable. I get it. The code, as I started with it:
(def distance (p1 p2)
(apply +
(map (fn (d1 d2)
(expt (- d1 d2)
2))
p1
p2)))
(def k-means (k points)
(if (len> points
k)
(k-means k
(collapse-nearest points))
points))
(def collapse-nearest (points)
(let (rem1 rem2) (closest points)
(cons (map (fn dimensions
avg.dimensions)
rem1
rem2)
(rem rem1
(rem rem2
points)))))
(def closest (points)
(best (fn (p1 p2)
(< (distance p1.0 p1.1)
(distance p2.0 p2.1)))
(cartesian points)))
(def cartesian (elements)
(apply join
(map [map (fn (ele) (cons _ ele))
(cdr (mem _
elements))]
elements)))
To get from the readable code to the golfed code, I did some standard minification techniques: making all names one-character, inlining functions, and removing extra whitespace. Nothing too fancy.
The good:
- Minimal syntax in lisp. less whitespace needed. This leads to incredibly ugly, unreadable, but short code.
- Arc's intra-symbol syntax. I only used one example: instead of writing (x y) , Arc lets me write x.y . This cuts down on the parentheses, so if the tokens before or after this function call were going to be parentheses anyway, I was able to drop one or two characters each time I used it.
- Quasiquote replacement lets data objects be constructed concisely.
- Anonymous functions are a giant win.
- Arc-specific functions (or lisp ones): best, len>
- Higher-order functions make the code work for any number of dimensions
- A repl makes for really simple testing and development.
The bad:
- Readability, obviously, dropped incredibly. The second-place entry was Python, but it was somewhat readable.
- Why does avg take a single argument: a list of nums? Seems like it should be variadic.
- On lists, + isn't join. It creates new conses. This is frustrating, and confusing:
arc> (def check-cars (join-fun) (let x (list (list 1)) (is (car (join-fun x)) (car x)))) #<procedure: check-cars> arc> (check-cars +) nil arc> (check-cars join) t
- Similarly, (mappend f seq) isn't (apply join (map f seq)), when f returns lists.
- If + is similar to join on lists, why not define (* seq1 seq2) as the cartesian product of the two sequences, when seq1 is a list? It somewhat mirrors +.
notes:
- You don't have to calculate everything properly, as long as you get the results you want. My distance function doesn't take the square root of the result. Because the distance between two points is only used is to compare two distances, you can compare the squared distances instead. The relationships are the same, which is all that matters.
- I thought of no simple way to do this but functionally, using recursion. The second place winner, who wrote in Python, approached this imperitively, using mutable state. We actually had to explain our solutions to each other, as the minified versions were not easy to gather the algorithm from. Programming styles can be restrictive.