OrthoCoders

You can code it, I can help!

Meet the Clojure Programming Language

I have the pleasure of welcoming Sebastian Galkin as a guest contributor to my blog. I have known Sebastian for some years, since I was his manager in one of his first jobs at a software company. At that time he already had a degree in Electrical Engineering and a Masters in Physics under his belt. I have always been very impressed with how smart Sebastian is, his uncanny thirst for knowledge and how little time he needs to learn and get familiar with new technologies. Sebastian simply excels at software development. He is one of the most knowledgeable developers I know, and he has been working with Clojure for more than a year now. I always enjoy bouncing ideas off him to help me realize how wrong I am. Please read on, I am positive you are in for a treat!

Introduction

In this post I'll introduce you to the Clojure programming language. I hope I'll be able to show you how writing Clojure code feels, and motivate you to dive deeper into it. If you want to know more about Clojure, you can't miss the three-day hands-on training I'll be giving in Winnipeg, November 2011.

What is Clojure

Clojure is a general-purpose language that runs on top of the Java Virtual Machine, the Common Language Runtime or even JavaScript. So, if you already have systems running Java or .NET, you can easily integrate Clojure into your workflow. Clojure inherits powerful traits from the Lisp family of languages, and its ubiquity from the host platform (JVM, CLR, js). It encourages a functional programming style and brings to the table a great set of concurrency primitives that make multithreaded programming easier.
You may be thinking: "But I don't have the chance to use Clojure at work!", and wondering why you should keep reading. Read a bit more, you won't regret it. Here is the secret: learning Clojure will greatly improve your toolbox as a programmer. You will find that after spending some time reading about it and using it, the way you write code will change dramatically.
If we look at the market, there is a big wave of interest in functional languages, ranging from Haskell and F# to Scala and Clojure. This is motivated in part by the increased complexity of our systems and the need to take advantage of multiple hardware cores.
[Figure: job trends graph showing the relative growth of Ruby, Java, Scala and Clojure jobs]

Sounds very nice, but where's the code?

Instead of enumerating the primitives and syntax constructs of the language, let's look at some examples and go over the explanation in detail together. We will use a simple problem I just made up to showcase some of Clojure's features. Here is the problem:
We have a file with words, one word per line. We want to read that file and print all 10-character words in it.
Simple, artificial and useless, but more than enough to illustrate the traits of the language. Read on... read on.... First let's define a constant holding the path to the file:
[code language="clojure" light="true"]
(def file-path "words.txt")
[/code]
What's going on here:
  • Clojure uses Lisp syntax: function calls begin with an opening parenthesis, followed by the function name, followed by the arguments separated by spaces, and end with a closing parenthesis. So, in this case, we are "calling" def, passing file-path and "words.txt" as arguments.
  • As you can see, Clojure uses prefix notation. If we wanted to add two numbers we would write (+ 3 4): again we start with "(", then the function name, then the arguments and finally the ")". In Clojure + is just a function name; there is no special behavior for operators.
  • def associates or binds a name with a value. It receives two arguments: first the new name, then the value associated with it. So, file-path is the name of the constant we are defining.
  • "words.txt" is a literal string. In Clojure strings are delimited by double quotes.
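To make the prefix notation concrete, here is a small sketch you could paste into a REPL. The names answer and greeting are made up for this illustration; they are not part of the program we are building.

```clojure
;; Every call has the same shape: (function-name arg1 arg2 ...)
(+ 3 4)            ; => 7
(+ 1 2 3 4)        ; + is an ordinary function, so it accepts any number of arguments
(* 2 (+ 3 4))      ; calls nest: the inner call is evaluated first => 14

;; def binds a name to a value (answer and greeting are hypothetical names)
(def answer (* 2 (+ 3 4)))
(def greeting "hello")

;; Using a name evaluates to the value bound to it
(str greeting ", the answer is " answer)   ; => "hello, the answer is 14"
```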
Now if we use file-path in our code, it will get "replaced" by its value, the string words.txt. Back to our problem, how do we read the file contents?
[code language="clojure"]
(with-open [reader (clojure.java.io/reader file-path)]
  .....
  .....)
[/code]
Let's peel this onion (fine, orange for you) from the inside out:
[code language="clojure" light="true"]
(clojure.java.io/reader file-path)
[/code]
You already guessed this is a function call, because it conforms to the pattern we saw before, with clojure.java.io/reader as the function name. The part of the name before the "/" is just a namespace; don't worry about it for now, it's just a function name. What this line does is open a java.io.Reader pointing to the words.txt file in the current directory. A java.io.Reader is a Java object that allows you to read from a stream, in this case, a file. We will use this reader to get all the words from the file.
[code language="clojure"]
(with-open [reader ....(1)....]
  .......(2).....)
[/code]
with-open binds a name (in this case reader) to a value (in this case the expression (1)), then executes the nested body (in this case (2)), and finally calls the reader's close method. Strictly speaking with-open is a macro rather than a function, but you use it the same way. Inside the body of with-open, the expression (2), you can use the reader variable. So, why do we use with-open? To make sure that, no matter what happens inside the body, the reader always gets closed.
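Here is a minimal sketch of with-open that runs without any file on disk. It reads from an in-memory string through java.io.StringReader and java.io.BufferedReader (standard Java classes). The names first-line and closed-after-error? and the sample text are assumptions made for this example, and it uses let and try, which this post does not cover.

```clojure
(import '(java.io BufferedReader StringReader))

;; Bind `reader`, evaluate the body, then close the reader automatically.
;; The value of with-open is the value of the last expression in its body.
(def first-line
  (with-open [reader (BufferedReader. (StringReader. "functional\nlisp\n"))]
    (.readLine reader)))

first-line   ; => "functional"

;; The reader is closed even when the body throws.
(def closed-after-error?
  (let [reader (BufferedReader. (StringReader. "functional\n"))]
    (try
      (with-open [r reader]
        (throw (RuntimeException. "boom")))
      (catch RuntimeException _ nil))
    ;; Reading from a closed BufferedReader throws java.io.IOException
    (try
      (.readLine reader)
      false
      (catch java.io.IOException _ true))))

closed-after-error?   ; => true
```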
Short digression:
We just saw new syntax, the square brackets. Square brackets are the syntax for literal vectors in Clojure. [0 29 "hi" "bye"] is a vector with 4 elements. Also, parentheses are not just the syntax for function calls, they are also the syntax for lists, so (1 2 3 4) is a Clojure list. Lists and vectors differ in their complexity guarantees as you would expect: vectors give you fast access by index, while lists are cheap to grow at the front. What we see here is that Clojure code is written using Clojure data structures! Think about that for a moment: a function call in Clojure is just a list, and we saw that Clojure vectors also appear as part of Clojure code. This is a property of all Lisp languages and has fantastic implications!
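The "code is data" idea can be tried directly at a REPL. The sketch below uses two things this post does not cover: the quote character ('), which tells Clojure to treat a list as data instead of calling it, and eval, which evaluates a data structure as code. The names v, l and code are hypothetical.

```clojure
;; A vector literal: square brackets
(def v [0 29 "hi" "bye"])
(count v)      ; => 4
(nth v 2)      ; => "hi"

;; A list literal needs a quote, otherwise Clojure would try to call 1 as a function
(def l '(1 2 3 4))
(first l)      ; => 1

;; Code is data: this quoted list holds the symbol + and two numbers...
(def code '(+ 3 4))
(list? code)   ; => true
(first code)   ; => the symbol +

;; ...and eval treats that same list as a function call
(eval code)    ; => 7
```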
Back to the problem again. We already have our file opened; now we need to read all lines and process them. We'll create a function to make it easy to reuse:
[code language="clojure"]
(def process-lines
  (fn [path f]
    (with-open [reader (clojure.java.io/reader path)]
      (f (line-seq reader)))))
[/code]
Lots of parentheses, I know... Our old friend def helps us bind the name process-lines to a new function. (fn [...] ...) is how we create new functions; [...] is a vector of arguments to the function. In process-lines we are receiving two arguments: path and f. Anything that comes after the arguments and inside the (fn) call is called the function's body. The result of the function will be the last expression evaluated in its body.
So, process-lines is a function which takes two arguments. path indicates the location of the file to read, that's easy, but what is the f argument? In Clojure functions are first-class citizens: we can pass them as arguments to other functions, return them from function calls, put them inside collections, or do whatever we can do with other values such as integers or strings. So, process-lines is taking a function f as an argument. That function will take care of processing all lines in the file. Do you see how we are inverting the usual flow here? We don't expect process-lines to return the file's contents; instead, it receives a function that will do the work. f is called with the result of line-seq, a sequence of strings read from the file, one string for each line.
Now we need to create the function that will take care of printing the matching words, that is, the function that we will pass to process-lines:
[code language="clojure"]
(def print-matching
  (fn [words]
    (prn ...)))
[/code]
print-matching receives the sequence of all words, selects the matching ones and prints the result using prn. We still need to fill in the blanks. In some traditional languages you would use the following scheme to obtain the matching words:
  • Create an empty vector/list
  • loop through all the strings in the line sequence
  • if a string has 10 characters add it to the vector/list and continue with the next step of the loop
  • if a string length is not 10, just go to the next step of the loop
  • when the loop ends, return the populated vector/list
You probably have no problem understanding this, because it's familiar to you from languages such as C or Java. But it's not conceptually simple at all. There are lots of interactions to track, lots of interleaving moving parts. What we want to do is simpler: we just want to filter the list of words, keeping only those with length 10. Instead, we could write:
[code language="clojure" light="true"]
(filter length-10? words)
[/code]
Simple, right? This line expresses exactly what we have in mind. Let's dive in:
  • Of course, this is a call to the Clojure function filter
  • the first argument length-10? is a function we didn't define yet, which will return true whenever its only argument is a string of length 10. Function names can contain symbols, and the final "?" is just a convention used for functions that return a boolean
  • words is the argument to print-matching: the sequence with all words in the file
  • the result of the filter call will be a new sequence of strings, containing only those strings in the input that make length-10? return true. That's exactly what we wanted.
  • Notice that the original sequence of words is not modified. filter returns a new sequence leaving the input sequence intact.
  • In fact, all Clojure collections are immutable: you can't modify their contents. Sounds weird? Don't worry, we self-impose this constraint to get code that's easier to reason about, easier to test, to debug and to parallelize.
  • And, to get the cherry on top, we still can do everything you would do with traditional, mutable collections.
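The points above about filter and immutability can be checked with a small sketch. It uses numbers and the built-in predicate even? so it does not depend on anything we have not defined yet; the names numbers, evens and more-numbers are made up for the example.

```clojure
(def numbers [1 2 3 4 5 6])

;; filter returns a new sequence with the elements for which the predicate is true
(def evens (filter even? numbers))
evens          ; => (2 4 6)

;; The input is untouched
numbers        ; => [1 2 3 4 5 6]

;; "Adding" to a collection also returns a new one instead of changing the original
(def more-numbers (conj numbers 7))
more-numbers   ; => [1 2 3 4 5 6 7]
numbers        ; => still [1 2 3 4 5 6]
```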
Let's write length-10? now:
[code language="clojure"]
(def length-10?
  (fn [word]
    (= 10 (count word))))
[/code]
We are defining a function that takes one argument, word. It returns the result of calling the function =, which is the equality comparison function (in some other languages == is used for this). (count word) returns the length of the word. So this function returns the boolean value true if the argument has length 10 and false otherwise.
Now that we have written all the parts, we just need to call process-lines passing the appropriate arguments. Let's see the whole program:
[code language="clojure"]
(def file-path "words.txt")

(def process-lines
  (fn [path f]
    (with-open [reader (clojure.java.io/reader path)]
      (f (line-seq reader)))))

(def length-10?
  (fn [word]
    (= 10 (count word))))

(def print-matching
  (fn [words]
    (prn (filter length-10? words))))

(process-lines file-path print-matching)
[/code]
As you can see, we are calling process-lines passing our file path constant and the function print-matching. Passing a function as an argument is easy: we already have a name defined with def (print-matching) pointing to that function, so we just use that name, exactly as we do with the other argument, file-path.
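If you do not have a words.txt at hand, the following sketch shows one way to try process-lines. It repeats the definitions so it can be pasted on its own, creates a temporary file with made-up sample words (an assumption of this example, as is the name sample-file), and passes functions other than print-matching to show that any function of one argument will do.

```clojure
(require '[clojure.java.io :as io])

;; Same definitions as in the post
(def process-lines
  (fn [path f]
    (with-open [reader (io/reader path)]
      (f (line-seq reader)))))

(def length-10?
  (fn [word] (= 10 (count word))))

;; Hypothetical sample data: a temporary file with one word per line
(def sample-file (java.io.File/createTempFile "words" ".txt"))
(spit sample-file "functional\nlisp\nparentheses\nstrawberry\n")

;; count is a built-in function: here it receives the whole sequence of lines
(process-lines (.getPath sample-file) count)   ; => 4

;; A function created on the spot works too.
;; count forces the filtered sequence while the reader is still open.
(process-lines (.getPath sample-file)
               (fn [words] (count (filter length-10? words))))   ; => 2
```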

Smelly, smelly, naughty, naughty!

That's it, that works and it's easy to understand (once you get used to the parentheses), but let's improve it a little bit. length-10? smells bad. If the requirements changed to match 15-character words, we wouldn't want to change the function name. We are going to generalize the function. Let's go top-down and first write how we would like to use it:
[code language="clojure" light="true"]
(filter (length-matcher 10) words)
[/code]
That's better: 10 is just an argument to a function call, so we could easily change it to another number or read it from the console. Now, we know filter takes a boolean function as its first argument, and that means the result of the (length-matcher 10) call must be a function. If you have never seen something like this in other languages, your head may be spinning right now. length-matcher is a function that, when called, will return another function. If you think about it, it's not as fancy as it sounds: functions are just another type of value, and we can do whatever we want with them. Let's write length-matcher then:
[code language="clojure"]
(def length-matcher
  (fn [n]
    (fn [word]
      (= n (count word)))))
[/code]
Wow, there is a lot of functioning there; we are writing functional code after all.
  • length-matcher is a function of 1 argument n, the length of the words we want to match
  • as you should remember (fn) is what we use to create new functions in Clojure, so the result of length-matcher will be another function
  • this returned function will also have 1 argument word, and will return a boolean, the result of the equality comparison.
So, length-matcher is a function that, when called with n, will return a new function that, when called with a word, will return true if that word has length n. Phew.... I need some water... feel free to get some too.... A little test to make sure you are following so far:
  • What is the type of length-matcher?
  • What is the type of (length-matcher 10)?
  • What is the type of ((length-matcher 10) "hello")?
Solutions:
  • A function of one integer argument that returns a function of one string argument
  • A function of one string argument
  • A boolean. The result of (length-matcher 10) is a function, we are calling that function passing "hello" as argument.
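You can verify these answers at a REPL. The sketch below repeats the definition of length-matcher so it stands on its own; fn? is a built-in predicate that tells whether a value is a function, and five-long? is a hypothetical name used only here.

```clojure
(def length-matcher
  (fn [n]
    (fn [word]
      (= n (count word)))))

(fn? length-matcher)               ; => true: it is a function
(fn? (length-matcher 10))          ; => true: calling it returns another function
((length-matcher 10) "hello")      ; => false: "hello" has 5 characters
((length-matcher 5) "hello")       ; => true

;; The returned function remembers the n it was created with,
;; so we can give it a name and reuse it
(def five-long? (length-matcher 5))
(filter five-long? ["hello" "hi" "world"])   ; => ("hello" "world")
```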
OK, now you understand how our new code works and how easy it is to change the length matched. Here is the full program:
[code language="clojure"]
(def file-path "words.txt")

(def process-lines
  (fn [path f]
    (with-open [reader (clojure.java.io/reader path)]
      (f (line-seq reader)))))

(def length-matcher
  (fn [n]
    (fn [word]
      (= n (count word)))))

(def print-matching
  (fn [words]
    (prn (filter (length-matcher 10) words))))

(process-lines file-path print-matching)
[/code]

What else do we get?

Short: Code size is a great predictor of the number of bugs. Shorter code tends to have fewer bugs.
Abstract: We are embracing the powerful filter abstraction. To understand and modify the code, you don't need to understand the details of the iteration mechanism; filtering is an abstract operation that we translate directly to code.
Declarative: We are just expressing what the code must do, not how to do it. For instance, we are not giving details about how the file should be read, nor how the sequence should be traversed, nor how a temporary sequence should be populated. These are details we don't want to deal with, and this allows us to work at a higher level of abstraction.
Testable: You could test length-matcher, print-matching and process-lines in isolation. The result of length-matcher depends only on its arguments, with no hidden state or side effects, while print-matching and process-lines keep the I/O at the edges and receive everything else as arguments. That makes them much easier to test. This is a huge advantage of the functional style.
Maintainable: If requirements change, and you no longer need to output 10-character strings, you just need to change the filtering function. Suppose you are asked to match all words with fewer than n characters. That's easy, change your length-matcher (or create a new function) to the following, where defn is just shorthand for def plus fn:
[code language="clojure"]
(defn length-matcher [n]
  (fn [word]
    (< (count word) n)))
[/code]
Lazy: This would need a whole new post to explain, but it's so cool that I need to tell you about it. Suppose the file were 10 GB in size. Suppose you don't need to get all 10-character words, but only the first 50. What would you do? Clearly our code is too naive to deal with this use case. You don't want to load all 10 GB of words in memory, filter all of them to get a huge sequence of 10-character words, and then keep only the first 50 words. That would be awful. The key here is that our code is not too naive!
Clojure will do it for you:
[code language="clojure"]
(take 50 (filter (length-matcher 10) (read-lines file-path)))
[/code]
That will get the first 50 words. Here read-lines stands for a function that returns the lines of the file as a lazy sequence, the way line-seq does inside our process-lines. The good news is that read-lines and filter are lazy functions (read-lines would be our own code, but it is lazy anyway). Those functions will only process information when they are required to. The program will run without a problem on big files, because it won't process more than it needs to. Think about how you would do this in your language of choice. Chances are that you would need to change the whole design and its interfaces. Clojure embraces laziness in its sequence processing functions.
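Here is a hedged sketch of the same idea built only from functions that ship with Clojure. first-matching and sample-file are hypothetical names introduced for this example, not part of the program above. One caveat worth knowing: a lazy sequence of lines must be consumed while the reader is still open, which is why the sketch wraps the result in doall before with-open closes the file.

```clojure
(require '[clojure.java.io :as io])

;; Laziness in isolation: (range) is an infinite sequence 0, 1, 2, ...
;; filter never tries to process all of it; take asks only for 5 results.
(take 5 (filter even? (range)))   ; => (0 2 4 6 8)

;; first-matching is a hypothetical helper.
;; doall forces the (at most n) results while the reader is still open;
;; lines after the last needed match are never read.
(def first-matching
  (fn [path n len]
    (with-open [reader (io/reader path)]
      (doall
        (take n
              (filter (fn [word] (= len (count word)))
                      (line-seq reader)))))))

;; Hypothetical sample data: a temporary file with one word per line
(def sample-file (java.io.File/createTempFile "words" ".txt"))
(spit sample-file "functional\nlisp\nstrawberry\njavascript\n")

(first-matching (.getPath sample-file) 2 10)   ; => ("functional" "strawberry")
```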

Homework

The code is not fully decoupled, and there is a lot of room for improvement. You could try your hand at Clojure by fixing some of the following smells:
  • print-matching has the double responsibility of filtering and printing the results
  • print-matching "hardcodes" the length of the matched word
  • We are ugly-printing the matching words. To do a better job we need to iterate the result words and print each one individually.

Conclusion

Clojure is an unbelievably expressive language, with some of the most powerful abstractions I have seen in any language. We didn't even scratch the surface here; there are a lot of awesome features you will want to know about. If you are interested, hurry up and get the early bird discount for the three-day, hands-on Clojure training we are doing in Winnipeg on November 6, 7 & 8. See you there!