How much code is enough?
I’ve had a discussion recently on Twitter about the amount of code we produce versus readability. I'm an advocate of concise code, where every character is used intentionally. This doesn't mean I'd opt for code where every name is a single character or a single verb - absolutely not. I strive for a good balance: just enough code to make it work, while staying easy to read and to grasp the concept encoded in it. It's a damn hard task, and I almost always have doubts about whether I've got it right. But there is a language that makes this task a little bit easier. Yes, I'm speaking about you, F#!
This post is my contribution to F# Advent Calendar 2017.
The F# story
A few years ago I spoke with a friend who was doing some F# work at the time. I wasn't really charmed by the language back then, as I was used to the C-like syntax and OO concepts of languages like C# or PHP. But then he said a thing that struck me: “I write (just) a few lines of code every day”. Still, those lines of code delivered a lot of value, and I'm pretty sure they were more readable than the hundreds of lines I was producing each day (many automatically generated by tools like ReSharper).
At that moment I decided to take a closer look at F# and find out why so little code can be as valuable (if not more so) than the piles of code produced by many developers every day. Enlightenment came at Mark Seemann's presentation at the WROC# conference. He showed an example of code that was completely new to me, yet after just a few minutes of explanation I not only understood what was going on, I was ready to extend it and play around with it. That won me over completely.
So what is so special about F# that helps to reduce the amount of code you have to write to solve a problem? There are some language design decisions that make the code more terse (some inherited from the ML family, some not), like the compact syntax, good type inference and other type system features (such as built-in tuples or algebraic data types). Many of them are common across functional languages, and some are making their way into more object-oriented languages as well.
There are lots of things to talk about on this topic, but let's focus on a few key ones that let you keep the code concise yet readable. The following sections present the most common concerns which affect (directly or indirectly) those aspects.
All the little things
First of all, I make sure I've reduced all the noise. For me, this includes things like non-critical comments, insignificant whitespace, braces that aren't required and so on. The less noise you have, the more focus you can put on the things that really matter. This is where F# actually shines - in most cases its syntax is very terse (yes, I know, lambda syntax...), and significant whitespace removes most of the need for braces, making the code naturally structured. Powerful type inference reduces the need to specify types explicitly, leaving it up to you whether to add them (usually to increase readability).
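To give a tiny, made-up example of what I mean (the names here are purely illustrative, not from any real codebase): no type annotations, no braces, just the logic.

// Types are inferred from usage - greet works on strings and greetAll on
// lists of strings without a single annotation.
let greet name = sprintf "Hello, %s!" name

// Indentation alone carries the structure - no braces required.
let greetAll people =
    people
    |> List.map greet
    |> String.concat ", "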
Appropriate naming
Naming is hard - unless you don't care. Is "a" a good name? It may be in some cases, but it can be totally confusing in others. Fortunately, we now have the possibility to use much more verbose names - and of course, people take this possibility to the extreme. Let's compare some extremes, just for fun:
let area a b = a * b
let calculateTheAreaOfRectangle lengthOfTheRectangle widthOfTheRectangle = lengthOfTheRectangle * widthOfTheRectangle
In terms of readability, the former version is much easier to digest, but it may be harder to reason about. The latter takes ages just to read, but once you're done you should know everything about this function - unless you've run out of memory ;) There is also a more reasonable middle ground:
let areaOfRectangle length width = length * width
Looks better? I hope it does. But let's not stop here. One of the most important aspects to consider when naming anything is the scope of the name. Putting the first, somewhat extreme example into the right scope may totally change the way you perceive it:
module Rectangle =
    let area a b = a * b
    let perimeter a b = 2 * (a + b)
In this case, the short form seems very appropriate, and I don't see a problem with using it. Of course, the domain also matters - in the case of math, names like a and b are commonly accepted and used. But when we change it to something more typical of business apps, things may look different:
module Person =
    let fullName a b = sprintf "%s %s" a b
Feels like something is missing here? Probably yes. Knowing the context, you shouldn't need to guess or look into the implementation to figure out what the parameters are. Here, more elaborate names will do a better job:
module Person =
    let fullName firstName lastName = sprintf "%s %s" firstName lastName
This is the way I approach naming, and I think it works quite well. Keeping names brief reduces both the amount of code you need to read and the visual noise. The one thing to remember is to make sure the name gives the reader enough information to understand the code - it's not about length, it's about saying enough without introducing noise. And as you can see, the functional-first approach makes F# a good companion here, letting you keep things concise without losing readability.
The right amount (and kind) of abstraction
"The solution to every problem is another layer of abstraction." That's thinking I've seen many times, and thinking I strongly disagree with. Abstractions in code are very useful, as they allow us to generalise, separate and group things or concerns. They can also increase the testability of the code. But they are just a tool, and the goal of coding should never be to use a tool - unless your goal is to learn it. As with any tool, it often gets abused, causing loads of unnecessary code to be written just to meet the needs of the abstraction. This happens in many forms, be it over-generalised code, lasagna code, multi-level inheritance and so on.
I recently stumbled upon a project whose sole goal was to store and retrieve some information - almost classic CRUD. And what did I find inside? Four projects to solve this super-advanced problem (plus one for tests). Things got even more interesting, as there were 3 layers of code split across those 4 projects (if your math instinct is nagging you: the fourth one held the data models). To make things worse, most of the methods were one-liners, merely passing their arguments on to the next layer (Controller to Service to Repository). And where was the domain-related logic in this project? In the Repository. That's just nuts!
So what were the tests doing there? Basically checking whether the right parameters were passed from the Service to the Repository layer - something you can verify at a glance and then forget about. My recommendation? Collapse everything into one project, remove one of the layers (basically merge Service and Repository) and you'll cut about 50% of the code in the project. Oh, and probably drop the unit tests and write some integration tests instead. In such a case they have much more value than unit tests, as they actually exercise some real logic.
One more example is the use of interfaces. If you follow the SOLID principles, you'll most likely end up with many single-method interfaces. They will probably be used to extract implementation details and enable testability (e.g. to avoid reading from an actual file when the code is under unit test). But this introduces a lot of code - which can be largely reduced by the more functional approach of passing functions as parameters. Let's compare the two approaches:
// Constructor injection approach
type IReadFromFile =
    abstract member Read: string -> string seq

type ReadFromFile() =
    interface IReadFromFile with
        member this.Read path =
            System.IO.File.ReadLines(path)

type Files(readFromFile: IReadFromFile) =
    member this.GetFirstLine path =
        match readFromFile.Read path |> Seq.tryHead with
        | Some text -> sprintf "Line #1: %s" text
        | None -> ""
// Function injection approach
module Files =
    let readLinesFromFile path =
        System.IO.File.ReadLines(path)

    let getFirstLine readFromFile path =
        match readFromFile path |> Seq.tryHead with
        | Some text -> sprintf "Line #1: %s" text
        | None -> ""
I skipped the wiring code, but it's a similar amount in both cases. There's obviously less code with the function injection approach, not only in terms of LoC but also in the number of artefacts one has to deal with. What's more, when you want to test this code, a mock function also takes less code than a mock implementation of the interface:
// mock implementation of the interface
let ``GetFirstLine should return first line from file`` () =
    let mockReadFromFile =
        { new IReadFromFile with
            member this.Read path = Seq.ofList ["foo"; "bar"] }
    let files = Files(mockReadFromFile)
    files.GetFirstLine "" |> should equal "Line #1: foo"
// mock function implementation
let ``getFirstLine should return first line from file`` () =
    let mockReadFromFile path = Seq.ofList ["foo"; "bar"]
    Files.getFirstLine mockReadFromFile "" |> should equal "Line #1: foo"
Although interfaces may be very useful in some cases, I would strongly recommend not using them as a silver bullet for every potential issue. When overused, they introduce a lot of noise to the code without any clear benefit.
Those are just some examples of how introducing abstraction can lead to bloated code. Properly used, abstraction increases readability and helps to achieve compact but powerful code. I always think twice (or more) before introducing an abstraction, because its downsides may outweigh the potential benefits. And even when I do decide to introduce one, I try to choose the kind that is less of a burden to the codebase.
And speaking of F# - some of the features that help with getting the abstraction right are partial application and union types. The former allows more granular dependency injection, helping to avoid leaky abstractions and unexpected dependencies. The latter, combined with powerful pattern matching and active patterns, leads to a clearer separation of different cases within the same scope, without forcing you to find a common denominator (when there may not even be one) just to handle things in one place.
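To make this a bit more concrete, here's a minimal sketch (all the names below are invented for illustration): partial application bakes a single dependency into a function instead of a whole interface, and a union type keeps distinct cases side by side with no common base type required.

// Partial application as lightweight dependency injection:
// fix the logging function once and pass the specialised function around.
let notify log message = log (sprintf "NOTIFY: %s" message)
let notifyToConsole = notify (printfn "%s")

// A union type models distinct cases directly; pattern matching
// handles each one without forcing a shared abstraction.
type PaymentMethod =
    | Cash
    | Card of cardNumber: string
    | Transfer of iban: string

let describe payment =
    match payment with
    | Cash -> "paid in cash"
    | Card number -> sprintf "paid with card %s" number
    | Transfer iban -> sprintf "paid by transfer from %s" iban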
Right grouping
Another aspect that affects the way you write code, and as a result how much code you need to write, is how you group it. Most of the people I speak with see the benefits of a domain-based approach to designing code, yet lots of projects still suffer from poorly grouped code, organised around technical aspects rather than the domain concerns the code is tackling. This usually leads to leaky abstractions and unnecessary glue code, not to mention problems with readability.
There is a catch here. When you group things by domain and separate them properly, you may actually end up with more code. No worries though - it's OK to have more code in places where it actually adds value. So what should you look at when it comes to grouping? First, recognise what the domain of your project is; if there are multiple domains/subdomains, try to list them all. Next, identify the operations that may happen within those domains - this should be fairly easy if you're working, for example, on a web-based API, where each endpoint may map to a domain operation. Another step is figuring out what the shared components/services are. Knowing that, you can group things into two main categories: operation-specific ones and shared ones. This gives a clear separation between technical details (which usually land in the shared category) and domain logic (found in the operation-specific category) and enhances discoverability - finding things within the project becomes much easier.
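As a rough sketch of that split (the module and function names below are invented for illustration), shared, technical helpers live together in one module, while each domain operation keeps its own data and logic in one place:

// Shared, technical concerns grouped together...
module Validation =
    let notEmpty name value =
        if System.String.IsNullOrWhiteSpace value
        then Error (sprintf "%s must not be empty" name)
        else Ok value

// ...while each domain operation groups its data and logic around it.
module PlaceOrder =
    type Order = { CustomerName: string }

    let handle order =
        order.CustomerName
        |> Validation.notEmpty "customer name"
        |> Result.map (sprintf "order placed for %s")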
The grouping concern also applies to deciding where to put different pieces of code. For example, if you need to extract a bit of code from a function, you may want to create another function to capture that logic (e.g. complex filters or conditions). In the object-oriented approach you would probably use a private method for this, but there is a problem: if you end up with more than one such function, especially ones not reused across multiple methods, you run into the ordering problem described below. Is it good to mix private and public methods together, or should they land at the top/bottom, and what about a mix of methods referenced once and those referenced more often?

This is where local functions come into play - they capture code that doesn't need to be exposed outside a given function, inside that function. Using local functions may not reduce the amount of code much (maybe a bit, if you apply the naming principles mentioned above), but it reduces the time needed to jump between locations to grasp the logic of the function, and it makes things like refactoring much easier (if it's a local function, you don't have to care about other users of it - there are none).
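For example (a made-up snippet), a non-trivial filtering rule can live as a local function inside the only function that needs it:

type User = { Name: string; Age: int; IsActive: bool }

let activeAdultNames users =
    // Local helper: visible only inside activeAdultNames, so it can be
    // renamed or removed without touching anything else.
    let isActiveAdult user = user.IsActive && user.Age >= 18
    users
    |> List.filter isActiveAdult
    |> List.map (fun user -> user.Name)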
Again, F# gives you a lot to work with when it comes to grouping. First of all, you're not forced to keep things in separate files (by the compiler or by convention) - you may define data structures, their accompanying functions and the public API in the same file. This helps to gather all the related logic in one place, increasing readability and reducing the noise within the project.
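A small illustration of that (again, invented names): the type, a private helper and the public API all sit next to each other in a single file.

module Invoices =
    // The data structure...
    type Invoice = { Number: string; Net: decimal; VatRate: decimal }

    // ...an implementation detail kept private to the module...
    let private gross invoice =
        invoice.Net * (1m + invoice.VatRate)

    // ...and the public API, right below the pieces it is built from.
    let summary invoice =
        sprintf "Invoice %s, total: %M" invoice.Number (gross invoice)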
Correct ordering
There are many schools of ordering elements within a file, especially in languages with multi-pass compilers. In object-oriented languages with encapsulation this sometimes turns into almost a flame war - should public elements go first, or maybe private ones, what about protected ones, not to mention grouping and ordering fields, constructors, methods, properties... A lot of things to fight over. Fortunately, this is largely settled in F# by the single-pass compiler - everything you want to use must already be declared. That naturally puts things like public APIs at the bottom and lets you read files / modules / namespaces just like recipes: take these tools, these values and functions, and follow these steps to get the solution. Some find it quite limiting, others treat it as a relief from constant battles.
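A trivial example of that reading order (made-up functions, just to show the shape): the helpers come first, the public entry point last, and reversing the order simply wouldn't compile.

// Ingredients first...
let parsePrice (text: string) = System.Decimal.Parse text
let applyVat price = price * 1.23m

// ...the final step last: it can only use what is already defined above it.
let totalFromInput input =
    input |> parsePrice |> applyVat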
But what does this have to do with the amount of code? It turns out it may have an impact here as well. Because you need at least a glance at what is already declared, you're more likely to reuse existing components when you're about to add something new, which can mean writing less unnecessary code - at least in some cases. The same applies to files within a project - the most general things usually bubble to the top, while the most specific ones sit closer to the bottom.
So how much code is enough?
This is, of course, a very subjective topic. The first and most important thing is that the code needs to solve the problem it's meant to solve. Next, and arguably just as important, is readability. As I've mentioned before, that very much depends on context - the same code may be perfectly readable in one case and completely illegible in another. Some developers, especially inexperienced ones, tend to use a lot of whitespace and braces to separate things, while others, myself included, prefer to keep the code free of those extras.
Apart from personal preferences, the choice of tools also has a big impact. As I've shown above, F# is a pretty good choice in this regard (and not only this one!), as are many other functional languages. With new features coming into more object-oriented languages like C# or Java, it's becoming easier to keep code concise there as well - make sure you leverage their potential.
You may not realise it, but we spend a lot of our development time reading code. With briefer code, we can shift that time to thinking more about the solution we're trying to produce, and eventually to writing some code. I hope you'll review your own coding practices to make sure you communicate things clearly and concisely, making better use of our limited time. Howdy!