Copperwall Blog

Photo by @helloimnik

Make shitty versions of software libraries you use

Read through software projects you use

I've been spending some time over the last few weeks picking some software projects that I use and attempting to write small simple versions of them. It's been a pretty neat way to learn how or especially why certain things work the way they do.

The biggest example of a software project that I've used professionally for years is express. I've used express for practically any web project I've built with node, and chances are you've used it too if you're a backend node dev. Maybe sometimes you've questioned how the whole middleware pattern even works, or why you have to call a callback function instead of returning a Promise to advance to the next middleware function. Maybe you've been stepping through your web application and you find yourself in node_modules/express/lib/router/index.js and you can see the matrix and somehow you now know kung fu. Ending up in a node_modules file might be kind of intimidating, but if you look a little further you might realize that that scary module code is easy enough or small enough for you to grasp if you take a little time out of your day to read it.

For example, express only has 12 files for all of the request, response, routing, view rendering, nesting, and middleware chaining functionality it has. It is totally possible to read through express in a weekend. You probably won't understand all of it right off the bat, but you'll learn some neat implementation details and you can always take notes of parts that you don't quite understand to go back to later. The first time I was reading through the router logic, I came across this kind of weird detail where if you register a middleware function that has more than three arguments, express just silently skips it and moves on to the next middleware function. They don't write a debug message if you're in development mode, or even call the function knowing that the fourth argument will just have to be undefined (which sounds like a more JavaScript-y thing to do to me). Would you have done that differently if you were writing an application library? Up until that point I was reading through the source and looking at it like "yeah I guess that makes sense" or "oo cool I wouldn't have thought of that", but that was the first instance where I thought that I'd rather have done something differently. Which leads me to my next point...

Rewrite software projects you use (AKA make shitty versions of them)

I believe that reading through the software projects and libraries you use helps you understand how your whole application works, but I think the next step in that journey is to pick some small libraries and make really simple or shitty versions of them. Continuing with the express example, you can make an express clone that only implements app.use. That alone gives you an end product where all of the routing and rendering is handled outside of the framework. To get to that point you need to create an application object that middleware can be registered on, and you have to decide how to store the middleware, how to chain the functions together, and how to pass each one a next function that will eventually call the next middleware. You'll also need to figure out how to run an HTTP server to get the Request and Response objects/streams that you'll pass to your registered middleware functions.

As a next step, maybe you can add some default middleware functions like express does to handle query string parsing and body parsing. Do you want to build in functionality for automatically parsing JSON request bodies, or do you leave that up to the user? It's totally your call. Maybe you can parse the query string and let the user access it as a URLSearchParams instance instead of a plain old object.

Something you can aim for is to replicate the library's interface and then try dropping your version into a project that uses the reference implementation. You can get a lot of insight from drop-in testing it and seeing which parts break or aren't implemented yet. I've been trying this out with Rapid, an express clone. Is it going to replace express? Hell no. Am I going to use it instead of express for all of my personal projects? Probably not. It's been fun so far, and there's plenty more functionality to add or recreate. Also, if you feel like making a PR on Rapid, you're more than welcome to.
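To make the app.use-only clone a bit more concrete, here is a rough sketch of one way it could start out. This is not how express or Rapid actually do it. The names createApp, handle, and listen, the plain array used to store middleware, and the 404/500 fallbacks are all assumptions made up for illustration, and exceptions thrown inside middleware are deliberately left unhandled to keep it short.

```javascript
const http = require('node:http');

// Hypothetical factory for this sketch; not express's or Rapid's real API.
function createApp() {
  const stack = []; // middleware functions, stored in registration order

  function handle(req, res) {
    let index = 0;

    // next() runs the following middleware; next(err) stops the chain.
    function next(err) {
      if (err) {
        res.statusCode = 500; // assumption: any error becomes a 500
        res.end('Internal Server Error');
        return;
      }
      const middleware = stack[index++];
      if (!middleware) {
        res.statusCode = 404; // assumption: nothing answered the request
        res.end('Not Found');
        return;
      }
      middleware(req, res, next);
    }

    next();
  }

  return {
    use(fn) {
      stack.push(fn);
      return this; // allow app.use(a).use(b)
    },
    handle,
    listen(...args) {
      // Node's HTTP server supplies the request and response objects.
      return http.createServer(handle).listen(...args);
    },
  };
}

const app = createApp();
app.use((req, res, next) => {
  req.greeting = 'hello from the first middleware';
  next();
});
app.use((req, res) => {
  res.end(req.greeting);
});
// app.listen(3000); // uncomment to serve real HTTP requests
```

Even at this size you hit real decisions: whether next should be callable more than once, what happens when no middleware ends the response, and how errors should travel down the chain.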

Don't just rewrite the projects verbatim, though. Copy the interface but try the implementation out for yourself. Maybe you'll arrive at the same decision crossroads the original implementors did. You could end up making a different choice, or at least appreciating the different ways you can tackle the problem. You can always check your work against the reference afterwards and improve it if the reference has a better solution. It probably will if it's a large project like express, but maybe you'll come up with a better solution! Big projects can be encumbered with complexity from maintaining backwards compatibility for features that you might not need or care about.

Another cool project to recreate is DataLoader. DataLoader is a tool to batch load and cache resources. You make an instance of it by constructing it with a batch function that takes some keys and returns a Promise that resolves to the values that those keys relate to. The batch function could run a SQL query or hit a REST API or some other data source. The user interacts with the instance by calling a load function with a single key and that returns a Promise which will resolve to the key's value when the batch function successfully loads the data. The cool part is that DataLoader schedules the batch function to run after the current frame of execution by using some Node asynchronous primitives and caches the results. If you ask for the same resource multiple times throughout your web request lifecycle, DataLoader will only load it once. Also you can ask for individual resources throughout the request lifecycle and DataLoader will batch load them at the end of each frame of execution.

DataLoader's implementation is only one file (excluding tests), and there's even a YouTube video from one of the creators that walks through the entire source. Writing your own implementation means you'll have to cover concepts like creating promises, caching, scheduling work with things like process.nextTick, and deciding how to handle errors from invalid user input like bad batch functions.
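As a starting point, a shitty version of the batching and caching described above might look something like this. It's a simplified sketch, not DataLoader's actual source: the TinyLoader class, its dispatch method, and the error messages are all invented for illustration, and it only batches loads requested within the same synchronous stretch of code.

```javascript
// Hypothetical, simplified class; not DataLoader's real implementation.
class TinyLoader {
  constructor(batchFn) {
    if (typeof batchFn !== 'function') {
      throw new TypeError('TinyLoader expects a batch function');
    }
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise for that key's value
    this.queue = [];        // loads waiting for the next batch
  }

  load(key) {
    // Repeat requests for a key share the first request's promise.
    if (this.cache.has(key)) return this.cache.get(key);

    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Only the first queued key schedules the batch.
      if (this.queue.length === 1) {
        process.nextTick(() => this.dispatch());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  dispatch() {
    const batch = this.queue;
    this.queue = [];
    const keys = batch.map((item) => item.key);

    Promise.resolve()
      .then(() => this.batchFn(keys))
      .then((values) => {
        if (!Array.isArray(values) || values.length !== keys.length) {
          throw new TypeError('batch function must resolve to one value per key');
        }
        batch.forEach((item, i) => item.resolve(values[i]));
      })
      .catch((error) => {
        batch.forEach((item) => {
          this.cache.delete(item.key); // assumption: failures are not cached
          item.reject(error);
        });
      });
  }
}

const calls = [];
const loader = new TinyLoader(async (keys) => {
  calls.push(keys); // stand-in for a SQL query or REST call
  return keys.map((key) => key * 2);
});

Promise.all([loader.load(1), loader.load(2), loader.load(1)]).then((values) => {
  console.log(values); // [ 2, 4, 2 ]
  console.log(calls);  // [ [ 1, 2 ] ]
});
```

The three load calls result in a single call to the batch function with keys 1 and 2, because the duplicate key is served from the cache and the batch does not run until the synchronous code has finished.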


Hopefully this left you with some projects in mind that you use often but don't really understand under the hood. If you have recommendations for other projects that would be interesting to read through or rewrite, feel free to tweet me at @copperwall.


Creative Commons License