DAO in application development stands for Data Access Object. A DAO defines a data source and a set of standard methods for accessing it. Often, for each collection of data, the application has a single object that provides access through functions. Some applications keep such an object for the whole runtime of the app; others create one object for the needed collections on a request or transaction basis.
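As a minimal sketch of the idea, a DAO for a user collection might look like this; the class, method names, and the in-memory array standing in for a real data source are all illustrative, not taken from any particular library:

```javascript
// A DAO: one object per collection, exposing standard access methods.
// The `rows` array stands in for the real data source (a DB table,
// a REST service, ...). All names here are illustrative.
class UserDao {
  constructor(rows) {
    this.rows = rows;
  }
  async getById(id) {
    return this.rows.find(r => r.id === id) || null;
  }
  async getByEmail(email) {
    return this.rows.find(r => r.email === email) || null;
  }
  async insert(row) {
    this.rows.push(row);
    return row;
  }
}
```

The methods are async even over an in-memory array, because against a real data source every call involves I/O.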

DAO libraries are a bit like ORMs (Object Relational Mappers), but simpler. ORMs try to reflect a class hierarchy into the storage engine of the data source. (I often use the term "data source"; usually this is a database, SQL or NoSQL, but it can also be another system accessed over the network, such as a REST or SOAP service.) DAO libraries try to make the data more directly available to the application, making it easier to reason about how code and data changes affect each other, and also making it more straightforward to track down performance problems.

Usually you can use an ORM framework as if it were a DAO library; you just don't use all the nitty-gritty features. At my last company, we were using a DAO framework. It was not very good: much too complex for its functionality, heavily focused on SQL, unmaintained, and based on node-style callbacks, and it cut off every stack trace. So when you saw an SQL error, you had to figure out where in your code you had made the mistake. I was even monkey-patching it to ease the situation.

In parallel I started developing my own first DAO libraries: tMySQLdao and tmysqlpromisedao. Actually, they started when I began another personal project. I decided not to use any ORM or DAO library, just the plain mysql module for Node.js. I quickly saw that I needed the same kinds of methods I was already working with at work. **Get-by**-property methods are very common (getById, getByName/Email), so I started a small lib. I then needed to load relational data, so I named that kind of method **fetch**-relation: UserDao.fetchPosts, fetchLikes. These methods take user objects as parameters and return the posts or likes. The fetch methods also mutate the given user objects, attaching the relations to the correct user or input objects. This avoids code I otherwise needed very often, with indexBy or groupBy, to complete a JOIN operation. I did the join operation in memory, because I find that databases are not very good at it: joins in the DB can be slow, as they provide guarantees about the queried state at one point in time. When the objects are joined in the memory of the app server instead, the database can work more efficiently and with more parallelism.
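The fetch-relation pattern described above can be sketched like this; the data, the `UserDao` object, and the method names are illustrative stand-ins, not the actual tMySQLdao API:

```javascript
// Sketch of a fetch-relation method: load all related rows in one
// query, group them by the foreign key, and attach them to the input
// objects — the in-memory join. All names are illustrative.
const posts = [
  { id: 1, userId: 1, text: 'hello' },
  { id: 2, userId: 1, text: 'world' },
  { id: 3, userId: 2, text: 'hi' },
];

const UserDao = {
  async fetchPosts(users) {
    const ids = users.map(u => u.id);
    // one query for all users instead of a JOIN or N single queries
    const rows = posts.filter(p => ids.includes(p.userId));
    // group-by userId, then mutate each user object with its relations
    const byUser = {};
    for (const p of rows) (byUser[p.userId] = byUser[p.userId] || []).push(p);
    for (const u of users) u.posts = byUser[u.id] || [];
    return rows;
  },
};
```

After the call, every user object carries its own `posts` array, so the caller never has to write the indexBy/groupBy glue code itself.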

Working on that private app, I added more and more functionality to that internal mysql library. I wondered: is this becoming an ORM now? I wanted to keep it simpler than a full-blown ORM, as I believed popular ORMs actually produce very unoptimized SQL queries.

With that, I published tMySQLDao and later tmysqlpromisedao on NPM. At work, I often got requirements to process some data, and later to accept lists of items and process many items at once. That is where I needed all the indexBy and groupBy logic, sorting the items around by hand. After I had converted the processing of one very complex entity, I thought hard about whether this could be avoided. The result was tcacher. tcacher was different from other libraries: I did not cache the results, I cached the queries. If the same query was executed at the same time, it was only executed once. When the queries you run are simple getByProperty functions, this happens a lot. Also, when one query requests a list and another needs only a single item, I was even able to merge such slightly different queries while guaranteeing the correct result. This was astounding: I was able to optimize a lot of the logic in our code by improving the DAO calls, without touching the logic code at all. Only much later did I find that the GraphQL community had a similar solution of collecting requests, but they named it batching and implemented it with DataLoader. With this library for MySQL I was very happy, and I still am today. Even after more than four years, I believe it is the best data-access solution I have seen so far.
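The core trick of caching queries rather than results can be shown in a few lines; this is a simplified sketch in the spirit of tcacher (and DataLoader), not its actual implementation, and `fakeDbQuery` stands in for a real database call:

```javascript
// Deduplicate in-flight queries: identical queries started while an
// earlier one is still running share a single promise, so the database
// only sees one physical query. Illustrative sketch only.
let queryCount = 0;          // counts physical queries, for demonstration
const inFlight = new Map();  // query string -> pending promise

function fakeDbQuery(sql) {
  queryCount++;
  return new Promise(resolve => setTimeout(() => resolve([{ sql }]), 10));
}

function dedupedQuery(sql) {
  if (inFlight.has(sql)) return inFlight.get(sql); // reuse the running query
  const p = fakeDbQuery(sql).finally(() => inFlight.delete(sql));
  inFlight.set(sql, p);
  return p;
}
```

Because the map entry is removed as soon as the query settles, this never serves stale data; it only collapses queries that overlap in time.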

# TMongoDao

It took me quite a while to come up with a DAO library similar to the SQL one, but for MongoDB — mainly because MongoDB can store deep data structures in each document. A second problem was the mutation of objects. In the SQL DAO it was clear: a key holding an object or array is not a column in the table. In MongoDB there is no schema, so I decided I needed one, similar to the very popular mongoose. I settled on superstruct. It has schemas that can even be serialized themselves. If a mutation adds an extra property, the change is denied. Simply denying the save call was not good enough, though. Underscore has the _.pick method, which can remove unknown properties from objects, but only at the top level, so I implemented tpicker, which keeps only the properties defined in a superstruct-compatible schema.
Unlike in the MySQL schema definition, I needed to provide the relationship definitions not within the schema, but as an extra key on the DAO definition. It worked, and it worked beautifully.
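A deep pick — the problem tpicker solves — can be sketched as follows; here plain nested objects stand in for superstruct schemas, and the function is an illustration of the idea rather than tpicker's actual code:

```javascript
// Recursively keep only the properties a schema defines, at every
// depth — unlike _.pick, which only works on the top level.
// Plain objects stand in for superstruct schemas; illustrative only.
function deepPick(schema, value) {
  if (Array.isArray(schema)) {
    // schema like [itemSchema]: pick each element of the array
    return (value || []).map(v => deepPick(schema[0], v));
  }
  if (schema !== null && typeof schema === 'object') {
    const out = {};
    for (const key of Object.keys(schema)) {
      if (value && key in value) out[key] = deepPick(schema[key], value[key]);
    }
    return out;
  }
  return value; // leaf type descriptor: keep the value as-is
}
```

Running an object through such a pick before saving means a stray property added by a mutation is silently dropped instead of causing the whole save to be denied.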

So well, in fact, that I provided some extra tooling: with tfilemonk you can prototype an app with a JSON file and, when going to production, simply deploy to a MongoDB without changing a line of code.

# Cross Database Joins

Having a similar DAO implementation for MySQL and MongoDB, I was quite proud, because it was already possible to change a definition from the MySQL to the Mongo DAO and have the same app work on a different database system with very limited effort, and next to no change to the application services.

A long time later, a new idea came to my mind: as I do the join operations in memory already, could I fetch data from different data sources and join them together? It turned out: yes. The daoRegistry got introduced. The DAO registry has a very simple implementation, because it is a plain JavaScript object. By adding a check to the fetch methods for whether there is a DAO in the registry — and in that case calling its getByProperty method instead of querying a collection/table on the same DB — it was possible to do the join just so, in memory, across different databases. This is empowering, because the join uses all the functionality of the related DAO, such as query batching or result mapping. It is now possible to join data across databases and different DB engines.
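A sketch of that registry check, with in-memory stubs standing in for the two database backends; the registry key, DAOs, and method names are illustrative, not the actual daoRegistry API:

```javascript
// Sketch of a cross-database join: the registry is a plain object
// mapping names to DAOs, and the fetch method calls the related DAO's
// getByProperty method instead of querying a local table. The two DAOs
// could be backed by entirely different engines. Illustrative only.
const daoRegistry = {};

// pretend this DAO is backed by MongoDB
daoRegistry.CommentDao = {
  rows: [{ id: 9, postId: 1, text: 'nice' }],
  async getByPostId(ids) {
    return this.rows.filter(c => ids.includes(c.postId));
  },
};

// pretend this DAO is backed by MySQL
const PostDao = {
  async fetchComments(posts) {
    const related = daoRegistry.CommentDao; // look up a DAO, not a table
    const comments = await related.getByPostId(posts.map(p => p.id));
    for (const p of posts) {
      p.comments = comments.filter(c => c.postId === p.id); // in-memory join
    }
    return comments;
  },
};
```

Because `fetchComments` only talks to the related DAO's public method, any batching or result mapping that DAO performs applies to the cross-database join for free.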

# Redis Database Engine

After implementing the mongoDAO library, there was still the question of how to work with a Redis database. Redis is quite different: it is a key-value store. You can store objects, but only flat objects (hashes). If you want to index your collection by some field, the application server has to maintain those indexes itself. When inserting an object, the index entries need to be created; on update they have to be moved; and on delete they have to be removed. It is acceptable if the removal fails, but inserts and updates must not. In the project tkeyvaluedao, I was able to use a schema similar to mongoDAO's and maintain the indexes. While implementing the index maintenance, I discovered how powerful this method can be: the indexes can even be stored on separate servers. With one server dedicated only to storing and serving objects by id, the load on that DB server is greatly reduced, and it can potentially store and serve much more data.
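The index maintenance can be sketched like this; two `Map`s stand in for Redis (each could live on its own server), and the function names are illustrative rather than tkeyvaluedao's actual API:

```javascript
// Application-maintained secondary index over a key-value store.
// `store` and `emailIdx` stand in for Redis instances and could run on
// separate servers. On update, the index entry is moved from the old
// email to the new one. Illustrative sketch only.
const store = new Map();    // id -> object   (the "objects by id" server)
const emailIdx = new Map(); // email -> id    (the index server)

async function saveUser(user) {
  const old = store.get(user.id);
  if (old && old.email !== user.email) emailIdx.delete(old.email); // move index entry
  emailIdx.set(user.email, user.id);
  store.set(user.id, user);
}

async function getUserByEmail(email) {
  const id = emailIdx.get(email); // hit the index server first...
  return id === undefined ? null : store.get(id); // ...then the object server
}
```

A getByEmail lookup touches the index server first and only then fetches the object by id, so the object store itself never has to answer anything but id lookups.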

Having this implementation ready, I realized we could even take load off other databases by dropping indexes on those servers: other DB systems can also profit from external indexes.

This pattern is so powerful!

# Conclusion

Having general data-access APIs defined that can be applied to different data sources can open up unexpectedly powerful opportunities. Being able to rely on certain methods with defined behavior existing gives great leverage when making databases work together. Being able to join data between SQL, document databases, key-value stores, and potentially any other data source, with more control over the data flow, makes it possible to build highly scalable systems. Scaling from a single JSON file to a giant database cluster, and on to even more specialized architectures, is an ability I don't want to miss.