SQL is awesome. It let us comfortably access our data on a server. We can define our schema, insert and manipulate our data and of course – this is why we all use it – read the data in a structured way. It allow us to describe conditions that reach beyond multiple data tables. A feature we can not get away with, is the ability to join tables, a way to merge our data on our gusto.

About 10 years ago, NoSQL databases got pretty much hyped. They could access data so much faster or store so much more. The two that I most often get across are mongodb and redis. One is very big, the other very fast.

Still, in business applications, SQL is practically the defacto standard. You have already read, that on my opinion it is better to use a DAO library over a depending on the language fully fledged ObjectRelationalMapper (ORM).

When reading about the goal of MySQL, it was that it should be a very fast SQL database. When I started programming, I was told, MySQL is faster but Postgres has more features. Today, MySQL got more features and postgres got much faster.

Now to my claim: Comparing databases of different types is like comparing apples to pears. Database systems are so rich in features, that we have to be careful, that not all databases become the same database. MongoDB has gained the ability to execute queries, across collections. and MySQL and Postgres now can store and index JSON objects.

I guess, I got a little off topic. So now about JOINs. first of all, they are very very useful, but also very very slow, they need a good amount of work to optimize, when coding, they make the project more complex and they waste lots of resources.

Waste Resources

Let’s say, we have a blog with posts and comments. using JOINs we can load then from database in a single query. That sounds very good and efficient. SELECT * FROM posts as p LEFT JOIN comments as c ON p.id=c.postId WHERE p.id=$0;
Now look at the result:

p.idp.contentc.idc.postIdc.text
1lorem ipsum11lorem ipsum
1lorem ipsum21lorem ipsum2
1lorem ipsum31lorem ipsum3

As you see, we are loading the the post three times, as often as we have comments for that post. This is wasted.

Make Software Projects More Complex

I think it is better, to let data be data. In most ORM, you have to define for every shape of resulting data a new class. Or type definitions. When updating a query, it is also necessary to update that definition as well. To make that process more convenient, I provide on my website the tool, to get js type definitions from json. I also like the tool schemats that pull types direct from the database schema.

But for JOIN queries, you need new types again. Or define a class/type. Now other functions need extra logic, to handle not only the basic types as they represent the data tables, but also the particular query. You might use overloading or other polymorphism techniques, making it harder to read and maintain the actual logic, and finally making the project more complex.

Need Optimizations

Each database system as well as each Engine within the dame DBMS can execute the JOIN queries in different ways. One system could read all entries on the left side, and then find one by one the entries on the right sight. Other systems could work internally with virtual tables, that get dropped after the result has delivered to the client. I found in MySQL, that the execution time is exponential. As a software developer you now not only need to keep in mind how to process data optimal in your Programming Language, but also how the db handle this queries internally. Changing just the order in which you write the query could drastically effect the speed of the query speed in the db.
In order to accommodate performance problems with joins, most SQL databases allow you do define Views or Temporary tables, Materialized Views or how it might be called. This can be solutions to improve the performance, but also increase the complexity of your system.

Are Very Slow

When executing a query, databases provide us with various guaranties. It is common, that a single query is running in something like a transaction. While queries are running, no updates can change underlying data. The result of a query is representing the state at one particular point in time. queries can nicely get executed concurrently, but not so updates.

Using JOIN queries is amplifying this problem exponentially. In the app I am currently working on, from a cloud function, a query that is joining 7 tables is executed on a regular bases. requiring it to have an update lock on over these 7 tables.

The query itself is actually executing very quickly, as all the ID fields are indexed. The problem will arise later, when updates will occur more frequently and they have to wait behind. As such slowdown is very hard to describe and find, because the report for ‘slow queries’ does not contain the true reason why the db is slow, many developers will be tempted to try a new database system. – Of corse, use every database system for two or three years, and you had a happy live as software engineer.

Conclusion

Personally, I don’t need to care what you are doing, but the disadvantages for using JOINs are so huge and unpredictable, That I will avoid them like the plague.

Contents
  1. 1. Waste Resources
  2. 2. Make Software Projects More Complex
  3. 3. Need Optimizations
  4. 4. Are Very Slow
  5. 5. Conclusion