Create a many-to-many ActiveRecord association in Ruby on Rails with has_many :through and has_and_belongs_to_many

Relational database associations are a common obstacle when building web applications. Without them, it would be difficult to reduce duplicate data and improve overall database efficiency. The most common relationships are one-to-one, one-to-many, and many-to-many.

In this article I'll go through creating a many-to-many relationship with Rails' built-in ActiveRecord associations, has_many :through and has_and_belongs_to_many, and examine the specific differences between the two. Before all that, though, let's get some background on what a many-to-many relationship is.

The Many-to-Many Relationship

In relational database schemas, a many-to-many association relates records in one table to records in another table through a join table: each row of the join table holds a foreign key to each of the two tables. At least three tables are needed to accomplish this. The join table lets the developer query the relationship between the two tables and gather their respective collections.
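To make the definition concrete, here is a minimal plain-Ruby sketch (no Rails involved) that models the three tables from the programmers/clients/projects example in this article as arrays of hashes. The rows are hypothetical sample data, including the second programmer "Jane Doe". The two helper methods are illustrative only. They show what "querying through the join table" means in each direction.

```ruby
# Hypothetical sample rows: two entity "tables" plus the join table that links them.
PROGRAMMERS = [
  { id: 1, name: "Josh Frankel" },
  { id: 2, name: "Jane Doe" }
].freeze

CLIENTS = [
  { id: 1, name: "Mr. Nic Cage" },
  { id: 2, name: "Keanu Reeves" }
].freeze

# Join table: each row pairs one programmer with one client.
PROJECTS = [
  { id: 1, programmer_id: 1, client_id: 1 },
  { id: 2, programmer_id: 1, client_id: 2 },
  { id: 3, programmer_id: 2, client_id: 1 }
].freeze

# Follow the join rows from a programmer to their clients.
def clients_for(programmer_id)
  ids = PROJECTS.select { |row| row[:programmer_id] == programmer_id }.map { |row| row[:client_id] }
  CLIENTS.select { |client| ids.include?(client[:id]) }
end

# And in the other direction, from a client to their programmers.
def programmers_for(client_id)
  ids = PROJECTS.select { |row| row[:client_id] == client_id }.map { |row| row[:programmer_id] }
  PROGRAMMERS.select { |programmer| ids.include?(programmer[:id]) }
end

p clients_for(1).map { |c| c[:name] }       # => ["Mr. Nic Cage", "Keanu Reeves"]
p programmers_for(1).map { |pr| pr[:name] } # => ["Josh Frankel", "Jane Doe"]
```

Neither the programmer rows nor the client rows know about each other. Every link lives in the join table.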

Diving into an example, let's say you have an application that needs data for programmers, clients, and the projects that connect them. You could start by saying that a Programmer is a member of a Project, and so is a Client. This would lead you to design your database schema to resemble something like this:

An ERD example of what not to do

Unfortunately, you have now set yourself up for a headache later on. With this design, every table that needs an association with the Project table must carry its own project_id foreign key, and that gets hairy pretty quickly.

Referential Integrity
A database property requiring that every foreign key refer to an existing row in the referenced table, so that data stays consistent and valid across related tables.

For example, suppose you delete a Project record that is related to a Programmer and a Client. Both the Programmer and Client records now contain a project_id that refers to a deleted record, which is a violation of basic referential integrity.
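Here's a small plain-Ruby sketch of that failure. It uses hypothetical in-memory rows for the design above, where Programmer and Client each store a project_id. The `dangling` helper is an illustrative name, not a Rails API. It returns the rows whose foreign key no longer points at an existing project.

```ruby
# Hypothetical in-memory rows for the "what not to do" design, where both
# Programmer and Client carry a project_id foreign key.
projects    = [{ id: 1 }, { id: 2 }]
programmers = [{ id: 1, name: "Josh Frankel", project_id: 1 }]
clients     = [{ id: 1, name: "Mr. Nic Cage", project_id: 1 }]

# Returns the rows whose foreign key (fk) no longer matches any parent row id.
def dangling(rows, fk, parents)
  parent_ids = parents.map { |row| row[:id] }
  rows.reject { |row| parent_ids.include?(row[fk]) }
end

# Delete project 1 and leave the rows that point at it untouched.
projects = projects.reject { |project| project[:id] == 1 }

p dangling(programmers, :project_id, projects).map { |row| row[:name] } # => ["Josh Frankel"]
p dangling(clients, :project_id, projects).map { |row| row[:name] }     # => ["Mr. Nic Cage"]
```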

So, that didn't work out very well. Let's take another stab at it from the opposite direction.

If every Project has a Programmer and Client attached to it, then we could say that a Project belongs to a Programmer and a Project also belongs to a Client. Here it is organized into a new entity relationship diagram.

An ERD example of a Many-to-Many association

Let's double-check that it avoids the problem we ran into before.

If we delete a Project, we lose the project's data along with its two foreign key fields: programmer_id and client_id. However, we don't end up with any invalid records in our system, because the foreign keys lived only on the project row that was deleted. Both the Programmer and Client records remain completely usable as stand-alone data.

Good job! Looks like we've successfully created a working database structure. Next, I'm going to show how to create these tables from the command line, along with the Rails association methods that make them easier to use.

How to leverage ActiveRecord Associations

There are a few different types of ActiveRecord associations. has_one signifies that a model has one record of another specific model. has_many is the same, except there can be many records. belongs_to declares that this model holds the foreign key and belongs to the associated model (a Project belongs to a Programmer, for example).

These are some of the basic fundamentals of ActiveRecord associations, at least from a theoretical standpoint. For the purposes of this article, however, we will focus on many-to-many associations. These are built with two of the more challenging associations to implement: has_many :through and has_and_belongs_to_many.

has_many :through

Rails Tip
I am using the shortened version of the rails generate command which is simply rails g. Nifty!

The first step to implementing a has_many :through association is to run the Rails generator to create the models and their migration files. We will use the entity relationship diagram (pictured above) to create our application; the commands are listed below. I recommend the rails generate model syntax, as it gives you not only a model but the matching migration file as well.

rails g model Programmer name:string
rails g model Client name:string
rails g model Project programmer:references client:references
rake db:migrate

Pay close attention to the Project model command syntax. The :references type is a shortcut. For each field name it follows (programmer and client in this instance), it creates a foreign key column, adds an index on it, and marks it as a foreign key constraint for the programmers or clients table.

What is a Join Table?
A join table acts as an intermediary between two or more tables. It associates the tables with each other and provides a convenient place to store fields that belong to the relationship itself (project_name or budget would be examples).

For example, programmer:references adds a column called programmer_id to the projects table and constrains it to the programmers table. Once migrated (hint: rake db:migrate), you will be able to associate programmers and clients through the projects join table.

Next, we need to pick the proper Rails association methods. I find this part easier when I say the associations out loud (or in my head) before writing them. This is a good habit not only for the Rails methods but also before you create the initial database diagram.

This may sound silly, but it can really help you catch associations whose logic doesn't make sense. So, speaking these out loud:

"A Programmer has many projects."
"A Client has many projects."
"A Project belongs to a Programmer."
"A Project belongs to a Client."
"A Programmer has many Clients through a Project."
"A Client has many Programmers through a Project."

Alright, those seem to make sense (we already know that they should work based on the discussion at the beginning of the article). Now we need to add the Rails methods to our respective model files.

# app/models/programmer.rb
class Programmer < ActiveRecord::Base
  has_many :projects
  has_many :clients, through: :projects
end

# app/models/client.rb
class Client < ActiveRecord::Base
  has_many :projects
  has_many :programmers, through: :projects
end

# app/models/project.rb (the join model: one row per programmer/client pairing)
class Project < ActiveRecord::Base
  belongs_to :programmer
  belongs_to :client
end

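By using these associations, we now have access to a number of helper methods (16 to be exact). In the list, collection stands for the association name (such as clients) and collection_singular for its singular form (such as client). Here they are, straight from the Rails Guides.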

Built-in Association Methods

collection(force_reload = false)
collection<<(object, ...)
collection.delete(object, ...)
collection.destroy(object, ...)
collection=(objects)
collection_singular_ids
collection_singular_ids=(ids)
collection.clear
collection.empty?
collection.size
collection.find(...)
collection.where(...)
collection.exists?(...)
collection.build(attributes = {}, ...)
collection.create(attributes = {})
collection.create!(attributes = {})

Our has_many :through association is now set up and can use any of the preceding methods. Below are examples of collection.create(attributes = {}) and collection.

Create Association

programmer = Programmer.create(name: 'Josh Frankel')
client     = Client.create(name: 'Mr. Nic Cage')

programmer.projects.create(client: client)

List ActiveRecord Collection

programmer.clients
 => #<ActiveRecord::Associations::CollectionProxy [#<Client id: 1, created_at: "2016-01-25 18:45:00", updated_at: "2016-01-25 18:45:00", name: "Mr. Nic Cage">]>

has_and_belongs_to_many

An alternative way to create many-to-many associations in Rails is the has_and_belongs_to_many association, or HABTM for short. The setup is very similar to the has_many :through association, but the migration is slightly different. Make sure you run these commands on a clean database (or roll back the previous migrations and create new ones).

rails g model Programmer name:string
rails g model Client name:string
rails g migration CreateClientsProgrammers programmer:references client:references
rake db:migrate

Custom names for join tables
If you want a different name for the join table, add the join_table: :database_table_name option to your model association to point it at another database table.

Format: has_and_belongs_to_many :clients, join_table: :projects

One of the first differences to notice is that we are not creating a model for the join table, because has_and_belongs_to_many doesn't require one. Additionally, this type of association derives the join table's name from the two associated tables in lexical order: clients_programmers in this instance.
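The basic naming rule is simple enough to sketch in plain Ruby. The Rails Guides describe it as the lexical order of the two names, compared with String#<=>. The default_join_table method below is a hypothetical helper that applies just that rule. Rails itself also handles a few extra cases, such as a table-name prefix shared by both tables, which this sketch ignores.

```ruby
# Hypothetical helper: sort the two table names lexically (String#<=>)
# and join them with an underscore.
def default_join_table(table_a, table_b)
  [table_a, table_b].sort.join("_")
end

p default_join_table("programmers", "clients") # => "clients_programmers"
p default_join_table("papers", "paper_boxes")  # => "paper_boxes_papers"
```

The second call is the surprising case the Rails Guides call out. Because "_" sorts before "s", papers and paper_boxes produce paper_boxes_papers. When in doubt, pass join_table: explicitly.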

Once again, we need to declare the proper associations in our model code, so let's speak them out loud. They are slightly different from the has_many :through associations.

"A Programmer has many Clients"
"A Programmer can belong to a Client"
"A Client has many Programmers"
"A Client can belong to a Programmer"

And here they are in our model code:

# app/models/programmer.rb
class Programmer < ActiveRecord::Base
  has_and_belongs_to_many :clients
end

# app/models/client.rb
class Client < ActiveRecord::Base
  has_and_belongs_to_many :programmers
end

Once again here is how you would utilize the associations in your code or console.

Create Association

programmer = Programmer.create(name: 'Josh Frankel')

programmer.clients.create(name: 'Mr. Nic Cage')

List ActiveRecord Collection

programmer.clients
 => #<ActiveRecord::Associations::CollectionProxy [#<Client id: 1, created_at: "2016-01-25 18:45:00", updated_at: "2016-01-25 18:45:00", name: "Mr. Nic Cage">]>

Now, you might be asking yourself, "Why would I ever use has_many :through when has_and_belongs_to_many is much easier to set up?" The next section explains the downsides of HABTM and why has_many :through is generally the best practice.

When should I use them? has_many :through vs HABTM

While it looks like less work to use a has_and_belongs_to_many association, it actually can end up costing you a lot of time down the road.

Validations

Imagine you build a system with the aforementioned HABTM association. One day your client (your actual client, not the database table in this article) requests that every project must have a name and a description, or else it is invalid. With a HABTM association, there isn't a model in which to place the validation code. With has_many :through, you have a model ready to use. Once the name and description columns are added to the projects table, you could write something like this:

class Project < ActiveRecord::Base
  validates :name, presence: true
  validates :description, presence: true
end

Shared Functionality

Furthermore, suppose this same client (again, the person paying you for the application) requests that whenever a project's deadline_date has passed, the system automatically marks the project as closed. With a HABTM association, you would need to create an additional class, or potentially a concern, to package this functionality. With has_many :through, the join model gives you a logical place to bundle the shared methods.

Here is an example of those shared methods, building on the model validations from above:

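class Project < ActiveRecord::Base
  validates :name, presence: true
  validates :description, presence: true

  # Close the project once its deadline has passed.
  def status_update
    close_project unless active?
  end

  # Assumes a deadline_date column; Date.current respects the app's time zone.
  def active?
    Date.current < deadline_date
  end

  # Assumes a status column (for example a string column or an enum).
  def close_project
    self.status = :closed
  end
end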

With a HABTM association, there is nowhere to put this code without creating more files. This is by far its largest downside. has_many :through, on the other hand, gives you a model for your join table, allowing you to validate fields and add shared functionality through methods.

Difference in creation

The two associations also differ in some of the built-in methods they generate. With has_many :through, you can create records on the join model's collection, as seen above, in the following format:

# First example
client = Client.create(name: 'Chris Cornell')
programmer.projects.create(client: client)

# Another way of doing the same thing
programmer.projects.create(client: Client.create(name: 'Keanu Reeves'))

Unfortunately, since a HABTM association has no join model, we need to build the association on the target collection itself. Perceptive readers might have noticed this earlier in the HABTM create association example. Here it is again:

programmer.clients.create(name: 'Mr. Nic Cage')

Notice how for HABTM, we use programmer.clients.create(client_attribute: value) to build the collection, unlike a has_many :through where we can use the join table model like so: programmer.projects.create(client: client_object). Just another difference between the two association methods.

Existing database schemas

The best use case for a HABTM association is when you are working with an already-built database. For whatever reason, a has_many :through association might not fit the existing structure. Or maybe the system already relies on a custom HABTM association, and removing it could irrevocably break parts of the system.

A little dramatic I know.

These sorts of cases typically happen with legacy applications, as HABTM is an older feature of Rails. With some careful refactoring they could be removed from the system, provided a good suite of tests is in place.

Recommended usage

RailsGuides
"The simplest rule of thumb is that you should set up a has_many :through relationship if you need to work with the relationship model as an independent entity. If you don't need to do anything with the relationship model, it may be simpler to set up a has_and_belongs_to_many relationship (though you'll need to remember to create the joining table in the database).

You should use has_many :through if you need validations, callbacks, or extra attributes on the join model."

I would recommend sticking to has_many :through associations 99% of the time. They end up being much cleaner, and they save you time down the road by giving you a model that is tied to your database's join table.

The other 1% of the time, has_and_belongs_to_many is useful for legacy relational database schemas and older systems, where HABTM often makes a lot of sense.

Conclusion

While has_and_belongs_to_many associations are quicker to set up, they tend not to scale well as your application requires more functionality. has_many :through associations, on the other hand, are very versatile and give you the added benefit of a model mapped directly to your join table.

Both types of many-to-many associations require a migration for a join table, a table that sits between two other tables and associates them with each other. This table is mapped directly to a model when using a has_many :through association, but not when using HABTM.

TL;DR

has_many :through

Requires a join table in the database
Creates a model for the join table, which allows shared functionality and validations. One-to-one mapping between database table and model.
Scales better and is more versatile
Allows for creation of objects on the collection programmer.projects.create(client: client_object)
Takes slightly more setup work

has_and_belongs_to_many

Requires a join table in the database
Does not map a model to the join table, which means one less file but no validations or shared functionality without the creation of a second class
Allows for custom join table names via: join_table: :database_table_name
Faster to set up
Best for legacy architecture

So we've gone through the basics of designing a good many-to-many database structure, the console commands needed to create proper migrations, and the different setups of Rails' two many-to-many associations.

Was there something you would like added to this article? Got a better use case for the has_and_belongs_to_many association? How about a favorite association? Weigh in on it in the comments below.
