Three ways of building and altering a users table Synopsis John Doe just started a new job as a CTO in a Uber-like startup. He will have to scale, disrupt and innovate. But for now, he needs a users table. : Users are simple entities with ID, email, password, name and role (customer, driver or admin). Day 1 : John adds a phone number to all users. Day 2 : New use cases, business is growing. Users need country, gender and optional company. Day 3 : Let’s add a bio to driver users only. Day 4 : Some clients want to add custom properties to their customers account. Day 5 John could have met the day-to-day needs in an infinite number of ways. Let’s examine 3 scenarii amongst others (John chooses PostgreSQL). 1st Scenario — Bigmouth Buffalo’s Path Day 1 John creates a table with 6 columns: , , and users id ( uuid , primary key), email (varchar) password (varchar) role (enum: customer, driver, admin), created_at updated_at ( timestamps ). Day 2 John alters the table and adds a column is nullable: the users created on day one do not have a phone number, and John will maybe never know their phone numbers. users phone ( varchar , nullable). Phone Day 3 John alters again the table and adds , , and users country (varchar, nullable) gender (enum, nullable) company (varchar, nullable ). Day 4 Biographies are only for drivers. There are at least two possibilities: Create a column (NULL for non-drivers) nullable bio text Because it’s only for drivers, create a table with FK on and a column. drivers_biographies users bio text Each possibility has its pros and cons, John chooses to add a new table because he dislikes having too much NULL and he is certain he never will add biography to other account types. Maybe he is wrong. Day 5 Clients can now ask for custom fields on their users. So it’s some kind of John add nullable new column on demand on its users table (client ask for users’ pets names, John adds a pet_name column). But it does not sound scalable to him, maybe he is wrong. dynamic variable properties. could So, he decided to create two new tables: and . The table has 2 columns: properties users_properties properties id (uuid) Example: “Pet name” label (varchar) The has 3 columns: users_properties user_id (foreign key on _users_ table) ( ) Example: the id of “Pet name” property_id foreign key on _users_properties_ table value (uh… Something? it depends. So, Text I guess?!) Now every time a client ask for new properties on its users, John’s application adds a new entry in table. Then, for each new user with custom properties, some lines are added in users_properties table for its own properties ( pattern). properties EAV-like Post-mortem — Mary Foobar analysis There seems to be some problems with this approach: is an optional column which is empty most of time. There is only one for now, but what about day 12? company is a one to one table, it’s overkill. drivers_biographies Mixing users’ data in two tables is a mess. Why are some fields ( , ) in table and some other ( , ) in table (with a different schema)? John’s database architecture could be considered inconsistent. company gender users mobile_phone pet_name users_properties Querying a user with its properties would be harder than usual. Mapping to objects (assuming John uses this kind of thing) would be complicated too. There are 4 tables at day 5, only for describing users. Every new column in table has to be nullable: theses columns were added the first insertions of users, which have no value. users after Quick and dirty implementation of the Bigmouth Buffalo’s Path. Maybe John could have built a better architecture. 2nd Scenario —Bluefish’s path Day 1 John is foresighted. As of Buffalo Mouth’s Path, he creates a table with , and But he also creates immediately a table ( model), similar to approach. is a table with ( ), ( ), and ( ). users id , email password role . users_properties Entity-Attribute-Value Wordpress users_properties user_id foreign key on users table key varchar, example: name, company, phone, etc. value text Day 2, 3, 4, 5 John is relaxed, he does not have to alter its table schemas. Its program just adds new users in database with their new properties. He created an admin panel to quickly add new properties. Post mortem — Mary Foobar Analysis There is one benefit over the previous design. It’s simple to explain and understand, each property is located in one unique table. EAV is a well known pattern, even if John had not heard about it, he would have invented it himself. There are only two tables for describing users metadata, and there will not be more. But there are also caveats: Simple queries become versus a classic table with one column by property. becomes: complex and unreadable SELECT name FROM users WHERE age=50 AND gender='f' SELECTname.meta_value AS name,FROMusers_properties age,users_properties name,users_properties genderWHERE age.key = ’age’AND age.value = ’50'AND age.user_id = name.user_idAND gender.user_id = age.user_idAND gender.key = ’gender’ and gender.value = ’f’ The table is not readable with human eye. SQL is about rows and columns, the model is destroyed by EAV. Unreadable random table found on Google Images Duplicate data is harder to detect, no data types, no data hierarchy, no consistency, etc. 3rd Scenario — Rainbow Trout’s path Day 1 As many else, John took an interest years ago in “NoSQL”. He tried and abandoned MongoDB to return to its first love, PostgreSQL. He remembers feeling schema-less data can have benefits in some cases, especially in variable . So this time, John creates a table with , , , ( ) and . Its metadata column is schema-less, he could store objects like: metadata users id , email password role metadata JSONB created_at updated_at {"phone": "+33612345678", "company": "My company", "gender": "f"} Day 2, 3, 4, 5 Table schema does not change. Post mortem — Mary Foobar Analysis There are some benefits with this design over previous paths: Only one table for describing one entity: users. Queries are easy to write: select * from users where metadata->'age'=50 Data is easy to read in one row per user (to be fair, the JSON part is a bit harder to read if it grows). EAV is avoided for user defined fields. There are some warnings too: John should remember to add a on metadata. GIN index John should not add relation in metadata. With a JSONB column, there is a great temptation to add everything in that column. If John adds a new entity, he has to create a new table, not add a sub-property like: { "bookings": [{ "id": "aaa-bb-cc", "date": "2017–01–12", "duration": "10 days"},{ "id": "xxx-yy-zz", "date": "2017–02–10", "duration": "1 day"},]} Final Thoughts In my tinkerer career, I created table following these three paths (not precisely, but similar) and some other: we could think about dynamically adding column and tables, changing for another DBMS, etc. For the last few months, I’ve been following the third path. I’m sure there is no « right » way, and this path has many hidden caveats too (I should read more) but I’m OK with it for my day-to-day work. I migrated a legacy system with 60+ half-useless tables to about 10 tables by removing metadata tables. Not sure it’s better anyway, and I’m not saying less table is better, but in this specific case, I think code and database are easier to read, create, update and delete. I know I will discover a new path in a few month and be ashamed of what I have done. users The Rainbow Trout’s path is OK for users table, I don’t think it’s a good idea to add JSONB everywhere on every table every time. It may help to store . metadata only Not relations. I did not speak about indexes, that’s not the matter, but please use index everywhere. I only talked about PostgreSQL, maybe SQL is not the thing to build a users database. I’m still astonish I did not find an obvious best way to build a users table after all these years. Did I miss something? Feel free to comment with advices, feedback and criticism. I would be really happy to learn more. Sorry for long post, here is a potato