GUID’s are unique identifiers. When you create one it is guaranteed there will never be another one that is the same. Not once. Nowhere. (At least that’s the theory, in reality GUID’s are simply really – and I mean really, really – large random number that are very extremely unlikely to be repeated). GUID’s are called UUID’s sometimes (for example in Java, GUID is actually the Microsoft name) and in SQL Server they call them “uniqueidentifier”. However all of them mean the same thing: A 16 byte (128 bit) number. For a more detailed definition of GUID’s read this: http://en.wikipedia.org/wiki/Uuid.
They have a big advantage: They don’t need a central coordination to be created. This means wherever I create them they are guaranteed to be unique. This is great if you work with databases. It means you can create unique records without being connected with the database. It means as well, you can merge databases without a problem!
Therefore I use GUID’s (uniqueidentifiers) as primary keys when I design a database.
Funny enough I don’t see many people doing it, and I have almost always to convince colleagues using it. The main arguments against is are:
- They are big and therefore slow
- You can’t read them
Before I start going in to more details I have to mention: The following examples assume a Microsoft environment (SQL Server, .NET Framework).
GUID’s are bigger than your integer primary key (normally 4 times bigger – 16 bytes instead of 4 bytes). The fact that they are bigger usually is not an issue in itself. Most DB’s these days support huge sizes and having a bigger ID won’t make the difference. A lot of people think though that your database will be slower because you have bigger primary keys. That is usually a much bigger issue for people and therefore needs clarification:
We have to separate the issue of speed in two discussions: query statements and insert statements. In Query statements there is almost no difference using GUID’s or int’s. If you think about it, it is logical: You have an index that is sorted and you will do a binary tree search over it. All you do is comparing a few numbers. Comparing a 4 byte or a 16 byte number makes (almost) no difference and will therefore not make your query much slower.
Inserting is a bit more tricky: Inserting a record with a GUID primary key takes considerably longer than its integer counter part! This has a simple reason: If you use an auto incrementing integer primary key you have a nice little side effect: Your record will be inserted at the end of the index (as your primary key was incremented). If you use a GUID (due to the random nature of a GUID) it will be inserted somewhere in the index. Finding this place in the index and moving around the data (resulting in page splits) takes time. If I say it takes time I’m talking about miliseconds. If you insert single records that will make no difference at all. If you plan to insert huge amounts of data on a regular basis however (thousands of records in one go) it might be a performance issue. But there is a solution! Using the NEWSEQUENTIALID SQL Server generates GUID’s that have an incrementing value (http://msdn2.microsoft.com/en-us/library/ms189786.aspx). With this approach you will have a very good performance that is comparable to using integer primary keys.
Whenever I hear someone claiming this and this is faster than that and that I say: We don’t know until we run a test! Luckily someone has done exactly that for me: http://www.sql-server-performance.com/articles/per/guid_performance_p1.aspx.
The second argument is that GUID’s are not readable and not easy to remember. I must say this is a feature, not a bug! I think there is something wrong if you want users to remember primary keys. We are in the 21st century… If you need an Id (lets say an order Id, so they can talk to a sales rep) then generate one separately, but do not use it as your primary key.