A GUID is a long sequence of characters that are supposed to be globally unique -- that is, the likelihood of a particular sequence of characters appearing (returned by the GUID generating function) more than once on any computer running on our globe is miniscule. Thus, you can use GUIDs when you need unique values that span, say, database instances or multiple networks.
My latest obsession is a product named Qute - the Quick Unit Test Engine. You can find out more about it at www.unit-test.com. We use GUIDs for primary keys; this makes our work substantially easier since the product and underlying data are installed on a user's computer, and we do not have to worry about possible conflicts with sequence-generated primary key values.
The Delphi frontend in Qute also takes those GUIDs and hashes their values to integers, which are then used to index arrays of data and speed up access to browser data. All fine so far.
Unfortunately, a number of users reported an error involving a "Hash Mismatch" on starting up of Qute. After lots of patient cooperation from those users, we were amazed to find that the Oracle SYS_GUID function (the GUID generator offered by Oracle) was returning sequentially-incremented values, rather than pseudo-random combinations of characters.
We then had a bug in our Delphi code such that the hashing algorithm was set to "quick and not very secure" -- meaning that strings which were not very different were hashing to the same number, which then triggered our error. We adjusted that algorithm, making it more robust, and the problem seems to have disappeared. But I thought I should let the world know about our experience.
To see if you have this issue in your installation of Oracle, run the following script:
SET SERVEROUTPUT ON
FOR indx IN 1 .. 5
DBMS_OUTPUT.put_line ( SYS_GUID );
You should see five rows of wildly different values, like this:
But some Qute users on AIX and Compaq 64bit operating systems were instead seeing results like this:
See the difference? The algorithm that Oracle is using on these operating systems is clearly faulty; yes, the values are unique but they are far from even pseudo-random.
The moral of the story is: if you are going to rely on the Oracle SYS_GUID function to generate GUID values, test the algorithm and make sure you are satisfied and can live with the results.
Going Global with GUIDs
Steven Feuerstein, Copyright 2005
Originally published in Oracle Professional, Ragan Communications
What is a GUID?
You've probably heard about and even seen some funky-looking string that is referred to as a "GUID". This article explains what a GUID is, how they can be very useful, and how you can work with them inside Oracle.
Let's start with a formal definition of a GUID, taken from Wikipedia (http://en.wikipedia.org/wiki/GUID):
"A Globally Unique Identifier or GUID is a pseudo-random number used in software applications. Each generated GUID is "mathematically guaranteed" to be unique. This is based on the simple principal that the total number of unique keys (264 or ) is so large that the possibility of the same number being generated twice is virtually zero. The GUID is an implementation by Microsoft of a standard called Universally Unique Identifier or UUID, specified by the Open Software Foundation (OSF)."
Ah! So a GUID is actually the Microsoft version of the UUID, about which Wikipedia tells us:
"A Universally Unique Identifier is an identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that that identifier will never be unintentionally used by anyone for anything else. Information labelled with UUIDs can therefore be later combined into a single database without need to resolve name conflicts. The most widespread use of this standard is in Microsoft's Globally Unique Identifiers (GUIDs) which implement this standard.
"A UUID is essentially a 16-byte number and in its canonical form a UUID may look like this:
So, in point of fact and strictly speaking, a GUID is a special (Microsoft-specific) case of a UUID. Perhaps because "GUID" is easier and more fun to pronounce than "UUID," it is the most common of the two terms, even when used in a non-Microsoft context (as we find in Oracle and as I will be doing in this article). I will continue, therefore, to refer to this number as a GUID.
One comment regarding the "mathematical guarantee" mentioned above: as far as I understand, no one has come up with an algorithm that really does quarantee a unique GUID. It is possible, though very, very improbable, that you could generate a GUID that conflicts with an existing value.
GUIDs are GOOD for...
The core value of a GUID is expressed in a single line of the UUID definition above, namely: "The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination." This statement should get Oracle database programmers thinking about two things:
- Distributed databases: they are certainly a kind of "distributed system." So GUIDs should help us identify unique objects not just in a single instance but across different database instances.
- Primary keys and unique indexes: we use these constraints to "uniquely identify" rows of information within a given relational table. Perhaps a GUID could be used for these situations?
If I am writing code for an application that resides in a single database instance, then GUIDs are probably not very compelling. Suppose I face a different scenario: my application runs on many different instances. Each instance has its own users inserting their own data. On a regular basis, however, data from one instance must be merged into another instance.
In this scenario, reliance on sequence-generated primary keys gives me serious heartburn. Since users are inserting rows independently in multiple instances, sequence values are being used up at different rates. When I move data from one instance to another, I have to assume and program for primary key conflicts of various kinds.
This is exactly the situation I encountered as I built Qnxo (www.qnxo.com). This product provides access to a large and ever growing/changing set of PL/SQL scripts (reusable code and generation templates). Any changes to my central PL/SQL repository must be distributed to users. In my first phase of design and construction of Qnxo, I relied on sequence-generated primary keys. As soon as I realized the error of my ways, I converted as much as possible over to GUIDs.
Let's take a closer look at some of the lessons I learned about applying GUIDs in the PL/SQL environment.
Oracle and GUIDs
To help us work with GUIDs, Oracle provides a SQL function named SYS_GUID, documented as follows:
"SYS_GUID generates and returns a globally unique identifier (RAW value) made up of 16 bytes. On most platforms the generated identifier consists of a host identifier, a process or thread identifier of the process or thread invoking the function, and a non-repeating value (sequence of bytes) for that process or thread."
The difference between a RAW and a VARCHAR2 is that RAW string is not explicitly converted to different character sets when moved between systems by Oracle. This point is critical for GUIDs, since we wouldn't want our GUIDs to be changed as they move from one system to another. Conversely, it is also a non-issue for GUIDS, because the characters used in GUIDs are drawn from a restricted subset of ASCII characters, which are static across all character sets.
This function is also exposed in PL/SQL through the STANDARD package as follows:
I can therefore immediately obtain a GUID from within my PL/SQL program by writing code like this:
And here is the output from running this code:
Formatting for GUIDs
As Wikipedia specified earlier in this article, ""a UUID is essentially a 16-byte number and in its canonical form a UUID may look like this: 550E8400-E29B-11D4-A716-446655440000."
In addition, in many situations, the common presentation format for a GUID is:
So as you decide to incorporate GUIDs in your code and underlying database, you should decide how you want to store the GUIDs. You could stick with the unvarnished RAW values, or you could store a formatted version, if that is how the GUIDs are going to be used in your application.
We decided that in the Qnxo, we decided that the GUIDs would be stored with the full formatting, including dashes and the open and close brackets.
Using GUIDs in Oracle tables
Let's look at how we can use GUIDs inside a relational table.
Below I create a table using a GUID as the primary key for a table.
When I insert a row into this table, I can use the SYS_GUID function to generate my unique GUID. I try to do it this way:
but I get the following error:
Now I can select the data from this table:
That's certainly easy enough! Of course, since we want to make certain that a valid GUID is placed is in this column, it would be best to encode that logic into a BEFORE INSERT trigger on this table, as in:
And I can now insert into this table as simply as this:
In the case of Qnxo, since we decided to store the formatted version of the GUID, we created tables with primary keys defined as follows:
Encapsulating Oracle's GUID function
If you decide that you want to work with Oracle GUIDs in the RAW format and never need to deal with the formatted version, then you are pretty much ready to go. If, on the other hand, you want to able to work with GUIDs in a "standard" format, you may want to take advantage of the guid_pkg I wrote to assist in Qnxo development.
The elements in the package are:
A SUBTYPE that encapsulates the RAW(16) declaration of a GUID and gives it a name. With this subtype in place, you can declare GUIDs in your code as follows:
Another SUBTYPE, this time designed to allow you to easily and readably declare variables that hold a formatted GUID, used as follows:
A mask that specifies the valid format for a formatted GUID. It also serves as a wildcarded string that I can use within a LIKE statement to determine if a string matches the standard GUID format.
A Boolean function that returns TRUE if the string passed to it has a format consistent with the standard GUID format.
Two overloadings of functions that return a formatted GUID. In the first overloading, you pass a string, which could be a RAW GUID value or a VARCHAR2 string. It will return a string containing the GUID in the standard format (as specified by c_mask).
Here is a script that exercises this package:
With this output after execution:
The implementation of this package is shown in Listing 2. It relies on the SYS_GUID function to generate a new GUID value, but otherwise is devoted to applying the format mask and converting un-formatted GUIDs to strings that contain the correctly-placed hyphens.
Listing 2. The body of the guid_pkg package
GUIDs: a nice addition to the toolbox
I'd heard of GUIDs, but never had reason to work with them before developing Qnxo. I started with standard, sequence-generated primary keys in my script table, but found that this approach was sorely lacking when it came time to distribute that content (reusable scripts and templates) to my user base.
By relying on GUIDs, I was able to greatly simplify the logic needed to upgrade distributed content from a central source. Not only did I avoid the problem of conflicting sequence-generated primary keys, but I was also able to keep intact foreign key references, since they were based on unchanging GUIDs.