July 28th, 2006
Leaving an Audit Trail In Your Database
Part of your role as a database architect is to save users from themselves. It’s inevitable that critical records will get modified or deleted from time to time. If the system has more than one user, the frequency increases exponentially. It’s tempting to take the approach that if users are going to be liberal about destroying sensitive data, that it isn’t your fault (true) and it isn’t your problem (not entirely true).
Of course, any system with critical data is being backed up, but backup systems were primarily designed to restore an entire lost database. Very few packages make it easy to restore isolated pieces of data. In many cases you have to roll the entire database back to a moment in time before the data in question was modified or deleted, losing all the changes that have occurred since. A better solution is to make it difficult for users to permanently remove or overwrite vital information. Build audit trails into your database.
Recreating the Crime Scene
Let’s start with the basics of setting up an audit trail. Sometimes it’s enough just to know who changed the data and when. If users have the power to insert or update records into a table, you’ll be glad that you included the following four columns:
- created_by – name or id number of the user who created this record
- created_date – date (and possibly time) the record was created
- modified_by – name or id number of the user who last modified this record
- modified_date – date (and possibly time) the record was last modified
It may seem like it’s a lot of work to include these columns, especially if you have a large database, but they will save you time in the long run. (This information can also be useful when debugging a system that is still in development.)
Nothing Is Ever Forgotten
The created and modified data won’t help us when records are deleted. The solution, of course, is to not delete records. Wait… what? That’s right you heard me.
The users don’t have to know they aren’t deleting records. Their interface will still have a lovely delete button next to customer John Doe. They will still have a prompt asking if they really want to delete John Doe from the customer table. Yes, they’re sure. John Doe no longer appears in the customer list, and your users are happy.
On the back end, John Doe still exists in the database. The customer table has a column named archived or inactive or any other name that makes sense to you. Instead of deleting the record you set the value of archived equal to true. When you are generating the customer list for the users, you filter out any records that have been “deleted”. As far as the user is concerned, the record has been deleted. When they call you in a panic because they meant to delete Jane Doe instead of John Doe, you can change John’s archived value back to false and you’ve magically recovered the data. (Man, you’re good.) In large tables with a lot of deletions, this approach may not be practical. Your table will simply get too big and the performance of queries involving the table can drag to a slow crawl, but with small and medium sized tables it’s a great solution.
History Is Written By the Victors
When you really need to bring out the big guns, the best solution is to keep a complete history of all changes made to the record. Surprisingly this isn’t as big an undertaking as it sounds. As a bonus, if you keep a complete history, the information in the five columns described above will be stored in a separate table, keeping your main table sleek and easy to work with.
It’s easiest to understand if we work backwards. I’ll start by showing an example of the complete history model, and then explain how it was created. We have two tables: customer and customer history. Customer is the current information about a customer. Customer history has a record for every change that has ever been made to a customer. Below is a few records from each table.
We can tell from the history that on March 26th, Andy added John Doe to the customer table. Andy updated John’s phone number in June. Jim added the middle initial “Q” almost a week later. This kind of history turns the potentially difficult process of figuring out how a record got changed into child’s play, and it’s very easy to create.
When John Doe is first added to the database, you’ll run two insert queries: one into the customer table and one into the history table. Most of the information is the same, but in the history table you include “insert” as the action, who’s creating this record (Andy, in this case), and the date. When John Doe’s information is updated, you run an update query on the customer table and an insert query on the history table. In the history table, you’re inserting John’s latest information, “update” as the action, who’s doing the updating, and the date.
It’s not exactly rocket science, but taking the extra time to set this up, means never having to tell a client you’re sorry, but you don’t know what happened to their data. The third action to track is a delete. When John Doe is deleted from the customer table, you insert a record into the history table with the last version of his information, the action “delete” and so on.
I’ve purposely avoided triggers in this discussion because I wanted to focus on a simple explanation of the concept of history tracking. A history table is an excellent use for them, though, if you are interested in learning more about triggers.
Note that the customer table does not have any created_by or archived columns. The purpose of those columns was to preserve information about the versions of each record, but all of that information (and more) is stored in the history table. In some cases it may even be useful to give your users read only access to the history table.
As a sidebar, you may find yourself in a position where you are attempting to sell your services to someone who is intimidated by technology. This person prefers to keep a paper copy of everything in a file cabinet because “computers can’t be trusted.” You probably can’t change their mind, but demonstrating this kind of complete history tracking is going to make this potential client feel much more comfortable placing his or her data in your hands.