Bates Number vs Hash Value

There is a fascinating thread currently playing out on the LitSupport Listserv concerning the ability to create files of different contents and size with the same hash value when using the MD-5 hash. I won’t get into the details other than to say that the original discussion centered on the ability to hack digital security certificates, with various forensics experts opining that the ability to break an MD-5 hash “on demand” is imminent. (See for example  where the authors assert that “… sufficient computing power might allow to manipulate files in such a way (e.g. append data) that their MD5 values will match that of known irrelevant files so that they go unnoticed in a forensic examination.”)

It seems to me, tho, that once again we have a disconnect between the lawyers and the techies. Aren’t we really talking about using hash values as a substitute for Bates numbers? And isn’t the point of that exercise to identify the source of a document during a production? After all, in the legal world, the ultimate authentication of a document comes from a person, not a program.

Sam Gilcrist is a legal technologist with technical training and a background in the lit support field. He made exactly this point in his post on the thread earlier today when he said “Documents are authenticated, not by a hash value, but by the litigants themselves. In other words, during review and when the document is entered into evidence the document is validated by the witness as either being the email / contract they signed or not.”

Exactly. As Sam also points out, we use hash values largely to de-dup document sets because of cost constraints, and “… unless we collect a complete bit-stream image of the entire computer/network and all backup media for the relevant period of time, hashing is just a waste of time.” So let’s stop arguing about who has the most technical expertise to play “hide the hash” and concentrate on what we need to worry about: finding and reviewing relevant documents.
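For readers who haven’t seen what hash-based de-duping actually looks like, here is a minimal sketch in Python. It is not how any particular processing tool does it; the function names (file_digest, group_duplicates) and the choice of SHA-256 as the default are my own assumptions for illustration. All it does is hash the bytes of every file under a folder and report the files that share a hash value.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file's bytes, reading it in chunks."""
    # hashlib.new accepts an algorithm name; SHA-256 is an assumed default.
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_duplicates(root, algorithm="sha256"):
    """Map each hash value to the files under root that share it.

    Only hash values seen on more than one file are returned.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path, algorithm)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling group_duplicates on a collection folder returns one entry per set of byte-identical files. Note that this only says the bytes match; it says nothing about where a document came from or whether it is authentic, which is the point Sam is making.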

Working with enormous amounts of data is still expensive, even if we use hash values instead of Bates numbers. So what’s the answer? Well, of course, here at the docNative Paradigm we think it’s native file review instead of enormous processing costs. And so did poster Julie Wade when she said (also earlier today) “Just stop processing and go native.”

I couldn’t agree more. I’ll report more on this thread as the week progresses, but in the meantime, a Happy New Year to one and all.

