HASH discussion continued

For those of you who follow the LitSupport listserv, you know that this discussion has been continuing ad infinitum. I was having trouble following the often highly technical discussions but was bothered by the assertions that MD5 hashing could produce “duplicates,” so I asked the best experts I know in the forensics field: John Simek of Sensei Enterprises and Atty. Craig Ball.

Now I have to admit that even their responses gave me a headache, but here are portions of those responses that shed some light on the issue. John Simek said: “…. is it possible to have two different (and actually usable files) contain different contents with the same MD5 hash value. My response is that anything is possible given enough time and money. But is it probable? I say no.” Craig Ball went so far as to say that the example mentioned in my previous post “… poses a security issue certainly, but it doesn’t meaningfully impact the viability of MD5 hashes in electronic discovery and computer forensics.”

Why? As Craig went on to point out: “So, yes, it’s possible to create two different files with colliding hash values–I demonstrated that over four years ago when I fashioned and published “apparently intelligible” colliding files building on the work of the Chinese cryptographers which Stefan Fleischman identifies in the article to which you point–but to call these “documents” or leap to the conclusion that we can fashion a colliding, intelligible value for a particular hash value is a big stretch. … To create an intelligible “document” (in the sense we speak of a writing or image) and then alter that intelligible document to hash match the value for a known NIST NSRL irrelevant file remains well beyond our reach, and I’m persuaded it will, as a practical matter, remain so for some time. The people posting on the topic aren’t discussing anything that’s new in the last several years and lack a fundamental understanding of how the vulnerability works and how little impact it has on how we use MD5 in CF and EDD, outside of the public key/private key infrastructure.”
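In practice, the known-file matching Craig describes comes down to hashing each file's native bytes and checking the digest against a reference set of known hash values. The Python sketch below is a minimal illustration under stated assumptions: `known_md5s` is a hypothetical in-memory set of lowercase hex digests standing in for a known-file list such as the NIST NSRL (loading the actual NSRL data is not shown), and the helper names are made up for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's raw (native) bytes, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def filter_known_files(paths, known_md5s):
    """Split paths into (known, unknown) by whether their MD5 is in known_md5s.

    known_md5s is a hypothetical set of lowercase hex digests standing in for
    a known-file reference list such as the NIST NSRL.
    """
    known, unknown = [], []
    for p in paths:
        (known if md5_of_file(p) in known_md5s else unknown).append(p)
    return known, unknown
```

Files whose digests land in the known set can be set aside as irrelevant; everything else goes on to review. Reading in chunks keeps memory use flat even for very large files.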

Finally, as further affirmation of those two opinions, Herb Roitblat of OrcaTec posted a comment on the listserv saying: “… Nevertheless, these are tricks. If there is anything to be learned from this example, it is that displays of electronically stored information may not faithfully reflect the content of the files that they are displaying. The very same code, which is what is hashed, can be used to display wildly disparate information. Look at the native files.”
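Herb's point rests on what a hash actually covers: the file's bytes, not whatever a viewer renders from them. A minimal Python illustration with made-up sample strings: an exact byte-for-byte copy (say, the same file saved under another name) yields the same MD5 digest, while an ordinary one-byte edit yields a different one.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 of raw bytes; file names and display settings never enter the digest."""
    return hashlib.md5(data).hexdigest()


original = b"The quick brown fox"
same_bytes = bytes(original)            # exact copy, e.g. the same file under another name
one_byte_changed = b"The quick brown fax"

print(md5_hex(original) == md5_hex(same_bytes))        # True
print(md5_hex(original) == md5_hex(one_byte_changed))  # False
```

This only shows that the digest tracks the bytes; it does not reproduce the engineered collision examples discussed above.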

Indeed … look at the native files. As Craig Ball so succinctly put it: “The sky is not falling in hashville.”

1 comment so far

  1. Rob Robinson on

    Excellent consolidation of expert commentary – complexity and practicality indeed seem to be able to coexist.

    Do similar challenges exist with newer alternative hashing algorithms (e.g., SHA-256/512)?

