EDRR Wraps Up

The 12th Annual Electronic Records and Retention Conference in San Francisco finished yesterday and was once again a great experience. This smallish conference is put on by Thomson LegalWorks every fall in New York, Chicago and San Francisco, with Houston added to the list this year. The conference is co-chaired by Browning Marean of DLA Piper, Laura Kibbe of TLC and George Socha, and its educational sessions feature a high concentration of practicing litigators and judges who bring a combination of expertise, experience and enthusiasm that results in an enormous amount of relevant and comprehensive course material.

As I mentioned in my post yesterday, two areas of discussion stood out for me. The first is the issue of ethics and competent representation revolving around the technical expertise that Ralph Losey and Ken Withers have been championing. Although I agree with both of these great minds on their analysis, I personally think the core problem goes a bit deeper, to an overall understanding of the discovery process and the basics of civil procedure. That issue has been discussed in depth by Magistrate Judge Grimm in the Mancia decision, but I would also point you to an article written by Judge James Rosenbaum in the July 2007 issue of Federal Lawyer entitled “The Death of E-Discovery”.

Judge Rosenbaum (who was a speaker at the EDRR Conference in San Francisco) writes that discovery is discovery and e-discovery is nothing more than “…old wine in electronic bottles”. Comparing the growth of the ED market to the voracious plant Audrey in The Little Shop of Horrors, Judge Rosenbaum posits that all good discovery still relies on “…the sensible winnowing process that good lawyers have performed for years.” Bravo. As I’ve always said … computers are fast and people are slow, but computers are dumb; only people are smart. Good computers don’t replace good lawyering.

Which leads us to the second large issue: concept searching. Once again, Judge Grimm has led the charge up the electronic hillside with his decision in Victor Stanley, which derides the use of simple search technology and puts forth the need for concept searching. Two problems: one, nobody knows what concept searching is and two, lawyers don’t trust it.

An excellent panel on this topic at the conference featured two of the people I think are most knowledgeable in this field: attorney William Kellerman, the ED Manager at Wilson Sonsini, and Gene Eames, Senior Data Analytics Consultant at Spi. (The third member of that list is Dr. Herb Roitblat of OrcaTec, who unfortunately wasn’t at this conference.) The main takeaways here? Gene recommended using analytic tools at the front end of the process to help you understand the data you have, so that you can do more detailed searches as you move on, and Bill stated categorically that more lawyers don’t use analytic tools because they don’t trust the technology.

So there you have it … great analysis by great panelists of where we are heading for 2009. It also shows, more than ever, the need for ongoing education about the dramatic paradigm shifts in the discovery process.

That’s the way I see it …. now tell me what you think.

1 comment so far

  1. Herbert Roitblat on

    Thanks, Tom, for your kind words. I feel bad that, after almost 10 years of talking about concept search, it is still true that people do not generally understand what it is all about. Let me take another shot at it.
    We have launched a concept-based Web search engine, called Truevert (www.truevert.com), that may help to illustrate what concept search is all about. This one is a green search engine; its concepts are all about sustainability and environmental awareness. When you search for CFL (http://www.truevert.com/search?query=cfl), for example, it returns pages about compact fluorescent light bulbs, not the Canadian Football League. It understands the words in the context of the documents it was trained on, which, in this case, is a set of green documents.
    There are basically two kinds of technologies that provide concept information. Clustering technologies group together documents that are similar and those clusters can be said to represent concepts. These clusters do not require any kind of search, so I will not comment on them further here.
    Concept SEARCH, on the other hand, can be supported by a number of different methods. Concept information can come from an ontology, thesaurus, or taxonomy. A knowledge engineer, often with support from sophisticated experts, identifies relationships among words. For example, the ontology may contain the relationship that a car is a vehicle.
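As a rough illustration of the ontology approach described above, the sketch below stores "is-a" relationships such as "a car is a vehicle" in a small lookup table and walks them in both directions. The table contents and the function names are hypothetical, chosen only for this example; a real ontology or taxonomy would be far larger and richer.

```python
# Hypothetical toy ontology: each term maps to its broader ("is-a") term.
IS_A = {
    "car": "vehicle",
    "truck": "vehicle",
    "sedan": "car",
}


def broader_terms(term):
    """Follow is-a links upward, e.g. sedan -> car -> vehicle."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def narrower_terms(term):
    """Collect every term that is-a `term`, directly or indirectly."""
    found = []
    for child, parent in IS_A.items():
        if parent == term:
            found.append(child)
            found.extend(narrower_terms(child))
    return found


print(broader_terms("sedan"))              # ['car', 'vehicle']
print(sorted(narrower_terms("vehicle")))   # ['car', 'sedan', 'truck']
```

A search for "vehicle" could then also look for the narrower terms, which is one simple way hand-written relationships end up in a query.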
    Other approaches are based on some kind of statistical or machine learning, including latent semantic indexing/analysis, neural networks, language modeling, etc. These systems learn the pattern of word usage in a document collection. For example, if the word “lawyer” is in a document, then words like “judge,” “case,” “matter” and “Esq.” are also likely to be in the document. Conversely, if words like “judge,” “case,” “matter” or “Esq.” are in a document, then it is likely to be about the same topic as documents containing the word “lawyer.”
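To make the statistical idea concrete, here is a minimal sketch that learns word relationships from nothing more than which words appear in the same documents. It uses plain co-occurrence counting, which is a much simpler stand-in for techniques like latent semantic indexing or language modeling, and the four-document collection is invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical four-document collection, already tokenized.
DOCS = [
    ["lawyer", "judge", "case"],
    ["lawyer", "case", "matter"],
    ["judge", "case", "esq"],
    ["bulb", "light", "energy"],
]


def cooccurrence_counts(docs):
    """Count how many documents each unordered pair of words shares."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def related_words(word, docs, top_n=3):
    """Rank other words by the number of documents they share with `word`."""
    scores = Counter()
    for (a, b), n in cooccurrence_counts(docs).items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return [w for w, _ in scores.most_common(top_n)]


# "case" shares two documents with "lawyer", so it ranks first;
# "bulb" never appears alongside "lawyer" and is not returned at all.
print(related_words("lawyer", DOCS)[0])   # case
```

With a real collection the counts would normally be weighted or reduced in dimension, but the principle is the same: words that keep company are treated as related.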
    Concept search works by expanding the user’s query to include this related information. Whether the relations come from someone writing them in explicitly or are extracted from the documents, the user’s query is modified by adding these related terms to it. Systems differ in the details, but all of them work in about the same way–add terms to the user’s query to better define the search.
    Two things come from this conceptual search. First, the results are ranked. The top-ranked documents are the most focused on the query as understood from the point of view of the document collection or the ontology. The top-ranked documents highlight the most significant use of a term in a collection, they provide better education to the reviewers, and they bring together documents on the same topic. They enable reviewers to better recognize the meaning of terms and their relationships in context.
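The expansion and ranking steps described above can be sketched in a few lines. Everything here is an assumption made for illustration: the related-term table, the three sample documents, and the choice to weight original terms at 1.0 and added terms at 0.5. Actual systems differ in how they weight and score.

```python
# Hypothetical related-term table; in practice it would come from an
# ontology or from statistics learned on the document collection.
RELATED = {"lawyer": ["judge", "case", "esq"]}

# Hypothetical mini collection.
DOCS = {
    "memo": "the judge set the case for trial and the lawyer objected",
    "invoice": "invoice for light bulbs delivered in march",
    "letter": "signed by counsel esq regarding the case",
}


def expand(query_terms, related=RELATED):
    """Return term -> weight: 1.0 for original terms, 0.5 for added ones."""
    weights = {}
    for term in query_terms:
        weights[term] = 1.0
        for extra in related.get(term, []):
            weights.setdefault(extra, 0.5)
    return weights


def rank(query_terms, docs=DOCS):
    """Score each document by the summed weights of matching terms, best first."""
    weights = expand(query_terms)
    scored = []
    for name, text in docs.items():
        words = set(text.split())
        score = sum(w for term, w in weights.items() if term in words)
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored]


print(rank(["lawyer"]))   # [('memo', 2.0), ('letter', 1.0)]
```

Note that "letter" is returned even though it never contains the word "lawyer"; it matches only through the added terms, and it ranks below "memo", which contains the original term as well.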
    Second, concept search helps to overcome the difficulty of guessing which exact words the document authors used. There are often very many ways to express the same idea. Blair and Maron, in their famous study, found that attorneys were only about 20% effective at guessing the right words to search for. Because the expanded query includes related words, the system can often find relevant documents even when they don’t contain the specific query term entered by the user.
    There is nothing mysterious in concept searching. Although it is often likened to a black box, any vendor of such systems should be able to explain exactly what is going on. Our tool, for example, highlights the expanded terms so it is always transparent why a document was retrieved. If a vendor cannot or will not explain how their concept search works, find a different one. Transparency is becoming increasingly important in eDiscovery and vendors should be able to support that need.
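One way to picture that kind of transparency is a highlighter that marks the user's own terms differently from the terms the system added. This is only a sketch with made-up bracket markup, not a description of how any particular product displays its results.

```python
import re


def highlight(text, original_terms, expanded_terms):
    """Mark original query terms as [[term]] and added terms as {{term}}."""
    original = {t.lower() for t in original_terms}
    expanded = {t.lower() for t in expanded_terms} - original

    def mark(match):
        word = match.group(0)
        key = word.lower()
        if key in original:
            return "[[" + word + "]]"
        if key in expanded:
            return "{{" + word + "}}"
        return word

    return re.sub(r"[A-Za-z]+", mark, text)


print(highlight("The Judge asked the lawyer about the case",
                ["lawyer"], ["judge", "case"]))
# The {{Judge}} asked the [[lawyer]] about the {{case}}
```

A reviewer looking at the marked-up text can see at a glance whether a document was retrieved because of the word they typed or because of a related word the system supplied.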
    There is also concern about concept vs. keyword searching. Because the concept searches work by expanding the query, they include keyword searching as a subset of the results, with the additional advantage of ranking the results. Our system, for example, allows one to enter complex Boolean expressions, which can include terms that are conceptually expanded. Other systems provide similar capabilities. You don’t have to choose between keyword and concept searches; you get keywords “for free” with the concept search.
    I hope that these brief comments will help to alleviate the problem that “nobody knows what concept searching is and two, lawyers don’t trust it.”
    The OrcaTec Information Discovery Toolkit includes near-duplicate clustering, language identification, interesting phrase finding, email threading, concept searching, and (soon) semantic clustering. It is designed to be incorporated in other vendors’ eDiscovery offerings.

