Archive for October, 2009|Monthly archive page

Joint Post With Ron Friedmann On The Best Approach to EDD Search

This is a two-part, joint blog post. As Ron Friedmann explains, “I recently spent some time looking at Xerox’s new CategoriX EDD tool and writing a post about it. After reading it, I realized it would be helpful to set my discussion in a broader context. So I turned to my friend and e-discovery expert Tom O’Connor, author of the docNative Paradigm Blog. What follows is a combined post; we wrote each section individually and are cross posting this.”

Xerox CategoriX and Musings on the Best Approach to EDD Search by Ron Friedmann

In early October, Xerox Litigation Services released a new e-discovery search and review tool called CategoriX. How should EDD professionals think about this and other new search technologies?

A Xerox PR firm offered me phone time with the CategoriX product manager, Svetlana Godjevac. Always curious about new litigation document review tools, I accepted. I also read the CategoriX product sheet and a statistics-heavy CategoriX white paper explaining how Xerox tested the product.

The CategoriX approach sounds interesting and useful. Xerox R&D in Grenoble developed the product and the company appears to offer it beyond the litigation market (see the page Text Categorization and Clustering housed under Xerox Technology and Brand Licensing). The product combines ‘probabilistic latent semantic analysis’ (document clustering) with iterative machine learning.

It sounds powerful but I can’t evaluate its effectiveness. This is by no means a criticism. Both search approaches have been around for years so it’s hard for me to assess how they work in CategoriX. Learning more about CategoriX confirms what I’ve suggested before: mere mortals can no longer evaluate EDD platforms, at least not by assessing the underlying algorithms.

I lament that I don’t know enough statistics to fully comprehend the white paper, but Xerox appears to have tested the product (though the nature of the two document sets studied and of the human reviewer groups is not described). One finding I did focus on is that Xerox used this tool to quantify inter-reviewer variability. Not surprisingly, humans are not all that consistent, a fact that lawyers routinely overlook. In our conversation, Ms. Godjevac reported that Xerox does explain the statistics to lawyers and works with them to understand the problems of human review.
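For readers who want a feel for what "quantifying inter-reviewer variability" can look like, here is a minimal sketch of one common statistic, Cohen's kappa, which corrects the raw agreement between two reviewers for the agreement expected by chance. This is not necessarily the measure Xerox used; the white paper's method is not described here. The function name `cohens_kappa` and the ten reviewer calls are made up for illustration.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Chance-corrected agreement between two reviewers (hypothetical helper).

    calls_a and calls_b are equal-length lists of labels, one per document.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)

    # Share of documents on which the two reviewers made the same call.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n

    # Agreement expected by chance, from each reviewer's own label rates.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both reviewers used a single identical label throughout; kappa is
        # undefined, so this sketch simply reports full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up calls on ten documents: "R" = responsive, "N" = not responsive.
reviewer_1 = ["R", "R", "R", "N", "N", "N", "R", "N", "R", "N"]
reviewer_2 = ["R", "N", "R", "N", "N", "R", "R", "N", "N", "N"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this invented sample the reviewers agree on 7 of 10 documents, yet kappa is far lower than 70% because half of that agreement would be expected by chance alone.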

How a litigation team should choose among the available advanced tools is a real quandary. The investment to run a “bake off” among competing choices is enormous; moreover, the outcome may well depend on the nature of the documents. What does this say about defensibility in general? Would it be defensible to use product A if an objective study showed that product B was 20% better? And what exactly does 20% better mean anyway?
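To make the "20% better" question concrete, here is a small sketch with invented numbers for two hypothetical products, A and B, run against a collection assumed to contain 1,000 relevant documents. None of these figures come from a real study; they only show how the same comparison can be described in several ways.

```python
# All figures below are invented for illustration.
TOTAL_RELEVANT = 1000  # assumed number of relevant documents in the collection

product_a = {"retrieved": 2000, "relevant_found": 500}
product_b = {"retrieved": 4000, "relevant_found": 600}


def recall(product):
    # Share of all relevant documents that the product found.
    return product["relevant_found"] / TOTAL_RELEVANT


def precision(product):
    # Share of retrieved documents that were actually relevant.
    return product["relevant_found"] / product["retrieved"]


recall_a, recall_b = recall(product_a), recall(product_b)
precision_a, precision_b = precision(product_a), precision(product_b)

# "20% better" as a relative gain in recall...
relative_gain = (recall_b - recall_a) / recall_a
# ...is only 10 percentage points in absolute terms...
absolute_gain = recall_b - recall_a
# ...and here it comes with a lower precision and twice the review volume.
extra_review = product_b["retrieved"] / product_a["retrieved"]

print(f"recall A={recall_a:.0%}, B={recall_b:.0%}")
print(f"relative recall gain={relative_gain:.0%}, absolute={absolute_gain:.0%}")
print(f"precision A={precision_a:.0%}, B={precision_b:.0%}")
print(f"B sends {extra_review:.0f}x as many documents to review")
```

In this made-up case B finds 20% more relevant documents than A, but reviewers would have to read twice as many documents to get them, so whether B is "better" depends on which measure the court or the client cares about.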

Courts seem a long way off from considering this question but the leap from the current standard to one that requires comparing tools seems more a matter of degree than of kind. Are litigation support professionals obliged constantly to evaluate new tools to make sure what they now use is adequate?

Of course, I may be way off base here. Which is why I am surprised and dismayed that I haven’t found much commentary on this tool. Many other bloggers comment on EDD but I did not find much blogging (or Tweeting) about CategoriX. I would like to see more discussion of products, comparisons of them, and the future standard of what courts will rule is defensible.

[I felt this did not stop at quite the right spot so am glad Tom stepped in….]

The Challenges of Evaluating EDD Search Tools by Tom O’Connor

Ron, your comments about the problems facing anyone attempting to evaluate ED applications are right on target. First of course is the fact that one needs an engineering degree to even read some of the white papers in this field. But it seems to me that the problem starts even before that with several fundamental problems.

The first, as you mention, is that there is never enough detail given about the document sets being studied. Understanding the documents is a crucial part of any automated litigation process and evaluating products which don’t sufficiently describe the universe of documents they are working with is simply impossible. This is not a failing of Xerox alone but really of all the reviews I have seen. It is nearly impossible to cross compare applications if they are “tested” on widely divergent data sets.

In addition, some search engines use a standardized thesaurus such as the publicly available WordNet Lexical database, an open source thesaurus from Princeton University. It has over 100,000 English words and associations. As an open source resource, the WordNet database is available for download and examination if needed for litigation validation purposes. If, however, the comparison is between one program using this database and another one that uses an internal or closed database, does that really help us?

Even the widely touted TREC (Text Retrieval Conference) study suffers from this failing in my opinion. The TREC study used a test set of 7 million documents available to the public pursuant to a Master Settlement Agreement between tobacco companies and several state attorneys general. Attorneys assisting in the study drafted five test complaints and 43 sample document requests (referred to as topics). The topic creator and a TREC coordinator then took on the roles of the requesting and responding counsel and negotiated over the form of a Boolean search to be run for each document request.

The problem is those documents were not in native format and did not include attachments. Given that typical collections today consist largely of massive volumes of e-mail, many with attachments (and attachments to attachments), this is a huge issue when evaluating search capability for e-mail.

A second problem I see concerns what type of search is best. We all agree that computer searching is more accurate than human review. The Sedona Conference Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, released in August 2007, states that “Human review of documents in discovery is expensive, time consuming, and error-prone. There is growing consensus that the application of linguistic and mathematic-based content analysis, embodied in new forms of search and retrieval technologies, tools, techniques and process in support of the review function can effectively reduce litigation cost, time, and error rates.” But agreeing that machines beat manual review does not tell us which kind of machine search to use, and the assumption that concept search is better than Boolean searching, although widespread, may be wrong.

In Disability Rights Council of Greater Wash. v. Wash. Metro. Area Transit Auth., 2007 WL 1585452 (D.D.C. June 1, 2007), Federal Judge Facciola stated that “concept searching, as opposed to keyword searching, is more efficient and more likely to produce the most comprehensive results.” Judge Grimm made a similar statement in Victor Stanley, Inc. v. Creative Pipe, Inc., Civil Action No. MJG-06-2662 (D. Md. May 29, 2008).

The TREC study results, however, don’t seem to support these judicial positions. In that study, computer scientists from academia and other institutions attempted to locate responsive documents for a number of topics using 31 different automated search methodologies, including concept searching. The result? Boolean searches located 57 percent of the known relevant documents. None of the alternative search methodologies had better results.

In fact, a Boolean search generally equaled or outperformed any of the individual alternative search methods, but those alternative searches also captured at least some responsive documents that the Boolean search had actually missed. The lesson? Manual review misses many documents but so does keyword searching, Boolean searching and concept searching – but they all miss different documents. The best approach is to use multiple applications to do iterative searches which winnow down to the best possible results.
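The "they all miss different documents" point can be sketched with plain set arithmetic. The document IDs below are invented; only the 57 percent Boolean figure mirrors the TREC result quoted above, and the concept-search numbers are assumptions chosen to show a weaker method still adding value.

```python
# Invented collection: documents 1..100 are the known relevant ones.
relevant = set(range(1, 101))

# Hypothetical result sets. IDs above 200 stand for non-relevant hits.
boolean_hits = set(range(1, 58)) | {201, 202}   # finds relevant docs 1..57
concept_hits = set(range(40, 80)) | {202, 203}  # finds relevant docs 40..79


def recall(hits):
    # Share of the known relevant documents that a result set contains.
    return len(hits & relevant) / len(relevant)


# Relevant documents that the Boolean search missed but concept search found.
rescued = (concept_hits & relevant) - boolean_hits

# Running both methods and pooling the results.
combined_hits = boolean_hits | concept_hits

print(f"Boolean recall:  {recall(boolean_hits):.0%}")
print(f"Concept recall:  {recall(concept_hits):.0%}")
print(f"Combined recall: {recall(combined_hits):.0%}")
print(f"Relevant docs only concept search found: {len(rescued)}")
```

Even though the hypothetical concept search has lower recall on its own, pooling it with the Boolean results lifts recall well above either method alone, which is the argument for iterating with more than one tool.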

This isn’t late breaking news. Ron, you started a discussion in May 2008 in Concept Searching in E-Discovery. Some of the info above I gleaned from web sites and from reports by people like Herb Roitblat or Gene Eames, who know a whole heck of a lot more about this than I do. But the point is, one product isn’t going to do the job, no matter how good the product or how convoluted its documentation. And irrespective of the tool, the “operator” better be well trained or who knows what the results will be.

I share Ron’s concern about emerging standards of defensibility. Given the technical complexities and the lack of statistical certainty, I don’t see how a clear, stable defensibility standard will emerge other than what we’ve seen, namely, have a plan, apply some smarts, and document what you do. As we’ve seen in other arenas, developing standards by judicial opinions is a long and messy process. Well, I suppose the upside is that consultants will stay busy!


Kellner Says Cooperation Now The Law in 7th Circuit

The Seventh Circuit Court of Appeals (which covers Illinois, Indiana, and Wisconsin and all the federal district and appellate courts therein, including of course Chicago) has begun an Electronic Discovery Pilot Program. It was, according to the report itself, “Developed as a result of (a) continuing comments by business leaders and practicing attorneys, regarding the need for reform of the civil justice pretrial discovery process in the United States, (b) the release of the March 11, 2009 Final Report on the Joint Project of the American College of Trial Lawyers Task Force on Discovery (“Task Force”) and the Institute for the Advancement of the American Legal System at the University of Denver (“IAALS”), and (c) The Sedona Conference Cooperation Proclamation.” The stated intent of the Pilot Program is to “take action to reduce the rising burden and cost of discovery in litigation in the United States brought on primarily by the use of electronically stored information (“ESI”) in today’s electronic world.”

The Seventh Circuit Electronic Discovery Committee, which developed the program, is a group of trial judges and lawyers, including in-house counsel, private practitioners, government attorneys, academics, and litigation expert consultants headquartered primarily in the Seventh Circuit. Committee members met for the first time in May 2009 and consulted with Ken Withers, the Managing Director of The Sedona Conference. Over the next four months, E-Discovery Committee members working in sub-committee groups continued developing the Pilot Program, and the result was the Seventh Circuit Electronic Discovery Pilot Program’s Principles Relating to the Discovery of Electronically Stored Information (“Principles”). Those Principles will be implemented and evaluated during Phase One from October 1, 2009 through May 1, 2010.

What does this mean in practical terms? I asked that question of Chuck Kellner, the Vice President of Consulting for eDiscovery at Anacomp’s CaseLogistix, and he replied: “The Sedona Principles on Cooperation and more, including references to Early Case Assessment, are about to become the law of the land in the Seventh Circuit.”

Chuck went on to explain: “Basically what it says is that attorneys will learn about ESI, will come up to speed on their clients’ electronic discovery matters, will assess their cases, and will cooperate effectively and will negotiate scope of electronic discovery, and they will be monitored while doing it. Later phases of the program will turn the screws tighter on requirements of cooperation, disclosure, and proportionality. The meat of this starts on page 9 and the principles themselves on page 13. The back half is a proposed standing order, which the judges can use in every case filed in every federal court in the Seventh Circuit.”

“I think what it will mean for us is increasing emphasis on reading the data early. The EDRM as a way to describe various litigation support tasks might still be useful, but the EDRM as a linear workflow is under stress, as we predicted, in favor of piling up more concurrent tasks atop each other, and to ‘the left’. Everything that we have been discussing and what we are now doing in ECA about mining metrics, about finding the significant documents quickly, and about estimating the size and cost and time of the review, these are all quickly becoming legal requirements, not just good strategy.”

Echoing statements we have heard all year from federal judges, Chuck also said, “The tone of the judges there was that they were no longer bearing attorneys’ unpreparedness for meet-and-confer, and were prepared to start a rigorous program of enforcement. To place this more broadly than the Seventh Circuit, the theme echoes what we’ve seen in case decisions in the last few months.”

So if you haven’t read The Cooperation Proclamation, this is a good time to do so. Especially if you practice in the Seventh Circuit.

Quick Report on ECA from The Masters Conference

Ron Friedmann sat in on a session at The Masters Conference called Early Case Assessment: Looking to the Future – From Early Assessment to Early Awareness. His notes included the following key points:

What is ECA? It’s getting an early look at the facts of your case and at the scope of discovery.

Where are the immediate savings; how is this different from the past? It’s not a tool per se, it’s a method (a process, the right people, and technology). ECA does not generate savings very early – it’s not about upfront savings. You have to invest at the outset to learn about the case. Clients typically want to delay spending, so ECA is counter-intuitive to many lawyers.

Other than cost, are there other limits that hold back ECA? Clients are the main barrier. We are moving beyond linear review and search terms to a more subjective approach.

Tools that are better at ‘understanding’ data are in the ECA bucket… how do these emerging technologies affect the process, especially if clients bring the tools in-house? To start, who should operate the tools? It depends on organizational structure. It can be IT, Legal, Info Security, other corporate departments.

Ron’s comments on the session include the observation “What struck me most about this discussion is how much education is required,” and he goes on to comment that it seems we’ve been talking about these issues for over 20 years. Remember that in 1990, early tools such as scanning and OCR’ing led to full-text databases which could be searched by both conceptual and Boolean tools.

So why are we still trying to convince people to do iterative searches of data to pare down document populations? Ron notes that “The vendor challenge here is that clients are reluctant in this environment to spend upfront. So you need to educate your client – let them know that ECA will reduce the volume of documents that require human review.”

Fascinating comments as always from Ron, and you can read more on this and his other reports from The Masters Conference at his blog, Strategic Legal Technology.

A Few Recent Happenings in the ED World

Here are several interesting topics that have developed in the last week:

  1. The discussion about ED standards that I broached in my post of Sept. 21 continues to percolate. The most recent take on the debate is a posting on EDD Update by Seattle lawyer and technologist Eric Blank who raises some interesting points about data exchange protocols, capture rates and the use of consultants.
  2. You can expect those points, and many others, to be discussed at The Masters Conference this week in Washington, D.C. Chuck Kellner will be there speaking on Information Cooperation, a subject on which Chuck and I also presented a webinar several months ago: the recording of that event is available on the Anacomp web site here.
  3. The EDRM web site has a new look and feel and has added a great deal of content, including a link to the E-Discovery Zone, the series of interviews with luminaries of the e-discovery profession conducted by Browning Marean and myself and hosted by TechLaw.

Project Management Redux

The most recent Socha-Gelbmann survey stated that “project management has grown in prominence as a means to minimize missteps and deliver more predictable, reliable, and cost-effective results.” And PM is certainly getting attention these days thanks to a good discussion at The Delaware E-Discovery Report and the results of a survey released by Applied Discovery. But the discussions are not all positive. In the Delaware ED Report, Paul Easton said, “implementing project management software does not equal implementing project management” and the AD survey found that although 87% of respondents considered project management a critical component of their ED projects, only 8% were satisfied with the clarity of objectives and plans defined in their last e-discovery project.

So if we all know that PM is good, why aren’t we doing it better? Part of the problem, it seems to me, is that we are lacking a specific definition of the phrase. As Brett Burney points out in his article The Emerging Field of Electronic Discovery Project Management, “Formal project management, however, is a recognized professional discipline, complete with educational requirements (Project Management Professional or PMP) and an oversight body called the Project Management Institute (PMI).” Yet I have never heard of a law firm utilizing those standards in designing a project scope or specifications.

Why? Well, as Conrad Jacoby says in his article on LLRX called Applying Project Management Techniques to Litigation Discovery, litigation team leadership “often only has time to focus on the highest priority projects and problems.” This gap between what we say and what we do really highlights the fact that technology tools are just one component of success in e-discovery. Every case is different and there is no cookie cutter application or device that meets the challenges of each of those cases.

The fact is that advice from a good consultant (like those mentioned here or, well, myself) is necessary for assistance with project planning, including data scoping, choice of document review methodologies and project training. Given the rise in number and complexity of ED-specific applications, the need for good consulting services is at an all-time high in order to select the best choices for the workflow of your particular matter. So take a look at the articles mentioned above, start assessing your projects and get the advice of a good consultant to help you decide how best to proceed.