A few newer technologies are viable for speeding the document review process and improving the ability to be responsive to court-mandated requests. Here we introduce predictive coding and technology-assisted review (also known as computer-assisted review), the most significant of the new technology developments that can assist in e-discovery.
Predictive Coding
During the early case assessment (ECA) phase of e-discovery, predictive coding is a “court-endorsed process”18 utilized to perform document review. It uses human expertise and IT to facilitate analysis and sorting of documents. Predictive coding software leverages human analysis when experts review a subset of documents to “teach” the software what to look for, so it can apply this logic to the full set of documents,19 making the sorting and culling process faster and more accurate than solely using human review or automated review.
Predictive coding uses a blend of several technologies that work in concert:20 software that performs machine learning (a type of artificial intelligence software that “learns” and improves its accuracy, fostered by guidance from human input and progressive ingestion of data sets, in this case documents);21 workflow software, which routes the documents through a series of work steps to be processed; and text analytics software, used to perform functions such as searching for keywords (e.g., “asbestos” in a case involving asbestos exposure) or concepts, using pattern search or meaning-based search; sifting through and sorting documents into basic groups, using filtering technologies based on document content; and sampling a portion of documents to find patterns and to review the accuracy of filtering and keyword search functions.
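As a rough illustration of the text analytics functions just described, the sketch below filters a hypothetical document set on a keyword and then samples the hits so a human reviewer can check the filter’s accuracy. It uses only the Python standard library; the documents and the keyword are invented for illustration.

```python
import random
import re

# Hypothetical document set; in practice these would be loaded
# from the collection under review.
documents = [
    "Inspection report: asbestos insulation found in boiler room.",
    "Lunch menu for the week of March 3.",
    "Email re: asbestos abatement contractor bids.",
    "Parking garage maintenance schedule.",
]

# Keyword search, e.g. for a case involving asbestos exposure.
keyword = re.compile(r"\basbestos\b", re.IGNORECASE)
hits = [doc for doc in documents if keyword.search(doc)]

# Sample a portion of the filtered documents so a reviewer can
# check the accuracy of the keyword filter.
sample_size = min(2, len(hits))
for doc in random.sample(hits, sample_size):
    print("REVIEW:", doc)
```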
The goal of using predictive coding technology is to reduce the total group of documents a legal team needs to review manually (viewing and analyzing them one by one) by finding the gross set of documents that is most likely to be relevant or responsive (in legalese) to the case at hand. It does this by automating, speeding up, and improving the accuracy of the document review process to locate and “digitally categorize” documents that are responsive to a discovery request.22 Predictive coding, when deployed properly, also reduces billable attorney and paralegal time and therefore the costs of ECA. Faster and more accurate completion of ECA can provide valuable time for legal teams to develop insights and strategies, improving their odds for success. Skeptics claim that the technology is not yet mature enough to render more accurate results than human review.
The first state court ruling allowing the use of predictive coding technology instead of human review to cull through approximately 2 million documents to “execute a first-pass review” was made in April 2012 by a Virginia state judge.23 This was the first time a judge was asked to grant permission without the two opposing sides first coming to an agreement. The case, Global Aerospace, Inc., et al. v. Landow Aviation, LP, et al., stemmed from an accident at Dulles Jet Center.
In an exhaustive 156-page memorandum, which included dozens of pages of legal analysis, the defendants made their case for the reliability, cost-effectiveness, and legal merits of predictive coding. At the core of the memo
was the argument that predictive coding “is capable of locating upwards of seventy-five percent of the potentially relevant documents and can be effectively implemented at a fraction of the cost and in a fraction of the time of linear review and keyword searching.”24
This was the first big legal win for predictive coding use in e-discovery.
Basic Components of Predictive Coding
Here is a summary of the main foundational components of predictive coding.
■ Human review. Human review is used to determine which types of document content will be legally responsive based on a case expert’s review of a sampling of documents. These sample documents are fed into the system to provide a seed set of examples.25
■ Text analytics. This involves the ability to apply “keyword-agnostic” search (through a thesaurus capability based on contextual meaning, not just keywords) to locate responsive documents and build seed document sets.
■ Workflow. Software to route e-documents through the processing steps automatically, improving statistical reliability and streamlining processing.
■ Machine learning. The software “learns” what it is looking for and improves its capabilities along the way through multiple, iterative passes.
■ Sampling. Sampling is best applied if it is integrated so that testing for accuracy is an ongoing process. This improves statistical reliability and therefore defensibility of the process in court. (A minimal sketch combining these components follows this list.)
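To make the interplay of these components concrete, here is a minimal sketch of the core loop, assuming a hypothetical expert-coded seed set and using the scikit-learn library; a production predictive coding platform is of course far more elaborate.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Human review: a case expert codes a small seed set
# (1 = responsive, 0 = not responsive). All examples are hypothetical.
seed_docs = [
    "memo on asbestos exposure at the plant",
    "minutes discussing insulation removal and worker safety",
    "quarterly cafeteria menu",
    "holiday party invitation",
]
seed_labels = [1, 1, 0, 0]

# Machine learning: the software "learns" from the expert's examples.
vectorizer = TfidfVectorizer(stop_words="english")
model = LogisticRegression().fit(vectorizer.fit_transform(seed_docs), seed_labels)

# Apply what was learned to the full set, ranking each document by
# its predicted likelihood of being responsive.
corpus = ["report on asbestos removal contractors", "parking garage notice"]
probs = model.predict_proba(vectorizer.transform(corpus))[:, 1]
for doc, p in sorted(zip(corpus, probs), key=lambda pair: -pair[1]):
    print(f"{p:.2f}  {doc}")
```

In practice this loop is iterative: reviewers sample the model’s output, correct its mistakes, and retrain until the accuracy measures stabilize.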
Predictive Coding Is the Engine; Humans Are the Fuel
Predictive coding sounds wonderful, but it does not replace the expertise of an attorney;
it merely helps leverage that knowledge and speed the review process. It “takes all the documents related to an issue, ranks and tags them so that a human reviewer can look over the documents to confirm relevance.” So it cannot work without human input to let the software know what documents to keep and which ones to discard, but it is an emerging technology tool that will play an increasingly important role in e-discovery.26

Technology-Assisted Review
TAR, also known as computer-assisted review, is not predictive coding. TAR includes aspects of the nonlinear review process, such as culling, clustering, and de-duplication, but it does not meet the requirements for comprehensive predictive coding.
Many technologies can help in making incremental reductions in e-discovery costs. Only fully integrated predictive coding, however, can completely transform the economics of e-discovery.
Mechanisms of Technology-Assisted Review
There are three main mechanisms, or methods, for using technology to make legal review faster, less costly, and generally smarter.27
1. Rules driven. “I know what I am looking for and how to profile it.” In this scenario, a case team creates a set of criteria, or rules, for document review and builds what is essentially a coding manual. The rules are fed into the tool for execution on the document set. For example, one rule might be to “redact for privilege any time XYZ term appears and add the term ‘redacted’ where the data was removed” (see the redaction sketch following this list). This rule-driven approach requires iteration to truly be effective. The case team will likely have rules changes and improvements as the case goes on and more is learned about strategy and merit. This approach assumes that the case team knows the document set well and can apply very specific rules to the corpus in a reasonable fashion.
2. Facet driven. “I let the system show me the profile groups first.” In this scenario, a tool analyzes documents for potential items of interest or groups potentially similar items together so that reviewers can begin applying decisions. Reviewers typically utilize visual analytics that guide them through the process and take them to prioritized documents. This mechanism can also be called present and direct.
3. Propagation based. “I start making decisions and the system looks for similar, related items.” This type of TAR is about passing along, or propagating, what is known based on a sample set of documents to the rest of the documents in a corpus. In the market, this is often referred to as predictive coding because the system predicts whether documents will be responsive or privileged based on how other documents were coded by the review team. Propagation-based TAR comes in different flavors, but all involve an element of machine learning.
In some scenarios, a review team will have access to a seed set of documents that the team codes and then feeds into the system. The system then mimics the action of the review team as it codes the remainder of the corpus. In other scenarios, there is not a seed set; rather, the systems give reviewers random documents for coding and then create a model for relevance and nonrelevance.
It is important to note that propagation-based TAR goes beyond simple mimicry; it is about creating a linguistic mathematical model for what relevance looks like.
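As a toy illustration of the rules-driven mechanism, the sketch below encodes a single coding-manual rule as a regular expression. The privileged term and the document are hypothetical stand-ins for the “XYZ term” in the example above.

```python
import re

# Hypothetical coding-manual rule: redact for privilege any time the
# term "Project Nightingale" appears, and mark where data was removed.
PRIVILEGED_TERM = re.compile(r"Project Nightingale", re.IGNORECASE)

def apply_redaction_rule(text: str) -> str:
    """Apply one rules-driven redaction to a document."""
    return PRIVILEGED_TERM.sub("[REDACTED]", text)

print(apply_redaction_rule("Counsel advised on Project Nightingale pricing."))
# Prints: Counsel advised on [REDACTED] pricing.
```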
These TAR mechanisms are not mutually exclusive. In fact, combining the mechanisms can help overcome the limitations of individual approaches. For example, if a document corpus is not rich (e.g., does not have a high enough percentage of relevant documents), it can be hard to create a seed set that will be a good training set for the propagation-based system.
However, it is possible to use facet-based TAR (for example, concept searching) to more quickly find the documents that are relevant so as to create a model for relevance that the propagation-based system can leverage.28
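The following sketch gestures at the propagation idea: codes applied to a seed set are passed along to the most similar uncoded documents. It uses cosine similarity over TF-IDF vectors via scikit-learn; the documents and codes are hypothetical, and real systems build far richer models than this nearest-neighbor shortcut.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Seed set already coded by the review team (hypothetical).
seed_docs = ["asbestos exposure memo", "cafeteria menu"]
seed_codes = ["responsive", "not responsive"]

# Uncoded remainder of the corpus.
rest_docs = ["report on asbestos abatement", "weekly cafeteria schedule"]

# Vectorize seed and uncoded documents in one vocabulary space.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(seed_docs + rest_docs)
seed_m = matrix[: len(seed_docs)]
rest_m = matrix[len(seed_docs):]

# Propagate: each uncoded document inherits the code of its most
# similar seed document.
similarities = cosine_similarity(rest_m, seed_m)
for doc, row in zip(rest_docs, similarities):
    print(doc, "->", seed_codes[row.argmax()])
```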
It is important to be aware that these approaches require more than just technology.
It is critical to have the right people in place to support the technology and the workflow required to conduct TAR. Organizations looking to exercise these mechanisms of TAR will need:
■ Experts in the right tools and information retrieval. Software is an important part of TAR. The team executing TAR will need someone who can program the tool set with the rules necessary for the system to intelligently mark documents.
Furthermore, information retrieval is a science unto itself, blending linguistics, statistics, and computer science. Anyone practicing TAR will need the right team of experts to ensure a defensible and measurable process.
■ Legal review team. While much of the chatter around TAR centers on its ability to cut lawyers out of the review process, the reality is that the legal review team will become more important than ever. The quality and consistency of the decisions this team makes will determine the effectiveness that any tool can have in applying those decisions to a document set.
■ Auditor. Much of the defensibility and acceptability of TAR mechanisms will rely on the statistics behind how certain the organization can be that the output of the TAR system matches the input specification. Accurate measures of performance are important not only at the end of the TAR process, but also throughout the process in order to understand where efforts need to be focused in the next cycle or iteration. Anyone involved in setting or performing measurements should be trained in statistics. (A minimal measurement sketch follows this list.)
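To illustrate the auditor’s measurement role, this sketch estimates the precision of a TAR run from a random audit sample and reports a 95 percent Wilson score interval, a common way to quantify how certain one can be about a sampled proportion. The counts are hypothetical.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical audit: of 200 randomly sampled documents the system
# marked responsive, attorneys confirmed 168 as actually responsive.
low, high = wilson_interval(168, 200)
print(f"Precision point estimate: {168 / 200:.2f}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```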
For an organization to use a propagated approach, in addition to people, it may need a “seed” set of known documents. Some systems use random samples to create seed sets, while others enable users to supply small sets from the early case investigations.
These documents are reviewed by the legal review team and marked as relevant, privileged, and the like. Then, the solution can learn from the seed set and apply what it learns to a larger collection of documents. Often this seed set is not available, or the seed set does not have enough positive data to be statistically useful.
Professionals using TAR state that the practice has value, but it requires a sophisticated team of users (with expertise in information retrieval, statistics, and law) who understand the potential limitations and danger of false confidence that can arise from improper use. For example, using a propagation-based approach with a seed set of documents can have issues when less than 10 percent of the seed set documents are positive for relevance.
In contrast, rules-driven and other systems can result in false negative decisions when based on narrow custodian example sets.
However TAR approaches and tools are used, they will only be effective if usage is anchored in a well-thought-out, methodologically sound process. This requires a definition of what to look for, searching for items that meet that definition, measuring results, and then refining those results on the basis of the measured results. Such an end-to-end plan will help to decide what methods and tools should be used in a given case.29