By now, many are aware of the staggering increases in electronically stored information (ESI). Pundits and consultants often use colorful analogies to emphasize these remarkable numbers. Obscure terms like exabytes and yottabytes are commonly summoned. There’s a good reason for all this attention on the surging growth of digital information: Poorly managed ESI poses very serious business and legal risks to an organization.
The explosion of ESI and the rigors of e-discovery have spawned many tools that promise to help organizations conquer the chaos of “too much information.” One evolving technology, called predictive coding, has proved useful for e-discovery and is gaining traction as a tool for helping to manage information throughout its life cycle.
The process of predictive coding is not new, but the technologies around it have been evolving rapidly to better address e-discovery, where the hours and dollars required to manually review thousands of documents, potentially stored in thousands of locations, can overwhelm some organizations. Using algorithms, predictive coding helps an organization get a better idea of what its data contains and, therefore, which of its documents are likely to be relevant to a particular e-discovery action.
Doug Smith, business manager for Wiley Rein LLP in Ashburn, Virginia, says predictive coding offers an alternative to the manual, subjective process of coding and quality review, which is laden with inefficiencies and inaccuracies. Predictive coding operates through either sampling or observing. Both processes use human decisions as the calibrating mechanism.
In sampling, the software randomly selects a subset of electronic records and presents it to a human coder for review. It monitors the coder's decisions, notes the characteristics of the coded records (such as date, recipients and keywords), and then uses these recorded decisions to predict the value of the remaining documents.
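The following minimal Python sketch makes the sampling process concrete. It is not any vendor's implementation. The `Record` type, the `features` and `sample_and_predict` functions, and the scoring rule are all hypothetical and exist only to illustrate the cycle of sampling, human coding and prediction. Under that scoring rule, each unsampled record receives the average share of sampled records with the same year, recipient or keyword that the human coder marked relevant. Commercial predictive coding tools use their own, generally more sophisticated, classifiers.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Record:
    """Hypothetical electronic record carrying the characteristics named above."""
    doc_id: str
    keywords: Set[str]
    recipients: Set[str] = field(default_factory=set)
    date: str = ""  # ISO "YYYY-MM-DD"; only the year is used below


def features(rec: Record) -> Set[str]:
    """Turn a record's date, recipients and keywords into string features."""
    feats = {"kw:" + k for k in rec.keywords}
    feats |= {"to:" + r for r in rec.recipients}
    if rec.date:
        feats.add("year:" + rec.date[:4])
    return feats


def sample_and_predict(
    records: List[Record],
    human_code: Callable[[Record], bool],
    sample_size: int,
    seed: int = 0,
) -> Tuple[Dict[str, bool], Dict[str, float]]:
    """Have a human code a random sample, then score every remaining record.

    Returns (human decisions for the sampled records,
             relevance score in [0, 1] for each unsampled record).
    """
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)

    # Record the coder's decision and tally, per feature, how many sampled
    # records had it and how many of those were coded relevant.
    decisions: Dict[str, bool] = {}
    seen: Dict[str, int] = {}
    relevant: Dict[str, int] = {}
    for rec in sample:
        is_relevant = human_code(rec)
        decisions[rec.doc_id] = is_relevant
        for f in features(rec):
            seen[f] = seen.get(f, 0) + 1
            if is_relevant:
                relevant[f] = relevant.get(f, 0) + 1

    # Score each unsampled record as the average relevance rate of the
    # features it shares with the sample (0.0 if it shares none).
    predictions: Dict[str, float] = {}
    for rec in records:
        if rec.doc_id in decisions:
            continue
        rates = [relevant.get(f, 0) / seen[f] for f in features(rec) if f in seen]
        predictions[rec.doc_id] = sum(rates) / len(rates) if rates else 0.0
    return decisions, predictions
```

Suppose the coder marks a record relevant whenever it mentions "merger" (`human_code=lambda r: "merger" in r.keywords`). An unsampled record whose features appeared in the sample only on relevant records would then score 1.0. Returning scores rather than yes/no labels leaves two choices to the organization: where to set the cutoff, and which documents still go to human review.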
In observing, the coding system monitors human coders' decisions as they review records. Before presenting each record for coding, it predicts how that record will be coded. It then compares its prediction with the coder's actual decision. Over time, the system's predictions reach a level of accuracy that is deemed acceptable.
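Below is a minimal Python sketch of the observing loop. It is illustrative only and does not reproduce any product's method. Each record is represented as a hypothetical set of string features such as `"kw:merger"`. The hypothetical `ObservingCoder` predicts each record's coding from feature counts before the human codes it, compares its guess with the actual decision, and then learns from that decision. The stopping rule is an assumption standing in for whatever accuracy level an organization deems acceptable: the loop stops once rolling accuracy over the last `window` records meets `target_accuracy`.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class ObservingCoder:
    """Hypothetical online model that guesses a record's coding before a
    human codes it, then learns from the human's actual decision."""

    def __init__(self) -> None:
        self.seen: Dict[str, int] = {}      # feature -> records observed with it
        self.relevant: Dict[str, int] = {}  # feature -> of those, coded relevant

    def predict(self, feats: Set[str]) -> bool:
        # Average the relevance rate of every feature seen before; with no
        # evidence at all, default to "not relevant".
        rates = [self.relevant.get(f, 0) / self.seen[f] for f in feats if f in self.seen]
        return bool(rates) and sum(rates) / len(rates) >= 0.5

    def learn(self, feats: Set[str], is_relevant: bool) -> None:
        for f in feats:
            self.seen[f] = self.seen.get(f, 0) + 1
            if is_relevant:
                self.relevant[f] = self.relevant.get(f, 0) + 1


def observe_until_accurate(
    review_queue: Iterable[Set[str]],
    human_code: Callable[[Set[str]], bool],
    target_accuracy: float = 0.9,
    window: int = 20,
) -> Tuple[ObservingCoder, Optional[int]]:
    """Predict, let the human code, compare, learn; repeat until accurate.

    Returns the model and how many records had been reviewed when accuracy
    over the last `window` records first reached `target_accuracy`
    (None if that never happened).
    """
    model = ObservingCoder()
    recent = deque(maxlen=window)  # 1 where prediction matched the human, else 0
    for count, feats in enumerate(review_queue, start=1):
        predicted = model.predict(feats)         # predict before presenting
        actual = human_code(feats)               # human codes the record
        recent.append(int(predicted == actual))  # compare the two
        model.learn(feats, actual)               # learn from the real decision
        if len(recent) == window and sum(recent) / window >= target_accuracy:
            return model, count
    return model, None
```

Measuring accuracy over a sliding window, rather than over every record since the start, lets early mistakes made before the model has seen much evidence age out of the calculation.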
Leigh Isaacs, director of records and information governance for Orrick, Herrington & Sutcliffe LLP in Washington, describes predictive coding as “an evolving technology that combines people, technology and workflows to find key documents and identify and review large data sets.” It’s a machine-learning technology that teaches the computer program to predict how to classify documents, based on human guidance. Isaacs further explains, “The computer program then applies what it has learned to the universe of information.”
Isaacs sees many useful applications for predictive coding in information management business processes, even those not specifically focused on litigation. For example, pairing subject matter experts with predictive coding technologies improves the accuracy with which information is identified. This provides a solid foundation for defensible disposition and prevents content from being retained for too long. The technologies can also help a company analyze its data to identify valuable intellectual property, which could be an important part of research and development efforts or help substantiate patent and trademark claims. Predictive coding could also be used to locate vital records and contracts that may have been misfiled, identify sensitive information for protection and compliance purposes, and much more. Although the emphasis so far has been on using predictive coding in litigation and even regulatory proceedings, it is easy to see how it could be applied to other situations.
Whether an organization intends to use predictive coding for litigation or for other business purposes, predictive coding will be most effective when it is used as part of the organization's information governance program and discipline. If an organization lacks proper information governance, its ESI will not be in a legally defensible condition, and the organization will miss out on other business productivity benefits. Smith adds that predictive coding can help remediate this problem by creating a classification schema that identifies and categorizes the information housed in unstructured or less formal systems.