
Defensible Disposal: The Only Real Way to Manage Terabytes and Petabytes

From the document Information Governance - Wiley CIO (pages 150-157)

Legal review team. While much of the chatter around TAR centers on its ability to cut lawyers out of the review process, the reality is that the legal review team will become more important than ever. The quality and consistency of the decisions this team makes will determine the effectiveness that any tool can have in applying those decisions to a document set.

Auditor. Much of the defensibility and acceptability of TAR mechanisms will rely on the statistics behind how certain the organization can be that the output of the TAR system matches the input specification. Accurate measures of performance are important not only at the end of the TAR process, but also throughout the process in order to understand where efforts need to be focused in the next cycle or iteration. Anyone involved in setting or performing measurements should be trained in statistics.
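As a rough illustration of the kind of measurement an auditor might perform, the sketch below estimates, from a random sample, how often the tool's coding agrees with the reviewers, and attaches a confidence interval to that estimate. The function name, the sample figures, and the choice of the Wilson score interval are illustrative assumptions; the text does not prescribe a particular statistic.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Hypothetical audit: reviewers re-check 400 randomly sampled documents
# and agree with the tool's coding on 368 of them.
low, high = wilson_interval(368, 400)
print(f"observed agreement {368 / 400:.1%}, interval {low:.1%} to {high:.1%}")
```

Repeating such a measurement after each cycle shows whether the process is converging and where the next round of review effort should go.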

For an organization to use a propagated approach, in addition to people it may need a “seed” set of known documents. Some systems use random samples to create seed sets while others enable users to supply small sets from the early case investigations.

These documents are reviewed by the legal review team and marked as relevant, privileged, and the like. Then, the solution can learn from the seed set and apply what it learns to a larger collection of documents. Often this seed set is not available, or the seed set does not have enough positive data to be statistically useful.

Professionals using TAR state that the practice has value, but it requires a sophisticated team of users (with expertise in information retrieval, statistics, and law) who understand the potential limitations and danger of false confidence that can arise from improper use. For example, using a propagation-based approach with a seed set of documents can have issues when less than 10 percent of the seed set documents are positive for relevance.

In contrast, rules-driven and other systems can produce false-negative decisions when they are based on narrow custodian example sets.
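The seed-set concern raised above can be checked before any training starts. The following is a minimal sketch, assuming each seed document carries a simple relevant/not-relevant label from the review team; the function names and the example counts are hypothetical, and the 10 percent threshold is the figure quoted in the text.

```python
def positive_rate(labels: list[bool]) -> float:
    """Fraction of seed documents the review team marked as relevant."""
    if not labels:
        raise ValueError("seed set is empty")
    return sum(labels) / len(labels)

def seed_set_is_thin(labels: list[bool], min_rate: float = 0.10) -> bool:
    """True when fewer than `min_rate` of the seed documents are positive."""
    return positive_rate(labels) < min_rate

# Hypothetical seed set: 6 relevant documents out of 100 reviewed.
seed_labels = [True] * 6 + [False] * 94
if seed_set_is_thin(seed_labels):
    print(f"only {positive_rate(seed_labels):.0%} positive: consider enriching the seed set")
```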

However TAR approaches and tools are used, they will only be effective if usage is anchored in a well-thought-out, methodologically sound process. This requires a definition of what to look for, searching for items that meet that definition, measuring results, and then refining the approach on the basis of the measured results. Such an end-to-end plan will help to decide what methods and tools should be used in a given case. 29

Defensible Disposal: The Only Real Way to Manage Terabytes and Petabytes

Growth of Information

According to International Data Corporation (IDC), from now until 2020, the digital universe is expected to expand to more than 14 times its current size. 30 One exabyte is the data equivalent of about 50,000 years of DVD movies running continuously.

With about 1,800 exabytes of new data created in 2011, 2,840 exabytes in 2012, and a predicted 6,120 exabytes in 2014, the volumes are truly staggering. While the data footprint grows significantly each year, that says nothing of what has already been created and stored.

Contrary to what many say (especially hardware salespeople), storage is not cheap. In fact, it becomes quite expensive when you add up not only the hardware costs but also maintenance, air conditioning and space overhead, and the highly skilled labor needed to keep it running. Many large companies spend tens if not hundreds of millions of dollars per year just to store data. This is money that could go straight to the bottom line if the unneeded data could be discarded. When you consider that most organizations’ information footprints are growing at between 20 and 50 percent per year and the cost of storage is declining by only a few percentage points per year, in real terms they are spending far more this year than last simply to house information.
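The arithmetic behind that last claim can be worked through in a few lines. This is a sketch only: the 20 and 50 percent growth rates come from the text, while the 3 percent annual decline in unit cost is an assumed stand-in for "a few percentage points," and the function name is hypothetical.

```python
def relative_spend(volume_growth: float, unit_cost_decline: float, years: int = 1) -> float:
    """Storage spend after `years`, relative to a spend of 1.0 today."""
    return ((1 + volume_growth) * (1 - unit_cost_decline)) ** years

ASSUMED_COST_DECLINE = 0.03  # assumption: unit cost falls 3% per year

for growth in (0.20, 0.50):
    ratio = relative_spend(growth, ASSUMED_COST_DECLINE)
    print(f"volume +{growth:.0%}/yr -> spend changes by {ratio - 1:+.1%} in one year")
```

Under these assumptions the falling unit price offsets only a small part of the growth in volume, so total spend still rises every year.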

Volumes Now Impact Effectiveness

The law of diminishing returns applies to information growth. Assuming information is an asset, at some point when there is so much data, its value starts to decline. That is not because the intrinsic value goes down (although many would argue there is a lot of idle chatter in the various communications technologies). Rather, the decline is related to the inability to expeditiously find or have access to needed business information.

According to the Council of Information Auto-Classification “Information Explosion” Survey, there is now so much information that nearly 50 percent of companies need to re-create business records to run their business and protect their legal interests because they cannot find the original retained record. 31 It is a poor business practice to spend resources to retain information and then, when it cannot be found, to spend more to reconstitute it.

There is increasing regulatory pressure, enforcement, and public scrutiny on all of an organization’s data storage activities. Record sanctions and fines, new regulations, and stunning court decisions have converged to mandate heightened controls and accountability from government regulators, industry and standards groups, as well as the public. When combined with the volume of data, information privacy, security, protection of trade secrets, and records compliance become complex and critical, high-risk business issues that only executive management can truly fix. However, executives typically view records and information management (RIM) as a low-importance cost center activity, which means that the real problem does not get solved.

In most companies, there is no clear path to classify electronic records, to formally manage official records, or to ensure the ultimate destruction of these records. Vast stores of legacy data are unclassified, and most data is never touched again once a short time has passed after its creation. Further, traditional records retention rules are too voluminous, too complex, and too granular and do not work well with the technology needed to manage records.

Finally, it is clear that employees can no longer be expected to pull the oars to cut through the information ocean, let alone boil it down into meaningful chunks of good information. Increasingly, technology has to play a more central role in managing information. Better use of technology will create business value by reducing risk, driving improvements in productivity, and facilitating the exploitation and protection of ungoverned corporate knowledge.

How Did This Happen?

Over the past several years, organizations have come to realize that the exposure posed by uncontrolled data growth requires emergency, reactive action, as seemingly no other viable approach exists. Faced with massive amounts of unknown unstructured data, many organizations have chosen to adopt a risk-averse save-everything policy. This approach has brought with it immediate repercussions:

Inability to quickly locate needed business content buried in ill-managed file systems.

Sharply increased storage costs, with some companies refusing to allocate any more storage to the business. The users’ reaction, out of necessity, is to store data wherever they can find a place for it. (Do not buy the argument that storage is cheap: everyone is spending more on storing unnecessary data, even if the per-gigabyte media cost has gone down.)

Soaring litigation and discovery costs, as organizations have lost track of what is where, who owns it, and how to collect, sort, and process it.

Buried intellectual property, trade secrets, personally identifiable information, and regulated content, which are subject to leakage and unauthorized deletion, and are a clear target for opposing counsel, or anyone else who can access them.

Lack of centralized policies and systems for the storage of records, which results in hard-to-manage record sites spread throughout the organization.

The lack of a clear strategy for managing records that have long-term, rather than short-term, business, legal, and research value.

Information Glut in Organizations

71 percent of organizations surveyed have no idea of the content in their stored data.

58 percent of organizations are keeping information indefinitely.

79 percent of organizations say too much time and effort is spent manually searching for and disposing of information.

58 percent of organizations still rely on employees to decide how to apply corporate policies. 32

What Is Defensible Disposition, and How Will It Help?

A solution to the unmitigated data sprawl is to defensibly dispose of the business content that no longer has business or legal value to the organization. In the old days of records management, it was clear that courts and regulators alike understood that records came into being and eventually were destroyed in the ordinary course of business. It is good business practice to destroy unneeded content, provided that the rules on which those decisions are made consider legal requirements and business needs. Today, however, the good business practice of cleaning house of old records has somehow become taboo for some businesses. Now it needs to start again.

An understanding of how technology can help with defensible disposal, and of how methodology and process help an organization achieve a thinner information footprint, is critical for all companies that are overrun with outdated records and do not know where to start to address the issue. While no single approach is right for every organization, records and legal teams need to take an informed approach, looking at corporate culture, risk tolerance, and litigation profile.

A defensible disposition framework is an ecosystem of technology, policies, procedures, and management controls designed to ensure that records are created, managed, and disposed of at the end of their life cycle.

New Technologies—New Information Custodians

Responsibility for records management and IG has changed dramatically over time. In the past, the responsibility rested primarily with the records manager. However, the nature of electronic information is such that its governance today requires the participation of IT, which frequently has custody, control, or access to such data, along with guidance from the legal department. As a result, IT personnel with no real connection or ownership of the data may be responsible for the accuracy and completeness of the business-critical information being managed. See the problem?

For many organizations, advances in technology mixed with an explosive growth of data forced a reevaluation of core records management processes. Many organizations have deployed archiving, litigation, and e-discovery point solutions with the intent of providing record retention compliance and responsiveness to litigation. Such systems may be tactically useful but fail to strategically address the heart of the matter: too much information, poorly managed over years and years, if not decades.

A better approach is for organizations to move away from a reactive keep-everything strategy to a proactive strategy that allows the reasonable and reliable identification and deletion of records when retention requirements are reached, absent a preservation obligation. Companies develop retention schedules and processes precisely for this reason; it is not misguided to apply them.
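The rule stated here (delete once the retention requirement is reached, unless a preservation obligation applies) is simple enough to express directly. The sketch below is illustrative only: the `Record` fields, the function names, and the whole-year retention periods are assumptions, and a real retention schedule would carry far more detail.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    record_id: str
    created: date
    retention_years: int         # taken from the retention schedule (assumed field)
    on_legal_hold: bool = False  # preservation obligation, e.g. pending litigation

def retention_expiry(record: Record) -> date:
    """Date on which the retention requirement is reached."""
    target_year = record.created.year + record.retention_years
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return record.created.replace(year=target_year, day=28)

def eligible_for_disposal(record: Record, today: date) -> bool:
    """Retention period reached and no preservation obligation in force."""
    return not record.on_legal_hold and today >= retention_expiry(record)

invoice = Record("INV-001", date(2010, 1, 15), retention_years=7)
print(eligible_for_disposal(invoice, date(2020, 1, 1)))
```

Keeping the legal hold as an explicit override mirrors the "absent a preservation obligation" condition: a hold blocks disposal regardless of how old the record is.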

Why Users Cannot, Will Not, and Should Not Make the Hard Choices

Employees usually are not sufficiently trained on records management principles and methods and have little incentive (or downside) to properly manage or dispose of records.

Further, many companies today see that requiring users to properly declare or manage records places an undue burden on them. The employees not only do not provide a reasonable solution to the huge data pile (which for some companies may be petabytes of data) but contribute to its growth by using more unsanctioned technologies and parking company information in unsanctioned locations. So the digital landfill continues to grow.

Most organizations have programs that address paper records, but these same organizations commonly fail to develop similar programs for electronic records and other digital content.

Technology Is Essential to Manage Digital Records Properly

Having it all, but not being able to find it, is like not having it at all.

While the content of a paper document is obvious, viewing the content of an electronic document depends on software and hardware. Further, the content of electronic storage media cannot be easily accessed without some clue as to its structure and format. Consequently, the proper indexing of digital content is fundamental to its utility. Without an index, retrieving electronic content is expensive and time consuming, if it can be retrieved at all.
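To show why an index matters, here is a minimal sketch of an inverted index, a structure that maps each word to the documents containing it so that a lookup does not have to open every file. The document identifiers, sample text, and function names are hypothetical, and real indexing engines handle formats, stemming, and scale that this sketch ignores.

```python
from collections import defaultdict

def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")
            if word:
                index[word].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents standing in for files on a shared drive.
docs = {
    "a.txt": "Supplier contract renewal, signed 2011.",
    "b.txt": "Draft supplier invoice for March.",
    "c.txt": "Holiday party planning notes.",
}
index = build_index(docs)
print(sorted(search(index, "supplier contract")))
```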

Search tools have become more robust, but they are no panacea for finding electronic records when needed, because there is too much information spread out across far too many information parking lots. Without taxonomies and common business terminology, accessing the one needed business record may be akin to finding the needle in a stadium-size haystack.

Technological advances can help solve the challenges corporations face and address the issues and burdens for legal, compliance, and information governance. When faced with hundreds of terabytes to petabytes of information, no amount of user intervention will begin to make sense of the information tsunami.

Auto-Classification and Analytics Technologies

Increasingly, companies are turning to new analytics and classification technologies that can analyze information faster, better, and cheaper. These technologies should be considered essential for helping with defensible disposition, but do not make the mistake of underestimating their expense or complexity.

As discussed in the previous section by Barry Murphy, machine learning technologies mean that software can “learn” and improve at the tasks of clustering files and assigning information (e.g., records, documents) to different preselected topical categories based on a statistical analysis of the data characteristics. In essence, classification technology evaluates a set of data with known classification mappings and attempts to map newly encountered data within the existing classifications. This type of technology should be on the list of considerations when approaching defensible disposition in large, uncontrolled data environments.
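The idea of learning from data with known classification mappings and then mapping new data into the same categories can be sketched with a very small statistical classifier. The example below uses multinomial naive Bayes with add-one smoothing, chosen here only because it is compact; the text does not say which algorithm commercial tools use, and the class name, categories, and training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    return text.lower().split()

class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)  # category -> word counts
        self.doc_counts: Counter = Counter()                         # category -> training docs
        self.vocabulary: set[str] = set()

    def train(self, text: str, category: str) -> None:
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text: str) -> str:
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = "", -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in tokenize(text):
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category

# Known mappings (hypothetical training examples).
clf = NaiveBayesClassifier()
clf.train("invoice payment due amount purchase order", "finance")
clf.train("quarterly invoice total payment received", "finance")
clf.train("employment contract salary benefits leave", "hr")
clf.train("employee leave request vacation benefits", "hr")

# Newly encountered documents are mapped into the existing categories.
print(clf.classify("payment for invoice overdue"))
print(clf.classify("vacation leave benefits"))
```

A classifier like this is only as good as its training examples, which is why the quality of the known mappings matters so much.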

Can Technology Classify Information?

What is clear is that IT is better and faster than people in classifying information. Period.


Increasingly, studies and court decisions make clear that, when appropriate, companies should not fear using enabling technologies to help manage information.

For example, in the recent Da Silva Moore v. Publicis Groupe case, Judge Andrew Peck stated:

Computer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases. While this Court recognizes that computer-assisted review is not perfect, the Federal Rules of Civil Procedure do not require perfection. . . . Counsel no longer have to worry about being the “first” or “guinea pig” for judicial acceptance of computer assisted review.

This work presents evidence supporting the contrary position: that a technology-assisted process, in which only a small fraction of the document collection is ever examined by humans, can yield higher recall and/or precision than an exhaustive manual review process, in which the entire document collection is examined and coded by humans. 33
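Recall and precision, the two measures named in this passage, have standard definitions: recall is the share of truly relevant documents that a review found, and precision is the share of found documents that are truly relevant. A minimal sketch follows, with made-up document ids and a hypothetical function name.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a review's output against the true relevant set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: five documents are truly relevant; the review flags four,
# three of which are correct.
relevant = {"d1", "d2", "d3", "d4", "d5"}
retrieved = {"d1", "d2", "d3", "d9"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision {precision:.2f}, recall {recall:.2f}")
```

Computing both figures the same way for a manual review and a technology-assisted review is what makes a comparison like the one quoted above possible.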

Moving Ahead by Cleaning Up the Past

Organizations can improve disposition and IG programs with a systemized, repeatable, and defensible approach that enables them to retain and dispose of all data types in compliance with the business and statutory rules governing the business’s operations.

Generally, an organization is under no legal obligation to retain every piece of information it generates in the course of its business. Its records management process is there to clean up the information junk in a consistent, reasonable way. That said, what should companies do if they have not been following disposal rules, so information has piled up and continues unabated? They need to clean up old data. But how?

Manual intervention (by employees) will likely not work, due to the sheer volumes of data involved. Executives will not and should not have employees abdicate their regular jobs in favor of classifying and disposing of hundreds of millions of old stored files. (Many companies have billions of old files.) This buildup necessitates leveraging technology, specifically, technologies that can discern the meaning of stored unstructured content, in a variety of formats, regardless of where it is stored.

Here is a starting point: Most likely, file shares, legacy e-mail systems, and other large repositories will prove the most target-rich environments, while better-managed document management, records management, or archival systems will be in less need of remediation. A good time to undertake a cleanup exercise is when litigation will not prevent action or when migrating to a new IT platform. (Trying to conduct a comprehensive, document-level inventory and disposition is neither reasonable nor practical. In most cases, it will create limited results and even further frustration.)

Technology choices should be able to withstand legal challenges in court.

Sophisticated technologies available today should also look beyond mere keyword searches (as their defensibility may be called into question) and should look to advanced techniques such as automatic text classification (auto-classification), concept search, contextual analysis, and automated clustering. While technology is imperfect, it is better than what employees can do, and it can accomplish what they never will: managing terabytes of stored information and cleaning up big piles of dead data.

Defensibility Is the Desired End State; Perfection Is Not

Defensible disposition is a way to take on huge piles of information without personally cracking each one open and evaluating it. Perhaps it is, in essence, operationalizing a retention schedule that is no longer viable in the electronic age. Defensible disposition is a must because most big companies have hundreds of millions or billions of files, which makes their individualized management all but impossible.

As the list of eight steps to defensible disposition makes clear, different chunks of data will require different diligence and analysis levels. If you have 100,000 backup tapes from 20 years ago, minimal or cursory review may be required before the whole lot of tapes can be comfortably discarded. If, however, you have an active shared drive with records and information that is needed for ongoing litigation, there will need to be deeper analysis with analytics and/or classification technologies that have become much more powerful and useful. In other words, the facts surrounding the information will help determine whether the information can be properly disposed of with minimal analysis or whether it requires deep diligence.

Kahn’s Eight Essential Steps to Defensible Disposition

1. Define a reasonable diligence process to assess the business needs and legal requirements for continued information retention and/or preservation, based on the information at issue.

2. Select a practical information assessment and/or classification approach, given information volumes, available resources, and risk profile.

3. Develop and document the essential aspects of the disposition program to ensure quality, efficacy, repeatability, auditability, and integrity.

4. Develop a mechanism to modify, alter, or terminate components of the disposition process when required for business or legal reasons.

5. Assess content for eligibility for disposition, based on business need, record retention requirements, and/or legal preservation obligations.

6. Test, validate, and refine as necessary the efficacy of content assessment and disposition capability methods with actual data until desired results have been attained.

7. Apply disposition methodology to content as necessary, understanding that some content can be disposed with sufficient diligence without classification.

8. On an ongoing basis, verify and document the efficacy and results of the disposition program and modify and/or augment the process as necessary.

Source: “Chucking Daisies: Ten Rules for Taking Control of Your Organization’s Digital Debris,” Randy Kahn, Esq., and Galena Datskovsky Ph.D., CRM (ARMA International, 2013), Overland Park, KS.

Business Case around Defensible Disposition

What is clear is that defensible disposition can have significant ROI impact to a company’s financial picture. This author has clients for whom we have built the defensible
