Newsletters
Technology Training
Current Newsletter


May 2010

Metadata
Metadata simply means “data about data”. In the e-discovery industry, it means facts or characteristics of a document or record that may not be included in the content of the document. For example, metadata would include who sent the document, who received it, when it was sent or received, the custodian of the document, or the subject of the document. Some metadata appears on the face of a document: letters typically include the names and addresses of the sender and recipient. Other metadata does not: if the document was found in the custody of some other person, that would be interesting to know during a lawsuit but would not appear in the contents. For electronic documents, metadata might include the sender or recipient of an email, as well as additional values tracked by the software application that created the document, such as the date it was last printed. When collecting electronic (and paper) documents, it is important to preserve the metadata accurately so that it can be loaded into the database that holds the documents for review.
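As a rough illustration of the email case, the sketch below pulls header metadata out of a message using Python's standard email module. The sample message, the addresses, the custodian name and the extract_metadata helper are all invented for this example; a real collection tool would capture many more fields.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message; the names and addresses are invented.
RAW_EMAIL = """\
From: Alice Smith <alice@example.com>
To: Bob Jones <bob@example.com>, carol@example.com
Date: Mon, 03 May 2010 09:15:00 -0400
Subject: Q2 contract draft

Please see the attached draft.
"""

def extract_metadata(raw, custodian):
    """Return the header metadata of one email as a plain dictionary."""
    msg = message_from_string(raw)
    return {
        # The custodian is not in the message itself; it is recorded at collection time.
        "custodian": custodian,
        "from": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "sent": parsedate_to_datetime(msg["Date"]).isoformat(),
        "subject": msg["Subject"],
    }

metadata = extract_metadata(RAW_EMAIL, custodian="Bob Jones")
for field, value in metadata.items():
    print(f"{field}: {value}")
```

Note that the custodian has to be supplied from outside the message, which is exactly the kind of metadata that is lost if it is not recorded during collection.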

Duty to Preserve
The “duty to preserve” is one of the responsibilities of data custodians described in the Sedona Principles and upheld by many court decisions. The Sedona Principles for Electronic Document Production, Second Edition, released in June 2007, includes the duty to preserve in the first principle: “Organizations must properly preserve electronically stored information that can reasonably be anticipated to be relevant to litigation.” This principle has been upheld in numerous cases including United States v. Philip Morris USA Inc., 2004 WL 1627252 (D.D.C.), Qualcomm Inc. v. Broadcom Corp., 2008 WL 66932 (S.D. Cal. Jan. 7, 2008), and the landmark Zubulake line of cases, including Zubulake v. UBS Warburg, 2004 WL 1620866 (S.D.N.Y. July 20, 2004), commonly cited as Zubulake V. In essence, if you expect a lawsuit, you have to preserve all the data that might be relevant. You have to stop any automatic destruction of data that is part of your data management plan. And you have to notify anyone who controls data that might be relevant, so that the data is not altered or destroyed. If the data is not protected, the party at fault could be penalized through sanctions, fines or procedural rulings during the court case.

Priv Log
“Priv log” is short for privilege log, which is a list of documents that are being withheld from the opposing side in a lawsuit on the grounds that they are privileged. Documents can be designated as privileged by an attorney who reviews the document and determines that it meets the legal definition of privileged communication. Different jurisdictions have different requirements, but generally private communication between an attorney and their client about the details of their case is considered privileged and does not need to be produced to the opposing side or requesting government agency. Similarly, communications between a priest and a penitent, between a doctor and a patient, and between spouses are generally privileged. The theory is that society benefits when these types of communication are protected, because protection fosters honest discussion in these relationships. When documents are being gathered for a lawsuit, documents that contain these types of special communication are protected. However, most jurisdictions require that if privileged documents are withheld, a log identifying the documents and explaining why they are privileged must be created and given to the opposing counsel, and sometimes filed with the court. These “priv logs” generally contain basic information about the documents being withheld including the serial number of the document, the document type (e.g. email, letter, contract), a description, the context, author and recipients, and the type of privilege claimed.

Noise Words
Also called “stop” words, these are words that are ignored when a database is being indexed. This means that if a certain word is designated as a “noise” word, users cannot search for that word in the database. Most database software applications have a default list of noise words that includes common English words with relatively low search value. For example, the stop words used by Relativity include “about,” “be,” “can,” “did” and “get,” just to name a few. If a user searched for one of these terms, the database would report that it was not found, even if the word does appear in the documents, because the word was skipped during indexing. Ignoring noise words makes the index smaller and faster, so searches for more valuable words run more quickly. Most database software programs allow administrators to customize the list of noise words.
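The effect of a noise word list can be sketched in a few lines of Python. This is a simplified illustration, not how any particular review platform works: the list below holds only the five example words mentioned above rather than a full default list, and index_terms is a hypothetical helper.

```python
import re

# Assumed noise-word list: only the five examples named above, not a full default list.
NOISE_WORDS = {"about", "be", "can", "did", "get"}

def index_terms(text, noise_words=NOISE_WORDS):
    """Return the set of lower-cased words that would be kept in the index."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {word for word in words if word not in noise_words}

indexed = index_terms("Did you get the memo about the merger?")
print(sorted(indexed))
print("merger" in indexed)   # searchable
print("about" in indexed)    # dropped during indexing, so a search finds nothing
```

Because “about” never reaches the index, a search for it comes back empty even though the word is in the document.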

Index
In the context of electronic discovery, an index is essentially a tool used by a search engine to quickly locate data in a database. In most database software programs, the index must be built before searches can be run. Different databases use different indexing methods, but they all share the goal of building a list (or dictionary) of all the words that appear in a database. For each word, the index also notes the location of that word by using some combination of the record number, field number, and line or character number. When end users run a search, the database looks the word up in the prepared index instead of reading the entire database in real time. The index works much like the index in the back of a reference book, which lists the pages where each word appears. Creating the index can take a while at the beginning, but it saves time later during searches.
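The idea can be sketched as a small “inverted index” in Python. The records, the build_index and search helpers, and the choice to store a record number plus word position are assumptions made for this illustration; commercial products use their own, more elaborate formats.

```python
from collections import defaultdict
import re

# Hypothetical records: record number -> extracted text.
RECORDS = {
    1: "The merger agreement was signed in May.",
    2: "Please review the agreement before the merger closes.",
    3: "Lunch on Friday?",
}

def build_index(records):
    """Map each word to a list of (record number, word position) pairs."""
    index = defaultdict(list)
    for record_id, text in records.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].append((record_id, position))
    return index

def search(index, word):
    """Return the sorted record numbers that contain the word."""
    return sorted({record_id for record_id, _ in index.get(word.lower(), [])})

index = build_index(RECORDS)          # slow step, done once
print(search(index, "merger"))        # fast lookup, no re-reading of the records
print(index["agreement"])             # where the word occurs
```

Building the index reads every record once; after that, each search is a single dictionary lookup.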

XML
XML stands for eXtensible Markup Language. It is a way of encoding electronic documents so that they can be read by both people and software. It is related to HTML, the markup language used to create web pages, and uses a similar tag syntax. For example, in HTML, if you wanted text on a web page to appear <b>bold</b> you would put the <b> tag before it and the </b> tag at the end. Unlike HTML, which has a fixed set of tags, XML allows users to define their own tags, which makes it a much more flexible way of marking up documents. It is used on the web, but is also used in electronic discovery. The Electronic Discovery Reference Model (EDRM.net) developed a standard XML schema used to import and export electronic documents and associated metadata. It is a safer, more accurate way to migrate data between platforms. Many law firms and vendors are now using the XML schema as their data file format. In the old days, each type of litigation software required its own data load file format, which made it difficult to transfer data between software packages. Using XML as a standard file format makes that process simpler and preserves data integrity.
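To show what user-defined tags look like in practice, here is a minimal sketch that reads document metadata from a small XML file using Python's standard library. The tag names, attribute names and the read_load_file helper are invented for this example and are not the EDRM XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical load file. These tag and attribute names are invented for
# illustration and are NOT the EDRM XML schema.
LOAD_FILE = """\
<documents>
  <document id="DOC-0001">
    <custodian>Bob Jones</custodian>
    <author>alice@example.com</author>
    <dateSent>2010-05-03</dateSent>
    <subject>Q2 contract draft</subject>
  </document>
  <document id="DOC-0002">
    <custodian>Bob Jones</custodian>
    <author>carol@example.com</author>
    <dateSent>2010-05-04</dateSent>
    <subject>Re: Q2 contract draft</subject>
  </document>
</documents>
"""

def read_load_file(xml_text):
    """Return one dictionary of metadata per <document> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.findall("document"):
        row = {"id": doc.get("id")}
        for field in doc:
            row[field.tag] = field.text
        rows.append(row)
    return rows

rows = read_load_file(LOAD_FILE)
for row in rows:
    print(row)
```

Because every value is labeled by its tag, the receiving software does not have to guess which column holds the custodian or the date, which is the main advantage over older delimited load files.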
Read more at http://en.wikipedia.org/wiki/XML and http://edrm.net/activities/projects/xml

30(b)(6) Witness
In e-discovery, a 30(b)(6) witness usually refers to someone designated by a corporation to testify on its behalf concerning the electronically stored information maintained by the corporation that is evidence in a lawsuit. Rule 30(b)(6) of the Federal Rules of Civil Procedure allows a party to a lawsuit to name a corporation or other organization as a deponent and to describe the matters on which it wants testimony. In response, the organization must designate one or more representatives to testify on its behalf about information known or reasonably available to the organization. In e-discovery, those matters often include the record maintenance policies, practices, and technology of the corporation. The designated witness does not need personal knowledge of every matter but must be prepared to testify about what the corporation knows, so in practice corporations should select someone with detailed knowledge of the information technology involved. Most e-discovery experts recommend that corporations that participate in frequent litigation select the 30(b)(6) witness in advance and make sure he or she is well informed and capable of giving accurate and effective testimony if needed.
Read more at http://www.law.cornell.edu/rules/frcp/Rule30.htm

Unicode
In the context of litigation support and electronic discovery, the term Unicode refers to the ability of a software application to work with text in a wide variety of languages. The non-profit Unicode Consortium has assigned a unique number, called a code point, to every character in most of the world’s writing systems. These unique numbers are incorporated into various software solutions to identify, display and process text data stored in most languages. The Unicode Standard is updated periodically to add more scripts and characters. For a complete list of the scripts covered, see www.Unicode.org. In electronic discovery, the frequency of foreign language data is increasing as commerce becomes more global. Many of the data sets Encore processes contain characters from languages other than English. To extract the non-English language data, software that supports Unicode must be used. For the last few years, many of the most popular software products used to filter, cull, process, review or produce electronically stored information have been adding support for foreign languages by adopting the Unicode numbering system. Many software products now include the terms “Unicode”, “Unicode support”, or “Unicode compliant” in their marketing materials to indicate that they can handle a wide variety of non-English language text data. In particular, Encore has experienced a great demand for processing data in Chinese, Japanese, Korean, Spanish, French, Farsi, Hebrew and Arabic.
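The numbering system is easy to see from Python, whose strings are Unicode. The sketch below is only an illustration: the sample characters and the describe helper are chosen for this example, and the characters are written as escape codes so the listing displays the same everywhere.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))
        for ch in text
    ]

# One character each from the Latin, Hebrew, Arabic, Chinese and Korean scripts.
SAMPLE = "A\u05d0\u0628\u4e2d\ud55c"

described = describe(SAMPLE)
for code_point, name, utf8_bytes in described:
    print(code_point, name, utf8_bytes)
```

Each character has exactly one code point regardless of language, but the number of bytes used to store it varies, which is why software that assumes one byte per character garbles non-English text.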
Read more at http://www.unicode.org/standard/WhatIsUnicode.html

Early Case Assessment (ECA)
In the context of electronic discovery, Early Case Assessment (ECA) refers to searching and filtering data before it is processed. In the normal workflow, electronic documents must have their metadata extracted before they are loaded into a searchable database. This process can take days or weeks depending on the volume of the data. Early case assessment tools allow litigators to get a quick look at the data by indexing it without first extracting the metadata and text. This allows case teams to quickly assess the value or monetary risk of a case. Good early case assessment tools also allow litigators to narrow the scope of processing to relevant documents, which translates into enormous savings in avoided processing and review costs. Encore uses Clearwell Systems, dtSearch, Guidance EnCase, IPRO and several other tools to allow users to look at raw unprocessed data and filter it by various criteria.
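In its simplest form, narrowing the scope of processing means filtering a file listing by criteria such as file type and date range. The sketch below illustrates only that idea; the RawFile record, the cull helper, the file names and the sizes are all hypothetical, and the commercial tools named above do far more than this.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePath

@dataclass
class RawFile:
    path: str
    modified: date
    size_bytes: int

def cull(files, extensions, start, end):
    """Keep files with a wanted extension that were modified within [start, end]."""
    return [
        f for f in files
        if PurePath(f.path).suffix.lower() in extensions
        and start <= f.modified <= end
    ]

# Hypothetical listing of collected, unprocessed files.
FILES = [
    RawFile("mail/archive.pst", date(2009, 11, 2), 750_000_000),
    RawFile("docs/contract_v3.DOC", date(2010, 1, 15), 48_000),
    RawFile("docs/old_policy.doc", date(2004, 6, 1), 31_000),
    RawFile("system/setup.exe", date(2010, 2, 1), 2_400_000),
]

kept = cull(FILES, {".pst", ".doc"}, date(2009, 1, 1), date(2010, 4, 30))
excluded_bytes = sum(f.size_bytes for f in FILES) - sum(f.size_bytes for f in kept)
print([f.path for f in kept])
print(f"{excluded_bytes:,} bytes excluded from processing")
```

Only the files that pass the filter go on to full processing, which is where the cost savings come from.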

DeNISTing
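DeNISTing means removing known system and software files from a dataset to reduce processing time and expense. NIST stands for National Institute of Standards and Technology. NIST’s National Software Reference Library (NSRL) maintains a Reference Data Set containing hash values, including MD5, of files known to be used in operating systems and other software. Files in a collection whose hash values match an entry in the Reference Data Set can be set aside, because they are standard software files rather than user-created content.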
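The matching step can be sketched with Python's standard hashlib module. This is a simplified illustration: the temporary files stand in for a collected dataset, the known_hashes set stands in for values that would really be loaded from the NSRL Reference Data Set, and md5_of and denist are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def denist(paths, known_hashes):
    """Return only the files whose MD5 hash is not in the known-file set."""
    return [p for p in paths if md5_of(p) not in known_hashes]

# Demo with temporary files standing in for a collected dataset.
with tempfile.TemporaryDirectory() as folder:
    system_file = Path(folder) / "system.dll"
    user_file = Path(folder) / "memo.txt"
    system_file.write_bytes(b"pretend operating system file")
    user_file.write_bytes(b"Memo regarding the merger")

    # Stand-in for hash values loaded from the NSRL Reference Data Set.
    known_hashes = {md5_of(system_file)}

    remaining = denist([system_file, user_file], known_hashes)
    remaining_names = [p.name for p in remaining]
    print(remaining_names)
```

Matching is done on file content rather than file name, so a renamed system file is still recognized and removed.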
Read more at http://www.nsrl.nist.gov/

©2010 Encore Discovery Solutions.