Dial 'M' for metadata: Unravelling the mystery

By Craig Macaulay and David Wishart

Technology 


Snapshot

  • Metadata, while useful, is still to be authoritatively defined and understood.
  • It is best thought of in terms of how it is created: entered by humans, stored by a device’s operating system or created by an application working with the document.
  • Metadata is most frequently at issue in discovery in litigation although there are many further issues surrounding it.

Metadata is still to be authoritatively defined and understood. Perhaps the best way to think of it is in terms of how it is created.

“We kill people based on metadata”, Michael Hayden, former United States National Security Agency and Central Intelligence Agency director

Metadata is commonly defined as “data about data”, which is hardly helpful. A more sophisticated definition is that it is structured information that describes, explains, locates or otherwise makes it efficient to retrieve, use or manage an information resource. Metadata is used significantly in:

  • organising the filing and cross-referencing of electronic documents
  • electronic discovery for document title and date
  • determining the veracity of electronic evidence.

A practical illustration is to take the number 3000. To a computer this could be a PIN, last year’s profit or a postcode, and each will be managed and used in very different ways. The metadata is the information that tells the computer how the number should be treated and used.
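The 3000 illustration can be sketched in a few lines of Python. This is a minimal sketch only: the `TaggedValue` class, its `kind` label and the `render` function are hypothetical names invented here, and the three treatments are assumptions chosen to show that the label, not the value, decides the handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedValue:
    """A raw value plus a hypothetical metadata label saying what it is."""
    value: str
    kind: str  # assumed labels: "pin", "profit" or "postcode"


def render(item: TaggedValue) -> str:
    """Treat the same characters differently depending on the label."""
    if item.kind == "pin":
        # A PIN is secret, so it is masked rather than displayed.
        return "*" * len(item.value)
    if item.kind == "profit":
        # A profit figure is a number, formatted as currency.
        return f"${int(item.value):,}"
    if item.kind == "postcode":
        # A postcode is an identifier, shown as written.
        return f"Postcode {item.value}"
    raise ValueError(f"unknown kind: {item.kind}")


if __name__ == "__main__":
    for kind in ("pin", "profit", "postcode"):
        print(kind, "->", render(TaggedValue("3000", kind)))
```

Without the `kind` label the program has no basis for choosing between the three treatments, which is the point of the illustration.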

Metadata can be used in all manner of situations. Michael Hayden refers to the use of mobile phone tower data – metadata – to assassinate mobile-phone-using terrorists with drone planes.1 Less dramatically, it is also important in both legal and privacy contexts. In terms of documents, it includes information about the author of a document, who has changed it and when, whether there are any attachments, and to whom and by whom it has been sent.

Its use in the legal context has dramatically increased in recent years. Often perceived not to contain private information, it can be used to establish accurate profiles, even the identity of individuals. Metadata has been used to identify misuse of intellectual property,2 to identify fake emails3 and even to identify a possible murderer through devices paired to a smart speaker.4 However, its most frequent use is in litigation for the purpose of discovery.

Despite its increasing use, the exact nature of metadata is subject to a great deal of confusion. Even a federal Attorney-General became entangled in confusions about IPs and URLs and what such data can reveal when trying to explain a Bill about metadata.5

How not to think of metadata

It is not useful to think of metadata as existing in various forms (a computer’s file system information, included in the document and so forth) each providing particular information about a document (its date of origin, amendments, author etc) because the information in those forms can become inconsistent. For example, where a document is stored may say certain things about the document but an application using the document may record different and contradictory information about it as it is changed by the application. This means interpretation of the details of metadata needs to be treated with caution. It also means that classifying and understanding metadata in terms of the information that can be derived from it is not useful.

Nor is it useful to think of metadata as something perceivable in itself: the information it contains needs to be extracted by appropriate software from a variety of locations, which themselves are virtual rather than physical. Tamberlin J made this point in Jarra Creek Central Packing Shed Pty Ltd v Amcor Limited: “The information which is contained in the meta-data is not visible on a print-out of the relevant document, which shows only the face content and does not disclose the layers of electronic data beneath the visually readable information”.6

Understanding metadata: How it is created

Metadata is created in three broad ways:

Custom metadata 

This is information about a document entered by humans either manually or indirectly by a computer program under the direction of a human. An example of custom metadata would be the traditional coding done by lawyers when classifying or coding the document as relevant for discovery. This could include comments and other data like document type or even a date from the document. Law firms code documents for date, title and type for the purpose of document review and this is still the modus operandi for hardcopy documents scanned into an electronic database. It is basic metadata.

File system/infrastructure metadata 

This is information stored by the operating system controlling the computer or other device where the document is stored. The information is placed in a special area of the device called the file system tables. What is stored depends on the operating system, but there are standards which mean much of this metadata is largely consistent between systems.

Examples of some of the more popular file system metadata are:

  • file name (this is not stored in the document itself)
  • created date (when the document was first written to the file system)
  • modified date (when the last modification was saved to the file system)
  • last accessed date (this can depend on the operating system; later versions of Microsoft operating systems have this turned off by default)
  • registry settings and entries
  • event logs
  • link files (recent file listing)
  • print management software logs.
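As a rough illustration of the first few items in this list, the following Python sketch asks the operating system what it has recorded about a file, without opening the file. The function name `file_system_metadata` and the dictionary keys are chosen here for illustration; `Path.stat()` is the standard library call. What `st_ctime` means depends on the operating system, as the comment notes, which is one reason such values need careful interpretation.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_system_metadata(path):
    """Return a few values the file system holds about a file.

    Nothing here is read from inside the document itself.
    """
    p = Path(path)
    st = p.stat()

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "file_name": p.name,
        "size_bytes": st.st_size,
        "modified": as_utc(st.st_mtime),
        # May not be kept up to date if the system has access-time
        # updates turned off.
        "last_accessed": as_utc(st.st_atime),
        # Platform dependent: traditionally the creation time on Windows,
        # but the time of the last metadata change on Unix-like systems.
        "ctime": as_utc(st.st_ctime),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "contract.txt"
        sample.write_text("draft")
        for key, value in file_system_metadata(sample).items():
            print(f"{key}: {value}")
```

Copying the file to another device would typically give the copy new file system dates while leaving its content unchanged, which is why the source of each date matters.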

File system/infrastructure metadata has been the mainstay of electronic discovery for details like date (but only system modified date), title, document and page. So far it has been the most consistent and reliable source of metadata but it relies on the stability of the operating system and device storage. Moreover, as noted above, later Microsoft operating systems have started turning off the updating of some file system metadata, such as the last accessed date, by default.

Application metadata 

This is data created by the application (computer program) working with the document. Application metadata can be embedded in the document but it can also be stored in other areas of a computer or device depending on how the application has been configured. It should be noted that where applications rely on other applications to operate, for example an accounting application using a database, the secondary application (the database) will also maintain its own application metadata.

Some of the more popular application metadata examples include:

  • created date
  • modified date
  • last accessed date
  • last printed date
  • author
  • date-time and author of changes to the content of the document
  • application logs (which log the activity of users and the application during its execution).

If the document is a photo, the location, date and time, direction, device and technical details of the photograph, including light, may all be stored.

Application metadata is not uniform, even changing between versions of applications. Not all applications store the metadata and it can be difficult to extract metadata from large volumes of material. Accordingly it has mainly been used to test the veracity of other evidence although recently its use has become more widespread.
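By way of example only, the sketch below reads some application metadata embedded in a word processing file. It assumes the document is in the Office Open XML (.docx) format, which is a zip archive that normally carries a `docProps/core.xml` part recording author and date properties. The function names and the fabricated sample document are illustrative assumptions; other applications and formats store their metadata differently, or not at all.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the core properties part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Labels chosen for this sketch, mapped to the elements that hold them.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
}


def read_core_properties(docx):
    """Return embedded properties; a field the application did not record is None."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    props = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        props[label] = node.text if node is not None else None
    return props


def build_sample_docx():
    """Build a tiny in-memory stand-in for a .docx file (fabricated values)."""
    core = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}" '
        'xmlns:dcterms="{dcterms}">'
        "<dc:creator>A. Author</dc:creator>"
        "<dcterms:created>2019-01-01T00:00:00Z</dcterms:created>"
        "</cp:coreProperties>"
    ).format(**NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for key, value in read_core_properties(build_sample_docx()).items():
        print(f"{key}: {value}")
```

The created date read this way is the application's own record and need not agree with the created date held by the file system, which is one way the inconsistencies described earlier arise.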

Each of these three ways of creating metadata is distinct and all need to be considered, especially in discovery when producing or requesting a document. 

Metadata in discovery

The word “document” is broadly defined in the Federal Court Rules to include documents encompassed by the definition of “document” in the Evidence Act 1995 (Cth), as well as any material data or information stored or recorded by mechanical or electronic means (see O1 r4). From this description, it is clear that embedded electronic information in relation to relevant documents, including the information embodied in electronic metadata, is discoverable.

Some cases illustrate the benefits for discovery of precise definition of metadata yet also demonstrate some confusion between the categories. Tamberlin J, despite accurately describing the hidden nature of metadata in Jarra Creek Central Packing Shed Pty Ltd v Amcor Limited, located the metadata in the document itself: “The term describes data contained within an electronic file relating to the identification, origin or history of the file itself”. Despite this, his order did not identify the location of the metadata.7 The plaintiff had wanted the defendants to produce further document metadata fields, not produced originally, to enable a more efficient review of those documents. The fields had not been produced originally because of an agreement between the defendants and the Australian Competition and Consumer Commission.

Wartsila Ship Design Singapore Pte Ltd v Liu Jiachun and others8 in the Singapore High Court demonstrates the importance of correct definition of the format of discoverable documents. It involved a claim of misuse of intellectual property by some ex-employees. An exchange protocol had been agreed to between the main parties under which the documents would be exchanged in portable document format (pdf). This had the effect of removing some of the application metadata associated with the document. During the proceedings one of the other parties disclosed documents in their native format, and this included the application metadata of the documents. The metadata which specifically related to the created date was of great interest to the plaintiff. It showed that the documents had been created by the plaintiff and went a long way to establishing their case.

By 2015 some practitioners had learned the power of precisely defining what metadata is sought. In Integrated Medical Technology Pty Ltd (IMT) and Anor v Gilbert and Ors9 IMT alleged that some former employees copied or misused computer software source code. Apart from just a copy of the source code used by the defendant, the plaintiff sought orders for the discovery of INI files, object files, temporary files, third party files integrated into the Delphi IDE, third party DLL files, express/scribe/dictate, text files with random names, WP tools installation files, the test bug report, the test log file, the dot RAR file, the help file, the built DDLS or EXE files, the decrypted DB user files, the spreadsheets, the registry files, the Report Definition Files, the data configuration files and the bitmap files. 

Accessing metadata in a variety of locations may be necessary to challenge the authenticity of documents. In NAK Australia Pty Ltd v Starkey Consulting Pty Ltd10 the Court ordered access to a computer from which an email was sent rather than defining the metadata as such. For this purpose the metadata was not associated with the email itself, rather the application which sent it. Production of a laptop was insufficient to establish authenticity in PM Sulcs & Associates Pty Ltd v Oliveri11 although server log files served to establish an email was fake in Rana v University of Adelaide (No 2).12 Similar concerns pervade the Ambridge Investments litigation: access to log files might well have solved some of the issues as to authenticity of emails and the lack of electronic copies.13

Many people are confused about which category of metadata is generally used in discovery of documents. The most frequently accepted is file system metadata. As mentioned above, this category of metadata is not embedded in the document (although it may appear to be embedded because of how the application presents it to us). File system metadata is preferred for a number of reasons:

  • file system metadata is considered more consistent and reliable over time
  • file system metadata is the only place you can source document title information
  • application metadata is not captured by all applications and is often inconsistently captured and stored
  • application metadata, given its inconsistency, is more costly to process (requires more work).

Accordingly, for discovery purposes it is important to understand the different types of metadata and their individual vagaries in terms of the manner of their creation, and to identify all possible sources of metadata, whether they be internal servers, external or utility servers, in the cloud, or on a variety of devices from laptops to mobile phones.

Metadata and privacy

Metadata is often thought of as raising fewer privacy issues than content. Yet, even so, deep confusion abounds. The Telecommunications (Interception and Access) Amendment (Data Retention) Bill 2014 (Cth), the Bill the Attorney-General struggled to explain, sought to define and mandate the retention by telecommunications service providers of large amounts of data relating to account holders and their communications. Those communications include phone conversations, email and so forth, and the data includes the phone numbers of the people who called each other, how long they talked to each other, the email address from which a message was sent and the time the message was sent. This data was termed “telecommunications data” in the Bill. It is a species of metadata even though the term “metadata” is not used. The Explanatory Memorandum states quite explicitly that “telecommunications data is less privacy intrusive than content”.14

According to the Explanatory Memorandum, the rationale for mandating the retention and disclosure of such information to authorised persons was to investigate, prosecute and prevent serious criminal offences (including murder, sexual assault, kidnapping, drug trafficking, money laundering and fraud) and activities that threaten national security. Yet local councils have been requesting retained data in order to investigate infringements of by-laws such as illegal dumping, littering and parking.15 A recent United States study16 found that four pieces of data (which some people would term metadata) collected from a shopping centre could with 95 per cent accuracy determine the individual identity of any person in the shopping centre and that while names were not identified with that data, further investigation could easily establish them. The view that metadata does not raise the same privacy concerns as content is clearly mistaken, and it arises from a lack of clarity as to exactly what metadata is.
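The re-identification point can be made concrete with a toy calculation. The sketch below is not the method or the data of the study cited above: the four "traces" are fabricated, and the `matching_people` and `unicity` functions are hypothetical names for a simple count of how often a handful of place-and-time points matches exactly one person.

```python
from itertools import combinations

# Fabricated traces: each person is a set of (place, hour) points.
# No names, content or account details are recorded.
TRACES = {
    "person_a": {("cafe", 9), ("gym", 18), ("shop", 12), ("station", 8)},
    "person_b": {("cafe", 9), ("gym", 18), ("shop", 13), ("station", 8)},
    "person_c": {("cafe", 10), ("gym", 18), ("shop", 12), ("station", 8)},
    "person_d": {("cafe", 9), ("park", 18), ("shop", 12), ("station", 8)},
}


def matching_people(known_points, traces):
    """List everyone whose trace contains all of the known points."""
    return [name for name, points in traces.items() if known_points <= points]


def unicity(traces, k):
    """Fraction of k-point samples that single out exactly one person."""
    unique = 0
    total = 0
    for points in traces.values():
        for subset in combinations(sorted(points), k):
            total += 1
            if len(matching_people(set(subset), traces)) == 1:
                unique += 1
    return unique / total


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"{k} known point(s): {unicity(TRACES, k):.0%} of samples are unique")
```

In this toy data a single point rarely isolates anyone, while four points always do. Once a trace is unique, linking it to a name requires only one outside piece of information.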

Further issues

While there are signs the profession is becoming more literate in terms of the information that can be gleaned from available metadata and how and where it might be found, especially in terms of email authenticity, further issues loom. One of the most pressing is whether a litigant has possession or custody of documents stored on the cloud. Much email is cloud-based and many businesses store documents in that space. The question is whether the litigant’s storage of material on a third party’s servers takes it sufficiently outside the litigant’s control as to make it undiscoverable. A useful summary of the various approaches can be found in Dirak Asia Pte Ltd v Chew Hua Kok.17 For the purposes of Singapore, the Court decided that the practical capacity to access the documents stored was sufficient nexus to justify a discovery order. Were the matter to require access to the application metadata of the cloud server a different conclusion might have been reached.

Social media and “Big Data” also pose issues for comprehending and defining metadata. The reason is that each item produces its own metadata and yet is itself the metadata of other uses. What then is discoverable multiplies exponentially. Moreover, where that metadata is stored in terms of devices makes the question of control, as discussed above, problematic. Privacy concerns intrude with crudely defined criteria. Yet again, the proliferation of formats, including sound and video files, creates issues of presentation of data.

Conclusion

The key to successfully negotiating the confusions posed by metadata is threefold:

  • one should understand the different types and sources of metadata and their individual vagaries
  • all possible sources of metadata should be identified, whether they be internal, external or utility servers, in the cloud, or on a variety of devices from laptops to mobile phones
  • one should appreciate that imagination and assiduous attention to the media are necessary to keep one’s understanding afloat because the digital revolution is constantly changing the methods of access and types of metadata.

Craig Macaulay is a data scientist with Phi Finney McDonald. He has more than 20 years’ experience in the forensic industry and has been involved in various litigation matters, fraud and improper conduct investigations, using computers and the data stored on them.

David Wishart is an associate professor in the Law School of La Trobe University. He has researched in many fields of law, notably corporation and competition laws. He has recently examined the impact of the Royal Commission into the financial industries. 

1. Stated in a debate on 1 April 2014 at Johns Hopkins University and quoted by David Cole, The New York Review of Books, 10 May 2014. A YouTube video of the debate is available at www.youtube.com/watch?v=kV2HDM86XgI. The comment is made at 17.59.

2. Wartsila Ship Design Singapore Pte Ltd v Liu Jiachun and others [2014] SGHCR 13.

3. NAK Australia Pty Ltd v Starkey Consulting Pty Ltd [2008] NSWSC 1142; PM Sulcs & Associates Pty Ltd v Oliveri [2009] NSWSC 456; Rana v University of Adelaide (No 2) [2008] FCA 494.

4. www.bostonherald.com/2018/11/11/alexa-served-privacy-concerns-echoed-in-new-hampshire-case/.

5. Sarah Dingle, “Attorney-General George Brandis struggles to explain Government’s metadata proposal”, ABC News, www.abc.net.au/news/2014-08-07/brandis-explanation-adds-confusion-to-metadata-proposal/5654186.

6. [2006] FCA 1802.

7. Note 6 above.

8. [2014] SGHCR 13.

9. [2015] QSC 124.

10. [2008] NSWSC 1142.

11. [2009] NSWSC 456.

12. [2008] FCA 494.

13. Ambridge Investments Pty Ltd (in liq) (recvr app'td) v Baker & Ors [2010] VSC 59; Baker & Ors v Ambridge Investments Pty Ltd (in liq) & Ors [2011] VSCA 334.

14. Telecommunications (Interception and Access) Amendment (Data Retention) Bill 2014 (Cth), Revised Explanatory Memorandum, para 9.

15. www.brisbanetimes.com.au/national/queensland/big-brother-brisbane-city-council-taps-in-to-metadata-20181114-p50g24.html.

16. Yves-Alexandre de Montjoye, Laura Radaelli, Vivek Kumar Singh, Alex “Sandy” Pentland, “Unique in the shopping mall: On the re-identifiability of credit card metadata”, www.sciencemag.org (last accessed 2 February 2015).

17. [2013] SGHCR 1.


Views expressed on liv.asn.au (Website) are not necessarily endorsed by the Law Institute of Victoria Ltd (LIV).