Limitations to Text and Data Mining and Consumer Empowerment: Making the Case for a Right to “Machine Legibility”

This paper focuses on the current legal barriers to text and data mining (TDM) in the context of smart disclosure systems (SDSs), whose aim is to provide consumers with improved access to the data needed to make informed decisions. The use of intellectual property rights and contracts, combined with technological protection measures, can hinder TDM and the deployment of SDSs. Further, those legal constraints can negatively affect artificial intelligence innovation, which depends on improved access to data. There are thus various arguments for enhanced “machine legibility”. However, the TDM exceptions included in the recently approved Directive on Copyright in the Digital Single Market do not appear to clear the way for it. In relation to SDSs, we also argue that the principle of transparency, which is embedded in consumer and data protection laws, can serve as a last line of defence against the prohibition of TDM.


Introduction
Smart disclosure refers "to the timely release of complex information and data in standardized, machine readable formats in ways that enable consumers to make informed decisions". 1 Smart disclosure systems (SDSs) allow users to get easy and timely access to the relevant pre-contractual information or even to receive personalised advice based on their preferences. 2 Over the last two years, several initiatives have emerged worldwide offering third party services that automatically analyse websites' contractual documents and check their compliance with applicable consumer and data protection laws. 3 One of the main goals of these projects 4 is to increase users' awareness of the rights, obligations and possible risks in their online transactions, in an attempt to reduce or overcome the well-known signing-without-reading problem. 5 Such tools are primarily directed to consumers, as end-users of the service; however, other possible users are consumer associations or regulatory authorities, which can use them to perform periodical assessments, start investigations or verify complaints more quickly.
The functioning of SDSs is based on text and data mining (TDM). TDM uses techniques from natural language processing, machine learning, information retrieval, and knowledge management for the automated analysis of digital content (structured and unstructured data), in order to extract information, identify patterns, discover new trends, insights or correlations. 6
Despite the development of such promising tools for enhancing the readability and understandability of the conundrum of contractual terms, the current EU legal framework is not particularly supportive of TDM. Many rigorous studies have already analysed the barriers to TDM, current tensions and negative externalities in the context of research, contributing to the debate on the TDM exception in the ongoing copyright reform. 7 Several of these studies highlight that the beneficial uses of TDM are not limited to scientific research, but take place in other contexts, including consumer information and protection.
1 Sunstein 2012.
2 On the advantages of smart disclosures and targeted information, see Ben-Shahar 2009; Helberger 2013; Porat and Strahilevitz 2013; Bar-Gill 2015; Busch 2016; Helleringer and Sibony 2017; Busch forthcoming.
3 One of the first projects in the area of automated analysis of legal documents is "Usable Privacy Policy" (www.usableprivacy.org), a consortium led by Carnegie Mellon University. Their tool aims to help users to navigate through the text of privacy policies and identify the privacy options and choices available (Sadeh et al. 2013). More recently, an international team formed by researchers from Switzerland's Federal Institute of Technology, the University of Wisconsin and the University of Michigan has launched two tools: Polisis (https://pribot.org/polisis), a tool to visualise in a very effective way the content of a privacy policy, and Pribot (https://pribot.org/bot), a chatbot available to answer questions about a specific privacy policy (Harkous et al. 2018). In the field of the automated analysis of T&C, we must mention CLAUDETTE, a research project carried out by an interdisciplinary team at the European University Institute (https://claudette.eui.eu). The tool, based on machine learning techniques, assesses the fairness of consumer standard terms (https://claudette.eui.eu/use-our-tools/). This functionality will be extended to the analysis of privacy policies. For more information, see Contissa et al. 2018; Lippi et al. 2018. Another interdisciplinary project, SaToS (Software Aided Analysis of ToS), is conducted by the chair of Software Engineering for Business Information Systems (Sebis) at TU Munich. The German research group is developing a solution to automatically identify Terms of Services of e-commerce websites and summarise the key points of the contract in a simplified language (Braun et al. 2018).
4 This is precisely one of the objects of "The Internet of Platforms: an empirical research on private ordering and consumer protection in the sharing economy", carried out at UCLouvain. The project aims to address the issue of the lack of transparency in sharing economy transactions and improve the information users receive from and about the platform (http://www.rosels.eu/research/research-project-iop/). This paper presents some of the preliminary results of this project.
In the light of the artificial intelligence (AI) applications that are emerging, it is easy to predict that TDM will become a central technique enabling new kinds of information-based services and applications (e.g., for care and medical purposes, fact checking, disaster prevention, elaboration of sustainability policies and politics, etc.). Many studies have already highlighted the need for broad access to datasets so as to train algorithms and improve AI applications. 8
The article aims to contribute to the discussion on TDM by adding a further perspective. We take into consideration the application of TDM in SDSs, a sector that is growing in importance. The automated analysis of contracts and privacy policies for enhancing the awareness of consumers and, ultimately, ensuring consumer empowerment, is a perfect lab to test TDM's stumbling blocks.
The article is structured as follows: following the present Introduction, we outline in Section 2 a taxonomy of the possible obstacles for TDM and SDSs, identifying intellectual property rights (IPRs), contracts and technological protection measures (TPMs) as the principal ones.
In Section 3, we analyse the interplay between copyright, the database sui generis right and what is called private ordering 9 , noting that the existing copyright limitations may offer little help to counterbalance the power of contracts. Therefore, in Section 4 we explore whether the current proposal for a specific TDM exception in the Copyright in the DSM Directive 10 and the ongoing copyright reform in the EU will fill some of these gaps. Despite some merit of the text proposed by the European Commission (and of the amendments of the European Parliament), the exception as drafted will neither make it possible to fully harness the potential of Big Data analytics and AI, nor dispel legal uncertainty, as contemplated by the European Commission.
6 In the proposal for a Directive on copyright in the Digital Single Market (hereafter "Copyright in the DSM Directive"), TDM is defined as "any automated analytical technique aiming to analyse text and data in digital form in order to generate information such as patterns, trends and correlations" (Art. 2.2, Proposal for a Directive of the European Parliament and of the Council on copyright in the Digital Single Market, COM/2016/0593 final - 2016/0280 (COD)). The definition is sufficiently broad to embrace the current TDM application panorama. For a technical definition of text and data mining, see Hearst 2003. Specifically on text mining, Feldman and Sanger 2007. For an extensive analysis of the definition of TDM, see Triaille et al. 2014.
7 Ibid.; Triaille et al. 2014; Bernhardt et al. 2015; Caspers and Guibault 2016a; Margoni and Dore 2016; Stamatoudi 2016; Hilty and Richter 2017; Geiger et al. 2018; Margoni and Kretschmer 2018; Rosati 2018.
8 On the crucial need to train algorithms on different datasets, see Hall and Pesenti 2017.
9 By "private ordering", we refer to contractual, technological and informal measures as tools to enforce platforms' rights and interests towards their users. In the absence of a clear legislative framework or effective (and efficient) remedies, contracts and technology can be used to expand the prerogatives and powers of platforms, restricting the legitimate uses and faculties of the weaker party. As noted with reference to the intellectual property domain by Dussolier 2007, p. 1393-1394.
In Section 5 we go further and argue that the TDM and related data access issue should be framed in a perspective that takes copyright's rationale seriously. This leads to the argument that the reproduction right should in the first place not cover TDM processes.
The limitations to TDM coming from private ordering will be examined in Section 6, where we present the results of an empirical analysis conducted on a representative set of online platforms operating in the sharing economy. This analysis shows that there is a widespread trend to preclude TDM via contracts (the online Terms & Conditions) and through the embedded code. A prohibition of TDM that makes it impossible to run tools for automated contractual analysis would prevent consumers from accessing the information they need to make informed choices. This would not be in line with the principle of transparency enshrined in both consumer and data protection law.
In Section 7 we conclude by presenting the principle of "machine legibility", which transposes the transparency principle into the technological context.

Obstacles to TDM in the current European framework: setting the scenario for smart disclosure systems
There are at least three legal protections and tools that could limit the practice of TDM: 1) intellectual property rights (IPRs); 2) contracts; 3) data protection rules. First, in the light of the "black letter" of copyright law and its current interpretation by the Court of Justice of the EU (ECJ), many TDM activities could be considered as copyright infringements or violations of the database sui generis right. 11 Second (and this is an equally worrisome signal in the never-ending battle for the control of information), contracts may expressly prohibit or limit TDM. In addition, TPMs can enforce (and reinforce) IPRs and contractual provisions, impeding TDM in practice. Finally, if the object of the mining consists of personal data, i.e. "any information relating to an identified or identifiable natural person" 12 , the processing has to be compliant with data protection law.
These three legal instruments have different scopes and objectives but, in some cases, they produce the same consequences for TDM. Imagine a mining activity carried out by an insurance company on a national electronic health record system: TDM would be forbidden and several legal instruments could be invoked to justify the prohibition, in particular the protection of personal data. In other cases, data protection and copyright could admit the mining for research purposes (as both protections are subject to limitations for research), but TPMs, impeding the bulk download of the content, could affect the conduct of research in practice. Meanwhile, some mandatory copyright exceptions and lawful uses cannot be overridden by contractual provisions. It is therefore a complex and dynamic scenario that needs to be further explored if we want to unleash the potential of SDSs and, beyond, AI applications.
In the case of SDSs, the mining covers legal documents, such as Terms & Conditions (T&C) and privacy policies. Therefore, the issue of data protection will be left out of the analysis. Accordingly, for the purpose of this paper, the focus will be on IP and contractual obstacles.
12 Art. 4(1), General Data Protection Regulation.

Before the (IP) Law: limits and counter-limits to TDM
IP is one of the first barriers to TDM. The latter may, in principle, clash with a bundle of exclusive rights if the work or the database qualifies for protection under Directive 2001/29/EC ("InfoSoc Directive") or Directive 96/9/EC ("Database Directive").
As is well known, copyright protects "databases which, by reason of the selection or arrangement of their contents, constitute the author's own intellectual creation". 13 A database is defined as "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means". 14 A website, like that through which online platforms offer their service (for example, the AirBnB website), can certainly fall within this notion. 15 Copyright covers the database's peculiar "expression", i.e. the originality of its systematic organization, which is exteriorised through the "free and creative choices" 16 that show the "personal touch" 17 of the author. A contrario, the requirement of originality is not satisfied "when the setting up of the database is dictated by technical considerations, rules or constraints which leave no room for creative freedom". 18 Therefore, copyright does not apply when the selection and arrangement follow a chronological or alphabetical ordering. 19
Most websites of the sharing economy platforms that can be considered as databases will probably not reach the threshold of originality required for copyright to protect the database as such (to be distinguished from the content of its pages). First of all, the selection and arrangement of the data are essentially shaped by the kind of service that the platform provides. A carpooling platform will allow users to search for the closest car available, displaying, for example, the distance, the location of the car, the level of fuel, the license plate. In addition, to be user-friendly and easy to retrieve, some information has to be organised according to "trivial" criteria: e.g. T&C, privacy policy, copyright notices, FAQs and community norms are generally included in the "Legal conditions" section. Therefore, in the given scenario, copyright is unlikely to apply to the platform's database.
However, the database can be protected under the sui generis right regime. The latter is an exclusive protection granted to the maker of the database "which shows that there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents". 20 We could debate whether there is substantial investment in obtaining data, if the latter are constantly produced by the web-users (e.g., personal information, localisation data, pictures, preferences). Equally, one could wonder whether there is substantial investment in the verification of the content if the platforms or sharing economy websites explicitly state and follow the policy that they do not check the accuracy of the data provided by users. At the same time, there is room to argue that a substantial investment could lie in the presentation of the content: after all, the platform arranges the website to facilitate the sharing of information between the users, and adjusts the filters and search tools to customise the experience and make it appealing for the end-user.
13 Art. 3, Database Directive. For a comprehensive overview, see Beunen 2007; Derclaye 2008, 2014.
14 Art. 1(2), Database Directive.
15 For the classification of a website as a database, see Strowel and Derclaye 2001, p. 311-312.
16 ECJ, Case C-604/10, Football Dataco Ltd and Others v Yahoo! UK Ltd and Others [2012], ECLI:EU:C:2012:115, para. 38. Derclaye 2012; Rosati 2013b.
17 ECJ, Case C-604/10, Football Dataco Ltd, para. 38.
18 Ibid., para. 39.
19 In line with the US leading case Feist Publications Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991). See Waelde et al. 2013, p. 65.
If we assume for the sake of argument that the threshold of substantial investment is met, the way the sui generis right is conceived and designed could affect the behaviour of the legitimate user. Like copyright, the sui generis right does not need any formality to exist. 21 At the same time, the substantial investment in any one of the three activities listed in Art. 7 (Directive 96/9/EC), as interpreted by the ECJ 22 , is a determination that is likely to be verified in court only and, considering the requested proof, the evidence is in the hands of the maker of the database. This means that, unlike copyright, it will be difficult for a user to know, by simply consulting the database, whether the latter is protected by the sui generis right. 23 Hence, the sui generis right framework can contribute to legal uncertainty, reducing in practice users' faculties that would be totally legitimate: a user who wants to avoid the risk of private sanctions (the suspension of the account or the ban from the platform) or legal actions will adopt a precautionary approach. She will act as if the sui generis right protects the database (incidentally, the standard T&C will confirm that the provider of the online platform intends to protect the website by IPRs).
Another issue is whether copyright protection applies to the T&C and/or the privacy policy, which will be the primary object of investigation in the case of automated analysis of contractual documents. There is some discussion as to the copyrightability of such documents. In a way, they are "standard" by definition and their formulas are usually based on templates. Furthermore, in the case of privacy policies, the structure is mainly determined by law (see, for instance, the list of mandated disclosures in Arts. 13 and 14 of the General Data Protection Regulation, "GDPR"). However, the EU criterion of originality does not require novelty or a high level of creativity. It suffices to demonstrate the "author's own intellectual creation" 24 , which can be rather modest. 25 Often there is still room for choices and adjustments in the presentation and drafting of those documents, which means they could be protected by copyright.
23 The maker of the database could state the existence of the sui generis right in the T&C, but such a circumstance is quite rare. Usually, T&C contain general formulas such as "All rights reserved", leaving ample room for interpretation to the end-user, who in the majority of cases is not a lawyer, much less an IP expert.
There is no case law concerning the standard of originality applied to contractual texts. Therefore, it is necessary to refer to the general framework and to reason in consimili casu. For instance, in Infopaq I, the Court of Justice stated that words, as such, are not protectable, but "through the choice, sequence and combination of those words […] the author may express his creativity in an original manner and achieve a result which is an intellectual creation" 26 . Thus, the ECJ concluded that even eleven consecutive words can potentially "express the author's own intellectual creation" 27 , leaving such a determination to national courts. The latter have ruled, episodically, over similar issues, i.e. the creativity of legal or technical works. In the Italian case law, for example, a technical article (a sort of machine's instruction manual) describing the functions of a monoscope has been considered original. 28 In another case dealing with legal guidelines, the District Court of Venice found that "a regulation against counterfeiting" written by a lawyer presented the degree of originality required by copyright. 29 In Spain, the Madrid Provincial Court concluded the same in relation to an exercise book of mathematical problems in statistics. 30 Under Belgian and French copyright laws, instruction manuals and other informational documents have been recognized as protected by copyright. 31 For instance, the text of a patent before its official publication 32 as well as the wording of various contracts 33 have been protected in France. In Belgium, the instructions for using IT equipment 34 have been granted copyright protection. In the UK, a wide range of subject matter has been protected as compilations in the past, including a leaflet conferring information about herbicides. 35 Applying a low threshold of originality, courts have accepted as original railway tables and exam papers. 36 In Germany, various decisions of the Federal Supreme Court (BGH) have considered as protected by copyright the following technical documents: user guidelines for the usage of a technical apparatus 37 , technical rules to be applied in the construction of roads 38 , technical drawings. 39
Therefore, the outcome of the intellectual effort of the author in drafting the various clauses of some online T&C, the choice of words or the structuring of the document, could qualify as a protected work. Anyone who has ever written a complex contract knows that it can be a creative task, and that many free choices have to be made for the organisation of the clauses and the drafting of each. In addition, thanks to some legislative interventions and a few recent scandals in the field of consumer and data protection 40 , there is a growing trend towards the encouragement of user-friendliness in legal documents: e-commerce sites, social networks and online platforms in general are starting to provide T&C and privacy policies in novel formats, using different fonts, layout and icons to increase understandability, transforming the "legalese" into plainer language, etc. The drafters of the text of the Creative Commons licenses claim they are protected by copyright, and made them available under the CC0 Public Domain Dedication. 41 In exceptional cases, designers are even involved in the drafting, a circumstance that makes it hard to contest the originality of the legal documents. 42
To sum up, to retrieve the relevant information, a SDS can automatically analyse contracts and privacy policies, considered as original works, and the various legal sections of a website could be considered as part of a database, potentially protected by the sui generis right.
34 Brussels Court of Appeal, 28 January 1997, cited in de Visscher and Michaux 2000, p. 31, footnote 125.
35 Supreme Court of Judicature - Court of Appeal, Elanco v. Mandops [1980] RPC 213.
36 Bently and Sherman 2009, p. 64.
37 Federal Supreme Court, 10 October 1991 - I ZR 147/89 ("Bedienungsanweisung").
38 Federal Supreme Court, 11 April 2002 - I ZR 231/99 ("Technische Lieferbedingungen").
39 Federal Supreme Court, 22 September 1999 - I ZR 48/97 ("Planungsmappe").
40 The speech of Senator Kennedy at the Congress hearing of Mark Zuckerberg on the Cambridge Analytica affair went viral: "Here's what everybody's been trying to tell you today, and - and I say this gently. Your user agreement sucks […] The purpose of that user agreement is to cover Facebook's rear end. It's not to inform your users about their rights. Now, you know that and I know that. I'm going to suggest to you that you go back home and rewrite it. And tell your $1,200 an hour lawyers, no disrespect. They're good. But - but tell them you want it written in English and non-Swahili, so the average American can understand it. That would be a start".
41 See point 5 of the Creative Commons Terms, https://creativecommons.org/terms/.
42 See, for instance, the privacy policy developed by the designer Stefania Passera: https://juro.com/policy.html.
In case of IP protection, the TDM activity might come into conflict with the various exclusive rights included in each IP bundle. Depending on the technique used, TDM may involve:
- the reproduction and/or communication to the public of content (the text of T&C/privacy policy); 43
- the extraction and/or reuse of a substantial part of the database. 44
A communication to the public or a reuse does not always occur in the case of TDM: the latter usually elaborates the information and publishes the results of the analysis in the form of aggregate data, statistics, reports, etc. Therefore, unless the output of the TDM shows the whole or excerpts of the protected work or the database, there will be no communication to the public or reuse. 45
The most problematic issue with TDM is the broadness of the notions of reproduction (for copyright) and extraction (for the database right): when the tool has to run the analysis, it copies all or part of the work, it transfers all or a substantial part of the contents of a database to another medium or it technically adapts or translates the content (e.g. conversion from a PDF to another format). 46 So, these operations, which are necessary steps in the TDM process, in principle fall under copyright or under the database right.
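The technical steps described above can be illustrated with a minimal sketch, under stated assumptions: the sample page, the keyword list and all function names are hypothetical and do not reproduce any existing SDS, which would typically rely on trained classifiers rather than fixed keywords. The sketch only shows where a working copy is made, where the format is converted, and that the output may consist of aggregate figures without excerpts.

```python
import re

# Hypothetical sample standing in for a platform's T&C page (invented text,
# not taken from any real website).
SAMPLE_PAGE = (
    "<html><body><h1>Terms</h1>"
    "<p>The provider may terminate the account at any time.</p>"
    "<p>Any dispute is subject to arbitration. "
    "The provider may change these terms.</p></body></html>"
)

# Hypothetical keyword list, used only for illustration.
FLAG_TERMS = ("terminate", "arbitration", "change these terms")


def make_working_copy(page: str) -> bytes:
    """Step 1: store a copy of the source content for analysis."""
    return page.encode("utf-8")


def convert_to_plain_text(raw: bytes) -> str:
    """Step 2: convert the copy into a machine-processable format."""
    html = raw.decode("utf-8")
    without_tags = re.sub(r"<[^>]+>", " ", html)  # drop the markup
    return " ".join(without_tags.split()).lower()  # normalise whitespace and case


def analyse(text: str) -> dict:
    """Step 3: produce aggregate output (counts only, no excerpts)."""
    return {term: text.count(term) for term in FLAG_TERMS}


if __name__ == "__main__":
    working_copy = make_working_copy(SAMPLE_PAGE)
    report = analyse(convert_to_plain_text(working_copy))
    print(report)
```

In this sketch, steps 1 and 2 correspond to the copying and format conversion that may qualify as reproduction or extraction, while step 3 returns only counts, which is the kind of output that would normally not amount to a communication to the public or a reuse.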
The right of reproduction belongs to the copyright owner, while the right of extraction (even a temporary one, like the visualisation on a computer's screen) of the whole or a substantial part of a database is granted to its maker. This means that the user cannot perform TDM without the authorisation of the right holder, unless an exception applies.
The interplay between IPRs, contracts and TPMs is visible in the field of copyright exceptions and limitations. Under the current EU framework (mainly defined by the InfoSoc Directive), some exceptions could apply to TDM: for works protected under copyright, for example, Art. 5(1) of the InfoSoc Directive allows for the temporary reproduction of works, "which are transient or incidental [and] an integral and essential part of a technological process and whose sole purpose is to enable: (a) a transmission in a network between third parties by an intermediary, or (b) a lawful use of a work or other subject-matter to be made, and which have no independent economic significance"; Art. 5(2)(b) admits the reproduction for personal use; and Art. 5(3)(a) includes a specific exception for teaching and scientific research. A similar exception for research is included in the Database Directive. 47 The maker of the database cannot prevent the legitimate user from extracting insubstantial parts (qualitatively or quantitatively considered), as long as she does not "perform acts which conflict with normal exploitation of the database or unreasonably prejudice the legitimate interests of the maker of the database" 48 and "cause prejudice to the holder of a copyright or related right in respect of the works or subject matter contained in the database". 49
However, such exceptions may not confer effective rights on consumers: they are narrow, do not cover the whole spectrum of TDM technologies, and are implemented differently across Member States. Besides, the framework of rights and exceptions under the InfoSoc Directive and the Database Directive is not completely homogeneous: what would be allowed under copyright is not necessarily allowed under the sui generis right (and vice versa).
The research exceptions, for example, are limited to the (sole) 50 purpose of illustration for teaching or scientific research. 51 TDM would be permitted under these exceptions in very few cases, e.g. in the context of non-profit scientific projects dealing with SDSs or for educational purposes in order to demonstrate the functioning of the tool. Such an exception arguably could also cover the "training" of machine learning systems, on the model of human teaching, but such use would nevertheless need to be justified by the non-commercial purpose to be achieved through TDM and accompanied by the indication of the source. 52
The exception for private copying will allow even more limited uses: as reported in Caspers and Guibault (2016), national laws have been quite restrictive in the implementation of such a copyright limitation, focussing on "personal use, study or (small scale) research" 53 , and "sometimes the scope is limited to a few copies" 54 . Considering for example Italian law, the reproduction for private use is legitimate with reference to printed works and as long as it is made manually or with means that do not allow the distribution of the work (Art. 68, Law 633/1941). 55 According to the majority opinion in the literature, the exception refers to the reproduction used in the family circle. 56 The exception could be extended to online works if the three steps test is respected (Art. 71-nonies, Law 633/1941), but it has received a narrow interpretation by national courts so far. 57 Therefore, the Italian private copying exception is rather limited in the digital environment and is unlikely to permit TDM performed by an individual or, a fortiori, by a consumer association.
47 We do not take into consideration Art. 9 (a) Database Directive, since the exception for personal use applies only to non-electronic databases, which do not permit TDM in any case.
48 Art. 8(2), Database Directive.
51 For a complete analysis of the activities that could fall within the exception, see Triaille et al. 2013, p. 359 ff.; Guibault et al. 2012, p. 49 ff.; Montagnani and Aime 2017, pp. 385 ff.
52 See Art. 5(3)(a) InfoSoc Directive and Arts. 6(2)(b) and 9(1)(b) Database Directive. Another inconsistency between the two Directives must be noted with regard to the research exception. Under the Database Directive, the reference to the source appears to be a mandatory requirement, while the InfoSoc Directive admits the possibility of not indicating the author's name if "this turns out to be impossible" (Art. 5(3)(a), InfoSoc Directive). According to some authors, this difference is more declaratory than substantive, considering the general principle of "ad impossibilia nemo tenetur", which will be applicable to the exception for databases in any case (see Montagnani and Aime 2017, p. 387, citing Walter and Von Lewinski 2010). Other authors interpret the provisions literally (cf. Triaille et al. 2014, p. 70, as reported in Montagnani and Aime 2017, p. 387).
Furthermore, relying on those exceptions of the InfoSoc Directive offers no guarantee, as both the research exception and the private copying exception are optional and not imperative: Member States are free to adopt them and, where they exist, the exceptions can be overridden by contracts or TPMs. 58
A stronger defence for TDM could, in principle, come from Art. 5(1) of the InfoSoc Directive. In fact, the temporary reproduction is a mandatory exception that must be implemented by all Member States. However, there are some drawbacks. First, mandatory does not necessarily mean that the exception is imperative (non-overridable by contracts). When the European Legislator wanted to limit the freedom of contracts, it expressly did so (see, for instance, Art. 15, Database Directive, and Art. 8, Directive 2009/24/EC, "Software Directive"). 59 There is nothing similar in the InfoSoc Directive that would prevent the prohibition of such use by contract. 60 Nevertheless, some countries, like Belgium, Ireland and Portugal, have expressly excluded the possibility of overriding the temporary reproduction exception via contractual means. 61 The situation is not crystal clear in other Member States. 62 A second (and more preclusive) problem concerns the content of the exception: Art. 5(1) is crafted for caching and browsing activity 63 , but the cumulative conditions set out in that provision, as restrictively interpreted by the ECJ 64 , hardly apply to TDM activities. In particular, the copies made through TDM are not necessarily temporary, transient or accessory. 65 It is also far from clear that the "sole purpose" of this reproduction is "to enable a transmission in a network between third parties". Furthermore, a smart disclosure system does not necessarily satisfy the requirement that the copy have no independent economic significance. The latter requires not only that the temporary reproduction does not generate "an additional profit, going beyond that derived from lawful use of the protected" 66 item, but also that the reproduction does not "lead to a modification of that work" 67 . Literally interpreted, this last requirement will exclude most TDM activities, since the data analysis process usually implies a transformation of the original work to make it processable by the machine (e.g., the conversion from one format to another). 68 Therefore, there are several limitations of the temporary copy exception that make it irrelevant for exempting TDM.
53 Caspers and Guibault 2016a, p. 34. See also Helberger and Hugenholtz 2007. The private copying exception has traditionally received little attention in the literature and the case law, apart from the issues related to the copyright levies and fair compensation. See Strowel 2015.
54 Caspers and Guibault 2016a, p. 34.
55 Valenti 2007b, p. 195.
56 Ibid., p. 202, para. III and the bibliography cited therein.
57 Despite the potentially "open-ended" nature of the three steps test, as reported by Margoni 2012. See also Hilty et al. 2008. For phonograms and videograms, by contrast, there is a specific provision: the reproduction is permitted if done by a physical person solely for personal use and for non-commercial purposes, in compliance with the applicable TPMs (Art. 71-sexies (1), Law 633/1941). The exception will not apply if the reproduction is done by a third party (Art. 71-sexies (2), Law 633/1941) and if the works are available on-demand and protected by TPMs or contracts (Art. 71-sexies (3), Law 633/1941). On the limits of the private copying exception for the digital context and the interplay with TPMs, cf. Caso 2004; Montagnani 2007; Mazziotti 2008.
58 See Derclaye and Favale 2010; Helberger et al. 2013; Triaille et al. 2013; Caspers and Guibault 2016a. Regarding TPMs, Art. 6(4) does not require that the Member States take appropriate measures to ensure that the beneficiaries of the private copying exception do in practice benefit from the exception.
59 Derclaye and Favale 2010, p. 90.
Lastly, one has to consider whether TDM could fall within the lawful uses recognised by the Database Directive at Arts. 6(1) and 8. These rights are expressly protected against conflicting contractual provisions (Art. 15, Database Directive). Art. 6(1) allows the lawful user to make a copy of the database in order to access the contents or to allow a "normal use" of the same. It would be coherent with the rationale of the Directive to consider an SDS aiming to analyse the terms or the privacy policy as performing a normal use of the database. 69 Meanwhile, with reference to Art. 8 Database Directive, the TDM tool would likely extract and mine information from an insubstantial part of the database (the "Legal conditions" section), without conflicting with the normal exploitation of the database or unreasonably prejudicing the legitimate interests of the maker or the author of the works or subject matter contained in the database. At the same time, in some cases this will be difficult to argue: an online platform like Uber has 48 different legal documents (between T&C and a variety of "contractual" policies), which will not qualify as insubstantial content.
Therefore, the Database Directive leaves some room to perform TDM, especially because it expressly protects statutorily permitted uses from contrary contractual provisions. However, this balance of interests is ensured only as long as the database is protected under the Database Directive. As the Ryanair decision established, if the database is not protected either under copyright or the sui generis right, the database owner can set down contractual limitations to its use. 70

TDM exception and the draft Copyright in the DSM Directive: a new hope?
If the current system is not fit for the purpose of TDM, would a specific exception, like the one contained in the currently debated proposal for a Directive on Copyright in the DSM, be able to restore the balance between IPRs, on the one hand, and access to information and the encouragement of AI innovation, on the other?
The current proposal arrives at the end of a process initiated by the Commission with the Communication for a Digital Single Market Strategy for Europe (2015) 71, followed by the Communication "Towards a modern, more European copyright framework" (2015) 72, and confirmed in the Communication "Promoting a fair, efficient and competitive European copyright-based economy in the Digital Single Market". 73 In those preparatory documents, the focus of the Commission has always been on the need to promote innovation in research, threatened by an uncertain legal framework for TDM and national differences. The proposal essentially allows research organisations to text and data mine works to which they have lawful access for the purposes of scientific research. Research organisations are defined as "a university, a research institute or any other organisation the primary goal of which is to conduct scientific research or to conduct scientific research and provide educational services: (a) on a non-for-profit basis or by reinvesting all the profits in its scientific research; or (b) pursuant to a public interest mission recognised by a Member State; in such a way that the access to the results generated by the scientific research cannot be enjoyed on a preferential basis by an undertaking exercising a decisive influence upon such organisation". Notably, Member States should not provide for compensation for rightholders as regards uses under the TDM exception (see Recital 13) and any contractual provision limiting TDM shall be unenforceable. The norm is therefore both mandatory (for the Member States) and imperative (for private parties), and there is no way to circumvent it through private ordering relying on contracts.

[…] comparing the prices of flights, by performing also the extraction of information from the Ryanair website, was a "normal use" of that database, thus any contrary contractual provision considered unenforceable. However, the ECJ noted that the existence of the sui generis right was not proven in the case: as a consequence, the prohibition of contractual overriding did not apply.
During the legislative procedure, the Council has specified the scope of TDM and its relationship with the proposal. 74 First of all, it made clear that TDM is not always a copyright-relevant activity: "in relation to mere facts or data which are not protected by copyright […] no authorisation is required under copyright law" (Recital 8a). 75 Secondly, it reaffirmed that acts already covered by the temporary reproduction exception under the InfoSoc Directive will continue to benefit from that provision (Recital 8a).
Notably, the Council extends the TDM exception so as to include cultural heritage institutions ("cultural heritage institution means a publicly accessible library or museum, an archive or a film or audio heritage institution"). 76 It added a security requirement in a new Art. 3(1a), specifying also that the copies of the works and other subject-matter generated through TDM shall not be retained for longer than necessary for achieving the purposes of scientific research (see also Recital 11c). 77 The text of the Council adds a further exception for TDM (new Art. 3a): Member States are free to allow the "temporary reproduction and extraction of lawfully accessible works and other subject-matter that form an integral part of the process of text and data mining", if such use "has not been expressly reserved" by the rightholder "including by technical means" (see also Recital 13a). This seems to make the exception dependent on the rightholder's willingness to accept it, and the application of technical means would be enough to express a reservation. Unlike the TDM exception provided in the initial text (Art. 3), it is an optional exception for the Member States. This is not welcome, as it will reinforce the risk of divergences between Member States on an issue that should be dealt with seamlessly across the EU's internal borders. Furthermore, the additional TDM exception, if implemented, would be overridable (as Art. 6(1) of the Commission's draft has not been extended to cover the newly proposed Art. 3a). In any case, the possibility for rightholders to make a reservation risks subordinating the legislative exception to private will. On the positive side, the scope of this additional exception is much broader than the one proposed by the Commission, as its beneficiaries go well beyond research institutions and, eventually, cultural heritage institutions. Furthermore, it is true that Art. 3a mirrors the wording and the content of the temporary reproduction exception at Art. 5(1) InfoSoc Directive ("temporary reproduction", "form an integral part of the process"), but with one important difference: independent economic significance is out of the picture.

74 See the version of the text dated 25 May 2018 (Council of the EU, Interinstitutional File: 2016/0280(COD), doc. 9134/18, available here: https://www.consilium.europa.eu/media/35373/st09134-en18.pdf).
75 Here the Council tries to fix the wording of Recital 8 (Commission text). The latter has been criticised in the literature for being a potential source of confusion, since it "wrongly suggests that carrying out TDM is per se of relevance to copyright. The explanations given in Recital 8, according to which an authorisation to undertake such acts must be obtained from rightholders if no exception or limitation applies, are too sweeping". Hilty and Richter 2017, p. 3. However, the clarification offered by the Council may not be sufficient: the temporary reproduction exception covers only one of the possible legitimate activities that can be lawfully performed by users without authorisation or a specific TDM exception.
The discussion within the European Parliament (EP) showed the genuine concerns of some EP members about the narrow scope of the TDM exception. 78 However, many of the amendments most favourable to TDM were not tabled in the version the Committee on Legal Affairs (JURI) presented to the EP plenary for the vote of 5 July. 79 The JURI report reflected the dual approach followed by the Council in the text adopted in May 2018, proposing a mandatory TDM exception in favour of research organisations for the purpose of scientific research, provided that they have lawful access to the works or other subject-matter (Art. 3), and an optional exception available to anyone as long as the rightholder has not reserved the use of works and other subject-matter in a machine readable format (Art. 3a). It remains unclear what form those reservations in a machine readable format would take and how they could be implemented.

The main reservation we have with this part of the EP resolution on the copyright reform is that the TDM exceptions could still be overridden by contracts (Art. 3a) and/or technical means (Arts. 3 and 3a).

Furthermore, the JURI version added twin provisions to both Art. 3 and Art. 3a, allowing Member States to introduce respectively mandatory or optional TDM exceptions in accordance with Art. 5(3)(a) InfoSoc Directive, which refers to the teaching and scientific research exception for original works and other subject-matter. These provisions were probably designed to preserve the TDM exceptions already adopted by some Member States (for instance, UK, Estonia, France, Germany). However, such a legislative choice is likely to fragment the European legal framework for TDM, allowing the blossoming of diverse national solutions and increasing the divide between TDM acts over works and over databases (since the possibility refers only to the research exception under the InfoSoc Directive).

Finally, with reference to the boundaries of the copyright-relevant activity in TDM processes, the EP version introduced a new specific provision at Recital 8a. Dealing with the technical aspects of TDM, this amendment states that TDM as mere "reading and analysis of digitally stored, normalised information" is not a copyright-relevant act. Copyright may come into play only in case of reproductions or extractions linked to the access and process of "information normalisation". The latter is not expressly defined but, as emerges from the same recital, it refers to the preparatory activities which enable the automated computational analysis, such as the change of the format of information or the extraction from a database into another one that will be subjected to TDM. The exceptions provided in the Draft Directive, therefore, will cover only these activities. At first sight, this recital seems to restrict the scope of the exception. On the contrary, it reaffirms that not all stages of TDM need a specific exception to be valid: many of them (such as the analysis, the creation of patterns and the subsequent publication) can be done freely and without infringing any IP entitlement.

76 Defined at Art. 2(3), Draft Directive.
77 The wording of the Recital is far from clear. On the one hand, by recognising the importance of peer review and verification, the proposal seems to allow the retention of the copies made under the exception "in certain cases" (not specified). Such copies must "be stored in a secure environment and not be retained for longer than is necessary for the scientific research activities". The text leaves to the Member States the task of determining the concrete modalities for retaining the copies. However, the hard issue to determine is: when will the copies no longer be necessary? It is interesting to note a parallelism: the GDPR has ensured through several provisions that data should not be stored longer than necessary with respect to the purpose of the processing. However, scientific research is one of the cases that justifies an exception to that rule (see Art. 5(1)(e), GDPR). Interestingly, scientific research can trump storage limitations when the data subjects' interests are at stake, but not in the IP domain when data aggregators' prerogatives are involved.
78 See, for instance, amendments proposed to the text of Art. 3 of the Commission proposal: amendment 538 by Julia Reda, Nessa Childers, Max Andersson, Michel Reimon, Brando Benifei (deleting any reference to research organisation, research purposes and lawful access), amendment 539 by Jytte Guteland (extending TDM to cultural heritage institutions), amendments 546 and 547 (respectively encouraging and obliging Member States to allow research organisations, without lawful access to works and other subject-matter, to perform TDM), amendment 548 (protecting the mandatory TDM exception against TPMs), amendments 551-555 (limiting the scope of the measures that the rightholder can adopt to ensure the security and integrity of the networks and databases where the works or other subject-matter are hosted) and amendment 564 (mandating the adoption of open formats for publicly-funded research and data in order to enable TDM). The text of the amendments presented by the Members of the EP is available here: https://euractiv.eu/wpcontent/uploads/sites/2/2017/05/JURI-copyright-amendments.pdf.
The European Parliament voted against the JURI version during the plenary of July 2018, postponing any further decision to the September session. However, in September, the EP approved TDM amendments that essentially reproduce the July version. 80 The copyright reform is now in the trilogue phase of discussions between the Commission, the Council and the EP.
It thus appears that the various texts on TDM available in September 2018 provide for the following exceptions: […]

[…] (and research in general!) is not exclusive to academic circles. 82 TDM could be used by policy makers to test draft policies and regulation, by journalists or private individuals for fact checking, by consumers and lawyers to automatically compare the terms of service of different platforms, just to give a few examples. Such entities and individuals will not be able to acquire additional knowledge and/or provide additional services and tools relying on the TDM insights.
Second, in many instances (even in academic research which is partly supported by private funding), the boundary between commercial and non-commercial research is not easy to trace. Such a limitation is also likely to undermine the spirit of the Commission's initial goal, anticipated in the Communication for a Digital Single Market Strategy, whose intention was to promote research innovation for both non-commercial and commercial purposes. 83

Third, even if contractual limitations are pushed out of the door, private ordering can come back through the window, via TPMs. The proposal, in fact, does not grant any effective protection against TPMs, as it is not clear that there is a possibility to legally circumvent those that would unlawfully limit TDM. 84

Finally, even if the copyright reform expands TDM boundaries, the main obstacle remains the ECJ Ryanair holding and the power of contracts in the absence of copyright or database protection. Of course, it does not make much sense to address this issue in an instrument, such as the Draft Directive, dealing with intellectual property, as the problem is to secure some uses when there is no IP protection; but the issue risks never being addressed in another context either.

82 Many scholars have argued that the exception should be broadened. For instance, Margoni and Kretschmer 2018; Caspers and Guibault 2016b; Margoni and Dore 2016; Hilty and Richter 2017; Geiger et al. 2018.
83 See above Communication, "A Digital Single Market Strategy for Europe", COM/2015/0192 final, p. 7: "Innovation in research for both non-commercial and commercial purposes, based on the use of text and data mining (e.g. copying of text and datasets in search of significant correlations or occurrences) may be hampered because of an unclear legal framework and divergent approaches at national level. The need for greater legal certainty to enable researchers and educational institutions to make wider use of copyright-protected material, including across borders, so that they can benefit from the potential of these technologies and from cross-border collaboration will be assessed, as with all parts of the copyright proposals in the light of its impact on all interested parties".
84 Margoni and Kretschmer 2018; Caspers and Guibault 2016a. As acutely pointed out by Margoni and Kretschmer: "The EU legislature is fully aware of this contradiction but failed to address it properly. In fact, Art. 6 of the Proposal ("common provisions") clarifies that the provisions of the first, third and fifth subparagraph of Art. 6(4) InfoSoc directive apply. In plain English, this means that if a user qualifies for an exception to copyright (e.g. TDM) but a Technological Protection Measure prevents them from doing it, Member States have an obligation to take appropriate measures to ensure that right holders make available to the beneficiary an exception or limitation. In the almost 20 years since when the InfoSoc directive was enacted, the UKIPO, which has correctly put in place a specific procedure for this type of situations, has received less than a handful of requests".

Reconstructing the reproduction right: back to the roots
The TDM exception, if it is well designed at the end of the legislative process, will somewhat improve the situation, but it is not likely to be a panacea. Some issues, like that of the SDSs that prompted our paper, will remain if the provision is overridable and not broad. It is probably too little considering the downside of the exercise: by searching for some legal certainty, guarded by a statutory exception, the EU legislator will confirm in black letter that TDM, even with the clarifications made by the EP in Recital 8a, is a copyright-relevant activity. The risk we are running is to give up on a crucial point, instead of undertaking the effort of a serious discussion about the boundaries of the right of reproduction and copyright foundations. 85

That copyright might extend to the TDM process does not appear legitimate: no author of a literary or artistic work has in the past conceived her copyright as a way to limit the use of her work as a source of useful information, for instance for discerning fluctuations in interest in a particular subject or for determining fashionable expressions. No author, while producing a work, has seriously relied on the possibility of earning revenues from the derivative use related to searching and indexing a corpus that includes her work. 86 Such use is far removed from the core exploitation field of most works. In addition, when it appeared more than two centuries ago, copyright was not only intended to remunerate authors (and publishers) for creating (and disseminating) works, but was also promoted to expand public learning. This is not only true for the US copyright system, as the same rationale is at the origin and core of the continental droit d'auteur system. 87

Even without an express exception, we can rely on the implicit requirement that the reproduction involves a use as a work. Such use as a work does not exist in the case of TDM, nor in other cases involving copying for deriving information or checking conduct (e.g., to identify plagiarism). As put by the Court in Authors Guild v. Google, Inc., 'the purpose of Google's copying of the original copyrighted books is to make available significant information about those books, permitting a searcher to identify those that contain a word or term of interest' (emphasis added). 88 This purposive analysis of the act of copying strongly weighs in favour of fair, because highly transformative, use. In the EU, the requirement of use as a work can help to reach the same outcome. Indeed, when acts of reproduction are carried out for the purpose of search and TDM, the work is not used as a work; it only serves as a tool or data for deriving other relevant information. The expressive features of the work are not used, and there is no public to enjoy the work, as the work is only an input in a process for searching a corpus and identifying occurrences and possible trends or patterns.

85 These thoughts were first elaborated in Strowel 2018.
86 See also Poort 2018.
87 See the review of the history and principles of copyright in both legal traditions: Strowel 1993. More recently, Strowel 2014, pp. 701-703.
88 Authors Guild v. Google, Inc. No. 13-4829-cv (2d Cir. Oct. 16, 2015). On April 18, 2016, the Supreme Court denied the petition for a writ of certiorari, leaving the Second Circuit ruling in Google's favour intact. To make available parts of the corpus of books, Google has scanned the digital copies and established a publicly available search function, the ngrams tool.
Even if the EU Parliament and Council decide to include an exception for TDM that is broad enough (this would require some commercial research to be exempted), the analysis based on 'the use of the work as a work' condition for copyright infringement remains necessary to address other types of copying for the purpose of providing information, such as copies for checking mistakes and plagiarism, copies to use or repair a protected work with a utilitarian function, non-transitory copies made on proxy servers, smart disclosure systems and many other uses that cannot be anticipated. 89

That said, the main reason behind the introduction of the TDM exception was the reduction of legal uncertainties and of the diverging national implementations regarding the research exceptions. If we look at the texts to be discussed during the trilogue, such uncertainties remain, and more divergences are likely because of the optional nature of the TDM exception provided in favour of beneficiaries other than research organisations.

Prohibition of TDM in T&C
As already mentioned, contracts can also affect TDM. In Sections 3 and 4, we have pointed out that, in some cases, copyright exceptions cannot be limited via contractual means. However, the issue is not harmonised under the InfoSoc Directive and, even when the exceptions are protected against the power of contract, the free room they provide might be annihilated by TPMs. Furthermore, according to the Ryanair ruling, the copyright (or IP) balance cannot be exported so as to apply when there is no copyright (IP), but only contracts between parties.
To give an idea of the scope of the issue, we analyse in this Section the occurrences of contractual clauses prohibiting TDM or otherwise restricting copyright exceptions. As explained in Section 3, we looked at the copyright and database protection provisions, since TDM usually involves a "reproduction" and/or an "extraction" of the subject-matter to mine. 90 The analysis focused on the terms and conditions (T&C) of twenty-one online platforms, equally distributed among three sectors: mobility (carpooling and car sharing), accommodation (including services for sharing office space), food (i.e., initiatives for the recuperation of unsold or unused food, sharing or delivery of home-cooked meals, etc.). In the selected sample, we tried to ensure a good balance between platforms operating globally and locally and between large capitalistic and cooperative initiatives. 91 If different contractual versions were available, we consulted the T&C for the Belgian market.

89 The following US decisions quoted in Authors Guild v. Google, Inc. have, for instance, exempted several uses that, without the application of the work use requirement, cannot escape copyright's exclusivity in the EU: A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 638-640 (4th Cir. 2009) (justifying as transformative fair use purpose the complete digital copying of a manuscript to determine whether the original included matter plagiarized from other works); Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1165 (9th Cir. 2007) (justifying as transformative fair use purpose the use of a digital, thumbnail copy of the original to provide an Internet pathway to the original); Kelly v. Arriba Soft Corp., 336 F.3d 811, 818-819 (9th Cir. 2003) (same); Bond v. Blum, 317 F.3d 385 (4th Cir. 2003) (justifying as fair use purpose the copying of author's original unpublished autobiographical manuscript for the purpose of showing that he murdered his father and was an unfit custodian of his children).
90 Caspers and Guibault 2016a; Stamatoudi 2016; Triaille et al. 2014.
As shown in Table 1, 20 out of 21 platforms published the T&C on their website and 14 of them contained specific intellectual property clauses, directly or indirectly related to TDM activities. More specifically:
- four platforms expressly prohibit TDM on the website content 92;
- three others do not allow the use of any kind of bot, crawler or scraper (i.e., the automated software agents that search through the content of webpages; these are necessary tools for TDM, e.g. when there is no available application programming interface 93);
- in four cases, the reproduction or copy of website materials, which is usually a preliminary step of the TDM process, is forbidden; and
- on three occasions, the formulation was vague or broad enough to exclude TDM. 94

This shows a trend toward a general contractual ban of TDM. The prohibition is broad and refers to all the website's contents and services, thus including the informative pages containing the legal conditions. Such provisions may appear to have been inserted with a view to protecting the data (in some ways valuable) related to the service (e.g. the timetable of flights, the prices, the list of accommodation and contact details of the owners, etc.) from free exploitation, so that their effect on the T&C and privacy policy would be an involuntary side effect. However, this is not the case: the contractual provision is usually confirmed and embedded into the technical instructions of the website.
To check this, we used the robots.txt file. It is an exclusion protocol that content providers can insert into the root directory to prevent crawling or indexing activities on certain pages of their website. 95 By adding "robots.txt" at the end of the address of the website, it is possible to see the underlying instructions. 96 The latter can consist of two main commands:
- User-agent: it indicates which robots the instructions are directed at. If there is an asterisk ("User-agent: *"), this means that the section applies to all robots.
- Disallow: it indicates which pages cannot be visited by the robots.

91 On the theoretical foundation of platform cooperativism, see Scholz 2016.
92 Blablacar does not prohibit TDM on the whole content of the website, but only on a substantial part of it.
93 On crawling and scraping see Caspers and Guibault 2016a, p. 8.
94 According to the T&C of Bar d'Office, users cannot obtain (or attempt to obtain) any material or information through any means not intentionally made available by the platform. We shall conclude that third party applications aiming at analysing Bar d'Office legal documents do not fall within the permitted uses. Wibee does not allow users to "exploit in any way the content", but this has to be combined with the contractual provision that allows the use of the platform for non-commercial purposes only. Finally, Menu Next Door contained the broad, but vague, formulation "All rights reserved".
We found that three platforms are actually using the exclusion protocol to keep robots away from the whole server, one from a directory which contains, amongst others, the legal documents, and two specifically from the page of the T&C and privacy policy (see the annexed Table 1, "Robots.txt" column). In all cases where the robots were disallowed, the T&C provisions prohibited TDM as well. In only one case was there no TDM-related provision in the contract, but the code instructions nevertheless did not allow the indexing of the website content by a series of robots. However, it should be noted that the robots excluded are those usually mentioned in the "black lists" of malicious programs.
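The "User-agent" and "Disallow" commands described above can be illustrated with a minimal sketch using the robots.txt parser from Python's standard library. The file content, the host name "platform.example" and the agent names "SmartDisclosureBot" and "ExampleBadBot" are hypothetical and chosen purely for illustration; they are not taken from any platform in the sample.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# The first section applies to all robots and excludes a directory
# holding the legal documents; the second excludes one named robot
# from the whole server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /legal/

User-agent: ExampleBadBot
Disallow: /
"""

# Parse the instructions from the string instead of fetching them
# over the network, so that the sketch is self-contained.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://platform.example"  # hypothetical host

# A compliant smart disclosure tool would ask before each request
# whether the instructions permit the visit.
terms_allowed = parser.can_fetch("SmartDisclosureBot", BASE + "/legal/terms")
home_allowed = parser.can_fetch("SmartDisclosureBot", BASE + "/index.html")
bad_bot_home_allowed = parser.can_fetch("ExampleBadBot", BASE + "/index.html")

print(terms_allowed)         # False: /legal/ is disallowed for all robots
print(home_allowed)          # True: no rule covers this page
print(bad_bot_home_allowed)  # False: this robot is excluded from the server
```

In this sketch the check is performed by the visiting agent itself: the file only states the instructions, and whether they are followed depends on how the agent has been programmed.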
The nature of robots.txt as a TPM is controversial. Looking at the definition of "technological measures" provided at Art. 6 of the InfoSoc Directive, robots.txt could fall under it. Indeed, the protocol is a kind of technology that "is designed to prevent or restrict acts, in respect of works or other subject-matter, which are not authorised by the rightholder of any copyright or any right related to copyright". 97 However, the effectiveness of such technological measures is debatable. According to the Directive, effectiveness implies that "the use of a protected work or other subject-matter is controlled by the rightholders through application of an access control or protection process, such as encryption, scrambling or other transformation of the work or other subject-matter or a copy control mechanism, which achieves the protection objective". 98 Some authors have argued that the protocol contains instructions that do not qualify as a technical barrier: any software agent can simply ignore the "Disallow" command without actively forcing any digital fence. 99 The content provider just has to rely on the voluntary compliance of the user, hoping that the visiting agent has been designed to follow the ASCII syntax. In any case, whether or not robots.txt is considered an effective technological measure, its use by content providers and online platforms confirms their willingness to limit TDM, as stated in their T&C. 100

How legitimate is such a general and absolute ban imposed in many T&C? The prohibition of TDM is likely to undermine the legitimate activity of consumers, who could use smart disclosure mechanisms and instruments for the automatic analysis of contractual documents to better understand the terms of the user agreement. If a sort of "bionic eye" is available to scan a document and extract the relevant pre-contractual information, is it justified to prohibit its use?

As shown in the previous Sections, neither the current framework of copyright exceptions nor the new TDM exceptions under consideration by the Council and EP would apply to SDSs and be able to protect the rightful interests of the users. However, the principle of transparency, embedded in consumer and data protection legislation, could offer a last line of defence against an absolute prohibition of TDM.

95 On the origin and functioning of this file, see http://www.robotstxt.org/robotstxt.html.
96 Rotenberg and Compañó 2009.
97 Art. 6(3), InfoSoc Directive.
98 Ibidem.
99 Sire 2015. Contra, Groom 2004, according to which programming a visiting agent to systematically ignore robots.txt can be seen as a strategy to circumvent a technological protection measure.
100 In Europe, the robots.txt file has been questioned with reference to the issue of implied license only. See, for instance, the Copiepresse v. Google saga, commented in Strowel 2007, 2011. Doubts about the classification of robots.txt as a TPM in the US context are expressed by Jasiewicz 2012. In the US case Healthcare Advocates, Inc. v. Harding (497 F. Supp. 2d 627, 643 (E.D. Pa. 2007)), the Court for the Eastern District of Pennsylvania incidentally discussed the nature of the robots.txt file. The judge recognised the protocol as a TPM under the DMCA in that specific case. However, the Court expressly affirmed that robots.txt is not "analogous to digital password protection or encryption" and its nature must be assessed case-by-case ("This finding should not be interpreted as a finding that a robots.txt file universally qualifies as a technological measure that controls access to copyrighted works under the DMCA").

7. Transparency 2.0: a right to machine legibility
Transparency is a cardinal principle in EU law. 101 In consumer and data protection legislation, information transparency is designed as a necessary feature of mandated disclosures, i.e. the obligation to provide one party (traditionally the weak one) with the information concerning the transaction. If the information is accessible, clear and understandable, mandated disclosures can effectively inform the consumer or the data subject about the essential content of the agreement and allow her to make an optimal decision or express a meaningful consent, where requested. In this sense, the substantive requirement (the duty to provide certain information) is complemented by formal requirements (the provision of the information in advance and the use of plain and intelligible language).

In consumer law, the principle of transparency has been traditionally interpreted as comprising two components: 102 1) the consumer has to be able to have knowledge of the terms before entering into the contract; 103 2) the information has to be provided in a way that the average consumer can understand without legal advice. The latter means that information must be legible and given in plain and intelligible language. 104 The principle of legibility, in particular, requires taking into consideration the font size, the layout and the accessibility of pre-contractual information. As pointed out by Micklitz et al., the possibility to actually read the text of the contract, i.e. the design of conditions "plainly both from an editing and optical point of view" 105 (no "small-print" for instance), must be seen as a corollary of intelligibility enshrined in the 93/13/EEC Unfair Terms in consumer contracts Directive ("UTD").

101 Buijze 2013.
102 Loos 2015; Kästle-Lamparter 2018, pp. 429-430, 474 and 481. See also Micklitz et al. 2009, pp. 135 ff.
103 The transparency principle, as a duty to provide information before the conclusion of the contract, is envisaged in the Annex of the Unfair Terms Directive (UTD), which includes, among the list of potential unfair terms, the contractual provision which: "irrevocably binds the consumer to terms with which he had no real opportunity of becoming acquainted before the conclusion of the contract" (Annex, 1.i, UTD). It can also be derived from Recital 20, UTD. It is further recalled at Art. 6(1) of the Consumer Rights Directive (CRD).
The principle of legibility has been expressly codified in the 2011/83/EU Consumer Rights Directive ("CRD") at Art. 7(1) for off-premises contracts and Art. 8(1) for distance contracts, which both state that information shall be legible if provided on a durable medium. Interestingly, Kästle-Lamparter points out that: "Taken at a face value, the requirement of legibility excludes mere audio tapes or files: information must be provided in a human-readable format. But perhaps this was not intended and the provision should be rather read 'legible if text-based', or possibly 'legible or audible'". 106 Furthermore, it should be noted that if we assume the additional application of Directive 2000/31/EC ("e-commerce Directive") when the contract is concluded with an online platform, Art. 10(3) establishes that: "Contract terms and general conditions provided to the recipient must be made available in a way that allows him to store and reproduce them" [emphasis added]. The latter could therefore constitute an additional argument to support the mining of T&C's text.
The rationale of consumer protection, the ECJ's interpretation of the principle of transparency, and the formal requirements set out in the e-commerce Directive are likely to accommodate a broad understanding of legibility. 107 In particular, to ensure the balance of interests embedded in the 1993 UTD and 2011 CRD, it would be coherent to frame the legibility requirements by adopting a technologically neutral approach. This functional interpretation is further confirmed in the GDPR.
104 The principle of transparency, sub specie of understandability of the information provided to the consumer, is specifically mentioned in several legislative instruments. The duty to provide the consumer with information in a clear and comprehensible manner is recalled at Art. 5, UTD: "In the case of contracts where all or certain terms offered to the consumer are in writing, these terms must always be drafted in plain, intelligible language". Moreover, it is expressed at Arts. 5(1) and 6(1) CRD and further expanded at Art. 8 CRD. In addition, when the contract is concluded "through a means of distance communication which allows limited space or time to display the information" (Art. 8.4, CRD), like the screen of a mobile phone, the trader will have to provide at least a set of pre-contractual information, such as the main characteristics of the goods or services, the identity of the trader, the total price, the right of withdrawal, the duration of the contract and, if the contract is of indeterminate duration, the conditions for terminating the contract. Among the appropriate means to display information to the consumer, the Commission suggested the adoption of a set of icons, making also available a model. However, such a measure does not seem to have taken hold
105 Micklitz et al. 2009, p. 136.
106 Kästle-Lamparter 2018, p. 474.
The analogy with transparency in the data protection framework, despite the distinct area of application and the slightly different terminology used, is justified as there is a koinè, if not an open dialogue, between these two legal branches. 108 The GDPR has recently codified transparency as a pillar of data protection alongside the principles of lawfulness and fairness of the processing (Art. 5.1.a, GDPR). As specified by the European Data Protection Board (EDPB): "Transparency is an overarching obligation under the GDPR applying to three central areas: (1) the provision of information to data subjects related to fair processing; (2) how data controllers communicate with data subjects in relation to their rights under the GDPR; and (3) how data controllers facilitate the exercise by data subjects of their rights". 109 The principle of transparency echoes in many aspects the articulation shown in the consumer legislation.
The GDPR explicitly refers to legibility when it allows the use of standardised icons to complement the privacy notice. Icons shall give "in an easily visible, intelligible and clearly legible manner a meaningful overview of the intended processing". 110 The GDPR further specifies that "where the icons are presented electronically they shall be machine-readable" (Art. 12.7, GDPR). This is a highly behaviourally-informed legal innovation, considering that more often online users derive information from icons and pictures. 111
However, if the GDPR is systematically interpreted, the principle of transparency emerges in several other loci. First of all, data controllers, i.e. the person or entity which determines the purposes and means of the processing of personal data, shall take appropriate measures to provide the information required by law (at Arts. 13 and 14 GDPR) and any communication regarding the right of access, the use of automated individual decision-making, and personal data breach 112: a) in a concise, transparent, intelligible and easily accessible form; b) using clear and plain language; c) provided in writing or by other means, including, where appropriate, by electronic means; d) provided orally, if so requested by the data subject.
107 When it has interpreted the plainness and intelligibility requirements, the ECJ has always excluded formalistic readings. In Kásler, for instance, the Court held that the requirement of transparency of terms, under the UTD, cannot "be reduced merely to being formally and grammatically intelligible" (ECJ, C-26/13, Árpád Kásler and Hajnalka Káslerné Rábai v OTP Jelzálogbank Zrt. [2014]
110 Art. 12.7 GDPR. See also Recital 60. The link between transparency, information and visualisation is further stressed at Recital 58, GDPR: "The principle of transparency requires that any information addressed to the public or to the data subject be concise, easily accessible and easy to understand, and that clear and plain language and, additionally, where appropriate, visualisation be used. Such information could be provided in electronic form, for example, when addressed to the public, through a website. This is of particular relevance in situations where the proliferation of actors and the technological complexity of practice make it difficult for the data subject to know and understand whether, by whom and for what purpose personal data relating to him or her are being collected, such as in the case of online advertising".
Furthermore, where the processing is based on the data subject's consent, the request for it has to be presented in "a manner which is clearly distinguishable from the other matters, in an intelligible and easily accessible form, using clear and plain language". 113
The Guidelines on Transparency (2018) drafted by the EDPB offer a first reading of such requirements. While "concision and transparency" mean that information has to be presented "efficiently and succinctly, in order to avoid information fatigue", 114 "easy accessibility" requires making information clearly visible, e.g. on the website. This means that data controllers have to actively furnish the information (or the way to find it), while it is not an obligation of the data subject to start a quest for retrieving information. For example, data controllers should provide information by giving it directly to data subjects, "by linking them to it, by clearly signposting it or as an answer to a natural language question". 115 In addition, the duty to provide information in "writing or by other means" specifically recalls the CRD obligation to make information available to the consumer "in a way appropriate to the means of distance communication used" (Art. 8.1, CRD). The EDPB emphasises that such a provision has to be interpreted broadly, allowing the choice of the most appropriate means and format to reach the informative goal: for a privacy policy on websites, "digital layered privacy, but also 'just in time' contextual pop-up notices, 3D touch or hover-over notices, and privacy dashboards" 116 can be used.
The Board further underlines the importance of considering the specific circumstances in which the provision/communication occurs or how the interactions between the controller and the data subject happen. For instance, the EDPB warns that the electronic privacy notice on a website is likely to be ineffective for screenless IoT or smart devices: it could be preferable, for example, to include the privacy policy in the instruction manual or make it easier to access through a QR code printed on the device. 117
To sum up, through the transparency principle, sub specie of obligation to provide concise, transparent and easily accessible information, in writing or by other means, the GDPR recalls the conceptual background already seen in consumer law. Furthermore, the EDPB encourages the use of technological advancements to inform users in a more meaningful way.
If we look at these provisions systematically and at their interpretation by the EDPB and the ECJ, they can operate as functional equivalents: both of them pursue the same rationale, i.e., protecting the weak party from information asymmetries, by requiring that information must be visible, accessible and readable. Therefore, if this is the policy goal, the principle of transparency enshrined in consumer and data protection laws is technologically neutral and can accommodate a right to "machine legibility". By machine legibility we mean the possibility for an SDS to have access to the pre-contractual information (T&C) and the information related to the processing (privacy policy) in a format processable by the smart system. If, thanks to AI, there are now instruments enhancing the human ability to read and understand contractual terms and privacy settings, information should be legible not only to the human eye, but also to the tool that a user can take advantage of. Therefore, whether it derives from IPRs, contractual or TPM limitations, an absolute prohibition of TDM on pre-contractual and privacy information available online on the website of the platform will unreasonably restrict a legitimate prerogative of the consumer or the data subject.
Furthermore, the solution will also be costless to the data controller/trader: the platform could allow TDM by enabling the indexing of the corresponding page on its website through the robots protocol and lowering TPM barriers, if any.
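As an illustration of how little is needed on the platform side, the sketch below uses Python's standard urllib.robotparser module to show how a compliant agent would read a robots.txt file that leaves the legal pages open while disallowing everything else. The file content, the /legal/ path, the platform.example domain and the sds-bot agent name are all hypothetical and are not taken from any real platform. The robots protocol only signals the operator's preference to compliant agents and does not by itself technically prevent access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a platform that blocks crawling in general
# but leaves its legal documents (T&C, privacy policy) open to automated agents.
ROBOTS_TXT = """\
User-agent: *
Allow: /legal/
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "sds-bot"  # hypothetical name of the SDS crawler
    for page in ("/legal/terms", "/legal/privacy", "/account/orders"):
        url = "https://platform.example" + page
        print(page, may_fetch(ROBOTS_TXT, agent, url))
```

Under these assumptions, an SDS that respects the protocol would mine the pages under /legal/ and leave the rest of the site untouched, which is the kind of selective opening the text has in mind.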

Conclusion
SDSs offer clear benefits for consumers, as they reduce information asymmetries and improve access to the data needed to make informed decisions. The deployment of SDSs might however be prevented by technical and legal obstacles, including the prohibitions inserted in many T&C and other online documents of interest for consumers. In this context, it is not clear whether the TDM tools needed for SDSs can lawfully be used. The existing copyright and database exceptions do not adequately tackle the TDM issue, and, as recognized by the European Commission, there is a need for introducing a TDM exception. However, neither the initial proposal by the European Commission focusing on the research context, nor the amendments discussed within the Council and the European Parliament appear sufficient to facilitate the use of TDM for improved smart disclosure and, more broadly, for AI applications.
Regarding SDSs, the transparency principle embedded in consumer and data protection rules offers some legal ground to justify the use of TDM for consumer empowerment. But the requirement of machine legibility, which appears necessary in a society where the automatic treatment of information becomes central, could be further promoted by a well-designed TDM exception. We are not there yet.

Fig. 1. Representation of limits to TDM in the context of smart disclosure systems.
In the September 2016 proposal for a Directive on Copyright for the Digital Single Market ("Draft Copyright in the DSM Directive" or "Draft Directive"), the Commission therefore introduced a specific exception for TDM. The exception expressly applies to the acts of reproduction and extraction contemplated in Art. 2, InfoSoc Directive (right of reproduction), Art. 5(a), Database Directive (temporary or permanent reproduction of the database), Art. 7(1), Database Directive (extraction of the whole or a substantial part of it), Art. 11(1) Draft Directive (right of reproduction and making available to the public, recognised to publishers of press publications).
(EC Commission, DG Justice Guidance document concerning Directive 2011/83/EU of the European Parliament and of the Council of 25 October 2011 on consumer rights, amending Council Directive 93/13/EEC and Directive 1999/44/EC of the European Parliament and of the Council and repealing Council Directive 85/577/EEC and Directive 97/7/EC of the European Parliament and of the Council, June 2014, available here: https://ec.europa.eu/info/sites/info/files/crd_guidance_en_0.pdf).
Committee on Legal Affairs, Report on the proposal for a directive of the European Parliament and of the Council and the Parliament texts are unsatisfactory to promote the development of Big Data analytics in Europe. The conditions on the objectives of TDM (not-for-profit research or research with a public interest goal) and the beneficiaries (research organisations and cultural heritage institutions) lead to a narrow exception. 81 First, because TDM
81 In this sense, see the second sentence added at para. 1 of Art. 3, EP text.