<h1>D3.2 A specification of the scientific method and scientific communication</h1> <p><strong>Summary</strong> This deliverable aims to shed light on scientific peer review in the era of digital science, which in our view goes beyond reviewing scholarly literature. In the Digital Era not only the final outcome of the research process, i.e. the scientific publication, but potentially also other research products generated at other stages of the research workflow can be subject to review by peers. The adoption of ICT in support of science introduces unprecedented benefits, which can be mainly identified in: (i) the ability to share an online “digital laboratory”, i.e. the tools, applications and services used to perform science, and (ii) the ability to share research products used as input or produced in the context of a digital laboratory. An example of (i) may be RStudio, a desktop tool to run R scripts, made available for download from some Web repository, while an example of (ii) may be the specific R script created by a scientist as a result of his/her research activity, made available to other researchers through the digital laboratory. Accordingly, scientists can not only publish literature describing their findings but also share the entities they used and that are required to repeat and reproduce science.</p> <p>Such an innovative shift also sets the conditions for novel peer review methodologies, as well as scientific reward policies, where scientific results can be transparently and objectively assessed via machine-assisted processes. 
In this deliverable we describe our vision of “research flow peer review” as an urgent and much-needed practice, identifying related challenges and current solutions, and proposing future directions.</p> <h1>Glossary</h1> <p><strong>Digital laboratory</strong> the subset of assets of an e-infrastructure needed to perform the activities of a specific research flow; its content may change over time (addition or deletion of digital objects) but at any point in time its content should provide all that is needed in order to repeat the activities of the research flow</p> <p><strong>e-infrastructure</strong> a virtual space of assets providing all the functionality and the digital objects needed to perform research in a specific discipline; the assets in the e-infrastructure (e.g. textual descriptions, datasets, tools, services, standards) may come from different physical research infrastructures and are (usually) defined by the research community of that discipline</p> <p><strong>Research activity</strong> in any discipline, the activities performed to answer a “research question”, usually formulated as one or more hypotheses to be proved true through the research activity</p> <p><strong>Research experiment</strong> a sequence of research steps that represents a major action in the research flow; usually a goal-driven sequence of steps set to verify an intermediate hypothesis, whose results may inspire further experiments to address the target of the overarching research activity</p> <p><strong>Research flow</strong> the sequence of actions performed while carrying out a research activity</p> <p><strong>Research flow template</strong> a precise (formal) description of an “abstract” research flow, defined by the research community of the specific research field in which the template should be used; it embodies the best practices and standards of the research community and provides a description of the experiments and steps to be executed to perform a research 
activity in that field, together with the assets from the digital laboratory needed at each step; it provides also a description of the research products to be made available for peer review, together with the steps at which they should be made available to the research community</p> <p><strong>Research product</strong> the digital objects produced as a result of the research activity; they can represent the final outcome of the research activity, or can be the output of intermediate experiments or steps, possibly to be used in subsequent steps; we can identify several categories of research products, such as literature, datasets, computational products (programs and tools), formal descriptions (possibly machine executable) of steps, experiments or the whole research flow</p> <p><strong>Research step</strong> an action in the research flow that (usually) cannot be performed (or is not convenient to perform) with a sequence of smaller actions; this notion of “atomic” action is clearly dependent on the research field</p> <h2>Science’s digital shift and impact on Open Science</h2> <p>An increasing number of researchers conduct their research adopting ICT tools for the production and processing of research products. In the last decade, research infrastructures (organizational and technological facilities supporting research activities) are investing in “e-infrastructures” that leverage ICT tools, services, guidelines and policies to support the digital practices of their community of researchers. To find an analogy with traditional science, e-infrastructures are the place where researchers can grow and define the boundaries of their <em>digital laboratories</em>, i.e. the subset of assets they use to run an experiment. Researchers run their digital <em>experiments</em> (e.g. simulations, data analysis) taking advantage of the digital laboratory assets and generate new <em>research data</em> and <em>computational products</em> (e.g. 
software, R algorithms, computational workflows) that can be shared with other researchers of the same community, to be discovered, accessed and reused.</p> <p>The role of digital laboratories is therefore twofold: on the one hand they support researchers in their advancement of science, offering the facilities needed for their daily activities; on the other hand, they foster the dissemination of research within the research community, supporting discovery, access to, sharing, and reuse of digital research products. In fact, their digital nature offers unprecedented opportunities for scientists, who can share not only scientific literature describing their findings, but also the digital results that they managed to produce, together with the digital laboratory itself. Those features are fundamental for an effective implementation of the Open Science (OS) paradigm [<a href="#5k9wp98kz729" rel="nofollow"><em>1</em></a>,<a href="#bn5l8dvetga" rel="nofollow"><em>2</em></a>]. OS is a set of practices of science, advocated by all scientific/scholarly communication stakeholders (i.e., research funders, research and academic organisations, and researchers), according to which the research activities and all the products they generate should be freely available, under terms that enable their findability, accessibility, re-use, and re-distribution [<a href="#elc5nmqrlust" rel="nofollow"><em>3</em></a>]. The effects of Open Science are mainly the following:</p> <ul> <li>Reproduce research activities: let other users reproduce the experiments of a research activity;</li> <li>Transparently assess research activities: evaluate findings based on the ability to repeat science, but also on the quality of the individual products of science, i.e. 
literature, research data, computational products and experiments.</li> </ul> <p>If supported with adequate degrees of openness, scientists may find the conditions to <em>repeat</em> (“same research activity, same laboratory”), <em>replicate</em> (“same research activity, different laboratory”), <em>reproduce</em> (“same research activity, different input parameters”), or <em>re-use</em> (“using a product of a research activity in another research activity”) the research activities, thereby maximizing transparency and exploitation of scientific findings [<a href="#kio2s7rtbr66" rel="nofollow"><em>4</em></a>].</p> <h2>Research Flow in the Digital Era</h2> <p>A <em>research activity</em> is carried out as a sequence of actions constituting the <em>research flow</em>. The research flow is made up of a number of <em>experiments</em>, realized as sequences of steps in the context of a <em>digital laboratory</em>, executed by scientists driven by the ultimate intent of proving an initial scientific thesis. In the following we shall use these terms as follows:</p> <ul> <li> <p>An <em>experiment</em> is a goal-driven sequence of steps set to verify a thesis, whose result may inspire further experiments to address the target of the overarching research activity.</p> </li> <li> <p>A <em>digital laboratory</em> is a pool of digital assets (e.g. on-line tools, desktop tools, methodologies, standards) used by scientists to perform the steps of an experiment and generate research products.</p> </li> <li> <p>A <em>research product</em> is any digital object generated during the research flow that was relevant to complete the research activity and (possibly) relevant for its interpretation once the research activity has been completed. Products are digital objects, whose human consumption depends on computer programs; they are concrete items that can be discovered, accessed, and possibly re-used under given access rights. 
Examples are datasets in a data repository (e.g. sea observations in the PANGAEA repository), but also entries in domain databases (e.g. proteins in UNIPROT), software (e.g. models implemented as R algorithms in GitHub), and of course the scientific article, reporting on the findings of a research activity.</p> </li> </ul> <p>A research activity may therefore generate a number of research products, which represent the tangible digital results of the research activity and enable scientists to draw their conclusions. Indeed, several “intermediate” products are generated along the way, e.g. inputs and outputs of unsuccessful experiments, or versions of the final products still to be refined. A research activity can therefore be described by a <em>research flow</em>, pictured in Figure 1, as a sequence of steps <em>S1...Sn,</em> potentially grouped into <em>experiments</em>, carried out in the frame of a digital laboratory. More specifically, each step <em>Si</em> of a research flow is in turn a sequence of actions enacted by humans, possibly by means of digital laboratory assets, that may require or produce (intermediate) research products. Clearly, some (or all) of the research products generated during the research flow may become, at some point in time, new assets of a digital laboratory. According to this scenario, in the simplest case of theoretical sciences, the research flow might consist of one experiment made up of one step of “formulation of hypothesis” and one step of “thinking”, whose final product is a scientific article. In a slightly more complex scenario, a research flow may be composed of one experiment, whose steps include data collection, data processing, and result analysis, with a final step producing the article and the research data output to be published.</p> <p><img alt="Figure 1. 
The research flow" src="https://mfr.osf.io/export?url=https://osf.io/59tzu/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 1. The research flow</em></p> <h2>Peer-reviewing the research flow</h2> <p>As stated before, in the digital science era, the ability to share research products, in combination with digital laboratories, opens the way to Open Science principles. According to these principles, science should be open not only once it is concluded, but also while it is being performed. In other words, scientists should, as much as possible, make their methodologies, thinking and findings available to enable and maximize collaboration and reuse by the community. The digital laboratory therefore becomes the core of this vision, as it is the place providing the assets needed by the researchers to implement their research flow and at the same time the place providing the generated research products, for sharing and peer-reviewing. For example, scientists performing analysis of data using R scripts may use a digital laboratory equipped with the software RStudio offered as-a-service by an online provider (e.g. <a href="http://www.bluebridge-vres.eu/" rel="nofollow"><em>BlueBridge e-infrastructure</em></a> powered by <a href="https://www.d4science.org/" rel="nofollow"><em>D4Science</em></a>) and a repository where they can store/share their R scripts and their input and output datasets (e.g. <a href="https://zenodo.org" rel="nofollow"><em>Zenodo.org</em></a>).</p> <p>Depending on the technological advances introduced and adopted by the research community in the digital laboratory, scientists may generate products whose goal is not just sharing “findings” but also sharing “methodologies”. Methodology products are digital objects encoding experiments or the research flow itself (see Figure 2). As such they are generated to model the actions performed by the scientists and enable their machine-assisted repetition. 
The availability of research products at various stages of the research flow makes it possible to introduce peer review stages during the on-going research project. Specifically, depending on the kind of products made available, different degrees of peer review may be reached, to support manual but also machine-supported reproducibility and consequently enforce more transparent and objective research flow peer review practices:</p> <ul> <li><strong>Manual reproducibility</strong>: the digital laboratory generates:<ul> <li><em>Literature</em>, defined as narrative descriptions of research activities (e.g. scientific article, book, documentation);</li> <li><em>Datasets</em>, defined as digital objects “used as evidence of phenomena for the purpose of research or scholarship” [<a href="#usjlz7ins46m" rel="nofollow"><em>6</em></a>];</li> <li><em>Computational products</em> (e.g. software, tools), understood as digital objects encoding business logic/algorithms to perform computational reasoning over data.</li> </ul> </li> </ul> <p>Reviewers are provided with the products generated by a research flow, whose steps are reported in an article together with references to the digital laboratory. Reproducibility and research flow assessment strongly depend on humans, both in the way the research flow is described and in the ability of the reviewers, and in general of other researchers, to repeat the same actions.</p> <ul> <li><strong>Machine reproducibility of experiments:</strong> the digital laboratory generates literature, datasets and computational products together with<ul> <li><em>Experiments</em>, understood as executable digital objects encoding a sequence of actions (e.g. a <em>methodology</em>) that make use of digital laboratory assets to deliver research products.</li> </ul> </li> </ul> <p>Reviewers are provided with an experiment, inclusive of products and digital assets. 
Reproducibility can be objectively supported by a machine and finally evaluated, but the assessment of methodology as a whole still depends on humans.</p> <ul> <li><strong>Machine reproducibility of research flows</strong>: the digital laboratory generates literature, datasets, computational products, experiments together with<ul> <li><em>Research flows</em>, intended as digital objects encoding a flow, inclusive of experiments, intermediate, and final products, and their relationships; the research flow may be encoded as a sharable and possibly reproducible digital product.</li> </ul> </li> </ul> <p>Reviewers are provided with technology to reproduce experiments and research flows. In this scenario, human judgment is supported by machines, which can provide a higher degree of transparency.</p> <p><img alt="Figure 2 - Entities of research flow" src="https://mfr.osf.io/export?url=https://osf.io/mb3vq/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"></p> <p><em>Figure 2 - Entities of research flow</em></p> <p><strong>Research flow and peer review: current practices</strong></p> <p>Researchers today tend to make a clear distinction between the phase of research activity and the phase of research publishing. During the former scientists perform science, during the latter scientists publish their final results. According to the OpenUP survey [<a href="#hyy13omu5w88" rel="nofollow"><em>7</em></a>] conducted between 20 January and 23 February 2017, more than a half of the 883 respondents (62% on average) confirmed the general trend of sharing and disseminating their results after the conclusion of the research. Trends were similar across countries, disciplines, organisation types and gender (see Figure 3). 
Research publishing is generally understood as the moment in which researchers share their findings with the broader community of all researchers, hence also the moment from which the peer review of the research flow starts. With reference to the research flow scenario depicted in Figure 1, it is as if every research flow included a concluding step of “publishing” (see Figure 4), where researchers select all products generated at intermediate steps that are worth publishing and share them with “the world” to start the peer-reviewing process.</p> <p><img alt="Figure 3. Result from the OpenUP Survey: The start of dissemination activities" src="https://mfr.osf.io/export?url=https://osf.io/drsta/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 3. Result from the OpenUP Survey: The start of dissemination activities</em></p> <p><img alt="Figure 4. Publishing a research flow today: post research activities" src="https://mfr.osf.io/export?url=https://osf.io/xacz6/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 4. Publishing a research flow today: post research activities</em></p> <p>In the following sections of the report we shall present the two main trends in publishing a research flow today: enabling manual or machine-supported peer review. The first trend is concerned with literature, using the articles as the sole means for sharing research flow details. The second trend considers publishing other products together with the article, possibly with references to the digital laboratory assets. 
The difference between the two is that the former is easier to adopt but provides a lower degree of transparency and objectivity in peer review, while the latter enables (partially) automated peer review, but requires the definition of an e-infrastructure and the digital laboratory related to the research flow, which may impose a change of behavior and imply a non-trivial learning curve.</p> <h2>Manual reproducibility and peer review: State of the art</h2> <h3>Current approaches based on scientific literature</h3> <p>Traditionally, the peer review of the research flow has been delegated to scientific literature (e.g., articles, books, technical reports, PhD theses), which is still (and likely always will be, since narration is crucial for understanding) regarded as the common omni-comprehensive unit of scientific dissemination. Literature addresses reproducibility and assessment of research flows by explaining and describing the related steps, the experimental (or digital) laboratory where they were conducted (i.e., methodology, tools, standards), describing any product of science used or yielded by the activity, and facilitating reproducibility by a detailed, theoretically unambiguous, description of the experiments (Fig. 5). To make this process less ambiguous, and to highlight the importance of repeatability and reproducibility, some journals mandate a dedicated section in the paper, often called the “Methodology” section. The description of the research flow is clearly separated from the other sections of the paper, supporting reviewers and readers in understanding which steps have been performed, in which sequence, and, for each step, the adopted laboratory assets and the generated research products.</p> <p><img alt="Figure 5. 
Traditional peer-review: the research flow is peer reviewed by peer-reviewing its textual description, after the final results are obtained" src="https://mfr.osf.io/export?url=https://osf.io/eahn5/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 5. Traditional peer-review: the research flow is peer reviewed by peer-reviewing its textual description, after the final results are obtained</em></p> <p>However, a natural language description of a methodology can be interpreted in different ways and typically does not include all the details that are needed in order to replicate the experiment or reproduce the results. In addition, it has been found [<a href="#w9wvcwvapoff" rel="nofollow"><em>8</em></a>, <a href="#kix.2s3f26jrwtnx" rel="nofollow"><em>9</em></a>, <a href="#8uia5l17fhp4" rel="nofollow"><em>10</em></a>] that “Methodology” sections of papers often include generic sentences and lack details that are necessary to attempt the reproduction of the study. To overcome this issue, the <a href="https://cos.io/" rel="nofollow"><em>Center for Open Science</em></a>, in collaboration with more than 3,000 journals, is trying out an approach to improve accuracy and unambiguity in the descriptions of research flows in papers. The approach is based on the concept of “registered reports”, documents that describe the research flow and that are submitted to the journal before the research starts (“pre-registration”) [<a href="#gi55cn6w3gq" rel="nofollow"><em>11</em></a>] (see Figure 6). Registered reports can be considered as a “Methodology” section of a paper, but more structured, detailed and separated from the paper.</p> <p>The idea behind registered reports is that if a researcher prepares a detailed study design and shares it before he/she actually collects and analyses the data, the possible biases are minimized in the phases of data collection, analysis, and results reporting in the final paper. 
Reviewers are invited to review registered reports and assess the methodology and/or protocol described therein. Based on such assessment, editors can decide to “pre-accept” the final paper, regardless of the actual results that will be obtained. Registered reports have two main positive consequences. The first is that they can stimulate feedback from peer reviewers, so that researchers can receive suggestions on how to improve their methodology before they start their investigations. The second is that if the research is “pre-accepted”, researchers know that they will be published even if they get negative results, hence they are more inclined to write an “honest” final paper, describing both what went right and what went wrong. Registered reports cannot be adopted in every field of science: the Center for Open Science is conducting its pilot for research involving statistical analysis, which seems to be the type of study that can benefit the most from this approach. As of May 2017, the Open Science Framework hosts about 153,000 registered reports (<a href="https://osf.io/registries/" rel="nofollow"><em>https://osf.io/registries/</em></a>).</p> <p><img alt="Figure 6. With registered reports motivations and design of the research flow are peer reviewed before the final results are obtained" src="https://mfr.osf.io/export?url=https://osf.io/j57mb/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 6. With registered reports motivations and design of the research flow are peer reviewed before the final results are obtained</em></p> <p>To conclude, while literature is certainly the most common way to make a research flow sharable, since other scientists can discover and read about somebody else’s methods, protocols, and findings, it generally fails at ensuring transparent evaluation of research flows. 
Indeed, as it cannot provide effective access to all products generated during the research flow, reproducibility is typically up to scientists and their ability to restore the original digital laboratory, find the necessary products, and perform the experiments as described in the article text. The inability to “objectively” reproduce science jeopardises effective peer-review, which for literature is generally biased by: authors’ decisions (e.g. what to describe in the text), reviewers’ decisions (e.g. trust in the author’s statements), and community practices (e.g. de facto standards in how to describe a research activity).</p> <p><strong>Table 1. Manual reproducibility and peer review via scientific literature</strong></p> <p><img alt="Table 1. Manual reproducibility and peer review via scientific literature" src="https://mfr.osf.io/export?url=https://osf.io/5e7bf/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"></p> <h3>Current approaches based on scientific literature with links to digital products</h3> <p>Scientists typically generate a number of research products while they are carrying out their research, but in several cases those are kept out of the scientific communication chain. As a consequence, part of the work of researchers is hidden from their peers, who end up with a partial view and can only see “the tip of the iceberg” provided by scientific literature.</p> <p>A common approach adopted today across several disciplines, demanded and inspired by communities and funders pushing for Open Science, is that of publishing articles together with links to other digital research products, deposited in dedicated repositories. In the majority of cases (e.g. Open Data Pilot), literature links to <em>datasets</em>, although some cutting-edge research communities are experimenting with links to <em>computational products</em> (e.g. software, scientific workflows), <em>experiments</em> and <em>methodologies</em>. 
This trend is confirmed by the outcome of the OpenUP survey (Table 2) [<a href="#hyy13omu5w88" rel="nofollow"><em>7</em></a>], according to which datasets, software and IT tools are the most important research products after the traditional literature products. Protocols and methodologies are particularly important research products for the medical sciences.</p> <p>In the following sections we shall present these approaches, commenting on their advantages and disadvantages in terms of how they address the challenge of research flow peer review. A summary is presented in Tables 3 and 4.</p> <p><strong>Table 2. The importance of the research outputs for OpenUP survey respondents: percentages show the share of respondents who chose the ‘very important’ and ‘somewhat important’ answer categories</strong></p> <p><img alt="Table 2. The importance of the research outputs for OpenUP survey respondents: percentages show the share of respondents who chose the ‘very important’ and ‘somewhat important’ answer categories" src="https://mfr.osf.io/export?url=https://osf.io/9vx5k/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"></p> <p><strong><em>Research data</em></strong> The trend of publishing datasets for the purpose of making them citable from the literature (Fig. 7) is supported by a growing number of data repositories and archives that assign unique, persistent identifiers to the deposited datasets and apply the FAIR principles [<a href="#96t0b5p3u4li" rel="nofollow"><em>12</em></a>, <a href="#ee4hay12ohi9" rel="nofollow"><em>13</em></a>] (data should be Findable, Accessible, Interoperable and Re-usable). 
Relevant examples are <a href="https://zenodo.org/" rel="nofollow"><em>Zenodo</em></a> and <a href="https://figshare.com/" rel="nofollow"><em>figshare</em></a> (cross-discipline and allow deposition of products of any type), <a href="http://datadryad.org/" rel="nofollow"><em>DRYAD</em></a> (mostly life science), <a href="https://www.pangaea.de/" rel="nofollow"><em>PANGAEA</em></a> (earth & environmental science), <a href="https://archaeologydataservice.ac.uk/" rel="nofollow"><em>Archaeology Data Service</em></a> (archaeology), <a href="https://dans.knaw.nl/en" rel="nofollow"><em>DANS</em></a> (multi-discipline, mostly humanities and social sciences).</p> <p>Results of the OpenUP survey [<a href="#hyy13omu5w88" rel="nofollow"><em>7</em></a>] show that researchers are in favour of open peer review of data (71%), although it is not a current practice for most data repositories. In the majority of cases, curators of data repositories perform technical checks to ensure the readability of data and the compliance of the submission with a set of defined guidelines or policies. These mainly address technical aspects of the datasets, such as file formats, documentation (e.g. README files, availability of a description) and metadata [<a href="#4zfe71z5po1z" rel="nofollow"><em>14</em></a>, <a href="#rv7o6gfz4h0w" rel="nofollow"><em>15</em></a>].</p> <p>Among the aforementioned repositories, DRYAD and PANGAEA have the most advanced data review processes. According to the <a href="http://datadryad.org/pages/policies" rel="nofollow"><em>DRYAD Terms of service</em></a>, before a dataset is published, DRYAD curators verify that the deposited data is readable and that the depositor has provided technically correct metadata about the dataset, its licensing conditions and links to the related scientific publications. The <a href="https://wiki.pangaea.de/wiki/Data_submission" rel="nofollow"><em>review process implemented by PANGAEA</em></a> consists of three main phases. 
First, PANGAEA editors and curators check the data for consistency, completeness and compliance with standards. The second phase is optional and may be requested by the editors to better document the datasets by providing the definitions of the parameters included in the dataset. Finally, editors and curators ensure that the files of the datasets are in the proper format to be ingested into the PANGAEA system.</p> <p>The main focus of the data review processes implemented in data repositories is on checking technical details of the datasets, validating their metadata and, possibly, their descriptions, without addressing the scientific value of the dataset. In fact, data review is not performed by “peers”, but by editors and curators who are not necessarily researchers in the same field as the depositors. Data reviewers are more likely to be experts in data management, archiving, and data preservation.</p> <p><img alt="Figure 7. Final research data is made accessible via a data repository or archive." src="https://mfr.osf.io/export?url=https://osf.io/qrcw8/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 7. Final research data is made accessible via a data repository or archive.</em></p> <p>A different approach to data peer review is adopted by data journals (Fig. 8). Data journals publish data papers, i.e. papers describing datasets in terms of content, provenance, and foreseen usage.</p> <p>Data journals inherited the peer-review process from traditional journals of scientific literature and apply it, with slight changes, to the data papers. The survey conducted in 2015 by Candela et al. [<a href="#alx3alkptcz" rel="nofollow"><em>16</em></a>] observed that in the majority of cases the review policies of data papers were the same as those of traditional papers and that only some data journals had strategies to capture the specificity of data and data papers. 
The survey was extended in 2017 by Carpenter [<a href="#gk5q8y1szp9j" rel="nofollow"><em>17</em></a>], who confirmed the results of the previous survey and highlighted that the peer review of data papers is mostly focused on the peer review of metadata, rather than on the data itself, confirming the importance of documentation and metadata descriptions that will facilitate data re-use [<a href="#rv7o6gfz4h0w" rel="nofollow"><em>15</em></a>].</p> <p>With the existing approaches, the reproducibility of a dataset (when applicable, as some datasets cannot be reproduced, such as those generated by devices for atmospheric measurements) is not considered an important aspect of data (peer) review, although reproducibility is crucial to demonstrate the correctness of data and its analysis, upon which researchers’ conclusions are based.</p> <p><img alt="Figure 8. In data papers the phases of data collection and processing are described and peer reviewed. The asset of the digital laboratory used for processing is referred, possibly with its configuration details. The final research data is made accessible via a data repository or archive." src="https://mfr.osf.io/export?url=https://osf.io/5tpgh/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 8. In data papers the phases of data collection and processing are described and peer reviewed. The asset of the digital laboratory used for processing is referred, possibly with its configuration details. The final research data is made accessible via a data repository or archive.</em></p> <p><strong>Table 3. Manual reproducibility and peer review of research data</strong></p> <p><img alt="Table 3. 
Manual reproducibility and peer review of research data" src="https://mfr.osf.io/export?url=https://osf.io/wx7gf/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"></p> <p><strong><em>Computational research products</em></strong></p> <p>Today, the publishing of computational products is typically performed via tools and services that are not meant for scholarly communication but that implement general patterns for collaboration and sharing of computational products. Examples are software repositories (or Version Control Systems (VCSs)) with their hosting services like GitHub and language-specific repositories like CRAN (The Comprehensive R Archive Network), the Python Package Index, and CPAN.</p> <p>Unlike data repositories, software repositories usually do not assign persistent identifiers, and literature products refer to software via URL. To overcome the issue of unstable URLs, data repositories like Zenodo and DataCite also allow users to deposit computational products and obtain a persistent, citable identifier.</p> <p><em>GitHub, Zenodo, figshare</em></p> <p>GitHub is currently the most popular online software repository. Being a generic software repository, GitHub does not define any policy for (research) software deposition, although it encourages good software development practices and its user interface supports easy communication and collaboration among users.</p> <p>Recently, researchers started using it for sharing their research software. Zenodo, figshare and GitHub responded to their need for a citable product and started a partnership for the assignment of DOIs to software releases. Zenodo and figshare are “catch-all” repositories for research, where users can deposit research products of different types: literature, datasets, software, presentations, lessons, videos, and images. 
As of May 2017, 15,358 research software products had been deposited in Zenodo.</p> <p>Language-specific repositories are meant to help users, researchers included, find common and high-quality software to use. In some cases, researchers may deposit the computational products produced during their research flows, but this is not yet common practice. Having a repository for computational products integrated with the scholarly communication chain, like Zenodo, could change the habits of researchers and foster sharing and re-use of research software.</p> <p>In the following, a brief description of the review and submission policies of a selection of language-specific software repositories relevant for the research community is given.</p> <p><em>CRAN</em> <a href="https://cran.r-project.org/index.html" rel="nofollow"><em>CRAN</em></a> is the Comprehensive R Archive Network hosting the R package repository. Deposition of software in the repository is subject to policies[^1] that address credit, legal, technical and documentation concerns.</p> <p><em>PyPI</em> <a href="https://pypi.python.org/pypi" rel="nofollow"><em>The Python Package Index (PyPI)</em></a> is a repository for software packages written in the Python language. <a href="https://wiki.python.org/moin/CheeseShopTutorial#Submitting_Packages_to_the_Package_Index" rel="nofollow"><em>PyPI policies</em></a> for submission to the repository address the code structure, the description of the package (metadata and documentation) and licensing. In order to ensure that the package can be properly ingested and archived, users can first test the ingestion process on a <a href="https://testpypi.python.org/pypi" rel="nofollow"><em>test site</em></a>.</p> <p><em>CPAN</em> <a href="http://www.cpan.org/" rel="nofollow"><em>CPAN</em></a> is a software repository for Perl modules. 
Its <a href="http://www.cpan.org/scripts/submitting.html" rel="nofollow"><em>submission policies</em></a> address technical features of the modules and define the basic metadata fields to be provided. To support Perl users in respecting the CPAN policies and to ensure high quality of the submitted modules, a dedicated web site has been set up: the <a href="http://prepan.org/info" rel="nofollow"><em>PrePAN web site</em></a>. Pre-PAN is composed of a software repository with forum capabilities where users can submit their module for review by other Perl programmers.</p> <p>The presence of links from literature to computational products (see Figure 9) is an important step forward toward a complete description of the scholarly record and research transparency. However, as in the case of datasets, their availability is not a sufficient condition to ensure that the research flow can be effectively assessed and reproduced because a big portion of researchers’ work is still hidden: the “tip of the iceberg” is bigger, but it is still only the “tip”.</p> <p><img alt="Figure 9. Research software realised during the data collection, processing and analysis phases is made accessible via a software repository and citable via a persistent identifier." src="https://mfr.osf.io/export?url=https://osf.io/24x7e/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 9. Research software realised during the data collection, processing and analysis phases is made accessible via a software repository and citable via a persistent identifier.</em></p> <p><strong>Table 4. Manual reproducibility and peer review of computational products</strong></p> <p><img alt="Table 4. 
Manual reproducibility and peer review of computational products" src="https://mfr.osf.io/export?url=https://osf.io/jbpq9/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"></p> <h3>Remarks</h3> <p>In summary, reviewers assess research flows based on literature and the availability of datasets and/or computational products. The final judgment is still far from being transparent, as reviewers are not in a position to repeat or reproduce the research flow. In the digital era, research data is generated, collected, manipulated and analysed by means of digital ICT tools. Also, very often researchers implement their own research computational tools (i.e. computational research products) to perform data processing. As mentioned above, common knowledge about these tools is fundamental for data processing to be properly assessed by reviewers and reproduced by other researchers. In all current approaches described above, researchers describe the methodology and the process in a paper (e.g. an algorithm in pseudo-code) so that reviewers can at least assess the logic of the process and readers could, at least in theory, re-implement it. Interestingly, depending on what is described in the paper, the review may not be related to the research flow as a whole, but rather concerned with the synthesis of the scientific conclusions.</p> <h2>Machine reproducibility and peer review: state of the art</h2> <p>In order to proceed on the road of Open Science, research in information science has started to explore and conceive solutions that focus on generating research products whose purpose is sharing “methodologies” rather than “final results”. Such products are digital encodings, executable by machines to reproduce the steps of an experiment or an entire research flow. 
They factor out from the scientific article the concept of experiment and research flow, making it a tangible, machine-processable and shareable result of science.</p> <h3>Experiments</h3> <p>The availability of research data and computational products increases the chances of reproducing an experiment described in scientific literature, but still requires considerable human intervention. In order to reuse and fairly review scientific findings, researchers and reviewers should be equipped with the products and tools necessary to encode, run, share and replicate experiments [<a href="#y4jrrxu2zyjf" rel="nofollow"><em>5</em></a>]. In this respect, research in the area of data models and information systems for scholarly communication has focused on the following aspects:</p> <ul> <li> <p><em>Data (information) models</em> for the representation of digital products encoding experiments : such models try to capture the essence of an experiment as a sequence of steps, at different levels of sophistication, e.g. human actionable steps, engine-executable processes made of service calls, packages of files to be unwrapped and executed manually by a researcher;</p> </li> <li> <p><em>Tools for generating experiment products</em>: experiment products include details of the digital laboratory (e.g. 
service or hardware configuration) and the input/output of each step at the moment of the experiment; in most scenarios such parameters are tedious to collect and in general we cannot expect the researcher to manually compile an experiment product; digital laboratories should be equipped with tools capable of creating a snapshot of the overall setting characterizing the experiment in order to make it shareable;</p> </li> <li> <p><em>Tools for executing experiment products</em>: the researchers running an experiment, as well as those willing to reuse it once it is published, should be equipped with tools for its execution and testing;</p> </li> <li> <p><em>Scholarly communication practices</em> to share, cite, evaluate, and assign scientific reward for such products (i.e. author credit): scholarly communication must face the challenges required to enable sharing and scientific reward for any new scholarly object entering its ecosystem. Scientists spend their brainpower, energy, and funds to generate objects, and all stakeholders involved (including organizations and funders) demand a return on investment.</p> </li> </ul> <p>Relevant examples of information systems for experiment publishing are <a href="https://www.protocols.io" rel="nofollow"><em>protocols.io</em></a>, <a href="https://www.ebi.ac.uk/arrayexpress/" rel="nofollow"><em>ArrayExpress</em></a> and <a href="https://www.myexperiment.org" rel="nofollow"><em>myExperiment</em></a> (see Table 5 for a summary).</p> <p><em><a href="http://Protocols.io" rel="nofollow">Protocols.io</a></em></p> <p><a href="http://protocols.io" rel="nofollow">protocols.io</a> is an open access repository of scientific protocols where end-users can deposit, find, comment on and modify protocols describing an experiment (wetlab, computational, or mixed). 
The idea behind <a href="http://protocols.io" rel="nofollow">protocols.io</a> is similar to that of registered reports: the methodology and protocol of an experiment are separated from the description of the results. Protocols deposited in <a href="http://protocols.io" rel="nofollow">protocols.io</a> can be considered a form of structured and executable registered report that, optionally, can refer to datasets and the services needed to download and process them. The main difference between registered reports and <a href="http://protocols.io" rel="nofollow">protocols.io</a> is that the former are literature products, while in <a href="http://protocols.io" rel="nofollow">protocols.io</a> the experiments are in a machine-readable format. For human consumption, it is possible to download the experiment in JSON or PDF formats, which are automatically generated on request, or access the protocol from the web site and use the available functionality to read and “run” the experiment. The “run” option guides the user through each step of the protocol, making it easy to re-do each step and keep track of the status of the experiment reproduction. A protocol can be private to a user or a group of users, or public. A public protocol can be accessed, commented on and re-used by any user. DOIs can be assigned upon the creator’s request to both public and private protocols (e.g. to privately share a protocol with reviewers without making it available to the public). Although designed for experiments in life science, the approach of <a href="http://protocols.io" rel="nofollow">protocols.io</a> is applicable to every scientific domain, whenever an experiment can be described as a sequence of steps to be performed by humans or by machines.</p> <p>In agreement with the journal GigaScience, which mandates the publication of the methods described in submitted papers on <a href="http://protocols.io" rel="nofollow">protocols.io</a>, protocols are not peer reviewed. 
<a href="http://Protocols.io" rel="nofollow">Protocols.io</a> curators perform basic checks to exclude pseudoscience and fake protocols. The lack of a pre-publication peer review process for the deposited protocols is motivated by the conviction that the quality of a protocol can only be verified when replicated by other researchers and that the replication of an experiment is not a task that a paper’s reviewer would perform [<a href="#kix.2s3f26jrwtnx" rel="nofollow"><em>9</em></a>].</p> <p><em>ArrayExpress</em> <a href="https://www.ebi.ac.uk/arrayexpress/submit/overview.html" rel="nofollow"><em>ArrayExpress</em></a> is an archive for functional genomics data generated from microarray and next-generation sequencing (NGS) platforms. ArrayExpress recognized the need of researchers to re-use existing data and evolved its model to archive data together with detailed information about the experiments because “<em>users should have everything they need for the data set to make sense and be reproducible without referring to an associated paper.</em>”[^4] Submissions are subject to guidelines that address both data and metadata. Specifically, microarray experiments must comply with the MIAME guidelines (Minimum Information About a Microarray Experiment) [<a href="#bojeho61unbc" rel="nofollow"><em>18</em></a>], while sequencing experiments must comply with the MINSEQE guidelines (Minimum Information about a high-throughput SEQuencing Experiment) [<a href="#wmarm4cgmdir" rel="nofollow"><em>19</em></a>]. In brief, the guidelines applied by ArrayExpress require the depositor to provide:</p> <ul> <li>The raw data (multiple file formats are accepted);</li> <li>The processed data (a tab-delimited text file is mandatory, other formats are accepted in addition);</li> <li>A detailed description of the experiment, including its objectives and design;</li> <li>A description of the data processing protocols (e.g. 
normalisation or filtering methods used to generate the final data, algorithm used for alignment, summary of the instrumentation).</li> </ul> <p>Specific guidelines are provided based on the specific experiments to be deposited. Depositors are guided through these details thanks to pre-defined <a href="https://www.ebi.ac.uk/arrayexpress/help/pre-submission_checklist.html" rel="nofollow"><em>pre-submission checklists</em></a> and a dedicated submission tool called Annotare, which automatically checks the compliance of the submission to the guidelines in terms of inconsistencies and missing mandatory metadata or files.</p> <p>The ArrayExpress curation team does not perform scientific peer review of a deposited experiment, but further checks the quality of the metadata and the format of the submitted data files in order to ensure high-quality and promote reproducibility of functional genomics experiments.[<a href="#rxa2tvwbk6" rel="nofollow"><em>20</em></a>]</p> <p><em>myExperiment</em> <a href="https://www.myexperiment.org" rel="nofollow"><em>myExperiment</em></a> is a repository of so called “Research objects”, digital objects that aggregate resources related to a scientific experiment or a research investigation [<a href="#6umjfebkj9n2" rel="nofollow"><em>21</em></a>, <a href="#tmxrsvggkzan" rel="nofollow"><em>22</em></a>]. Resources include publications, bibliographic metadata, the data used and produced by an experiment (or a link to them), methods applied to produce or analyse the data. Named relationships can link resources belonging to the same Research Object to semantically describe the connections among resources. The Research Object model is a generic model that research communities can configure to match their needs and requirements via the concept of <em>profile</em>. 
A profile defines the shape and form of a domain- or application-specific research object in terms of types of resources, formats of metadata and a minimal information model (MIM) checklist that formally specifies its requirements and enables automatic validation of a research object. Beyond this automatic validation, myExperiment does not perform any other technical assessment of the deposited objects.</p> <p>Different communities adopted myExperiment and the research object model to share experiments. Examples are the ISA (Investigation-Study-Assay) Research Object Bundle for the systems biology community (<a href="https://fairdomhub.org/" rel="nofollow"><em>FAIRDOMHub</em></a>) and the Workflow-centric research object, which represents workflows and is used by the biodiversity science community of <a href="https://www.biovel.eu/" rel="nofollow"><em>BioVel</em></a> and users of the <a href="https://taverna.incubator.apache.org/" rel="nofollow"><em>Taverna workbench</em></a>.</p> <p><strong>Table 5. Machine reproducibility and peer review of experiments</strong></p> <p><img alt="Table 5. Machine reproducibility and peer review of experiments" src="https://mfr.osf.io/export?url=https://osf.io/4rpqe/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"></p> <p><img alt="Figure 10. A digital object containing all the products used and generated during the research flow is published. The goal is to make an experiment of the research flow reproducible by others." src="https://mfr.osf.io/export?url=https://osf.io/b5ck4/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 10. A digital object containing all the products used and generated during the research flow is published. 
The goal is to make an experiment of the research flow reproducible by others.</em></p> <h3>Sharing and interpretation of Research Flow products</h3> <p>Executing experiments and computational workflows is an important part of the research flow and their publishing is a crucial step toward the Open Science paradigm (see Figure 10). However, sharing inputs, outputs and processes is often not enough for a human to understand the experiment and its value. In fact, information about the investigation in which the experiment is conducted is necessary in order to add context to the experiment and to make the interpretation easier.</p> <p>Guidelines like MIAME [<a href="#bojeho61unbc" rel="nofollow"><em>18</em></a>], MINSEQE [<a href="#wmarm4cgmdir" rel="nofollow"><em>19</em></a>] and the ISA framework [<a href="#ojhnd0e0q8cn" rel="nofollow"><em>24</em></a>], adopted by ArrayExpress and FAIRDOMHub, take this aspect into consideration and, when used to their full potential, may help describe the whole research flow, from its inception and design to its final results. <a href="http://protocols.io" rel="nofollow">protocols.io</a> goes in the same direction, encouraging the creators of protocols to add as much information as possible in the proper sections of the protocol submission page [<a href="#unp3wmbkbw19" rel="nofollow"><em>23</em></a>]. However, these approaches fall short of fully supporting research flow peer review for the following reasons:</p> <ul> <li>They model an individual experiment rather than an arbitrary sequence of them;</li> <li>They generally do not model the scientific method, but focus on the actual sequences of steps forming an experiment;</li> <li>Their focus is on repeatability/reproducibility of an experiment rather than peer reviewing of all products and of the research flow (research methodology);</li> <li>They do not distinguish between successful and unsuccessful experiments.</li> </ul>
<h3>Remarks</h3> <p>Peer reviewing the whole research flow is certainly the most complete conception of evaluation and assessment of science. Its modeling includes the scientific method, which corresponds to how science is performed (the structuring of scientific thinking), and the experiments, which model how science was actually carried out in terms of steps, digital laboratory assets, and generated research products. As clarified by the analysis of the state of the art, existing approaches nicely solve some of these issues, but none of them tackles the general problem. Existing solutions have reproducibility of science as their main objective, rather than research flow peer review, hence they focus on the executable representation of digital objects encoding successful experiments. Such objects express the logic of an experiment but do not describe the overall research flow of which they are a final step; moreover, they do not describe the scientific method underlying the research flow. For example, if a scientist adopts a research flow devised as a cycle of experiments, refining their inputs and outputs until a certain success threshold is reached, then with the approaches described above the scientist will publish only the last execution of the experiment, together with the research products required to reproduce it. Overall, these observations lead to the following considerations.</p> <p><em>Ongoing research flow peer review</em> In contrast with traditional peer review models, which assess scientific results only once the research activity has been successful, peer review could/should also be applied during the ongoing research flow, as a sort of monitoring and interim evaluation process. 
Ongoing research flow peer review would also increase the possibility for a researcher to demonstrate the validity and trustworthiness of the research being carried out and its (intermediate) results.</p> <p><em>Negative results.</em> By sharing intermediate research flow experiments and steps a researcher would also open up the “publication of negative results”. This practice could have a twofold positive effect: on the one hand, the researcher might receive comments and advice from colleagues, on the other hand, she would help the community by suggesting to avoid the same “mistakes” [<a href="#wjy0zgxs0pam" rel="nofollow"><em>25</em></a>].</p> <p><em>Machine assisted Vs human assisted review.</em> Today, there is no formal (and technologically supported) distinction between which steps of an experiment (and of the research flow) should be peer-reviewed by humans, e.g. novelty and impact of a research flow and its final products, and what could be reviewed by machines, e.g. conformance to given structural and semantic requirements of data and software (e.g. [<a href="#5fxg1rwf3ll7" rel="nofollow"><em>26</em></a>]). It would be desirable to have tools for “machine-assisted peer-review”, built on the very same digital laboratory assets that generated research products, e.g. verify research product conformance to given domain requirements and standards. Although humans would still play a central role in the peer-review process, such tools would support reviewers facing challenges going beyond their capabilities (e.g. checking the quality of each record in a database).</p> <p><em>Scientific method review.</em> In order to achieve an omni-comprehensive review of the research flow, reviewers would benefit from viewing a description of the underlying scientific process. 
This approach is the one underlying the Registered Reports proposed by the Center for Open Science [<a href="#gi55cn6w3gq" rel="nofollow"><em>11</em></a>], which, however, is concerned with human peer review of literature products. None of the existing approaches aims at a peer review approach driven by a digital representation of the scientific process (a specific type of methodology product) where the research flow is intended as a peer-reviewable instance of such a process.</p> <h1>Towards research flow peer review</h1> <p>As summarized above, the implementation of a fully-fledged research flow peer review methodology has requirements (tools and practices) that differ from those identified in Open Science for reproducibility. Reproducibility of science and its underlying principles are indeed crucial to support transparent peer review, but existing practices are not enough to fully address research flow peer review. In order to support this kind of peer review, reviewers should evaluate science by means of a user-friendly environment which transparently relies on the underlying digital laboratory assets, hides their ICT complexity, and gives guarantees of repeatability and reproducibility recognized by the community.</p> <p>In this section we sketch some ideas towards the definition of a framework for the representation of research flow peer review for a given discipline of science. Such a framework may become the scaffolding on top of which to develop tools for supporting ongoing peer review of research flows by “real-time hooking” to the underlying digital laboratory, where scientists are carrying out their research flow. Such tools would abstract over the complexity of the research activity and offer user-friendly dashboards to examine the scientific process adopted, explore the ongoing research flow, and evaluate its intermediate experiments and related products. 
In a less advanced implementation, such tools may provide scientific process and research flow to reviewers once the research activity has been terminated, inclusive of all intermediate experiments, steps and research products.</p> <p>To this aim, the framework should be built around the notion of <em>research flow review templates</em>. These are representations of the scientific processes in terms of patterns (sequences and cycles) of experiments and relative steps to be peer reviewed; note that such templates should include all and only experiments, steps, and relative “signatures” for which peer-review is required and supported. In other words a research flow template is not intended to describe the detailed experiments and steps of a research activity but to model which subset of these is relevant to assess the quality of the relative research flows.</p> <p>For example, consider the scientific process in Figure 11, which models one experiment repeatedly executed until the research activity is successful. At every round, the experiment designs (1) and collects (3) input data, instruments the digital laboratory with processing algorithms (2) and performs the analysis (4) to produce output data. Finally, it publishes (5) all such products. Then we may assume that the only review checkpoint is the one of “publication” (5), where input data and digital laboratory assets are made available. The corresponding research template would model the very same cycle and be made of one experiment including a single step of peer review, the one of publication mentioned above. Ongoing peer review tools would allow reviewers to select a given execution of the experiment in time, explore and assess input and output data, and re-execute the given step given the relative products. Of course, such tools should be equipped with functionalities to provide feedback and evaluation.</p> <p><img alt="Figure 11. 
Research lifecycle adopted from the research process model by Kraker & Lindstaedt (Source: OpenUP WP4 team, deliverable D4.1)" src="https://mfr.osf.io/export?url=https://osf.io/xfvpc/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 11. Research lifecycle adopted from the research process model by Kraker & Lindstaedt (Source: OpenUP WP4 team, deliverable D4.1)</em></p> <p>Several sciences are making use of shared scientific process patterns, for example in clinical trials,[^5] where sample-based experiments are structured and documented according to established protocols in order for other scientists to transparently understand the thinking underlying given findings. In order for a community to provide a specification of the peer-reviewable part of its research flow and therefore build adequate tools in support of reviewers, a simple formal framework capable of describing the structure of a given community’s research flow review template(s) should be available. Each template should reflect one particular way of performing science, capturing the steps which should be subject to peer review (each community may define more than one template). At the same time, templates require researchers to comply with certain expectations when producing science. Templates express common behaviour, determine good practices, and enable reproducibility and transparent evaluation of science. To make an analogy, the structure of a template should reflect the structure of a cooking recipe. 
It should specify a list of all the (types of) products needed from the digital laboratory at each step of the research flow (the ingredients) and should mandate a detailed description (machine actionable) of all the steps to be executed (the mixing and the cooking) in order to reproduce the research results (the cake).</p> <p>Such a research flow review framework should encompass the following concepts (see Figure 12):</p> <ul> <li><em>Research flow template</em>: the model of research flow to be followed by scientists in terms of experiments, including cycles, conditions, etc. to be peer reviewed;</li> <li><em>Step and experiment signatures</em>, intended as:<ul> <li><em>Result products</em>: the classes of products required and returned by such steps (datasets, literature, computational products);</li> <li><em>Asset of the digital laboratory</em> (which is not necessarily a research product) required for the execution of the step or of the experiment;</li> <li><em>Methodology products</em>: the classes of products encoding experiments;</li> <li><em>Different classes of literature</em>: ranging from documentation to descriptions of scientific methods.</li> </ul> </li> </ul> <p>Sharing a framework of this kind allows the realization of <em>research publishing tools</em> and <em>review tools</em> that allow scientists to produce products as expected and other scientists to access such products, for reuse, reproducibility, and review. As mentioned above, to be effective and used in practice, such tools should be:</p> <ul> <li><em>Integrated with digital laboratory assets used to perform science</em>: scientists should focus on developing their science rather than publishing it; the process of creating research products and methodology products should be delegated as much as possible to machines, together with tracking the history of the actual research flow; digital laboratory assets require research publishing tools (e.g. 
wrappers, mediators) capable of flanking the experiment functionality they support with functionality for packaging and publishing the related products, so that review tools can benefit from those;</li> <li><em>Easy to use</em>: user-friendly enough for scientists to access machine-assisted review tools without development skills; reviewers should be able to view the actual research flow, to view its current stage of development, and to apply machine-assisted validation from end-user interfaces;</li> <li><em>Trustworthy</em>: ease of use should come with guarantees of fairness, typically endorsed by the community adopting research publishing and review tools.</li> </ul> <p>Implementing this vision raises serious challenges as it requires not only endorsement of communities but also top-down cultural convergence (i.e. rigorous behaviour in performing science). Most importantly, it requires the realization and maintenance of tools for publishing and review, whose cost does not easily find a funder in communities that are typically formed by scientists rather than institutions.</p> <p><img alt="Figure 12. Research flow templates concepts" src="https://mfr.osf.io/export?url=https://osf.io/cepn2/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg"> <em>Figure 12. Research flow templates concepts</em></p> <h1>Use cases</h1> <p>The research flow review framework supports the definition of research flow templates to model common discipline-specific patterns (best practices for a given discipline) focusing on the steps of the research flows that should be subject to peer-review. In the following, experimental research flow templates for two specific use cases are presented. The first use case is in geothermal energy science, which studies the energy generated and stored in the Earth. 
Geothermal research includes on-site activities for subsoil data measurements (in some cases already collected data is re-used) and digital activities for data curation and analysis. Typical research activities in this field are chemical analysis of rock samples and the geologic and electromagnetic modelling of the subsoil with a technique called “forward modelling”. The second use case is in archeology, i.e. the study of ancient cultures and civilizations through examination of the artifacts (objects and/or buildings) found over or (more often) under the ground. For both use cases we instantiate a research flow template reflecting the current practices in the field. By analysing the research flow instance we are able to identify gaps and possible enhancements to peer review practices, resulting in a further research flow that better suits Open Science principles and supports ongoing peer review.</p> <h2>Use case on geothermal energy science</h2> <p>Geothermal energy science is the scientific discipline studying energy generated and stored in the Earth. Geothermal research includes on-site activities for subsoil data measurements (in some cases already collected data is re-used) and digital activities for data curation and analysis. Typical research activities in this field are chemical analysis of rock samples and the geologic and electromagnetic modelling of the subsoil with a technique called “forward modelling”.</p> <h3>Chemical analysis of rock samples</h3> <p>The research activity can be synthesized in three main steps:</p> <ol> <li>Collect the rock sample to analyse;</li> <li>Perform laboratory analysis on the rock sample (types of analysis vary based on the hypothesis to confirm);</li> <li>Publish an article where the hypothesis and the results of the analysis are presented and discussed. A subset of the analysis data is usually available as tables in the article (i.e.
not published in a data repository).</li> </ol> <p>@<a href="gb3ez" rel="nofollow">osf</a> <em>Figure 13. Research flow for the use case on chemical analysis of rock samples</em></p> <p>Given this scenario, the corresponding research flow template for peer review would include only one experiment with a single step of peer review: the publication of the article embedding the analysis data (figure 13). The scenario could be enhanced to embrace Open Science practices by encouraging researchers to openly publish the full results of the analysis. Analysis data would then be fully available to peer reviewers and readers, instead of being partially available as hardly re-usable tables embedded in the full text of the article. This enhanced scenario is depicted in figure 14.</p> <p>@<a href="j7etk" rel="nofollow">osf</a> <em>Figure 14. Research flow for the use case on chemical analysis of rock samples: analysis data completely available as separate but linked research products</em></p> <h3>Forward modelling of subsoil</h3> <p>The research activity can be synthesized as shown in figure 15:</p> <ol> <li>Find data: subsoil data about a location can be collected on-site with specific instruments or already existing data can be re-used, when available.</li> <li> <p>Modelling</p> <p>a) Data is imported into the modelling software tools (e.g. <a href="https://www.comsol.com/" rel="nofollow">Comsol multiphysics</a>, <a href="http://www.geomodeller.com/" rel="nofollow">3D GeoModeller</a>) b) The researcher uses the software to select, configure and apply canonical equations for the generation of the predictive model c) The generated model is manually verified by the researcher. If it is not correct based on the available data, then the researcher fine-tunes the parameters of the equation to generate a model that fits better. If the model fits the data, then it is used to answer the research question.
d) The researcher publishes an article that includes the model, part of the input data and the interpretation of the model</p> </li> </ol> <p>@<a href="pqzbm" rel="nofollow">osf</a> <em>Figure 15. Forward modelling of subsoil</em></p> <p>As in the previous case, current practice is to describe the research flow in the article (often based on paper templates provided by publishers), which is the only product subject to review. As a consequence, even if the research activity is more complex in terms of steps than the lab analysis on rock samples, the template for research flow peer review resembles the one in figure 13, where the only step of peer review is the publication of the article. A possible research flow template for the community of researchers in geothermal science could include the publishing of raw data and of all experiments run to generate the final predictive model (fig. 16). By using this template, researchers share not only the raw data and the final predictive model, but also the models that were discarded and the corresponding equation configurations, i.e. negative results.</p> <p>@<a href="gje3t" rel="nofollow">osf</a> <em>Figure 16. Research flow for the use case on forward modelling of subsoil: raw data, experiments and models are available, including those that produced negative results</em></p> <h2>Use case on archeology</h2> <p>Archeology is the study of ancient cultures through examination of the artifacts (objects and/or buildings). A typical research flow in archeology is depicted in figure 17 and is composed of the following steps:</p> <ol> <li>Preliminary studies to identify a place of interest. Studies include the analysis of indirect and direct sources. Examples of indirect sources are texts written by ancient geographers. Direct sources include street epigraphs and artifacts with information about a place, such as amphorae with stamps or coins.
</li> <li>Preliminary studies typically help the archeologist to formulate a hypothesis on the area to search regarding its geographical location and its societal and cultural role in a specific historical period. </li> <li>The archeologist examines aerial/satellite photos of the area and looks for traces </li> <li>The archeologist visits the area and looks for topographic hints to circumscribe the area </li> <li>Spot corings are performed and, if positive, the area is prepared for excavation </li> <li>During the excavation, the archeologist produces different types of documentation, often referred to as “raw data”:<ul> <li>The Harris matrix for stratigraphic squaring, which tracks what has been found and where.</li> <li>Excavation diary: a daily log kept by the excavators on site during each season. They record the day-to-day activity of the team and their observations on their work</li> <li>Matrix map (GIS)</li> <li>Photogrammetry: photos taken during excavations that can be used to generate 3D models</li> </ul> </li> <li>At the end of the excavation activity, an excavation report is produced. It contains a summary of the excavation diary and the hypothesis of the archeologist based on the raw data. Raw data are not published in full; only the part that has actually been used for the interpretation is made available</li> </ol> <p>@<a href="rtjyc" rel="nofollow">osf</a> <em>Figure 17. Use case on archeology</em></p> <p>The excavation report represents the main scholarly communication output of excavation activities and includes the interpretation of the archeologist. However, the report does not contain or reference the full raw data, thus peer reviewers can only assess that the authors’ conclusions are not completely wrong based on the subset of information that the authors decided to provide in the report.
</p> <p>A peer review research flow template that reflects this current practice is the same as that in Figure 13: the only step that produces outputs for peer review is the final publishing step.</p> <p>Open publishing of raw data is still far from being a common practice, although some countries are issuing national mandates for open excavation reports and international activities like ARIADNE are pushing for open access to all raw data (excavation diaries and reports, Harris matrix, GIS maps, 3D models). A research flow template supporting the Open Science paradigm for archeology is shown in figure 18. Preliminary studies and initial hypotheses can be shared by providing a collection of indirect and direct sources, including, for example, the studied texts and on-site photos. Raw data produced during the excavation is also made available as a set of separate products, possibly linked to each other. Another option (not shown in the figure) is to publish all raw data as a single research object. The final step is the publication of the excavation report, which should link to the other objects produced during the investigation.</p> <p>@<a href="gcvja" rel="nofollow">osf</a> <em>Figure 18. Research flow for the use case on archeology</em></p> <h1>Conclusion</h1> <p>The Open Science paradigm calls for the availability, findability and accessibility of all products generated by a research activity. That practice is a prerequisite for reaching two of the main goals of the Open Science movement: reproducibility and transparent assessment of research activities. In this paper we have described the current practices for the peer review of research flows, which range from traditional peer review via scientific literature to peer review by reproducibility of digital experiments.
We have argued that current practices have reproducibility of science as their main objective and do not fully address transparent assessment and its features, such as publishing negative results, supporting peer review while the research activity is ongoing, and enabling machine-assisted peer review. </p> <p>Foundations of a framework for the peer review of research flows have been presented. The goal of the framework is to be the bridge between the place where the research is conducted (i.e., the digital laboratory) and the place where the research is published (or, in general, made available and accessible). The framework aims at providing the scaffolding on top of which reviewers can evaluate science by means of a user-friendly environment that transparently relies on the underlying digital laboratory assets, hides their ICT complexity, and gives guarantees of repeatability and reproducibility recognized by the community. One of the building blocks of the framework is the notion of research flow template, through which a community can model the research flow to be followed by scientists in terms of experiments, including cycles, conditions, etc., to be peer reviewed. The framework allows communities to define one or more research flow templates, each capturing the steps which should be subject to peer review for a specific type of research activity. Templates are not only useful to peers willing to evaluate a research activity, but also push researchers to comply with certain expectations of their community, like best practices and common behaviour. </p> <p>The framework is theoretically applicable to any field of research adopting digital objects and/or producing digital research outputs. A detailed analysis of the applicability of the framework is ongoing.
Specifically, the fields of geothermal energy science and archeology have been considered as representatives of non-fully digital disciplines, which may pose challenges from the modelling point of view, as not all the research assets and products may be available in a digital laboratory. </p> <h1>References</h1> <ol> <li> <p>European Commission (2015). Validation of the results of the public consultation on Science 2.0: Science in Transition [report]. Brussels: European Commission, Directorate-General for Research and Innovation. Available at: <a href="http://ec.europa.eu/research/consultations/science-2.0/science_2_0_final_report.pdf" rel="nofollow"><em>http://ec.europa.eu/research/consultations/science-2.0/science_2_0_final_report.pdf</em></a>.</p> </li> <li> <p>European Commission's Directorate-General for Research & Innovation (RTD) (2016). Open Innovation, Open Science and Open to the World. Available at: <a href="https://ec.europa.eu/digital-single-market/en/news/open-innovation-open-science-open-world-vision-europe" rel="nofollow"><em>https://ec.europa.eu/digital-single-market/en/news/open-innovation-open-science-open-world-vision-europe</em></a>.</p> </li> <li> <p>FOSTER. Open Science Definition: <a href="https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition" rel="nofollow"><em>https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>De Roure, D. (2009) Replacing the Paper: The Twelve Rs of the e-Research Record. Open Wetware blog: <a href="http://blog.openwetware.org/deroure/?p=56" rel="nofollow"><em>http://blog.openwetware.org/deroure/?p=56</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>Bechhofer S. et al. (2013). 
Why linked data is not enough for scientists, Future Generation Computer Systems, Volume 29, Issue 2, February 2013, Pages 599-611, ISSN 0167-739X, <a href="https://doi.org/10.1016/j.future.2011.08.004" rel="nofollow"><em>https://doi.org/10.1016/j.future.2011.08.004</em></a> (<a href="http://www.sciencedirect.com/science/article/pii/S0167739X11001439" rel="nofollow"><em>http://www.sciencedirect.com/science/article/pii/S0167739X11001439</em></a>).</p> </li> <li> <p>Borgman, C. L. (2015). <em>Big data, little data, no data: scholarship in the networked world</em>. MIT Press.</p> </li> <li> <p>Stančiauskas, V., and Banelytė, V. (2017). OpenUP survey on researchers' current perceptions and practices in peer review, impact measurement and dissemination of research results [Data set]. Zenodo. <a href="http://doi.org/10.5281/zenodo.556157" rel="nofollow"><em>http://doi.org/10.5281/zenodo.556157</em></a>.</p> </li> <li> <p>Smagorinsky, P. (2008). The method section as conceptual epicenter in constructing social science research reports. Written Communication, 25, 389-411. <a href="http://journals.sagepub.com/doi/pdf/10.1177/0741088308317815" rel="nofollow"><em>http://journals.sagepub.com/doi/pdf/10.1177/0741088308317815</em></a>.</p> </li> <li> <p>Teytelman, L. (2016). We've been itching to share this! Integration of GigaScience and <a href="http://protocols.io" rel="nofollow">protocols.io</a> is an example of how science publishing should work. <a href="http://Protocols.io" rel="nofollow">Protocols.io</a> news: <a href="https://www.protocols.io/groups/protocolsio/news/weve-been-itching-to-share-this-integration-of-gigascience" rel="nofollow"><em>https://www.protocols.io/groups/protocolsio/news/weve-been-itching-to-share-this-integration-of-gigascience</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>Cotos E., Huffman S., and Link S. (2017).
A move/step model for methods sections: Demonstrating Rigour and Credibility, English for Specific Purposes, Volume 46, April 2017, Pages 90-106, ISSN 0889-4906, <a href="https://doi.org/10.1016/j.esp.2017.01.001" rel="nofollow"><em>https://doi.org/10.1016/j.esp.2017.01.001</em></a>.</p> </li> <li> <p>Center for Open Science. Registered Reports: Peer review before results are known to align scientific values and practices. <a href="https://cos.io/rr/" rel="nofollow"><em>https://cos.io/rr/</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>FORCE11 (2014). Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing version b1.0. <a href="https://www.force11.org/fairprinciples" rel="nofollow"><em>https://www.force11.org/fairprinciples</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg et al. (2016) "The FAIR Guiding Principles for scientific data management and stewardship." <em>Scientific data</em> 3 (2016).</p> </li> <li> <p>Assante, M., Candela, L., Castelli, D. and Tani, A. (2016). Are Scientific Data Repositories Coping with Research Data Publishing?. <em>Data Science Journal</em>, <em>15</em>, 6. DOI: <a href="http://doi.org/10.5334/dsj-2016-006" rel="nofollow"><em>http://doi.org/10.5334/dsj-2016-006</em></a>.</p> </li> <li> <p>Mayernik, M.S., S. Callaghan, R. Leigh, J. Tedds, and S. Worley (2015). <a href="http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-13-00083.1" rel="nofollow"><em>Peer Review of Datasets: When, Why, and How.</em></a> <em>Bull. Amer. Meteor. Soc.,</em> 96, 191–201, doi: 10.1175/BAMS-D-13-00083.1.</p> </li> <li> <p>Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015), Data journals: A survey. J Assn Inf Sci Tec, 66: 1747–1762. doi:<a href="http://doi.org/10.1002/asi.23358" rel="nofollow"><em>10.1002/asi.23358</em></a>.</p> </li> <li> <p>Carpenter, T. A. (2017). 
What Constitutes Peer Review of Data: A survey of published peer review guidelines. <em>arXiv preprint arXiv:1704.02236</em>. <a href="https://arxiv.org/pdf/1704.02236.pdf" rel="nofollow"><em>https://arxiv.org/pdf/1704.02236.pdf</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>MIAME guidelines (Minimum Information About a Microarray Experiment): <a href="http://fged.org/projects/miame/" rel="nofollow"><em>http://fged.org/projects/miame/</em></a>.</p> </li> <li> <p>MINSEQE guidelines (Minimum Information about a high-throughput SEQuencing Experiment): <a href="http://fged.org/projects/minseqe/" rel="nofollow"><em>http://fged.org/projects/minseqe/</em></a>.</p> </li> <li> <p>Tang, A. (2017). ArrayExpress at EMBL-EBI quality first! Repositive blog: <a href="https://blog.repositive.io/arrayexpress-at-embl-ebi-quality-first/" rel="nofollow"><em>https://blog.repositive.io/arrayexpress-at-embl-ebi-quality-first/</em></a> (last accessed 19 May 2017).</p> </li> <li> <p>De Roure, D., Goble, C., and Stevens, R. (2009). The design and realisation of the myExperiment Virtual Research Environment for social sharing of workflows. <em>Future Gener. Comput. Syst.</em> 25, 5 (May 2009), 561-567. <a href="http://dx.doi.org/10.1016/j.future.2008.06.010" rel="nofollow"><em>http://dx.doi.org/10.1016/j.future.2008.06.010</em></a>.</p> </li> <li> <p>Bechhofer, S., De Roure, D., Gamble, M., Goble, C., & Buchan, I. (2010). Research objects: Towards exchange and reuse of digital knowledge. <em>The Future of the Web for Collaborative Science</em>, <em>10</em>.</p> </li> <li> <p><a href="http://protocols.io" rel="nofollow">protocols.io</a> team (2017). How to make your protocol more reproducible, discoverable, and user-friendly. <a href="http://Protocols.io" rel="nofollow">Protocols.io</a>. <a href="http://dx.doi.org/10.17504/protocols.io.g7vbzn6" rel="nofollow"><em>http://dx.doi.org/10.17504/protocols.io.g7vbzn6</em></a>.</p> </li> <li> <p>Sansone, S. A., et al. (2012).
Toward interoperable bioscience data. Nature genetics, 44(2), 121-126. doi:<a href="http://dx.doi.org/10.1038/ng.1054" rel="nofollow"><em>10.1038/ng.1054</em></a>.</p> </li> <li> <p>Di Leo, A., E. Risi, and L. Biganzoli. No pain, no gain… What we can learn from a trial reporting negative results. Annals of Oncology 28.4 (2017): 678-680.</p> </li> <li> <p>Shanahan D. A peerless review? Automating methodological and statistical review, <a href="https://blogs.biomedcentral.com/bmcblog/2016/05/23/peerless-review-automating-methodological-statistical-review/" rel="nofollow">https://blogs.biomedcentral.com/bmcblog/2016/05/23/peerless-review-automating-methodological-statistical-review/</a></p> </li> </ol>