D3.2 A specification of the scientific method and scientific communication
====================
**Summary** This deliverable aims at shedding some light on scientific
peer review in the era of digital science, which in our view goes beyond
reviewing scholarly literature. In the Digital Era not only the final
outcome of the research process, i.e. the scientific publication, but
potentially also other research products generated at other stages of
the research workflow can be subject to review by peers. The adoption of ICT in support of science introduces unprecedented benefits, which can mainly be identified in: (i) the ability to share an online “digital laboratory”, i.e. the tools, applications and services used to perform science, and (ii) the ability to share the research products used as input or produced in the context of a digital laboratory. An example of (i) may be RStudio, a desktop tool to run R scripts, made available for download from some Web repository, while an example of (ii) may be the specific R script created by a scientist as a result of his/her research activity, made available to other researchers through the digital laboratory. Accordingly, scientists can not only publish literature describing their findings but also share the entities they used and that are required to repeat and reproduce science.
Such innovative shift also sets the condition for novel peer review
methodologies, as well as scientific reward policies, where scientific
results can be transparently and objectively assessed via
machine-assisted processes. In this deliverable we describe our vision of “research flow peer review” as an urgent and increasingly demanded practice, identifying related challenges and current solutions, and proposing future directions.
@[toc]
Glossary
========
**Digital laboratory** the subset of assets of an e-infrastructure
needed to perform the activities of a specific research flow; its
content may change over time (addition or deletion of digital objects)
but at any point in time its content should provide all that is needed
in order to repeat the activities of the research flow
**e-infrastructure** a virtual space of assets providing all the
functionality and the digital objects needed to perform research in a
specific discipline; the assets in the e-infrastructure (e.g. textual
descriptions, datasets, tools, services, standards) may come from
different physical research infrastructures and are (usually) defined by the research community of that discipline
**Research activity** in any discipline, the activities performed to
answer a “research question”, usually formulated as one or more
hypotheses to be proved true through the research activity
**Research experiment** a sequence of research steps that represents a major action in the research flow; it is usually a goal-driven sequence of steps set to verify an intermediate hypothesis, and its results may inspire further experiments to address the target of the overarching research activity
**Research flow** the sequence of actions performed while carrying out a
research activity
**Research flow template** a precise (formal) description of an
“abstract” research flow, defined by the research community of the
specific research field in which the template should be used; it
embodies the best practices and standards of the research community and
provides a description of the experiments and steps to be executed to
perform a research activity in that field, together with the assets from
the digital laboratory needed at each step; it also provides a
description of the research products to be made available for peer
review, together with the steps at which they should be made available
to the research community
**Research product** the digital objects produced as a result of the
research activity; they can represent the final outcome of the research
activity, or can be the output of intermediate experiments or steps,
possibly to be used in subsequent steps; we can identify several
categories of research products, such as literature, datasets,
computational products (programs and tools), formal descriptions
(possibly machine executable) of steps, experiments or the whole
research flow
**Research step** an action in the research flow that (usually) cannot be decomposed (or is not convenient to decompose) into a sequence of smaller actions; this notion of “atomic” action clearly depends on the research field
Science’s digital shift and impact on Open Science
--------------------------------------------------
An increasing number of researchers conduct their research using ICT tools for the production and processing of research products. In the last decade, research infrastructures (organizational and technological facilities supporting research activities) have been investing in “e-infrastructures” that leverage ICT tools, services, guidelines and policies to support the digital practices of their community of researchers. By analogy with traditional science, e-infrastructures are the place where researchers can grow and define
the boundaries of their *digital laboratories*, i.e. the subset of
assets they use to run an experiment. Researchers run their digital
*experiments* (e.g. simulations, data analysis) taking advantage of the
digital laboratory assets and generate new *research data* and
*computational products* (e.g. software, R algorithms, computational
workflows) that can be shared with other researchers of the same
community, to be discovered, accessed and reused.
The role of digital laboratories is therefore twofold: on the one hand
they support researchers in their advancement of science, offering the
facilities needed for their daily activities; on the other hand, they
foster the dissemination of research within the research community,
supporting discovery, access to, sharing, and reuse of digital research
products. In fact, their digital nature offers unprecedented opportunities for scientists, who can share not only the scientific literature describing their findings, but also the digital results they produce, together with the digital laboratory itself.
Those features are fundamental for an effective implementation of the
Open Science (OS) paradigm \[[*1*](#5k9wp98kz729),[*2*](#bn5l8dvetga)\].
OS is a set of practices of science, advocated by all
scientific/scholarly communication stakeholders (i.e., research funders,
research and academic organisations, and researchers), according to
which the research activities and all the products they generate should
be freely available, under terms that enable their findability,
accessibility, re-use, and re-distribution \[[*3*](#elc5nmqrlust)\]. The
effects of Open Science are mainly the following:
- Reproduce research activities: let other users reproduce the experiments of a research activity;
- Transparently assess research activities: evaluate findings based on the ability to repeat science, but also on the quality of the individual products of science, i.e. literature, research data, computational products and experiments.
If supported with adequate degrees of openness, scientists may find the
conditions to *repeat* (“same research activity, same laboratory”),
*replicate* (“same research activity, different laboratory”),
*reproduce* (“same research activity, different input parameters”), or
*re-use* (“using a product of a research activity into another research
activity”) the research activities, thereby maximizing transparency and
exploitation of scientific findings \[[*4*](#kio2s7rtbr66)\].
Research Flow in the Digital Era
--------------------------------
A *research activity* is carried out as a sequence of actions constituting the *research flow*. The research flow is made of a number of *experiments*, realized as sequences of steps in the context of a *digital laboratory* and executed by scientists driven by the ultimate intent of proving an initial scientific thesis. In the following we shall refer to:
- An *experiment* is defined in the following as a goal-driven sequence of steps set to verify a thesis, whose results may inspire further experiments to address the target of the overarching research activity.
- A *digital laboratory* can be defined as a pool of digital assets (e.g. on-line tools, desktop tools, methodologies, standards) used by scientists to perform the steps of an experiment and generate research products.
- A *research product* is defined here as any digital object generated during the research flow that was relevant to complete the research activity and (possibly) relevant for its interpretation once the research activity has been completed. Products are digital objects, whose human consumption depends on computer programs; they are concrete items that can be discovered, accessed, and possibly re-used under given access rights. Examples are datasets in a data repository (e.g. sea observations in the PANGAEA repository), but also entries in domain databases (e.g. proteins in UNIPROT), software (e.g. models implemented as R algorithms in GitHub), and of course the scientific article, reporting about the findings of a research activity.
A research activity may therefore generate a number of research
products, which represent the digital tangible results of the research
activity and enable scientists to draw their conclusions. Indeed, several “intermediate” products are generated along the way, e.g. inputs and outputs of unsuccessful experiments, or versions of the final products still to be refined. A research activity can therefore be
described by a *research flow*, pictured in figure 1, as a sequence of
steps *S1...Sn,* potentially grouped into *experiments*, carried out in
the frame of a digital laboratory. More specifically, each step *Si* of
a research flow is in turn a sequence of actions enacted by humans,
possibly by means of digital laboratory assets, that may require or
produce (intermediate) research products. Clearly, some (or all) of the
research products generated during the research flow may become, at some
point in time, new assets of a digital laboratory. According to this
scenario, in the simplest case of theoretical sciences, the research flow might consist of one experiment made of one step of “formulation of hypothesis” and one step of “thinking”, whose final product is a scientific article. In a slightly more complex scenario, a research flow may be composed of one experiment whose steps include data collection, data processing, and result analysis, with a final step producing the article and the research data output to be published.
![Figure 1. The research flow][1]
*Figure 1. The research flow*
Peer-reviewing the research flow
--------------------------------
As stated before, in the digital science era, the ability to share
research products, in combination with digital laboratories, opens the
way to Open Science principles. According to these principles, science
should be open not only once that it is concluded, but also while it is
being performed. In other words, scientists should, as much as possible,
make their methodologies, thinking and findings available to
enable/maximize collaboration and reuse by the community. The digital
laboratory becomes therefore the core of this vision as it is the place
providing the assets needed by the researchers to implement their
research flow and at the same time the place providing the generated
research products, for sharing and peer-reviewing. For example,
scientists performing analysis of data using R scripts, may use a
digital laboratory equipped with the software RStudio offered
as-a-service by an online provider (e.g. [*BlueBridge
e-infrastructure*](http://www.bluebridge-vres.eu/) powered by
[*D4Science*](https://www.d4science.org/)) and a repository where they
can store/share their R scripts and their input and output datasets
(e.g. [*Zenodo.org*](https://zenodo.org)).
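To make the sharing side of this example concrete, the sketch below deposits an R script and its input dataset on Zenodo through Zenodo's deposition REST API (endpoints as in Zenodo's public API documentation at the time of writing); the access token, file names and metadata are placeholders invented for the example, not part of the scenario above.

```python
# Illustrative sketch: depositing an R script and its input dataset on Zenodo
# through the deposition REST API, so that they become citable via a DOI.
# ACCESS_TOKEN, file names and metadata are placeholders, not real values.
import requests

ZENODO = "https://zenodo.org/api"
ACCESS_TOKEN = "<personal-access-token>"

# 1. Create an empty deposition.
deposition = requests.post(f"{ZENODO}/deposit/depositions",
                           params={"access_token": ACCESS_TOKEN},
                           json={}).json()
dep_id = deposition["id"]

# 2. Upload the research products (the R script and the dataset it analyses).
for filename in ("analysis.R", "observations.csv"):
    with open(filename, "rb") as handle:
        requests.post(f"{ZENODO}/deposit/depositions/{dep_id}/files",
                      params={"access_token": ACCESS_TOKEN},
                      data={"name": filename},
                      files={"file": handle})

# 3. Describe the deposition so that it can be found and cited.
metadata = {
    "metadata": {
        "title": "R analysis of sea observations",
        "upload_type": "software",   # one of Zenodo's upload types; illustrative choice
        "description": "R script and input dataset produced during the research flow.",
        "creators": [{"name": "Doe, Jane"}],
    }
}
requests.put(f"{ZENODO}/deposit/depositions/{dep_id}",
             params={"access_token": ACCESS_TOKEN}, json=metadata)

# 4. Publish: Zenodo mints a DOI that the article can then reference.
requests.post(f"{ZENODO}/deposit/depositions/{dep_id}/actions/publish",
              params={"access_token": ACCESS_TOKEN})
```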
Depending on the technological advances introduced and adopted by the
research community in the digital laboratory, scientists may generate
products whose goal is not just sharing “findings” but also sharing
“methodologies”. Methodology products are digital objects encoding
experiments or the research flow itself (see Figure 2). As such they are
generated to model the actions performed by the scientists and enable
their machine-assisted repetition. The availability of research products
at various stages of the research flow makes it possible to introduce
peer review stages during the ongoing research project. Specifically, depending on the kind of products made available, different degrees of peer review can be achieved, supporting manual as well as machine-supported reproducibility and consequently enforcing more transparent and objective research flow peer review practices:
- **Manual reproducibility**: the digital laboratory generates:
- *Literature*, defined as narrative descriptions of research activities (e.g. scientific article, book, documentation);
- *Datasets*, defined as digital objects “used as evidence of phenomena for the purpose of research or scholarship” \[[*6*](#usjlz7ins46m)\];
- *Computational products* (e.g. software, tools), intended as digital objects encoding business logic/algorithms to perform computational reasoning over data;
Reviewers are provided with the products generated by a research flow, whose steps are reported in an article together with references to the digital laboratory. Reproducibility and research flow assessment strongly depend on humans, both in the way the research flow is described and in the ability of the reviewers, and in general of other researchers, to repeat the same actions.
- **Machine reproducibility of experiments:** the digital laboratory generates literature, datasets and computational products together with
- *Experiments*, intended as executable digital objects encoding a sequence of actions (e.g. a *methodology*) that make use of digital laboratory assets to deliver research products.
Reviewers are provided with an experiment, inclusive of products and digital assets. Reproducibility can be objectively supported by a machine and finally evaluated, but the assessment of the methodology as a whole still depends on humans.
- **Machine reproducibility of research flows**: the digital laboratory generates literature, datasets, computational products, experiments together with
- *Research flows*, intended as digital objects encoding a flow, inclusive of experiments, intermediate, and final products, and their relationships; the research flow may be encoded as a sharable and possibly reproducible digital product.
Reviewers are provided with technology to reproduce experiments and research flows. In this scenario, human judgment is supported by machines, which can provide a higher degree of transparency.
![Figure 2 - Entities of research flow][2]
*Figure 2 - Entities of research flow*
### Research flow and peer review: current practices
Researchers today tend to make a clear distinction between the phase of
research activity and the phase of research publishing. During the
former scientists perform science, during the latter scientists publish
their final results. According to the OpenUP survey
\[[*7*](#hyy13omu5w88)\] conducted between 20 January and 23 February
2017, more than half of the 883 respondents (62% on average) confirmed
the general trend of sharing and disseminating their results after the
conclusion of the research. Trends were similar across countries,
disciplines, organisation types and gender (see Figure 3). Research
publishing is generally intended as the moment in which researchers are
sharing their findings with the broader community of all researchers,
hence also the moment from which the peer review of the research flow
starts. With reference to the research flow scenario depicted in Figure 1, it is as if every research flow included a concluding “publishing” step (see Figure 4), where researchers select the products generated at intermediate steps that are worth publishing and share them with “the world” to start the peer review process.
![Figure 3. Result from the OpenUP Survey: The start of dissemination activities][3]
*Figure 3. Result from the OpenUP Survey: The start of dissemination activities*
![Figure 4. Publishing a research flow today: post research activities][4]
*Figure 4. Publishing a research flow today: post research activities*
In the following sections of the report we shall present the two main trends in publishing a research flow today: enabling manual or machine-supported peer review. The first trend is concerned with literature, using articles as the sole means for sharing research flow details. The second trend considers publishing other products together with the article, possibly with references to the digital laboratory assets. The difference between the two is that the former is easier to adopt but provides a lower degree of transparency and objectivity in peer review, while the latter enables (partially) automated peer review, but requires the definition of an e-infrastructure and of the digital laboratory related to the research flow, which may impose a change of behavior and imply a non-trivial learning curve.
Manual reproducibility and peer review: State of the art
---------------------------------------------------------
### Current approaches based on scientific literature
Traditionally, the peer review of the research flow has been delegated to scientific literature (e.g., articles, books, technical reports, PhD theses), which is still (and likely always will be, since narration is crucial for understanding) regarded as the common, omni-comprehensive unit of scientific dissemination. Literature addresses reproducibility and assessment of research flows by explaining and describing the relevant steps and the experimental (or digital) laboratory where the activity was conducted (i.e., methodology, tools, standards), by describing any product of science used or yielded by the activity, and by facilitating reproducibility through a detailed, theoretically unambiguous description of the experiments (Fig. 5). To make this process less ambiguous, and to highlight the importance of repeatability and reproducibility, some journals mandate a dedicated section in the paper, often called the “Methodology” section. The description of the research flow is clearly separated from the other sections of the paper, supporting reviewers and readers in understanding which steps have been performed, in which sequence, and, for each step, the adopted laboratory assets and the generated research products.
![Figure 5. Traditional peer-review: the research flow is peer reviewed by peer-reviewing its textual description, after the final results are obtained][5]
*Figure 5. Traditional peer-review: the research flow is peer reviewed by peer-reviewing its textual description, after the final results are obtained*
However, a natural language description of a methodology can be
interpreted in different ways and typically does not include all the
details that are needed in order to replicate the experiment or
reproduce the results. In addition, it has been found
\[[*8*](#w9wvcwvapoff), [*9*](#kix.2s3f26jrwtnx),
[*10*](#8uia5l17fhp4)\] that the “Methodology” sections of papers often include generic sentences and lack the details necessary to attempt a reproduction of the study. To overcome this issue, the [*Centre for Open Science*](https://cos.io/), in collaboration with more than 3,000 journals, is trying out an approach to improve the accuracy and reduce the ambiguity of research flow descriptions in papers. The approach is based on the concept of “registered reports”, documents that describe the research flow and that are submitted to the journal before the research starts (“pre-registration”) \[[*11*](#gi55cn6w3gq)\] (see Figure 6). Registered reports can be considered a more structured and detailed “Methodology” section, kept separate from the paper.
The idea behind registered reports is that if a researcher prepares a
detailed study design and shares it before he/she actually collects and
analyses the data, the possible biases are minimized in the phases of
data collection, analysis, and results reporting in the final paper.
Reviewers are invited to review registered reports and assess the
methodology and/or protocol described therein. Based on such assessment,
editors can decide to “pre-accept” the final paper, regardless of the
actual results that will be obtained. Registered reports have two main
positive consequences: the first is that they can stimulate feedback
from peer reviewers and the researchers can receive suggestions on how
to improve their methodology before they start their investigations. The
second main benefit is that if the research is “pre-accepted”, researchers know that their work will be published even if they obtain negative results, hence they are more inclined to write an “honest” final paper, describing both what went right and what went wrong. Registered reports cannot be adopted in every field of science: the Centre for Open Science is conducting its pilot for research involving statistical analysis, which seems to be the type of study that can benefit the most from this approach. As of May 2017, the Open Science Framework hosts about 153,000 registered reports ([*https://osf.io/registries/*](https://osf.io/registries/)).
![Figure 6. With registered reports motivations and design of the research flow are peer reviewed before the final results are obtained][6]
*Figure 6. With registered reports motivations and design of the research flow are peer reviewed before the final results are obtained*
To conclude, while literature is certainly the most common way to make a research flow sharable, since other scientists can discover and read about somebody else’s methods, protocols, and findings, it generally fails at ensuring a transparent evaluation of research flows. Indeed, as
it cannot provide effective access to all products generated during the
research flow, reproducibility is typically up to scientists and their
ability to restore the original digital laboratory, find the necessary
products, and perform the experiments as described in the article text.
The inability to “objectively” reproduce science jeopardises effective
peer-review, which for literature is generally biased by: authors’
decisions (e.g. what to describe in the text), reviewers’ decisions
(e.g. trust in the author’s statements), and community practices (e.g.
de facto standards in how to describe a research activity).
**Table 1. Manual reproducibility and peer review via scientific literature**
![Table 1. Manual reproducibility and peer review via scientific literature][7]
### Current approaches based on scientific literature with links to digital products
Scientists typically generate a number of research products while they
are carrying out their research, but in several cases those are kept out
of the scientific communication chain. As a consequence, part of the
work of researchers is hidden from their peers, who end up with a
partial view and can only see “the tip of the iceberg” provided by
scientific literature.
A common approach adopted today across several disciplines, demanded and
inspired by communities and funders pushing for Open Science, is that of
publishing articles together with links to other digital research
products, deposited in dedicated repositories. In the majority of cases
(e.g. Open Data Pilot), literature links to *datasets*, although some
cutting-edge research communities are experimenting with links to
*computational products* (e.g. software, scientific workflows),
*experiments* and *methodologies*. This trend is confirmed by the
outcome of the OpenUP survey (table 2) \[[*7*](#hyy13omu5w88)\],
according to which datasets, software and IT tools are the most
important research products after the traditional literature products.
Protocols and methodologies are particularly important research products
for the medical sciences.
In the following sections we shall present these approaches, commenting
on their advantages and disadvantages in terms of how they address the
challenge of research flow peer review. A summary is presented in tables
3 and 4.
**Table 2. The importance of the research outputs for OpenUp survey respondents: percentages show a share of respondents who chose ‘very important’ and ‘somewhat important’ answer categories**
![Table 2. The importance of the research outputs for OpenUp survey respondents: percentages show a share of respondents who chose ‘very important’ and ‘somewhat important’ answer categories][8]
***Research data***
The trend of publishing datasets for the purpose of
making them citable from the literature (Fig. 7) is supported by a
growing number of data repositories and archives that assign unique,
persistent identifiers to the deposited datasets and apply the FAIR
principles \[[*12*](#96t0b5p3u4li), [*13*](#ee4hay12ohi9)\] (data should
be Findable, Accessible, Interoperable and Re-usable). Relevant examples are [*Zenodo*](https://zenodo.org/) and [*figshare*](https://figshare.com/) (cross-discipline repositories that allow deposition of products of any type), [*DRYAD*](http://datadryad.org/) (mostly life science), [*PANGAEA*](https://www.pangaea.de/) (earth & environmental science), the [*Archaeology Data Service*](https://archaeologydataservice.ac.uk/) (archeology), and [*DANS*](https://dans.knaw.nl/en) (multi-discipline, mostly humanities and social sciences).
Results of the OpenUP survey \[[*7*](#hyy13omu5w88)\] show that
researchers are in favour of open peer review of data (71%), although it
is not a current practice for most data repositories. In the majority of
cases, curators of data repositories perform technical checks to ensure
the readability of data and the compliance of the submission to a set of
defined guidelines or policies. These mainly address technical aspects
of the datasets, such as file formats, documentation (e.g. README files,
availability of a description) and metadata \[[*14*](#4zfe71z5po1z),
[*15*](#rv7o6gfz4h0w)\].
Among the aforementioned repositories, DRYAD and PANGAEA have the most advanced data review processes. According to the [*DRYAD Terms of Service*](http://datadryad.org/pages/policies), before a dataset is published, DRYAD curators verify that the deposited data is readable and that the depositor has provided technically correct metadata about the dataset, its licensing conditions and links to the related scientific publications. The [*review process implemented by PANGAEA*](https://wiki.pangaea.de/wiki/Data_submission) consists of three main phases. First, PANGAEA editors and curators check the data for consistency, completeness and compliance with standards. The second, optional phase may be requested by the editors to better document the dataset by providing definitions of the parameters it includes. Finally, editors and curators ensure that the dataset files are in the proper format to be ingested into the PANGAEA system.
The main focus of the data review processes implemented in data
repositories is on checking technical details of the datasets,
validating their metadata and, possibly, their descriptions, without
addressing the scientific value of the dataset. In fact, data review is not performed by “peers”, but by editors and curators who are not necessarily researchers in the same field as the depositors. Rather, data reviewers are typically experts in data management, archiving, and data preservation.
![Figure 7. Final research data is made accessible via a data repository or archive.][9]
*Figure 7. Final research data is made accessible via a data repository or archive.*
A different approach to data peer review is adopted by data journals
(Fig. 8). Data journals publish data papers, i.e. papers describing
datasets in terms of content, provenance, and foreseen usage.
Data journals inherited the peer-review process from traditional journals of scientific literature and apply it, with slight changes, to data papers. The survey conducted in 2015 by Candela et al. \[[*16*](#alx3alkptcz)\] observed that in the majority of cases the review policies for data papers were the same as for traditional papers and that only some data journals had strategies to capture the specificity of data and data papers. The survey was extended in 2017 by Carpenter \[[*17*](#gk5q8y1szp9j)\], who confirmed the results of the previous survey and highlighted that the peer review of data papers mostly focuses on the review of metadata, rather than on the data itself, confirming the importance of documentation and metadata descriptions that facilitate data re-use \[[*15*](#rv7o6gfz4h0w)\].
With the existing approaches, the reproducibility of a dataset (when
applicable, as some datasets cannot be reproduced, such as those
generated by devices for atmospheric measurements) is not considered an
important aspect of data (peer) review, although reproducibility is
crucial to demonstrate the correctness of data and its analysis, upon
which researchers’ conclusions are based.
![Figure 8. In data papers the phases of data collection and processing are described and peer reviewed. The asset of the digital laboratory used for processing is referred, possibly with its configuration details. The final research data is made accessible via a data repository or archive.][10]
*Figure 8. In data papers the phases of data collection and processing are described and peer reviewed. The asset of the digital laboratory used for processing is referred, possibly with its configuration details. The final research data is made accessible via a data repository or archive.*
**Table 3. Manual reproducibility and peer review of research data**
![Table 3. Manual reproducibility and peer review of research data][11]
***Computational research products***
Today, the publishing of computational products is typically performed
via tools and services that are not meant for scholarly communication
but that implement general patterns for collaboration and sharing of
computational products. Examples are software repositories (or Version
Control Systems (VCSs)) with their hosting services like Github and
language-specific repositories like CRAN (The Comprehensive R Archive
Network), the Python Package Index, and CPAN.
Unlike data repositories, software repositories usually do not assign persistent identifiers, and literature products refer to software via URLs. To overcome the issue of unstable URLs, data repositories like Zenodo also allow researchers to deposit computational products and obtain a persistent, citable identifier (e.g. a DataCite DOI).
*GitHub, Zenodo, figshare*
GitHub is currently the most popular online software repository. Being a generic software repository, GitHub does not define any policy for (research) software deposition, although it encourages good software development practices and its user interface supports easy communication and collaboration among users.
Recently, researchers started using it to share their research software. Zenodo, figshare and GitHub intercepted their need for citable products and started partnerships for the assignment of DOIs to software releases. Zenodo and figshare are “catch-all” repositories for research, where users can deposit research products of different types: literature, datasets, software, presentations, lessons, videos, and images. As of May 2017, 15,358 research software products had been deposited in Zenodo.
Language-specific repositories are meant to help users, researchers included, find common, high-quality software to use. In some cases, researchers may deposit the computational products produced during their research flows, but this is not yet a common practice.
Having a repository for computational products integrated with the
scholarly communication chain like Zenodo could change the habits of
researchers and foster sharing and re-use of research software.
In the following, a brief description of the review and submission
policies of a selection of language-specific software repositories
relevant for the research community is given.
*CRAN* [*CRAN*](https://cran.r-project.org/index.html) is the
Comprehensive R Archive Network hosting the R package repository.
Deposition of software in the repository is subject to policies[^1] that
address credit, legal, technical and documentation concerns.
*PyPI* [*The Python Package Index (PyPI)*](https://pypi.python.org/pypi) is a repository for software packages written in the Python language. [*PyPI policies*](https://wiki.python.org/moin/CheeseShopTutorial#Submitting_Packages_to_the_Package_Index) for submission to the repository address the code structure, the description of the package (metadata and documentation) and licensing.
In order to ensure that the package can be properly ingested and
archived, users can first test the ingestion process on a [*test
site*](https://testpypi.python.org/pypi).
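To illustrate the kind of information these submission policies require, the following minimal `setup.py` is a hypothetical sketch: the project name, author and dependencies are invented, but the fields correspond to the concerns the policies address (code structure, description, metadata, documentation and licensing).

```python
# Minimal, hypothetical setup.py illustrating the metadata a PyPI submission
# is expected to carry; all values are placeholders.
from setuptools import setup, find_packages

setup(
    name="subsoil-forward-model",        # placeholder project name
    version="0.1.0",
    description="Research code for forward modelling of the subsoil",
    # Documentation shipped with the package (assumes a README.rst next to this file).
    long_description=open("README.rst").read(),
    author="Jane Doe",
    author_email="jane.doe@example.org",
    license="MIT",
    packages=find_packages(),            # the code structure to be ingested
    install_requires=["numpy"],          # declared dependencies
)
```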
*CPAN* [*CPAN*](http://www.cpan.org/) is a software repository for Perl
modules. Its [*submission
policies*](http://www.cpan.org/scripts/submitting.html) address
technical features of the modules and define the basic metadata fields
to be provided. To support Perl users in respecting the CPAN policies and to ensure the high quality of submitted modules, a dedicated web site has been set up: the [*PrePAN web site*](http://prepan.org/info). PrePAN combines a software repository with forum capabilities, where users can submit their modules for review by other Perl programmers.
The presence of links from literature to computational products (see
Figure 9) is an important step toward a complete description of the scholarly record and toward research transparency. However, as in the case
of datasets, their availability is not a sufficient condition to ensure
that the research flow can be effectively assessed and reproduced
because a big portion of researchers’ work is still hidden: the “tip of
the iceberg” is bigger, but it is still only the “tip”.
![Figure 9. Research software realised during the data collection, processing and analysis phases is made accessible via a software repository and citable via a persistent identifier.][12]
*Figure 9. Research software realised during the data collection, processing and analysis phases is made accessible via a software repository and citable via a persistent identifier.*
**Table 4. Manual reproducibility and peer review of computational products**
![Table 4. Manual reproducibility and peer review of computational products][13]
### Remarks
In summary, reviewers assess research flows based on literature and the availability of datasets and/or computational products. The final judgment is still far from transparent, as reviewers are not in a position to repeat or reproduce the research flow. In the digital era, research data is generated, collected, manipulated and analysed by means of digital ICT tools. Moreover, researchers very often implement their own computational tools (i.e. computational research products) to perform data processing. As mentioned above, common knowledge about these tools is fundamental for data processing to be properly assessed by reviewers and reproduced by other researchers. In all the current approaches described above, researchers describe the methodology and the process in a paper (e.g. an algorithm in pseudo-code), so that reviewers can at least assess the logic of the process and readers could, at least in theory, re-implement it. Interestingly, depending on what is described in the paper, the review may not relate to the research flow as a whole, but rather be concerned with the synthesis of the scientific conclusions.
Machine reproducibility and peer review: state of the art
----------------------------------------------------------
In order to proceed on the road of Open Science, research in information
science has started to explore and conceive solutions that focus on
generating research products whose purpose is sharing “methodologies”
rather than “final results”. Such products are digital encodings,
executable by machines to reproduce the steps of an experiment or an
entire research flow. They factor the concepts of experiment and research flow out of the scientific article, making them tangible, machine-processable and shareable results of science.
### Experiments
The availability of research data and computational products increases
the chances of reproducing an experiment described in scientific
literature, but still requires considerable human intervention. In order
to reuse and fairly review scientific findings, researchers and
reviewers should be equipped with the products and tools necessary to
encode, run, share and replicate experiments \[[*5*](#y4jrrxu2zyjf)\].
In this respect, research in the area of data models and information
systems for scholarly communication has focused on the following
aspects:
- *Data (information) models* for the representation of digital products encoding experiments: such models try to capture the essence of an experiment as a sequence of steps, at different levels of sophistication, e.g. human-actionable steps, engine-executable processes made of service calls, packages of files to be unwrapped and executed manually by a researcher;
- *Tools for generating experiment products*: experiment products include details of the digital laboratory (e.g. service or hardware configuration) and the input/output of each step at the moment of the experiment; in most scenarios such parameters are tedious to collect and in general we cannot expect the researcher to manually compile an experiment product; digital laboratories should be equipped with tools capable of creating a snapshot of the overall setting characterizing the experiment in order to make it shareable (a minimal sketch of such a snapshot is given after this list);
- *Tools for executing experiment products*: the researchers running an experiment, as well as those willing to reuse it once it is published, should be equipped with tools for its execution and testing;
- *Scholarly communication practices* to share, cite, evaluate, and assign scientific reward for such products (i.e. author credit): scholarly communication must face the challenges required to enable sharing and scientific reward for any new scholarly object entering its ecosystem. Scientists spend their brainpower, energy, and funds to generate such objects, and all stakeholders involved (including organizations and funders) demand a return on investment.
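As announced in the second item above, the following is a minimal, purely illustrative sketch (not an existing e-infrastructure service) of what an experiment snapshot could capture: the platform, the installed packages of the digital laboratory, and checksums of the input and output files of a step. File names such as `input.csv` are hypothetical.

```python
# Illustrative sketch: capture a snapshot of the digital laboratory setting of
# a research step so that the experiment product can be packaged and shared.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 checksum of a file (used to pin inputs and outputs)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def experiment_snapshot(step_name: str, inputs: list[Path], outputs: list[Path]) -> dict:
    """Collect the environment and the products involved in one research step."""
    return {
        "step": step_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "laboratory": {
            "platform": platform.platform(),
            "python": sys.version,
            # Installed packages approximate the laboratory configuration.
            "packages": sorted(f"{d.metadata['Name']}=={d.version}"
                               for d in metadata.distributions()),
        },
        "inputs": [{"file": str(p), "sha256": file_checksum(p)} for p in inputs],
        "outputs": [{"file": str(p), "sha256": file_checksum(p)} for p in outputs],
    }


if __name__ == "__main__":
    snapshot = experiment_snapshot("data-processing",
                                   inputs=[Path("input.csv")],
                                   outputs=[Path("results.csv")])
    Path("experiment_snapshot.json").write_text(json.dumps(snapshot, indent=2))
```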
Relevant examples of information systems for experiment publishing are
[*protocols.io*](https://www.protocols.io),
[*ArrayExpress*](https://www.ebi.ac.uk/arrayexpress/) and
[*myExperiment*](https://www.myexperiment.org) (see table 5 for a summary).
*Protocols.io*
protocols.io is an open access repository of scientific protocols where end-users can deposit, find, comment on and modify protocols describing an experiment (wetlab, computational, or mixed). The idea behind protocols.io is similar to that of registered reports: the methodology and protocol of an experiment are separated from the description of the results. Protocols deposited in protocols.io can be considered a form of structured and executable registered report that, optionally, can refer to datasets and the services needed to download and process them. The main difference between registered reports and protocols.io is that the former are literature products, while in protocols.io the experiments are in a machine-readable format. For human consumption, it is possible to download the experiment in JSON or PDF format (both automatically generated on request), or to access the protocol from the web site and use the available functionality to read
and “run” the experiment. The “run” option guides the user through each
step of the protocol, making it easy to redo each step and keep track of the status of the experiment reproduction. A protocol can be private to a user or a group of users, or public. A public protocol can be accessed, commented on and re-used by any user. DOIs can be assigned upon the creator’s request to both public and private protocols (e.g. to
privately share a protocol with reviewers without making it available to
the public). Although designed for experiments in life science, the
approach of protocols.io is applicable to every scientific domain,
whenever an experiment can be described as a sequence of steps to be
performed by humans or by machines.
protocols.io has an agreement with the journal GigaScience, which mandates the publication on protocols.io of the methods described in submitted papers; nevertheless, protocols are not peer reviewed. Protocols.io curators perform basic checks to exclude pseudoscience and fake protocols. The lack of a pre-publication peer review process for the deposited protocols is motivated by the conviction that the quality of a protocol can only be verified when it is replicated by other researchers, and that the replication of an experiment is not a task that a paper’s reviewer would perform \[[*9*](#kix.2s3f26jrwtnx)\].
*ArrayExpress*
[*ArrayExpress*](https://www.ebi.ac.uk/arrayexpress/submit/overview.html)
is an archive for functional genomics data generated from microarray and
next-generation sequencing (NGS) platforms. ArrayExpress intercepted the
need of researchers to re-use existing data and evolved its model to
archive data together with detailed information about the experiments
because “*users should have everything they need for the data set to
make sense and be reproducible without referring to an associated
paper.*”[^4] Submissions are subject to guidelines that address both data and metadata. Specifically, microarray experiments must comply with the MIAME guidelines (Minimum Information About a Microarray Experiment) \[[*18*](#bojeho61unbc)\], while sequencing experiments must comply with the MINSEQE guidelines (Minimum Information about a high-throughput SEQuencing Experiment) \[[*19*](#wmarm4cgmdir)\]. In brief, the guidelines applied by ArrayExpress require the depositor to provide:
- The raw data (multiple file formats are accepted);
- The processed data (a tab delimited text file is mandatory, other formats are accepted in addition);
- A detailed description of the experiment, including its objectives and design;
- A description of the data processing protocols (e.g. normalisation or filtering methods used to generate the final data, algorithm used for alignment, summary of the instrumentation).
Specific guidelines are provided depending on the type of experiment to be deposited. Depositors are guided through these details thanks to pre-defined [*pre-submission checklists*](https://www.ebi.ac.uk/arrayexpress/help/pre-submission_checklist.html) and a dedicated submission tool called Annotare, which automatically checks the compliance of the submission with the guidelines, flagging inconsistencies and missing mandatory metadata or files.
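The flavour of such automatic checks can be conveyed with the toy validator below; the required fields are a simplified, invented subset and do not reproduce the actual MIAME/MINSEQE checklists or the Annotare implementation.

```python
# Toy validator, loosely inspired by checklist-based submission checks.
# The required fields are illustrative only, NOT the real MIAME/MINSEQE lists.
REQUIRED_FIELDS = {
    "experiment_description",   # objectives and design
    "raw_data_files",
    "processed_data_files",
    "processing_protocol",      # e.g. normalisation or filtering methods
}


def check_submission(submission: dict) -> list[str]:
    """Return a list of problems found in a submission (empty list = passes)."""
    problems = [f"missing mandatory field: {field}"
                for field in sorted(REQUIRED_FIELDS - submission.keys())]
    # Simple consistency check: every referenced raw data file must actually
    # appear among the uploaded files.
    uploaded = set(submission.get("uploaded_files", []))
    for name in submission.get("raw_data_files", []):
        if name not in uploaded:
            problems.append(f"raw data file not uploaded: {name}")
    return problems


if __name__ == "__main__":
    draft = {"experiment_description": "RNA-seq of sample X",
             "raw_data_files": ["run1.fastq.gz"],
             "uploaded_files": []}
    for issue in check_submission(draft):
        print(issue)
```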
The ArrayExpress curation team does not perform scientific peer review of a deposited experiment, but further checks the quality of the metadata and the format of the submitted data files in order to ensure high quality and promote the reproducibility of functional genomics experiments \[[*20*](#rxa2tvwbk6)\].
*myExperiment* [*myExperiment*](https://www.myexperiment.org) is a repository of so-called “Research Objects”, digital objects that
aggregate resources related to a scientific experiment or a research
investigation \[[*21*](#6umjfebkj9n2), [*22*](#tmxrsvggkzan)\].
Resources include publications, bibliographic metadata, the data used
and produced by an experiment (or a link to them), methods applied to
produce or analyse the data. Named relationships can link resources
belonging to the same Research Object to semantically describe the
connections among resources. The Research Object model is a generic
model that research communities can configure to match their needs and
requirements via the concept of *profile*. A profile defines the shape
and form of a domain- or application-specific research object in terms
of types of resources, formats of metadata and a minimal information
model (MIM) checklist that formally specifies its requirements and enables automatic validation of a research object. Beyond this automatic
validation, myExperiment does not perform any other technical assessment
of the deposited objects.
Different communities have adopted myExperiment and the research object model to share experiments. Examples are the ISA (Investigation-Study-Assay) Research Object Bundle for the systems biology community ([*FAIRDOMHub*](https://fairdomhub.org/)) and the Workflow-centric research object, which represents workflows and is used by the biodiversity science community of [*BioVeL*](https://www.biovel.eu/) and by users of the [*Taverna workbench*](https://taverna.incubator.apache.org/).
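To give a flavour of what such an aggregating object looks like, the sketch below assembles a simplified research object “manifest” as a Python dictionary. The structure (an aggregates list plus named-relationship annotations) is loosely inspired by the Research Object model and is not a faithful rendering of the official specification or of a myExperiment profile; all resource names are invented.

```python
# Simplified, illustrative research object manifest: an aggregation of the
# resources of an experiment plus named relationships between them.
# Loosely inspired by the Research Object idea; not the official specification.
import json

research_object = {
    "id": "ro-example-001",
    "title": "Forward modelling experiment",      # placeholder title
    "aggregates": [
        {"uri": "paper.pdf", "type": "literature"},
        {"uri": "observations.csv", "type": "dataset"},
        {"uri": "model.R", "type": "computational-product"},
        {"uri": "workflow.t2flow", "type": "workflow"},
    ],
    "annotations": [
        # Named relationships linking resources of the same research object.
        {"subject": "model.R", "relation": "consumes", "object": "observations.csv"},
        {"subject": "paper.pdf", "relation": "describes", "object": "workflow.t2flow"},
    ],
}

print(json.dumps(research_object, indent=2))
```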
**Table 5. Machine reproducibility and peer review of experiments**
![Table 5. Machine reproducibility and peer review of experiments][14]
![Figure 10. A digital object containing all the products used and generated during the research flow is published. The goal is to make an experiment of the research flow reproducible by others.][15]
*Figure 10. A digital object containing all the products used and generated during the research flow is published. The goal is to make an experiment of the research flow reproducible by others.*
### Sharing and interpretation of Research Flow products
Executing experiments and computational workflows is an important part
of the research flow and their publishing is a crucial step toward the
Open Science paradigm (see Figure 10). However, sharing inputs, outputs
and processes is often not enough for a human to understand the
experiment and its value. In fact, information about the investigation
in which the experiment is conducted is necessary in order to add
context to the experiment and to make the interpretation easier.
Guidelines like MIAME \[[*18*](#bojeho61unbc)\], MINSEQE
\[[*19*](#wmarm4cgmdir)\] and the ISA framework
\[[*24*](#ojhnd0e0q8cn)\] adopted by ArrayExpress and FAIRDOMHub, take
this aspect into consideration and, when used with their full potential,
may help at describing the whole research flow, from its inception and
design to its final results. protocols.io goes in the same direction,
suggesting the creators of protocols to add as much information as
possible in the proper sections of the protocol submission page
\[[*23*](#unp3wmbkbw19)\]. However, these approaches fail at fully
supporting research flow peer review for the following reasons:
- They model an individual experiment rather than an arbitrary sequence of them;
- They generally do not model the scientific method, but focus on the actual sequences of steps forming an experiment;
- Their focus is on repeatability/reproducibility of an experiment rather than on peer review of all products and of the research flow (research methodology);
- They do not distinguish between successful and unsuccessful experiments.
### Remarks
Peer reviewing the whole research flow is certainly the most complete
conception of evaluation and assessment of science. Its modeling
includes the scientific method, which corresponds to how science is
performed (the structuring of scientific thinking), and the experiments,
which model how science was actually carried out in terms of steps, digital laboratory assets, and generated research products. As
clarified by the analysis of the state of the art, existing approaches
nicely solve some of these issues, but none of them tackles the general
problem. Existing solutions have reproducibility of science as their
main objective, rather than research flow peer review, hence they focus
on the executable representation of digital objects encoding successful
experiments. Such objects express the logic of an experiment but do not describe the overall research flow of which they are a final step; moreover, such objects do not describe the scientific method underlying the research flow. For example, if a scientist adopts a research flow devised as a cycle of experiments, refining their inputs and outputs until a certain success threshold is reached, then with the approaches described above the scientist will publish only the last execution of the experiment, together with the research products required to reproduce it. Overall, these observations lead to the following considerations.
*Ongoing research flow peer review.* In contrast with traditional peer
review models, which assess scientific results only once the research
activity has been successful, peer review could/should also be applied
during the ongoing research flow, as a sort of monitoring and interim
evaluation process. Ongoing research flow peer review would also
increase the possibility for a researcher to demonstrate the validity
and trustworthiness of the research being carried out and its
(intermediate) results.
*Negative results.* By sharing intermediate research flow experiments
and steps a researcher would also open up the “publication of negative
results”. This practice could have a twofold positive effect: on the one hand, the researcher might receive comments and advice from colleagues; on the other hand, she would help the community avoid the same “mistakes” \[[*25*](#wjy0zgxs0pam)\].
*Machine-assisted vs. human-assisted review.* Today, there is no formal
(and technologically supported) distinction between which steps of an
experiment (and of the research flow) should be peer-reviewed by humans,
e.g. novelty and impact of a research flow and its final products, and
what could be reviewed by machines, e.g. conformance to given structural
and semantic requirements of data and software (e.g.
\[[*26*](#5fxg1rwf3ll7)\]). It would be desirable to have tools for
“machine-assisted peer-review”, built on the very same digital
laboratory assets that generated the research products, e.g. to verify research product conformance with given domain requirements and standards. Although humans would still play a central role in the peer-review process, such tools would support reviewers in facing challenges beyond their capabilities (e.g. checking the quality of each record in a database).
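A minimal example of such machine assistance is sketched below: a record-by-record conformance check of a deposited dataset against community-defined requirements. The field names, value ranges and CSV layout are invented for illustration.

```python
# Illustrative machine-assisted check: validate every record of a deposited
# dataset against community-defined requirements. Field names, ranges and the
# CSV layout are invented for the sake of the example.
import csv

REQUIREMENTS = {
    "temperature_c": lambda v: -90.0 <= float(v) <= 60.0,   # plausible range
    "salinity_psu": lambda v: 0.0 <= float(v) <= 45.0,
    "timestamp": lambda v: len(v) >= 10,                    # crude ISO-date check
}


def review_records(path: str) -> list[str]:
    """Return a report line for every record that violates a requirement."""
    report = []
    with open(path, newline="") as handle:
        for row_number, record in enumerate(csv.DictReader(handle), start=1):
            for field, is_valid in REQUIREMENTS.items():
                value = record.get(field, "")
                try:
                    ok = is_valid(value)
                except ValueError:
                    ok = False
                if not ok:
                    report.append(f"record {row_number}: invalid {field}={value!r}")
    return report


if __name__ == "__main__":
    for line in review_records("observations.csv"):   # hypothetical dataset file
        print(line)
```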
*Scientific method review.* In order to achieve an omni-comprehensive
review of the research flow, reviewers would benefit from viewing a
description of the underlying scientific process. This approach is the one underlying the registered reports proposed by the Centre for Open Science \[[*11*](#gi55cn6w3gq)\], which, however, is concerned with human peer review of literature products. None of the existing approaches aims at a peer review approach driven by a digital representation of the scientific process (a specific type of methodology product), where the research flow is intended as a peer-reviewable instance of such a process.
Towards research flow peer review
=================================
As summarized above, the implementation of a fully-fledged research flow
peer review methodology has requirements (tools and practices) that
differ from those identified in Open Science for reproducibility.
Reproducibility of science and its underlying principles are indeed
crucial to support transparent peer review, but existing practices are
not enough to fully address research flow peer review. In order to support this kind of peer review, reviewers should evaluate science by
means of a user-friendly environment which transparently relies on the
underlying digital laboratory assets, hides their ICT complexity, and
gives guarantees of repeatability and reproducibility recognized by the
community.
In this section we sketch some ideas toward the definition of a framework for the representation of research flow peer review for a given discipline of science. Such a framework may become the scaffolding on top of which to develop tools supporting ongoing peer review of research flows by “hooking” in real time into the underlying digital laboratory, where scientists are carrying out their research flow. Such tools would abstract over the complexity of the research activity and offer user-friendly dashboards to examine the scientific process adopted, explore the ongoing research flow, and evaluate its intermediate experiments and their products. In a less advanced implementation, such tools may provide the scientific process and research flow to reviewers once the research activity has been completed, inclusive of all intermediate experiments, steps and research products.
To this aim, the framework should be built around the notion of *research flow review templates*. These are representations of scientific processes in terms of patterns (sequences and cycles) of experiments and associated steps to be peer reviewed; note that such templates should include all and only the experiments, steps, and associated “signatures” for which peer review is required and supported. In other words, a research flow template is not intended to describe the detailed experiments and steps of a research activity but to model which subset of these is relevant to assess the quality of the corresponding research flows.
For example, consider the scientific process in Figure 11, which models
one experiment repeatedly executed until the research activity is
successful. At every round, the experiment designs (1) and collects (3)
input data, instruments the digital laboratory with processing
algorithms (2) and performs the analysis (4) to produce output data.
Finally, it publishes (5) all such products. We may then assume that the only review checkpoint is the “publication” step (5), where input data and digital laboratory assets are made available. The corresponding research flow template would model the very same cycle and consist of one experiment including a single peer review step, the publication step mentioned above. Ongoing peer review tools would allow reviewers to select a given execution of the experiment in time, explore and assess its input and output data, and re-execute the given step using the associated products. Of course, such tools should be equipped with functionalities to provide feedback and evaluation.
![Figure 11. Research lifecycle adopted from the research process model by Kraker & Lindstaedt (Source: OpenUP WP4 team, deliverable D4.1)][16]
*Figure 11. Research lifecycle adopted from the research process model by Kraker & Lindstaedt (Source: OpenUP WP4 team, deliverable D4.1)*
Several sciences are making use of shared scientific process patterns,
for example in clinical trials,[^5] where sample-based experiments are
structured and documented according to established protocols in order
for other scientists to transparently understand the thinking underlying
given findings. In order for a community to provide a specification of
the peer-reviewable part of its research flow and therefore build
adequate tools in support of reviewers, a simple formal framework
capable of describing the structure of a community’s research flow review template(s) should be available. Each template should reflect one particular way of performing science, capturing the steps that should be subject to peer review (each community may define more than one template). At the same time, templates require researchers to comply with certain expectations when producing science. Templates express
common behaviour, determine good practices, enable reproducibility and
transparent evaluation of science. To make an analogy, the structure of
a template should reflect the structure of a recipe for cooking. It
should specify a list of all the (types of) products needed from the
digital laboratory at each step of the research flow (the ingredients)
and should mandate a detailed, machine-actionable description of all
the steps to be executed (the mixing and the cooking) in order to
reproduce the research results (the cake).
Such a research flow review framework should encompass the following concepts (see Figure 12); a minimal sketch of these concepts is given after the list:
- *Research flow template*: the model of research flow to be followed by scientists in terms of experiments, including cycles, conditions, etc. to be peer reviewed;
- *Step and experiment signatures*, intended as:
- *Result products*: the classes of products required and returned by such steps (datasets, literature, computational products);
- *Assets of the digital laboratory* (which are not necessarily research products) required for the execution of the step or of the experiment;
- *Methodology products*: the classes of products encoding experiments;
- *Different classes of literature*: ranging from documentation to descriptions of scientific methods.
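As anticipated above, the following sketch gives a minimal, illustrative rendering of these concepts: a research flow template listing the steps subject to peer review together with their signatures, and a check of a research flow instance against it. The class names and fields are our own illustration of Figure 12 (instantiated on the cycle of Figure 11), not an existing formalism.

```python
# Illustrative model of a research flow template (not an existing formalism):
# the template lists the steps subject to peer review and, for each of them,
# the classes of products and laboratory assets expected (its "signature").
from dataclasses import dataclass, field


@dataclass
class StepSignature:
    name: str
    required_assets: set[str] = field(default_factory=set)   # digital laboratory assets
    result_products: set[str] = field(default_factory=set)   # e.g. {"dataset", "literature"}
    peer_reviewed: bool = False                               # is this a review checkpoint?


@dataclass
class ResearchFlowTemplate:
    discipline: str
    steps: list[StepSignature]

    def review_checkpoints(self) -> list[str]:
        return [s.name for s in self.steps if s.peer_reviewed]

    def validate_instance(self, instance: dict) -> list[str]:
        """Check that an executed flow provides the products each reviewed step requires."""
        problems = []
        for step in self.steps:
            if not step.peer_reviewed:
                continue
            provided = set(instance.get(step.name, []))
            missing = step.result_products - provided
            if missing:
                problems.append(f"step '{step.name}': missing products {sorted(missing)}")
        return problems


# The cycle of Figure 11, with a single review checkpoint at the "publish" step.
template = ResearchFlowTemplate(
    discipline="geothermal energy science",
    steps=[
        StepSignature("design"),
        StepSignature("instrument laboratory", required_assets={"processing-algorithm"}),
        StepSignature("collect data"),
        StepSignature("analyse", result_products={"dataset"}),
        StepSignature("publish",
                      result_products={"dataset", "literature", "computational-product"},
                      peer_reviewed=True),
    ],
)

executed_flow = {"publish": ["dataset", "literature"]}   # hypothetical flow instance
print(template.review_checkpoints())                     # ['publish']
print(template.validate_instance(executed_flow))         # reports the missing computational product
```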
Sharing a framework of this kind enables the realization of *research publishing tools* and *review tools* that allow scientists to produce products as expected and other scientists to access such products, for
reuse, reproducibility, and review. As mentioned above, to be effective
and used in practice, such tools should be:
- *Integrated with the digital laboratory assets used to perform science*: scientists should focus on doing their science rather than on publishing it; the creation of research products and methodology products, together with the tracking of the history of the actual research flow, should be delegated as much as possible to machines; digital laboratory assets therefore require research publishing tools (e.g. wrappers, mediators) that complement the experiment functionality they support with functionality for packaging and publishing the corresponding products, so that review tools can build on them (a minimal wrapper sketch is given after this list);
- *Easy to use*: user-friendly enough for scientists to access machine-assisted review tools without development skills; reviewers should be able to view the actual research flow and its current stage of development, and to apply machine-assisted validation from end-user interfaces;
- *Trustworthy*: ease of use should come with guarantees of fairness, typically endorsed by the community adopting the research publishing and review tools.
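As an example of the first property, the sketch below shows a hypothetical wrapper that complements an existing analysis function with packaging and publishing functionality, so that products are produced as a side effect of doing the science. The manifest format, file layout, and names (`publishing_wrapper`, `analyse_rock_sample`) are assumptions made purely for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

def publishing_wrapper(step_name: str, run_step: Callable[..., dict],
                       archive_dir: str = "published_products") -> Callable[..., dict]:
    """Wrap an existing analysis function so that every execution is automatically
    packaged (inputs, outputs, timestamp) as a product that review tools can read.
    The JSON manifest used here is purely illustrative, not an established format."""
    def wrapped(**inputs) -> dict:
        outputs = run_step(**inputs)     # the original experiment functionality
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        manifest = {"step": step_name, "executed_at": stamp,
                    "inputs": inputs, "outputs": outputs}
        target = Path(archive_dir)
        target.mkdir(parents=True, exist_ok=True)
        (target / f"{step_name}-{stamp}.json").write_text(
            json.dumps(manifest, indent=2, default=str))
        return outputs
    return wrapped

# Usage: the scientist calls the analysis exactly as before,
# while the publishable manifest is produced as a side effect.
# analyse = publishing_wrapper("chemical-analysis", analyse_rock_sample)
# results = analyse(sample_id="S-042", method="XRF")
```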
Implementing this vision raises serious challenges, as it requires not only the endorsement of communities but also top-down cultural convergence (i.e. rigorous behaviour in performing science). Most importantly, it requires the realization and maintenance of publishing and review tools whose costs do not easily find a sponsor in communities that are typically formed by scientists rather than institutions.
![Figure 12. Research flow templates concepts][17]
*Figure 12. Research flow templates concepts*
Use cases
=======
The research flow review framework supports the definition of research flow templates to model common discipline-specific patterns (best practices for a given discipline), focusing on the steps of the research flow that should be subject to peer review. In the following, experimental research flow templates for two use cases are presented: one in geothermal energy science and one in archeology, both described in detail below. For each use case we instantiate a research flow template reflecting the current practices in the field; by analysing the resulting research flow instance we identify gaps and possible enhancements to peer review practices, leading to a further research flow template that better suits Open Science principles and supports ongoing peer review.
## Use case on geothermal energy science ##
Geothermal energy science is the scientific discipline studying energy generated and stored in the Earth. Geothermal research includes on-site activities for subsoil data measurements (in some cases already collected data is re-used) and digital activities for data curation and analysis. Typical research activities in this field are chemical analysis of rock samples and the geologic and electromagnetic modelling of the subsoil with a technique called “forward modelling”.
### Chemical analysis of rock samples ###
The research activity can be synthesized in three main steps:
1. Collect the rock sample to analyse;
2. Perform laboratory analyses on the rock sample (the types of analysis vary based on the hypothesis to be confirmed);
3. Publish an article where the hypothesis and the results of the analysis are presented and discussed. A subset of the analysis data is usually available as tables in the article (i.e. it is not published in a data repository).
@[osf](gb3ez)
*Figure 13. Research flow for the use case on chemical analysis of rock samples*
Given this scenario, the corresponding research flow template for peer review would include only one experiment with a single peer review step, namely the publication of the article embedding the analysis data (figure 13). The scenario could be enhanced to embrace Open Science practices by pushing researchers to openly publish the full results of the analysis. The analysis data would then be fully available to peer reviewers and readers, instead of being only partially available as hardly re-usable tables embedded in the full text of the article. This enhanced scenario is depicted in figure 14; an illustrative instantiation of the enhanced template is sketched after the figure.
@[osf](j7etk)
*Figure 14. Research flow for the use case on chemical analysis of rock samples: analysis data completely available as separate but linked research products*
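Reusing the hypothetical data structures sketched earlier, the enhanced template of figure 14 could be instantiated roughly as follows. The step names, required assets, and product classes are illustrative assumptions, not a prescribed encoding.

```python
# Assumes the ResearchFlowTemplate, ExperimentSignature, StepSignature
# and ProductClass sketches defined earlier in this document.
rock_sample_template = ResearchFlowTemplate(
    community="geothermal energy science",
    name="chemical analysis of rock samples (Open Science variant)",
    experiments=[
        ExperimentSignature(
            name="rock sample analysis",
            steps=[
                StepSignature(
                    name="collect rock sample",
                    required_assets=["sampling protocol"],
                    input_products=[],
                    result_products=[ProductClass.DATASET],   # sample metadata
                ),
                StepSignature(
                    name="laboratory analysis",
                    required_assets=["laboratory instruments", "analysis software"],
                    input_products=[ProductClass.DATASET],
                    result_products=[ProductClass.DATASET],
                    peer_reviewed=True,   # full analysis data published and reviewable
                ),
                StepSignature(
                    name="publish article",
                    required_assets=["publishing platform"],
                    input_products=[ProductClass.DATASET],
                    result_products=[ProductClass.LITERATURE],
                    peer_reviewed=True,   # article linked to the published dataset
                ),
            ],
        )
    ],
)
```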
### Forward modelling of subsoil ###
The research activity can be synthesized as shown in figure 15:
1. Find data: subsoil data about a location can be collected on-site with specific instruments or already existing data can be re-used, when available.
2. Modelling
a) Data is imported in the modelling software tools (e.g. [Comsol multiphysics][18], [3D GeoModeller][19])
b) The researcher uses the software to select, configure and apply canonical equations for the generation of the predictive model
c) The generated model is manually verified by the researcher. If it does not fit the available data, the researcher fine-tunes the parameters of the equations to generate a model that fits better. If the model fits the data, it is used to answer the research question.
d) The researcher publishes an article that includes the model, part of the input data and the interpretation of the model.
@[osf](pqzbm)
*Figure 15. Forward modelling of subsoil*
As in the previous case, current practice is to describe the research flow in the article, often based on paper templates provided by publishers, and the article is the only product subject to review. As a consequence, even though the research activity is more complex, in terms of steps, than the lab analysis of rock samples, the template for research flow peer review resembles the one in figure 13, where the only peer review step is the publication of the article.
A possible research flow template for the community of researchers in geothermal science could include the publishing of the raw data and of all the experiments run to generate the final predictive model (figure 16). By using this template, researchers share not only the raw data and the final predictive model, but also the models that were discarded and the corresponding equation configurations, i.e. negative results (a sketch of how such modelling rounds could be tracked is given after the figure).
@[osf](gje3t)
*Figure 16. Research flow for the use case on forward modelling of subsoil: raw data, experiments and models are available, including those that produced negative results*
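The sketch below illustrates, under our own assumptions, how the fine-tuning cycle of figure 15 could be tracked so that every modelling round, including discarded models, becomes a candidate product for review in the spirit of figure 16. The callables `generate_model`, `verify` and `propose_parameters` are hypothetical stand-ins for the modelling software and the researcher's judgement.

```python
from dataclasses import dataclass

@dataclass
class ModellingRound:
    parameters: dict      # the equation configuration used in this round
    model: object         # the generated predictive model
    fits_data: bool       # the researcher's manual verification

def forward_modelling(raw_data, generate_model, verify, propose_parameters,
                      max_rounds: int = 20) -> list:
    """Run the fine-tuning cycle and keep *every* round, including discarded
    models, so that negative results can be published and peer reviewed."""
    rounds = []
    parameters = propose_parameters(raw_data, previous=None)
    for _ in range(max_rounds):
        model = generate_model(raw_data, parameters)   # e.g. via the modelling software
        accepted = verify(model, raw_data)             # manual check by the researcher
        rounds.append(ModellingRound(parameters, model, accepted))
        if accepted:
            break
        parameters = propose_parameters(raw_data, previous=rounds[-1])
    return rounds   # all rounds are candidate research products, not only the last one
```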
## Use case on archeology ##
Archeology is the study of ancient cultures through examination of the artifacts (objects and/or buildings) found above or, more often, under the ground. A typical research flow in archeology is depicted in figure 17 and is composed of the following steps:
1. Preliminary studies to identify a place of interest. Studies include the analysis of indirect and direct sources. Examples of indirect sources are texts written by ancient geographers; direct sources include street epigraphs and artifacts bearing information about a place, such as amphorae with stamps or coins.
2. Preliminary studies typically help the archeologist formulate a hypothesis on the area to search, regarding its geographical location and its societal and cultural role in a specific historical period.
3. The archeologist examines aerial/satellite photos of the area and looks for traces
4. The archeologist visits the area and looks for topographic hints to circumscribe it
5. Spot corings are performed and, if positive, the area is prepared for excavation
6. During the excavation, the archeologist produces different types of documentation, often referred to as “raw data”:
- The Harris matrix for stratigraphic squaring, which tracks what has been found and where
- Excavation diary: a daily log kept by the excavators on site during each season, recording the day-to-day activity of the team and their observations on their work
- Matrix map (GIS)
- Photogrammetry: photos taken during excavations that can be used to generate 3D models
7. At the end of the excavation activity, an excavation report is produced. It contains a summary of the excavation diary and the hypothesis of the archeologist based on the raw data. The raw data are not published in full; only the part that was actually used for the interpretation is included.
@[osf](rtjyc)
*Figure 17. Use case on archeology*
The excavation report represents the main scholarly communication output of excavation activities and includes the interpretation of the archeologist. However, the report does not contain or reference the full raw data, so peer reviewers can only check that the authors’ conclusions are consistent with the subset of information that the authors decided to provide in the report.
A peer review research flow template that reflects this current practice is the same as that in Figure 13: the only step that produces outputs for peer-review is the final publishing step.
Open publishing of raw data is still far from being common practice, although some countries are issuing national mandates for open excavation reports and international initiatives like ARIADNE are pushing for open access to all raw data (excavation diaries and reports, Harris matrices, GIS maps, 3D models).
A research flow template supporting the Open Science paradigm for archeology is shown in figure 18. Preliminary studies and the initial hypothesis can be shared by providing a collection of direct and indirect sources, including, for example, the studied texts and on-site photos. The raw data produced during the excavation is also made available as a set of separate products, possibly linked to each other (a sketch of such linked products is given after the figure). Another option (not shown in the figure) is to publish all raw data as a single research object. The final step is the publication of the excavation report, which should link to the other objects produced during the investigation.
@[osf](gcvja)
*Figure 18. Research flow for the use case on archeology*
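A minimal sketch of the “separate but linked products” option is given below. The `LinkedProduct` structure and the placeholder identifiers are hypothetical; they only illustrate how raw data items and the excavation report could reference each other.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class LinkedProduct:
    """One excavation product, published separately but linked to related ones."""
    identifier: str                                  # e.g. a DOI or repository handle (placeholders below)
    kind: str                                        # "harris-matrix", "excavation-diary", "gis-map", ...
    links: list[str] = field(default_factory=list)   # identifiers of related products

harris_matrix = LinkedProduct("doi:10.xxxx/harris-matrix", "harris-matrix")
diary = LinkedProduct("doi:10.xxxx/excavation-diary", "excavation-diary",
                      links=[harris_matrix.identifier])
gis_map = LinkedProduct("doi:10.xxxx/gis-map", "gis-map",
                        links=[harris_matrix.identifier])
models_3d = LinkedProduct("doi:10.xxxx/3d-models", "photogrammetry-3d-models",
                          links=[gis_map.identifier])

# The excavation report links back to all the raw data products it interprets.
report = LinkedProduct("doi:10.xxxx/excavation-report", "excavation-report",
                       links=[p.identifier for p in (harris_matrix, diary, gis_map, models_3d)])
```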
# Conclusion #
The Open Science paradigm calls for the availability, findability and accessibility of all the products generated by a research activity. This practice is a prerequisite for reaching two of the main goals of the Open Science movement: reproducibility and the transparent assessment of research activities. In this deliverable we have described the current practices for the peer review of research flows, which range from traditional peer review of scientific literature to peer review by reproducibility of digital experiments. We have argued that current practices have the reproducibility of science as their main objective and do not fully address transparent assessment and its facets, such as publishing negative results, supporting peer review while the research activity is ongoing, and enabling machine-assisted peer review.
Foundations of a framework for the peer review of research flows have been presented. The goal of the framework is to serve as the bridge between the place where research is conducted (i.e., the digital laboratory) and the place where research is published (or, in general, made available and accessible). The framework aims at providing the scaffolding on top of which reviewers can evaluate science by means of a user-friendly environment that transparently relies on the underlying digital laboratory assets, hides their ICT complexity, and gives guarantees of repeatability and reproducibility recognized by the community. One of the building blocks of the framework is the notion of research flow template, through which a community can model the research flow to be followed by scientists in terms of experiments, including cycles, conditions, etc., to be peer reviewed. The framework allows communities to define one or more research flow templates, each capturing the steps that should be subject to peer review for a specific type of research activity. Templates are not only useful to peers willing to evaluate a research activity, but also push researchers to comply with the expectations of their community, such as best practices and common behaviour.
The framework is in principle applicable to any field of research adopting digital objects and/or producing digital research outputs. A detailed analysis of the applicability of the framework is ongoing. Specifically, the fields of geothermal energy science and archeology have been considered as representatives of not-fully-digital disciplines, which may pose challenges from the modelling point of view, since not all research assets and products may be available in a digital laboratory.
References
==========
1. European Commission (2015). Validation of the results of the public consultation on Science 2.0: Science in Transition \[report\]. Brussels: European Commission, Directorate-General for Research and Innovation. Available at: [*http://ec.europa.eu/research/consultations/science-2.0/science\_2\_0\_final\_report.pdf*](http://ec.europa.eu/research/consultations/science-2.0/science_2_0_final_report.pdf).
2. European Commission's Directorate-General for Research & Innovation (RTD) (2016). Open Innovation, Open Science and Open to the World. Available at: [*https://ec.europa.eu/digital-single-market/en/news/open-innovation-open-science-open-world-vision-europe*](https://ec.europa.eu/digital-single-market/en/news/open-innovation-open-science-open-world-vision-europe).
3. FOSTER. Open Science Definition: [*https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition*](https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition) (last accessed 19 May 2017).
4. De Roure, D. (2009) Replacing the Paper: The Twelve Rs of the e-Research Record. Open Wetware blog: [*http://blog.openwetware.org/deroure/?p=56*](http://blog.openwetware.org/deroure/?p=56) (last accessed 19 May 2017).
5. Bechhofer, S. et al. (2013). Why linked data is not enough for scientists. Future Generation Computer Systems, Volume 29, Issue 2, February 2013, Pages 599-611, ISSN 0167-739X, [*https://doi.org/10.1016/j.future.2011.08.004*](https://doi.org/10.1016/j.future.2011.08.004) ([*http://www.sciencedirect.com/science/article/pii/S0167739X11001439*](http://www.sciencedirect.com/science/article/pii/S0167739X11001439)).
6. Borgman, C. L. (2015). *Big data, little data, no data: scholarship in the networked world*. MIT press.
7. Stančiauskas, V., and Banelytė, V. (2017). OpenUP survey on researchers' current perceptions and practices in peer review, impact measurement and dissemination of research results \[Data set\]. Zenodo. [*http://doi.org/10.5281/zenodo.556157*](http://doi.org/10.5281/zenodo.556157).
8. Smagorinsky, P. (2008). The method section as conceptual epicenter in constructing social science research reports. Written Communication, 25, 389-411. [*http://journals.sagepub.com/doi/pdf/10.1177/0741088308317815*](http://journals.sagepub.com/doi/pdf/10.1177/0741088308317815).
9. Teytelman, L. (2016), We've been itching to share this! Integration of GigaScience and protocols.io is an example of how science publishing should work. Protocols.io news: [*https://www.protocols.io/groups/protocolsio/news/weve-been-itching-to-share-this-integration-of-gigascience*](https://www.protocols.io/groups/protocolsio/news/weve-been-itching-to-share-this-integration-of-gigascience) (last accessed 19 May 2017).
10. Cotos E., Huffman S., and Link S. (2017). A move/step model for methods sections: Demonstrating Rigour and Credibility, English for Specific Purposes, Volume 46, April 2017, Pages 90-106, ISSN 0889-4906, [*https://doi.org/10.1016/j.esp.2017.01.001*](https://doi.org/10.1016/j.esp.2017.01.001).
11. Center for Open Science. Registered Reports: Peer review before results are known to align scientific values and practices. [*https://cos.io/rr/*](https://cos.io/rr/) (last accessed 19 May 2017).
12. FORCE11 (2014). Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing version b1.0. [*https://www.force11.org/fairprinciples*](https://www.force11.org/fairprinciples) (last accessed 19 May 2017).
13. Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg et al. (2016) "The FAIR Guiding Principles for scientific data management and stewardship." *Scientific data* 3 (2016).
14. Assante, M., Candela, L., Castelli, D. and Tani, A. (2016). Are Scientific Data Repositories Coping with Research Data Publishing?. *Data Science Journal*, *15*, 6. DOI: [*http://doi.org/10.5334/dsj-2016-006*](http://doi.org/10.5334/dsj-2016-006).
15. Mayernik, M.S., S. Callaghan, R. Leigh, J. Tedds, and S. Worley (2015). [*Peer Review of Datasets: When, Why, and How.*](http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-13-00083.1) *Bull. Amer. Meteor. Soc.,* 96, 191–201, doi: 10.1175/BAMS-D-13-00083.1.
16. Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015), Data journals: A survey. J Assn Inf Sci Tec, 66: 1747–1762. doi:[*10.1002/asi.23358*](http://doi.org/10.1002/asi.23358).
17. Carpenter, T. A. (2017). What Constitutes Peer Review of Data: A survey of published peer review guidelines. *arXiv preprint arXiv:1704.02236*. [*https://arxiv.org/pdf/1704.02236.pdf*](https://arxiv.org/pdf/1704.02236.pdf) (last accessed 19 May 2017).
18. MIAME guidelines (Minimum Information About a Microarray Experiment): [*http://fged.org/projects/miame/*](http://fged.org/projects/miame/).
19. MINSEQE guidelines (Minimum Information about a high-throughput SEQuencing Experiment): [*http://fged.org/projects/minseqe/*](http://fged.org/projects/minseqe/).
20. Tang, A. (2017). ArrayExpress at EMBL-EBI quality first! Repositive blog: [*https://blog.repositive.io/arrayexpress-at-embl-ebi-quality-first/*](https://blog.repositive.io/arrayexpress-at-embl-ebi-quality-first/) (last accessed 19 May 2017).
21. De Roure, D, Goble, C., and Stevens R. (2009). The design and realisation of the myExperiment Virtual Research Environment for social sharing of workflows. *Future Gener. Comput. Syst.* 25, 5 (May 2009), 561-567. [*http://dx.doi.org/10.1016/j.future.2008.06.010*](http://dx.doi.org/10.1016/j.future.2008.06.010).
22. Bechhofer, S., De Roure, D., Gamble, M., Goble, C., & Buchan, I. (2010). Research objects: Towards exchange and reuse of digital knowledge. *The Future of the Web for Collaborative Science*, *10*.
23. protocols.io team (2017). How to make your protocol more reproducible, discoverable, and user-friendly. Protocols.io. [*http://dx.doi.org/10.17504/protocols.io.g7vbzn6*](http://dx.doi.org/10.17504/protocols.io.g7vbzn6).
24. Sansone, S. A., et al. (2012). Toward interoperable bioscience data. Nature genetics, 44(2), 121-126. doi:[*10.1038/ng.1054*](http://dx.doi.org/10.1038/ng.1054).
25. Di Leo, A., E. Risi, and L. Biganzoli. No pain, no gain… What we can learn from a trial reporting negative results. Annals of Oncology 28.4 (2017): 678-680.
26. Shanahan, D. (2016). A peerless review? Automating methodological and statistical review. BioMed Central blog: [*https://blogs.biomedcentral.com/bmcblog/2016/05/23/peerless-review-automating-methodological-statistical-review/*](https://blogs.biomedcentral.com/bmcblog/2016/05/23/peerless-review-automating-methodological-statistical-review/).
[1]: https://mfr.osf.io/export?url=https://osf.io/59tzu/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[2]: https://mfr.osf.io/export?url=https://osf.io/mb3vq/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[3]: https://mfr.osf.io/export?url=https://osf.io/drsta/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[4]: https://mfr.osf.io/export?url=https://osf.io/xacz6/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[5]: https://mfr.osf.io/export?url=https://osf.io/eahn5/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[6]: https://mfr.osf.io/export?url=https://osf.io/j57mb/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[7]: https://mfr.osf.io/export?url=https://osf.io/5e7bf/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[8]: https://mfr.osf.io/export?url=https://osf.io/9vx5k/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[9]: https://mfr.osf.io/export?url=https://osf.io/qrcw8/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[10]: https://mfr.osf.io/export?url=https://osf.io/5tpgh/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[11]: https://mfr.osf.io/export?url=https://osf.io/wx7gf/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[12]: https://mfr.osf.io/export?url=https://osf.io/24x7e/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[13]: https://mfr.osf.io/export?url=https://osf.io/jbpq9/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[14]: https://mfr.osf.io/export?url=https://osf.io/4rpqe/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[15]: https://mfr.osf.io/export?url=https://osf.io/b5ck4/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[16]: https://mfr.osf.io/export?url=https://osf.io/xfvpc/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[17]: https://mfr.osf.io/export?url=https://osf.io/cepn2/?action=download&direct&mode=render&initialWidth=848&childId=mfrIframe&format=1200x1200.jpeg
[18]: https://www.comsol.com/
[19]: http://www.geomodeller.com/