Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation

The use of computers and complex software is pervasive in archaeology, yet their role in the analytical pipeline is rarely exposed for other researchers to inspect or reuse. This limits the progress of archaeology because researchers cannot easily reproduce each other’s work to verify or extend it. Four general principles of reproducible research that have emerged in other fields are presented. An archaeological case study is described that shows how each principle can be implemented using freely available software. The costs and benefits of implementing reproducible research are assessed. The primary benefit, of sharing data in particular, is increased impact via an increased number of citations. The primary cost is the additional time required to enhance reproducibility, although the exact amount is difficult to quantify.


Introduction
Archaeology, like all scientific fields, advances through rigorous tests of previously published studies. When numerous investigations are performed by different researchers and demonstrate similar results, we hold these results to be a reasonable approximation of a true account of past human behavior. This ability to reproduce the results of other researchers is a core tenet of the scientific method, and when reproductions are successful, our field advances. In archaeology, we have a long tradition of empirical tests of reproducibility, for example, by returning to field sites excavated or surveyed by earlier generations of archaeologists.


General Principles of a Reproducible Methodology

Data and Code Provenance, Sharing, and Archiving
Perhaps the most trivial principle of reproducible research is making openly available the data and methods that generated the published results. This is a computational analogue to the archaeological principle of artifact provenience. Without provenience information, artifacts are nearly meaningless; without the underlying data and code, the final published results are similarly diminished. Making data and code available enables others to inspect these materials to evaluate the reliability of the publication and to incorporate the materials into other projects, and it may lead to higher-quality and more impactful published research (Gleditsch and Strand 2003; Piwowar et al. 2007; Wicherts et al. 2011). While this might seem a basic principle of reproducible research, current community norms in archaeology, as in many disciplines, do not encourage or reward the sharing of data and other materials used in the research leading to journal articles (Borgman 2012; B. McCullough 2007; Stodden et al. 2013; Tenopir et al. 2011). While funding agencies, such as the US National Science Foundation (NSF), require a data management plan (DMP) in proposals, and some journals, such as PLOS ONE and Nature, require data availability statements, none of these require all archaeologists to make their data available by default (Begley and Ioannidis 2015; Miguel et al. 2014). For archaeology submissions to the NSF, the DMP recommendations were developed by the Society for American Archaeology, rather than from within the NSF (Rieth 2013).
It is difficult to prescribe a single approach to making data and other materials openly available because of the wide variety of archaeological data and the diversity of contexts in which they are collected (Kintigh 2006). As a general principle that should be applicable in all cases, the provenance of the data must always be stated, even if the data are not publicly accessible (for example, due to copyright limitations, cultural sensitivities, protection from vandalism, or technical limitations). Where a journal article includes data summaries and visualizations, the principle is that authors make publicly available (i.e., not "by request") the computer files containing the rawest form possible of the data from which the summaries and plots were generated (e.g., spreadsheets of individual measurement records). This minimalist approach means that only the data needed to support the publication should be released; the rest can be kept private while work on it continues (Stodden 2009). Discipline-agnostic repositories include figshare.com and zenodo.org, and repositories and data sharing services specifically for archaeologists include the Archaeological Data Service, the Digital Archaeological Record, and Open Context (Arbuckle et al. 2014; Kansa et al. 2011).

Scripted Analyses
The dominant mode of interaction with data analysis tools for many researchers is a mouse-operated point-and-click interface with commercial software such as Microsoft's Excel, IBM's SPSS, and SAS's JMP (Keeling and Pavur 2007; Thompson and Burnett 2012). This method of interaction is a formidable obstacle to reproducibility because mouse gestures leave few traces that are enduring and accessible to others (Wilson et al. 2014). Ad hoc edits of the raw data and analysis can easily occur that leave no trace and interrupt the sequence of analytical steps (Sandve et al. 2013). While it is possible for a researcher to write down or even video-record their mouse-driven steps for others to reproduce, and this would be an excellent first step for sharing methods in many cases, these are cumbersome and inefficient methods for communicating many types of analyses. A second problem with much mouse-driven software is that the details of the data analysis are not available for inspection and modification because the code of the software is proprietary (Ince et al. 2012; Vihinen 2015). This constrains the transparency of research conducted with much commercial, mouse-driven software (Hatton and Roberts 1994).
While there are many conceivable methods to solve these problems (such as writing out all the operations in plain English or making a video screen-capture of the analysis), currently the most convenient and efficient solution is to interact with the data analysis tools using a script (Joppa et al. 2013). A script is a plain text file containing instructions composed in a programming language that direct a computer to accomplish a task. In a research context, researchers in fields such as physics, ecology, and biology write scripts to perform data ingest, cleaning, analysis, visualization, and reporting. Writing scripts creates a very high-resolution record of the research workflow, preserved in a plain text file that can be reused and inspected by others (Gentleman and Temple Lang 2007). Data analysis using scripts has the additional advantages of providing great flexibility to choose from a wide range of traditional and cutting-edge statistical algorithms, and tools for automating repetitive tasks. Sharing these scripts may also increase the impact of the published research (Vandewalle 2012). The general approach of a scripted workflow, which explicitly and unambiguously carries out instructions, embodies the principles of reproducibility and transparency. Examples of programming languages used for scripting scientific analyses include R, Python, and MATLAB (Bassi 2007; Eglen 2009; Perkel 2015; Tippmann 2014). Among archaeologists who share code with their publications, R is currently the most widely used programming language (Bocinsky 2014; Bocinsky and Kohler 2014; Borck et al. 2015; Contreras and Meadows 2014; Crema et al. 2014; Drake et al. 2014; Dye 2011; Guedes et al. 2015; Lowe et al. 2014; Mackay et al. 2014; Marwick 2013; Peeples and Schachner 2012; Shennan et al. 2015).
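To make this concrete, a minimal R script covering ingest, cleaning, analysis, and visualization might look like the sketch below; the file name and column names here are hypothetical:

    # ingest: read raw measurement records from a CSV file (hypothetical name)
    flakes <- read.csv("data/flake_measurements.csv")

    # clean: drop records with missing mass values
    flakes <- flakes[!is.na(flakes$mass_g), ]

    # analyze: mean flake mass by excavation layer
    aggregate(mass_g ~ layer, data = flakes, FUN = mean)

    # visualize: distribution of flake mass
    hist(flakes$mass_g, xlab = "Mass (g)", main = "Flake mass")

Re-running this one file repeats every step identically, which is exactly the enduring, inspectable trace that mouse gestures do not leave.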

Version Control
All researchers face the challenge of managing different versions of their computer files. A typical example, in the simple case of a solo researcher, is where multiple revisions of papers and datasets are saved as duplicate copies with slightly different file names (for example, appending the date to the end of the file name). In a more complex situation, with multiple researchers preparing a report or publication, managing contributions from different authors and merging their work into a master document can result in a complex proliferation of files that is very challenging to manage efficiently. While this complexity can be an inconvenience, it can lead to the more profound problem of losing track of the provenance of certain results and, in the worst cases, of the specific versions of files that produced the published results (Jones 2013).
One solution to these problems is to use a formal version control system (VCS) (Sandve et al. 2013), initially developed for managing contributions to large software projects, and now used for many other purposes where multiple people contribute to one file or a collection of files. Instead of keeping multiple copies of a file, a VCS saves each change (known as a "commit," for example, the addition of a paragraph of text or a chunk of code) to a version control database, along with a comment describing the change. The commit history preserves a high-resolution record of the development of a file or set of files. Commits function as checkpoints to which individual files or an entire project can be safely reverted when necessary. Many VCSs allow for branching, where alternate ideas can be explored in a structured and documented way without disrupting the central flow of a project. Successful explorations can be merged into the main project, while dead ends can be preserved in an orderly way (Noble 2009). This is useful in two contexts: firstly, to enable remote collaborators to work together without overwriting each other's work and, secondly, to streamline responding to questions from reviewers about why one option was chosen over another, because all the analytical pathways explored by the authors are preserved in different branches in the VCS (Ram 2013). Version control is a key principle for reproducible research because of the transparency it provides. All decision points in the research workflow are explicitly documented so others can see why the project proceeded in the way it did. Researchers in many areas of science currently use Git or Subversion as a VCS (Jones 2013), often through a public or private online hosting service such as GitHub, BitBucket, or GitLab.
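The basic commit-branch-merge cycle is compact; the Git commands below are a generic sketch, with hypothetical file and branch names:

    # record a checkpoint of the analysis script and data
    git add analysis.R data/flakes.csv
    git commit -m "Add raw data and first draft of analysis script"

    # explore an alternative approach on its own branch
    git checkout -b alternative-calibration
    git commit -am "Try an alternative calibration curve"

    # merge the successful exploration back into the main line of work
    git checkout master
    git merge alternative-calibration

Had the exploration been a dead end, the branch could simply be left unmerged, preserving the attempt for later inspection.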

Computational Environments
Most researchers use one of three operating systems as their primary computational environment: Microsoft Windows, Apple OS X, or Linux. Once we look beyond this basic detail, our computational environments diversify quickly, with many different versions of the same operating system, and many different versions of common data analysis software, in concurrent use. For basic data analysis, the primary problem here is poor interoperability of file types from different versions of the same software. But for more complex projects that depend on several pieces of complex software from diverse sources, it is not uncommon for one of those pieces to change slightly (for example, when an update is released, a minor configuration is changed, or because different operating systems cause programs to behave differently), introducing unexpected output and possibly causing the entire workflow to fail (Glatard et al. 2015). For example, computationally intensive analyses often use mathematical functions based on single-precision floating-point arithmetic whose implementations vary between software packages (Keeling and Pavur 2007) and across operating systems. For archaeologists, this issue is particularly relevant to simulation studies. This situation can make it very challenging to create a research pipeline that will remain reproducible on any computer other than that of the researcher who constructed it (and into the future on the same computer, as its component software changes in ways that are beyond the researcher's control, due to automatic updates).
At the most general level, the principle that attempts to solve this problem is to provide a description of how other researchers can recreate the computational environment of the research pipeline. The simplest form of this is a list of the key pieces of software and their version numbers; this is often seen in the archaeological literature where exotic algorithms are used. In other fields, where computationally intensive methods are more widespread and software dependencies are more extensive, more complex approaches have emerged, such as machine-readable instructions for recreating computational environments, or providing the entire actual computational environment in which the analysis was conducted (Dudley and Butte 2010; Howe 2012). Either of these provides another researcher with an identical copy of the operating system and exact versions of all software dependencies. The ideal solution is to provide both, because providing the actual environment alone can result in a "black box" problem where the specific details of the environment are not available for inspection by another researcher, and the environment cannot easily be extended or joined to other environments for new projects. This results in a loss of transparency and portability, but it can be mitigated by providing a plain-text file that contains the instructions on how to recreate the environment in a machine-readable format. With this information, researchers can easily see the critical details of the environment, as well as efficiently recombine these details into other environments to create new research workflows. Examples of systems currently used by researchers to capture entire environments include virtual machines (e.g., Oracle's VirtualBox) and GNU/Linux containers (e.g., Docker). These environments are designed to run within an existing operating system, so a researcher might have a GNU/Linux virtual machine running within their Windows or OS X computer. Vagrantfiles and Dockerfiles are common examples of machine-readable plain-text instructions for building computational environments to an exact specification. One advantage of using a self-contained computational environment like a virtual machine or container is that it is portable, and will perform identically whether it is used on the researcher's laptop or on high-performance facilities such as a commercial cloud computing service (Hoffa et al. 2008). While these more complex approaches may seem a bridge too far for most archaeologists, they offer some advantages for collaborating in a common computing environment (i.e., in a project involving two or more computers, using a virtual machine or container environment can simplify collaboration), and for working on small-scale iterations of an analysis prior to scaling up to time-consuming and expensive computations.
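At the simplest end of this spectrum, R users can produce the list of software versions with one built-in function and archive it alongside the analysis; a minimal sketch:

    # print the R version, operating system, and versions of loaded packages
    sessionInfo()

    # save the same report to a plain text file to archive with the code
    writeLines(capture.output(sessionInfo()), "sessionInfo.txt")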
To summarize, in this section I have described four general principles of reproducible research. These principles are derived from current efforts to improve computational reproducibility in other fields, such as genomics, ecology, astronomy, climatology, neuroscience, and oceanography. The four principles are as follows: make data and code openly available and archive them in a suitable location; use a programming language to write scripts for data analysis and visualization; use version control to manage multiple versions of files and contributions from collaborators; and, finally, document and share the computational environment of the analysis. Researchers following these principles will benefit from an increase in the transparency and efficiency of their research pipeline (Markowetz 2015). Results generated using these principles will be easier for other researchers to understand, reuse, and extend.
Case Study: The 1989 Excavation at Madjebebe, Northern Territory, Australia
In this section, I describe my efforts to produce a publication of archaeological research that demonstrates the above principles of reproducible research. I describe the specific tools that I used, explain my reasons for choosing them, and note any limitations and obstacles I encountered. Our paper on Madjebebe (Clarkson et al. 2015) describes familiar types of evidence from a hunter-gatherer rockshelter excavation: stone artifacts, dates, sediments, and mollusks. We (the co-authors of the Madjebebe paper and I) mostly used conventional and well-established methods of analyzing, summarizing, and visualizing the data. In this example, I expect the typical reader will recognize the types of raw data we used (measurements and observations from stone artifacts, dates, sediments, mollusks) and the output of our analysis (plots, tables, simple statistical test results). The novel component here is how we worked from the raw data to the published output. For this Madjebebe publication, we experimented with the principles of reproducible research outlined above, using data archiving, a scripted analytical pipeline, version control, and an isolated computational environment. Additional details of our specific implementations are available at https://github.com/benmarwick/1989-excavation-report-Madjebebe and in Marwick (2015).
The standard and familiar nature of the archaeological materials and methods used in the paper about Madjebebe should make it easy for the reader to understand how the methods for enhancing reproducibility described here can be adapted for the majority of research publications in archaeology. I recognize that not every research project can incorporate these tools (for example, projects with very large amounts of data or very long compute times). However, my view is that the principles and tools described here are suitable for the majority of published research in archaeology (where datasets are small, i.e., <10 GB, and analysis compute times are short, i.e., <30 min).

Figshare for Data Archiving
We chose Figshare to archive all the files relating to the publication, including the raw data, which we uploaded as a set of CSV files (Fig. 2). CSV stands for comma-separated values, an open file format for spreadsheet data that can be opened and edited in any text editor or spreadsheet program. Although there are data repositories designed specifically for archaeologists (Beale 2012; Kansa 2012; Richards 1997), some of these are fee-based services and, at the time we deposited our data, they all lacked a programmatic interface and connections to other online services (such as GitHub, our version control backup service). Figshare is a commercial online digital repository service that provides instant, free, unlimited archiving of any type of data file (up to 250 MB per file) for individual researchers in any field, and automatically issues persistent identifiers (DOIs). Figshare also supplies file archiving services for many universities and publishers, including PLOS and Nature. Figshare allows the user to apply permissive Creative Commons licenses to archived files that specify how the files may be reused. We chose the CC0 license for our data files (equivalent to a release into the public domain); this is widely used and recommended for datasets (Stodden 2009). The CC0 license is simpler than the related CC-BY (requiring attribution) and CC-NC (prohibiting commercial use) licenses, so CC0 eliminates all uncertainty for potential users, encouraging maximal reuse and sharing of the data. We also archived our programming code on Figshare and applied the MIT license, a widely used software license that permits any person to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the code (Henley and Kemp 2008; Morin et al. 2012). Our motivation for choosing these licenses was to clearly communicate to others that we are comfortable with our data and code being reused in any way, with appropriate attribution resulting from normal scholarly practices (Stodden 2009). The MIT license has the added detail of specifically not providing a warranty of any kind and absolving us as authors from liability for any damages or problems that others might suffer or encounter when using our code.
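To show how simple the format is, the sketch below creates a small data frame of hypothetical measurement records in R, writes it to a CSV file, and reads it back; the resulting file is plain text that any spreadsheet program can open:

    # hypothetical measurement records
    flakes <- data.frame(square = c("B6", "B6"),
                         spit = c(1, 2),
                         mass_g = c(3.42, 1.07),
                         raw_material = c("quartzite", "chert"))

    # write to CSV; the file contains only plain text:
    #   square,spit,mass_g,raw_material
    #   B6,1,3.42,quartzite
    #   B6,2,1.07,chert
    write.csv(flakes, "flakes.csv", row.names = FALSE)

    # read it back
    flakes <- read.csv("flakes.csv")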

R for Scripting the Analysis
I used the R programming language to script our data analysis and visualization workflow. I chose R because it is a highly expressive, functional, interpreted, object-oriented language that was originally developed by two academic statisticians in the 1990s (Chambers 2009; Wickham 2014). Like Python, R is a complete, free, and open source programming language. Where the two differ is that R is heavily customized for data analysis and visualization (Gandrud 2013b; Tippmann 2014). Python, which has a reputation for readability and ease of use, is a general-purpose programming tool with fewer customizations for data analysis and visualization (Perkel 2015). In the last decade, R has acquired a large user community of researchers, including archaeologists, many of whom contribute packages to a central open repository that extend the functionality of the language (Mair et al. 2015). These packages are typically accompanied by peer-reviewed scholarly publications that explain the algorithms they implement. Such a large and active community means that many common data analysis and visualization tasks have been greatly simplified by R packages, which was a key factor in my choice of this language. For example, rOpenSci is a collective of scientists, mostly in ecology, evolution, and statistics, that supports the development of R packages to access and analyze data, and provides training to researchers (Boettiger et al. 2015). Our publication depended on 19 of these user-contributed packages, which saved me a substantial amount of programming effort. I also organized our code as a custom R package because this provides a logical and widely shared structure for organizing the analysis and data files. The R package structure gives us access to the many quality control tools involved in package building, and is a convenient template for projects of any scale (Wickham 2015). Because packages are ubiquitous among R users, we hope that by providing our code as an R package, the use of familiar conventions for organizing the code will make it easier for other users to inspect, use, and extend it.
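For orientation, the skeleton of a package-style compendium might look like the following sketch; the directory names follow standard R package conventions, but the project and file names here are hypothetical:

    my-compendium/
        DESCRIPTION        # package metadata and declared package dependencies
        R/
            functions.R    # reusable analysis functions
        data/
            flakes.csv     # raw data files
        vignettes/
            paper.Rmd      # executable document that generates the paper
        Dockerfile         # recipe for the computational environment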
The knitr and rmarkdown packages are especially relevant to our efforts to make our analysis reproducible (Xie 2013). Knitr provides algorithms for dynamically converting plain text and R code into formatted documents (i.e., PDF, HTML, or MS Word) that contain the text and the output of the code, such as tables and plots. Rmarkdown provides an authoring format that enables the creation of dynamic documents using a simple syntax (related to HTML and LaTeX, but simpler) for formatting text and managing citations, captions, and other typical components of a scientific document (Baumer and Udwin 2015; Baumer et al. 2014). The rmarkdown package uses a document formatting language called markdown, which has a simple syntax for styling text, and extends it into a format called R markdown that enables embedded computation of R code contained in the markdown document. Using syntax for styling in markdown (and HTML, LaTeX, etc.) differs from composing and editing in Microsoft Word because markdown separates presentation from content. An example of this can be seen in the heading in Fig. 3, where the two hash symbols are the syntax for a heading, and the formatting is applied only when the document is executed. Together, the knitr and rmarkdown packages enabled us to compose a single plain-text source document containing interwoven paragraphs of narrative text and chunks of R code. This approach locates the code in context with the text, so any reader can easily see the role of the code in the narrative. The result is an executable paper (cf. Leisch et al. 2011; Nowakowski et al. 2011): when the document is rendered by the computer using the knitr package, the R code is interpreted to generate the statistical and visual output, and the formatting syntax is applied to produce readable output in the form of an HTML, Microsoft Word, or PDF file that contains text, statistical results and tables, and data visualizations. This practice of keeping documentation and code in a single interwoven source document is known as literate programming (Knuth 1984). It is a focus of many efforts to improve the reproducibility of research, for example, by computer scientists and neuroscientists (Abari 2012; Delescluse et al. 2012; Schulte et al. 2012; Stanisic et al. 2015), but is not currently a mainstream practice in any field.
Fig. 3 A small literate programming example showing a sample of R markdown script similar to that used in our publication (on the left), and the rendered output (on the right). The example shows how formulae can be included, and how a chunk of R code can be woven among narrative text. The code chunk draws a plot of artifact mass by distance from source, computes a linear regression, and adds the regression line to the plot. It also shows how one of the output values from the linear regression can be used in the narrative text without copying and pasting
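A minimal R markdown source of the kind shown in Fig. 3 might look like the sketch below; the data file and column names are hypothetical, but the structure mirrors the figure, with a heading, narrative text, a code chunk, and an inline result:

    ## Artifact transport

    Artifact mass declines with distance from the raw material source.

    ```{r mass-by-distance}
    artifacts <- read.csv("data/artifacts.csv")
    plot(mass_g ~ distance_km, data = artifacts,
         xlab = "Distance from source (km)", ylab = "Mass (g)")
    fit <- lm(mass_g ~ distance_km, data = artifacts)
    abline(fit)  # add the regression line to the plot
    ```

    On average, mass changes by `r round(coef(fit)[2], 2)` g for each
    additional km from the source.

When the document is rendered, the chunk produces the plot and the inline expression is replaced by the computed slope, so no value is ever copied and pasted by hand.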

Git and GitHub for Version Control and Code Sharing
I chose Git as our version control system because it is currently by far the most widely used version control system, both in research contexts and in software engineering (Jones 2013; Loeliger and McCullough 2012). Git is a free and open source cross-platform program for tracking changes in plain text documents. The current popularity of Git is important because it means there is a lot of documentation and many examples available for learning how to use the system. The key benefit of using Git was saving episodes of code-writing in meaningful units; for example, the preparation of each figure was a single commit (Fig. 4). This was helpful because if some new code had an unexpected effect on an earlier figure, I could revert to the previous commit where the code worked as expected. The high-resolution control over the progress of the code-writing provided by the version control system was helpful for identifying and solving problems in the analysis. During the peer-review and proofing stages, I used Git commits to indicate the exact versions of the code that were used for the draft, revised, and final versions of the paper, which was helpful for keeping track of the changes we made in response to the reviewers' comments.
I used GitHub as a remote backup for our project, hosting the code and data files together with their Git database. GitHub is one of several commercial online services that host Git repositories and provide online collaboration tools (GitHub repositories that are open to the public are free, but fees are charged for private repositories; fee waivers are available for academic users). While writing the paper, I worked in a private GitHub repository that was not publicly accessible, because we needed approval from other stakeholders (such as the Aboriginal group on whose land the archaeological site is located) for the final paper before revealing it to the public. When the paper was published, I made the repository open and publicly available on GitHub (Barnes 2010), as well as archiving a copy of the code on Figshare with the data. The code on Figshare is frozen to match the output found in the published article, but the code on GitHub continues to be developed, mostly through minor edits and improvements that do not change the content of the executed document. GitHub has Git-based tools for organizing large-scale collaboration on research projects that are widely used in other fields, but I did not use these because of the small scale of our project (Gandrud 2013a).
Fig. 4 Git commit history graph. This excerpt shows a typical sequence of commits and commit messages for a research project. The seven-character codes are keys that uniquely identify each commit. The example here shows the creation and merging of a branch to experiment with a variation of a plot axis. The graph shows more recent events at the top and earlier events at the bottom
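Linking a local Git repository to a hosting service takes two commands; the sketch below uses a hypothetical GitHub repository URL:

    # connect the local repository to a (hypothetical) GitHub repository
    git remote add origin https://github.com/username/my-compendium.git

    # upload the local commit history to GitHub
    git push -u origin master

After this, each subsequent git push backs up the full commit history remotely, which is what makes it straightforward to open the repository to the public at publication time.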

Docker for Capturing the Computational Environment
Currently, there are two widely used methods for creating portable, isolated computational environments. The more established method is to create a virtual machine, usually taking the form of a common distribution of GNU/Linux such as Ubuntu or Debian. Although this is a widely used and understood method, preparing the virtual machine is time-consuming, and the virtual machine occupies a relatively large amount of disk space (8 GB in our case). I preferred the GNU/Linux container method because the virtual environment can be created much faster (which is more convenient for iteration) and the container image occupies much less disk space. The key difference between the two is that a virtual machine replicates an entire operating system, while the container image shares some of the host system's resources to create an isolated computational environment, rather than requiring a complete system for each environment (Fig. 5). The low resource use of the container system makes it possible to run several virtual environments simultaneously on a Windows or Mac desktop or laptop computer.
Fig. 5 Schematic of computer memory use of Docker (on the left) compared to a typical virtual machine (on the right). This figure shows how much more efficiently Docker uses hardware resources, such as hard drive space, compared to a virtual machine
The specific GNU/Linux container system we used is called Docker, currently the dominant open source container system (Boettiger 2015). Like Git and R, Docker is a free and open source program. Docker is developed by a consortium of software companies, which hosts an open, version-controlled online repository of ready-made Docker images, known as the Docker Hub, including several that contain R and RStudio on the GNU/Linux operating system. We used an image provided by rOpenSci as our base image, and wrote a Dockerfile to specify further customizations of this base image. These include the installation of the JAGS library (Plummer and others 2003) to enable efficient Bayesian computation in R. Our Docker image is freely available on the Docker Hub and may be accessed by anyone wanting access to the original computational environment that we used for our analysis. Similarly, our Dockerfile is included in our code repository so that the exact contents of our Docker image are described (for example, if the Docker Hub is unavailable, a researcher can rebuild our Docker image from the Dockerfile). Using the Dockerfile, our image can be reconstituted and extended for other purposes. We treated our Docker image as a disposable and isolated component, deleting and recreating it regularly to be sure that the computational environment documented in the Dockerfile could run our analyses.
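A Dockerfile of this kind is short; the sketch below shows the general shape rather than our exact file, and the base image name and package list are illustrative:

    # start from a (hypothetical) base image containing R and RStudio
    FROM rocker/rstudio

    # install the JAGS library for Bayesian computation
    RUN apt-get update && apt-get install -y jags

    # install the R packages the analysis depends on
    RUN R -e "install.packages(c('rjags', 'knitr', 'rmarkdown'))"

An image can then be rebuilt from this file with a command such as docker build -t madjebebe . (the image name here is also hypothetical), which is what makes the environment disposable and recreatable.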

Discussion
Developing competence in using these tools for enhancing computational reproducibility is time-consuming, which raises the question of how much of this is practical for most archaeologists, and what the benefits and costs might be. My view is that once the initial costs of learning the tools are paid off, implementing the principles outlined above makes research and analysis easier, and has material professional benefits.
Perhaps the best-established benefit is that papers with publicly available datasets receive a higher number of citations than similar studies without available data. Piwowar et al. (2007) investigated 85 publications on microarray data from clinical trials and found that papers that archived their data were cited 69% more often than papers that did not. However, a larger follow-up study by Piwowar and Vision (2013) of 10,557 articles that created gene expression microarray data found only a 9% citation advantage for papers with archived data. Henneken and Accomazzi (2011) analyzed 3814 articles in four astronomy journals and found that articles with links to open datasets acquired, on average, 20% more citations than articles without links to data. Restricting the sample to papers published since 2009 in The Astrophysical Journal, Dorch (2012) found that papers with links to data receive 50% more citations per paper per year than papers without links to data. In 1331 articles published in Paleoceanography between 1993 and 2010, Sears (2011) found that publicly available data was associated with a 35% increase in citations. While I am not aware of any studies specifically of the archaeological literature, similar positive effects of data sharing have been described in the social sciences. Among 430 articles in the Journal of Peace Research, articles that offered data in any form, whether through appendices, URLs, or contact addresses, were on average cited twice as frequently as articles with no data but otherwise equivalent author credentials and article variables (Gleditsch and Strand 2003). It is clear that researchers in a number of different fields who follow the first principle of reproducible research benefit from a citation advantage for articles that include publicly available datasets. In addition to increased citations for data sharing, Pienta et al. (2010) found that data sharing is associated with higher publication productivity. They examined 7040 NSF and NIH awards and concluded that a typical research grant award produces a median of five publications, but when data are archived, a research grant award leads to a median of ten publications.
It is also worth noting that the benefits of using a programming language such as R for archaeological analyses extend beyond enhanced reproducibility. From a practical standpoint, users of R benefit from it being freely available for Windows, Unix-like systems (such as GNU/Linux), and the Mac. As a programming language designed for statistics and data visualization, R has the advantage of providing access to many more methods than commercial software packages such as Excel and SPSS. This is due to its status as the lingua franca of academic statisticians (Morandat et al. 2012; Narasimhan et al. 2005; Widemann et al. 2013), which means that R is the development environment for many recently published algorithms (Bonhomme et al. 2014; Reshef et al. 2011), and these algorithms are readily available for archaeologists and others to use. R is also widely known for its ability to produce complex data visualizations and maps with just a few lines of code (Bivand et al. 2008; Kahle and Wickham 2013; Sarkar 2008; Wickham 2009). Furthermore, my view is that once the learning curve is overcome, most analyses using R would take no longer than with alternative technologies, and will often save time when previously written code is reused in new projects.
The primary cost of enhancing reproducibility is the time required to learn to use the software tools. I did not quantify this directly, but my personal experience is that about 3 years of self-teaching and daily use of R were necessary to develop the skills to code the entire workflow of our case study. Much less time was needed to learn Git and Docker, because the general concepts of interacting with these types of programs are similar to working with R (for example, using a command line interface and writing short functions using parameters). I expect that most archaeologists could develop competence substantially quicker than I did, by participating in short training courses such as those offered by Software Carpentry (Wilson 2014), Data Carpentry (Teal et al. 2015), rOpenSci (Boettiger et al. 2015), and similar organizations, or through the use of R in quantitative methods courses. I did not measure the amount of time required to improve the reproducibility of our case study article because I planned the paper to be reproducible before we started the analysis. This makes it difficult to separate time spent on analytical tasks from time spent on tasks specifically related to reproducibility. This situation, where the case study has "built-in reproducibility" and the additional time and effort is marginal, may be contrasted with "bolt-on reproducibility," where reproducibility is enhanced only after the main analysis is complete. In the "bolt-on" situation, I might estimate a 50% increase in the amount of time required for a project similar to this one. For multi-year projects with multiple teams, the time needed for the bolt-on approach would probably make it infeasible.
The main challenge I encountered using the tools described above in this project was the uneven distribution of familiarity with them across our team. This meant that much of the final data analysis and visualization work presented in the publication was concentrated among the team members familiar with these tools. The cause of this challenge is most likely the focus on point-and-click methods in most undergraduate courses on data analysis (Sharpe 2013). The absence of discussion of software in the key texts on statistics in archaeology (VanPool and Leonard 2010) is also a contributing factor. This contrasts with other fields, where statistical methods and the computational tools to implement them are often described together (Buffalo 2015; Haddock and Dunn 2011; Scopatz and Huff 2015). This makes it difficult for archaeologists to acquire the computational skills necessary for reproducible research during a typical archaeology degree, leaving only self-teaching and short workshops as options for the motivated student.

Conclusion
We have outlined one potential standard way of enhancing the reproducibility of archaeological research, summarized in Fig. 1 and Table 2. Our compendium is a collection of files that follows the formal structure of an R package, and includes the raw data, R scripts organized into functions and an executable document, a Git database that records the history of changes made to all the files in the compendium, and a Dockerfile that recreates the computational environment of our analysis. While the exact components of this kind of compendium will undoubtedly change over time as newer technologies appear, I expect that the general principles I have outlined will remain relevant long after these specific technologies have faded from use.
Two future directions follow from the principles, tools, and challenges that I have discussed above. First, the rarity of archaeologists with the computational skills necessary for reproducible research (as I observed in our group, and in the literature broadly, Table 2) highlights the need for future archaeologists to be trained as Pi-shaped researchers, rather than T-shaped researchers (Fig. 6). Current approaches to postgraduate training for archaeologists produce T-shaped researchers with wide-but-shallow general knowledge, but deep expertise and skill in one particular area. In contrast, a Pi-shaped researcher has the same wide breadth, but two areas of deep knowledge: their own domain-specific specialization, and the computational principles and tools that enable reproducible research (Faris et al. 2011).
A second future direction is the need to incentivize training in, and the practice of, reproducible research by changing the editorial standards of archaeology journals. Although all the technologies and infrastructure needed to enhance research reproducibility are already available, they are not going to be widely used by researchers until there are strong incentives and a detailed mandate (McCullough and Vinod 2003; McCullough et al. 2006, 2008). One way to incentivize improvements to reproducibility is for journal editors to require submission of research compendia in place of the conventional stand-alone manuscript (Miguel et al. 2014). A research compendium is a manuscript accompanied by code and data files (or persistent links to reputable online repositories) that allows reviewers and readers to reproduce and extend the results without needing any further materials from the original authors (Gentleman and Temple Lang 2007; King 1995). This paper is an example of a research compendium, with the source files available at http://dx.doi.org/10.6084/m9.figshare.1563661, and the case study paper on Madjebebe is a more realistic and complex example of a compendium, available online at http://dx.doi.org/10.6084/m9.figshare.1297059. Requiring submission of compendia instead of simple manuscripts is currently being experimented with by journals in other fields (e.g., Quarterly Journal of Political Science, Biostatistics) (Nosek et al. 2015; Peng 2009). The results of these experiments suggest that changing research communication methods and tools is a slow process, but that compendia are valuable for finding mistakes in submissions that are otherwise not obvious to reviewers, and they show that such changes to editorial expectations are possible without the journal being abandoned by researchers.
In archaeology, much progress has already been made in this direction by researchers using agent-based modeling. Archaeological publications that employ agent-based models often make available the complete code for their model in a repository such as OpenABM, which has successfully established community norms for documenting and disseminating computer code for agent-based models (Janssen et al. 2008). In archaeological publications more broadly, especially where a new method is presented, there is an urgent need to converge on similar community norms of sharing data and code in standardized formats. This will speed the adoption of new methods by reducing the effort needed to reverse-engineer a publication in order to adapt the new method to a new research problem. Most archaeologists will benefit from publications (their own and others') being reproducible, but attaining a high degree of reproducibility may not be possible for some publications. For example, only a low degree of reproducibility is possible for research that depends on sensitive data that cannot be made public, or on algorithms in specialized, expensive proprietary software (such as those provided by research instrument manufacturers). However, I believe that the majority of archaeological research publications have ample scope for substantial improvements in reproducibility. The technical problems are largely solved; the challenge now is to change the norms of the discipline to make high reproducibility a canonical attribute of high-quality scholarly work.
Software pervades every domain of research, yet despite its importance in generating results, the choice of tools is very personal (Healy 2011), and archaeologists are given little guidance in the literature or during training. With this paper, I hope to begin a discussion on general principles and specific tools to improve the computational reproducibility of published archaeological research. This discussion is important because the choice of tools has ethical implications for the reliability of claims made in publications. Tools that do not facilitate well-documented, transparent, portable, and reproducible data analysis workflows may, at best, result in irreproducible, unextendable research that does little to advance the discipline. At worst, they may conceal accidents or fraudulent behaviors that impede scientific advancement (Baggerly and Coombes 2009; Herndon et al. 2014; Laine et al. 2007; Lang 1993; Miller 2006).

Fig. 1
Fig. 1 Workflow diagram showing key steps and software components. The boxes with a bold outline indicate key steps and tools that enable computational reproducibility in our project

Fig. 2
Fig. 2 File organization of the Figshare archive. The items with a dashed border are typical components of an R package, the items with a solid outline are custom items added to form this specific compendium, and the shaded items are folders while the unshaded items are files

Table 1
Glossary of key terms used in the text

Plain text: http://www.linfo.org/plain_text.html
Binary: A file that must be interpreted by a specific program before it is human-readable and editable. For example, PDF, Microsoft Word doc, and Excel xls files are binary files, and can only be read and edited by those programs. Many commercial programs use proprietary binary file formats, which limits their interoperability and archival potential.
CC-NC: Allows for reuse only for non-commercial purposes (for example, a Cultural Heritage Management business would not be allowed to use CC-NC data or code). Not recommended for most research output.
MIT: A license especially for software that places very few restrictions on the use of the software, and disclaims the author of any responsibility for problems arising from others using the software. It is one of the most popular licenses for open source software. http://opensource.org/licenses/MIT
Data archiving
DOI: "Digital object identifier," a persistent (but not permanent) label that stores information about the online location of an electronic file. A DOI also includes metadata; for example, in the case of a journal article it might include the author, title, date of publication, etc. https://zenodo.org/
tDAR: The Digital Archaeological Record (tDAR) is a digital repository for the digital records of archaeological investigations. Fees are charged for archiving files, but access to open files is free. https://www.tdar.org/
Open Context: http://opencontext.org/

Table 1
(continued)

Docker: https://www.docker.com/
Communities
Software Carpentry: An international non-profit volunteer organization focusing on teaching researchers basic software skills. Prioritizes the use of free and open source software tools, and encourages researchers to use permissive licenses for their research products. The target audience is novices with little or no prior computational experience. http://software-carpentry.org/
Data Carpentry: Similar to Software Carpentry, but focuses more on domain-specific training covering the full lifecycle of data-driven research. http://www.datacarpentry.org/
rOpenSci: A collaboration of volunteers from academia and industry developing R-based tools for making scientific research, data, and publication freely accessible to the public. They also conduct workshops to train researchers to use R and related tools. https://ropensci.org/

Table 2
Summary of degrees of reproducibility (only two cell fragments are recoverable here)

… the article. However, because the code is not complete, substantial effort and skill are required by other researchers to reproduce the results of the article, and to reuse the code in new studies. This presents obstacles to reuse of the code.
… the paper, and details of the computational environment of the original analysis. Note that this does not guarantee complete and permanent reproducibility, but it gives the best odds we can currently provide. The use of an open access repository means that researchers can access the files even if they do not have a subscription to the journal, and ensures the availability of the files if the journal website changes.