WordSeg package


Put briefly: the additional analyses promised as supplementary materials are in supmat.pdf. We also provide a number of other files that can be used to check for reproducibility (see below for instructions).

# Full contents of this project

- supmat.pdf: Supplementary analyses.
- results\_do\_prov.zip and results\_do\_concat\_prov.zip: Contain the evaluation and stats files output by WordSeg as called by do\_prov.sh and do\_concat\_prov.sh.
- sel-providence.zip: Contains the tags files used in the reported experiments, as well as the orthographic version of those files for ease of human reading. (Ignore unless you want to check for reproducibility.)
- do\_prov.sh: Bash script used to run the experiments building on independent transcripts, reported in the paper (section 4.3). (Ignore unless you want to check for reproducibility and/or get inspiration to run your own experiments.)
- do\_concat\_prov.sh: Bash script used to run the experiments building on concatenated transcripts, reported in the paper (section 4.4). (Ignore unless you want to check for reproducibility and/or get inspiration to run your own experiments.)
- analyses.Rmd: RMarkdown file generating all figures and analyses reported on in the paper. (Ignore unless you want to check for reproducibility; see below for knitting instructions.)
- analyses.html: Latest knitted version of the analyses.Rmd file. (This is redundant with the information reported on in the manuscript. Ignore unless you want to check for reproducibility, in which case you would compare your results against these ones.)
- analyses-DATE.html: We expect we may make modifications to the WordSeg package. If so, we will store older versions of the knitted Rmd with this format. (The date is in an unambiguous format: YYYYMMDD.)
- supmat.Rmd: RMarkdown file generating the supplementary materials. (Ignore unless you want to check for reproducibility; see below for knitting instructions.)

# Instructions to reproduce analyses reported on in the paper

Some readers may want to check our materials for reproducibility. To regenerate the reports above, you will need [RStudio](https://www.rstudio.com/). For further information on using Rmd for transparent (knittable) analyses, see [Mike Frank & Chris Hartgerink's tutorial](https://libscie.github.io/rmarkdown-workshop/handout.html).

1. Download and unzip results\_do\_prov.zip and results\_do\_concat\_prov.zip.
2. Download analyses.Rmd (or supmat.Rmd, if you want to reproduce our supplementary materials) and put it at the same level as the two resulting results folders.
3. Create a folder called "derived" at the same level as the results folders and the analyses.Rmd (or supmat.Rmd) file.
4. Launch RStudio by double-clicking on analyses.Rmd (or supmat.Rmd), or otherwise ensure that your working directory points to the Rmd location.
5. Click on the button "knit".

# Instructions to check your word segmentation analyses against ours

Some readers may want to check our whole pipeline, from the unsegmented materials to the analyses. This cannot be done blindly and will require some knowledge of the WordSeg package and your own system. Thus, the following instructions are intended for more advanced users. Please note that to generate the html or pdf reports, you will need [RStudio](https://www.rstudio.com/). For further information on using Rmd for transparent (knittable) analyses, see [Mike Frank & Chris Hartgerink's tutorial](https://libscie.github.io/rmarkdown-workshop/handout.html).

1. Install wordseg-0.7.1 (https://github.com/bootphon/wordseg/releases/tag/v0.7.1).
2. Download and unzip sel-providence.zip.
3. Download do\_prov.sh and do\_concat\_prov.sh.
4. **Change the paths at the top of these files** to make them appropriate to your environment. For instance, you need to point to the sel-providence files unzipped in step 2.
5. **Verify that the calls are appropriate to your system**. Most importantly, make sure you adapt the call to slurm if you are not running these scripts on a system with slurm. If you run Sun Grid Engine, use wordseg-sge.sh instead. If you do not work on a cluster, use wordseg-bash.sh instead.
6. Make the .sh scripts executable with `$ chmod +x *.sh`
7. Launch do\_prov.sh and do\_concat\_prov.sh one at a time with e.g., `$ ./do_prov.sh`
8. Download analyses.Rmd (or supmat.Rmd) and put it at the same level as the two resulting results folders.
9. Create a folder called "derived" at the same level as the results folders and the analyses.Rmd (or supmat.Rmd) file.
10. Launch RStudio by double-clicking on analyses.Rmd (or supmat.Rmd), or otherwise ensure that your working directory points to the Rmd location.
11. Click on the button "knit".
12. Compare your resulting analyses.html file against the one we provide in this project.
13. If you notice discrepancies, consider creating an issue on [our GitHub](https://github.com/bootphon/wordseg/issues).
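
Both sets of instructions rely on the same folder layout before knitting: the two results folders, the Rmd file, and a "derived" folder all side by side. The sketch below shows one way to set this up and check it from a terminal. It is an illustration only, not part of the project files: the function name `prepare_layout` is hypothetical, and the folder names `results_do_prov` and `results_do_concat_prov` assume that each archive unpacks into a folder named after the zip file. Adjust them if your folders are named differently.

```shell
#!/usr/bin/env bash
# Sketch: prepare and check the folder layout expected before knitting.
# ASSUMPTION: the zips unpack into folders named after the archives
# (results_do_prov/ and results_do_concat_prov/); rename below if not.

prepare_layout() {
    local dir="${1:-.}"   # project folder; defaults to the current directory
    local missing=0

    # Create the "derived" folder next to the results folders.
    mkdir -p "$dir/derived"

    # Confirm that both unzipped results folders are present.
    for item in results_do_prov results_do_concat_prov; do
        if [ ! -d "$dir/$item" ]; then
            echo "missing folder: $item"
            missing=1
        fi
    done

    # Confirm that at least one of the Rmd files sits at the same level.
    if [ ! -f "$dir/analyses.Rmd" ] && [ ! -f "$dir/supmat.Rmd" ]; then
        echo "missing file: analyses.Rmd or supmat.Rmd"
        missing=1
    fi

    # Exit status 0 means the layout is complete, 1 means something is missing.
    return "$missing"
}
```

Calling `prepare_layout .` from the project folder prints anything that is still missing. If you prefer not to use the "knit" button, and assuming R, the rmarkdown package, and pandoc are available outside RStudio, `Rscript -e 'rmarkdown::render("analyses.Rmd")'` run from the same folder should produce the same report.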
