Social work  /

Category: Project

Description: Abstract

Background and rationale: RERO DOC is an electronic database that holds bibliographic information on scientific research undertaken by students and staff from institutions in the western part of Switzerland. RERO DOC stores and makes available the full-text PDFs of most of the bachelor theses it includes, and its contents are indexed in both WorldCat and Google Scholar. Crawlers also regularly browse RERO DOC, which allows Google Scholar users to search the text within bachelor theses, a feature not available on other databases or search engines. While running searches meant to identify all bachelor theses published by HETS Valais and currently indexed in RERO DOC (and hence available as PDFs), I found that the PDFs of a handful of theses had not been picked up by Google Scholar, and I sought to investigate potential reasons why (file size, citations, publication year).

Objective: To investigate potential reasons why Google Scholar crawlers did not include (or find) the PDFs of some bachelor theses published by HETS Valais students.

Study design: Cross-sectional analysis.

Methods: A Google Scholar search was run on August 5, 2020, using the search strategy: "travail social" AND "bachelor" site:rero.ch. The following data were manually collected for the first 200 records returned by the search: ranking in the search results (e.g. 1st, 2nd, 3rd), title of the work in Google Scholar, actual title of the work, publication year, PDF available on Google Scholar (yes/no), file size (MB), file type (e.g. PDF, DOCX, other), PDF available on RERO DOC (yes/no), number of forward citations, reference.

Results: Among the 200 records collected, three (n=3, 1.5% of the overall sample) did not refer to social work bachelor theses and nine (n=9, 4.5% of the overall sample) were duplicate records of a single bachelor thesis. Of the 188 remaining unique records, eight (n=8, 4.26% of the sample) were published by students from HETS Geneva and 180 (95.74%) by students from HETS Valais. The average overall file size was 3.45 MB (range 0.33 MB to 61.30 MB). File sizes ranged from 0.33 MB to 25.70 MB among theses available on Google Scholar and from 35.40 MB to 61.30 MB among theses unavailable on Google Scholar. All theses (n=188/188, 100%) were available as full-text PDFs on RERO DOC. Five theses (n=5, 2.66%) were not available as full-text PDFs on Google Scholar. Five theses (n=5, 2.66%) [3–7], all published by students from HETS Valais, had been cited once, whilst the other works had not been cited according to Google Scholar. Two (n=2) bachelor theses had been cited by students from other Swiss HETS, one (n=1) was cited in a French Master thesis, one (n=1) was cited in a book and one (n=1) was cited by myself in a prior research project. Theses from HETS Geneva were published from 2006 to 2019, whilst theses from HETS Valais were published from 2007 to 2020. Based on very limited data (n=5), theses not available as PDFs on Google Scholar were, on average, larger (46.48 MB vs 2.28 MB), more often written by HETS Geneva students (2/5=40% vs 6/183=3.28%), older (mean publication year 2014.4 vs 2015.26) and less cited (0.00 citations vs 0.03) than theses available as PDFs. Google Scholar appeared to display the titles it found within the PDFs instead of the titles indexed in RERO DOC, which resulted in some errors. In particular, almost all titles containing a colon “:” were cut before the colon. The nine duplicates found on Google Scholar had titles extracted from the thesis’s Table of Contents section.

Discussion: Although guidance provided by Google recommends keeping scientific works under 5 MB so that they are adequately crawled and indexed in Google Scholar, all works with a size ranging from 0.33 MB to 25.70 MB were found to be available as PDFs on Google Scholar. The actual threshold appeared to lie between 25.70 MB and 35.40 MB in this sample. The bachelor thesis with nine duplicates among the first 200 Google Scholar search results suggests that crawlers made multiple attempts to identify this thesis’s title and may have used font size to guide their selection. Duplicates appear to be rare, but a few theses with a similar error could lead to a substantial number of irrelevant records within search results and push important records beyond the maximum of 1000 results shown.

Limitations: Only 5 theses did not have a PDF available on Google Scholar, and with such a small sample the differences from theses with PDFs available could easily have arisen by chance alone.

Funding: No funding was received for this work.

Registration: See https://osf.io/6m9sn/files/ (see previous versions)

Data and materials: See https://osf.io/584b2/. All other data is otherwise included within this manuscript.

Keywords: Cross-sectional analysis, bachelor thesis, HES-SO, bibliographic indexing, Google Scholar, RERO DOC
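
The proportions and averages reported in the Results can be rechecked from the stated counts alone. The following is a minimal Python sketch of that arithmetic, not the author's analysis script: every constant is copied from the abstract, and the names (`pct`, `classify_size` and the constants) are hypothetical, introduced only for illustration. The overall mean file size is rebuilt as a weighted mean of the two rounded subgroup means, so it can only be expected to agree with the reported 3.45 MB to within rounding.

```python
# Counts copied from the Results section of the abstract.
TOTAL_RECORDS = 200
NOT_SOCIAL_WORK = 3
DUPLICATES = 9
UNIQUE = TOTAL_RECORDS - NOT_SOCIAL_WORK - DUPLICATES  # unique theses

GENEVA = 8                   # theses from HETS Geneva
VALAIS = 180                 # theses from HETS Valais
NO_PDF = 5                   # theses without a PDF on Google Scholar
WITH_PDF = UNIQUE - NO_PDF   # theses with a PDF on Google Scholar

# Subgroup mean file sizes (MB), as reported (already rounded).
MEAN_SIZE_NO_PDF = 46.48
MEAN_SIZE_WITH_PDF = 2.28

# Largest file seen with a PDF and smallest file seen without one (MB).
LARGEST_INDEXED_MB = 25.70
SMALLEST_MISSING_MB = 35.40


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def classify_size(size_mb):
    """Place a file size relative to the bracket observed in this sample.

    Hypothetical helper: it describes this sample only and is not a
    documented Google Scholar limit.
    """
    if size_mb <= LARGEST_INDEXED_MB:
        return "within range of indexed PDFs"
    if size_mb >= SMALLEST_MISSING_MB:
        return "within range of missing PDFs"
    return "undetermined (between the two observed ranges)"


geneva_share = pct(GENEVA, UNIQUE)
valais_share = pct(VALAIS, UNIQUE)
no_pdf_share = pct(NO_PDF, UNIQUE)

# Weighted mean of the two subgroup means.
overall_mean_mb = (
    NO_PDF * MEAN_SIZE_NO_PDF + WITH_PDF * MEAN_SIZE_WITH_PDF
) / UNIQUE

if __name__ == "__main__":
    print("unique theses:", UNIQUE)
    print("HETS Geneva share (%):", geneva_share)
    print("HETS Valais share (%):", valais_share)
    print("no PDF on Google Scholar (%):", no_pdf_share)
    print("overall mean size (MB):", round(overall_mean_mb, 3))
    print("30 MB file:", classify_size(30.0))
```

With only five theses lacking a PDF, the bracket used in `classify_size` is an observation about this sample, and a file between 25.70 MB and 35.40 MB simply falls in the range where no data were collected.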

License: CC-By Attribution 4.0 International
