Modes of access: the influence of dissemination channels on the use of open access monographs

Introduction. This paper studies the effects of several dissemination channels in an open access environment by analysing the download data of the OAPEN Library. Method. Download data were obtained containing the number of downloads and the name of the Internet provider. Based on public information, each Internet provider was categorised. The subject and language of each book were determined using metadata from the OAPEN Library. Analysis. Quantitative analysis was done using Excel, while the qualitative analysis was carried out using the statistical package SPSS. Results. Almost three quarters of all downloads come from users who do not use the Website www.oapen.org, but find the books by other means. Qualitative analysis found no evidence that channel use was influenced by user groups or the state of users' Internet infrastructure; nor was any effect on channel use found for either the language or the subjects of the monographs. Conclusions. The results show that most readers are using the "direct download" channel, which occurs when readers use systems other than the OAPEN Library Website. This implies that making the metadata available in the users' own systems, the infrastructure they use on a daily basis, ensures the best results.


Introduction
Open access is much debated and in recent years has gained much attention in the literature. The scientific and scholarly impact of papers has been discussed extensively, for instance by Antelman (2004), who finds that freely published papers receive more citations across a number of disciplines. Podlubny (2005) takes the citation analysis a step further and proposes a normalisation procedure, aimed at comparing the impact of scientists from different fields. Bollen et al. go beyond citations and investigate thirty-nine impact measures, and conclude that use-based measures may be a better indication of scientific impact (Bollen, Van de Sompel, Hagberg and Chute, 2009).
Not only is the impact hotly debated but the economic aspects have also received much attention. A major discussion point is the merits of publishing a free version of a paper next to the official version in a journal which is not freely accessible (green open access), versus the merits of directly publishing in an open access journal (gold open access) (Harnad et al., 2004, 2008). Recently, the report Accessibility, Sustainability, Excellence: how to expand access to research publications by Finch et al. was heavily discussed (Finch et al., 2013).
The discussion on the effects of open access on monographs has not attracted the same amount of attention so far, and the amount of available research is small. Apart from running the OAPEN Library, the OAPEN Foundation is currently involved in two pilot projects in the Netherlands and the UK experimenting with open access monograph publishing. The first results of the OAPEN-UK pilot are discussed by Collins and Milloy (2012). In September 2013, the results of the Dutch pilot project were published (Ferwerda, Snijder and Adema, 2013).

Dissemination channels
This paper will focus on a different aspect: dissemination channels. In the literature on open access, dissemination channels seem to be a given. If it is discussed at all, dissemination is described as making papers available in an institutional repository. This paper is the first to analyse the effects of several dissemination channels in an open access environment.
Here we examine the monograph downloads of the OAPEN Library, which was officially launched in September 2010 (OAPEN Consortium, 2011). It is a Web-based collection of monographs, mainly in the field of humanities and social sciences. All books are available in open access and users can search the Website in several ways. Each book also has a unique Web address and can be downloaded directly without searching the Website. These addresses, combined with metadata describing the books, are made available on the OAPEN Website and through several aggregators. This is described in more detail by Snijder (2013a).
This paper examines the download data of the OAPEN Library, which were gathered during a period of six months. The data consist of the number of downloads per month by provider.
Here we define a provider as the organization that grants the user access to the Internet. Furthermore, the data contain information on whether a book was downloaded through the OAPEN Website or directly. Because the data were aggregated monthly, we can distinguish three situations: firstly, a book was downloaded a certain number of times through a provider via the Website only; secondly, a book was downloaded a certain number of times through a provider using the direct download address of the book; thirdly, a book was downloaded a certain number of times through a provider via the Website and also a certain number of times directly. In the last case, the readers related to that provider use a combination of ways to access the book.
It is not unreasonable to assume that each provider caters for several people. In the case where all readers only use the Website or only use direct downloads, their preferences seem to be aligned. If, in the same month, a portion of the readers use the Website and another portion of the readers prefer direct downloads, this may hint at another group configuration. In this case, other aspects of use could also differ, which is why this is analysed separately. Thus, the download data stem from three channels: Website only; Website and direct access; and direct access only.
As the data are available through several channels, it may be useful to investigate the literature on multichannel management. This field looks at the challenges that retailers face in the deployment of multiple channels to reach their customers. While typical research in this field looks at the differences between offline channels such as stores and online channels such as Websites, parts of the theoretical framework could be applied to this paper.
The multichannel management framework is based on theories on the adoption of innovations, explaining if and why people will use new channels. On top of this layer, the specific aspects of working with multiple (retail) channels are discussed. According to Rogers (1995), several factors influence the use of innovations: the relative advantage of the innovation, its fit with existing use patterns, the perceived complexity, the ability to try out the innovation, the perceived risk related to adoption, and the degree to which adoption and use can be observed by others.
The work of Rogers is paired with the technology acceptance model and its extension, technology acceptance model 2. The first model states that perceived usefulness and perceived ease of use are drivers of innovation adoption; technology acceptance model 2 extends this framework to social influence processes (subjective norm, voluntariness, and image) and cognitive instrumental processes (job relevance, output quality, result demonstrability, and perceived ease of use) (Davis, Bagozzi and Warshaw, 1989; Davis, 1989; Venkatesh and Davis, 2000). Neslin et al. identified five key challenges in multichannel management: data integration, understanding customer behaviour, channel evaluation, allocating resources across channels and coordinating channel strategies. In a later paper, the list of relevant aspects had grown to thirteen (Neslin and Shankar, 2009; Neslin, Grewal, Leghorn, Shankar, Teerling, Thomas and Verhoef, 2006). Basically, the questions revolve around whether or not to deploy a multichannel strategy, how to set up different channels, and how to evaluate the results.
What aspects of multichannel management can be used here? Instead of offline versus online channels, we are discussing different online channels. We envision different users with different needs. They are not paying customers, and researching and purchasing in an open access environment are more or less the same action. Searching for information in the field of humanities and social sciences is covered by many authors. Shen discusses the many channels used by social scientists, grouping them into internal and external electronic and paper resources, combined with 'external human resources' (Shen, 2007, p. 8). Bulger et al. discuss humanities scholars' search behaviour through six use cases where scholars employed a range of resources and technologies (Bulger et al., 2011). Wang et al. take an international angle by discussing scholars in the USA, Greece and China (Wang, Dervos, Zhang and Wu, 2007). Griffiths and Brophy focus on students' online search behaviour, and describe the strong preference for search engines, especially Google, compared to the library catalogue or other sources (Griffiths and Brophy, 2005). Lamothe discusses the growing use of e-books in an academic library (Lamothe, 2010).
Channel evaluation also has implications for resource management: the results help to decide where to invest the most time and money. This goes beyond managing information technology systems; it also affects marketing decisions. In short, multichannel management aims to create an optimal strategy in a given environment.
If we combine search behaviour with the decision to use a specific channel, we arrive at the following research question: Does the use based on the channel 'Website only' differ from use based on 'direct access only' or from use based on a combination of those channels? The answer has implications for open access publishing as it may help to optimise the dissemination of open access monographs.
First, the download data are analysed quantitatively by counting the number of downloads per channel. Then, the qualitative analysis tries to answer the question of whether properties of the users, their infrastructure or the properties of the books themselves have a significant impact on the use per channel.

Quantitative analysis
In this section, the data set is described, followed by the number of downloads per channel. The number of monograph downloads is an indication of readership. Whilst we can assume that the more a monograph has been downloaded, the more it has been read, we cannot state that 100 downloads equate to 100 people reading the monograph cover to cover.

The data set
The data set consists of the download data of 979 books, published by thirty-five different publishers. The books are published in ten different languages. By far the largest number of the downloaded books are in English. The 979 monographs in the data set were downloaded 152,662 times in the first six months of 2012. The ratios of the downloads by language are more or less in line with the percentages of published languages. This is discussed in more detail in the qualitative analysis. Appendix 2 contains the complete list of languages.
The table of the ten most downloaded subjects covers only a fraction of all available subjects: the complete data set contains eighty-three different subjects. The classification used is the Book Industry Communication standard subject categories (Book Industry Communication, 2010). As with language, the ratios of the downloads by subject are more or less in line with the percentages of published subjects. The question of whether language or subject has a measurable influence on channel use will be discussed in the qualitative analysis. Appendix 3 contains the complete list of subjects.
The books were accessed through 6,176 different providers, which are based in 166 countries. As stated before, a provider is defined as the organization that grants the user access to the Internet. In some cases, the provider is an organization such as a university or a government agency. In other cases, it is an Internet service provider, such as Comcast in the USA or Ziggo in the Netherlands. The providers will be discussed in more detail in the qualitative analysis.

Downloads by dissemination channel
The downloads were measured per provider, per channel and per month. So, if a provider downloaded the same monograph more than once in the same month, using the same channel, the downloads were added together. In some instances, a provider downloaded a monograph several times a month through the Website and also by direct access. In those cases, the downloads were added to the combined channel Website and direct access. In other instances, a monograph was downloaded only through the Website, or only by direct access. Then the downloads were added to the channels Website only or direct access only respectively.
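The aggregation rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually applied to the OAPEN logs: the record layout (provider, book, month, route, downloads), the route labels and the function name are assumptions made for the example, and the sample records are invented.

```python
from collections import defaultdict

def aggregate_by_channel(records):
    """Assign monthly downloads to one of the three channels.

    records: iterable of (provider, book, month, route, downloads),
    where route is "website" or "direct" (hypothetical layout).
    """
    # Sum downloads per provider, book and month, split by route.
    per_key = defaultdict(lambda: {"website": 0, "direct": 0})
    for provider, book, month, route, downloads in records:
        per_key[(provider, book, month)][route] += downloads

    totals = {
        "Website only": 0,
        "Website and direct access": 0,
        "direct access only": 0,
    }
    for counts in per_key.values():
        if counts["website"] and counts["direct"]:
            channel = "Website and direct access"
        elif counts["website"]:
            channel = "Website only"
        elif counts["direct"]:
            channel = "direct access only"
        else:
            continue  # no downloads recorded for this key
        totals[channel] += counts["website"] + counts["direct"]
    return totals

# Invented sample records, for illustration only.
sample = [
    ("uni-a", "book1", "2012-01", "website", 3),
    ("uni-a", "book1", "2012-01", "direct", 2),
    ("isp-b", "book1", "2012-01", "direct", 4),
    ("uni-a", "book2", "2012-02", "website", 1),
]
totals = aggregate_by_channel(sample)
```

In this sketch the first two records share a provider, book and month, so all five of their downloads are counted in the combined channel.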
Using this procedure, the number of downloads per channel becomes available. The data show that use is dominated by direct access only. This implies that almost three quarters of all downloads come from users who do not use the Website, but find the books by other means. This kind of use is made possible by making the metadata of the books, including a direct download URL, directly available to all interested parties, including libraries and content aggregators. The metadata is licensed under a Creative Commons Zero licence, which makes it free to use under any circumstance. The channel Website and direct access contains a combination of downloads through the Website and direct access.
Here again, the portion of downloads by direct access is larger than the downloads through the Website. It is clear that most readers find the books through routes other than the OAPEN Library Website.
The use data revealed that 24% of the visits to the OAPEN Library Website lead to downloading one or more titles. However, this percentage cannot be compared to the use data of other systems. If 100 OAPEN monographs were downloaded through a library catalogue, how many searches were conducted which did not result in a download taking place? Therefore, we do not know whether the OAPEN Library Website is a more efficient way to search compared to other systems.
We discussed before that multichannel management aims to create an optimal strategy in a given environment. The goal of open access publishing is to remove barriers to access, and it makes sense to investigate how to maximize the dissemination of open access monographs. We saw that the direct access channel is used far more than the other channels and this has serious consequences for managing and optimising the service: from a dissemination point of view it makes more sense to invest in metadata and the dissemination of metadata than to spend resources on the Website. It is important that any system used for open access dissemination is capable of exporting metadata in formats that can be used by content aggregators or the systems used by prospective readers. Apart from library catalogues, search engines may be a much-used research tool, and investing resources in optimal coverage by the likes of Google and Bing may be beneficial.
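As a worked check of the channel shares, the short Python sketch below recomputes them from the channel totals that survive alongside the subject analysis (11,546; 29,453; 111,663; together 152,662). The assignment of each total to a channel is inferred from the reported percentages, and the function name is an assumption made for the example.

```python
# Channel totals as they appear with the subject analysis; the mapping of
# each number to a channel is inferred from the reported percentages.
channel_totals = {
    "Website only": 11546,
    "Website and direct access": 29453,
    "direct access only": 111663,
}

def channel_shares(totals):
    """Return each channel's share of all downloads, in percent (one decimal)."""
    grand_total = sum(totals.values())
    return {channel: round(100 * n / grand_total, 1) for channel, n in totals.items()}

shares = channel_shares(channel_totals)
```

The resulting shares agree with the rounded figures of 8%, 19% and 73% used in the paper.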

Qualitative analysis
The goal of the qualitative analysis is to establish whether the characteristics of the users (including their infrastructure) or of the collection are influential factors on channel use. Firstly, user characteristics are discussed. The download percentages of the quantitative analysis are used as a benchmark, and are compared to the actual values found using an independent t-test. A factor is considered influential if the difference between the use numbers is statistically significant and the effect size is not small.

Characteristics of users and dissemination channels
Readers are placed in several groups: academic; government; business; non-profit organizations and the general public. While academic users could be seen as the main audience for monographs, readers of other backgrounds have equal access to the monographs in the OAPEN Library. The users are categorised based on the data from the OAPEN logs, combined with public data.
The OAPEN Library is a Web-based service, and its logs contain the Web address of the providers. So, if researchers at Leiden University download a book using their office equipment, the Web address of that university will be logged. Basic information such as address and telephone number is publicly available and can be found using the so-called 'WHOIS protocol' (Internet Engineering Task Force, 2004). By combining the use data and information about the provider, we can make assumptions about who is downloading a specific monograph.
A large portion of the providers are not universities or government agencies, but Internet service providers. If the provider is an Internet service provider, the user cannot be linked to an organization. We cannot assume that all use through a service provider comes from people browsing the Internet at home. If the Internet infrastructure in a certain country is highly developed, chances are that each organization is capable of giving direct Internet access to its members. If the Internet infrastructure is less well developed, a large portion of the organizations in that country do not directly provide Internet access but rely on the services of an Internet service provider.
Of course, it is always possible that 'service provider users' from a country with a highly developed Internet infrastructure are in fact academics working from home after office hours. The available data do not contain the (local) time of the download, which makes it impossible to determine whether a reader is downloading during office hours. Furthermore, if the reader is not acting in a professional capacity, the chances are also higher that the download started after office hours. The difference in access to scholarly and scientific literature for academics compared to others is quite large; using the credentials of the academic institution allows direct access to all kinds of literature behind pay walls. It might therefore be more efficient to use these credentials not only at the office, but also after office hours.
If we want to divide Internet service provider use into those two categories, we need a way to determine the state of a country's infrastructure. This is done by using a World Bank publication: The little data book on information and communication technology (World Bank, 2011). It lists several statistics for each country, one of which is the number of Internet users per 100 people. If there is a connection between the state of the infrastructure and the percentage of downloads through service providers, the percentage of service provider use should be lower for highly developed Internet infrastructures.
This assumption was tested by charting the measured downloads from thirty countries and the percentage of service provider use. The values found were set against the number of Internet users per 100 people. Because the country of each provider is known, it was possible to select the countries with the highest number of downloads. The selected thirty countries are responsible for almost 92% of all downloads.
Figure 1 depicts the percentage of downloads through an Internet service provider, sorted by the number of Internet users per 100 people. In this chart we see that there is a trend toward a higher percentage of downloads through a service provider when the number of Internet users per 100 people decreases. Figure 2 depicts the number of Internet users per 100 people. Here we see a decrease from 91.8 users in Norway to 5.3 users in India. Somewhere between these two extremes we need to set a cut-off point to determine which countries have a highly developed Internet infrastructure. For the countries above the cut-off point, the chances are higher that Internet service provider use comes from non-professional users. This distinction is used in the qualitative analysis, to determine whether the Internet infrastructure influences downloads through the different channels. From the data above, the first abrupt change in Internet users is found between Switzerland with 70.9 Internet users and Hungary with 61.6 Internet users per 100 people.
Based on this, the threshold is set to seventy Internet users per 100 people. Countries with seventy or more Internet users per 100 people are considered to have a highly developed infrastructure. The same threshold is also used in Snijder (2013b).
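The cut-off rule can be written down as a small Python sketch. The threshold of seventy and the four country figures come from the text above; the function and variable names are assumptions made for the example, and only the countries mentioned in the text are included.

```python
HIGH_INFRASTRUCTURE_THRESHOLD = 70.0  # Internet users per 100 people

def is_highly_developed(users_per_100):
    """True if a country meets the threshold of seventy users per 100 people."""
    return users_per_100 >= HIGH_INFRASTRUCTURE_THRESHOLD

# Internet users per 100 people for the four countries named in the text.
internet_users = {
    "Norway": 91.8,
    "Switzerland": 70.9,
    "Hungary": 61.6,
    "India": 5.3,
}

classification = {
    country: is_highly_developed(value)
    for country, value in internet_users.items()
}
```

Under this rule Switzerland falls just above the threshold and Hungary below it, which mirrors the abrupt change described in the text.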

Type of users and dissemination channels
Now we can look at the download percentages of the different user groups. The number of downloads differs widely by channel and, therefore, there is a large difference in the absolute number of downloads by each group. For instance, the number of downloads by academic readers through the direct access only channel is almost seven times the number of academic downloads through the Website only.
Is there a connection between user type and dissemination channel? Regardless of the channel, most of the use comes from three groups: academic, Internet service provider and Internet service provider high Internet use. As academics are the intended audience for monographs, it is not very surprising to see a large proportion of use that originates from academic institutions. Furthermore, the academic community is large. As discussed before, it was not possible to determine whether the role of users in the group 'Internet service provider' was academic or otherwise. The members of the group 'Internet service provider high Internet use' are more likely to be non-professional users. Based on that we might conclude that disseminating open access books helps to make scholarly content available to the public. In all channels, the use by non-profit, government and business organizations is small compared to that of academic and Internet service provider-related use.
From the quantitative analysis it becomes clear that 8% of the use comes from the channel Website only, 19% from the channel Website and direct access, and 73% through direct access only. We can use these percentages as a baseline for the expected downloads for each user group, and compare them to the actual number of downloads by channel. Using the difference between those amounts, expressed as the percentage of the expected value, we find no significant effect for user type: t(17) = -0.541, p = 0.595. Based on the lack of significant differences on channel use, we can conclude that the type of user plays a minimal role in channel use.
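The comparison of actual and expected downloads can be illustrated with a minimal Python sketch. Several points are assumptions rather than details given in the paper: the expected value for a group is taken to be the baseline share multiplied by that group's total downloads, the test is written as a one-sample t statistic of the percentage deviations against zero, and the group figures are invented. The paper itself reports an independent t-test computed in SPSS without describing the exact set-up.

```python
import math
import statistics

# Baseline shares per channel, taken from the quantitative analysis.
BASELINE = {
    "Website only": 0.08,
    "Website and direct access": 0.19,
    "direct access only": 0.73,
}

def percentage_deviations(actual_by_group):
    """Deviation of actual from expected downloads, as % of the expected value.

    Assumption: expected = baseline share * total downloads of the group.
    """
    deviations = []
    for actual in actual_by_group.values():
        group_total = sum(actual.values())
        for channel, share in BASELINE.items():
            expected = share * group_total
            deviations.append(100 * (actual[channel] - expected) / expected)
    return deviations

def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error, n - 1

# Invented download counts for two hypothetical user groups.
hypothetical = {
    "group A": {"Website only": 80, "Website and direct access": 190, "direct access only": 730},
    "group B": {"Website only": 100, "Website and direct access": 190, "direct access only": 710},
}
deviations = percentage_deviations(hypothetical)
t_value, degrees_of_freedom = one_sample_t(deviations)
```

A p-value would still have to be looked up from the t distribution with the returned degrees of freedom, for example in a statistical package.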

Characteristics of Internet infrastructure
Dividing countries into those with a highly developed and those with a less well developed Internet infrastructure is not only useful to differentiate between user groups but is, in itself, also a possible influence on channel use. We might expect that readers from countries with a highly developed infrastructure have different download patterns compared to those with more limited bandwidth. Appendix 1 lists the countries with highly developed infrastructure.
When we look at overall use, not taking into account the different channels, the difference between the two groups is clear: the number of downloads from countries with a highly developed infrastructure is more than twice the number of those from the rest of the world.
The same percentages as before are used as a baseline for the expected downloads, and again those numbers are compared to the actual number of downloads per channel. Using the difference between those amounts, expressed as the percentage of the expected value, we find no significant effect for Internet infrastructure: t(5) = -0.418, p = 0.639. Based on the lack of significant differences on channel use, we can conclude that Internet infrastructure plays a minimal role in channel use.

Characteristics of content and dissemination channels
Is there a connection between characteristics of the content, the monographs, and dissemination channels? In this section we examine two aspects: subject and language. Not all languages or subjects will be analysed: the three most downloaded languages and ten most downloaded subjects are examined.

Language and dissemination channels
It seems obvious that language influences the use of the monographs, as readers are unlikely to download a book in a language they cannot read. The high use of monographs in the English language is directly visible, but we have to take into account the large number of books available in that language. The question is whether language use differs significantly from expected values.
In the description of the data set, we saw that 52.6% of the books were written in English, 16.7% in German and 12.9% in Dutch. If we apply these percentages to the number of downloads per dissemination channel, we can compute the expected values. Using the difference between those amounts, expressed as the percentage of the expected value, we find no significant effect for language: t(11) = -1.229, p = 0.245. Based on the lack of significant differences on channel use, we can conclude that language of the monographs does not play a role in channel use. Still, the percentage of downloads of English language books through the Website is relatively high, and this raises the question of whether users primarily search using English terms. To test this, a small sample was analysed. Of all queries in one month, a list was created of searches that occurred at least twice. This created a set of 2,219 different queries. The percentage of 'non-English' queries was more than 51%. Nevertheless, this group also contained search terms that exist not only in the English language, but also in Dutch and German. If we analyse this group, five ambiguous terms account for more than 62% of queries: film; water; IMISCOE; Iran; Islam. So, a large percentage of all the examined queries are at least compatible with English. It is therefore safe to assume that most searches are indeed in English, which would partly explain the results. The large number of available English language books might be another factor.
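The expected values mentioned at the start of this subsection can be computed as in the Python sketch below. The language percentages are those quoted in the text; the channel totals are the figures that survive with the subject analysis, with the channel assignment inferred from the reported shares. The function name and data layout are assumptions made for the example.

```python
# Share of books per language, as quoted in the text.
LANGUAGE_SHARE = {"English": 0.526, "German": 0.167, "Dutch": 0.129}

# Channel totals; the mapping to channels is inferred from the reported shares.
CHANNEL_TOTALS = {
    "Website only": 11546,
    "Website and direct access": 29453,
    "direct access only": 111663,
}

def expected_downloads(language_share, channel_totals):
    """Expected downloads per language and channel: share * channel total."""
    return {
        language: {channel: share * total for channel, total in channel_totals.items()}
        for language, share in language_share.items()
    }

expected = expected_downloads(LANGUAGE_SHARE, CHANNEL_TOTALS)
```

For example, under these assumptions roughly 6,073 of the 11,546 Website only downloads would be expected to be English language books; the actual counts per language and channel are not reproduced here.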

Subject and dissemination channels
The last aspect to analyse is the subject of the monographs. Are the users of the OAPEN Library interested in certain subjects or do the download patterns closely follow the spread of subjects amongst the books? We have found the percentages of titles with a certain subject in the quantitative analysis. The expected number of downloads per channel is computed by applying these percentages to the number of books downloaded per channel, and the actual number of downloads is compared against the benchmark values. Using the difference between those amounts, expressed as the percentage of the expected value, we find no significant effect for subject: t(32) = 1.507, p = 0.142. Based on the lack of significant differences on channel use, we can conclude that subject does not play a role in channel use.
Total downloads by channel: Website only 11,546; Website and direct access 29,453; direct access only 111,663; all channels 152,662.
However, when a dissemination channel is used more (the channel direct access only is used for 73.1% of all downloads, while use through the Website only is 7.6%) the number of subjects also grows. This is illustrated by the fact that the ten subjects listed here cover almost 74% of all downloads occurring through the Website only. In contrast, the percentages drop for the other channels to 63.8%.

Conclusions
This paper is the first to analyse the effects of several dissemination channels in an open access environment. Its goal is to help determine an optimal strategy to achieve maximum distribution of open access monographs. The books are made available via the OAPEN Library Website, by direct downloads or a combination of those two. It is interesting to note that a large proportion of readers who directly download the monographs do not use the Website; they have found the description of the books by other means.
From the quantitative analysis, the dominance of one channel is clear. The data show that 73% of all downloads occurred by the channel direct download. This implies that almost three quarters of downloads come from users who do not use the Website, but find the books through other systems or Websites.
The qualitative analysis revealed that regardless of the channel, most use comes from three groups: academic, Internet service provider and Internet service provider high Internet use. Other user groups, business, government and non-profit, are not highly represented. When looking at the use by group, no effect on channel use could be established. The Internet infrastructure is another factor that was taken into account.
While the digital divide between users from countries with a highly developed Internet infrastructure and users from less well-off countries is very clear, no effect on channel use could be found. The same holds true for the aspects of the books themselves: the analysis could not find any effect on channel use for either the language or the subjects of the monographs.
The goal of multichannel analysis is to determine the optimal use of resources: What configuration leads to the best results? The definition of best results in an open access environment differs from a commercial environment. The objective is not financial gain, but maximum dissemination. In the OAPEN Library, readers can access books through three channels. First, the Website, which is optimised for search: it not only contains metadata, but also enables full text search. Furthermore, it contains browsing functions as a means to enable serendipity. In contrast to this, the direct download channel functions in a different way. It is based on metadata only, which is incorporated into systems outside the OAPEN Library. Full-text search on the contents of the books is not possible. The third channel is a combination of both.
The results show that most readers are using the direct download channel, despite the fact that the OAPEN Library Website offers functions that are not available through other channels. A possible explanation may be found in the theoretical models on the use of innovations discussed in the introduction. There we saw several factors influencing the use of new systems, such as its fit with existing use patterns, perceived ease of use and social norms. It is possible that users of the direct download channel prefer their own systems, which are familiar and are part of their routine and environment. In that case, learning to use a new interface may not be seen as a worthwhile investment.
But who are the principal users of the OAPEN Library? The analysis revealed that current users are based in academic institutions or use an Internet service provider. Users based in businesses, governmental or non-profit organizations are far less common. Also, the digital divide between upcoming countries and the developed countries is a large factor: two-thirds of the downloads occurred from countries with a highly developed Internet infrastructure. And although the OAPEN Library contains books in German, Dutch, Italian and other languages, the majority of the books, and the majority of the readers, use English.
How does this compare to the goal of maximum dissemination? A recurring theme in the discussion on open access is making scientific and scholarly results available to members of academia who cannot access the information behind a pay wall. Seen from that perspective, the current situation is quite a success: academic institutions are responsible for a large portion of the downloads. However, when we look at other possible patrons, the picture is less rosy. In the collection of the OAPEN Library, the subjects politics and government, society and culture, and sociology and anthropology are well covered. Those books may contain useful information for governmental organizations, for instance in the field of immigration studies, which is a much debated topic in Europe and North America. Nevertheless, there is not much use from governmental organizations, nor from non-profit organizations. Does the form, i.e., monographs, not fit within the informational habits of those potential users, or is the OAPEN collection not embedded in the information systems used?
When we compare the use from countries with a highly developed Internet infrastructure to the use from the rest of the world, the difference is striking. The first group contains only twenty-seven countries, yet it accounts for more than twice as many downloads as the rest of the world. Here we see that making books freely available does not automatically take away other barriers to access.
The language of the publications may be another issue to research. More than half of the analysed books are written in English, and the download percentage of English language books is also roughly 50%. It is possible that the overall use is at least partly shaped by the number of books available in a certain language. In other words, if the collection contained a larger percentage of monographs in other languages, for instance French, Spanish or Portuguese, how might that affect the use?
The results imply that making the metadata available in the users' own systems, the infrastructure they use on a daily basis, ensures the best results. So, to achieve the optimum amount of use, first we must identify users who are not using the data, secondly we have to understand how they search for information and thirdly we have to establish the best way to make our data available. Researching those questions would bring the goal of maximum dissemination a little closer. These challenges are not only faced by the OAPEN Foundation, but are shared by all organizations that disseminate open access publications or data.

Figure 1: Percentage of downloads through an Internet service provider

Figure 2: Number of Internet users per 100 people