The "BEAGLE vectors" folder contains the BEAGLE vectors as Fortran unformatted files, along with supporting word lists. These include:
- word_list.txt: Complete list of words
- ranked_stop_list.txt: All of the words that were not included when compiling the semantic vectors.
- itemNstop396N1024.unformatted: All of the item vectors in Fortran unformatted format.
- orderNstop396N1024.unformatted: All of the order vectors in Fortran unformatted format.
- memNstop396N1024.unformatted: All of the memory vectors in Fortran unformatted format. These are sums of the item and order vectors and were not used in the paper.
- visual.unformatted: Randomly generated environmental vectors for each word.
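The .unformatted files above are record-based Fortran binaries, which SciPy can read directly. The sketch below writes and reads a small file of the same general kind with scipy.io.FortranFile; the exact record layout of the real files (one vector per record, double precision) is an assumption here, as are the file and variable names.

```python
import numpy as np
from scipy.io import FortranFile

# Assumed layout: one double-precision vector per Fortran record.
dim = 1024
vectors = np.random.randn(3, dim)

# Write a small demo file the way Fortran unformatted I/O would.
with FortranFile('demo.unformatted', 'w') as f:
    for v in vectors:
        f.write_record(v)

# Read the records back, one vector at a time.
loaded = []
with FortranFile('demo.unformatted', 'r') as f:
    for _ in range(3):
        loaded.append(f.read_reals(dtype=np.float64))
loaded = np.vstack(loaded)

assert np.allclose(loaded, vectors)
```

If the real files store single-precision reals or a different record structure, the dtype and read loop would need to change accordingly.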
The "Datasets" folder contains each of the datasets from the experiment in .csv format. All of the individual trials are included, and each trial's global similarity values from BEAGLE are included as additional columns. A few notes:
- Mean or max similarity is noted in the column title (e.g., "meanContextCos" = mean item similarity, "maxContextCos" = maximum item similarity).
- Order similarity includes global similarity calculations from both the convolution method and the permutation method. The "OrderCos" columns are derived from the convolution method, while the "OrderP5Cos" columns are derived from the permutation method.
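For readers who want to relate the similarity columns back to the vectors, a global similarity of this kind is typically the mean or maximum cosine between a probe vector and each study-list vector. The sketch below illustrates that calculation; the function and variable names are hypothetical, not taken from the model code.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def global_similarity(probe, study_vectors):
    """Mean and max cosine between a probe and each study-list vector,
    in the spirit of the 'meanContextCos' / 'maxContextCos' columns
    (names here are illustrative assumptions)."""
    cosines = [cosine(probe, v) for v in study_vectors]
    return float(np.mean(cosines)), float(np.max(cosines))

rng = np.random.default_rng(0)
probe = rng.standard_normal(1024)
study = rng.standard_normal((6, 1024))
mean_cos, max_cos = global_similarity(probe, study)
```

The same function applies whether the vectors are item vectors or order vectors; only the input matrices differ.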
The "Model code" folder contains a .zip file with all of the files needed to run DE-MCMC with the BEAGLE-DDM model from the paper.
The attached GitHub repo contains the code used to construct the vectors. It is based on the BEAGLE model of Jones and Mewhort (2007), together with an alternative binding approach using random permutations (Sahlgren, Holst, & Kanerva, 2008).
...
Note that parts of the code may be redundant due to a change of approach. For instance, instead of shifting vectors on the fly when binding via random permutations, I later changed the code to pre-compute the permutation vectors and use NumPy's vector indexing to speed up the process. I have tested the code against typical examples to ensure correctness, but you may notice something I missed.
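The pre-computed-permutation trick mentioned above can be sketched as follows: each relative position gets a fixed index permutation, and binding is then a single fancy-indexing operation rather than a shift. The position set and names below are illustrative, not the repo's actual API.

```python
import numpy as np

dim = 1024
rng = np.random.default_rng(42)

# Pre-compute one fixed permutation per relative position (e.g. -2..+2);
# the choice of positions here is an illustrative assumption.
perms = {pos: rng.permutation(dim) for pos in (-2, -1, 1, 2)}

def bind(vec, pos):
    # Applying the position's permutation via NumPy fancy indexing
    # marks 'vec' as occurring at relative position 'pos'.
    return vec[perms[pos]]

v = rng.standard_normal(dim)
bound = bind(v, 1)

# Permutation binding is invertible: argsort of the index array
# gives the inverse permutation, recovering the original vector.
inv = np.argsort(perms[1])
assert np.allclose(bound[inv], v)
```

Because the index arrays are computed once and reused, binding costs one gather per vector instead of repeated rolls or shifts.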
Below are links to different vector files:
Open the CSR-formatted vectors using SciPy's sparse library (this requires three files, data, indices, and indptr, plus the dimensionality).
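Reassembling a CSR matrix from its three component arrays looks like the sketch below. The tiny arrays are a toy illustration, and the commented-out file names are hypothetical; only the (data, indices, indptr) constructor pattern and the stated dimensionality come from this document.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy CSR components: two rows, three columns.
data = np.array([1.0, 2.0, 3.0])     # non-zero values
indices = np.array([0, 2, 1])        # column index of each value
indptr = np.array([0, 2, 3])         # row i spans data[indptr[i]:indptr[i+1]]

mat = csr_matrix((data, indices, indptr), shape=(2, 3))

# For the downloaded vectors, the same pattern applies, e.g.:
# data = np.load('context_data.npy')        # hypothetical file names
# indices = np.load('context_indices.npy')
# indptr = np.load('context_indptr.npy')
# mat = csr_matrix((data, indices, indptr), shape=(10000, 39076))
```

The shape argument is why the dimensionality is listed alongside the three files: CSR arrays alone do not fully determine the number of columns.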
NOVELS (Random Permutation)
>> The corpus is too large to upload to OSF (about 700 MB), but is available through:
https://cloudstor.aarnet.edu.au/plus/s/N7Q8koKZuBLtxyP
>> CSR format
Dimensionality: 10000x39076
Vocab: https://cloudstor.aarnet.edu.au/plus/s/vdiS6UVHAuP3PkH
Context:
  indices: https://cloudstor.aarnet.edu.au/plus/s/IvqXLuGOaIVtwYy
  indptr: https://cloudstor.aarnet.edu.au/plus/s/4AReRC4thXs0DwO
  data: https://cloudstor.aarnet.edu.au/plus/s/EMSS3cav3ye5IDT
Environment:
  indices: https://cloudstor.aarnet.edu.au/plus/s/SaXSPzerPJ9piOA
  indptr: https://cloudstor.aarnet.edu.au/plus/s/hOO8cQ84sMketKq
  data: https://cloudstor.aarnet.edu.au/plus/s/V5bG5o4RgCRvBFK
Order:
  indices: https://cloudstor.aarnet.edu.au/plus/s/Tz8B27SXbNHJn3T
  indptr: https://cloudstor.aarnet.edu.au/plus/s/Q3rdyLp7cuZVSRU
  data: https://cloudstor.aarnet.edu.au/plus/s/dBapH13kMjhH7zt
TASA (Holographic)
Environment: https://cloudstor.aarnet.edu.au/plus/s/l6n0N8aaOyXs6PU
Order (window size of 5): https://cloudstor.aarnet.edu.au/plus/s/khCQrKpDl8l7Ggw
Context (window size of 50): https://cloudstor.aarnet.edu.au/plus/s/1uxFX0dkE9xAwVz
Vocab: https://cloudstor.aarnet.edu.au/plus/s/pGdFksizXBR1e4w