This dataset comprises data gathered for and created during the work on the paper [Legal Document Retrieval using Document Vector Embeddings and Deep Learning][1]. In addition to the files provided here, the paper uses the large legal corpus [created by an earlier study][2], from which it takes a set of raw cases. This dataset also contains a mention map, an edge list, and the output of legal text ranking.

1. **Text Corpus and Map:** This corpus contains 2,500 cases extracted from a [large corpus of legal cases from the United States Supreme Court][3]. The case files are accompanied by a *mention map*, which indicates which cases cite which other cases within the corpus.
2. **Edge List:** This is the edge list of the citation graph generated from the mention map above.
3. **Outputs:** Finally, the results obtained by the [above paper][1] are included in text rank form and as a serialized file.

----------

For convenience, the abstract of the paper [Legal Document Retrieval using Document Vector Embeddings and Deep Learning][5] is given below.

*Domain specific information retrieval process has been a prominent and ongoing research in the field of natural language processing. Many researchers have incorporated different techniques to overcome the technical and domain specificity and provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy coupling of domain experts, that makes the entire process to be time consuming and cumbersome. In this study, we have developed three novel models which are compared against a golden standard generated via the on line repositories provided, specifically for the legal domain. The three different models incorporated vector space representations of the legal domain, where document vector generation was done in two different mechanisms and as an ensemble of the above two. This study contains the research being carried out in the process of representing legal case documents into different vector spaces, whilst incorporating semantic word measures and natural language processing techniques. The ensemble model built in this study, shows a significantly higher accuracy level, which indeed proves the need for incorporation of domain specific semantic similarity measures into the information retrieval process. This study also shows, the impact of varying distribution of the word similarity measures, against varying document vector dimensions, which can lead to improvements in the process of legal information retrieval.*

[1]: https://goo.gl/ahZFF8
[2]: https://osf.io/qvg8s/
[3]: https://osf.io/qvg8s/
[4]: https://goo.gl/ahZFF8
[5]: https://goo.gl/ahZFF8
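
As an illustration of how the edge list relates to the mention map described above, here is a minimal Python sketch. It assumes the mention map can be loaded as a dictionary from a citing case identifier to the identifiers of the cases it cites. That in-memory layout, the function name `mention_map_to_edges`, and the case identifiers are hypothetical, since this page does not specify the file formats, and dropping self-citations is likewise an assumption rather than something the dataset documents.

```python
from typing import Dict, Iterable, List, Tuple


def mention_map_to_edges(
    mention_map: Dict[str, Iterable[str]]
) -> List[Tuple[str, str]]:
    """Flatten {citing: [cited, ...]} into sorted, de-duplicated (citing, cited) pairs."""
    edges = set()
    for citing, cited_cases in mention_map.items():
        for cited in cited_cases:
            # Assumption: self-citations are not meaningful edges, so skip them.
            if cited != citing:
                edges.add((citing, cited))
    return sorted(edges)


# Hypothetical case identifiers, for illustration only.
example_map = {
    "case_001": ["case_002", "case_003"],
    "case_002": ["case_003"],
    "case_003": [],
}

edges = mention_map_to_edges(example_map)
for citing, cited in edges:
    # One directed edge per line: citing case, then cited case.
    print(f"{citing}\t{cited}")
```

Each printed line is one directed edge of the citation graph, pointing from the citing case to the cited case. The actual files in this dataset may use a different layout, so check them before adapting the sketch.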