This dataset comprises the data gathered for and created during the work on the paper [Word Vector Embeddings and Domain Specific Semantic based Semi-Supervised Ontology Instance Population][1]. In addition to the files provided here, the paper uses the large legal corpus [created by an earlier study][2], from which it takes a set of raw cases, and [a small legal ontology created by another study][3]. That ontology is not included in this dataset; please download it from [the dataset of the original paper][3]. The domain-specific semantics are based on the models built by the [large legal corpus study][2].

This dataset contains the class instances produced by the proposed models, gazetteer lists of legal words, and the result vectors.

1. **Class instances by 5 models:** These are the instances to be used to populate the classes in the ontology according to the 5 proposed models.
2. **Legal words:** This is a set of gazetteer lists of words in the legal domain, prepared with the help of a legal professional.
3. **Results:** The result vectors obtained by the [above paper][1].

----------

For convenience, the abstract of the paper [Word Vector Embeddings and Domain Specific Semantic based Semi-Supervised Ontology Instance Population][1] is given below.

*An ontology defines a set of representational primitives which model a domain of knowledge or discourse. With the arising fields such as information extraction and knowledge management, the role of ontology has become a driving factor of many modern day systems. Ontology population, on the other hand, is an inherently problematic process, as it needs manual intervention to prevent the conceptual drift. The semantic sensitive word embedding has become a popular topic in natural language processing with its capability to cope with the semantic challenges. Incorporating domain specific semantic similarity with the word embeddings could potentially improve the performance in terms of semantic similarity in specific domains. Thus, in this study we propose a novel way of semi-supervised ontology population through word embeddings and domain specific semantic similarity as the basis. We built several models including traditional benchmark models and new types of models which are based on word embeddings. Finally, we ensemble them together to come up with a synergistic model which outperformed the candidate models by 33% in comparison to the best performed candidate model.*

[1]: https://goo.gl/g65v4C
[2]: https://osf.io/qvg8s/
[3]: https://osf.io/zsp8e/
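To make the idea in the abstract more concrete, the following is a minimal sketch of how embedding similarity could be used to rank candidate words for an ontology class. It is not the paper's method: the function names (`cosine`, `centroid`, `rank_candidates`), the 3-dimensional toy vectors, and the example words are all hypothetical, and the file formats of this dataset are not assumed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_candidates(seed_vectors, candidates):
    """Rank candidate words by similarity to the centroid of a class's seed vectors."""
    center = centroid(seed_vectors)
    scored = [(word, cosine(vec, center)) for word, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): two seed instances of one class and two candidate words.
seeds = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
candidates = {
    "plaintiff": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.9],
}

ranking = rank_candidates(seeds, candidates)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In a semi-supervised setting, the top-ranked candidates would typically be reviewed before being added to a class, which is one way to limit the conceptual drift the abstract mentions.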