# [dict_info.csv][1] variables

### **short**

Short name, corresponding to the file name (`{short}.(csv|dic)`) and the wiki URL (`https://osf.io/y6g5b/wiki/{short}`).

### **name**

Display name.

### **description**

Brief description, potentially including information about the dictionary's construction.

### **note**

Notes about standardization steps that changed the terms or categories of the original dictionary.

### **constructor**

How the dictionary was constructed:

- **algorithm**: Terms were selected by an automated process, potentially learned from data or other resources.
- **crowd**: Several individuals rated the terms; in aggregate, those ratings translate to categories and weights.
- **mixed**: Some combination of the other methods, usually in an iterative process.
- **team**: One or more individuals make decisions about term inclusion, categories, and weights.

### **subject**

Broad, rough subject or purpose of the dictionary:

- **emotion**: Terms relate to emotions, potentially exemplifying or expressing them.
- **general**: A large range of categories, aiming to capture the content of the text.
- **impression**: Terms are categorized and weighted based on the impression they might give.
- **language**: Terms are categorized or weighted based on their linguistic features, such as part of speech, specificity, or area of use.
- **social**: Terms relate to social phenomena, such as characteristics or concerns of social entities.

### **terms**

Number of unique terms across categories.

### **term_type**

Format of the terms:

- **glob**: Includes asterisks, which denote acceptance of any characters until a word boundary.
- **glob+**: Glob-style asterisks with regular expressions within terms.
- **ngram**: Any number of words as a term, separated by spaces.
- **pattern**: A string of characters, potentially within or between words, or spanning words.
- **regex**: Regular expressions.
- **stem**: Unigrams with common endings removed.
- **unigram**: Complete single words.

### **weighted**

Logical value indicating whether values (weights) are associated with terms. This dictates the file format: `.csv` when weighted, and `.dic` when unweighted.

### **regex_characters**

Logical value indicating whether special regular expression characters are present in any term. Such characters might need to be escaped if the terms are used in regular expressions. In glob-type terms, complete parentheses (at least one opening and one closing parenthesis, indicating preceding or following words) and word-initial and word-terminal asterisks are allowed and are not counted. For all other terms, `[](){}*.^$+?\|` are counted as regex characters. These could be escaped in R with `gsub('([][)(}{*.^$+?\\\\|])', '\\\\\\1', terms)` if `terms` is a character vector, and in Python (after `import re`) with `[re.sub(r'([][(){}*.^$+?\\|])', r'\\\1', term) for term in terms]` if `terms` is a list.

### **categories**

Category names, in the order in which they appear in the dictionary file, separated by commas.

### **ncategories**

Number of categories.

### **original_max**

The maximum value across categories before standardization, where standardized values are `original values / max(original values) * 100`.

### **osf**

Open Science Framework ID, corresponding to the dictionary file's URL (`https://osf.io/download/{osf}`).

[1]: https://osf.io/kjqb8
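The escaping one-liners for `regex_characters` can be wrapped into a small, testable Python sketch. This sketch assumes `terms` is a list of strings. Unlike the `regex_characters` flag, it treats every term as non-glob and does not apply the glob-term exceptions. `has_regex_characters` and `escape_terms` are hypothetical helper names and are not part of any published package.

```python
import re

# Listed regex characters for non-glob terms: [ ] ( ) { } * . ^ $ + ? \ |
# Inside the class, a leading "]" is literal and "\\" matches a backslash.
REGEX_CHARACTERS = re.compile(r'([][(){}*.^$+?\\|])')


def has_regex_characters(terms):
    """Return True if any term contains one of the listed regex characters.

    Assumption: every term is treated as non-glob, so complete parentheses
    and word-initial/terminal asterisks in glob terms are still counted here.
    """
    return any(REGEX_CHARACTERS.search(term) for term in terms)


def escape_terms(terms):
    """Prefix each listed regex character with a backslash."""
    return [REGEX_CHARACTERS.sub(r'\\\1', term) for term in terms]


if __name__ == "__main__":
    terms = ["a.b", "(hi)", "plain"]
    print(has_regex_characters(terms))  # True
    escaped = escape_terms(terms)
    print(escaped)  # ['a\\.b', '\\(hi\\)', 'plain']
    # An escaped term now matches only its literal text.
    print(bool(re.fullmatch(escaped[0], "a.b")), bool(re.fullmatch(escaped[0], "axb")))  # True False
```

Python's built-in `re.escape` is an alternative. It escapes a broader set of characters than the list above (for example `-`, `#`, and whitespace), which is harmless when the goal is simply to match terms literally.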
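The standardization behind `original_max` (`original values / max(original values) * 100`) is also reversible: multiplying a standardized value by `original_max / 100` recovers the original scale. The Python sketch below shows both directions. It assumes weights are held in a hypothetical in-memory mapping from category name to a non-empty list of numbers, and that the overall maximum is positive. That assumption is the sketch's own choice, made so the largest value maps to 100.

```python
def standardize(weights_by_category):
    """Rescale weights as original / max(original) * 100.

    The maximum is taken across all categories and is returned with the
    result, mirroring the `original_max` column.
    """
    original_max = max(max(ws) for ws in weights_by_category.values())
    if original_max <= 0:
        # Assumption of this sketch: a positive maximum, so the result peaks at 100.
        raise ValueError("maximum weight must be positive")
    standardized = {
        category: [w / original_max * 100 for w in ws]
        for category, ws in weights_by_category.items()
    }
    return standardized, original_max


def unstandardize(standardized, original_max):
    """Recover original-scale weights from standardized ones and `original_max`."""
    return {
        category: [w * original_max / 100 for w in ws]
        for category, ws in standardized.items()
    }


if __name__ == "__main__":
    scaled, top = standardize({"a": [1, 2], "b": [4]})
    print(scaled, top)  # {'a': [25.0, 50.0], 'b': [100.0]} 4
    print(unstandardize(scaled, top))  # {'a': [1.0, 2.0], 'b': [4.0]}
```

Negative weights keep their sign under this formula. For example, `-2` with a maximum of `8` becomes `-25.0`.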
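Several columns combine to locate a dictionary. `short` plus `weighted` gives the file name, `short` gives the wiki page, and `osf` gives the download URL. The Python sketch below derives these from one row of `dict_info.csv` and also checks `categories` against `ncategories`. It makes three assumptions. Logical columns are written R-style (`TRUE`/`FALSE`). Category names contain no commas. The sample row values (`example`, `abc12`) are hypothetical. `parse_logical` and `describe_row` are illustrative names only.

```python
import csv
import io

WIKI_URL = "https://osf.io/y6g5b/wiki/{short}"
DOWNLOAD_URL = "https://osf.io/download/{osf}"


def parse_logical(value):
    """Interpret a logical cell; assumes R-style TRUE/FALSE (case-insensitive)."""
    text = str(value).strip().upper()
    if text in ("TRUE", "T", "1"):
        return True
    if text in ("FALSE", "F", "0"):
        return False
    raise ValueError(f"not a logical value: {value!r}")


def describe_row(row):
    """Derive file name, URLs, and category list from one dict_info.csv row."""
    short = row["short"]
    weighted = parse_logical(row["weighted"])
    # Weighted dictionaries are stored as .csv, unweighted ones as .dic.
    file_name = f"{short}.{'csv' if weighted else 'dic'}"
    # Assumes category names themselves contain no commas.
    categories = [c.strip() for c in row["categories"].split(",")]
    if len(categories) != int(row["ncategories"]):
        raise ValueError(f"{short}: categories do not match ncategories")
    return {
        "file_name": file_name,
        "wiki_url": WIKI_URL.format(short=short),
        "download_url": DOWNLOAD_URL.format(osf=row["osf"]),
        "categories": categories,
    }


if __name__ == "__main__":
    # Hypothetical sample; the real file has more columns, which DictReader ignores here.
    sample = (
        "short,weighted,categories,ncategories,osf\n"
        'example,TRUE,"positive,negative",2,abc12\n'
    )
    for row in csv.DictReader(io.StringIO(sample)):
        print(describe_row(row))
```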