This repo includes supplementary data for the paper ["A Grounded Typology of Word Classes"][1]. Here you'll find word-level groundedness measures based on PaliGemma for the COCO-35L, Multi30k, and Crossmodal-3600 datasets. POS tagging is based on [Stanza][2].

If you wish to compute groundedness scores on your own data, you'll need the [`paligemma-3b-ft-coco35l-224`][3] checkpoint from Hugging Face for the captioning model. For the language model, please use our trained, comparable model: [`chaley22/pali-captioning-lm-nolora`][4].

The model covers the 35 languages in COCO-35L:

- Arabic (ar)
- Bengali (bn)
- Czech (cs)
- Danish (da)
- German (de)
- English (en)
- Spanish (es)
- Persian (fa)
- Finnish (fi)
- French (fr)
- Hebrew (he)
- Hindi (hi)
- Croatian (hr)
- Hungarian (hu)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Norwegian (no)
- Dutch (nl)
- Polish (pl)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Swedish (sv)
- Swahili (sw)
- Maori (mi)
- Telugu (te)
- Thai (th)
- Turkish (tr)
- Ukrainian (uk)
- Vietnamese (vi)
- Chinese (zh)

[1]: https://arxiv.org/abs/2412.10369
[2]: https://stanfordnlp.github.io/stanza/
[3]: https://huggingface.co/google/paligemma-3b-ft-coco35l-224
[4]: https://huggingface.co/chaley22/pali-captioning-lm-nolora
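If you are computing scores on your own data, the sketch below shows one way to turn per-token log-probabilities from the two models into word-level scores. It assumes that groundedness is the log-probability of a word under the captioning model minus its log-probability under the language model, summed over the word's subword tokens; please check the paper for the exact definition. The function name `word_groundedness` and its arguments are hypothetical and are not part of this repo. Obtaining the log-probabilities from the two checkpoints is left out.

```python
from typing import List, Sequence, Tuple

# Checkpoints named on this page (not loaded in this sketch).
CAPTIONING_MODEL_ID = "google/paligemma-3b-ft-coco35l-224"
LANGUAGE_MODEL_ID = "chaley22/pali-captioning-lm-nolora"


def word_groundedness(
    words: Sequence[str],
    tokens_per_word: Sequence[int],
    caption_logprobs: Sequence[float],
    lm_logprobs: Sequence[float],
) -> List[Tuple[str, float]]:
    """Aggregate token-level log-probabilities into word-level scores.

    ASSUMPTION: score(word) = sum over the word's tokens of
    log p_caption(token | image, context) - log p_lm(token | context).
    """
    if len(words) != len(tokens_per_word):
        raise ValueError("words and tokens_per_word must have equal length")
    if len(caption_logprobs) != len(lm_logprobs):
        raise ValueError("both models must score the same token sequence")
    if sum(tokens_per_word) != len(caption_logprobs):
        raise ValueError("tokens_per_word must account for every token")

    scores: List[Tuple[str, float]] = []
    start = 0
    for word, n_tokens in zip(words, tokens_per_word):
        end = start + n_tokens
        # Sum each model's log-probabilities over this word's tokens.
        caption_lp = sum(caption_logprobs[start:end])
        lm_lp = sum(lm_logprobs[start:end])
        scores.append((word, caption_lp - lm_lp))
        start = end
    return scores
```

Under this assumption, a positive score means the image made the word more predictable than the text context alone did. Both models need to score the same tokenization of the caption for the token-level differences to line up.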