The project contains a gold-standard dataset of annotated oceanographic documents: 115 web documents and 12 scientific papers, tagged in the IOB2 format.
The corpus is annotated with the following oceanographic classes:

| Class | Tagging Examples |
| ------ | ------ |
| Time | Date/time, UTC time, Year, Day, Unix timestamp |
| Lat-Lon | Longitude, Latitude, Location, Measured at |
| Depth | Depth, z, Lower\_z, Max\_depth, Min\_depth |
| Investigator | Professor Moriarty, Principal investigator, Co-principal investigator |
| Geographic Region | Bay of Bengal, Mediterranean Sea, Gulf of Mexico, Pacific Ocean |
| Organization | Woods Hole Oceanographic Institution, European Geophysical Society |
| Platform-Instrument Type | Vessel, Satellite, Buoy, Glider, Drifting Buoy |
| Platform Name | Oasis of the Seas, Wizard, Argo-id, Charles Darwin |
| Measured Variable | Temperature, Salinity, Phytoplankton biomass, Abundance of Coccolithus |
| Unit | Degrees Celsius, dbar, $\mu mol/kg$, $m^2/s$, Knots, Count/s, \# |
| Method | Counting by flow cytometer, Binocular microscope |
| Processing Type | Logistic Probability, Log biomass, Removal of outliers, Log-transformation |
| Funding Agency | National Science Foundation, Stanford University, The EU |
| Device-Instrument | CTD Rosette, Sonar, Plankton counters, Radar, Wave gauge |
| Program | MAREDAT, Argo, GLOSS, Voluntary observing ship program |
| Dataset's ID | DOI, UUID strings, Dataset file name |
| Campaign | POLAR 5, ACLOUD, TARA\_20090919Z |
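
Because the corpus uses IOB2 tagging, each entity starts with a `B-` tag, continues with `I-` tags, and is surrounded by `O` tags for tokens outside any entity. The following is a minimal sketch of how tagged tokens could be grouped back into entity spans. The function name `iob2_to_spans` and the label spellings `Measured_Variable` and `Geographic_Region` are assumptions made for illustration; the actual label strings and file layout in this dataset may differ.

```python
from typing import List, Tuple

Span = Tuple[str, int, int, str]  # (label, start index, end index exclusive, text)


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group IOB2-tagged tokens into entity spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A span ends at a B- tag, an O tag, or an I- tag with a different label.
        ends_span = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if ends_span and label is not None:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # A stray I- tag is invalid in strict IOB2; treat it leniently as a start.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags), " ".join(tokens[start:])))
    return spans


# Hypothetical example sentence; the label names are assumed, not taken from the dataset.
tokens = ["Salinity", "was", "measured", "in", "the", "Bay", "of", "Bengal", "."]
tags = [
    "B-Measured_Variable", "O", "O", "O", "O",
    "B-Geographic_Region", "I-Geographic_Region", "I-Geographic_Region", "O",
]

for span in iob2_to_spans(tokens, tags):
    print(span)
```

The sketch works on token and tag lists that are already in memory, so it makes no assumption about how the annotated files are stored on disk.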