@[toc](Contents)
## Introduction ##
This dataset is a collection of impact and interaction sounds captured during human-object and robot-object interaction. The collection was done in two parts: the human-object interaction sounds (this dataset) were collected at [Institut de Robòtica i Informàtica Industrial][1] (IRI), CSIC-UPC, in Barcelona, while the robot-object interaction sounds were collected by the [Humanoids and Cognitive Robotics Lab][2] at the [Czech Technical University in Prague][3] (CTU). The latter are available separately at [Robot_impact_Data][4].
All the objects come from the [YCB object set][5], a well-known standard set of household objects for benchmarking robotics algorithms, most commonly grasping, manipulation, and vision-based object recognition. The goal of building this dataset is to establish audio as an additional data stream for multimodal sensing in robots.
## Data Collection ##
1. Manually Collected Dataset
The manually collected data consists of impact sounds from three different exploratory actions - hitting, scratching and dropping. Data for 75 out of 77 objects from the YCB dataset was collected with this process.
The dataset was captured in a room with closed windows. We placed a [Rode VideoMic Pro][6] shotgun microphone, connected to a [Zoom H1n][7] digital audio recorder, close to the sound source, and we used a metallic gripper to interact with the objects. We also placed a static [GoPro Hero 5][8] camera to capture a close top-down view of the interaction (48 fps), similar to the view a camera mounted on a robotic arm would have.
The whole dataset was captured by the same person in the same place, performing the three actions to generate impact sounds. The hitting action was performed on average forty times per object, applying different forces at different locations to capture richer data. The scratching and dropping actions were each performed on average five times per object, at different locations and from different heights, respectively.
![Manual Collection](https://osf.io/rmjnt/download =50%x)
2. Robot Dataset
A Kinova Gen3 robotic arm fitted with a Robotiq 2F-85 gripper was used to collect impact sounds from robot-object interaction. Two actions were considered - vertical poking and horizontal poking. They are treated as distinct actions because in the vertical case the motion of the object is restricted by the rigid surface it rests on, whereas in the horizontal case the object is free to slide across the surface. The robot explored 49 objects from the YCB object set, with approximately 50 vertical pokes and 5-10 horizontal pokes per object. This dataset is available separately at [Robot_impact_Data][4].
![Robot Collection](https://osf.io/xrn9g/download =50%x)
3. Seen and Unseen Objects
For training a neural network for material recognition, the objects were split into two sets - 'seen' objects and 'unseen' objects. Only sounds from the 'seen' objects are used for training the network, which also allows us to evaluate the generalization capability of the trained network.
A list of all the objects explored for human-object or robot-object interaction can be found [here][9].
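As a rough illustration, the sketch below shows how such an object-level split can be applied: samples from held-out objects never enter training, so test accuracy reflects generalization to new objects. The random selection and object names here are placeholders; the actual seen/unseen assignment is given in the linked list.
```python
# Minimal sketch of an object-level seen/unseen split (illustrative only; the
# actual assignment of YCB objects to the two sets is given in the linked list).
import random

def split_objects(object_names, unseen_fraction=0.2, seed=0):
    names = sorted(object_names)
    random.Random(seed).shuffle(names)
    n_unseen = max(1, int(len(names) * unseen_fraction))
    return names[n_unseen:], names[:n_unseen]      # (seen, unseen)

seen, unseen = split_objects(["hammer", "fork", "mug", "apple", "scissors"])
```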
## Data Processing ##
The collected audio files are segmented into individual interaction instances and filtered using a classical algorithm based on wavelet threshold multi-taper spectra. An example of pre- and post-processed audio is shown below.
![Noisy-Clean](https://osf.io/xf9es/download =75%x)
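As a rough illustration of the denoising step, the sketch below applies plain wavelet soft-thresholding to a segmented clip. It is a simplified stand-in for the wavelet threshold multi-taper algorithm actually used, and the file names are placeholders.
```python
# Simplified wavelet soft-thresholding denoiser (stand-in for the wavelet
# threshold multi-taper algorithm used to clean the dataset).
import numpy as np
import pywt
import soundfile as sf

def wavelet_denoise(signal, wavelet="db8", level=6):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients (MAD estimator).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(signal)]

clip, sr = sf.read("hit_001.wav")              # placeholder segmented clip
sf.write("hit_001_clean.wav", wavelet_denoise(clip), sr)
```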
Then, the filtered audio is converted into a 128x128 mel spectrogram, which represents how the energy of the sound is distributed over frequency and time. Samples of these spectrograms are shown below.
![Spectrograms](https://osf.io/3ucbh/download =75%x)
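A minimal sketch of this conversion using librosa is shown below; the hop length and other parameter choices are assumptions, not the exact settings used to build the dataset.
```python
# Convert a cleaned clip into a 128x128 (mel bands x frames) spectrogram.
import librosa
import numpy as np

def clip_to_melspec(path, n_mels=128, n_frames=128):
    y, sr = librosa.load(path, sr=None)
    hop = max(1, len(y) // n_frames)           # spread the clip over ~n_frames columns
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    return librosa.util.fix_length(mel_db, size=n_frames, axis=1)  # pad/trim to 128 frames

spec = clip_to_melspec("hit_001_clean.wav")    # shape (128, 128)
```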
## Material and Object Recognition ##
1. Labelling
Material annotation was performed manually by visual inspection. We distinguished eleven surface materials: plastic, wood, ceramic, fiber, felt, foam, glass, paper, metal, rubber, and leather. We also considered a finer-grained classification that distinguishes two types of metal (steel, aluminum) and three types of plastic (soft plastic, hard plastic, other plastic). Ten of the 75 objects were found to be composed of multiple materials (chips can, Master Chef can, skillet lid, fork, spoon, knife, scissors, two screwdrivers, and the hammer). For these, we assigned to the object the material label corresponding to the surface material that generated the impact sound.
![grouped_dataset](https://osf.io/5he9k/download)
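For reference, the two label sets described above can be written out as follows. This is a plain listing of the classes named in the text; the exact label strings used in the dataset files may differ.
```python
# Coarse (11-class) and fine-grained (14-class) material label sets.
COARSE_MATERIALS = [
    "plastic", "wood", "ceramic", "fiber", "felt", "foam",
    "glass", "paper", "metal", "rubber", "leather",
]
FINE_MATERIALS = [
    "soft plastic", "hard plastic", "other plastic", "steel", "aluminum",
    "wood", "ceramic", "fiber", "felt", "foam",
    "glass", "paper", "rubber", "leather",
]
assert len(COARSE_MATERIALS) == 11 and len(FINE_MATERIALS) == 14
```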
2. Training and Experiments
We trained from scratch a modified version of the ResNet34 network to predict among the 14 fine-grained material classes. The network was initially trained with the manual interaction data, with the aim of investigating whether a network trained on human-object sounds can be successfully transferred to the robot for recognizing materials via robot-object interaction. Training data from the robot-collected dataset was then added to improve the results.
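A minimal sketch of such a network in PyTorch is shown below: a stock ResNet34 with the first convolution adapted to single-channel spectrograms and the classifier head replaced by a 14-way output. The exact modifications used in the paper may differ.
```python
# Single-channel, 14-class ResNet34 for 128x128 mel spectrograms (sketch).
import torch
import torch.nn as nn
from torchvision.models import resnet34

def make_material_net(num_classes=14):
    net = resnet34(weights=None)                # train from scratch
    # Accept 1-channel spectrograms instead of 3-channel RGB images.
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

model = make_material_net()
logits = model(torch.randn(8, 1, 128, 128))     # batch of 8 spectrograms -> (8, 14)
```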
The following experiments were conducted:
1. Testing a randomly initialized network on audio samples generated by the robot.
2. Testing the network pre-trained on manual data on audio samples generated by the robot:
a. On sounds collected from teleoperated vertical impact.
b. On sounds collected from automated horizontal impact.
c. On a mixture of test samples from vertical and horizontal impact.
3. Fine-tuning the network by further training it with a sparse subset of audio samples from the robot impact sound dataset (see the sketch after this list):
a. On sounds collected from teleoperated vertical impact.
b. On a mixture of test samples from vertical and horizontal impact.
4. Retraining the network from randomly initialized weights on only audio samples from the robot impact sound dataset:
a. On sounds collected from teleoperated vertical impact.
b. On a mixture of test samples from vertical and horizontal impact.
c. Train only on sounds collected from vertical poking, and test only on sounds collected from horizontal poking.
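A minimal sketch of the fine-tuning step (experiment 3) is given below, assuming the earlier single-channel ResNet34 and a small robot-sound subset; the checkpoint path, learning rate, and number of epochs are illustrative assumptions.
```python
# Fine-tune the human-sound-pretrained network on a small robot-sound subset.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet34

# Rebuild the single-channel, 14-class ResNet34 from the earlier sketch.
model = resnet34(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 14)
model.load_state_dict(torch.load("manual_pretrained.pt"))   # hypothetical checkpoint

# Sparse robot subset: spectrograms (N, 1, 128, 128) and material labels (N,).
robot_subset = TensorDataset(torch.randn(64, 1, 128, 128), torch.randint(0, 14, (64,)))
loader = DataLoader(robot_subset, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # low LR for fine-tuning
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for spectrograms, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(spectrograms), labels)
        loss.backward()
        optimizer.step()
```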
Object recognition amongst the 75 objects was also attempted with the modified ResNet34.
3. Results
We observed that adding data collected by the robot to the training set improved the performance of the network. This can be seen in the confusion matrices below, where the entries progressively concentrate along the major diagonal.
![Conf_matrix](https://osf.io/aywh9/download =50%x)
Object recognition amongst 75 YCB objects based on audio resulted in ~79% accuracy.
![CF_objects](https://osf.io/sfqa5/download =70%x)
## Publication ##
The dataset accompanies the following article: Dimiccoli, M., Patni, S., Hoffmann, M. and Moreno-Noguer, F. Predicting material from impact sounds for robot manipulation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022).
## Acknowledgements ##
This work was supported by the project Interactive Perception-Action-Learning for Modelling Objects (IPALM) (H2020 -- FET -- ERA-NET Cofund -- CHIST-ERA III / Technology Agency of the Czech Republic, EPSILON, no. TH05020001) and partially supported by the project MDM-2016-0656 funded by MCIN/ AEI /10.13039/501100011033. M.D. was supported by grant RYC-2017-22563 funded by MCIN/ AEI /10.13039/501100011033 and by ESF Investing in your future.
The collaborators at CTU, Prague were additionally supported by OP VVV MEYS funded project CZ.02.1.01/0.0/0.0/16_019/0000765 Research Center for Informatics.
We thank Bedrich Himmel for assistance with sound setup, Antonio Miranda and Andrej Kruzliak for data collection, Laura Roldán Villardell for technical support, and Lukas Rustler for video preparation.
![IPALM logo](https://osf.io/tvg29/download =15%x)![TACR logo](https://osf.io/3aud4/download =15%x)
[1]: https://www.iri.upc.edu/
[2]: https://cyber.felk.cvut.cz/research/groups-teams/humanoids/
[3]: https://www.cvut.cz/en
[4]: https://osf.io/bj5w8
[5]: https://www.ycbbenchmarks.com/
[6]: https://www.rode.com/microphones/videomicpro
[7]: https://zoomcorp.com/en/us/handheld-recorders/handheld-recorders/h1n-handy-recorder/
[8]: https://gopro.com/es/es/update/hero5
[9]: https://osf.io/74pbg/