@[toc](Contents)
## Introduction ##
This dataset is a collection of impact and interaction sounds captured during human-object and robot-object interaction. The collection was done in two parts: the human-object interaction sounds (this dataset) were collected at [Institut de Robòtica i Informàtica Industrial][1] (IRI), CSIC-UPC, in Barcelona, while the robot-object interaction sounds were collected by the [Humanoids and Cognitive Robotics Lab][2] at the [Czech Technical University in Prague][3] (CTU). The latter are available separately at [Robot_impact_Data][4].
All the objects come from the [YCB object set][5], a well-known standard set of household objects for benchmarking robotics algorithms, most commonly grasping, manipulation, and vision-based object recognition. The goal of building this dataset is to establish audio as an additional data stream for multimodal sensing in robots.
## Data Collection ##
1. Manually Collected Dataset
The manually collected data consists of impact sounds from three different exploratory actions - hitting, scratching and dropping. Data for 75 out of 77 objects from the YCB dataset was collected with this process.
The dataset was captured in a room with closed windows. We placed a [Rode VideoMic Pro][6] shotgun microphone, connected to a [Zoom H1n][7] digital audio recorder, close to the sound source, and we used a metallic gripper to interact with the objects. We also placed a static [GoPro Hero 5][8] camera to capture a close top-down view of the interaction (48 fps), similar to the view a camera mounted on a robotic arm would have.
The whole dataset was captured by the same person in the same place, performing the three actions to generate impact sounds. The hitting action was performed on average forty times per object, applying different forces at different locations to capture richer data. The scratching and dropping actions were each performed on average five times per object, at different locations and from different heights, respectively.
![Manual Collection](https://osf.io/rmjnt/download =50%x)
2. Robot Dataset
A Kinova Gen3 robotic arm fitted with a Robotiq 2F-85 gripper was used to collect impact sounds from robot-object interaction. Two actions were considered - vertical poking and horizontal poking. They are treated as distinct actions because in the vertical case the motion of the object is restricted by the rigid surface it rests on, whereas in the horizontal case the object is free to slide across the surface. The robot explored 49 objects from the YCB object set, with approximately 50 vertical pokes and 5-10 horizontal pokes per object. This dataset is available separately at [Robot_impact_Data][4].
![Robot Collection](https://osf.io/xrn9g/download =50%x)
3. Seen and Unseen Objects
For training a neural network for material recognition, the objects were split into two sets - 'seen' objects and 'unseen' objects. Only sounds from the 'seen' objects are used for training the network, which also allows us to evaluate the generalization capability of the trained network.
A list of all the objects explored for human-object or robot-object interaction can be found [here][9].
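As a rough illustration, the sketch below shows how such an object-level split can be applied: samples from held-out objects never enter training, so test accuracy reflects generalization to new objects. The random selection and object names here are placeholders; the actual seen/unseen assignment is given in the linked list.
```python
# Minimal sketch of an object-level seen/unseen split (illustrative only; the
# actual assignment of YCB objects to the two sets is given in the linked list).
import random

def split_objects(object_names, unseen_fraction=0.2, seed=0):
    names = sorted(object_names)
    random.Random(seed).shuffle(names)
    n_unseen = max(1, int(len(names) * unseen_fraction))
    return names[n_unseen:], names[:n_unseen]      # (seen, unseen)

seen, unseen = split_objects(["hammer", "fork", "mug", "apple", "scissors"])
```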
## Data Processing ##
The collected audio files are segmented into individual interaction instances and filtered using a classical algorithm based on wavelet threshold multi-taper spectra. An example of pre- and post-processed audio is shown below.
![Noisy-Clean](https://osf.io/xf9es/download =75%x)
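As a rough illustration of the denoising step, the sketch below applies plain wavelet soft-thresholding to a segmented clip. It is a simplified stand-in for the wavelet threshold multi-taper algorithm actually used, and the file names are placeholders.
```python
# Simplified wavelet soft-thresholding denoiser (stand-in for the wavelet
# threshold multi-taper algorithm used to clean the dataset).
import numpy as np
import pywt
import soundfile as sf

def wavelet_denoise(signal, wavelet="db8", level=6):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients (MAD estimator).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(signal)]

clip, sr = sf.read("hit_001.wav")              # placeholder segmented clip
sf.write("hit_001_clean.wav", wavelet_denoise(clip), sr)
```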
Then, the filtered audio is converted into a 128x128 mel spectrogram, which represents how the energy of the sound is distributed over frequency and time. Samples of these spectrograms are shown below.
![Spectrograms](https://osf.io/3ucbh/download =75%x)
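A minimal sketch of this conversion using librosa is shown below; the hop length and other parameter choices are assumptions, not the exact settings used to build the dataset.
```python
# Convert a cleaned clip into a 128x128 (mel bands x frames) spectrogram.
import librosa
import numpy as np

def clip_to_melspec(path, n_mels=128, n_frames=128):
    y, sr = librosa.load(path, sr=None)
    hop = max(1, len(y) // n_frames)           # spread the clip over ~n_frames columns
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    return librosa.util.fix_length(mel_db, size=n_frames, axis=1)  # pad/trim to 128 frames

spec = clip_to_melspec("hit_001_clean.wav")    # shape (128, 128)
```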
## Material and Object Recognition ##
1. Labelling
Material annotation was performed manually by visual inspection. We distinguished eleven surface materials: plastic, wood, ceramic, fiber, felt, foam, glass, paper, metal, rubber, and leather. We also considered a finer-grained classification that distinguishes two types of metal (steel, aluminum) and three types of plastic (soft plastic, hard plastic, other plastic). Ten of the 75 objects were found to be composed of multiple materials (chips can, Master Chef can, skillet lid, fork, spoon, knife, scissors, two screwdrivers, and the hammer). For these, we assigned to the object the material label corresponding to the surface material that generated the impact sound.
![grouped_dataset](https://osf.io/5he9k/download)
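For reference, the two label sets described above can be written out as follows. This is a plain listing of the classes named in the text; the exact label strings used in the dataset files may differ.
```python
# Coarse (11-class) and fine-grained (14-class) material label sets.
COARSE_MATERIALS = [
    "plastic", "wood", "ceramic", "fiber", "felt", "foam",
    "glass", "paper", "metal", "rubber", "leather",
]
FINE_MATERIALS = [
    "soft plastic", "hard plastic", "other plastic", "steel", "aluminum",
    "wood", "ceramic", "fiber", "felt", "foam",
    "glass", "paper", "rubber", "leather",
]
assert len(COARSE_MATERIALS) == 11 and len(FINE_MATERIALS) == 14
```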
2. Training and Experiments
We trained from scratch a modified version of the ResNet34 network to predict among the 14 fine-grained material classes. The network was initially trained with the manual interaction data, with the aim of investigating whether a network trained on human-object sounds can be successfully transferred to the robot for recognizing materials via robot-object interaction. Training data from the robot-collected dataset was then added to improve the results.
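A minimal sketch of such a network in PyTorch is shown below: a stock ResNet34 with the first convolution adapted to single-channel spectrograms and the classifier head replaced by a 14-way output. The exact modifications used in the paper may differ.
```python
# Single-channel, 14-class ResNet34 for 128x128 mel spectrograms (sketch).
import torch
import torch.nn as nn
from torchvision.models import resnet34

def make_material_net(num_classes=14):
    net = resnet34(weights=None)                # train from scratch
    # Accept 1-channel spectrograms instead of 3-channel RGB images.
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

model = make_material_net()
logits = model(torch.randn(8, 1, 128, 128))     # batch of 8 spectrograms -> (8, 14)
```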
The following experiments were conducted:
1. Testing a randomly initialized network on audio samples generated by the robot.
2. Testing the network pre-trained on manual data on audio samples generated by the robot:
a. On sounds collected from teleoperated vertical impact.
b. On sounds collected from automated horizontal impact.
c. On a mixture of test samples from vertical and horizontal impact.
3. Fine-tuning the network by further training it with a sparse subset of audio samples from the robot impact sound dataset (see the sketch after this list):
a. On sounds collected from teleoperated vertical impact.
b. On a mixture of test samples from vertical and horizontal impact.
4. Retraining the network from randomly initialized weights on only audio samples from the robot impact sound dataset:
a. On sounds collected from teleoperated vertical impact.
b. On a mixture of test samples from vertical and horizontal impact.
c. Train only on sounds collected from vertical poking, and test only on sounds collected from horizontal poking.
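A minimal sketch of the fine-tuning step (experiment 3) is given below, assuming the earlier single-channel ResNet34 and a small robot-sound subset; the checkpoint path, learning rate, and number of epochs are illustrative assumptions.
```python
# Fine-tune the human-sound-pretrained network on a small robot-sound subset.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet34

# Rebuild the single-channel, 14-class ResNet34 from the earlier sketch.
model = resnet34(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 14)
model.load_state_dict(torch.load("manual_pretrained.pt"))   # hypothetical checkpoint

# Sparse robot subset: spectrograms (N, 1, 128, 128) and material labels (N,).
robot_subset = TensorDataset(torch.randn(64, 1, 128, 128), torch.randint(0, 14, (64,)))
loader = DataLoader(robot_subset, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # low LR for fine-tuning
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for spectrograms, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(spectrograms), labels)
        loss.backward()
        optimizer.step()
```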
Object recognition amongst the 75 objects was also attempted with the modified ResNet34.
3. Results
We observed that adding data collected by the robot to the training set improved the performance of the network. This can be seen in the confusion matrices below, where the entries progressively concentrate along the major diagonal.
![Conf_matrix](https://osf.io/aywh9/download =50%x)
Object recognition amongst 75 YCB objects based on audio resulted in ~79% accuracy.
![CF_objects](https://osf.io/sfqa5/download =70%x)
## Publication ##
The dataset accompanies the following article: Dimiccoli, M., Patni, S., Hoffmann, M. and Moreno-Noguer, F. Predicting material from impact sounds for robot manipulation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022).
## Acknowledgements ##
This work was supported by the project Interactive Perception-Action-Learning for Modelling Objects (IPALM) (H2020 -- FET -- ERA-NET Cofund -- CHIST-ERA III / Technology Agency of the Czech Republic, EPSILON, no. TH05020001) and partially supported by the project MDM-2016-0656 funded by MCIN/ AEI /10.13039/501100011033. M.D. was supported by grant RYC-2017-22563 funded by MCIN/ AEI /10.13039/501100011033 and by ESF Investing in your future.
The collaborators at CTU, Prague were additionally supported by OP VVV MEYS funded project CZ.02.1.01/0.0/0.0/16_019/0000765 Research Center for Informatics.
We thank Bedrich Himmel for assistance with sound setup, Antonio Miranda and Andrej Kruzliak for data collection, Laura Roldán Villardell for technical support, and Lukas Rustler for video preparation.
![IPALM logo](https://osf.io/tvg29/download =15%x)![TACR logo](https://osf.io/3aud4/download =15%x)
[1]: https://www.iri.upc.edu/
[2]: https://cyber.felk.cvut.cz/research/groups-teams/humanoids/
[3]: https://www.cvut.cz/en
[4]: https://osf.io/bj5w8
[5]: https://www.ycbbenchmarks.com/
[6]: https://www.rode.com/microphones/videomicpro
[7]: https://zoomcorp.com/en/us/handheld-recorders/handheld-recorders/h1n-handy-recorder/
[8]: https://gopro.com/es/es/update/hero5
[9]: https://osf.io/74pbg/