Abstract

  • Acoustic signal: segmentation into speech and speech pauses

  • The ESMERALDA speech recognizer is used to detect voice activity more robustly than an approach that is based solely on signal energy

  • Visual signal: segmentation into motion peaks

  • A peak spans the interval between two local minima in the amount of changed pixels in the visual signal

  • The amount of changed pixels is calculated by summing up a motion history image at each time step

  • Temporal association: overlapping speech and visual segments are grouped into one acoustic package
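The segmentation and association steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of segments as `(start, end)` pairs in seconds, and the choice to build one package per speech segment are assumptions made for illustration. Speech segments are taken as given here, since the source obtains them from the ESMERALDA recognizer.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


def local_minima(signal: List[float]) -> List[int]:
    """Indices of local minima; the first and last samples act as boundaries."""
    idx = [0]
    for i in range(1, len(signal) - 1):
        if signal[i] <= signal[i - 1] and signal[i] < signal[i + 1]:
            idx.append(i)
    idx.append(len(signal) - 1)
    return idx


def motion_peaks(signal: List[float], frame_rate: float) -> List[Segment]:
    """Split a motion-amount signal into peaks, each spanning two adjacent minima."""
    if len(signal) < 2:
        return []
    minima = local_minima(signal)
    return [(a / frame_rate, b / frame_rate) for a, b in zip(minima, minima[1:])]


def associate(speech: List[Segment], motion: List[Segment]) -> List[Dict]:
    """Group each speech segment with every motion peak that overlaps it in time."""
    packages = []
    for s_start, s_end in speech:
        overlapping = [
            (m_start, m_end)
            for m_start, m_end in motion
            if m_start < s_end and s_start < m_end  # intervals share some time span
        ]
        if overlapping:
            packages.append({"speech": (s_start, s_end), "motion": overlapping})
    return packages


# Example: an 8-frame motion signal at 1 frame per second and two speech segments.
peaks = motion_peaks([0, 3, 1, 4, 6, 2, 5, 0], frame_rate=1.0)
packages = associate([(1.0, 3.0), (6.0, 6.5)], peaks)
```

In this example the first speech segment overlaps two motion peaks and the second overlaps one, so two packages result. A speech segment without any overlapping motion is dropped in this sketch; whether the original system keeps such segments is not stated in the source.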

Highlights

  • Acoustic packaging makes use of the synchrony between the visual and audio modalities in order to detect temporal structure in actions that are demonstrated to children and robots [1]

  • The ESMERALDA speech recognizer is used to detect voice activity more robustly than an approach that is solely based on signal energy

  • The amount of changed pixels is calculated by summing up a motion history image at each time step
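As a rough illustration of how a per-frame motion amount could be derived from a motion history image, the sketch below uses plain nested lists of grayscale intensities. The function name `motion_amounts`, the difference threshold, and the linear decay over `duration` frames are assumptions, not details taken from the source; a practical implementation would more likely rely on an array or computer vision library.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities


def motion_amounts(
    frames: List[Frame], diff_threshold: int = 25, duration: int = 3
) -> List[float]:
    """Return one motion amount per frame transition.

    A pixel that changed by more than diff_threshold is set to 1.0 in the
    motion history image; unchanged pixels fade linearly to 0 over
    `duration` frames. The motion amount is the sum over the whole image.
    """
    if len(frames) < 2:
        return []
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0.0] * width for _ in range(height)]
    decay = 1.0 / duration
    amounts = []
    for prev, cur in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                if abs(cur[y][x] - prev[y][x]) > diff_threshold:
                    mhi[y][x] = 1.0  # fresh motion at this pixel
                else:
                    mhi[y][x] = max(mhi[y][x] - decay, 0.0)  # older motion fades
        amounts.append(sum(map(sum, mhi)))
    return amounts


# Example: four tiny 1x2 frames; one pixel changes, then the other.
example = motion_amounts(
    [[[0, 0]], [[100, 0]], [[100, 0]], [[100, 100]]], duration=2
)
```

The resulting list is the kind of signal whose local minima would delimit motion peaks.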


Summary

Acoustic Packaging and the Learning of Words

Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina Rohlfing

Figure: A test subject showing an infant how to stack cups

System Overview
Acoustic Prominence
Signal Envelope
Visualization and Inspection
Detecting Moving Colored Objects
Conclusion