[Idea] Software for pronunciation mastery in any language

In https://www.youtube.com/watch?v=eeaghqkLRi8&t=10m30s we see a vowel formant chart for the first time in Gabriel's Pronunciation Tutorial series. It got me thinking: is there already software that lets you place your recorded vowels on that chart and gives on-the-spot feedback on how well you're doing? So far I've only bumped into this paper, http://www.cc.kochi-u.ac.jp/~tamasaki/Paper3.pdf, which I vaguely skimmed through, and it seems there's a technical possibility of doing this. [b]Does anyone else here think it would be just awesome to have this?[/b]

Assuming these formants are the only thing that constitutes a vowel sound (I don't know that, so please correct me if they're not), we could master literally any accent or dialect to the point of being perceived as native speakers. Simply gather enough data of natives pronouncing vowels and we're good to go play copycats.

Now my question: do any of you know of software of this sort that already exists? Feedback is an awesome thing; it's part of the formal definition of the flow state (see Csikszentmihalyi) and I love it. I love having feedback, biofeedback and every other possible back, including sexy back.

Cheers
Szymon

* Originally posted by krzemian.
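
For the curious, here is a minimal sketch of what the measurement step might look like, assuming Python with the parselmouth library (a Python interface to Praat, which comes up in the comments below); the filename is hypothetical:

[code]
import parselmouth

snd = parselmouth.Sound("vowel.wav")        # a recording of a steady vowel
formants = snd.to_formant_burg()            # Burg-method formant tracking
t = snd.duration / 2                        # sample mid-vowel, where it is steadiest
f1 = formants.get_value_at_time(1, t)       # first formant, roughly vowel height
f2 = formants.get_value_at_time(2, t)       # second formant, roughly vowel frontness
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")  # this (F2, F1) point goes on the chart
[/code]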


Comments


  • I like the idea, but I am unsure of the practicality. The paper you referenced builds on formant frequencies. These frequencies are recovered from a recording using techniques from the broad field of digital signal processing (DSP): you process a recording and get a table of the frequencies (tones) it contains, with their relative strengths. If you wanted to check your pronunciation against one you wish to emulate, the software would have to reconcile the differences in frequencies and strengths between your voice and that voice. This would be very challenging if, for instance, your voice is very deep and the comparison voice is high. While your ear is capable of recognizing that the two pronunciations are very similar, all the computer would see are the frequencies and relative strengths. So the challenge becomes writing an algorithm that matches two disparate data sets to allow for a meaningful comparison. DSP at this level is not an area of expertise for me, so maybe someone with more relevant experience can comment.

    In the meantime, having the vowel chart and matching recordings would still be valuable. I think that if I listen to the vowel sound, immediately record myself repeating it, and then play both back one after the other, my ear would tell me how close I am. I believe I have enough of an understanding of how these sounds are formed, thanks to Gabriel's video, that I can self-correct. That software would be fairly straightforward, because all it would do is store and replay recordings; your brain would do all of the processing. (A bare-bones sketch of this record-and-compare loop appears after the comments.)

    * Originally posted by Guy.
  • During the first caffeine high of the day I had an insight that might make your original idea feasible. If the model voice is examined, a range in frequency and strength can be derived between the lowest vowel sound and the highest. From that, the relative difference between two given vowel sounds from the model can be determined. Then when you speak, regardless of other differences between your voice and theirs, it follows that you would need to achieve that same relative difference. The software could display whether you need to go higher or lower, louder or softer, and play back both recordings. This would still be an undertaking, but I think it is doable should someone step up. (The normalization sketch after the comments illustrates this relative-difference idea.)

    * Originally posted by Guy.
  • Yeah, measuring ranges relatively would help compare e.g. a male and a female voice effectively. Another thought is that techniques like machine learning could be utilised for the fine-tuning, i.e. to verify the just-made-up hypothesis that the relative distances between a given pair of vowels differ between low and high voices.

    A whole new area of proficiency would be required to incorporate more factors like speed, rhythm, dynamics and possibly others I'm currently not aware of, in order to let you repeat whole phrases as closely as possible to the native version.

    Ideally, the user experience would be to simply mark a point on a map and train until your accent resembles the people from that area.

    Going further: with a superb phoneme-comparing algorithm in hand, we would be just a baby step away from providing the world with a universal speech-to-text tool that, fed enough data (e.g. you training the algorithm with recordings of your own phonemes), could understand anybody. That alone shows me the true potential of such an algorithm, as well as its probable difficulty, since I can safely assume I'm not the first to think of building an efficient speech recognition tool.

    * Originally posted by krzemian.
  • The relevant discipline would be joint time-frequency analysis: basically, at what time and for how long did a given frequency occur, and how did its amplitude change over time. It is possible Google and others already have the base algorithms in place, judging by the performance and overall accuracy of their speech-to-text engines. Even assuming those algorithms were available, the overall effort would still be daunting, which is why I think it is easier to play the two recordings back and let the brain handle that part, at least for a first revision. The brain is, and for the near future will remain, the best complex processor at our disposal. (The MFCC-plus-DTW sketch after the comments is one crude take on this kind of comparison.)

    * Originally posted by Guy.
  • Haven't had time to check it out myself yet, so I don't know its full capabilities but I know this exists:
    http://www.fon.hum.uva.nl/praat/

    * Originally posted by AndyN.
  • Praat is a beast to learn; as far as I can tell, it's amazingly capable software, but I gave up after attempting to use it. :P

    I think something like this may be feasible with enough of a budget to design it (I think I've heard of similar software used to teach singing, actually), but one issue here is that it doesn't address ear training. You might be able to fine-tune your way to a perfect vowel, but that won't necessarily help if your ears can't give you adequate feedback once the software is off. The question would be whether ear training with computerized feedback can develop an accent better than ear training alone, and I'm not sure about the answer there.

    * Originally posted by Gabriel Wyner.
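
As referenced in the first comment, a bare-bones sketch of the "listen, record, play both back" loop, assuming Python with the sounddevice and soundfile libraries; the filename is hypothetical:

[code]
import sounddevice as sd
import soundfile as sf

# Load the native speaker's recording (hypothetical file).
model, sr = sf.read("model_vowel.wav")

# 1. Listen to the model.
sd.play(model, sr); sd.wait()

# 2. Immediately record yourself repeating it, for the same duration.
seconds = len(model) / sr
mine = sd.rec(int(seconds * sr), samplerate=sr, channels=1)
sd.wait()

# 3. Play both back to back and let your ear judge the match.
sd.play(model, sr); sd.wait()
sd.play(mine, sr); sd.wait()
[/code]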
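
As referenced in the second comment, a toy illustration of the relative-difference idea. One standard way to express it is Lobanov normalization, which z-scores each speaker's formants against their own mean and spread, so a deep voice and a high voice become directly comparable; all numbers below are invented:

[code]
import numpy as np

def lobanov(values):
    # Z-score a speaker's formant values against their own mean and spread.
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std()

model_f1 = [310, 460, 700]    # hypothetical F1 of three vowels, model voice
learner_f1 = [250, 370, 560]  # the same three vowels, learner's voice

print(lobanov(model_f1))
print(lobanov(learner_f1))
# If the normalized values roughly line up vowel by vowel, the learner's
# relative vowel spacing matches the model's even though the raw Hz differ.
[/code]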
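
And one crude take on the joint time-frequency comparison from the fourth comment: MFCC features describe the spectrum frame by frame, and dynamic time warping (DTW) aligns the two utterances so differences in speaking rate don't dominate the score. This assumes Python with librosa; filenames are hypothetical:

[code]
import librosa

# Load the two utterances at a common sample rate (hypothetical files).
native, sr = librosa.load("native_phrase.wav", sr=16000, mono=True)
learner, _ = librosa.load("learner_phrase.wav", sr=16000, mono=True)

# MFCCs: a compact frame-by-frame summary of the spectrum.
m1 = librosa.feature.mfcc(y=native, sr=sr, n_mfcc=13)
m2 = librosa.feature.mfcc(y=learner, sr=sr, n_mfcc=13)

# DTW aligns the two sequences in time; the accumulated cost at the end,
# normalized by path length, is a rough "how far apart do these sound" score.
D, wp = librosa.sequence.dtw(X=m1, Y=m2, metric="euclidean")
score = D[-1, -1] / len(wp)
print(f"Average per-frame distance: {score:.2f} (lower = closer)")
[/code]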
