Filled pauses (hesitation markers)
Non-lexical vocalizations inserted while the brain assembles the next clause. Drains perceived authority.
VMU is not just a pretty picture. It is an acoustic and linguistic engine. Below is the full clinical taxonomy of the speech defects we detect, named in scientific terms rather than as vague "verbal parasites".
Every item below is a measurable acoustic or linguistic signal. The AI counts, classifies and timestamps each instance.
Involuntary repetition of phonemes, syllables or words. Often stress-triggered.
Whole-word loops as the speaker buys cognitive time.
Mid-sentence reset that signals an unprepared mental model.
Pauses above 1.2 s mid-clause read as uncertainty.
Empty connective tissue that dilutes propositional content.
High-frequency lexical fillers typical of Russian and Ukrainian speech.
Hedging tokens that signal low conviction.
Endings, gender, number or case do not agree across the clause.
Sentences do not link, so the listener loses the thread.
Common under cognitive load; reduces clarity score.
Detected and flagged; configurable strictness per audience (kids, B2B, public).
Wrong register for the social setting.
Acoustic markers of autonomic arousal detected from spectral features.
Listener comprehension drops; sounds defensive.
Loss of melodic contour kills persuasion.
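As an illustration of how one item in the taxonomy becomes a countable, timestamped signal, here is a minimal sketch of the long-pause check. It assumes word-level timestamps are already available; the `Word` type and `find_long_pauses` function are hypothetical names, not part of VMU. Only the 1.2 s threshold comes from the text above. The sketch flags any gap between consecutive words and does not attempt to tell mid-clause pauses from pauses at clause boundaries.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_S = 1.2  # threshold quoted in the taxonomy above


@dataclass
class Word:
    """Hypothetical transcript token with timestamps in seconds."""
    text: str
    start: float
    end: float


def find_long_pauses(words, threshold=PAUSE_THRESHOLD_S):
    """Return (previous_word, next_word, gap_seconds) for each gap above threshold.

    Simplification: every inter-word gap is checked; clause boundaries
    are not detected here.
    """
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap > threshold:
            pauses.append((prev.text, nxt.text, round(gap, 3)))
    return pauses


words = [
    Word("we", 0.0, 0.2),
    Word("should", 0.25, 0.6),
    Word("launch", 2.1, 2.5),  # 1.5 s of silence before this word
]
print(find_long_pauses(words))  # [('should', 'launch', 1.5)]
```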
Detection is step zero. Real change happens in two complementary loops: passive and active.
The phone listens in the background. Every time you say a filler, a parasite word, or a flagged word, your device vibrates with a single tactile pulse. No screen, no shame. Just a private somatic signal that rewires the habit through classical conditioning over 14–30 days.
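The passive loop can be pictured as a filter over the live transcript: each recognized token is compared with a flagged list, and every match triggers one pulse. The sketch below is only an illustration under that assumption. `haptic_guard`, the `vibrate` callback and the sample flagged words are hypothetical, and a real device would call its platform vibration API instead of appending to a list.

```python
def haptic_guard(tokens, flagged, vibrate):
    """Fire one pulse per flagged token and return the number of pulses.

    tokens  -- iterable of (text, timestamp_seconds) pairs from a transcript
    flagged -- set of lowercase words to react to (hypothetical list)
    vibrate -- callback that receives the timestamp of each hit
    """
    pulses = 0
    for text, timestamp in tokens:
        # Normalize case and strip trailing punctuation before matching.
        word = text.lower().strip(".,!?;:")
        if word in flagged:
            vibrate(timestamp)
            pulses += 1
    return pulses


fired = []  # stands in for the device's vibration motor
stream = [("Um,", 0.4), ("we", 0.9), ("basically", 1.3), ("ship", 1.8)]
count = haptic_guard(stream, {"um", "basically"}, fired.append)
print(count, fired)  # 2 [0.4, 1.3]
```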
The browser or device microphone streams 16 kHz mono PCM into the analyzer.
VAD, F0, jitter, shimmer, spectral tilt — extracted per 25 ms frame.
Gemini 2.5 native audio returns a verbatim transcript with timestamps.
Morphosyntactic analyzer detects agreement errors, register, cohesion.
Every disfluency is tagged by category from the taxonomy above.
Haptic guard fires in real time; tutor session generates personalized drills.
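To make the first two pipeline stages concrete: at 16 kHz, a 25 ms frame holds 400 samples. The sketch below splits a PCM buffer into such frames and applies a simple energy threshold as a stand-in for voice activity detection. It is an assumption-laden illustration, not VMU's analyzer: the frames are non-overlapping, the RMS threshold of 500 is arbitrary, and F0, jitter, shimmer and spectral tilt are not computed here.

```python
import math

SAMPLE_RATE = 16_000                         # Hz, from the pipeline description
FRAME_MS = 25                                # frame length, from the pipeline description
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # 400 samples per frame


def frames(samples, frame_len=FRAME_LEN):
    """Yield consecutive non-overlapping frames; a trailing partial frame is dropped."""
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[i:i + frame_len]


def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(samples, threshold=500.0):
    """Return one boolean per frame: True where RMS energy exceeds the threshold.

    The threshold is an arbitrary value chosen for 16-bit sample ranges.
    """
    return [rms(f) > threshold for f in frames(samples)]


# One frame of silence followed by one frame of a 200 Hz tone.
silence = [0] * FRAME_LEN
tone = [int(8000 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE))
        for n in range(FRAME_LEN)]
print(energy_vad(silence + tone))  # [False, True]
```

An energy threshold is the simplest possible detector and will misfire on background noise; it is used here only because it keeps the framing arithmetic visible.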