Speak Clearly
A research-grounded guide to everyday speech clarity — what actually works, what doesn't, and exactly how to practice. In English, Swedish, and Korean.
Start Here
If you just want to start practicing, jump to the daily practice routines (10–15 minutes, three languages). Everything below explains the science and reasoning behind those routines — read when you're curious, not as a prerequisite.
Evidence tags throughout: Strong = multiple peer-reviewed studies. Clinical standard = established SLP practice. Emerging = limited or preliminary evidence.
The Big Picture: What Actually Makes Speech Clearer
Most people assume "articulation" means pronouncing each sound precisely — drilling th, r, and s until they're textbook-perfect. But the research tells a different story. When scientists measure what happens acoustically when someone shifts from casual speech to clear speech, the biggest changes aren't in individual consonants. They're in loudness, mouth opening, timing, and phrasing.
The foundational research on clear speech, starting with Picheny, Durlach, and Braida in 1985 and extended by Smiljanić and Bradlow over the following decades, consistently finds that when speakers are asked to "speak clearly," intelligibility improves substantially. Picheny and colleagues (1985) measured an average improvement of 17 percentage points for hearing-impaired listeners. Payton, Uchanski, and Braida (1994) found a 20-point improvement for normal-hearing listeners and 26 points for hearing-impaired listeners across noisy and reverberant conditions. Across the broader literature, the clear speech benefit has been reported in the range of 12 to 34 percentage points depending on speakers, listeners, and conditions (see the review by Smiljanić & Bradlow, 2009). This benefit is robust: it works for listeners with hearing loss, non-native listeners, children with learning disabilities, and listeners in noise or reverberation.
What drives this improvement? The acoustic analysis points to several simultaneous changes: increased vocal intensity (speakers get louder), expanded vowel space (the mouth opens wider, making vowels more distinct), lengthened durations with strategic pauses, and enhanced consonant-vowel contrasts. Crucially, instruction matters. Lam and Tjaden (2013) found that when speakers were told to "overenunciate," intelligibility gains were greater than when told simply to "speak clearly" — suggesting that a more explicit, effortful shift produces better outcomes.
The key insight: Clear speech is not about perfecting individual sounds. It's a global shift in how you use your voice and body — more air, more mouth movement, more deliberate phrasing. Most people who are hard to understand aren't mispronouncing sounds. They're trailing off, mumbling with a nearly closed mouth, rushing without pauses, or simply too quiet.
This guide is organized by impact — the factors that produce the largest intelligibility gains come first. Think of them as four pillars, each building on the previous:
- Pillar 1: Loudness & respiratory support. The aerodynamic engine that powers everything else.
- Pillar 2: Jaw opening & vowel space. The single most actionable physical change for most speakers.
- Pillar 3: Prosody. Pausing, phrasing, and intonation that help listeners parse your speech.
- Pillar 4: Consonant precision. Sharpening individual sounds within the context of connected speech.
The order matters. Drilling consonants without adequate breath support, mouth opening, and phrasing is like tuning the instruments while the orchestra has no conductor.
Pillar 1: Loudness & Respiratory Support
Strong Evidence
This may be the most counterintuitive finding in the field: training loudness as a primary target can improve articulation, facial expression, and even swallowing. This "cross-system effect" is the central discovery of the Lee Silverman Voice Treatment (LSVT LOUD), developed by Ramig and colleagues over 30 years of NIH-funded research, primarily for Parkinson's disease. The mechanism is biomechanical coupling: when you commit to projecting your voice, you necessarily increase respiratory drive (more air pressure), phonatory effort (stronger vocal fold engagement), and articulatory amplitude (your mouth opens wider, your tongue moves further). You can't speak loudly with a closed mouth and lazy tongue.
As the LSVT research team has documented: training the single motor control parameter of amplitude (vocal loudness) results in distributed effects of improved articulation, facial expression, and swallowing. While the original protocol was designed for Parkinson's patients with reduced amplitude across all motor systems, the underlying principle — that loudness acts as a "rising tide" for the entire speech production system — is grounded in physiology, not pathology.
An important distinction: simply being told to "speak loud" is not the same as using loudness as a structured training vehicle. Krause and Braida (2002) found that when speakers were given a one-time instruction to speak loudly, the resulting speech did not consistently improve intelligibility — only "clear speech" as a speaking mode did. The LSVT cross-system effect depends on sustained, intensive training that uses loudness as the primary motor target, which then reorganizes the entire speech production system. For healthy speakers, the practical application is in between: using projected volume as a carrier during practice to recruit fuller articulation, not just turning up the volume.
Why loudness as a carrier beats "slow down" — the evidence
"Just slow down" is probably the most common piece of advice for someone who's hard to understand. The evidence, however, paints a more nuanced picture. Tjaden and Wilding (2004) compared habitual, loud, and slow conditions in speakers with dysarthria. Loud speech improved intelligibility; slow speech did not, and in some cases made it worse. A follow-up study (Tjaden, Sussman, & Wilding, 2014) with 78 speakers with Parkinson's disease and multiple sclerosis confirmed: loud and clear speech conditions improved intelligibility relative to the habitual condition, while the slow condition did not.
These studies used clinical populations with motor speech disorders, but the principle plausibly extends to everyday speakers, for two reasons: voluntary rate reduction without other changes can distort natural prosodic patterns (making speech harder to parse), and speakers who simply slow down without increasing articulatory effort may produce the same underarticulated speech, just more slowly. Rate reduction itself isn't harmful; clear speech is, on average, slower than conversational speech. But the benefit comes from what speakers do with the extra time (open wider, pause at phrase boundaries, finish consonants), not from the slowness itself.
It's worth noting that speaking rate and speaking mode interact. Krause and Braida (2002) showed that clear speech at a conversational rate was still more intelligible than conversational speech. A later analysis of their data (Krause & Braida, 2009) calculated that clear speech produced at normal speaking rates provided roughly 78% of the intelligibility benefit of clear speech at slower rates (14 percentage points vs. 18 percentage points) — confirming that clear speech has inherent acoustic properties beyond just being slower. Rate reduction does contribute to intelligibility, but it's a secondary factor, not the primary one. The practical takeaway: instead of consciously trying to slow down, focus on speaking with more energy and fuller articulation. The slight natural deceleration that accompanies clearer articulation will take care of itself.
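The 78% figure is simple arithmetic on the two gains quoted above. A minimal sketch, using only those two numbers (the variable names are illustrative):

```python
# Intelligibility gains over conversational speech, in percentage points,
# as quoted above from Krause & Braida (2009).
gain_clear_normal_rate = 14
gain_clear_slow_rate = 18

# Share of the full clear-speech benefit that survives at a normal rate.
share_retained = gain_clear_normal_rate / gain_clear_slow_rate
# The remainder is the part that depends on slowing down.
share_from_rate = 1 - share_retained

print(f"Benefit retained at normal rate: {share_retained:.0%}")
print(f"Benefit tied to the slower rate: {share_from_rate:.0%}")
```

Read this way, slowing down accounts for roughly a fifth of the measured benefit in that comparison, which is why it is treated here as a secondary factor.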
What healthy speakers can extract from this
You don't need LSVT LOUD (which requires a certified clinician and an intensive 16-session protocol). But you can apply the core principle: use loudness as a carrier during practice. When you practice speaking clearly, start by turning up the volume. Not shouting — projecting, as if speaking to someone across a medium-sized room. This automatically recruits more breath support and opens your articulatory system. Over time, you'll learn to maintain the articulation benefits at a more normal volume.
Exercise: Breath-Powered Projection (Strong)
- Sit or stand upright. Place one hand on your abdomen just below your ribs.
- Inhale through your nose for 3–4 seconds, feeling your abdomen expand outward (not your shoulders rising).
- On a sustained /ɑː/ ("ahh"), exhale steadily for as long as comfortable. Target 15–20 seconds. Focus on consistent volume — don't let it trail off.
- Now count from 1 to 10 at a comfortable loud volume (imagine someone across the room needs to hear each number). Maintain even loudness through 10 — most people get quieter around 7 or 8.
- Repeat with a sentence: "I need to pick up the groceries and get home before dinner." The word "dinner" should be as loud and clear as "I."
Progression: When you can sustain even loudness through a full sentence, lengthen to two sentences on a single breath. Then practice short paragraphs, pausing to breathe at natural phrase boundaries rather than gasping mid-phrase.
Safety note: "Projected" means comfortably loud, not strained. If you feel throat tightness, pain, or hoarseness during or after practice, you're pushing too hard. Ease back to a level that feels energized but not effortful in the throat. The effort should come from your breath support (abdomen), not your larynx. If hoarseness persists beyond a day, stop voice exercises and see a professional.
The trailing-off problem: One of the most common patterns in everyday unclear speech is sentence-final fading — the last few words of each sentence dissolve into inaudibility. This happens because speakers run out of air, reduce phonatory effort, or both. If people often ask you to repeat just the end of your sentences, this is your primary target. Practice maintaining volume through the last word of every sentence.
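If you want a rough number for trailing off rather than relying on your ear alone, one option is to compare the level of the opening part of a recorded sentence with the level of its final part. The sketch below is an illustration, not a clinical measure: `rms_db` and `fade_db` are hypothetical helper names, the 25% window is an arbitrary choice, and the "recordings" are synthetic tones standing in for audio samples you would load yourself.

```python
import math

def rms_db(samples):
    """Root-mean-square level of a list of samples, in dB relative to an RMS of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def fade_db(samples, portion=0.25):
    """Level of the final portion minus level of the opening portion.

    A clearly negative value means the utterance ends quieter than it began.
    """
    n = max(1, int(len(samples) * portion))
    return rms_db(samples[-n:]) - rms_db(samples[:n])

# Synthetic stand-ins for a one-second recording at 8 kHz (assumed values):
# a steady 200 Hz tone, and one whose amplitude decays from 1.0 to 0.25.
rate = 8000
steady = [math.sin(2 * math.pi * 200 * i / rate) for i in range(rate)]
faded = [(1.0 - 0.75 * i / rate) * math.sin(2 * math.pi * 200 * i / rate)
         for i in range(rate)]

print(f"steady voice: {fade_db(steady):+.1f} dB")
print(f"fading voice: {fade_db(faded):+.1f} dB")
```

With real speech the number will bounce around because words differ in inherent loudness, so it is most useful for comparing two takes of the same sentence.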
Pillar 2: Open Your Mouth — Jaw Opening & Vowel Space
Strong Evidence
Kinematic studies of clear speech consistently find that one of the most reliable physical differences between clear and conversational speech is increased jaw displacement — speakers open their mouths more. This isn't a trivial cosmetic difference. Greater jaw opening directly expands the acoustic vowel space — the range of formant frequencies (F1 and F2) that distinguish one vowel from another. A larger vowel space means more distinct vowels, which means listeners can tell them apart more easily, especially in noise.
For many everyday speakers, insufficient mouth opening may be the single most actionable change they can make. It requires no special skill, no equipment, and no warm-up — just the awareness and habit of opening your mouth when you speak. The reason it feels "exaggerated" at first is the sensory recalibration problem (discussed in the self-monitoring section): your internal sense of normal is calibrated to your habitual, under-articulated speech. A normal, clear amount of mouth opening genuinely feels like overacting until you recalibrate.
Exercises for vowel space and jaw opening
Exercise: Vowel Space Expansion (Strong)
- Say "ee — ah — oo" (/iː — ɑː — uː/) slowly and deliberately. These three vowels define the corners of your vowel space. Make them as acoustically different from each other as you can.
- For "ah" /ɑː/: drop your jaw as far as comfortable — aim for at least two finger-widths of opening between your upper and lower teeth.
- For "ee" /iː/: spread your lips and keep your tongue high and forward. You should feel a clear difference from "ah."
- For "oo" /uː/: round your lips fully and bring your tongue high and back.
- Now string them into words: "see — saw — sue." Then into phrases: "He bought two." Each vowel should be acoustically distinct.
Why this works: The vowel triangle /i–ɑ–u/ represents the maximum range of tongue and jaw positions. By practicing the extremes, you stretch your habitual range. This transfers to all the vowels in between. Acoustic studies of clear speech consistently find expanded F1 and F2 ranges — this exercise directly targets that expansion.
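"Vowel space" can be made concrete as the area of the triangle formed by the three corner vowels when each is plotted by its first two formants. The sketch below computes that area with the shoelace formula. The formant values are invented for illustration and are not measurements from any study; `vowel_space_area` is a hypothetical helper name.

```python
def vowel_space_area(corners):
    """Area (in Hz squared) of the polygon with (F1, F2) vertices, via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(corners, corners[1:] + corners[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# Hypothetical (F1, F2) values in Hz for /i/, /ɑ/, /u/. Illustrative only.
conversational = [(350, 2100), (650, 1200), (380, 1000)]
clear = [(300, 2300), (750, 1150), (330, 850)]

area_conv = vowel_space_area(conversational)
area_clear = vowel_space_area(clear)
print(f"conversational: {area_conv:.0f} Hz^2")
print(f"clear:          {area_clear:.0f} Hz^2")
print(f"ratio:          {area_clear / area_conv:.2f}")
```

To use it on your own voice you would need formant estimates from a phonetics tool; the calculation itself only needs the three (F1, F2) pairs.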
Exercise: The Jaw Drop Check (Clinical Standard)
- Read any sentence aloud at your normal conversational level. Place your hand lightly under your chin.
- Notice how much your jaw actually moves. For most speakers at their habitual level, the answer is: surprisingly little.
- Now re-read the sentence, consciously opening your mouth wider on every stressed vowel. You should feel your chin pressing into your hand more.
- Record both versions and listen back. The "wider" version will almost certainly sound clearer and more natural than you expected — not exaggerated.
Pillar 3: Prosody — Pausing, Phrasing, Emphasis
Strong Evidence
Suprasegmental features — the rhythm, melody, and timing of speech — contribute substantially to intelligibility, possibly more than fine articulatory drill for everyday speakers. Clear speech research consistently shows that speakers adopt a more structured prosodic pattern when speaking clearly: they pause at syntactic boundaries (between phrases and clauses), they group words into meaningful chunks, and they use a wider pitch range.
For listeners, this is enormously helpful. Human speech processing doesn't work by decoding one sound at a time — listeners parse speech into chunks based on prosodic cues. Pausing at the right places gives listeners time to process each chunk. Emphasis on key words helps listeners identify the informational focus. Intonation contour signals whether a sentence is a statement, a question, or continues to the next phrase. When all these cues are flattened — as they are in rushed, monotone, or mushy speech — listeners lose the structural scaffolding they depend on.
Phrasing exercise
Exercise: Chunked Reading (Clinical Standard)
- Take any paragraph of text. Before reading it aloud, mark it into chunks of 3–7 words using slashes: "The meeting is at three / in the conference room / on the second floor."
- Read aloud, pausing briefly (about half a second) at each slash. Resist the urge to rush through.
- Within each chunk, emphasize the one or two most important words by making them slightly louder, longer, or higher-pitched: "The meeting is at three / in the conference room / on the second floor."
- Record and listen. Does it sound "too slow"? It almost certainly doesn't to a listener — this is the calibration gap (see Self-Monitoring).
Progression: Once chunked reading feels natural, apply the same approach to spontaneous speech. Start with planned speech (telling a story you know well) before moving to unplanned conversation.
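When you mark up your own text, it can help to check that the chunks actually fall in the 3–7 word range. The following is a small sketch of such a check; the function names are hypothetical, and it only counts words. It cannot tell whether a slash sits at a sensible phrase boundary.

```python
def chunk_lengths(marked_text):
    """Word count of each slash-delimited chunk in a marked-up sentence."""
    return [len(chunk.split()) for chunk in marked_text.split("/") if chunk.strip()]

def chunks_outside_range(marked_text, low=3, high=7):
    """Chunks whose word count falls outside the low..high target."""
    return [
        chunk.strip()
        for chunk in marked_text.split("/")
        if chunk.strip() and not low <= len(chunk.split()) <= high
    ]

sentence = "The meeting is at three / in the conference room / on the second floor."
print(chunk_lengths(sentence))         # [5, 4, 4]
print(chunks_outside_range(sentence))  # []
```

A chunk flagged as too short is not necessarily wrong; a two-word chunk can be a deliberate choice for emphasis.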
Pillar 4: Consonant Precision in Connected Speech
Clinical Standard
Yes, consonant clarity matters — but it matters within the context of connected speech, not as an isolated drill. The motor learning literature is clear on this: task specificity means that practicing individual sounds in isolation has limited transfer to the way those sounds behave in real words, phrases, and sentences. A sound produced in isolation has different coarticulatory demands, timing constraints, and motor plans than the same sound embedded in fluent speech.
The most important consonant clarity issue for everyday speakers is not that they can't produce individual sounds — it's that they drop sounds in connected speech. Final consonants get swallowed ("going" becomes "goin"), consonant clusters get simplified ("texts" becomes "tex"), and unstressed syllables disappear ("probably" becomes "probly"). These reductions are normal in casual speech, but they accumulate to reduce intelligibility, especially in noise or for non-native listeners.
Tongue twisters: useful but limited
Emerging Evidence
Tongue twisters are traditional in speech training and widely used by actors, broadcasters, and SLPs. They do develop articulatory coordination and awareness, and they can reveal which sound sequences are difficult for you. However, there are no large-scale randomized controlled trials demonstrating that tongue twister practice transfers to improved intelligibility in everyday connected speech. Their evidence base rests primarily on clinical tradition and face validity. This doesn't mean they're useless — coordination drills have a role — but they should not be your primary practice tool. Think of them as warm-ups, not the workout.
What Doesn't Work (And Why)
Non-speech oral motor exercises (NSOME)
Blowing whistles, puffing cheeks, pushing the tongue against a depressor, stretching lips — these "oral motor exercises" are surprisingly common in clinical practice (surveys show 85% of SLPs have used them), but the evidence for their effectiveness is overwhelmingly negative. Multiple systematic reviews, including a Cochrane review (Lee & Gibbon, 2015) and systematic reviews by McCauley et al. (2009) and Lof & Watson (2008), found no substantive evidence that non-speech oral motor exercises improve speech production.
The fundamental problem is task specificity: the neural control systems for speech and non-speech oral movements are substantially different. Blowing is not speech. Tongue push-ups are not speech. Functional imaging studies show different activation patterns for speech versus non-speech oral tasks (Bunton, 2008). As motor learning research predicts, training on one task does not transfer to a fundamentally different task. Lof and Watson's frequently cited work (2008, 2010) identified five theoretical and empirical reasons NSOMEs don't work, chief among them that speech is not about oral muscle strength but about coordination, timing, and acoustic targets.
What to do instead: If you want to improve how you speak, practice speaking. The exercises in this guide are all speech-based — they involve producing actual speech sounds, words, phrases, and sentences. This is consistent with the task-specificity principle and with ASHA's evidence-based practice guidelines.
"Just slow down" (by itself)
As discussed under Pillar 1: rate reduction alone, without accompanying changes in articulatory effort, mouth opening, or phrasing, may not improve intelligibility and can sometimes make it worse. Tjaden and Wilding (2004) and Tjaden, Sussman, and Wilding (2014) found — in speakers with dysarthria due to Parkinson's disease and multiple sclerosis — that slow speech did not improve scaled intelligibility while loud speech did. Van Nuffelen et al. (2010) studied seven rate-control methods in 27 speakers with dysarthria and found that rate control did not improve overall intelligibility across the group — clinically meaningful improvement occurred in only about half of participants, and the maximal decrease in speaking rate was not associated with the maximal increase in intelligibility.
These findings come from clinical populations, but they illustrate a general principle: the problem isn't slowness itself — it's that people interpret "slow down" as "stretch everything out evenly," which distorts natural prosody without improving segmental clarity. If someone slows down and also opens their mouth more and pauses at phrase boundaries, they will be clearer — but the benefit comes from the other changes, not the rate reduction per se. Rate reduction does contribute to the clear speech advantage (Krause & Braida, 2002), but it is a secondary factor — clear speech produced at a normal rate still provides the majority of the intelligibility benefit (Krause & Braida, 2009).
Better advice: "Speak with more energy" or "overenunciate" rather than "slow down." Research by Lam and Tjaden (2013) found that the instruction to overenunciate produced the greatest intelligibility gains.
Isolated sound drills without connected speech
Practicing /s/ by itself for ten minutes will make you very good at saying /s/ by itself. It will not reliably improve how you produce /s/ in "I sent six messages last Saturday." The motor plan for an isolated sound is fundamentally different from the motor plan for that sound embedded in coarticulated speech. This is another facet of task specificity.
All the exercises and practice routines in this guide use words, phrases, and sentences from the beginning, not sequences of isolated sounds. If you need to work on a specific sound, practice it in progressively longer and more complex speech contexts: syllables → words → short phrases → sentences → connected speech.
How to Practice: Motor Learning Science
Strong Evidence
How you structure your practice matters as much as what you practice. The motor learning literature, synthesized for speech by Maas et al. (2008) in a landmark tutorial in the American Journal of Speech-Language Pathology, identifies several principles that predict whether practice leads to lasting improvement:
Distributed beats massed. Short, frequent practice sessions (10–15 minutes daily) produce better long-term retention than long, infrequent sessions (45 minutes once a week). This is true across motor skills and is one of the most robust findings in the motor learning literature. For speech, this means: practice every day, but keep it brief.
Random beats blocked (for retention). Mixing different targets within a practice session (e.g., switching between a loudness exercise, a phrasing exercise, and a tongue twister) produces worse performance during practice but better retention and transfer afterward. This is called the "contextual interference effect." Blocked practice (drilling one thing at a time) feels more productive in the moment but doesn't stick as well. The practice routines below mix targets deliberately.
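The difference between the two practice orders is easy to see when written out. This sketch builds a blocked order and a shuffled order from the same set of repetitions; the target names and repetition count are placeholders, and a plain shuffle is only one simple way to mix targets.

```python
import random

def blocked_schedule(targets, reps):
    """All repetitions of one target, then all of the next (blocked practice)."""
    return [target for target in targets for _ in range(reps)]

def random_schedule(targets, reps, seed=None):
    """The same repetitions in shuffled order (random practice)."""
    schedule = blocked_schedule(targets, reps)
    random.Random(seed).shuffle(schedule)  # seed makes the order reproducible
    return schedule

# Placeholder targets; substitute whatever you are working on.
targets = ["loudness count", "chunked sentence", "tongue twister"]

print("Blocked:", blocked_schedule(targets, reps=3))
print("Random: ", random_schedule(targets, reps=3, seed=1))
```

Both lists contain exactly the same work; only the order changes. A plain shuffle can still place the same target twice in a row, which is fine for this purpose.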
Practice should feel effortful. Research on the cognitive cost of clear speech production confirms that speaking clearly requires measurable cognitive effort even for healthy speakers. Keerstock and Smiljanić (2021) showed that reading aloud in a clear speaking style reduced speakers' own recognition memory and recall of what they had just read — the effort of producing clear speech consumed cognitive resources that would otherwise support memory encoding. This is important to know: if speaking clearly feels effortful and exaggerated at first, that's a sign you're actually changing your behavior. With practice, effort decreases through automatization — but the early difficulty is expected and correct, not a sign you're doing it wrong.
Delayed feedback beats immediate feedback. When practicing alone, resist the urge to correct every single attempt in real time. Instead, produce several attempts, then review (by recording and listening back). This encourages you to develop internal monitoring rather than relying on external correction — which is essential for transfer to real conversation.
Self-Monitoring Without a Clinician
The biggest obstacle to solo practice is the sensory recalibration problem. This is well-documented in the LSVT literature: speakers consistently perceive their appropriately loud, clear speech as "too loud" or "too exaggerated." When Parkinson's patients are trained to speak at a normal conversational volume, they reliably report that it feels like shouting. This isn't unique to clinical populations — healthy speakers show the same bias. Your internal sense of "normal" is calibrated to whatever you habitually do. When you change, the new behavior feels wrong precisely because it's different, not because it actually sounds excessive.
Rule of thumb: If your clearly spoken speech feels 30–40% too loud, too exaggerated, or too slow, it probably sounds just right to your listeners. If it feels comfortable and natural to you, you probably haven't changed anything. Comfort is the enemy of improvement in the early stages.
Practical self-assessment methods
Record and listen back. Use your phone to record yourself reading a paragraph, first at your habitual level, then at your "too loud / too clear" level. Play both back. The second will almost always sound better — more natural, more authoritative, more intelligible — than you expected. The gap between how it feels and how it sounds is the recalibration gap.
Use speech recognition as a rough proxy. Dictate a passage to your phone's speech-to-text function, first casually, then clearly. Compare accuracy. This isn't a clinical measure, but if the dictation software can understand you better, listeners probably can too.
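To turn "compare accuracy" into a number, you can count how many word-level edits separate the passage you read from what the dictation produced. The sketch below uses a standard word-level edit distance. The two transcripts are made up for illustration, and `word_accuracy` is a hypothetical helper, not part of any dictation app.

```python
import re

def words(text):
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+(?:'\w+)?", text.lower())

def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions, and deletions."""
    ref, hyp = words(reference), words(hypothesis)
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # reference word missing from transcript
                current[j - 1] + 1,          # extra word in transcript
                previous[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        previous = current
    return previous[-1]

def word_accuracy(reference, hypothesis):
    """1 minus word error rate; can go negative if the transcript adds many words."""
    ref_len = len(words(reference))
    if ref_len == 0:
        raise ValueError("reference passage has no words")
    return 1 - word_errors(reference, hypothesis) / ref_len

passage = "I need to pick up the groceries and get home before dinner."
# Invented transcripts standing in for real dictation output.
casual = "I need to pick up the grocery and get home for dinner"
clear = "I need to pick up the groceries and get home before dinner"

print(f"casual: {word_accuracy(passage, casual):.0%}")
print(f"clear:  {word_accuracy(passage, clear):.0%}")
```

Use a passage of at least a few sentences; on a single short sentence, one misrecognized word swings the percentage a lot.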
Ask someone. Tell a friend or family member you're working on speaking more clearly and ask for honest feedback. Specifically ask: "Can you understand me easily?" and "Do I sound weird or exaggerated?" The answers will almost always be "yes" and "no."
Daily Practice Routines
Each routine below is designed to take 10–15 minutes. Practice daily — brief and consistent beats long and sporadic. Each routine integrates all four pillars (loudness, mouth opening, prosody, consonant precision) in a mixed-target format, consistent with the motor learning principle of randomized practice for better retention.
English Routine
Warm-Up: Breath & Volume Calibration
2 min Diaphragmatic breath: inhale 4 seconds, sustain "ahh" for 15+ seconds at a projected volume. Count 1–20, maintaining even loudness — number 20 should be as loud as number 1. If you trail off, restart from where you faded.
Block A: Vowel Space + Jaw Opening
3 min Say each pair slowly, exaggerating the vowel contrast. Open your jaw wide on the open vowels. Alternate between pairs randomly (don't drill one pair 10 times — mix them):
Minimal pairs: beat / bat — Luke / lock — seed / sad — pool / Paul — sheep / shop
Then in sentences: "He beat the bat away." / "Did Luke turn the lock?" / "The sheep left the shop."
Focus: make the vowels maximally different from each other. Your jaw should visibly drop on every open vowel.
Block B: Phrasing & Emphasis
3 min Read aloud with deliberate chunking. Pause for half a beat at each slash. In each chunk, emphasize the one or two most important words slightly (louder, longer, or higher pitch):
"I called the office / three times today / but nobody answered."
"She's moving to Seattle / at the end / of March."
"The problem isn't the budget / it's the timeline."
Then make up your own sentences, marking chunks and emphasis before reading.
Block C: Consonant Precision in Phrases
3 min These phrases target common reduction points. Say each at a projected volume, finishing every final consonant and every consonant cluster:
She asked for the sixth text. (final clusters: /skt/, /ksθ/, /kst/) — The guests left gifts on the desks. — He worked hard, but the strength wasn't enough. — Lists of specific facts and statistics.
Then, tongue twisters for coordination (warm-up, not the main event):
Red lorry, yellow lorry — Unique New York — She sells seashells by the seashore.
Say each 3 times, getting slightly faster but never sacrificing clarity for speed.
Cool-Down: Connected Speech
2 min Pick any topic and speak about it for two minutes at your "projected" level, applying everything: steady loudness, open mouth, chunked phrasing, clear final consonants. Record it. Listen back. Notice the gap between how it felt and how it sounds.
Advancement criteria: When your recorded cool-down speech sounds clear and natural to you on playback (not exaggerated), AND a speech-to-text app transcribes it with 95%+ accuracy, AND at least one listener confirms it sounds good — you've internalized this level. Begin practicing in real conversations: pick one conversation per day to use your "clear voice." Gradually increase.
Svenska (Swedish)
Swedish phonology: what makes it distinctive
Swedish presents several distinctive challenges for articulation clarity:
Vowel system: Swedish has one of the largest vowel inventories among European languages — nine vowel qualities, each occurring in phonemically contrastive long and short forms (totaling roughly 17–18 vowel phonemes depending on analysis; short /e/ and /ɛ/ coincide in many dialects, giving 17 rather than 18). The long/short distinction involves both duration and quality differences, so the contrast is robust but requires precise articulation. Key challenges include the front rounded vowels /yː, ʏ, øː, œ/, which don't exist in English.
Pitch accent: Swedish has a lexical pitch accent system — accent 1 (acute) and accent 2 (grave) — that distinguishes word meanings (e.g., anden "the duck" vs. anden "the spirit"). The exact tonal realization varies substantially by dialect. In Central Swedish (Stockholm area), the contrast is typically described as a single-peaked contour for accent 1 and a double-peaked contour for accent 2. In other dialects the contours differ, and some dialects (e.g., parts of Finland Swedish) lack the tonal contrast entirely. Clear production of pitch accent contributes to intelligibility and naturalness.
The sj-sound /ɧ/: This phoneme, spelled ⟨sj, sk, stj, skj⟩ and others, is one of the most discussed sounds in phonetics because of its remarkable articulatory variability — it has been described variously as a velar-labial fricative, a uvular fricative, or a labialized postalveolar fricative, and its realization varies significantly across speakers and dialects. There is no single "correct" articulation; clarity requires a consistent, audible production regardless of which articulatory strategy the speaker uses.
Retroflex assimilation: When an /r/ precedes a dental consonant /t, d, n, s, l/, the sequence merges into a retroflex consonant /ʈ, ɖ, ɳ, ʂ, ɭ/ in Central and Northern Swedish dialects. This occurs both within words and across word boundaries (bort → [bɔʈ], för sent → [fœːˈʂɛnt]). Southern and Finland Swedish dialects generally do not produce retroflexes, preserving the consonant sequences instead. These assimilations are natural in fluent Swedish but can reduce clarity if they cause adjacent sounds to merge ambiguously.
Uppvärmning: Andning och volym
2 min Diafragmaandning: andas in 4 sekunder, håll "aaa" i 15+ sekunder med tydlig volym. Räkna 1–20 på svenska med jämn styrka genom hela räkningen.
Block A: Vokalkontraster och käköppning
3 min Svenska har rika vokalkontraster. Öva dessa minimalpar med tydlig skillnad — öppna käken ordentligt på de öppna vokalerna:
Lång/kort: mat /mɑːt/ – matt /matː/ — glas /ɡlɑːs/ – glass /ɡlasː/ — ful /fʉːl/ – full /fɵlː/
Rundade vokaler: sy /syː/ – se /seː/ – så /soː/ — söt /søːt/ – sot /suːt/
I meningar: "Huset var fullt av folk." / "Hon hade ett sött leende."
Block B: Prosodi och ordaccent
3 min Öva frasering med pauser. Betona de viktigaste orden i varje fras:
"Jag ringde kontoret / tre gånger idag / men ingen svarade."
"Hon flyttar till Göteborg / i slutet / av mars."
Accent 1 vs. accent 2: anden (the duck, accent 1) vs. anden (the spirit, accent 2) — say both with distinct tonal contours.
Block C: Tungvrickare och konsonantprecision
3 min Dessa klassiska tungvrickare tränar specifika svenska ljud:
Sju sjösjuka sjömän sköttes av sju sköna sjuksköterskor.
(sj-sound /ɧ/ in both its ⟨sj⟩ and ⟨sk⟩ spellings, rounded vowels)
Sex laxar i en laxask.
(s + ks clusters, vowel contrasts)
Packa pappas kappsäck.
(p/k contrasts, consonant clusters)
Flyg, fula fluga, flyg! Och den fula flugan flög.
(fl-clusters, y/u vowel contrast)
Säg varje tungvrickare 3 gånger. Börja långsamt, öka tempot utan att tappa tydlighet.
Avslutning: Fritt tal
2 min Tala fritt i 2 minuter om valfritt ämne — med tydlig volym, öppen mun, och medveten frasering. Spela in och lyssna tillbaka.
Advancement criteria: When your recorded free speech sounds clear and natural on playback (not exaggerated), AND a Swedish speech-to-text app transcribes it with high accuracy (including correct å/ä/ö), AND at least one listener confirms it sounds natural — begin using your clear voice in one real conversation per day. Gradually increase.
한국어 (Korean)
Korean phonology: what makes it distinctive
Korean presents articulation challenges that are fundamentally different from Germanic languages:
Three-way laryngeal contrast: Korean stops and affricates come in three series — lenis (평음, e.g., ㄱ ㄷ ㅂ ㅈ), fortis (경음, e.g., ㄲ ㄸ ㅃ ㅉ), and aspirated (격음, e.g., ㅋ ㅌ ㅍ ㅊ). This contrast is cued by multiple acoustic dimensions: Voice Onset Time (VOT), fundamental frequency (f0) of the following vowel, and voice quality (spectral tilt, breathiness). Importantly, Seoul Korean is undergoing a well-documented sound change: the VOT difference between lenis and aspirated stops has been converging in the speech of younger speakers (especially women), and the fundamental frequency of the following vowel has become an increasingly important distinguishing cue. This is described in the phonetics literature as a case of incipient tonogenesis (Kang, 2014; Bang et al., 2018).
Positional sound changes: Korean consonants change significantly depending on their position in the syllable. Stops are released in onset position but unreleased in coda. Multiple coda consonants neutralize — for example, ㄱ, ㄲ, and ㅋ all become unreleased [k̚] in syllable-final position. Intervocalic lenis stops become voiced. Obstruent clusters at syllable boundaries undergo complex assimilation rules. Clear speech requires producing these natural alternations consistently while maintaining enough contrast that listeners can recover the intended words.
Syllable structure: Korean has a relatively simple syllable structure: (C)V(C). There are no onset clusters, and only seven consonant sounds are permitted in syllable-final position. But linking and resyllabification across word boundaries, combined with the positional sound changes, create patterns that require precise timing and coordination.
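The coda neutralization described above can be sketched in a few lines of Python. Precomposed Hangul syllables are laid out arithmetically in Unicode, so the written final consonant can be read off with a remainder operation and then mapped to one of the seven coda sounds. This is a simplified illustration: the names are hypothetical, two-letter codas such as ㄳ or ㄺ are deliberately left unhandled, and the mapping applies only before a pause or a consonant (before a vowel, the consonant is resyllabified as an onset instead).

```python
import unicodedata

# Written final consonants in Unicode order; index 0 means "no coda".
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Single-letter codas mapped to the coda sound they neutralize to.
CODA_SOUND = {
    "ㄱ": "ㄱ", "ㄲ": "ㄱ", "ㅋ": "ㄱ",
    "ㄴ": "ㄴ",
    "ㄷ": "ㄷ", "ㅌ": "ㄷ", "ㅅ": "ㄷ", "ㅆ": "ㄷ", "ㅈ": "ㄷ", "ㅊ": "ㄷ", "ㅎ": "ㄷ",
    "ㄹ": "ㄹ",
    "ㅁ": "ㅁ",
    "ㅂ": "ㅂ", "ㅍ": "ㅂ",
    "ㅇ": "ㅇ",
}


def coda_letter(syllable: str) -> str:
    """Return the written final consonant of one precomposed Hangul syllable."""
    syllable = unicodedata.normalize("NFC", syllable)
    code = ord(syllable) - 0xAC00
    if not 0 <= code < 11172:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    return FINALS[code % 28]


def neutralized_coda(syllable: str):
    """Coda sound, '' if the syllable is open, None for two-letter codas."""
    letter = coda_letter(syllable)
    if letter == "":
        return ""
    return CODA_SOUND.get(letter)


for word in ["부엌", "밖", "꽃", "옷", "앞"]:
    last = word[-1]
    print(word, coda_letter(last), "->", neutralized_coda(last))
```

The mapping has sixteen written codas on the left but only seven distinct sounds on the right, which is why listeners rely on context and on what follows to recover the intended word.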
준비 운동: 호흡과 발성 (Warm-Up: Breath & Voice)
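2 min Diaphragmatic breathing: inhale for 4 seconds, then sustain an "ahhh" (아아아) at a steady volume for at least 15 seconds. Next, count aloud from 1 to 20, holding the same volume to the end.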
Block A: 자음 대조 연습 (Consonant Contrast Practice)
3 min Practice the three-way contrast between lenis (평음), fortis (경음), and aspirated (격음) consonants. Say each word loudly, with your mouth open:
ㄷ-ㄸ-ㅌ: 달 — 딸 — 탈
ㅂ-ㅃ-ㅍ: 불 — 뿔 — 풀
ㅈ-ㅉ-ㅊ: 자다 — 짜다 — 차다
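In a sentence: "달이 밝으니까 딸과 함께 탈을 쓰고 놀자." ("The moon is bright, so let's put on masks and play with my daughter.") Practice the words in random order rather than working through one triplet at a time.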
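If you would rather not improvise the random order, a short script can generate one for each session. This is a convenience sketch only: the function name, the number of repetitions, and the optional seed are assumptions, not part of the routine.

```python
import random

# The three triplets from Block A.
TRIPLETS = {
    "ㄷ-ㄸ-ㅌ": ["달", "딸", "탈"],
    "ㅂ-ㅃ-ㅍ": ["불", "뿔", "풀"],
    "ㅈ-ㅉ-ㅊ": ["자다", "짜다", "차다"],
}


def random_practice_order(triplets, repetitions=2, seed=None):
    """Return every word `repetitions` times, shuffled into a single list."""
    rng = random.Random(seed)  # pass a seed to reproduce a given order
    words = [w for group in triplets.values() for w in group] * repetitions
    rng.shuffle(words)
    return words


print(" / ".join(random_practice_order(TRIPLETS)))
```

Each run prints a fresh order, so consecutive words usually come from different triplets and you have to reset the contrast each time.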
Block B: 문장 읽기 — 끊어 읽기 연습 (Phrased Reading)
3 min Pause briefly at each slash and emphasize the words in bold:
"오늘 회의는 / 세 시에 / 회의실에서 합니다."
"내일 아침까지 / 보고서를 / 제출해 주세요."
"문제는 예산이 아니라 / 시간입니다."
Block C: 잰말놀이 (Tongue Twisters)
3 min Say each tongue twister 3 times. Start slowly and gradually speed up:
간장 공장 공장장은 장 공장장이고 된장 공장 공장장은 강 공장장이다.
(Targets: ㄱ/ㅈ/ㅇ consonant sequences, similar-syllable coordination)
경찰청 철창살은 외철창살이고 검찰청 철창살은 쌍철창살이다.
(Targets: ㅊ affricate, ㅅ/ㅆ fricatives, ㄹ lateral, coda-to-onset consonant sequences)
네가 그린 기린 그림은 못 그린 기린 그림이고 내가 그린 기린 그림은 잘 그린 기린 그림이다.
(Targets: ㄱ/ㄹ sequences, ㅡ/ㅣ vowel precision)
마무리: 자유 발화 (Free Speech Cool-Down)
2 min Speak freely for 2 minutes: loudly, with your mouth wide open, and with deliberate phrasing. Record yourself and listen back.
Advancement criteria: When your recorded free speech sounds clear and natural on playback, AND Korean speech-to-text accurately transcribes your three-way consonant contrasts (평음/경음/격음), AND at least one listener confirms it sounds natural — begin applying your clear voice to one real conversation per day. Gradually increase.
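To check the three-way contrast specifically, you can compare the initial consonant of each syllable you intended with the one the app transcribed, and flag cases where the place of articulation matches but the series (lenis, fortis, aspirated) does not. The sketch below assumes the two strings have the same length and line up syllable by syllable; the function names and example transcript are hypothetical.

```python
# Initial consonants (choseong) in Unicode order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"

# Map each stop or affricate to (place group, series).
SERIES = {}
for place, letters in enumerate(["ㄱㄲㅋ", "ㄷㄸㅌ", "ㅂㅃㅍ", "ㅈㅉㅊ"]):
    for letter, name in zip(letters, ["lenis", "fortis", "aspirated"]):
        SERIES[letter] = (place, name)


def onset(syllable: str):
    """Initial consonant of a precomposed Hangul syllable, else None."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code < 11172:
        return None
    return INITIALS[code // 588]  # 588 = 21 vowels * 28 finals


def series_confusions(intended: str, transcribed: str):
    """List (index, intended, transcribed, 'series -> series') mismatches."""
    if len(intended) != len(transcribed):
        raise ValueError("this sketch only compares equal-length strings")
    found = []
    for i, (a, b) in enumerate(zip(intended, transcribed)):
        sa, sb = SERIES.get(onset(a)), SERIES.get(onset(b))
        if sa and sb and sa[0] == sb[0] and sa[1] != sb[1]:
            found.append((i, a, b, f"{sa[1]} -> {sb[1]}"))
    return found


# Hypothetical transcript in which 달 was heard as 탈 and 자 as 짜.
for hit in series_confusions("달 불 자다", "탈 불 짜다"):
    print(hit)
```

A recurring pattern in the output, for example lenis repeatedly transcribed as aspirated, points to the contrast that needs more work in Block A.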
Environmental Strategies
Strong Evidence
Some of the highest-impact, lowest-effort interventions for being understood better have nothing to do with how you speak — they involve managing the environment in which you speak. These strategies are standard recommendations in audiology and supported by decades of research on speech perception in noise:
Face the listener. Visual speech cues (lip movements, facial expressions, jaw movement) contribute substantially to intelligibility. Research on audiovisual speech perception consistently shows that seeing the speaker's face can add roughly 5–10 dB of effective signal-to-noise ratio, comparable to turning the background noise down by the same amount. Face-to-face conversation where the listener can see your mouth is reliably clearer than talking while turned away, looking at a screen, or covering your mouth.
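Reduce distance. In open space, sound level drops by about 6 dB for every doubling of distance from the speaker; in reverberant rooms the drop is smaller because reflections add back some energy. Moving from across the room to across the table still makes a large difference. For phone calls, keep the microphone close and positioned properly.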
Minimize competing noise. Turn off background music or TV. Close the window. Move away from the kitchen noise. The single biggest predictor of whether someone can understand you is the signal-to-noise ratio — how loud you are relative to the background. Every bit of noise reduction helps.
Ensure adequate lighting. Listeners use visual speech information automatically, even when they don't think they're lip-reading. Good lighting on the speaker's face makes this unconscious processing more effective.
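The distance and noise points above combine into simple arithmetic. For an idealized point source in open space, the level change when moving from distance d1 to d2 is -20·log10(d2/d1) dB, and the signal-to-noise ratio is the speech level minus the noise level, both in dB. The sketch below works one example; the speech and noise levels are hypothetical illustration values, and real rooms with reflections will show smaller changes than this free-field estimate.

```python
import math


def level_change_db(old_distance_m: float, new_distance_m: float) -> float:
    """Free-field level change (dB) when the listener moves to a new distance."""
    return -20.0 * math.log10(new_distance_m / old_distance_m)


def snr_db(speech_level_db: float, noise_level_db: float) -> float:
    """Signal-to-noise ratio: speech level minus background noise level."""
    return speech_level_db - noise_level_db


# Hypothetical values for illustration only.
speech_at_4m = 54.0   # speech level at the listener, 4 m away (dB SPL)
noise = 55.0          # background noise at the listener (dB SPL)

gain = level_change_db(4.0, 1.0)   # moving from 4 m to 1 m: two halvings
print(f"Level change: {gain:+.1f} dB")
print(f"SNR at 4 m: {snr_db(speech_at_4m, noise):+.1f} dB")
print(f"SNR at 1 m: {snr_db(speech_at_4m + gain, noise):+.1f} dB")
```

Under these assumptions, halving the distance twice is worth about 12 dB, which turns a slightly negative signal-to-noise ratio into a clearly positive one without the speaker changing anything about their voice.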
When to Seek Professional Help
This guide is for anyone who wants to speak more clearly in everyday life. It is not a substitute for professional evaluation. Consider seeing a speech-language pathologist (SLP) or physician if you experience any of the following:
Sudden changes in speech clarity, voice quality, or the ability to find or produce words — especially following a neurological event (stroke, head injury) or alongside new neurological symptoms.
Persistent hoarseness lasting more than 2–3 weeks, especially if accompanied by throat pain, difficulty swallowing, or unintended weight loss.
Difficulty swallowing (dysphagia) — choking on food or liquids, food "sticking," or coughing during meals.
Hearing loss — if you struggle to hear others, this may be the primary reason they struggle to hear you (speakers unconsciously reduce effort when they can't monitor their own output). An audiological evaluation is the first step.
Speech changes that worsen over time rather than staying stable, which may indicate a progressive neurological condition.
An SLP can provide individualized assessment, instrumental measures (acoustic analysis, nasometry), and targeted treatment that a self-guided resource cannot replace.
References & Sources
The following references informed this guide. Only sources the author is confident are real and correctly attributed are included. Where specific studies could not be verified with certainty, findings are described in the text without fabricated citations.
Bang, H.-Y., Sonderegger, M., Kang, Y., Clayards, M., & Yoon, T.-J. (2018). The emergence, progress, and impact of sound change in progress in Seoul Korean: Implications for mechanisms of tonogenesis. Journal of Phonetics, 66, 120–144.
Bunton, K. (2008). Speech versus nonspeech: Different tasks, different neural organization. Seminars in Speech and Language, 29(4), 267–275.
Fox, C. M., Ramig, L. O., Ciucci, M. R., Sapir, S., McFarland, D. H., & Farley, B. G. (2006). The science and practice of LSVT/LOUD: Neural plasticity-principled approach to treating individuals with Parkinson disease and other neurological disorders. Seminars in Speech and Language, 27(4), 283–299.
Kang, Y. (2014). Voice onset time merger and development of tonal contrast in Seoul Korean stops: A corpus study. Journal of Phonetics, 45, 76–90.
Keerstock, S., & Smiljanić, R. (2021). Reading aloud in clear speech reduces sentence recognition memory and recall for native and non-native talkers. Journal of the Acoustical Society of America, 150(5), 3387–3398.
Krause, J. C., & Braida, L. D. (2002). Investigating alternative forms of clear speech: The effects of speaking rate and speaking mode on intelligibility. Journal of the Acoustical Society of America, 112(5), 2165–2172.
Krause, J. C., & Braida, L. D. (2004). Acoustic properties of naturally produced clear speech at normal speaking rates. Journal of the Acoustical Society of America, 115(1), 362–378.
Krause, J. C., & Braida, L. D. (2009). Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech. Journal of the Acoustical Society of America, 125(5), 3346–3357.
Lam, J., & Tjaden, K. (2013). Intelligibility of clear speech: Effect of instruction. Journal of Speech, Language, and Hearing Research, 56(5), 1429–1440.
Lee, A. S. Y., & Gibbon, F. E. (2015). Non-speech oral motor treatment for children with developmental speech sound disorders. Cochrane Database of Systematic Reviews, 2015(3).
Lof, G. L., & Watson, M. M. (2008). A nationwide survey of nonspeech oral motor exercise use: Implications for evidence-based practice. Language, Speech, and Hearing Services in Schools, 39, 392–407.
Lof, G. L., & Watson, M. M. (2010). Five reasons why nonspeech oral motor exercises (NSOME) do not work. Perspectives on School-Based Issues, 11, 109–117.
Maas, E., Robin, D. A., Austermann Hula, S. N., Freedman, S. E., Wulf, G., Ballard, K. J., & Schmidt, R. A. (2008). Principles of motor learning in treatment of motor speech disorders. American Journal of Speech-Language Pathology, 17(3), 277–298.
McCauley, R. J., Strand, E., Lof, G. L., Schooling, T., & Frymark, T. (2009). Evidence-based systematic review: Effects of nonspeech oral motor exercises on speech. American Journal of Speech-Language Pathology, 18(4), 343–360.
Payton, K. L., Uchanski, R. M., & Braida, L. D. (1994). Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. Journal of the Acoustical Society of America, 95(3), 1581–1592.
Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech. Journal of Speech and Hearing Research, 28(1), 96–103.
Smiljanić, R., & Bradlow, A. R. (2009). Speaking and hearing clearly: Talker and listener factors in speaking style changes. Language and Linguistics Compass, 3(1), 236–264.
Tjaden, K., & Wilding, G. E. (2004). Rate and loudness manipulations in dysarthria: Acoustic and perceptual findings. Journal of Speech, Language, and Hearing Research, 47(4), 766–783.
Tjaden, K., Sussman, J. E., & Wilding, G. E. (2014). Impact of clear, loud, and slow speech on scaled intelligibility and speech severity in Parkinson's disease and multiple sclerosis. Journal of Speech, Language, and Hearing Research, 57(3), 779–792.
Van Nuffelen, G., De Bodt, M., Vanderwegen, J., Van de Heyning, P., & Wuyts, F. (2010). Effect of rate control on speech production and intelligibility in dysarthria. Folia Phoniatrica et Logopaedica, 62(3), 110–119.