Concurrency and Threading
1. Android Threading Model
The Main Thread & The Looper
At the heart of every Android application is the Main Thread (often called the UI Thread). Unlike a standard Java thread that executes a task and dies, the Android Main Thread sits in an infinite loop, waiting for events to process.
- Looper: This is the mechanism that keeps the thread alive. It continuously loops through a MessageQueue, pulling out tasks (Messages or Runnables) and executing them one by one.
- MessageQueue: A queue of tasks waiting to be processed. These tasks include drawing frames, handling touch events, processing broadcast receivers, and executing `Handler.post()` blocks.
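The loop described above can be modelled in plain Kotlin. This is only a sketch: `MiniLooper` and `runLooperDemo` are hypothetical names, and a `LinkedBlockingQueue` stands in for the real `MessageQueue`, which also handles delayed messages and native wake-ups.

```kotlin
import java.util.concurrent.LinkedBlockingQueue
import kotlin.concurrent.thread

// Hypothetical stand-in for Looper + MessageQueue; not the Android classes.
class MiniLooper {
    private val queue = LinkedBlockingQueue<Runnable>()
    private val quit = Runnable { }            // sentinel task that ends the loop

    // The "looper thread": waits on the queue and runs tasks one by one.
    val worker: Thread = thread(name = "mini-looper") {
        while (true) {
            val task = queue.take()            // blocks while the queue is empty
            if (task === quit) break
            task.run()
        }
    }

    fun post(task: Runnable) { queue.put(task) }   // roughly the role of Handler.post()
    fun quitSafely() { queue.put(quit) }
}

fun runLooperDemo(): List<String> {
    val log = mutableListOf<String>()
    val looper = MiniLooper()
    looper.post { log += "draw frame" }
    looper.post { log += "handle touch" }
    looper.quitSafely()
    looper.worker.join()                       // wait until the queue is drained
    return log
}

fun main() {
    println(runLooperDemo()) // [draw frame, handle touch]
}
```

Because one thread drains the queue in order, a slow task delays every task queued behind it, which is exactly why long work must not run on the Main Thread.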
The Golden Rule: Never Block the Main Thread
Because the Main Thread processes events sequentially, if you execute a block of code that takes 2 seconds to run, no frames can be drawn and no touches can be handled for those 2 seconds.
- Dropped Frames: If the work for a single frame takes more than ~16ms (1000ms / 60), the app misses the 60fps window, causing "jank".
- ANR (Application Not Responding): If the main thread is blocked for >5 seconds (specifically for input events) or >10 seconds for BroadcastReceivers, the OS prompts the user to kill the app.
Background Work Options
Since we cannot block the Main Thread, we must offload work.
| Tool | Ideal Use Case | Lifecycle Scope |
|---|---|---|
| Coroutines | The standard for most async work (network, DB, easy concurrency). | Tied to specific scopes (viewModelScope, lifecycleScope). |
| WorkManager | Guaranteed, deferrable execution (e.g., uploading 500MB of logs). | Survives app restart and process death. |
| Foreground Service | Long-running work the user must be aware of (e.g., Music Player, Navigation). | Lives as long as the notification is visible. |
| Handler/Thread | Legacy (Inter-Thread Communication). Rare in modern apps. | Manual management (high risk of leaks). |
2. Coroutines Deep Dive
Kotlin Coroutines are often described as "lightweight threads," but a better mental model is sequential code that can pause.
Suspending vs. Blocking
- Blocking (Thread): The thread stops working and waits for the IO/Operation to release it. It consumes system resources (RAM for stack) while doing nothing useful.
- Suspending (Coroutine): The function pauses its execution and returns the thread to the pool. The thread is free to go do other work. When the result is ready, the coroutine resumes (potentially on a different thread).
The Suspend State Machine (Under the Hood)
A suspend function is not magic. The Kotlin compiler transforms it into a state machine.
suspend fun fetch() {
    val token = getToken()    // Suspension point 1
    val user = getUser(token) // Suspension point 2
}
Compilation:
- The function signature changes to accept a `Continuation` parameter: `fun fetch(completion: Continuation<Any?>): Any?`
- The body acts like a giant `switch` statement based on a `label` field.
- State 0: Call `getToken`. Return `COROUTINE_SUSPENDED`.
- Resume: Callback triggers, `label` moves to 1.
- State 1: Use result of `getToken`, call `getUser`. Return `COROUTINE_SUSPENDED`.
- Resume: Callback triggers, `label` moves to 2.
- State 2: Return final result.
This "Continuation Passing Style" allows the thread to be released while the function state (local variables) is stored in the heap object (Continuation).
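To make the states concrete, here is a hand-written model of that machine in plain Kotlin. It is a simplification, not real compiler output: `SUSPENDED`, `pendingCallbacks`, `FetchStateMachine`, and the callback-style `getToken`/`getUser` are hypothetical stand-ins, and the "event loop" is just a queue drained by `runFetch`.

```kotlin
// Simplified model of the generated state machine; NOT actual compiler output.
val SUSPENDED = Any()                              // stand-in for COROUTINE_SUSPENDED
val pendingCallbacks = ArrayDeque<() -> Unit>()    // stand-in for "the result arrives later"

// Hypothetical async operations: schedule the callback and report "suspended".
fun getToken(cont: (Any?) -> Unit): Any {
    pendingCallbacks.addLast { cont("token-123") }
    return SUSPENDED
}

fun getUser(token: String, cont: (Any?) -> Unit): Any {
    pendingCallbacks.addLast { cont("user-for-$token") }
    return SUSPENDED
}

class FetchStateMachine(private val onDone: (String) -> Unit) {
    private var label = 0
    private var token: String? = null   // local variable stored on the heap

    // Called once to start, then again each time a result is ready.
    // Real generated code also checks whether the callee actually suspended.
    fun resume(result: Any?) {
        when (label) {
            0 -> { label = 1; getToken(this::resume) }
            1 -> { token = result as String; label = 2; getUser(token!!, this::resume) }
            2 -> onDone(result as String)
        }
    }
}

fun runFetch(): String {
    var finalResult = ""
    FetchStateMachine { finalResult = it }.resume(null)
    // Drain the fake event loop; no thread is parked while results are pending.
    while (pendingCallbacks.isNotEmpty()) pendingCallbacks.removeFirst().invoke()
    return finalResult
}

fun main() {
    println(runFetch()) // user-for-token-123
}
```

Between two `resume` calls nothing sits on a call stack: the only surviving state is the `label` and the stored `token`, which is why the thread can be released.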
Dispatchers: Where the work happens
Think of a Dispatcher as a scheduler. It receives a coroutine and decides where and when it runs. It determines which thread (or thread pool) executes the coroutine.
| Dispatcher | Underlying Thread Pool | Best For |
|---|---|---|
| Dispatchers.Main | The single Android Main Thread (via Handler). | UI updates, lightweight animations, calling livedata.value. |
| Dispatchers.IO | Elastic thread pool (64+ threads). Threads are created on demand. | Blocking I/O (Database, Network, File Reading). Designed for waiting. |
| Dispatchers.Default | Fixed pool size (matches CPU Cores). | CPU-intensive tasks (Sorting list, parsing huge JSON, image processing). Designed for active computation. |
| Dispatchers.Unconfined | Executes immediately in the caller's thread until the first suspension. | Avoid in production code. Mostly used for unit testing mechanics. |
Switching Contexts
You often need to switch between these worlds.
- `withContext(Dispatcher)`: Switches to the new dispatcher, suspends the caller until the block finishes, and returns the result. This is for synchronous style code.

      // Runs on Main
      val data = withContext(Dispatchers.IO) {
          database.loadData() // Runs on IO
      } // Suspends the coroutine (Main stays free) until IO is done
      textView.text = data // Back on Main

- `launch`/`async`: Starts a new concurrent task. Does not pause the caller.
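After `withContext` returns, the coroutine resumes on its original dispatcher, but not necessarily on the same thread. Returning to `Dispatchers.Main` always lands on the single main thread; returning to a pooled dispatcher such as `Dispatchers.IO` or `Dispatchers.Default` may resume on a different thread from the pool. For thread-bound operations such as a SQLite transaction, which must begin and end on the same thread, this can cause a deadlock.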
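A small JVM-only sketch of the same switch, using `Dispatchers.Default` as the caller's dispatcher because `Dispatchers.Main` needs an Android (or other UI) runtime. `loadDataBlocking` and `loadData` are hypothetical stand-ins for the database call; the printed thread names depend on the run, so none are shown.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical blocking call standing in for database.loadData().
fun loadDataBlocking(): String {
    Thread.sleep(50)
    return "rows"
}

// Main-safe wrapper: callers may invoke it from any dispatcher.
suspend fun loadData(): String = withContext(Dispatchers.IO) {
    loadDataBlocking()
}

fun main() = runBlocking(Dispatchers.Default) {
    println("before: ${Thread.currentThread().name}")
    val data = loadData()   // suspends this coroutine, does not block its thread
    // Same dispatcher as before, but possibly a different pool thread.
    println("after:  ${Thread.currentThread().name}")
    println(data)
}
```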
3. Structured Concurrency
Q: What is structured concurrency?
Coroutines follow a parent-child hierarchy within a scope:
- Parent waits for all children to complete
- Cancelling parent cancels all children
- Child failure propagates to parent (by default)
This prevents orphan coroutines and ensures cleanup.
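The first two rules can be observed directly. This is a sketch with hypothetical function names; `runBlocking` is used only to make it runnable outside Android.

```kotlin
import kotlinx.coroutines.Job
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Rule 1: the parent scope returns only after both children have finished.
fun parentWaitsForChildren(): List<String> = runBlocking {
    val log = mutableListOf<String>()
    coroutineScope {
        launch { delay(100); log.add("child B done") }
        launch { delay(50); log.add("child A done") }
    }
    log.add("parent done")
    log
}

// Rule 2: cancelling the parent also cancels its child.
fun cancellingParentCancelsChild(): Boolean = runBlocking {
    var child: Job? = null
    val parent = launch {
        child = launch { delay(10_000) }
    }
    delay(50)                  // give both coroutines time to start
    parent.cancelAndJoin()
    child?.isCancelled == true
}

fun main() {
    println(parentWaitsForChildren())        // [child A done, child B done, parent done]
    println(cancellingParentCancelsChild())  // true
}
```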
Q: What scopes should you use in Android?
| Scope | Lifecycle | Use Case |
|---|---|---|
| viewModelScope | ViewModel | Most business logic |
| lifecycleScope | Activity/Fragment | UI-related work |
| rememberCoroutineScope | Composition | Compose event handlers |
| CoroutineScope | Custom | Services, application-level |
Q: Why avoid GlobalScope?
GlobalScope creates coroutines that:
- Never auto-cancel (leak potential)
- Live for app lifetime
- Can't be tested properly
- Violate structured concurrency
Use custom scopes instead, cancelled when appropriate.
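One possible shape for such a custom scope is sketched below. `SyncManager`, `startPolling`, and `close` are hypothetical names; the point is only that the owner creates the scope and cancels it at a well-defined moment.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.cancel
import kotlinx.coroutines.delay
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical component that owns its coroutines instead of using GlobalScope.
class SyncManager {
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    fun startPolling(): Job = scope.launch {
        while (isActive) {
            delay(100)          // placeholder for one polling round
        }
    }

    // Call when the owner goes away (e.g. a Service's onDestroy).
    fun close() {
        scope.cancel()
    }
}

fun main() = runBlocking {
    val manager = SyncManager()
    val job = manager.startPolling()
    manager.close()
    job.join()
    println(job.isCancelled) // true
}
```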
4. Exception Handling
Handling errors in async code is notoriously hard. Coroutines try to normalize this with try-catch, but there are caveats based on how you start the coroutine.
launch vs async
- `launch`: Propagates exceptions immediately. It treats an exception as a fatal crash of that job hierarchy.
  - Solution: Encapsulate the code inside the launch with `try-catch`.
- `async`: Defers the exception. It is stored inside the `Deferred` object and only thrown when you call `.await()`.
  - Solution: Wrap the `.await()` call in `try-catch`. Note that when `async` is a child of a regular (non-supervisor) parent, its failure still cancels that parent regardless of where you catch it.
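Both solutions side by side, as a sketch. The scope and function names are hypothetical, and a `SupervisorJob` is used so that the failing `async` does not cancel anything else in the scope.

```kotlin
import java.io.IOException
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.async
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical root scope for the demo.
val demoScope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

// launch: the try-catch has to live INSIDE the coroutine body.
suspend fun viaLaunch(): String {
    var outcome = "no error"
    demoScope.launch {
        try {
            throw IOException("network down")
        } catch (e: IOException) {
            outcome = "handled inside launch"
        }
    }.join()
    return outcome
}

// async: the exception is stored in the Deferred and rethrown by await().
suspend fun viaAsync(): String {
    val deferred = demoScope.async<String> { throw IOException("network down") }
    return try {
        deferred.await()
    } catch (e: IOException) {
        "handled at await"
    }
}

fun main() = runBlocking {
    println(viaLaunch()) // handled inside launch
    println(viaAsync())  // handled at await
}
```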
The SupervisorJob
By default, if one child coroutine fails, the parent cancels all other children. Example: You are loading "User Profile" and "Friend List" in parallel. If "Friend List" fails, you typically don't want to crash the "User Profile" load.
To fix this, use a SupervisorJob (or supervisorScope).
- SupervisorJob: "My children can fail independently. If one dies, don't kill me or its siblings."
- Note: `viewModelScope` and `lifecycleScope` act as supervisors by default.
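// Example of independent execution
// Without a handler, the uncaught exception would still crash the app.
val handler = CoroutineExceptionHandler { _, e -> Log.e("Scope", "Child failed", e) }
val scope = CoroutineScope(SupervisorJob() + Dispatchers.Main + handler)
scope.launch {
    // If this crashes...
    throw Exception("Boom")
}
scope.launch {
    // ...this one continues running happy and free.
    delay(1000)
}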
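The "User Profile" and "Friend List" scenario could look like this with `supervisorScope`. All names (`ProfileScreen`, `loadProfile`, `loadFriends`, `loadScreen`) are hypothetical, and the failure is simulated.

```kotlin
import java.io.IOException
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.supervisorScope

data class ProfileScreen(val profile: String, val friends: List<String>)

// Hypothetical loaders; the friends call always fails in this sketch.
suspend fun loadProfile(): String {
    delay(50)
    return "Alice"
}

suspend fun loadFriends(): List<String> {
    delay(10)
    throw IOException("friends service down")
}

suspend fun loadScreen(): ProfileScreen = supervisorScope {
    val profile = async { loadProfile() }
    val friends = async { loadFriends() }
    // The failed sibling does not cancel `profile`; we fall back to an empty list.
    val friendList: List<String> = try {
        friends.await()
    } catch (e: IOException) {
        emptyList()
    }
    ProfileScreen(profile.await(), friendList)
}

fun main() = runBlocking {
    println(loadScreen()) // ProfileScreen(profile=Alice, friends=[])
}
```

With a plain `coroutineScope` instead, the failing `loadFriends` would cancel the profile load and the whole `loadScreen` call would throw.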
5. Cancellation
Cancellation is cooperative. Just because you call job.cancel() doesn't mean the thread stops instantly. The code running inside the coroutine must check if it should stop.
Standard coroutine functions (delay(), withContext(), yield()) check for cancellation automatically. If they see the job is cancelled, they throw a special CancellationException, which stops execution.
The "CPU Loop" Problem
If you have a tight loop doing heavy math, it might never check for cancellation.
// BAD: This will not cancel until the loop finishes!
suspend fun costlyMath() = withContext(Dispatchers.Default) {
    var i = 0
    while (i < 1000000000) {
        i++ // Heavy crunching
    }
}
// GOOD: Add a check point
suspend fun costlyMath() = withContext(Dispatchers.Default) {
    var i = 0
    while (i < 1000000000) {
        ensureActive() // Throws cancellation exception if job needs to stop
        i++
    }
}
Cleanup
If a coroutine is cancelled, you might need to close a file or a socket.
- `try-finally`: The `finally` block creates a safe space to run cleanup code, even during cancellation.
- `withContext(NonCancellable)`: If you need to run a suspend function during cleanup (e.g., sending a "Goodbye" network packet), you must wrap it in `NonCancellable`, otherwise the cleanup code itself will be cancelled immediately!
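A sketch of both cleanup tools working together. `cancelWithCleanup` is a hypothetical name and `delay(10)` stands in for the suspending "Goodbye" call.

```kotlin
import kotlinx.coroutines.NonCancellable
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

fun cancelWithCleanup(): List<String> = runBlocking {
    val log = mutableListOf<String>()
    val job = launch {
        try {
            delay(10_000)                      // cancelled long before this finishes
            log.add("finished normally")
        } finally {
            log.add("closing resources")       // non-suspending cleanup always runs
            withContext(NonCancellable) {
                delay(10)                      // suspending cleanup needs NonCancellable
                log.add("goodbye sent")
            }
        }
    }
    delay(50)
    job.cancelAndJoin()
    log
}

fun main() {
    println(cancelWithCleanup()) // [closing resources, goodbye sent]
}
```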
6. Coroutine Thread Safety
Just because coroutines are "easy" doesn't mean thread safety disappears. If you modify a shared mutable list from multiple coroutines on Dispatchers.Default, you will have race conditions.
Solutions
- Mutex: The coroutine equivalent of a `lock`. It suspends instead of blocking.

      val mutex = Mutex()
      mutex.withLock { sharedCounter++ }

- Thread Confinement: The simplest solution. Restrict the state to a single thread. For example, creating a custom single-threaded dispatcher, or doing all updates on `Dispatchers.Main`.
- Atomic Variables: `AtomicInteger`, `AtomicReference`. Good for simple counters, harder for complex logic.
- Actors: A pattern where a specific coroutine manages a state, and others communicate with it via a `Channel`. The state is never exposed, only "messages" to change it.
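A runnable version of the `Mutex` option, following the counter pattern from the official coroutines guide. `countWithMutex` is a hypothetical name; without the lock, the final count would usually come out lower.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withContext

// 100 coroutines on a multi-threaded dispatcher, 1000 increments each.
suspend fun countWithMutex(): Int {
    val mutex = Mutex()
    var counter = 0
    withContext(Dispatchers.Default) {
        repeat(100) {
            launch {
                repeat(1000) {
                    mutex.withLock { counter++ }   // suspends instead of blocking while contended
                }
            }
        }
    }
    return counter
}

fun main() = runBlocking {
    println(countWithMutex()) // 100000
}
```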
7. WorkManager - Persistent Work
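Coroutines are great, but they die if the app process dies. WorkManager is for work that must happen, even if the app process is killed or the user reboots their phone. (One exception: after a user-initiated Force Stop from system settings, pending work resumes only once the app is launched again.)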
Key Distinction: "Deferrable"
WorkManager is not for "right now". It's for "as soon as possible, but respect system health". It might run immediately, or it might run in 10 minutes when the radio wakes up.
Work Constraints
You can tell the OS: "Only run this log upload when..."
- The device is Charging
- We have unmetered WiFi
- The device is idle (screen off)
Chaining
WorkManager excels at complex dependency chains that need reliability.
`downloadWorker -> processWorker -> uploadWorker`
If the app dies during `processWorker`, WorkManager will resurrect the app later and restart from `processWorker` (reusing the intermediate results if configured correctly).
Pitfall: Distributed Denial of Service (The "Thundering Herd")
Be careful when using WorkManager for regular content sync (e.g., "Sync data every 6 hours").
- The Problem: Android's Doze Mode restricts background tasks to specific "maintenance windows". If 10,000 devices are asleep, they will all wake up at the exact same moment (start of the maintenance window) and hit your server simultaneously.
- Why Jitter Fails: Adding a random delay (`setInitialDelay`) is often insufficient. Doze "quantizes" wake-ups. If devices have jittered times of 12:05, 12:15, and 12:20, but the maintenance window only opens at 12:30, all of them will fire at 12:30 anyway, defeating the jitter.
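The quantization effect can be checked with a few lines of arithmetic. This is a deliberately simplified model, not how Doze is implemented: times are minutes since midnight, the 14:30 window is an invented second window, and `actualRunMinute` and `hhmm` are hypothetical helpers.

```kotlin
// Simplified model: a job requested at `requestedMinute` runs at the first
// maintenance window that opens at or after that time.
fun actualRunMinute(requestedMinute: Int, windowStarts: List<Int>): Int? =
    windowStarts.sorted().firstOrNull { it >= requestedMinute }

// Converts a clock time to minutes since midnight.
fun hhmm(hour: Int, minute: Int): Int = hour * 60 + minute

fun main() {
    val windows = listOf(hhmm(12, 30), hhmm(14, 30))                 // assumed window times
    val jittered = listOf(hhmm(12, 5), hhmm(12, 15), hhmm(12, 20))   // times from the text
    // All three jittered requests collapse onto the 12:30 window (minute 750).
    println(jittered.map { actualRunMinute(it, windows) }) // [750, 750, 750]
}
```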
8. Testing Coroutines & Flows
Asynchronous code is traditionally hard to test because of "flakiness" and wait times. Kotlin provides `kotlinx-coroutines-test` to solve this.
runTest
The golden rule of coroutine testing: Use runTest instead of runBlocking.
runTest creates a TestScope that skips delay() calls. A 10-second delay executes instantly in virtual time.
@Test
fun testDelay() = runTest {
    val result = repo.fetchData() // Takes 5 seconds "virtually"
    assertEquals("Success", result)
} // Test completes in milliseconds, not 5 seconds
TestDispatcher
Sometimes you need to control the execution order precisely.
- StandardTestDispatcher: Queues coroutines. You must manually call `runCurrent()` or `advanceUntilIdle()` to let them run. Good for verifying intermediate states.
- UnconfinedTestDispatcher: Runs coroutines eagerly (immediately). Simpler for basic tests.
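The queueing behaviour of `StandardTestDispatcher` (which `runTest` uses by default) can be seen by logging the order of events. This sketch assumes the `kotlinx-coroutines-test` artifact is on the classpath; `standardDispatcherDemo` is a hypothetical name, and the function is written as a plain function rather than a `@Test` so it stays framework-agnostic.

```kotlin
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runCurrent
import kotlinx.coroutines.test.runTest

@OptIn(ExperimentalCoroutinesApi::class)
fun standardDispatcherDemo(): List<String> {
    val log = mutableListOf<String>()
    runTest {
        launch {
            log += "started"
            delay(1_000)                 // virtual time, no real waiting
            log += "finished"
        }
        log += "before runCurrent"       // the launched coroutine is only queued so far
        runCurrent()                     // runs it up to the delay
        log += "after runCurrent"
        advanceUntilIdle()               // skips the virtual second and finishes it
    }
    return log
}

fun main() {
    println(standardDispatcherDemo())
    // [before runCurrent, started, after runCurrent, finished]
}
```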
9. Asynchronous Flows (Reactive Streams)
A suspend function returns a single value. A Flow emits multiple values over time (like a pipe of data).
Cold Flow vs Hot Flow
- Cold Flow (`flow { }`): Lazy. It doesn't start doing anything until someone collects it.
  - Internals: It behaves like a function. If 3 subscribers call `collect()`, the builder block runs 3 times independently.
- Hot Flow (`SharedFlow`, `StateFlow`): Active. It emits data even if no one is listening.
  - Internals: Maintains an active buffer/cache. All subscribers share the same emissions instead of each triggering its own producer.
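A quick way to see the difference, as a sketch with hypothetical function names: count how often the cold builder runs, and check that a `MutableStateFlow` keeps a value that was set while nobody was collecting.

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Cold: the builder block runs once per collector.
fun coldBuilderRuns(): Int = runBlocking {
    var builderRuns = 0
    val cold = flow {
        builderRuns++
        emit("a")
        emit("b")
    }
    cold.toList()     // first collector
    cold.toList()     // second collector re-runs the builder
    builderRuns
}

// Hot: the value exists and changes even with zero collectors.
fun hotKeepsValue(): Int = runBlocking {
    val hot = MutableStateFlow(0)
    hot.value = 42    // nobody is collecting yet
    hot.first()       // a late collector still receives the current value
}

fun main() {
    println(coldBuilderRuns()) // 2
    println(hotKeepsValue())   // 42
}
```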
Backpressure: Producer vs Consumer Speed
The Problem: What happens if the Producer emits data faster than the Consumer can process in collect()?
- Producer: Emits every 10ms.
- Consumer: Takes 100ms to process each item.
Strategy 1: Suspension (The Default)
Since a Flow runs sequentially, each `emit()` suspends the Producer until the Consumer has finished processing that item.
- Effect: The whole pipeline slows down to the Consumer's speed.
- Total Time: 10 items * (10ms + 100ms) = 1100ms. No data is lost, but it's slow.
Strategy 2: Buffering (buffer())
"Don't wait for me, just put it in a pile."
- Creates a separate coroutine for the Producer. It runs as fast as it can and puts items into a Channel (queue).
- The Consumer pulls from the queue at its own pace.
- Effect: Producer finishes quickly. Consumer is still slow.
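- Risk: `buffer()` is bounded by default (64 items), so a full buffer suspends the Producer again. With an unbounded buffer (`buffer(Channel.UNLIMITED)`), a queue that keeps growing ends in OutOfMemoryError.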
Strategy 3: Conflation (conflate())
"I only care about the latest news." (e.g., Stock Prices, UI State)
- If the Consumer is busy processing `Item 1`, and the Producer emits `2`, `3`, `4`... `conflate` drops `2` and `3`.
- When Consumer finishes `Item 1`, it picks up `Item 4` (the latest).
- Effect: Very fast, but data is lost (dropped).
Strategy 4: Cancellation (collectLatest)
"If new data arrives, stop what you are doing and restart with the new data." (e.g., Search Query)
- User types "A". Search starts.
- User types "Ab". Previous search is cancelled immediately. New search for "Ab" starts.
- Effect: Extremely responsive UI, but wastes CPU on cancelled work.
flow.collectLatest { value ->
    // If a new value arrives while this block is running,
    // this block is CANCELLED (a CancellationException is thrown at its next suspension point)
    // and restarted with the new value.
    updateUI(value)
}
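The four strategies can be compared on one small producer. This is a sketch using real delays (5 items instead of 10 to keep it quick); `producer`, `collectSlowly`, and `collectLatestOnly` are hypothetical names. The conflated and `collectLatest` results depend on timing, so exact lists are not shown for them.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.collectLatest
import kotlinx.coroutines.flow.conflate
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Fast producer: one item every 10ms.
fun producer(): Flow<Int> = flow {
    for (i in 1..5) {
        delay(10)
        emit(i)
    }
}

// Slow consumer: 100ms per item. Returns the items it finished processing.
suspend fun collectSlowly(source: Flow<Int>): List<Int> {
    val seen = mutableListOf<Int>()
    source.collect { value ->
        delay(100)
        seen += value
    }
    return seen
}

// Same consumer, but each new item cancels the processing of the previous one.
suspend fun collectLatestOnly(source: Flow<Int>): List<Int> {
    val seen = mutableListOf<Int>()
    source.collectLatest { value ->
        delay(100)
        seen += value
    }
    return seen
}

fun main() = runBlocking {
    println(collectSlowly(producer()))             // default: all items, slowest
    println(collectSlowly(producer().buffer()))    // all items, producer not held back
    println(collectSlowly(producer().conflate()))  // first item, then only the newest ones
    println(collectLatestOnly(producer()))         // only work that was not cancelled
}
```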
StateFlow vs SharedFlow (The LiveData Replacements)
| Type | Behavior | Best Used For |
|---|---|---|
| StateFlow | Holds one current value. Replays latest to new collectors. | UI State (LCE pattern). Replacement for LiveData. |
| SharedFlow | Holds zero or more buffered values. Emits events. | One-off events (SnackBar, Navigation, Toasts). |
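Thread Safety Note: `MutableStateFlow.update { ... }` is thread-safe because it retries an atomic compare-and-set until it succeeds. A plain assignment to `value` is atomic for the write itself, but a read-modify-write cycle such as `state.value = state.value + 1` is not (race condition possible).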
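The difference shows up under contention. In this sketch (hypothetical function names), `incrementWithUpdate` always reaches the full count, while `incrementWithAssignment` can lose updates, so its result varies between runs and is not shown.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.update
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Safe: update {} retries until its compare-and-set succeeds.
fun incrementWithUpdate(): Int = runBlocking {
    val state = MutableStateFlow(0)
    withContext(Dispatchers.Default) {
        repeat(100) {
            launch { repeat(1000) { state.update { it + 1 } } }
        }
    }
    state.value
}

// Racy: two coroutines can read the same old value and overwrite each other.
fun incrementWithAssignment(): Int = runBlocking {
    val state = MutableStateFlow(0)
    withContext(Dispatchers.Default) {
        repeat(100) {
            launch { repeat(1000) { state.value = state.value + 1 } }
        }
    }
    state.value
}

fun main() {
    println(incrementWithUpdate())      // 100000
    println(incrementWithAssignment())  // at most 100000, often less
}
```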
10. JVM Memory Model & Primitives
While Kotlin hides this, understanding the JVM under the hood is key for senior engineers.
1. Visibility (The volatile keyword)
If Thread A changes a boolean isRunning = false, Thread B might not see that change immediately because it's cached in the CPU core.
- `volatile`: "Do not cache this variable. Always read/write directly to main RAM."
- Guarantees: Visibility.
- Does NOT Guarantee: Atomicity. `volatile count++` is not thread-safe.
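In Kotlin the keyword is spelled as the `@Volatile` annotation. The sketch below (hypothetical `Spinner`, `stopsPromptly`, and `atomicCount`) uses it for a stop flag, where visibility is all that is needed, and switches to `AtomicInteger` for the counter, where atomicity matters.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

class Spinner {
    @Volatile
    var isRunning = true                 // visibility: the worker sees the write below

    fun spinUntilStopped() {
        while (isRunning) { /* busy work */ }
    }
}

// Visibility only: one writer flips a flag, one reader observes it.
fun stopsPromptly(): Boolean {
    val spinner = Spinner()
    val worker = thread(isDaemon = true) { spinner.spinUntilStopped() }
    Thread.sleep(20)
    spinner.isRunning = false
    worker.join(2_000)
    return !worker.isAlive
}

// Atomicity: read-modify-write needs an atomic type (or a lock), not just @Volatile.
fun atomicCount(): Int {
    val count = AtomicInteger(0)
    val workers = List(4) { thread { repeat(10_000) { count.incrementAndGet() } } }
    workers.forEach { it.join() }
    return count.get()
}

fun main() {
    println(stopsPromptly()) // true
    println(atomicCount())   // 40000
}
```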
2. Synchronization (synchronized)
The classic way to ensure only one thread enters a block at a time.
- In Kotlin: `@Synchronized` annotation or `synchronized(lock) { ... }`.
- Impact: It blocks the thread. Coroutine `Mutex` is preferred because it suspends.
3. Deadlock
Happens when T1 holds Resource A and waits for B, while T2 holds Resource B and waits for A. Neither can proceed.
- Prevention: Always acquire locks in the same order.
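One common way to enforce a consistent order is to sort the locks by a stable key before acquiring them. In this sketch, `Account`, `transfer`, and `runOpposingTransfers` are hypothetical, and the account `id` is assumed to be unique.

```kotlin
import kotlin.concurrent.thread

class Account(val id: Int, var balance: Int)

// Always lock the account with the smaller id first, whatever the transfer direction.
fun transfer(src: Account, dst: Account, amount: Int) {
    val (first, second) = if (src.id < dst.id) src to dst else dst to src
    synchronized(first) {
        synchronized(second) {
            src.balance -= amount
            dst.balance += amount
        }
    }
}

// Two threads transfer in opposite directions; with naive src-then-dst locking
// this pattern could deadlock.
fun runOpposingTransfers(): Int {
    val a = Account(1, 1_000)
    val b = Account(2, 1_000)
    val t1 = thread { repeat(10_000) { transfer(a, b, 1) } }
    val t2 = thread { repeat(10_000) { transfer(b, a, 1) } }
    t1.join()
    t2.join()
    return a.balance + b.balance
}

fun main() {
    println(runOpposingTransfers()) // 2000
}
```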
Quick Reference
| Topic | Key Points |
|---|---|
| Threading Rule | Never block main thread; use background threads for I/O and computation |
| Looper | Mechanism that keeps the Main Thread alive to process events |
| Dispatchers | Scheduler that decides thread execution. Main (UI), IO (Wait), Default (CPU). |
| Structured Concurrency | Parent-child hierarchy; automatic cancellation; use viewModelScope/lifecycleScope |
| Exceptions | launch propagates immediately; async stores until await; use SupervisorJob for isolation |
| Cancellation | Cooperative; check isActive or use suspending functions; cleanup in finally |
| Thread Safety | Mutex (suspendable lock); StateFlow.update; Atomic vars |
| WorkManager | Guaranteed execution; survives process death; respects constraints |
| Testing | Use runTest to skip delays; StandardTestDispatcher to control time |
| Flows | Cold (flow) vs Hot (StateFlow/SharedFlow). StateFlow replaces LiveData. |