Goroutines: How Go Manages Concurrency
Process: A program in execution; the running instance of a program.
Thread: A flow of control within a process's address space.
Every process has one main thread. Any additional threads are created, directly or indirectly, from the main thread.
The runtime automatically creates and destroys system-level threads for us. Goroutines are user-level threads.
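G -> P -> M: a goroutine (G) is queued on a logical processor (P), and the P is bound to an OS thread (M, for machine) that executes it.
Each goroutine has its own stack, a contiguous block of memory. To switch to a goroutine, the scheduler saves the current goroutine's registers and loads the PC, SP, and BP registers (code entry address, stack top pointer, stack base pointer) with that goroutine's saved values. This happens entirely in user space, with no switch to kernel space, so it is fast.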
User-Level Thread Model
The operating system only knows about user processes. It does not see the threads inside. All kernel scheduling is based on user processes.
Many coroutine libraries wrap their blocking operations into non-blocking forms. At a point where an operation would block, the user thread yields and another user thread is woken to run on the same KSE (kernel scheduling entity, i.e. a kernel thread). This avoids a kernel context switch whenever a user thread would block, and the process as a whole does not get blocked.
Kernel vs User Threads
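User threads are lighter: the thread table is maintained inside the process, and switching needs no trap into the kernel. The trade-off is that if one user thread makes a blocking system call, the whole process blocks. Kernel threads are heavier: the thread table lives in the kernel and switching (sleep and wake) needs a trap, but a blocked kernel thread does not block the other threads of its process.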
Two-Level Thread Model
User threads are bound to KSEs dynamically, and several user threads can share the same KSE. When a KSE is scheduled off the CPU because the thread bound to it blocks, the other user threads in the process can be rebound to different KSEs.
Why call it two-level? The user scheduler maps user threads to KSEs. The kernel scheduler maps KSEs to CPUs.
G-P-M Model Overview
In Go, each goroutine is an independent execution unit. Unlike OS threads, whose stacks are reserved at a fixed size (2MB in this comparison), goroutine stacks grow dynamically. They start small (2KB in current Go releases; 8KB in Go 1.2 and 1.3) and grow as needed up to a limit of 1GB on 64-bit machines and 250MB on 32-bit machines. The Go runtime handles all of this.
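Dynamic stack growth can be observed with deep recursion inside a goroutine. The following is a minimal sketch: the names recurse and deepCall and the 64-byte padding are made up for illustration, and the depth is chosen so the stack needs several megabytes, far more than its initial size but well under the limit.

```go
package main

import "fmt"

// recurse uses one stack frame per level, forcing the goroutine's
// stack to grow well beyond its small initial size.
func recurse(n int) int {
	var pad [64]byte // hypothetical padding to make each frame larger
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0])
	}
	return 1 + recurse(n-1)
}

// deepCall runs recurse in a fresh goroutine and returns the depth reached.
func deepCall(depth int) int {
	done := make(chan int)
	go func() { done <- recurse(depth) }()
	return <-done
}

func main() {
	fmt.Println(deepCall(100000)) // prints 100000
}
```

The goroutine finishes without any stack size being configured up front; the runtime enlarges the stack as the recursion deepens.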
M represents a kernel thread. When created, it joins the global M list runtime.allm. The M struct records its pre-associated P, whether it is spinning (actively looking for a runnable G), its start function, and its link in the free M list. A new or restored M always starts in the spinning state.
P also has a global list and a free (idle) list. In addition, each P is in one of these states:
- Pgcstop: Just created, or briefly halted for GC
- Pidle: Not bound to an M
- Prunning: Bound to an M
- Psyscall: The G it was running is in a system call
- Pdead: No longer in use; P is released when GOMAXPROCS is reduced
Scheduling
The scheduler maintains two task queues for G: a Global queue and Local queues maintained by each P.
When you create a new goroutine with the go keyword, it goes to the current P's local queue first. For a goroutine to run, an M must hold (bind) a P. The M, which is an OS thread, then loops over P's local queue, taking one goroutine at a time and executing it. When P's local queue is empty, M tries the global queue and then tries to steal work from other Ps. If it finds no work, M unbinds from P and goes to sleep.
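From the program's point of view, all of this queueing is hidden behind the go keyword. As a small sketch (sumSquares is a made-up helper, not part of any library), the following starts one goroutine per input and waits for them; runtime.GOMAXPROCS(0) reports how many Ps are available without changing the setting.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares starts one goroutine per input. Each new G is queued by the
// runtime and later picked up by some M that holds a P.
func sumSquares(nums []int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			mu.Lock()
			total += n * n
			mu.Unlock()
		}(n)
	}
	wg.Wait() // block until every goroutine has called Done
	return total
}

func main() {
	fmt.Println("Ps available:", runtime.GOMAXPROCS(0))
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

Which M runs which goroutine, and in what order, is up to the scheduler; the mutex and WaitGroup make the result independent of that order.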
Switch rules:
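- User-space block/wake: when G blocks (for example on a channel or a lock), it is parked and M picks another G from P; when the blocking condition clears, G is marked runnable and put back on a queue
- System calls: when G enters a blocking system call, its M blocks with it and P can be handed to another M. After G completes the system call, it is marked runnable and returns to a P queue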
Work sharing: When a processor creates new threads, it tries to migrate some of them to other processors to make better use of idle ones.
Work stealing: Underutilized processors actively look for threads on other processors and "steal" them. The Go scheduler uses work stealing: an idle P takes goroutines from another P's local queue.
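The stealing step can be illustrated with a toy model. This is only a sketch, not the runtime's code: proc and steal are hypothetical names, the queues hold plain task IDs, and there is no synchronization because everything runs on one goroutine. Taking half of the victim's queue, rounded up, is the policy assumed here.

```go
package main

import "fmt"

// proc is a hypothetical stand-in for a P with a local run queue of task IDs.
type proc struct {
	queue []int
}

// steal moves half of the victim's queued tasks (rounded up) to p.
// It returns the number of tasks taken.
func (p *proc) steal(victim *proc) int {
	n := (len(victim.queue) + 1) / 2
	if n == 0 {
		return 0
	}
	p.queue = append(p.queue, victim.queue[:n]...)
	victim.queue = victim.queue[n:]
	return n
}

func main() {
	busy := &proc{queue: []int{1, 2, 3, 4, 5}}
	idle := &proc{}
	taken := idle.steal(busy)
	fmt.Println(taken, idle.queue, busy.queue) // prints 3 [1 2 3] [4 5]
}
```

Stealing in batches rather than one task at a time means an idle processor does not have to come back to the same victim immediately.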
Sometimes an M and a G are locked together. This feature exists mainly for cgo. C libraries often use thread-local storage, keeping data private to a particular kernel thread. To prevent that data from being lost, the scheduler checks whether G is locked when matching an M to a G. If G is locked to the current M, execution continues. Otherwise the scheduler wakes the M that G is locked to so it can continue executing G, and assigns a different G to the current M.
Before running a G, M also checks whether any runtime serial tasks (such as GC) are waiting to execute. If so, the current M stops and blocks until the serial tasks complete, and is then woken to continue.
Main Goroutine
runtime.m0 runs the main goroutine. Before running main, the runtime registers a deferred cleanup function for exit. Then it starts the background goroutine for the mark-and-sweep GC. Only then does it execute the init functions of the main package (and of the packages it imports), run the main function, check for panics after it returns and handle them as needed, and finally end the main goroutine and the current process.
Runtime Functions
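runtime.GOMAXPROCS(n) sets the maximum number of Ps that can execute Go code simultaneously and returns the previous setting. If n < 1 it only reports the current setting.
Goexit() terminates the calling goroutine after running its deferred calls. Other goroutines are not affected.
Gosched() yields the processor so that other goroutines can run. The calling goroutine is not suspended and resumes automatically.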
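A short sketch of Goexit and Gosched in use follows. The helper exitEarly is made up for this example; it records which steps of a goroutine actually ran.

```go
package main

import (
	"fmt"
	"runtime"
)

// exitEarly runs a goroutine that calls runtime.Goexit and reports which
// steps executed: deferred calls still run, code after Goexit does not.
func exitEarly() []string {
	steps := make(chan string, 3)
	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() { steps <- "deferred" }()
		steps <- "before"
		runtime.Goexit()
		steps <- "after" // never reached
	}()
	<-done
	close(steps)
	var got []string
	for s := range steps {
		got = append(got, s)
	}
	return got
}

func main() {
	runtime.Gosched() // yield the processor; main resumes automatically
	fmt.Println(exitEarly()) // prints [before deferred]
}
```

Only the goroutine that calls Goexit ends; the main goroutine keeps running and collects the results.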
NumGoroutine() returns the number of goroutines that currently exist. It can be used to spot simple leaks caused by G.
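A rough leak check can compare the goroutine count before and after a piece of code runs. This is a sketch under simple assumptions: leak and leaked are hypothetical helpers, and the 10ms sleep is an arbitrary grace period so goroutines that are about to finish do not count as leaks.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leak starts n goroutines that block forever on a channel nobody writes to.
func leak(n int) {
	ch := make(chan int)
	for i := 0; i < n; i++ {
		go func() { <-ch }()
	}
}

// leaked reports how many goroutines were added by calling f.
func leaked(f func()) int {
	before := runtime.NumGoroutine()
	f()
	time.Sleep(10 * time.Millisecond) // give finished goroutines time to exit
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println(leaked(func() { leak(5) }))
}
```

A count that keeps rising across repeated calls is the signal to look for goroutines stuck on channels or locks.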
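LockOSThread and UnlockOSThread lock and unlock the current G to its current M. Since Go 1.10 the calls nest: G stays locked until UnlockOSThread has been called as many times as LockOSThread. In earlier releases repeated LockOSThread calls did not nest, and a single UnlockOSThread released the lock.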
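A common pattern is to lock at the top of a goroutine and unlock with defer. In this sketch, onLockedThread is a made-up wrapper, and the function passed to it stands in for work that must stay on one OS thread (for example a call into a C library that uses thread-local storage).

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs f on a goroutine that is locked to its OS thread
// for the duration of the call.
func onLockedThread(f func() int) int {
	result := make(chan int)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		result <- f()
	}()
	return <-result
}

func main() {
	fmt.Println(onLockedThread(func() int { return 42 })) // prints 42
}
```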
SetMaxStack() (in package runtime/debug) limits the stack size that a single G can use. Before the init functions run, the main G sets the default values: 250MB on 32-bit and 1GB on 64-bit systems.
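debug.SetMaxStack takes the new limit in bytes and returns the previous one, which makes it easy to restore. The wrapper below is a sketch; withMaxStack is a hypothetical name and the 64MB value is arbitrary.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withMaxStack sets the per-goroutine stack limit, runs f, and restores
// the previous limit. It returns the limit that was in effect before.
func withMaxStack(bytes int, f func()) int {
	prev := debug.SetMaxStack(bytes)
	defer debug.SetMaxStack(prev)
	f()
	return prev
}

func main() {
	prev := withMaxStack(64<<20, func() {})
	fmt.Println("previous limit in bytes:", prev)
}
```

A goroutine that exceeds the limit crashes the program, so lowering it is mainly useful for catching runaway recursion early.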
Coroutine Pool Principle
The Go scheduler has reuse mechanisms. Each time you use the go keyword, the runtime checks whether a free G struct is available on the current M's P. If so, it takes one directly. Otherwise it allocates a new G struct and registers it with the runtime.
Simply put: a pool keeps a fixed set of goroutines alive and waiting for work, so the same goroutines are reused instead of a new one being started for every task.
Implementation: use a channel whose values are handler functions, start a fixed number of goroutines, and have each one for-range over the channel, receiving functions and running them. Sharing one channel among the workers bounds concurrency and achieves coroutine reuse.
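The description above can be sketched as follows. Pool, NewPool, Submit, Close, and runJobs are names chosen for this example, not an existing library API, and the unbuffered task channel is one possible design choice among several.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical fixed-size goroutine pool fed by a channel of tasks.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewPool starts `workers` goroutines that for-range over the task channel.
func NewPool(workers int) *Pool {
	p := &Pool{tasks: make(chan func())}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// Submit hands a task to an idle worker, blocking until one is free.
func (p *Pool) Submit(task func()) { p.tasks <- task }

// Close stops accepting tasks and waits for the workers to finish.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

// runJobs is a small usage helper: it runs n increments on a pool.
func runJobs(workers, n int) int {
	var mu sync.Mutex
	count := 0
	p := NewPool(workers)
	for i := 0; i < n; i++ {
		p.Submit(func() {
			mu.Lock()
			count++
			mu.Unlock()
		})
	}
	p.Close()
	return count
}

func main() {
	fmt.Println(runJobs(3, 100)) // prints 100
}
```

With an unbuffered channel, Submit applies back-pressure: the caller waits whenever all workers are busy. A buffered channel would let callers queue tasks ahead of the workers instead.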