golang-performance

Go performance optimization techniques including profiling with pprof, memory optimization, concurrency patterns, and escape analysis.

242 stars

Best use case

golang-performance is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams that want a consistent, reusable process for Go performance work: profiling with pprof, memory optimization, concurrency patterns, and escape analysis.

Users should expect more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "golang-performance" skill to help with this workflow task. Context: Go performance optimization techniques including profiling with pprof, memory optimization, concurrency patterns, and escape analysis.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

  • Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

  • Do not use this when you only need a one-off answer and do not need a reusable workflow.
  • Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/golang-performance/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/89jobrien/golang-performance/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/golang-performance/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How golang-performance Compares

Feature / Agent           golang-performance   Standard Approach
Platform Support          Not specified        Limited / Varies
Context Awareness         High                 Baseline
Installation Complexity   Unknown              N/A

Frequently Asked Questions

What does this skill do?

Go performance optimization techniques including profiling with pprof, memory optimization, concurrency patterns, and escape analysis.

Where can I find the source code?

The skill is hosted in the aiskillstore/marketplace repository on GitHub; the raw SKILL.md URL in the installation command above points to it.

SKILL.md Source

# Golang Performance

This skill provides guidance on optimizing Go application performance including profiling, memory management, concurrency optimization, and avoiding common performance pitfalls.

## When to Use This Skill

- When profiling Go applications for CPU or memory issues
- When optimizing memory allocations and reducing GC pressure
- When implementing efficient concurrency patterns
- When analyzing escape analysis results
- When optimizing hot paths in production code

## Profiling with pprof

### Enable Profiling in HTTP Server

```go
import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // pprof endpoints available at /debug/pprof/
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // Main application
}
```

### CPU Profiling

```bash
# Collect a 30-second CPU profile
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Or open the web UI (includes a flame graph view)
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"

# Interactive commands
(pprof) top10          # Top 10 functions by CPU
(pprof) list FuncName  # Show annotated source with timing
(pprof) web            # Open call graph in browser (requires Graphviz)
```

### Memory Profiling

```bash
# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocs profile (all allocations)
go tool pprof http://localhost:6060/debug/pprof/allocs

# Interactive commands
(pprof) top10 -cum     # Top by cumulative allocations
(pprof) list FuncName  # Show allocation sites
```

### Programmatic Profiling

```go
import (
    "os"
    "runtime"
    "runtime/pprof"
)

func profileCPU() {
    f, _ := os.Create("cpu.prof")
    defer f.Close()

    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()

    // Code to profile
}

func profileMemory() {
    f, _ := os.Create("mem.prof")
    defer f.Close()

    runtime.GC() // Get accurate stats
    pprof.WriteHeapProfile(f)
}
```

## Memory Optimization

### Reduce Allocations

```go
// BAD: Allocates on every call
func Process(items []string) []string {
    result := []string{}
    for _, item := range items {
        result = append(result, transform(item))
    }
    return result
}

// GOOD: Pre-allocate with known capacity
func Process(items []string) []string {
    result := make([]string, 0, len(items))
    for _, item := range items {
        result = append(result, transform(item))
    }
    return result
}
```

### Use sync.Pool for Frequent Allocations

```go
var bufferPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func ProcessRequest(data []byte) []byte {
    buf := bufferPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset()
        bufferPool.Put(buf)
    }()

    // Use buffer
    buf.Write(data)

    // Copy the result out: returning buf.Bytes() directly would alias
    // memory that goes back to the pool and may be reused concurrently
    out := make([]byte, buf.Len())
    copy(out, buf.Bytes())
    return out
}
```

### Avoid String Concatenation in Loops

```go
// BAD: O(n^2) allocations
func BuildString(parts []string) string {
    result := ""
    for _, part := range parts {
        result += part
    }
    return result
}

// GOOD: one up-front allocation with strings.Builder
func BuildString(parts []string) string {
    n := 0
    for _, part := range parts {
        n += len(part)
    }

    var builder strings.Builder
    builder.Grow(n) // Pre-size so WriteString never reallocates
    for _, part := range parts {
        builder.WriteString(part)
    }
    return builder.String()
}
```

### Slice Memory Leaks

```go
// BAD: Keeps entire backing array alive
func GetFirst(data []byte) []byte {
    return data[:10]
}

// GOOD: Copy to release backing array
func GetFirst(data []byte) []byte {
    result := make([]byte, 10)
    copy(result, data[:10])
    return result
}
```

## Escape Analysis

```bash
# Show escape analysis decisions
go build -gcflags="-m" ./...

# More verbose
go build -gcflags="-m -m" ./...
```

### Avoiding Heap Escapes

```go
// ESCAPES: Returned pointer
func NewUser() *User {
    return &User{}  // Allocated on heap
}

// STAYS ON STACK: Value return
func NewUser() User {
    return User{}  // May stay on stack
}

// ESCAPES: Interface conversion boxes the value
func Process(v interface{}) { /* ... */ }

func main() {
    x := 42
    Process(x)  // x escapes to the heap via the interface conversion
}
```

## Concurrency Optimization

### Worker Pool Pattern

```go
func ProcessItems(items []Item, workers int) []Result {
    jobs := make(chan Item, len(items))
    results := make(chan Result, len(items))

    // Start workers
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for item := range jobs {
                results <- process(item)
            }
        }()
    }

    // Send jobs
    for _, item := range items {
        jobs <- item
    }
    close(jobs)

    // Wait and collect
    go func() {
        wg.Wait()
        close(results)
    }()

    var output []Result
    for r := range results {
        output = append(output, r)
    }
    return output
}
```

### Buffered Channels for Throughput

```go
// Unbuffered: every send blocks until a receiver is ready
ch := make(chan int)

// Buffered: senders block only when the buffer is full,
// smoothing out bursts between producer and consumer
ch := make(chan int, 100)
```

### Avoid Lock Contention

```go
// BAD: Global lock
var mu sync.Mutex
var cache = make(map[string]string)

func Get(key string) string {
    mu.Lock()
    defer mu.Unlock()
    return cache[key]
}

// GOOD: Sharded locks
type ShardedCache struct {
    shards [256]struct {
        mu    sync.RWMutex
        items map[string]string
    }
}

func (c *ShardedCache) getShard(key string) *struct {
    mu    sync.RWMutex
    items map[string]string
} {
    h := fnv.New32a()
    h.Write([]byte(key))
    return &c.shards[h.Sum32()%256]
}

func (c *ShardedCache) Get(key string) string {
    shard := c.getShard(key)
    shard.mu.RLock()
    defer shard.mu.RUnlock()
    return shard.items[key]
}
```
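
The example above covers only reads. Below is a minimal sketch of initialization and a Set method, assuming the ShardedCache type defined above; each shard's map must be created before the first write, because writing to a nil map panics.

```go
func NewShardedCache() *ShardedCache {
    c := &ShardedCache{}
    for i := range c.shards {
        c.shards[i].items = make(map[string]string)
    }
    return c
}

func (c *ShardedCache) Set(key, value string) {
    shard := c.getShard(key)
    shard.mu.Lock()
    defer shard.mu.Unlock()
    shard.items[key] = value
}
```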

### Use sync.Map for Specific Cases

```go
// Good for: keys written once, read many; disjoint key sets
var cache sync.Map

func Get(key string) (string, bool) {
    v, ok := cache.Load(key)
    if !ok {
        return "", false
    }
    return v.(string), true
}

func Set(key, value string) {
    cache.Store(key, value)
}
```

## Data Structure Optimization

### Struct Field Ordering (Memory Alignment)

```go
// BAD: 24 bytes (padding)
type Bad struct {
    a bool   // 1 byte + 7 padding
    b int64  // 8 bytes
    c bool   // 1 byte + 7 padding
}

// GOOD: 16 bytes (no padding)
type Good struct {
    b int64  // 8 bytes
    a bool   // 1 byte
    c bool   // 1 byte + 6 padding
}
```

### Avoid Interface{} When Possible

```go
// SLOW: Type assertions, boxing
func Sum(values []interface{}) float64 {
    var sum float64
    for _, v := range values {
        sum += v.(float64)
    }
    return sum
}

// FAST: Concrete types
func Sum(values []float64) float64 {
    var sum float64
    for _, v := range values {
        sum += v
    }
    return sum
}
```

## Benchmarking Patterns

```go
func BenchmarkProcess(b *testing.B) {
    data := generateTestData()
    b.ResetTimer() // Exclude setup time

    for i := 0; i < b.N; i++ {
        Process(data)
    }
}

// Memory benchmarks
func BenchmarkAllocs(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = make([]byte, 1024)
    }
}

// Compare implementations
func BenchmarkComparison(b *testing.B) {
    b.Run("old", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            OldImplementation()
        }
    })
    b.Run("new", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            NewImplementation()
        }
    })
}
```

Run with:

```bash
go test -bench=. -benchmem ./...
go test -bench=. -benchtime=5s ./...  # Longer runs
```

## Common Pitfalls

### Defer in Hot Loops

```go
// BAD: defers run only when the enclosing function returns, so the lock
// is never released inside the loop (the next Lock deadlocks) and the
// deferred calls pile up
for _, item := range items {
    mu.Lock()
    defer mu.Unlock()
    process(item)
}

// GOOD: Explicit unlock
for _, item := range items {
    mu.Lock()
    process(item)
    mu.Unlock()
}

// BETTER: Extract to function
for _, item := range items {
    processWithLock(item)
}

func processWithLock(item Item) {
    mu.Lock()
    defer mu.Unlock()
    process(item)
}
```

### JSON Encoding Performance

```go
// SLOW: allocates a fresh []byte and uses reflection on every call
json.Marshal(v)

// FASTER: stream into a reusable buffer, skipping the intermediate []byte
// (the encoder still uses reflection)
var buf bytes.Buffer
encoder := json.NewEncoder(&buf)
encoder.Encode(v)

// FASTEST: code generation (easyjson, ffjson) avoids reflection entirely
```

## Best Practices

1. **Measure before optimizing** - Profile to find actual bottlenecks
2. **Pre-allocate slices** - Use `make([]T, 0, capacity)` when size is known
3. **Pool frequently allocated objects** - Use `sync.Pool` for buffers
4. **Minimize allocations in hot paths** - Reuse objects, avoid interfaces
5. **Right-size channels** - Buffer to reduce blocking without wasting memory
6. **Avoid premature optimization** - Clarity first, optimize measured problems
7. **Use value receivers for small structs** - Avoid pointer indirection (see the sketch after this list)
8. **Order struct fields by size** - Largest to smallest reduces padding
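
A minimal sketch for practice 7, using an illustrative `Point` type that is not part of this skill: a struct this small is cheaper to copy than to reach through a pointer.

```go
package main

import (
    "fmt"
    "math"
)

// Point is 16 bytes, so copying it is cheap and needs no heap allocation.
type Point struct{ X, Y float64 }

// Value receiver: the caller passes a copy; no pointer indirection.
func (p Point) Dist(q Point) float64 {
    return math.Hypot(p.X-q.X, p.Y-q.Y)
}

func main() {
    fmt.Println(Point{X: 3}.Dist(Point{Y: 4})) // 5
}
```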

Related Skills (all from aiskillstore/marketplace)

  • web-performance-seo: Fix PageSpeed Insights/Lighthouse accessibility "!" errors caused by contrast audit failures (CSS filters, OKLCH/OKLAB, low opacity, gradient text, image backgrounds). Use for accessibility-driven SEO/performance debugging and remediation.
  • web-performance-optimization: Optimize website and web application performance including loading speed, Core Web Vitals, bundle size, caching strategies, and runtime performance.
  • performance-testing-review-multi-agent-review: Use when working with performance testing review multi agent review.
  • performance-testing-review-ai-review: An expert AI-powered code review specialist combining automated static analysis, intelligent pattern recognition, and modern DevOps practices; leverages AI tools (GitHub Copilot, Qodo, GPT-5, C
  • performance-profiling: Performance profiling principles. Measurement, analysis, and optimization techniques.
  • performance-engineer: Expert performance engineer specializing in modern observability, application optimization, and scalable system performance. Masters OpenTelemetry, distributed tracing, load testing, multi-tier caching, Core Web Vitals, and performance monitoring. Handles end-to-end optimization, real user monitoring, and scalability patterns. Use PROACTIVELY for performance optimization, observability, or scalability challenges.
  • dbos-golang: DBOS Go SDK for building reliable, fault-tolerant applications with durable workflows. Use this skill when writing Go code with DBOS, creating workflows and steps, using queues, using the DBOS Client from external applications, or building Go applications that need to be resilient to failures.
  • application-performance-performance-optimization: Optimize end-to-end application performance with profiling, observability, and backend/frontend tuning. Use when coordinating performance optimization across the stack.
  • golang-pro: Use when building Go applications requiring concurrent programming, microservices architecture, or high-performance systems. Invoke for goroutines, channels, Go generics, gRPC integration.
  • fixing-motion-performance: Fix animation performance issues.
  • convex-performance-audit: Audits and optimizes Convex application performance across hot-path reads, write contention, subscription cost, and function limits. Use this skill when a Convex feature is slow or expensive, npx convex insights shows high bytes or documents read, OCC conflict errors or mutation retries appear, subscriptions or UI updates are costly, functions hit execution or transaction limits, or the user mentions performance, latency, read amplification, or invalidation problems in a Convex app.
  • performance-vitals: Enforce Core Web Vitals optimization. Use when building user-facing features, reviewing performance, or when Lighthouse scores drop. Covers LCP, FID/INP, CLS, and optimization techniques.