Error Handling Chapter 03

How the 1Engage codebase handles, wraps, classifies, and propagates errors across layers.

3.1 The error Interface in Go

What

Go does not use exceptions or try/catch for ordinary failures (panic and recover exist, but are reserved for truly exceptional situations). Instead, errors are ordinary values that implement a single-method interface:

type error interface {
    Error() string
}

Why

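By convention, every function that can fail returns an error as its last return value, and the caller is expected to check it immediately. The compiler enforces this only partly: when a call's results are assigned, the error must be received or explicitly discarded with _, but a call whose results are dropped entirely (a bare f() statement) still compiles, so linters such as errcheck are commonly used to catch those. The result is that error paths are visible and explicit in every function.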

How

The standard pattern you see everywhere in the codebase:

result, err := doSomething()
if err != nil {
    return fmt.Errorf("context about what failed: %w", err)
}
// use result

Any type that has an Error() string method satisfies the interface. The 1Engage codebase defines several custom error types that implement this interface to carry structured information (HTTP status codes, field-level validation details, retry classification).
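
As a minimal sketch of that last point, the hypothetical portError type below (not part of the 1Engage codebase) satisfies error simply by defining Error() string, and the caller uses the usual if err != nil check. The parsePort function and its range rule are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// portError is a hypothetical custom error type used only for illustration.
type portError struct {
	Input string
}

// Error makes *portError satisfy the built-in error interface.
func (e *portError) Error() string {
	return "invalid port: " + strconv.Quote(e.Input)
}

// parsePort returns the parsed port, or a *portError if the input is not
// a number in the range 1-65535.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil || n < 1 || n > 65535 {
		return 0, &portError{Input: s}
	}
	return n, nil
}

func main() {
	port, err := parsePort("80a")
	if err != nil {
		fmt.Println("error:", err) // error: invalid port: "80a"
		return
	}
	fmt.Println("port:", port)
}
```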

3.2 Custom HTTP Error Type

What

A structured error type in pkg/shared/http/error.go that carries an HTTP status code, a machine-readable code, a human message, and optional field-level details.

// pkg/shared/http/error.go

type ErrorDetail struct {
    Field   string `json:"field,omitempty"`
    Message string `json:"message"`
}

type Error struct {
    Status  int           `json:"-"`           // HTTP status (not serialized)
    Code    string        `json:"code"`        // Machine-readable: "BAD_REQUEST"
    Message string        `json:"message"`     // Human-readable
    Details []ErrorDetail `json:"details,omitempty"`  // Per-field errors
}

func (e *Error) Error() string {
    return e.Message
}

The Status field uses json:"-" so it is omitted from the JSON body; the status travels in the HTTP status line instead. The Error() method makes *Error satisfy Go's error interface, so it can be returned as a regular error value.
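
To see the effect of the struct tags, the sketch below marshals an Error value with the standard encoding/json package. The types are copied from the snippet above; the encode helper is a hypothetical name introduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ErrorDetail struct {
	Field   string `json:"field,omitempty"`
	Message string `json:"message"`
}

type Error struct {
	Status  int           `json:"-"`
	Code    string        `json:"code"`
	Message string        `json:"message"`
	Details []ErrorDetail `json:"details,omitempty"`
}

func (e *Error) Error() string { return e.Message }

// encode is a hypothetical helper that returns the JSON body a client
// would receive for e, or an empty string if marshalling fails.
func encode(e *Error) string {
	b, err := json.Marshal(e)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	e := &Error{Status: 404, Code: "NOT_FOUND", Message: "user not found"}
	// Status is dropped by json:"-"; the nil Details slice is dropped by omitempty.
	fmt.Println(encode(e)) // {"code":"NOT_FOUND","message":"user not found"}
}
```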

Factory Functions

Convenience constructors for common HTTP error responses:

func BadRequest(msg string) error {
    return &Error{Status: 400, Code: "BAD_REQUEST", Message: msg}
}

func Unauthorized(msg string) error {
    return &Error{Status: 401, Code: "UNAUTHORIZED", Message: msg}
}

func Forbidden(msg string) error {
    return &Error{Status: 403, Code: "FORBIDDEN", Message: msg}
}

func NotFound(msg string) error {
    return &Error{Status: 404, Code: "NOT_FOUND", Message: msg}
}

func Internal(err error) error {
    return &Error{Status: 500, Code: "INTERNAL_ERROR", Message: "internal server error"}
}

func Conflic(msg string) error {
    return &Error{Status: 409, Code: "CONFLICT", Message: msg}
}

Design note: Internal() accepts an error parameter (intended for logging, although the snippet above does not use it) but always returns a generic message, so internal details are not leaked to API consumers. Conflic (no 't') is the actual function name in the codebase.

Why This Design

Service-layer code returns errors like httpx.NotFound("user not found"). The handler layer calls HandleError(w, err), which uses errors.As to extract the status code and serialize the response. This separates concerns: services decide what went wrong, and the HTTP layer decides how to respond.

// In a service method:
func (s *UserService) GetByID(id string) (*User, error) {
    user, err := s.repo.FindByID(id)
    if err != nil {
        return nil, httpx.NotFound("user not found")  // returns error with Status=404
    }
    return user, nil
}

// In the handler — the handler never decides status codes:
func (h *UserHandler) Get(w http.ResponseWriter, r *http.Request) {
    user, err := h.service.GetByID(id)
    if err != nil {
        httpx.HandleError(w, err)  // automatically sends 404
        return
    }
    httpx.SendSuccess(w, r, user)
}

3.3 Error Wrapping with %w

What

Go's fmt.Errorf with the %w verb wraps an existing error inside a new error, adding contextual information while preserving the original error for later inspection.

Real Examples

// pkg/eventbus/kafka/producer.go
return fmt.Errorf("failed to marshal event: %w", err)
return fmt.Errorf("failed to publish event: %w", err)
return fmt.Errorf("failed to marshal event %s: %w", event.ID, err)
return fmt.Errorf("failed to publish batch: %w", err)

// pkg/auth/jwt.go
return TenantContext{}, fmt.Errorf("token validation failed: %w", err)

// pkg/eventbus/kafka/client.go
errs = append(errs, fmt.Errorf("producer close error: %w", err))
errs = append(errs, fmt.Errorf("consumer close error: %w", err))
return fmt.Errorf("failed to connect to Kafka: %w", err)
return fmt.Errorf("failed to get controller: %w", err)

// pkg/shared/helpers/uploadfile.go
return nil, fmt.Errorf("multiple failed upload index[%d] %s: %w", i, d.Name, err)
return fmt.Errorf("failed to delete file %s: %w", filepath, err)

Why

  • Preserves the error chain: The original error is accessible via errors.Unwrap(), enabling errors.Is() and errors.As() to inspect wrapped errors at any depth.
  • The prefix tells WHERE: The string before %w names the operation that failed ("failed to marshal event"), so the message reads like a call chain even though no stack trace is recorded.
  • Contrast with %v: Using %v instead of %w creates a new error with the original's text but breaks the chain, so errors.Is and errors.As can no longer find the original.

Pattern: A wrapped error message reads like a stack from outer to inner:
"failed to publish batch: failed to marshal event: json: unsupported type"
Each layer adds its context prefix.

3.4 errors.As — Type-Based Error Handling

What

errors.As walks the error chain (unwrapping at each level) and checks if any error in the chain matches the target type. If found, it assigns the matched error to the target variable.

HTTP Error Handling

From pkg/shared/http/error.go — the central error response function:

// pkg/shared/http/error.go

func HandleError(w http.ResponseWriter, err error) {
    var httpErr *Error
    if errors.As(err, &httpErr) {
        // Found an *httpx.Error in the chain — use its status code
        w.WriteHeader(httpErr.Status)
        json.NewEncoder(w).Encode(httpErr)
        return
    }

    // Unknown error type — default to 500
    w.WriteHeader(http.StatusInternalServerError)
    json.NewEncoder(w).Encode(&Error{
        Code:    "INTERNAL_ERROR",
        Message: "internal server error",
    })
}

PostgreSQL Error Handling

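From auth-service/internal/repository/user_repo.go — extracting database-specific error codes. Note that 23503 is PostgreSQL's SQLSTATE for foreign_key_violation; a duplicate email or phone would normally be reported as 23505 (unique_violation), so the check and its "already exists" message appear to disagree.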

// auth-service/internal/repository/user_repo.go

func (r *UserRepository) Create(user *model.User) (*model.User, error) {
    err := r.db.Transaction(func(tx *gorm.DB) error {
        if err := tx.Create(&user).Error; err != nil {
            var pgErr *pgconn.PgError
            if errors.As(err, &pgErr) {
                if pgErr.Code == "23503" {
                    return httpx.Conflic("email or phone already exists")
                }
            }
            return httpx.BadRequest(err.Error())
        }
        return nil
    })
    // ...
}

Why errors.As Over Type Assertion

Approach        Code                    Works Through Wrapping?
--------------  ----------------------  ------------------------------------------
Type assertion  err.(*pgconn.PgError)   No: fails if the error was wrapped with %w
errors.As       errors.As(err, &pgErr)  Yes: unwraps through any number of layers

Because errors are frequently wrapped with fmt.Errorf("...: %w", err), the outer error is a different type. errors.As peels through each wrapping layer to find the target type.
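
The comparison can be reproduced with a small typed error. Here codedError is a hypothetical stand-in for *pgconn.PgError, and the helper names are introduced only for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical typed error standing in for *pgconn.PgError.
type codedError struct{ Code string }

func (e *codedError) Error() string { return "code " + e.Code }

// viaAssertion inspects only the outermost error value.
func viaAssertion(err error) bool {
	_, ok := err.(*codedError)
	return ok
}

// viaAs walks the whole chain built by %w.
func viaAs(err error) bool {
	var target *codedError
	return errors.As(err, &target)
}

// wrapTwice is a hypothetical helper that adds two layers of context.
func wrapTwice(err error) error {
	return fmt.Errorf("service: %w", fmt.Errorf("repository: %w", err))
}

func main() {
	err := wrapTwice(&codedError{Code: "23505"})
	fmt.Println(viaAssertion(err)) // false: the outer value is a wrapper type
	fmt.Println(viaAs(err))        // true
}
```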

3.5 errors.Is — Sentinel Error Checking

What

errors.Is checks if any error in the chain matches a specific sentinel value. Like errors.As, it walks through wrapped errors.

GORM Record Not Found

This is the most common pattern in the codebase, appearing in nearly every repository:

// auth-service/internal/repository/user_repo.go
if errors.Is(err, gorm.ErrRecordNotFound) {
    return nil, httpx.NotFound("user not found")
}

// auth-service/internal/repository/role_repo.go
if errors.Is(err, gorm.ErrRecordNotFound) {
    return nil, httpx.NotFound("role not found")
}

// auth-service/internal/repository/auth_repo.go
if errors.Is(err, gorm.ErrRecordNotFound) {
    return nil, httpx.NotFound("tenant not found")
}

// broadcast-service/internal/repository/broadcast_repo.go
if errors.Is(err, gorm.ErrRecordNotFound) {
    return nil, httpx.NotFound("broadcast not found")
}

// admin-service/internal/service/tenant_service.go
if errors.Is(err, gorm.ErrRecordNotFound) {
    return nil, httpx.NotFound("tenant not found")
}

Pattern: Repositories (and, as the last example shows, some services) translate gorm.ErrRecordNotFound into an httpx.NotFound error with a domain-specific message. This yields a consistent 404 response regardless of which entity was missing.

Context Cancellation

From the Kafka consumer — distinguishing graceful shutdown from real errors:

// pkg/eventbus/kafka/consumer.go

// In the consume loop: a cancelled context means normal shutdown, so it is returned unchanged and filtered out by the caller
if errors.Is(err, context.Canceled) {
    return err
}

// When starting goroutines — don't propagate cancellation as a failure
go func(s *subscription) {
    defer c.wg.Done()
    err := c.consumeLoop(ctx, s)
    if err != nil && !errors.Is(err, context.Canceled) {
        errCh <- err  // only real errors go to the error channel
    }
}(sub)

Why errors.Is Over ==

Approach           Code                                    Works Through Wrapping?
-----------------  --------------------------------------  ----------------------------------
Direct comparison  err == gorm.ErrRecordNotFound           No: fails if the error was wrapped
errors.Is          errors.Is(err, gorm.ErrRecordNotFound)  Yes: unwraps through all layers
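
The same contrast applies to context.Canceled. In this sketch consumeLoop is a hypothetical stand-in for the consumer loop; it wraps ctx.Err() with %w, which defeats == but not errors.Is.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// consumeLoop is a hypothetical stand-in for the real loop: it blocks until
// the context is done and wraps the reason with %w.
func consumeLoop(ctx context.Context) error {
	<-ctx.Done()
	return fmt.Errorf("consume loop stopped: %w", ctx.Err())
}

// directCompare uses ==, which only looks at the outermost error.
func directCompare(err error) bool { return err == context.Canceled }

// isShutdown uses errors.Is, which walks the chain.
func isShutdown(err error) bool { return errors.Is(err, context.Canceled) }

// runCancelled simulates a graceful shutdown by cancelling before the loop starts.
func runCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return consumeLoop(ctx)
}

func main() {
	err := runCancelled()
	fmt.Println(err)                // consume loop stopped: context canceled
	fmt.Println(directCompare(err)) // false
	fmt.Println(isShutdown(err))    // true
}
```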

3.6 Sentinel Errors with errors.New

What

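Simple, standalone error values created with errors.New. These represent programming mistakes or API misuse, not runtime failures from external systems. Strictly speaking, a sentinel error is a package-level variable that callers compare against with errors.Is; the errors below are created inline at the return site, so callers can log them but cannot match on them.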

// pkg/eventbus/kafka/consumer.go

func (c *Consumer) Subscribe(topic, groupID string, handler eventbus.Handler) error {
    c.mu.Lock()
    defer c.mu.Unlock()

    if c.running {
        return errors.New("cannot subscribe while consumer is running")
    }
    if groupID == "" {
        return errors.New("group ID is required for subscription")
    }
    // ...
}

func (c *Consumer) Run(ctx context.Context) error {
    c.mu.Lock()
    if c.running {
        c.mu.Unlock()
        return errors.New("consumer is already running")
    }
    if len(c.subscriptions) == 0 {
        c.mu.Unlock()
        return errors.New("no subscriptions registered")
    }
    // ...
}

Why These Aren't Wrapped

  • No originating error: These errors don't come from a failed operation; they're state validation checks. There's nothing to wrap.
  • Programming errors: Calling Subscribe while the consumer is running is a bug in the calling code, not a runtime failure. The message is enough.
  • Guard clauses: They protect invariants: no consumer should run without subscriptions, and no subscription should be added while running.

When to use errors.New vs fmt.Errorf: Use errors.New for errors that originate here (no cause). Use fmt.Errorf("...: %w", err) when wrapping an error from a called function.

3.7 PermanentError — Retry Classification

What

A custom error type that signals "do not retry this operation." The codebase has two implementations of this concept for different contexts.

Implementation 1: Event Bus (pkg/eventbus/retry.go)

Used in the generic retry handler for event processing:

// pkg/eventbus/retry.go

type PermanentError struct {
    Err error
}

func (e *PermanentError) Error() string {
    return e.Err.Error()
}

func (e *PermanentError) Unwrap() error {
    return e.Err       // enables errors.Is/As through the chain
}

// Pointer receiver → checked with *PermanentError
func IsPermanentError(err error) bool {
    _, ok := err.(*PermanentError)
    return ok
}

Implementation 2: Helpers (pkg/shared/helpers/permanent_error.go)

Used for classifying external API errors (Meta/WhatsApp):

// pkg/shared/helpers/permanent_error.go

type PermanentError struct {
    Err error
}

func (e PermanentError) Error() string {  // VALUE receiver — not pointer
    return e.Err.Error()
}

func NewPermanentError(err error) error {
    return PermanentError{Err: err}       // returns value, not pointer
}

// Value type assertion → checked with PermanentError (no pointer)
func IsPermanent(err error) bool {
    _, ok := err.(PermanentError)
    return ok
}
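
Key difference: The eventbus version uses a pointer receiver (*PermanentError) and implements Unwrap(). The helpers version uses a value receiver (PermanentError) and does not implement Unwrap(). They are therefore checked differently: err.(*PermanentError) vs err.(PermanentError). Both checks are plain type assertions, so, as section 3.4 explains, neither recognizes a PermanentError that has been wrapped with fmt.Errorf("...: %w", err); the permanent error must reach the check unwrapped for the classification to work.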

How It's Used in Kafka Consumer

The consumer loop uses the helpers version to decide whether to retry or skip a message:

// pkg/eventbus/kafka/consumer.go

if err := c.processMessage(ctx, sub, msg); err != nil {

    // PERMANENT ERROR → stop retrying
    if helpers.IsPermanent(err) {
        slog.Error("Permanent error, skipping message",
            "topic", sub.topic,
            "partition", msg.Partition,
            "offset", msg.Offset,
            "error", err,
        )

        // Commit the offset so Kafka won't deliver this message again
        sub.reader.CommitMessages(ctx, msg)
        continue
    }

    // TRANSIENT ERROR → do NOT commit, Kafka will redeliver
    slog.Warn("Transient error, will retry",
        "topic", sub.topic,
        "error", err,
    )
}

Decision Table

Error Type       Commit Offset?  Retry?      Example
---------------  --------------  ----------  --------------------------------------------
PermanentError   Yes             No          Invalid template ID, bad request to Meta API
Transient error  No              Yes (auto)  Network timeout, rate limit, 503 from server
nil (success)    Yes             N/A         Message processed successfully

3.8 Meta API Error Classification

What

The HandleMetaAPIError function classifies HTTP responses from the Meta/WhatsApp Business API into permanent or retryable errors. This drives the retry behavior of the Kafka consumer.

// pkg/shared/helpers/permanent_error.go

type MetaErrorResponse struct {
    Error struct {
        Message      string `json:"message"`
        Type         string `json:"type"`
        Code         int    `json:"code"`
        ErrorSubcode int    `json:"error_subcode"`
        FbTraceID    string `json:"fbtrace_id"`
    } `json:"error"`
}

func HandleMetaAPIError(statusCode int, respBody []byte) error {

    // Success — no error
    if statusCode >= 200 && statusCode < 300 {
        return nil
    }

    // Parse Meta's error response body
    var metaErr MetaErrorResponse
    json.Unmarshal(respBody, &metaErr) // error ignored: on a malformed body the fields keep their zero values

    baseErr := fmt.Errorf(
        "meta api error status=%d code=%d subcode=%d message=%s",
        statusCode, metaErr.Error.Code,
        metaErr.Error.ErrorSubcode, metaErr.Error.Message,
    )

    // ── HTTP-level classification ──
    switch statusCode {
    case 400, 401, 403, 404:
        return NewPermanentError(baseErr)   // client error → don't retry
    case 429:
        return baseErr                      // rate limited → retry later
    case 500, 502, 503, 504:
        return baseErr                      // server error → retry
    }

    // ── Meta error code classification ──
    switch metaErr.Error.Code {
    case 100:  return NewPermanentError(baseErr)  // invalid parameter
    case 190:  return NewPermanentError(baseErr)  // invalid OAuth token
    case 10, 200, 368:
        return NewPermanentError(baseErr)         // permission / blocked
    }

    // ── Subcode classification ──
    switch metaErr.Error.ErrorSubcode {
    case 33:       return NewPermanentError(baseErr)  // unsupported request
    case 2388007:  return NewPermanentError(baseErr)  // template not found
    }

    // ── Fallback ──
    if statusCode >= 400 && statusCode < 500 {
        return NewPermanentError(baseErr)  // unknown 4xx → permanent
    }
    return baseErr                         // everything else → retryable
}

Why This Three-Layer Classification

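The Meta API reports failures through a combination of HTTP status codes, error codes, and error subcodes, so a single HTTP status isn't always enough to determine the retry strategy. The layers are evaluated in order and the first match wins: the Meta code and subcode checks run only for statuses that the first switch does not list. Because any unlisted 4xx ends up permanent through the fallback anyway, those two layers change the outcome only for unlisted statuses outside the 4xx range (for example 501).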

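Layer            Permanent (Don't Retry)  Retryable
---------------  -----------------------  ------------------------------------------
HTTP Status      400, 401, 403, 404       429 (rate limit), 500, 502, 503, 504
Meta Error Code  100, 190, 10, 200, 368   Other codes (fall through to next layer)
Meta Subcode     33, 2388007              Other subcodes (fall through to fallback)
Fallback         Any unknown 4xx          Everything else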
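
The retry logic: When HandleMetaAPIError returns a PermanentError, the Kafka consumer commits the message offset, so the message is skipped for good. When it returns a regular error, the consumer does NOT commit. Kafka redelivers from the last committed offset only after the consumer restarts or the group rebalances, and committing a later offset on the same partition also covers earlier ones, so redelivery depends on the loop not committing past the failed message.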
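
The order of evaluation is easiest to see with a condensed restatement of the classification. The permanent function below is a hypothetical reduction of HandleMetaAPIError that takes already-parsed values, assumes a non-2xx status, and returns true where the original would wrap the error with NewPermanentError.

```go
package main

import "fmt"

// permanent is a hypothetical, condensed version of the classification
// logic in HandleMetaAPIError. It assumes status is not 2xx.
func permanent(status, code, subcode int) bool {
	switch status { // layer 1: HTTP status, first match wins
	case 400, 401, 403, 404:
		return true
	case 429, 500, 502, 503, 504:
		return false
	}
	switch code { // layer 2: Meta error code
	case 100, 190, 10, 200, 368:
		return true
	}
	switch subcode { // layer 3: Meta subcode
	case 33, 2388007:
		return true
	}
	// fallback: unknown 4xx is permanent, everything else is retryable
	return status >= 400 && status < 500
}

func main() {
	fmt.Println(permanent(400, 0, 0))       // true: listed client error
	fmt.Println(permanent(429, 190, 0))     // false: status decides before code 190 is examined
	fmt.Println(permanent(501, 190, 0))     // true: unlisted status, Meta code decides
	fmt.Println(permanent(501, 1, 2388007)) // true: subcode decides
	fmt.Println(permanent(409, 1, 0))       // true: unknown 4xx fallback
	fmt.Println(permanent(501, 1, 0))       // false: nothing matched, retryable
}
```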

3.9 Validation Error Handling

What

Two functions handle validation errors from go-playground/validator, converting field-level validation failures into structured JSON responses with per-field detail messages.

ValidationError — Simple Response

// pkg/shared/http/error.go

func ValidationError(w http.ResponseWriter, err error) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusBadRequest)

    httpErr := &Error{
        Status:  http.StatusBadRequest,
        Code:    "VALIDATION_ERROR",
        Message: "validation error",
        Details: []ErrorDetail{},
    }

    if ve, ok := err.(validator.ValidationErrors); ok {
        for _, e := range ve {
            httpErr.Details = append(httpErr.Details, ErrorDetail{
                Field:   toSnakeCase(e.Field()),  // PascalCase → snake_case
                Message: getValidationErrorMessage(e.Tag(), e.Param()),
            })
        }
    } else {
        httpErr.Details = append(httpErr.Details, ErrorDetail{
            Message: err.Error(),
        })
    }

    json.NewEncoder(w).Encode(httpErr)
}

ValidationErrorWithMeta — With Request Meta

Adds request metadata (trace ID, timing) to the validation error response:

func ValidationErrorWithMeta(w http.ResponseWriter, r *http.Request, err error) {
    // Same validation logic, but wraps in:
    resp := ValidationErrorResponse{
        Success: false,
        Error:   httpErr,
        Meta:    NewMeta(r.Context()),  // adds request_id, timestamp, etc.
    }
    json.NewEncoder(w).Encode(resp)
}

Validation Tag Messages

The getValidationErrorMessage function translates validator tags into human-readable messages:

func getValidationErrorMessage(tag, param string) string {
    switch tag {
    case "required":  return "required"
    case "email":     return "invalid email format"
    case "min":       return "must be at least " + param + " characters"
    case "max":       return "must be at most " + param + " characters"
    case "url":       return "invalid URL format"
    case "uuid":      return "invalid UUID format"
    case "oneof":     return "must be one of: " + param
    case "numeric":   return "must be numeric"
    default:          return tag
    }
}

Example API Response

{
  "code": "VALIDATION_ERROR",
  "message": "validation error",
  "details": [
    { "field": "email", "message": "invalid email format" },
    { "field": "password", "message": "must be at least 8 characters" },
    { "field": "role_id", "message": "invalid UUID format" }
  ]
}

Note: Field names are converted from Go's PascalCase (RoleID) to snake_case (role_id) via toSnakeCase(), matching the JSON convention used by the API.
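
The body of toSnakeCase() is not shown in this chapter. The sketch below is one possible implementation, written as an assumption about its behavior: it keeps runs of capitals such as ID together so that RoleID becomes role_id rather than role_i_d. The codebase version may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnakeCase is a hypothetical implementation; the real helper in
// pkg/shared/http may handle edge cases differently.
func toSnakeCase(s string) string {
	runes := []rune(s)
	var b strings.Builder
	for i, r := range runes {
		if unicode.IsUpper(r) && i > 0 {
			prev := runes[i-1]
			nextLower := i+1 < len(runes) && unicode.IsLower(runes[i+1])
			// Start a new word after a lowercase letter or digit, or when an
			// acronym ends and a capitalised word begins (HTTPServer).
			if unicode.IsLower(prev) || unicode.IsDigit(prev) || (unicode.IsUpper(prev) && nextLower) {
				b.WriteByte('_')
			}
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	fmt.Println(toSnakeCase("RoleID"))      // role_id
	fmt.Println(toSnakeCase("PhoneNumber")) // phone_number
	fmt.Println(toSnakeCase("Email"))       // email
}
```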

3.10 Error Flow Diagram

HTTP Request Error Propagation

Errors flow upward through layers, being wrapped, classified, or transformed at each boundary:

┌─────────────────────────────────────────────────────────────────────────┐
│                          HTTP HANDLER                                   │
│                                                                         │
│  func (h *Handler) Get(w http.ResponseWriter, r *http.Request) {       │
│      user, err := h.service.GetByID(id)                                │
│      if err != nil {                                                    │
│          httpx.HandleError(w, err)  ─────────────────┐                 │
│          return                                       │                 │
│      }                                                │                 │
│  }                                                    │                 │
│                                                       ▼                 │
│                                            ┌──────────────────┐        │
│                                            │  errors.As()     │        │
│                                            │  *httpx.Error?   │        │
│                                            └──┬───────────┬───┘        │
│                                          yes  │           │  no        │
│                                               ▼           ▼            │
│                                          Status from   500 Internal    │
│                                          Error.Status  Server Error    │
└─────────────────────────────────────────────────────────────────────────┘
                              ▲
                              │  returns error (e.g., httpx.NotFound)
                              │
┌─────────────────────────────────────────────────────────────────────────┐
│                          SERVICE LAYER                                  │
│                                                                         │
│  func (s *Service) GetByID(id string) (*User, error) {                 │
│      user, err := s.repo.FindByID(id)                                  │
│      if err != nil {                                                    │
│          return nil, err  // pass through httpx errors from repo       │
│      }                                                                  │
│      return user, nil                                                   │
│  }                                                                      │
└─────────────────────────────────────────────────────────────────────────┘
                              ▲
                              │  returns httpx.NotFound / httpx.Conflic
                              │
┌─────────────────────────────────────────────────────────────────────────┐
│                          REPOSITORY LAYER                               │
│                                                                         │
│  func (r *Repo) FindByID(id string) (*User, error) {                  │
│      var user User                                                      │
│      err := r.db.First(&user, "id = ?", id).Error                     │
│      if err != nil {                                                    │
│          if errors.Is(err, gorm.ErrRecordNotFound) {                   │
│              return nil, httpx.NotFound("user not found")  ◄── CLASSIFY│
│          }                                                              │
│          return nil, httpx.Internal(err)       ◄── CLASSIFY            │
│      }                                                                  │
│      return &user, nil                                                 │
│  }                                                                      │
│                                                                         │
│  func (r *Repo) Create(user *User) error {                             │
│      err := r.db.Create(user).Error                                    │
│      if err != nil {                                                    │
│          var pgErr *pgconn.PgError                                     │
│          if errors.As(err, &pgErr) {                                   │
│              if pgErr.Code == "23503" {                                 │
│                  return httpx.Conflic("already exists")    ◄── CLASSIFY│
│              }                                                          │
│          }                                                              │
│          return httpx.BadRequest(err.Error())                          │
│      }                                                                  │
│      return nil                                                         │
│  }                                                                      │
└─────────────────────────────────────────────────────────────────────────┘
                              ▲
                              │  raw errors (gorm.ErrRecordNotFound, *pgconn.PgError)
                              │
┌─────────────────────────────────────────────────────────────────────────┐
│                     DATABASE / EXTERNAL SYSTEM                          │
│                                                                         │
│  gorm.ErrRecordNotFound    *pgconn.PgError{Code: "23503"}             │
│  connection errors          constraint violations                       │
└─────────────────────────────────────────────────────────────────────────┘

Kafka Consumer Error Propagation

A separate flow for asynchronous event processing with retry classification:

┌──────────────────────────────────────────────────────────────────┐
│                      KAFKA CONSUMER LOOP                         │
│                                                                  │
│  msg := reader.FetchMessage(ctx)                                │
│  err := processMessage(ctx, sub, msg)                           │
│                                                                  │
│  ┌─── err == nil? ────────────────────── YES ──► Commit offset  │
│  │                                                               │
│  NO                                                              │
│  │                                                               │
│  ├─── helpers.IsPermanent(err)? ── YES ──► Log + Commit offset  │
│  │                                          (skip forever)       │
│  │                                                               │
│  └─── transient error ────────────────── ► Do NOT commit        │
│                                            (Kafka will redeliver)│
└──────────────────────────────────────────────────────────────────┘
                          ▲
                          │  PermanentError or regular error
                          │
┌──────────────────────────────────────────────────────────────────┐
│                    EVENT HANDLER / SERVICE                        │
│                                                                  │
│  err := callMetaAPI(ctx, payload)                               │
│                                                                  │
│  ┌─── err == nil? ───────── YES ──► return nil (success)        │
│  │                                                               │
│  └─── err != nil ────────────────► return err (may be           │
│                                     PermanentError or regular)   │
└──────────────────────────────────────────────────────────────────┘
                          ▲
                          │  classified by HandleMetaAPIError
                          │
┌──────────────────────────────────────────────────────────────────┐
│                   HandleMetaAPIError()                            │
│                                                                  │
│  HTTP 400/401/403/404   ──► NewPermanentError()  (don't retry)  │
│  HTTP 429               ──► regular error         (retry later)  │
│  HTTP 500/502/503/504   ──► regular error         (retry)        │
│  Meta code 100/190      ──► NewPermanentError()  (don't retry)  │
│  Unknown 4xx            ──► NewPermanentError()  (don't retry)  │
│  Everything else        ──► regular error         (retry)        │
└──────────────────────────────────────────────────────────────────┘

Summary of Error Patterns

Pattern                Where Used              Purpose
---------------------  ----------------------  ---------------------------------
httpx.NotFound()       Repositories, services  Classify error with HTTP status
fmt.Errorf("...: %w")  All layers              Wrap with context, preserve chain
errors.As()            Handlers, repositories  Extract typed error from chain
errors.Is()            Repositories, consumer  Check for specific sentinel error
errors.New()           Consumer guards         API misuse / programming errors
PermanentError         Event processing        Classify as non-retryable
HandleMetaAPIError()   Meta API client         Classify external API errors
ValidationError()      HTTP handlers           Field-level validation response
HandleError()          HTTP handlers           Central error → HTTP response