What is TierZero.md?

TierZero.md is a special Markdown file containing organization-specific instructions that TierZero always follows when processing requests. The entire content of TierZero.md is added as context to every request.
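Conceptually, "added as context to every request" can be pictured as prepending the file's text to each request before it is processed. The sketch below only illustrates that idea: the function name `build_request_context` and the delimiter strings are hypothetical, and nothing here describes how TierZero is actually implemented.

```python
def build_request_context(tierzero_md: str, user_request: str) -> str:
    """Prepend organization instructions to a single request (illustrative only)."""
    instructions = tierzero_md.strip()
    if not instructions:
        # Nothing to add: the request is passed through unchanged.
        return user_request
    # The labels below are made up for this sketch.
    return (
        "Organization instructions (from TierZero.md):\n"
        f"{instructions}\n\n"
        "Request:\n"
        f"{user_request}"
    )


example = build_request_context(
    "### Terminology\n- Service names are generally structured `prod-<service_name>-server`.",
    "Why is checkout latency high?",
)
print(example)
```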

Why TierZero.md?

TierZero learns from your integrations (logs, metrics, code, incidents, documents), but some critical context is not captured in those data sources. For example:
  • Default values for tags such as environment, region, cell, or cluster
  • Terminology that is specific to your organization, such as the service naming format
  • Telemetry sources and their location, e.g. where logs, metrics, errors, and traces are stored
  • Response formatting that TierZero should apply to all general requests
  • High-level architecture of your systems such as the technical stack and how applications or services are hosted
  • Known issues and baseline patterns such as errors that TierZero should ignore
  • Key files or directories in the codebase, such as service or team ownership catalogs or the location of infrastructure-as-code configurations
  • Common patterns or tips for debugging and investigation
  • Public documentation domains that TierZero should search and fetch content from
By providing this context in TierZero.md, you help TierZero give more accurate, context-aware assistance during incidents and investigations.

Sample TierZero.md

### About <YOUR_COMPANY>
Acme Corp is a SaaS platform that provides project management and collaboration tools for distributed teams.

### Default Values
- If the environment is not provided, use the `env:prod*` tag.
- For production AWS resources, use `123456789012` as the account ID.
- If zone names are not specified, use `acme.com`, `acmeapis.com`, and `acmecdn.com`.

### Terminology
- When a user refers to `web server`, they usually mean the `api-server` service.
- Service names are generally structured as `prod-<service_name>-server`.

### Telemetry Sources
- Frontend errors and client-side exceptions are in Sentry.
- Application logs, metrics, and distributed traces are in Datadog.
- Database monitoring logs are in CloudWatch for RDS instances.
- API gateway logs are in CloudWatch under `/aws/apigateway/` log group.

### Response Format
- When possible, include the Kubernetes namespaces and any `team` tags associated with the user's request.
- When possible, provide direct links to commits in the following format: `https://github.com/acme/<repository>/commit/<commit_hash>`.

### Architecture
- Each service is named `<NAME>-service`, e.g. `identity-service`, and hosted as deployments on an AWS EKS cluster.
- The core databases are hosted on AWS Aurora RDS and Redis caches are hosted on AWS ElastiCache.
- External system dependencies include: Stripe (payments), SendGrid (email), Auth0 (SSO).

### Known Issues or Baseline Patterns
- For databases, CPU utilization is typically 10-30% during business hours; spikes to 80% are acceptable during peak times.
- 404 errors for the `identity` service are a known issue. Exclude these errors from any analysis or request. Example error messages for 404 errors include:
  - `[IdentityService] request status 404`
  - `User not found exception`
- The log retention period is 30 days, so the absence of logs older than 30 days is expected and is not an issue.

### Key Files or Directories in Codebase
- The directory `deployment/terraform` in the repository `acme/main` contains all Terraform code for all infrastructure deployments.
- Use the `api/team_catalog.yaml` file in repository `acme/main` to determine which team owns any given API endpoint.

### Investigation Tips and Gotchas
- When using logs to analyze API requests, look at all three of the following sources: `source:api` (application), `source:elb` (ELB), and `source:cloudflare` (Cloudflare).
- The `request_id` tag in logs can be used to trace the same user request across the logs of different services.

### Public Documentation as Knowledge Sources
- To answer questions about how to use the Acme product, search `acme.com`.
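The sample above groups its instructions under `###` headings. As a quick sanity check on a draft, the sketch below lists which of the sample's sections a file does not yet contain. It assumes the sample's heading titles are a useful checklist; TierZero is not documented as requiring any of them, and the helper names `find_sections` and `missing_sections` are hypothetical.

```python
import re

# Section titles taken from the sample above. This is only a checklist
# for the sketch; it is not a list of headings that TierZero requires.
SAMPLE_SECTIONS = [
    "Default Values",
    "Terminology",
    "Telemetry Sources",
    "Response Format",
    "Architecture",
    "Known Issues or Baseline Patterns",
    "Key Files or Directories in Codebase",
    "Investigation Tips and Gotchas",
    "Public Documentation as Knowledge Sources",
]


def find_sections(markdown_text: str) -> list[str]:
    """Return the titles of all level-3 (`### `) headings, in document order."""
    pattern = re.compile(r"^###[ \t]+(.+)$", re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(markdown_text)]


def missing_sections(markdown_text: str, expected: list[str] = SAMPLE_SECTIONS) -> list[str]:
    """Return the expected titles that do not appear as headings (case-insensitive)."""
    present = {title.lower() for title in find_sections(markdown_text)}
    return [title for title in expected if title.lower() not in present]


draft = """### Default Values
- If the environment is not provided, use `env:prod*` tag.

### Terminology
- Service names are generally structured `prod-<service_name>-server`.
"""

for title in missing_sections(draft):
    print(f"Not covered yet: {title}")
```

A missing section is not an error; it only flags a category of context you may want to consider adding.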

FAQ

What should I NOT include?

Do not include the following in TierZero.md. Instead, import or configure them in Integration Resources:
  • A comprehensive list of key telemetry tags or attributes
  • Detailed dashboards and notebooks
  • Documents such as team knowledge wikis and detailed runbooks

How detailed should it be?

Use specific names, identifiers, and examples where applicable. They help TierZero extract the key information from large volumes of telemetry data.
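To see why specific examples matter, consider the known-issue messages from the sample. Because they are quoted exactly, they can be matched against log lines directly. The sketch below is a simplified illustration of that point, not a description of how TierZero processes logs; `drop_known_issues` and the log lines are invented for this example.

```python
# Known-issue messages copied from the sample's
# "Known Issues or Baseline Patterns" section.
KNOWN_ISSUE_MESSAGES = [
    "[IdentityService] request status 404",
    "User not found exception",
]


def drop_known_issues(log_lines, known_messages=KNOWN_ISSUE_MESSAGES):
    """Keep only the lines that contain none of the known-issue messages."""
    return [
        line
        for line in log_lines
        if not any(message in line for message in known_messages)
    ]


# Hypothetical log lines, invented for this example.
logs = [
    "2024-01-01T00:00:00Z [IdentityService] request status 404",
    "2024-01-01T00:00:01Z [PaymentService] request status 500",
    "2024-01-01T00:00:02Z User not found exception",
]

for line in drop_known_issues(logs):
    print(line)
```

A vague instruction such as "ignore identity errors" gives nothing concrete to match on, whereas the exact messages leave only the `PaymentService` line in this example.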