# BigQuery Insights Agent

Build an AI agent that connects to Google BigQuery, runs complex SQL queries, and generates actionable insights from your data warehouse using TeamDay.

## What You'll Build

A data analysis agent that:
- Connects to Google BigQuery via MCP
- Writes and executes SQL queries on your datasets
- Analyzes trends, detects anomalies, and segments customers
- Runs on a schedule to deliver daily or weekly reports

**Time to complete:** 45-60 minutes

## Prerequisites

- A TeamDay account with an organization
- Google Cloud project with BigQuery enabled
- BigQuery datasets with data to analyze
- Basic SQL knowledge

## Architecture

```mermaid
graph LR
    A[User / Mission] --> B[Data Analyst Agent]
    B --> C[BigQuery MCP Server]
    C --> D[BigQuery API]
    D --> E[Your Datasets]
    B --> F[Insights & Recommendations]
```

## Step 1: Google Cloud Setup

### 1.1 Enable BigQuery API

```bash
gcloud config set project your-project-id
gcloud services enable bigquery.googleapis.com
gcloud services enable bigquerystorage.googleapis.com
```

Or enable via the Console: **APIs & Services > Enable APIs > BigQuery API**.
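To confirm both services are active, list the enabled APIs (a quick check using the `gcloud` CLI configured above):

```bash
# Should print bigquery.googleapis.com and bigquerystorage.googleapis.com
gcloud services list --enabled | grep bigquery
```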

### 1.2 Create a Service Account

```bash
# Create service account
gcloud iam service-accounts create teamday-bigquery \
  --display-name="TeamDay BigQuery Agent"

SA_EMAIL="teamday-bigquery@your-project-id.iam.gserviceaccount.com"

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.jobUser"

# Download key file
gcloud iam service-accounts keys create ~/teamday-bigquery-key.json \
  --iam-account=$SA_EMAIL
```

**Required roles:**
- `bigquery.dataViewer` -- Read dataset contents
- `bigquery.jobUser` -- Run queries
- `bigquery.dataEditor` -- (Optional) Write data back
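
Before wiring the key into TeamDay, it is worth a quick local sanity check that the service account can actually run jobs. A minimal sketch using the `gcloud` and `bq` CLIs (both assumed to be installed):

```bash
# Authenticate as the new service account, then run a trivial query
gcloud auth activate-service-account --key-file="$HOME/teamday-bigquery-key.json"
bq query --use_legacy_sql=false "SELECT 1"
```

`SELECT 1` scans no table data, so it verifies authentication and the `jobUser` role at no cost.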

### 1.3 Create Sample Data (Optional)

If you want to follow along with the examples below:

```bash
bq mk --dataset --location=US --description="Analytics Data" \
  your-project-id:analytics

bq query --use_legacy_sql=false "
CREATE TABLE analytics.user_events AS
SELECT
  GENERATE_UUID() as event_id,
  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(FLOOR(RAND()*30) AS INT64) DAY) as timestamp,
  CONCAT('user_', CAST(CAST(FLOOR(RAND()*1000) AS INT64) AS STRING)) as user_id,
  ['page_view', 'click', 'signup', 'purchase'][OFFSET(CAST(FLOOR(RAND()*4) AS INT64))] as event_type,
  RAND()*100 as value
FROM UNNEST(GENERATE_ARRAY(1, 10000))
"
```
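
A quick row count confirms the table landed:

```bash
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) AS row_count FROM analytics.user_events"
# Expected: 10000
```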

## Step 2: TeamDay Setup

### 2.1 Store GCP Credentials as Space Secrets

Base64-encode your service account key and store it alongside your project ID:

```bash
# Base64 encode the service account file
# (GNU base64 wraps lines at 76 chars -- on Linux add -w 0 to keep it on one line)
SA_ENCODED=$(base64 < ~/teamday-bigquery-key.json)

# Store secrets on the space
curl -X POST https://cc.teamday.ai/api/v1/spaces/$SPACE_ID/secrets \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"secrets\": {
      \"GCP_SERVICE_ACCOUNT_JSON\": \"$SA_ENCODED\",
      \"GCP_PROJECT_ID\": \"your-project-id\"
    }
  }"
```

Secret keys must be `UPPER_SNAKE_CASE`.
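
Before uploading, you can confirm the encoded value round-trips to valid JSON (`python3` is used here only as a JSON validator; on older macOS systems the decode flag is `-D`):

```bash
# Decode and validate; prints OK if the key survived encoding intact
echo "$SA_ENCODED" | base64 --decode | python3 -m json.tool > /dev/null && echo "OK"
```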

### 2.2 Create the Data Analyst Agent

```bash
curl -X POST https://cc.teamday.ai/api/v1/agents \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BigQuery Data Analyst",
    "role": "Data Analyst",
    "model": "claude-sonnet-4-6",
    "visibility": "organization",
    "tags": ["bigquery", "analytics", "data"],
    "systemPrompt": "You are an expert data analyst with deep knowledge of SQL and BigQuery.\n\n## Capabilities\n- Write efficient BigQuery SQL queries with proper partitioning and clustering\n- Analyze query results for trends, patterns, and anomalies\n- Generate clear insights with business context\n- Make data-driven recommendations\n\n## BigQuery Best Practices\n- Select only needed columns (avoid SELECT *)\n- Use LIMIT for exploratory queries\n- Use approximate aggregation functions when precision is not critical\n- Leverage window functions for complex calculations\n- Use WITH clauses for query readability\n\n## Response Format\nFor each analysis:\n1. Explain your approach\n2. Show the SQL query\n3. Summarize the results\n4. List key insights\n5. Provide actionable recommendations"
  }'
```

**Response:**
```json
{
  "success": true,
  "id": "kf8x2mNp4qRt",
  "name": "BigQuery Data Analyst",
  "status": "active",
  "chatUrl": "/agents/kf8x2mNp4qRt/chat"
}
```

Save the agent ID (`kf8x2mNp4qRt`) -- you will need it in subsequent steps.
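
The examples below hardcode this ID for readability. If you are scripting the setup, export it once and substitute the variable in later commands:

```bash
# Agent ID returned by the create call above
export AGENT_ID="kf8x2mNp4qRt"
```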

## Step 3: Configure BigQuery MCP

For the current UI and CLI activation flow, see [Activate BigQuery for an Agent](https://docs.teamday.ai/guides/bigquery-agent-activation). The API example below is useful when you want to automate setup from your own provisioning scripts.

### 3.1 Register the MCP Instance

```bash
curl -X POST https://cc.teamday.ai/api/v1/mcps \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mcpType": "bigquery",
    "name": "BigQuery Analytics",
    "description": "Connection to production BigQuery datasets",
    "isActive": true,
    "credentials": {
      "GCP_PROJECT_ID": {
        "value": "your-project-id",
        "isSecret": false
      },
      "GCP_SERVICE_ACCOUNT_JSON": {
        "value": "'"$SA_ENCODED"'",
        "isSecret": true
      }
    }
  }'
```

**Response:**
```json
{
  "success": true,
  "id": "wR3jLm9vKxYz",
  "name": "BigQuery Analytics",
  "mcpType": "bigquery"
}
```

### 3.2 Attach Agent and MCP to the Space

```bash
curl -X PATCH https://cc.teamday.ai/api/v1/spaces/$SPACE_ID \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "addAgents": ["kf8x2mNp4qRt"],
    "addMcps": ["wR3jLm9vKxYz"]
  }'
```

### 3.3 Alternative: `.mcp.json` in the Space

If you prefer file-based configuration, place an `.mcp.json` in the space's root directory:

```json
{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-bigquery"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
        "GCP_PROJECT_ID": "${GCP_PROJECT_ID}"
      }
    }
  }
}
```

### 3.4 Verify the Connection

Execute the agent with a simple test query:

```bash
curl -X POST https://cc.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"message\": \"List all available datasets and tables in the project.\",
    \"spaceId\": \"$SPACE_ID\"
  }"
```

## Step 4: Usage Examples

### 4.1 Basic Query

```
You: How many total events do we have, broken down by type?

Agent: I'll query the user_events table.

SELECT
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
GROUP BY event_type
ORDER BY event_count DESC

Results:
| Event Type | Events | Unique Users | Avg Value |
|------------|--------|--------------|-----------|
| page_view  | 2,534  | 412          | 49.82     |
| click      | 2,489  | 398          | 51.34     |
| signup     | 2,501  | 497          | 48.91     |
| purchase   | 2,476  | 491          | 50.12     |

Total: 10,000 events across ~500 users over the last 30 days.
```

### 4.2 User Behavior Analysis

```sql
SELECT
  DATE(timestamp) as date,
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date, event_type
ORDER BY date DESC, event_count DESC
```

The agent will interpret results and surface insights like:
- Conversion funnel health (view > click > signup > purchase rates)
- Day-over-day trends and growth rate
- Recommendations for top-of-funnel optimization
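
To trigger this analysis on demand rather than in chat, the same `execute` endpoint from step 3.4 works; the prompt wording here is just an example:

```bash
curl -X POST https://cc.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"message\": \"Analyze user behavior over the last 7 days: daily event counts by type, unique users, and notable trends.\",
    \"spaceId\": \"$SPACE_ID\"
  }"
```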

### 4.3 Cohort Retention Analysis

```sql
WITH user_cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC(DATE(MIN(timestamp)), WEEK) as cohort_week
  FROM `your-project-id.analytics.user_events`
  GROUP BY user_id
),
cohort_sizes AS (
  SELECT cohort_week, COUNT(*) as cohort_size
  FROM user_cohorts
  GROUP BY cohort_week
),
user_activity AS (
  SELECT DISTINCT
    user_id,
    DATE_TRUNC(DATE(timestamp), WEEK) as activity_week
  FROM `your-project-id.analytics.user_events`
)
SELECT
  cohort_week,
  cs.cohort_size,
  DATE_DIFF(ua.activity_week, uc.cohort_week, WEEK) as weeks_since_signup,
  COUNT(DISTINCT ua.user_id) as active_users,
  ROUND(COUNT(DISTINCT ua.user_id) * 100.0 / cs.cohort_size, 1) as retention_pct
FROM user_cohorts uc
JOIN cohort_sizes cs USING (cohort_week)
LEFT JOIN user_activity ua ON uc.user_id = ua.user_id
WHERE uc.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 5 WEEK)
GROUP BY cohort_week, cohort_size, weeks_since_signup
ORDER BY cohort_week, weeks_since_signup
```

Example output:

| Cohort  | Size | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 |
|---------|------|--------|--------|--------|--------|--------|
| Jan 20  | 178  | 100%   | 56.2%  | 38.8%  | 29.2%  | 24.7%  |
| Jan 27  | 195  | 100%   | 58.5%  | 41.0%  | 31.3%  | --     |
| Feb 3   | 203  | 100%   | 59.1%  | 43.3%  | --     | --     |
| Feb 10  | 189  | 100%   | 61.4%  | --     | --     | --     |

The agent will identify trends (improving Week 1 retention), pinpoint the biggest drop-off window (Week 1 to Week 2), and recommend interventions like onboarding email sequences or re-engagement triggers.
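
Retention queries scan the full events table, so it is worth checking cost before scheduling them. A dry run reports bytes scanned without executing (assuming the query above is saved locally, here hypothetically as `cohort_retention.sql`):

```bash
# Prints the bytes the query would process, without running it or incurring cost
bq query --use_legacy_sql=false --dry_run "$(cat cohort_retention.sql)"
```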

### 4.4 Revenue & Customer Segmentation

```sql
WITH customer_metrics AS (
  SELECT
    user_id,
    COUNT(*) as total_purchases,
    SUM(value) as total_revenue,
    AVG(value) as avg_order_value,
    TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), DAY) as days_since_last_purchase
  FROM `your-project-id.analytics.user_events`
  WHERE event_type = 'purchase'
  GROUP BY user_id
)
SELECT
  CASE
    WHEN total_revenue >= 500 THEN 'VIP'
    WHEN total_revenue >= 200 THEN 'High Value'
    WHEN total_revenue >= 100 THEN 'Medium Value'
    ELSE 'Low Value'
  END as segment,
  COUNT(*) as customers,
  ROUND(SUM(total_revenue), 2) as segment_revenue,
  ROUND(AVG(total_revenue), 2) as avg_customer_value,
  ROUND(AVG(avg_order_value), 2) as avg_order_value,
  ROUND(AVG(days_since_last_purchase), 1) as avg_days_since_purchase
FROM customer_metrics
GROUP BY segment
ORDER BY segment_revenue DESC
```

Example output:

| Segment      | Customers | Revenue  | Avg Value | AOV    | Days Since |
|--------------|-----------|----------|-----------|--------|------------|
| VIP          | 8         | $4,234   | $529.25   | $77.87 | 3.2        |
| High Value   | 23        | $5,892   | $256.17   | $80.05 | 5.1        |
| Medium Value | 67        | $8,123   | $121.24   | $80.83 | 8.7        |
| Low Value    | 156       | $7,234   | $46.37    | $42.15 | 12.3       |

The agent surfaces the Pareto distribution (top 12% of customers drive 40% of revenue), flags the at-risk low-value segment (12+ days inactive), and recommends VIP retention programs and win-back campaigns.

### 4.5 Anomaly Detection

```sql
WITH hourly_metrics AS (
  SELECT
    TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
    event_type,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
  FROM `your-project-id.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  GROUP BY hour, event_type
),
historical_avg AS (
  SELECT
    EXTRACT(HOUR FROM hour_ts) as hour_of_day,
    event_type,
    AVG(hourly_count) as avg_count,
    STDDEV(hourly_count) as stddev_count
  FROM (
    SELECT
      TIMESTAMP_TRUNC(timestamp, HOUR) as hour_ts,
      event_type,
      COUNT(*) as hourly_count
    FROM `your-project-id.analytics.user_events`
    WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
    GROUP BY hour_ts, event_type
  )
  GROUP BY hour_of_day, event_type
)
SELECT
  hm.hour,
  hm.event_type,
  hm.event_count as actual,
  ROUND(ha.avg_count, 1) as expected,
  ROUND(
    (hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0), 2
  ) as z_score,
  CASE
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 2
      THEN 'ANOMALY'
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
      THEN 'WARNING'
    ELSE 'NORMAL'
  END as status
FROM hourly_metrics hm
JOIN historical_avg ha
  ON EXTRACT(HOUR FROM hm.hour) = ha.hour_of_day
  AND hm.event_type = ha.event_type
WHERE ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
ORDER BY z_score DESC
```

The agent compares current hourly metrics against the trailing 7-day average, flags anything beyond 1.5 standard deviations, and recommends immediate investigation steps for critical anomalies (z-score > 2).
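
If you run this check frequently, one option is to wrap the query in a view so the agent (or a scheduled mission) can simply `SELECT * FROM analytics.hourly_anomalies` instead of re-sending the full SQL. A sketch with the `bq` CLI, assuming the query text is saved as `anomaly_detection.sql`:

```bash
# Create a reusable view over the anomaly-detection query
bq mk --use_legacy_sql=false \
  --view "$(cat anomaly_detection.sql)" \
  your-project-id:analytics.hourly_anomalies
```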

## Step 5: Advanced Patterns

### 5.1 Funnel Analysis

```sql
WITH funnel_steps AS (
  SELECT
    user_id,
    MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
    MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) as step_2_click,
    MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as step_3_signup,
    MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as step_4_purchase
  FROM `your-project-id.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY user_id
)
SELECT
  SUM(step_1_view) as viewed,
  SUM(step_2_click) as clicked,
  SUM(step_3_signup) as signed_up,
  SUM(step_4_purchase) as purchased,
  ROUND(SUM(step_2_click) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
    as view_to_click_pct,
  ROUND(SUM(step_3_signup) * 100.0 / NULLIF(SUM(step_2_click), 0), 1)
    as click_to_signup_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_3_signup), 0), 1)
    as signup_to_purchase_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
    as overall_conversion_pct
FROM funnel_steps
```

### 5.2 Churn Prediction

Identify users likely to churn based on activity patterns:

```sql
WITH user_features AS (
  SELECT
    user_id,
    COUNT(*) as total_events,
    COUNT(DISTINCT DATE(timestamp)) as active_days,
    DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) as days_inactive,
    AVG(CASE WHEN event_type = 'purchase' THEN value ELSE 0 END) as avg_purchase_value,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count
  FROM `your-project-id.analytics.user_events`
  GROUP BY user_id
)
SELECT
  user_id,
  CASE
    WHEN days_inactive > 14 AND purchase_count > 0 THEN 'HIGH_RISK'
    WHEN days_inactive > 7 AND active_days < 3 THEN 'MEDIUM_RISK'
    WHEN days_inactive > 3 AND total_events < 5 THEN 'LOW_RISK'
    ELSE 'ACTIVE'
  END as churn_risk,
  days_inactive,
  total_events,
  purchase_count,
  ROUND(avg_purchase_value, 2) as avg_purchase_value
FROM user_features
WHERE days_inactive > 3
ORDER BY
  CASE
    WHEN days_inactive > 14 AND purchase_count > 0 THEN 1
    WHEN days_inactive > 7 AND active_days < 3 THEN 2
    ELSE 3
  END,
  days_inactive DESC
```

## Step 6: Schedule with Missions

### 6.1 Daily Insights Report

```bash
curl -X POST https://cc.teamday.ai/api/v1/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Daily BigQuery Insights",
    "goal": "Analyze yesterday data from BigQuery: 1) Overall event metrics and trends 2) Revenue summary 3) Any anomalies detected 4) Top recommendations",
    "characterId": "kf8x2mNp4qRt",
    "spaceId": "'"$SPACE_ID"'",
    "schedule": {
      "type": "cron",
      "value": "0 8 * * *"
    }
  }'
```

Note: `characterId` is the agent's ID in the API.

**Response:**
```json
{
  "success": true,
  "id": "nV7bTq2wXpLk",
  "title": "Daily BigQuery Insights",
  "status": "pending",
  "schedule": {
    "type": "cron",
    "value": "0 8 * * *"
  }
}
```

### 6.2 Weekly Deep Dive

```bash
curl -X POST https://cc.teamday.ai/api/v1/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Weekly Analytics Deep Dive",
    "goal": "Comprehensive weekly analysis: 1) Cohort retention trends 2) Revenue by customer segment 3) Funnel conversion rates 4) Churn risk report 5) Key insights and strategic recommendations",
    "characterId": "kf8x2mNp4qRt",
    "spaceId": "'"$SPACE_ID"'",
    "schedule": {
      "type": "cron",
      "value": "0 9 * * 1"
    }
  }'
```

## Troubleshooting

**Authentication failed (403)**
- Verify the service account JSON is valid and base64-encoded correctly
- Confirm the service account has `bigquery.dataViewer` and `bigquery.jobUser` roles
- Check that the BigQuery API is enabled in your GCP project

**Query timeout**
- Add a `LIMIT` clause for exploratory queries
- Use partitioned and clustered tables for large datasets
- Break complex queries into smaller CTEs

**No data returned**
- Verify dataset and table names (they are case-sensitive)
- Check date range filters -- timestamps may be in UTC
- Test with a simple `SELECT COUNT(*) FROM table` first

**Cost concerns**
- BigQuery on-demand pricing is about $6.25 per TiB scanned (the first 1 TiB per month is free) -- always select only needed columns
- Use `maximumBytesBilled` in your query settings to cap cost per query (CLI example below)
- Partition tables by date and cluster by frequently filtered columns
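
For ad-hoc runs outside the agent, the `bq` CLI exposes the same cap as `--maximum_bytes_billed`; the query fails fast, and bills nothing, if it would scan more:

```bash
# Abort any query that would scan more than ~100 MB
bq query --use_legacy_sql=false --maximum_bytes_billed=100000000 \
  "SELECT event_type, COUNT(*) AS n
   FROM \`your-project-id.analytics.user_events\`
   GROUP BY event_type"
```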

## Cost Estimation

| Component | Estimate |
|-----------|----------|
| BigQuery queries (typical: 10-100 MB each) | $0.00006-$0.0006 per query |
| BigQuery storage | $0.02 per GB/month |
| TeamDay agent execution (~15K input + ~4K output tokens) | ~$0.18 per analysis |
| Monthly total (daily reports) | ~$6-10/month |

## Next Steps

- [Analytics Reporter](https://docs.teamday.ai/examples/analytics-reporter) -- Combine with GA4 data
- [Code Review Bot](https://docs.teamday.ai/examples/code-review-bot) -- Analyze engineering metrics
- [API Reference](https://docs.teamday.ai/api/) -- Full TeamDay API documentation

## Resources

- [BigQuery Documentation](https://cloud.google.com/bigquery/docs)
- [BigQuery SQL Reference](https://cloud.google.com/bigquery/docs/reference/standard-sql)
- [MCP BigQuery Server](https://github.com/modelcontextprotocol/servers/tree/main/src/bigquery)
- [TeamDay on GitHub](https://github.com/TeamDay-AI/teamday)
