SQL Simulate

Synthetic Data Generation

Generate realistic synthetic data from schema metadata alone. No source data is required: build complete test environments from scratch, with configurable value distributions and intact table relationships.

How It Works

1. Analyze Schema

Read table structures, data types, constraints, and relationships

2. Infer Semantics

AI determines what each column represents (name, email, date, etc.)

3. Generate Data

Create realistic data respecting all constraints and relationships
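
Respecting relationships in step 3 implies that parent tables are populated before the tables that reference them. The following is a minimal Python sketch of that ordering, assuming foreign keys are available as a table-to-parents map. The `OrderItems` and `Products` tables and the `generation_order` helper are hypothetical illustrations, not part of SQL Simulate.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema metadata: each table maps to the set of parent
# tables it references through foreign keys.
foreign_keys = {
    "Customers": set(),
    "Products": set(),
    "Orders": {"Customers"},
    "OrderItems": {"Orders", "Products"},
}


def generation_order(fk_map):
    """Return table names ordered so every parent precedes its children."""
    return list(TopologicalSorter(fk_map).static_order())


order = generation_order(foreign_keys)
print(order)
```

Generating tables in this order means every foreign key value can be drawn from rows that already exist. A circular reference would raise `graphlib.CycleError`, which a real tool would need to handle separately.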

Intelligent Column Detection

AI infers each column's semantic type from its name and SQL type, then selects a generator that produces appropriate values

Column Analysis Results
════════════════════════════════════════════════════════════

Table: Customers
┌──────────────────┬─────────────┬──────────────────┬───────────────┐
│ Column           │ SQL Type    │ Inferred Type    │ Generator     │
├──────────────────┼─────────────┼──────────────────┼───────────────┤
│ CustomerId       │ INT         │ Primary Key      │ Sequential    │
│ FirstName        │ NVARCHAR    │ Person.FirstName │ Faker         │
│ LastName         │ NVARCHAR    │ Person.LastName  │ Faker         │
│ Email            │ NVARCHAR    │ Email Address    │ {first}.{last}│
│ PhoneNumber      │ VARCHAR     │ Phone (US)       │ ###-###-####  │
│ DateOfBirth      │ DATE        │ Birth Date       │ 18-80 years   │
│ CreatedAt        │ DATETIME    │ Timestamp        │ Recent dates  │
│ IsActive         │ BIT         │ Boolean          │ 90% true      │
│ CreditScore      │ INT         │ Score (300-850)  │ Normal dist   │
└──────────────────┴─────────────┴──────────────────┴───────────────┘

Confidence: 94% (override any detection in config)
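
One way to picture the detection step is as a set of name-based rules. The Python sketch below is a simple rule-based stand-in for the AI inference, not the product's actual logic. The `tokens` and `infer_type` helpers are hypothetical, and key columns are left out on the assumption that they come from schema constraints rather than names.

```python
import re


def tokens(column_name):
    """Split CamelCase or snake_case names into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", column_name)
    return [p.lower() for p in parts]


def infer_type(column_name, sql_type):
    """Guess a semantic type from the column name and SQL type (assumed rules)."""
    words = tokens(column_name)
    sql = sql_type.upper()
    if sql == "BIT" or words[:1] in (["is"], ["has"]):
        return "Boolean"
    if "email" in words:
        return "Email Address"
    if "phone" in words:
        return "Phone"
    if "name" in words and "first" in words:
        return "Person.FirstName"
    if "name" in words and "last" in words:
        return "Person.LastName"
    if "birth" in words or "dob" in words:
        return "Birth Date"
    if sql.startswith("DATE") and words[-1:] in (["at"], ["on"]):
        return "Timestamp"
    if "score" in words:
        return "Score"
    return "Unknown"  # fall back to a generic generator for the SQL type


# Non-key columns from the Customers table above.
columns = [
    ("FirstName", "NVARCHAR"), ("LastName", "NVARCHAR"), ("Email", "NVARCHAR"),
    ("PhoneNumber", "VARCHAR"), ("DateOfBirth", "DATE"),
    ("CreatedAt", "DATETIME"), ("IsActive", "BIT"), ("CreditScore", "INT"),
]
for name, sql_type in columns:
    print(f"{name:12} {sql_type:9} {infer_type(name, sql_type)}")
```

Rules like these are brittle for unusual naming schemes, which is where a learned model and the config override mentioned above would matter.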

Fine-Tune Generation

# sql2ai-simulate.yaml
generation:
  seed: 42  # Reproducible results
  locale: en_US

tables:
  Customers:
    row_count: 10000
    columns:
      CreditScore:
        distribution: normal
        mean: 680
        std_dev: 80
        min: 300
        max: 850

      State:
        distribution: weighted
        values:
          CA: 0.15
          TX: 0.12
          FL: 0.10
          NY: 0.08
          other: 0.55

  Orders:
    row_count: 50000
    date_range:
      start: 2023-01-01
      end: 2024-12-31
    parent_distribution:
      table: Customers
      type: pareto  # Some customers order more

relationships:
  preserve_referential_integrity: true
  cascade_generation: true
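
To make these settings concrete, here is a Python sketch of how such a configuration might be interpreted. It rests on several assumptions that the configuration itself does not confirm: out-of-range normal samples are redrawn rather than clamped, `pareto` means each customer gets an order weight drawn from a Pareto distribution, and the shape parameter `alpha=1.2` is hypothetical. The function names are illustrative, and row counts are scaled down to keep the example fast.

```python
import random


def truncated_normal(rng, mean, std_dev, lo, hi):
    """Redraw from a normal distribution until the value lies in [lo, hi]."""
    while True:
        value = rng.gauss(mean, std_dev)
        if lo <= value <= hi:
            return round(value)


def weighted_choice(rng, weights):
    """Pick one key with probability proportional to its weight."""
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


def pick_parent_ids(rng, parent_ids, child_count, alpha=1.2):
    """Assign parents to child rows so a few parents get most children."""
    parent_weights = [rng.paretovariate(alpha) for _ in parent_ids]
    return rng.choices(parent_ids, weights=parent_weights, k=child_count)


def generate(seed, customer_count=1000, order_count=5000):
    rng = random.Random(seed)  # mirrors `seed: 42` for reproducible output
    state_weights = {"CA": 0.15, "TX": 0.12, "FL": 0.10, "NY": 0.08, "other": 0.55}
    customers = [
        {
            "CustomerId": i,
            "CreditScore": truncated_normal(rng, 680, 80, 300, 850),
            "State": weighted_choice(rng, state_weights),
        }
        for i in range(1, customer_count + 1)
    ]
    customer_ids = [c["CustomerId"] for c in customers]
    # Every order references an existing customer, so referential
    # integrity holds by construction.
    orders = [
        {"OrderId": n, "CustomerId": cid}
        for n, cid in enumerate(pick_parent_ids(rng, customer_ids, order_count), 1)
    ]
    return customers, orders


customers, orders = generate(seed=42)
print(len(customers), len(orders))
```

Drawing all values from one seeded `random.Random` instance is what makes a run repeatable: the same seed and configuration yield the same rows.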

Use Cases

Load Testing

Generate millions of rows to stress-test your database

New Projects

Populate empty databases for development

Demo Environments

Create realistic data for sales demos

CI/CD Testing

Fresh test data for every pipeline run

Simulate vs Anonymize

SQL Simulate

  • No source data required
  • Generates from schema metadata
  • Perfect for new projects
  • Configurable distributions
  • Zero privacy risk

SQL Anonymize

  • Requires production data
  • Preserves data patterns
  • Maintains distributions
  • Keeps edge cases
  • Secure clean room process
