Filters

Filters control what documents can be retrieved, providing data classification, access control, and content filtering at the retrieval level. Filters ensure users only see documents they're authorized to access.

What is a Filter?

A filter is a rule that determines whether a document should be included in retrieval results based on:

Document metadata (classification, source, owner, sensitivity)
User context (organization, role, clearance level, tier)
Content attributes (confidentiality, sensitivity score, freshness)

Filters operate after vector search to narrow down results to only those the user is authorized to access. This protects sensitive data while maintaining semantic relevance.

Filter Structure

A complete filter includes:

filters:
  - name: enterprise_only
    description: Only retrieve documents accessible to enterprise customers
    condition:
      field: metadata.required_tier
      operator: equals
      value: enterprise
      source: documents
      document_match: all

Required Fields

Field	Type	Description
`name`	string	Unique filter identifier (lowercase, underscores)
`condition`	object	The condition that must match

Optional Fields

Field	Type	Description
`description`	string	Human-readable explanation of the filter

Condition Structure

Every filter has exactly one condition with these fields:

condition:
  field: metadata.classification      # What field to check
  operator: equals                    # How to compare
  value: public                       # What to compare against
  source: documents                   # Where to get the field (user or documents)
  document_match: all                 # For documents: any or all

Condition Fields

Field	Type	Source	Description
`field`	string	Both	Field name (supports nested paths like `metadata.owner.department`)
`operator`	string	Both	Comparison operator (equals, contains, in, lt, gt, etc.)
`value`	any	Both	Expected value to compare against
`source`	string	Both	`user` for user context, `documents` for document metadata
`document_match`	string	Documents only	`any` (at least one doc matches) or `all` (every doc must match)

Operators Reference

Equality Operators

Operator	Type	Description	Example
`equals`	All	Exact match	`classification equals "public"`
`not_equals`	All	Not equal to value	`status not_equals "archived"`

Collection Operators

Operator	Type	Description	Example
`in`	All	Value in list	`source in [internal, proprietary]`
`not_in`	All	Value not in list	`tier not_in [free, trial]`
`contains`	String	String contains substring	`tags contains "confidential"`
`not_contains`	String	String does not contain	`tags not_contains "deprecated"`

Numeric Operators

Operator	Type	Description	Example
`gt`	Numeric	Greater than	`sensitivity gt 7`
`gte`	Numeric	Greater than or equal	`age_days gte 0`
`lt`	Numeric	Less than	`confidentiality_score lt 50`
`lte`	Numeric	Less than or equal	`required_clearance_level lte user_clearance`

Sources: User vs. Documents

Source: user

Check fields from the user's context:

condition:
  field: org_tier
  operator: equals
  value: enterprise
  source: user
  # document_match not needed - single value from user

Available user fields: org_id, user_id, org_tier, role, and custom attributes.

Use case: Filter based on who is making the request

Example:

- name: premium_only
  description: Only available to premium tier users
  condition:
    field: org_tier
    operator: equals
    value: premium
    source: user

Source: documents

Check metadata from each document:

condition:
  field: metadata.classification
  operator: equals
  value: public
  source: documents
  document_match: all      # REQUIRED for documents source

Use case: Filter based on document properties

Example:

- name: public_documents
  description: Only retrieve publicly available documents
  condition:
    field: metadata.classification
    operator: equals
    value: public
    source: documents
    document_match: all

Document Matching: any vs. all

When source: documents, you must specify how to match:

document_match: any

Include document if at least one document in the retrieval result set matches:

condition:
  field: metadata.source
  operator: equals
  value: verified
  source: documents
  document_match: any   # Pass if ANY doc is verified

Behavior: More permissive, includes documents if condition matches any retrieved document

document_match: all

Include document if ALL documents in the retrieval result set match:

condition:
  field: metadata.classification
  operator: equals
  value: internal
  source: documents
  document_match: all   # Pass only if ALL docs are internal

Behavior: More restrictive, requires all documents to match

Vector Store Adapter Support

Filters are applied during document retrieval and must be supported by the vector_store_adapter. The adapter is responsible for:

Accepting Filters: Receive filter definitions from the configuration
Applying Filters at Retrieval: Use filters when querying the vector store
Metadata-Based Filtering: Filter documents based on their metadata and user context

The adapter must support:

Filtering by document metadata fields (any field in document metadata)
Filtering by user context (org_id, role, tier, custom attributes)
All supported operators: equals, contains, in, gt, lt, gte, lte, etc.
Nested field paths (e.g., metadata.owner.department)
Document matching modes: any and all

When implementing a vector store adapter, ensure it applies configured filters to filter results based on user authorization and data accessibility rules.

Real World Filter Examples

Data Classification Filters

Retrieve only public documents:

- name: public_only
  description: Retrieve only publicly available documents
  condition:
    field: metadata.classification
    operator: equals
    value: public
    source: documents
    document_match: all

Exclude confidential documents:

- name: exclude_confidential
  description: Do not retrieve confidential documents
  condition:
    field: metadata.classification
    operator: not_equals
    value: confidential
    source: documents
    document_match: all

Allow internal or higher classification:

- name: internal_and_above
  description: Allow internal and higher classification levels
  condition:
    field: metadata.classification
    operator: in
    value: [internal, confidential, secret]
    source: documents
    document_match: all

Organization Filters

Enterprise customers only:

- name: enterprise_customers
  description: Only for enterprise tier customers
  condition:
    field: org_tier
    operator: equals
    value: enterprise
    source: user

Specific organization documents:

- name: acme_only
  description: Only documents belonging to Acme Corporation
  condition:
    field: metadata.owner_org_id
    operator: equals
    value: acme_corp
    source: documents
    document_match: all

Multi-organization access:

- name: partner_organizations
  description: Allow documents from partner organizations
  condition:
    field: metadata.owner_org_id
    operator: in
    value: [acme_corp, techstart_inc, global_enterprises]
    source: documents
    document_match: all

Source/Origin Filters

Internal sources only:

- name: internal_only
  description: Only internally sourced documents
  condition:
    field: metadata.source
    operator: in
    value: [internal_wiki, internal_docs, intranet]
    source: documents
    document_match: all

Exclude web content:

- name: no_web_content
  description: Exclude publicly scraped web documents
  condition:
    field: metadata.source
    operator: not_equals
    value: public-web
    source: documents
    document_match: all

Verified sources only:

- name: verified_sources
  description: Only documents from verified, trusted sources
  condition:
    field: metadata.is_verified
    operator: equals
    value: true
    source: documents
    document_match: all

Sensitivity/Access Filters

Premium user access:

- name: premium_access
  description: Premium users can access more sensitive documents
  condition:
    field: metadata.sensitivity_score
    operator: lte
    value: 8
    source: documents
    document_match: all

Role-based sensitivity:

- name: analyst_access
  description: Analysts can view up to sensitivity level 7
  condition:
    field: metadata.sensitivity_score
    operator: lte
    value: 7
    source: documents
    document_match: all

Time-based Filters

Recent documents only:

- name: recent_documents
  description: Only documents updated in last 30 days
  condition:
    field: metadata.updated_at_days_ago
    operator: lte
    value: 30
    source: documents
    document_match: all

Exclude archived documents:

- name: active_documents
  description: Exclude archived documents
  condition:
    field: metadata.is_archived
    operator: equals
    value: false
    source: documents
    document_match: all

Complex Tag-based Filters

Exclude deprecated content:

- name: no_deprecated
  description: Exclude documents tagged as deprecated
  condition:
    field: metadata.tags
    operator: not_contains
    value: deprecated
    source: documents
    document_match: all

Require specific tags:

- name: requires_approval_tag
  description: Only approved documents
  condition:
    field: metadata.tags
    operator: contains
    value: approved
    source: documents
    document_match: all

Applying Filters in Organizations

Filters are referenced in organization configurations. Each organization has one filter applied to all document retrievals:

orgs:
  - org_id: acme_corp
    description: Acme Corporation
    default_policy: strict_citations
    document_policy:
      top_k: 8
      filter_name: enterprise_only        # Single filter name

  - org_id: startup_xyz
    description: Startup XYZ
    default_policy: balanced_production
    document_policy:
      top_k: 10
      filter_name: public_only            # Different filter for different org

Single Filter Per Organization

Each organization's document_policy specifies exactly one filter name to apply to all document retrievals for that organization.

To apply complex filtering logic, create a compound filter using the in operator or other operators:

# Define a compound filter that handles multiple conditions:
- name: internal_and_active
  description: Documents that are both internal and active
  condition:
    field: metadata.status
    operator: equals
    value: active
    source: documents
    document_match: all

# Then reference it in your organization:
orgs:
  - org_id: my_org
    document_policy:
      filter_name: internal_and_active

Or create multiple organizations with different filters for different tiers:

Filter Resolution Flow

When retrieving documents:

User request arrives
  ↓
Extract user_context and filters from org config
  ↓
Call vector_store_adapter.retrieve_with_filters()
  ↓
Adapter performs vector search
  ↓
Adapter applies filters to results:
  For each document:
    ├─ Check filter 1 condition (must pass)
    ├─ Check filter 2 condition (must pass)
    ├─ Check filter N condition (must pass)
    ├─ All pass? → Include document
    └─ Any fail? → Exclude document
  ↓
Return filtered documents (up to top_k)
  ↓
Policy applied to response
  ↓
Return to user

What is a Filter?​

Filter Structure​

Required Fields​

Optional Fields​

Condition Structure​

Condition Fields​

Operators Reference​

Equality Operators​

Collection Operators​

Numeric Operators​

Sources: User vs. Documents​

Source: user​

Source: documents​

Document Matching: any vs. all​

document_match: any​

document_match: all​

Vector Store Adapter Support​

Real World Filter Examples​

Data Classification Filters​

Organization Filters​

Source/Origin Filters​

Sensitivity/Access Filters​

Time-based Filters​

Complex Tag-based Filters​

Applying Filters in Organizations​

Single Filter Per Organization​

Filter Resolution Flow​

See Also​