
Filters

Filters control what documents can be retrieved, providing data classification, access control, and content filtering at the retrieval level. Filters ensure users only see documents they're authorized to access.

What is a Filter?

A filter is a rule that determines whether a document should be included in retrieval results based on:

  • Document metadata (classification, source, owner, sensitivity)
  • User context (organization, role, clearance level, tier)
  • Content attributes (confidentiality, sensitivity score, freshness)

Filters operate after vector search to narrow down results to only those the user is authorized to access. This protects sensitive data while maintaining semantic relevance.

Filter Structure

A complete filter includes:

filters:
  - name: enterprise_only
    description: Only retrieve documents accessible to enterprise customers
    condition:
      field: metadata.required_tier
      operator: equals
      value: enterprise
      source: documents
      document_match: all

Required Fields

| Field | Type | Description |
|-------|------|-------------|
| name | string | Unique filter identifier (lowercase, underscores) |
| condition | object | The condition that must match |

Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| description | string | Human-readable explanation of the filter |

Condition Structure

Every filter has exactly one condition with these fields:

condition:
  field: metadata.classification # What field to check
  operator: equals # How to compare
  value: public # What to compare against
  source: documents # Where to get the field (user or documents)
  document_match: all # For documents: any or all

Condition Fields

| Field | Type | Source | Description |
|-------|------|--------|-------------|
| field | string | Both | Field name (supports nested paths like metadata.owner.department) |
| operator | string | Both | Comparison operator (equals, contains, in, lt, gt, etc.) |
| value | any | Both | Expected value to compare against |
| source | string | Both | user for user context, documents for document metadata |
| document_match | string | Documents only | any (at least one doc matches) or all (every doc must match) |
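The field rules above can be checked before a filter is ever used. The following is a minimal validation sketch, not part of the product: the function name validate_filter, the exact name pattern (digits allowed after the first letter), and the choice to require all four of field, operator, value, and source are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List

# Assumption: "lowercase, underscores" is read as a lowercase identifier
# that may also contain digits after the first character.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")
VALID_SOURCES = {"user", "documents"}
VALID_MATCH_MODES = {"any", "all"}


def validate_filter(filter_def: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the filter looks well-formed."""
    errors: List[str] = []

    name = filter_def.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.match(name):
        errors.append("name must be a lowercase identifier with underscores")

    condition = filter_def.get("condition")
    if not isinstance(condition, dict):
        errors.append("condition is required and must be a mapping")
        return errors

    for key in ("field", "operator", "value", "source"):
        if key not in condition:
            errors.append(f"condition.{key} is required")

    source = condition.get("source")
    if source is not None and source not in VALID_SOURCES:
        errors.append("condition.source must be 'user' or 'documents'")

    # document_match only applies when the field is read from documents.
    if source == "documents" and condition.get("document_match") not in VALID_MATCH_MODES:
        errors.append("condition.document_match must be 'any' or 'all' for documents")

    return errors


example = {
    "name": "enterprise_only",
    "condition": {
        "field": "metadata.required_tier",
        "operator": "equals",
        "value": "enterprise",
        "source": "documents",
        "document_match": "all",
    },
}
print(validate_filter(example))
```

Returning a list of messages rather than raising on the first problem lets a configuration loader report every issue in one pass.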

Operators Reference

Equality Operators

| Operator | Type | Description | Example |
|----------|------|-------------|---------|
| equals | All | Exact match | classification equals "public" |
| not_equals | All | Not equal to value | status not_equals "archived" |

Collection Operators

| Operator | Type | Description | Example |
|----------|------|-------------|---------|
| in | All | Value in list | source in [internal, proprietary] |
| not_in | All | Value not in list | tier not_in [free, trial] |
| contains | String | String contains substring | tags contains "confidential" |
| not_contains | String | String does not contain substring | tags not_contains "deprecated" |

Numeric Operators

| Operator | Type | Description | Example |
|----------|------|-------------|---------|
| gt | Numeric | Greater than | sensitivity gt 7 |
| gte | Numeric | Greater than or equal | age_days gte 0 |
| lt | Numeric | Less than | confidentiality_score lt 50 |
| lte | Numeric | Less than or equal | required_clearance_level lte user_clearance |
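One way to read the operator tables is as a lookup from operator name to a two-argument comparison. The sketch below is an illustration under assumptions, not the actual engine: the compare helper is hypothetical, and treating a missing field or a type mismatch as "no match" (fail closed) is a design choice made here, not documented behavior.

```python
import operator
from typing import Any, Callable, Dict

# Each entry takes (actual field value, configured value) and returns a bool.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "in": lambda actual, expected: actual in expected,
    "not_in": lambda actual, expected: actual not in expected,
    "contains": lambda actual, expected: expected in actual,
    "not_contains": lambda actual, expected: expected not in actual,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def compare(actual: Any, op: str, expected: Any) -> bool:
    """Evaluate `actual <op> expected` using the operator names from the tables."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    # Assumption: a missing field never matches, even for negative operators.
    if actual is None:
        return False
    try:
        return bool(OPERATORS[op](actual, expected))
    except TypeError:
        # Assumption: incomparable types (e.g. a string against a number) do not match.
        return False


print(compare("public", "equals", "public"))
print(compare("internal", "in", ["internal", "proprietary"]))
print(compare(49, "lt", 50))
```

Failing closed on missing values errs toward excluding a document rather than exposing it, which suits an access-control setting; a different deployment might prefer to raise instead.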

Sources: User vs. Documents

Source: user

Check fields from the user's context:

condition:
  field: org_tier
  operator: equals
  value: enterprise
  source: user
  # document_match not needed - single value from user

Available user fields: org_id, user_id, org_tier, role, and custom attributes.

Use case: Filter based on who is making the request

Example:

- name: premium_only
  description: Only available to premium tier users
  condition:
    field: org_tier
    operator: equals
    value: premium
    source: user

Source: documents

Check metadata from each document:

condition:
  field: metadata.classification
  operator: equals
  value: public
  source: documents
  document_match: all # REQUIRED for documents source

Use case: Filter based on document properties

Example:

- name: public_documents
  description: Only retrieve publicly available documents
  condition:
    field: metadata.classification
    operator: equals
    value: public
    source: documents
    document_match: all

Document Matching: any vs. all

When source is documents, you must set document_match to specify how the condition is evaluated across the retrieved documents:

document_match: any

The condition passes if at least one document in the retrieval result set matches:

condition:
  field: metadata.source
  operator: equals
  value: verified
  source: documents
  document_match: any # Pass if ANY doc is verified

Behavior: More permissive. The condition passes as soon as any retrieved document matches.

document_match: all

The condition passes only if every document in the retrieval result set matches:

condition:
  field: metadata.classification
  operator: equals
  value: internal
  source: documents
  document_match: all # Pass only if ALL docs are internal

Behavior: More restrictive. Every retrieved document must match.
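The difference between the two modes maps closely onto Python's any() and all(). The sketch below is a simplified illustration that handles only the equals operator; the helper names field_value and condition_passes are hypothetical, and the treatment of an empty result set is simply what all() and any() do, not documented behavior.

```python
from typing import Any, Dict, List


def field_value(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def condition_passes(
    condition: Dict[str, Any],
    user_context: Dict[str, Any],
    documents: List[Dict[str, Any]],
) -> bool:
    """Evaluate an equals-only condition against the user or the result set."""

    def matches(record: Dict[str, Any]) -> bool:
        return field_value(record, condition["field"]) == condition["value"]

    if condition["source"] == "user":
        # A user condition reads one value, so document_match is not consulted.
        return matches(user_context)

    mode = condition.get("document_match")
    if mode == "any":
        return any(matches(doc) for doc in documents)
    if mode == "all":
        # Note: all() over an empty result set is True; any() would be False.
        return all(matches(doc) for doc in documents)
    raise ValueError("document_match must be 'any' or 'all' when source is documents")


docs = [
    {"metadata": {"source": "verified"}},
    {"metadata": {"source": "scraped"}},
]
base = {"field": "metadata.source", "operator": "equals", "value": "verified", "source": "documents"}
print(condition_passes({**base, "document_match": "any"}, {}, docs))
print(condition_passes({**base, "document_match": "all"}, {}, docs))
```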

Vector Store Adapter Support

Filters are applied during document retrieval and must be supported by the vector_store_adapter. The adapter is responsible for:

  1. Accepting Filters: Receive filter definitions from the configuration
  2. Applying Filters at Retrieval: Use filters when querying the vector store
  3. Metadata-Based Filtering: Filter documents based on their metadata and user context

The adapter must support:

  • Filtering by document metadata fields (any field in document metadata)
  • Filtering by user context (org_id, role, tier, custom attributes)
  • All supported operators: equals, contains, in, gt, lt, gte, lte, etc.
  • Nested field paths (e.g., metadata.owner.department)
  • Document matching modes: any and all

When implementing a vector store adapter, ensure it applies the configured filters so that results respect user authorization and data accessibility rules.
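As a rough picture of these responsibilities, the sketch below defines an adapter interface and a toy in-memory implementation. Only the method name retrieve_with_filters comes from this page; its parameters, the VectorStoreAdapter protocol, the InMemoryAdapter class, and the precomputed score field standing in for vector similarity are all assumptions. The toy handles only equals and evaluates each document individually, ignoring document_match.

```python
from typing import Any, Dict, List, Protocol

Document = Dict[str, Any]


class VectorStoreAdapter(Protocol):
    """Hypothetical interface; the real signature may differ."""

    def retrieve_with_filters(
        self,
        query: str,
        user_context: Dict[str, Any],
        filters: List[Dict[str, Any]],
        top_k: int,
    ) -> List[Document]:
        ...


class InMemoryAdapter:
    """Toy adapter: ranks by a stored 'score' instead of running a vector search."""

    def __init__(self, documents: List[Document]) -> None:
        self._documents = list(documents)

    def retrieve_with_filters(self, query, user_context, filters, top_k):
        ranked = sorted(self._documents, key=lambda d: d["score"], reverse=True)
        kept = [
            doc
            for doc in ranked
            if all(self._allows(f["condition"], user_context, doc) for f in filters)
        ]
        return kept[:top_k]

    @staticmethod
    def _allows(condition: Dict[str, Any], user_context: Dict[str, Any], document: Document) -> bool:
        # Pick the record to read from, then walk the nested field path.
        value: Any = user_context if condition["source"] == "user" else document
        for key in condition["field"].split("."):
            value = value.get(key) if isinstance(value, dict) else None
        return value == condition["value"]  # equals only, for brevity


adapter: VectorStoreAdapter = InMemoryAdapter(
    [
        {"id": "a", "score": 0.9, "metadata": {"classification": "internal"}},
        {"id": "b", "score": 0.8, "metadata": {"classification": "public"}},
    ]
)
public_only = {
    "name": "public_only",
    "condition": {"field": "metadata.classification", "operator": "equals",
                  "value": "public", "source": "documents", "document_match": "all"},
}
print(adapter.retrieve_with_filters("pricing", {"org_id": "startup_xyz"}, [public_only], top_k=5))
```

A production adapter would more likely translate the condition into the vector store's native metadata filter so that excluded documents never leave the store.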

Real-World Filter Examples

Data Classification Filters

Retrieve only public documents:

- name: public_only
  description: Retrieve only publicly available documents
  condition:
    field: metadata.classification
    operator: equals
    value: public
    source: documents
    document_match: all

Exclude confidential documents:

- name: exclude_confidential
  description: Do not retrieve confidential documents
  condition:
    field: metadata.classification
    operator: not_equals
    value: confidential
    source: documents
    document_match: all

Allow internal or higher classification:

- name: internal_and_above
  description: Allow internal and higher classification levels
  condition:
    field: metadata.classification
    operator: in
    value: [internal, confidential, secret]
    source: documents
    document_match: all

Organization Filters

Enterprise customers only:

- name: enterprise_customers
  description: Only for enterprise tier customers
  condition:
    field: org_tier
    operator: equals
    value: enterprise
    source: user

Specific organization documents:

- name: acme_only
  description: Only documents belonging to Acme Corporation
  condition:
    field: metadata.owner_org_id
    operator: equals
    value: acme_corp
    source: documents
    document_match: all

Multi-organization access:

- name: partner_organizations
  description: Allow documents from partner organizations
  condition:
    field: metadata.owner_org_id
    operator: in
    value: [acme_corp, techstart_inc, global_enterprises]
    source: documents
    document_match: all

Source/Origin Filters

Internal sources only:

- name: internal_only
  description: Only internally sourced documents
  condition:
    field: metadata.source
    operator: in
    value: [internal_wiki, internal_docs, intranet]
    source: documents
    document_match: all

Exclude web content:

- name: no_web_content
  description: Exclude publicly scraped web documents
  condition:
    field: metadata.source
    operator: not_equals
    value: public-web
    source: documents
    document_match: all

Verified sources only:

- name: verified_sources
  description: Only documents from verified, trusted sources
  condition:
    field: metadata.is_verified
    operator: equals
    value: true
    source: documents
    document_match: all

Sensitivity/Access Filters

Premium user access:

- name: premium_access
  description: Premium users can access more sensitive documents
  condition:
    field: metadata.sensitivity_score
    operator: lte
    value: 8
    source: documents
    document_match: all

Role-based sensitivity:

- name: analyst_access
  description: Analysts can view up to sensitivity level 7
  condition:
    field: metadata.sensitivity_score
    operator: lte
    value: 7
    source: documents
    document_match: all

Time-based Filters

Recent documents only:

- name: recent_documents
  description: Only documents updated in last 30 days
  condition:
    field: metadata.updated_at_days_ago
    operator: lte
    value: 30
    source: documents
    document_match: all
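The recent_documents filter compares against a relative age, so metadata.updated_at_days_ago has to be kept current for the comparison to stay meaningful. How that field is populated is not described here; the sketch below assumes a stored timezone-aware updated_at timestamp (a hypothetical field) from which the age is derived at indexing or query time.

```python
from datetime import datetime, timezone
from typing import Optional


def updated_at_days_ago(updated_at: datetime, now: Optional[datetime] = None) -> int:
    """Whole days elapsed since `updated_at`; both datetimes must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    # Clamp at zero so a timestamp slightly in the future does not go negative.
    return max((now - updated_at).days, 0)


reference = datetime(2024, 3, 31, tzinfo=timezone.utc)
age = updated_at_days_ago(datetime(2024, 3, 1, tzinfo=timezone.utc), now=reference)
print(age, age <= 30)
```

If the age is written once at ingestion and never refreshed, every document will appear recent forever, so a periodic refresh or a query-time computation is worth considering.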

Exclude archived documents:

- name: active_documents
  description: Exclude archived documents
  condition:
    field: metadata.is_archived
    operator: equals
    value: false
    source: documents
    document_match: all

Complex Tag-based Filters

Exclude deprecated content:

- name: no_deprecated
  description: Exclude documents tagged as deprecated
  condition:
    field: metadata.tags
    operator: not_contains
    value: deprecated
    source: documents
    document_match: all

Require specific tags:

- name: requires_approval_tag
  description: Only approved documents
  condition:
    field: metadata.tags
    operator: contains
    value: approved
    source: documents
    document_match: all
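The operator reference lists contains as a string operator, while these examples apply it to metadata.tags. Whether tags are stored as a list or as a single delimited string is not stated here, and the two behave differently. The sketch below uses Python's in operator purely as an illustration of that difference; the contains helper and the sample tag values are hypothetical.

```python
from typing import Any


def contains(actual: Any, expected: Any) -> bool:
    """Membership test: exact element match for lists, substring match for strings."""
    return expected in actual


tags_as_list = ["approved", "not-deprecated"]
tags_as_string = "approved,not-deprecated"

# A list only matches whole tags, so "deprecated" is not found.
print(contains(tags_as_list, "deprecated"))
# A string matches any substring, so "deprecated" is found inside "not-deprecated".
print(contains(tags_as_string, "deprecated"))
```

If tags are stored as one string, a not_contains filter on "deprecated" could exclude documents that merely carry a longer tag containing that word.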

Applying Filters in Organizations

Filters are referenced in organization configurations. Each organization has one filter applied to all document retrievals:

orgs:
  - org_id: acme_corp
    description: Acme Corporation
    default_policy: strict_citations
    document_policy:
      top_k: 8
      filter_name: enterprise_only # Single filter name

  - org_id: startup_xyz
    description: Startup XYZ
    default_policy: balanced_production
    document_policy:
      top_k: 10
      filter_name: public_only # Different filter for different org

Single Filter Per Organization

Each organization's document_policy specifies exactly one filter name to apply to all document retrievals for that organization.
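Resolving an organization's filter then amounts to a lookup by name. The sketch below assumes the filters and orgs lists live in one parsed configuration mapping under those two keys; the helper names index_filters and filter_for_org are hypothetical, as is the choice to reject duplicate filter names and unknown references with exceptions.

```python
from typing import Any, Dict, List


def index_filters(filters: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map filter name to definition, rejecting duplicates since names are unique identifiers."""
    by_name: Dict[str, Dict[str, Any]] = {}
    for filter_def in filters:
        name = filter_def["name"]
        if name in by_name:
            raise ValueError(f"duplicate filter name: {name}")
        by_name[name] = filter_def
    return by_name


def filter_for_org(config: Dict[str, Any], org_id: str) -> Dict[str, Any]:
    """Return the single filter definition referenced by the org's document_policy."""
    by_name = index_filters(config.get("filters", []))
    for org in config.get("orgs", []):
        if org["org_id"] == org_id:
            filter_name = org["document_policy"]["filter_name"]
            if filter_name not in by_name:
                raise KeyError(f"org {org_id} references unknown filter {filter_name}")
            return by_name[filter_name]
    raise KeyError(f"unknown org_id: {org_id}")


config = {
    "filters": [
        {"name": "enterprise_only",
         "condition": {"field": "metadata.required_tier", "operator": "equals",
                       "value": "enterprise", "source": "documents", "document_match": "all"}},
    ],
    "orgs": [
        {"org_id": "acme_corp", "default_policy": "strict_citations",
         "document_policy": {"top_k": 8, "filter_name": "enterprise_only"}},
    ],
}
print(filter_for_org(config, "acme_corp")["name"])
```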

A filter holds exactly one condition, so a single filter cannot test two different fields. To accept several values of one field, use the in operator (as in partner_organizations above). The filter below checks only metadata.status:

# Define the filter (one condition on one field):
- name: internal_and_active
  description: Documents with an active status (despite the name, this single condition does not check classification)
  condition:
    field: metadata.status
    operator: equals
    value: active
    source: documents
    document_match: all

# Then reference it in your organization:
orgs:
  - org_id: my_org
    document_policy:
      filter_name: internal_and_active

Alternatively, create multiple organizations with different filters for different tiers, as in the acme_corp and startup_xyz configuration above.

Filter Resolution Flow

When retrieving documents:

  1. User request arrives.
  2. Extract user_context and filters from the org config.
  3. Call vector_store_adapter.retrieve_with_filters().
  4. The adapter performs vector search.
  5. The adapter applies filters to the results. With one filter per organization, N is 1. For each document:
       ├─ Check filter 1 condition (must pass)
       ├─ Check filter 2 condition (must pass)
       ├─ Check filter N condition (must pass)
       ├─ All pass? → Include document
       └─ Any fail? → Exclude document
  6. Return filtered documents (up to top_k).
  7. Policy is applied to the response.
  8. Return to user.
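The flow above can be sketched as a single request handler. Everything here other than the name retrieve_with_filters and the configuration keys is an assumption: handle_request, the apply_policy callable, and the RecordingAdapter stub (which returns canned documents and records how it was called rather than searching or filtering) are hypothetical and exist only to make the sequence of steps concrete.

```python
from typing import Any, Callable, Dict, List


class RecordingAdapter:
    """Stub adapter: returns canned documents and records the arguments it received."""

    def __init__(self, documents: List[Dict[str, Any]]) -> None:
        self.documents = documents
        self.calls: List[Dict[str, Any]] = []

    def retrieve_with_filters(self, query, user_context, filters, top_k):
        self.calls.append({"query": query, "user_context": user_context,
                           "filters": filters, "top_k": top_k})
        return self.documents[:top_k]


def handle_request(
    org_id: str,
    query: str,
    user_context: Dict[str, Any],
    config: Dict[str, Any],
    adapter: Any,
    apply_policy: Callable[[str, List[Dict[str, Any]]], Any],
) -> Any:
    # Extract the org's document policy and its single configured filter.
    org = next(o for o in config["orgs"] if o["org_id"] == org_id)
    policy = org["document_policy"]
    filters = [f for f in config["filters"] if f["name"] == policy["filter_name"]]

    # The adapter performs the search and applies the filters, returning up to top_k.
    documents = adapter.retrieve_with_filters(
        query=query, user_context=user_context, filters=filters, top_k=policy["top_k"]
    )

    # The response policy is applied last, before returning to the user.
    return apply_policy(org["default_policy"], documents)


config = {
    "filters": [{"name": "public_only",
                 "condition": {"field": "metadata.classification", "operator": "equals",
                               "value": "public", "source": "documents", "document_match": "all"}}],
    "orgs": [{"org_id": "startup_xyz", "default_policy": "balanced_production",
              "document_policy": {"top_k": 2, "filter_name": "public_only"}}],
}
adapter = RecordingAdapter([{"id": 1}, {"id": 2}, {"id": 3}])
response = handle_request(
    "startup_xyz", "pricing", {"org_id": "startup_xyz"}, config, adapter,
    apply_policy=lambda name, docs: {"policy": name, "documents": docs},
)
print(response)
```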

See Also