AI Engine

Your AI runs on your infrastructure. Your data never leaves it.

9 providers, 20+ models, 3 sovereignty levels. The router automatically selects the right model based on data sensitivity. SkaLean configures and maintains the infrastructure. You use it.

9 LLM providers

20+ integrated models

3 sovereignty levels

0 unauthorized transit

See plans See features →


Sovereign architecture

Three levels, one router

The LLM router automatically selects the right level based on data sensitivity. No action required from the user.


Tier 1

Global cloud APIs

Providers: OpenAI · Anthropic · Mistral · Google

Models: GPT-4o, Claude Opus 4, Gemini 2.5, Mistral Large

Data: transit to provider servers

Performance: ~50 tokens/s · P50 0.8s

Non-sensitive data, general use, best quality

Exact provider billing. Zero commission.

Tier 2

Regional sovereign cloud

Providers: Azure OpenAI · AWS Bedrock · Vertex AI

Models: GPT-4o, Claude Sonnet, Gemini 2.5, hosted in your region

Data: remain in your country. Zero cross-border transfer.

Performance: ~50 tokens/s · same quality as Tier 1

Sensitive data: GDPR + CCPA compliant, HIPAA, local laws

Same prices as Tier 1 · your data stays in your region

Tier 3

Self-hosted infrastructure

Infrastructure: Sovereign CPU inference (included in all plans) + high-performance GPU (included in onboarding)

Models: Open-source self-hosted models: Llama, Qwen, Mistral, specialized medical models

Data: on your infrastructure. Zero transit, zero external cloud.

Performance: 35–120 tokens/s depending on optimization

PHI, trade secrets, zero-cloud requirements

CPU models: $0 · GPU models: service included, tokens billed per use


4 automatic steps

The router decides, you do nothing

4-step routing algorithm. No manual configuration. Automatic fallback if preferred model is unavailable.

PII verification

15 types of sensitive data scanned. If critical PII detected → automatically force sovereign Tier 3.

HIPAA mode

If client in HIPAA mode → force Tier 3. BAA-only routing mandatory. No uncertified cloud models.

Client preference

always (all GPU) / auto (default, GPU if PII) / never (cloud API only). Configurable per client and workflow.

Intelligent degradation

If the GPU is overloaded (P95 > 10 s) → switch to Tier 2. Re-test after 60 s. Circuit breaker after 5 consecutive errors.

Fallback cascade: High-performance GPU → Standard GPU → Sovereign CPU. Never routed to US servers if sovereignty is required.
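The four steps above can be sketched as a single decision function. This is a minimal illustration, not SkaLean's implementation: the `Request` fields, the `select_tier` name, the assumption that earlier steps take precedence over later ones, and the mapping of "never" to Tier 1 are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GLOBAL_CLOUD = 1     # Tier 1: global cloud APIs
    REGIONAL_CLOUD = 2   # Tier 2: regional sovereign cloud
    SELF_HOSTED = 3      # Tier 3: self-hosted infrastructure


@dataclass
class Request:
    has_critical_pii: bool          # outcome of the PII scan (step 1)
    hipaa_mode: bool                # client-level HIPAA flag (step 2)
    gpu_preference: str = "auto"    # "always", "auto" or "never" (step 3)


def select_tier(req: Request, gpu_p95_seconds: float) -> Tier:
    # Steps 1 and 2: critical PII or HIPAA mode force Tier 3.
    # Assumption: this holds even when the GPU is loaded, because a
    # sovereign request may only degrade within Tier 3.
    if req.has_critical_pii or req.hipaa_mode:
        return Tier.SELF_HOSTED

    # Step 3: client preference. "auto" only uses the GPU when PII is
    # present, and no critical PII was found above.
    if req.gpu_preference in ("never", "auto"):
        return Tier.GLOBAL_CLOUD  # assumption: "cloud API only" means Tier 1

    # Step 4: preference is "always", but degrade to Tier 2 if the GPU
    # is overloaded (P95 latency above 10 seconds).
    if gpu_p95_seconds > 10:
        return Tier.REGIONAL_CLOUD
    return Tier.SELF_HOSTED
```

For example, `select_tier(Request(False, False, "always"), gpu_p95_seconds=12.0)` returns `Tier.REGIONAL_CLOUD`, while the same call with critical PII stays on `Tier.SELF_HOSTED`.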

Features

The most complete AI engine

Built for teams with compliance requirements who won't sacrifice performance.

9 providers, 20+ models

OpenAI, Anthropic, Mistral, Google + self-hosted sovereign infrastructure (CPU and GPU). Automatic routing with fallback. Zero vendor lock-in.

Intelligent document search

Your documents are ingested, segmented, and indexed automatically. Search combines semantic and keyword matching, then ranks results by relevance before generating answers with citations.

Sovereign high-performance GPU

GPU infrastructure on your territory with hardware optimization. 2–4x faster than standard inference. Market-standard compatible API. Service included in onboarding · tokens billed per use.

Industry fine-tuning

Model fine-tuning on your business data. A law firm fine-tunes on its case files. Data encrypted, deleted after training.

Sovereign medical model

Specialized self-hosted medical model. Outperforms general models on health data. Non-disableable medical guardrails. Zero diagnosis, zero prescription.

PII protection: 15 types

15 types of sensitive data detected and masked. Automatic routing to sovereign infrastructure if medical data detected. Re-substitution after response.
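Masking followed by re-substitution could look roughly like the sketch below. It covers only two illustrative data types with simplified regular expressions; the patterns, the placeholder format, and the function names are assumptions, not the actual 15-type detector.

```python
import re

# Hypothetical, simplified patterns for two of the sensitive data types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each detected value with a placeholder and remember the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(replace, text)
    return text, mapping


def resubstitute(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


prompt = "Contact jane.doe@example.com or 514-555-0199."
masked, mapping = pseudonymize(prompt)
restored = resubstitute(masked, mapping)
```

Here `masked` no longer contains the email address or phone number, and `restored` is identical to the original prompt. In practice the model's response, not the prompt, would be passed to `resubstitute`.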

RAG Pipeline

6 steps from your document to the answer

Target: under 800ms P95. Each step is independent, observable, and auditable.

Ingestion

PDF, DOCX, URLs, Notion

Segmentation

Intelligent chunking into coherent blocks

Vectorization

Cloud or sovereign embedding models

Hybrid search (semantic + keywords)

Reranking

Relevance reranking, top 5

Generation

Response with verifiable source citations
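One common way to merge semantic and keyword results before reranking is Reciprocal Rank Fusion (RRF). The sketch below assumes each retriever returns an ordered list of document IDs; the function name, the constant `k=60`, and the sample IDs are illustrative choices, not SkaLean's actual parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:top_n]


semantic_hits = ["a", "b", "c"]   # hypothetical vector-search results
keyword_hits = ["b", "d", "a"]    # hypothetical keyword-search results
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents found by both retrievers ("a" and "b") rise to the top; the fused list is then cut to the top 5 and passed to generation.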

GPU Performance

Sovereign GPU: 2–4x faster

Optimized sovereign GPU inference. Our engine accelerates throughput to multiply performance without leaving your infrastructure.

Standard GPU: 35 t/s · P50 1.2s · 8 req max

High-performance GPU: 100 t/s · P50 0.6s · 18 req max

API GPT-4o (ref): 50 t/s · P50 0.8s · 100+ req
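As a quick sanity check on the "2–4x" claim, the throughput figures above (35 t/s and 100 t/s) can be compared directly. The 700-token answer length is a hypothetical example, and the calculation ignores queueing and time to first token.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a steady throughput."""
    return tokens / tokens_per_second


STANDARD_GPU_TPS = 35     # from the figures above
HIGH_PERF_GPU_TPS = 100   # from the figures above

speedup = HIGH_PERF_GPU_TPS / STANDARD_GPU_TPS            # about 2.86x
answer_tokens = 700                                       # hypothetical answer length
standard_time = generation_seconds(answer_tokens, STANDARD_GPU_TPS)     # 20.0 s
high_perf_time = generation_seconds(answer_tokens, HIGH_PERF_GPU_TPS)   # 7.0 s
```

A ratio of roughly 2.86 sits inside the stated 2–4x range.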

ZERO Data transit

80 GB GPU VRAM

99.5% Enterprise GPU SLA

Auto Automatic fallback

Why SkaLean

No competitor combines all 3 tiers

OpenAI, Azure, and Mistral each offer a piece of the puzzle. SkaLean is the only AI engine that integrates them all, with automatic routing, sovereign GPU, native RAG, and zero commission.

API only

Regional cloud

Self-hosted

All-in-one

Criterion	OpenAI / Anthropic API	Azure OpenAI · Bedrock · Vertex	Open-source DIY	SkaLean AI Engine
Data sovereignty	— US servers	✓ Region of choice	✓ On your infrastructure	✓ 3 automatic tiers
Number of providers / models	1 provider	1-2 providers	Free models only	✓ 9 providers · 20+ models
Automatic PII routing	—	—	—	✓ 15 types · sensitivity score
PII protection before LLM send	—	—	—	✓ Pseudonymization + re-substitution
TensorRT-LLM (2-4x acceleration)	—	—	Complex DIY	✓ Native · no competing AIaaS
LoRA fine-tuning per client (NeMo)	OpenAI fine-tuning (expensive)	Azure fine-tuning (expensive)	DIY · no client isolation	✓ NeMo · encrypted dataset · isolated
Sovereign medical model	—	—	—	✓ Outperforms general models on health data
Integrated 6-step RAG	—	—	DIY · no turnkey pipeline	✓ Hybrid + RRF + reranking + citations
Breaker + automatic fallback	—	—	—	✓ Automatic cascade fallback · 5 retries
OWASP LLM Top 10	Basic	Partial	—	✓ 10/10 · non-disableable
Activatable HIPAA compliance	—	✓ BAA available (Azure, AWS)	— Manual configuration required	✓ Client-activatable HIPAA compliance
Token commission	Public rate	Public rate + regional markup	DIY infrastructure cost	0% Exact provider rate
Managed service	— Self-service	— Self-service	— Everything to configure	✓ Building · maintenance · SkaLean expertise

25+ players analyzed: none combine all 3

Botpress and Voiceflow do agents but not automation. Third-party tools do automation but not agents. ChatGPT Team and Copilot do workspace but without real sovereignty. SkaLean is the only AI engine that combines multi-provider routing, sovereign GPU, native RAG, and managed service in a single platform.

Compliance & Sovereignty

Your data never leaves your region

Local infrastructure · native regulatory compliance · GDPR + CCPA · HIPAA activatable per tenant. SkaLean configures and maintains your sovereign infrastructure.

GDPR + CCPA mechanisms

data transit out of region

100% configurable per tenant

Frequently asked questions

Why do I need to choose an LLM model? What is the difference between GPT-4o, Claude, and Mistral?

Each model has different strengths, like different medical specialists. GPT-4o (OpenAI) excels at versatile tasks and complex reasoning. Claude (Anthropic) is known for nuanced responses and caution on sensitive topics — ideal for legal and compliance. Mistral is a lighter European model, optimized for French, less computationally expensive. The good news: you don't need to choose manually — SkaLean's LLM router automatically selects the optimal model based on task type, language, compliance constraints, and cost target.

What is sovereign GPU and why does it matter for my business?

GPU inference is the computation performed to generate each AI response. Normally, with ChatGPT, this computation happens on OpenAI's servers in the United States, subject to the US Cloud Act — your data transits to a foreign country. Sovereign GPU means SkaLean performs this computation on GPUs physically located in your country. For a Quebec medical office (Bill 25), a law firm (professional secrecy), or a financial institution (OSFI), this is a legal requirement. If an audit asks "where is your data processed?", the answer is "on a physical server in your city."

How does the LLM router automatically select the right model for each request?

The LLM router evaluates each request on 4 criteria: (1) Presence of personal data — if the request contains sensitive data, only sovereign models are authorized, (2) HIPAA compliance — if the account is in HIPAA mode, US cloud models are excluded, (3) Client preference — if you have defined a preferred model for a specific use case, it takes priority, (4) Intelligent degradation — if the preferred model is unavailable, automatic failover to the best available alternative without service interruption. This system eliminates vendor lock-in.
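The failover in criterion 4 is typically built on a circuit breaker. The sketch below uses the values stated earlier on this page (breaker after 5 consecutive errors, re-test after 60 seconds) and the cascade High-performance GPU → Standard GPU → Sovereign CPU. The class, method, and backend names are hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, max_errors: int = 5, retest_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_errors = max_errors
        self.retest_seconds = retest_seconds
        self.clock = clock
        self.errors = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the re-test delay has elapsed, allow a trial request.
        return self.clock() - self.opened_at >= self.retest_seconds

    def record(self, success: bool) -> None:
        if success:
            self.errors = 0
            self.opened_at = None
        else:
            self.errors += 1
            if self.errors >= self.max_errors:
                self.opened_at = self.clock()  # open (or re-open) the breaker


def pick_backend(chain: list[str], breakers: dict[str, CircuitBreaker]) -> Optional[str]:
    """Return the first backend in the cascade whose breaker is not open."""
    for name in chain:
        if breakers[name].available():
            return name
    return None


CHAIN = ["high-performance-gpu", "standard-gpu", "sovereign-cpu"]
```

Keeping the cascade inside sovereign backends is what allows a request that requires sovereignty to degrade without ever being routed to a cloud provider.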

Can I use a model fine-tuned on my specific business data?

Yes, available on Enterprise plans. LoRA fine-tuning (Low-Rank Adaptation) adjusts a base model (Llama 3, Mistral) on your specific data in 2 to 5 days. Use cases: a dental office fine-tuning on RAMQ code nomenclature, an accounting firm on specific Quebec tax regulations, a veterinary clinic on its animal medical terminology. Result: 15–30% better precision on your specific tasks. Most clients don't need it, because the RAG pipeline suffices, but it's available if your vocabulary is highly specialized.
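LoRA is lightweight because it freezes the original weight matrix and trains two small low-rank matrices instead. The arithmetic below illustrates the saving for a single layer; the 4096 × 4096 matrix size and rank 8 are hypothetical values chosen for illustration, not SkaLean's training configuration.

```python
def full_params(d: int, k: int) -> int:
    """Parameters updated when fine-tuning a full d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters trained by LoRA: B is d x r and A is r x k, so r * (d + k)."""
    return r * (d + k)


d, k, r = 4096, 4096, 8            # hypothetical layer size and LoRA rank
full = full_params(d, k)           # 16,777,216
lora = lora_params(d, k, r)        # 65,536
fraction = lora / full             # about 0.39% of the full matrix
```

With these example values, LoRA trains fewer than half a percent of the layer's parameters.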

Are models as performant in French as in English?

GPT-4o and Claude achieve very close performance in French and English (5-10% difference according to benchmarks). Mistral has been specifically optimized for French and often outperforms GPT-4 on French writing tasks. For Arabic, available models support Modern Standard Arabic (MSA) with good quality. Regional dialects (Darija, Levantine, Gulf) are supported for simple conversational tasks. During your demonstration, SkaLean makes it easy to compare models side-by-side on your real use cases.

How are LLM model updates managed without disrupting my service?

SkaLean manages updates in "blue-green" mode: the new version is tested in parallel for 72 hours before replacing the old one. If quality metrics regress on your use cases, the switch is automatically canceled. You are notified 7 days before any major update. For Enterprise plans, a "pinned model" (fixed version) can be configured to avoid any unplanned behavior change — unlike direct OpenAI/Anthropic APIs where an update can change your application overnight.
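The blue-green gate described above reduces to a small decision rule. The sketch below assumes a single aggregated quality score per version; the `Evaluation` fields and the returned labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    hours_in_parallel: float   # how long the new version has run alongside the old one
    baseline_score: float      # quality metric of the current version on your use cases
    candidate_score: float     # same metric for the new version


def decide(evaluation: Evaluation, min_hours: float = 72.0) -> str:
    # The new version must complete the full parallel test window first.
    if evaluation.hours_in_parallel < min_hours:
        return "keep_testing"
    # Any regression in quality cancels the switch automatically.
    if evaluation.candidate_score < evaluation.baseline_score:
        return "cancel"
    return "switch"
```

A pinned model simply skips this gate: no candidate version is ever promoted for that client.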

The SkaLean Ecosystem

The AI Engine powers the entire ecosystem

The sovereign AI Engine is the brain powering AI Studio, AI Automation, and AI Assistants: locally hosted, compliant with your regulations, with zero imposed cloud dependency.

AI Studio

Documents · RAG · Real-time collaboration

AI Automation

200+ connectors · intelligent workflows

AI Assistant

10 channels · voice · ReAct loop

Transparent pricing

You pay for tokens. Nothing more.

SkaLean takes zero commission on LLM calls. You're billed exactly at the provider's published rate.

0% commission on LLM tokens

We charge exactly what the LLM provider charges — no markup, no hidden fees. Custom LLM deployment and development are included in the onboarding fee.

Provider	Model	Input / 1K tokens	Output / 1K tokens	Notes
OpenAI	gpt-4o	$0.0025	$0.01	128K context · Tool calling
OpenAI	gpt-4o-mini	$0.00015	$0.0006	Ultra fast · economical
OpenAI	gpt-4.1 / gpt-4.1-mini	$0.002 / $0.0001	$0.008 / $0.0004	Latest generation
Anthropic	claude-opus-4	$0.015	$0.075	200K context · reasoning
Anthropic	claude-sonnet-4	$0.003	$0.015	Balanced performance/cost
Anthropic	claude-haiku-4.5	$0.00025	$0.00125	Very fast · low cost
Mistral	mistral-large-2	$0.002	$0.006	European hosting (Paris)
Mistral	mistral-small-3.1	$0.0002	$0.0006	Compact European model
Google	gemini-2.5-pro	$0.00125	$0.005	Very long context
Google	gemini-2.5-flash	$0.00015	$0.0006	Ultra fast · streaming

Prices in USD per 1,000 tokens. Billed directly at provider rates, no markup.
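With zero commission, the cost of a request follows directly from the table: tokens divided by 1,000, multiplied by the listed rate. The sketch below copies two rows from the table above; the function name and the example token counts are hypothetical.

```python
from decimal import Decimal

# (input rate, output rate) in USD per 1,000 tokens, copied from the table above.
RATES_PER_1K = {
    "gpt-4o": (Decimal("0.0025"), Decimal("0.01")),
    "claude-sonnet-4": (Decimal("0.003"), Decimal("0.015")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request at the listed per-1K-token rates, no markup."""
    rate_in, rate_out = RATES_PER_1K[model]
    return (Decimal(input_tokens) * rate_in + Decimal(output_tokens) * rate_out) / 1000


# Hypothetical request: 1,200 input tokens and 300 output tokens on gpt-4o.
cost = request_cost("gpt-4o", 1200, 300)   # 0.003 + 0.003 = 0.006 USD
```

`Decimal` is used instead of floats so that very small per-token amounts add up exactly on an invoice.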

Vectorization is included in the plan: $0 extra.

Platform	Model	Input / 1K tokens	Output / 1K tokens	Sovereignty
Azure OpenAI	gpt-4o / gpt-4o-mini	$0.0025 / $0.00015	$0.01 / $0.0006	Sovereign region of your choice
Azure OpenAI	gpt-4.1 / gpt-4.1-mini	$0.002 / $0.0001	$0.008 / $0.0004	Data remain in your country
AWS Bedrock	Claude Opus 4 / Sonnet 4	$0.015 / $0.003	$0.075 / $0.015	Sovereign Bedrock region
AWS Bedrock	Llama 3.1 70B / 8B	$0.00065 / $0.0003	$0.00085 / $0.0006	Open model via Bedrock
Vertex AI	Gemini 2.5 Pro / Flash	$0.00125 / $0.00015	$0.005 / $0.0006	Sovereign Vertex region
Vertex AI	Claude Sonnet 4 (via Vertex)	$0.003	$0.015	Anthropic via Google Model Garden

Same prices as Tier 1; your data stays in your region at no extra cost.

The sovereign region is chosen based on your country and regulatory requirements (GDPR, HIPAA, local laws).

Infrastructure	Models	Input / 1K tokens	Output / 1K tokens	Conditions
Ollama CPU	Llama, Mistral, Qwen and open-source models	$0	$0	Included in all plans
GPU Inference	Llama 70B+, Qwen 72B, specialized medical models	Billed at usage	Billed at usage	Configured by SkaLean · included in onboarding
Custom LLM	NeMo LoRA fine-tuning on your data	Included in onboarding	Included in onboarding	Enterprise plan

CPU inference is included in all SkaLean plans: no token fees, no volume limits.

GPU inference deployment and custom LLM development are included in the onboarding fee. Runtime tokens are billed at actual usage.

Zero data transit. Your data never leaves your infrastructure.

Ready for sovereign AI?

Your AI infrastructure, managed by SkaLean.

9 providers, 20+ models, 3 sovereignty levels. Deployment in 5 to 20 days.

Schedule a demo See plans and pricing →