AI Engine

Your AI runs on your infrastructure. Your data never leaves it.

9 providers, 20+ models, 3 sovereignty levels. The router automatically selects the right model based on data sensitivity. SkaLean configures and maintains the infrastructure. You use it.

9 LLM providers
20+ integrated models
3 sovereignty levels
0 unauthorized transit
LLM Router: real-time analysis (interactive demo)
Columns: Query · PII Scan · Selected tier · Model · Latency
Sample queries: Contract · Patient record · Finance report
Sovereign architecture

Three levels, one router

The LLM router automatically selects the right level based on data sensitivity. No action required from the user.

Tier 1
Global cloud APIs
Providers: OpenAI · Anthropic · Mistral · Google
Models: GPT-4o, Claude Opus 4, Gemini 2.5, Mistral Large
Data: transit to provider servers
Performance: ~50 tokens/s · P50 0.8s
Non-sensitive data, general use, best quality
Exact provider billing. Zero commission.
Tier 2
Regional sovereign cloud
Providers: Azure OpenAI · AWS Bedrock · Vertex AI
Models: GPT-4o, Claude Sonnet, Gemini 2.5, hosted in your region
Data: remain in your country. Zero cross-border transfer.
Performance: ~50 tokens/s · same quality as Tier 1
Sensitive data: GDPR + CCPA compliant, HIPAA, local laws
Same prices as Tier 1 · your data stays in your region
Tier 3
Self-hosted infrastructure
Infrastructure: Sovereign CPU inference (included in all plans) + high-performance GPU (included in onboarding)
Models: Open-source self-hosted models: Llama, Qwen, Mistral, specialized medical models
Data: on your infrastructure. Zero transit, zero external cloud.
Performance: 35–120 tokens/s depending on optimization
PHI, trade secrets, zero-cloud requirements
CPU models: $0 · GPU models: service included, tokens billed per use
4 automatic steps

The router decides, you do nothing

4-step routing algorithm. No manual configuration. Automatic fallback if the preferred model is unavailable.

1. PII verification
15 types of sensitive data scanned. If critical PII is detected → the request is automatically forced to sovereign Tier 3.
2. HIPAA mode
If the client is in HIPAA mode → force Tier 3. BAA-only routing is mandatory. No uncertified cloud models.
3. Client preference
Three GPU settings: always (all requests on GPU) / auto (default, GPU if PII is detected) / never (cloud API only). Configurable per client and per workflow.
4. Intelligent degradation
If the GPU is overloaded (P95 > 10s) → switch to Tier 2. Re-test after 60s. Circuit breaker after 5 consecutive errors.
Fallback cascade: High-performance GPU → Standard GPU → Sovereign CPU. Never routed to the US if sovereignty is required.
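The four steps above can be sketched as a single decision function. This is a minimal illustration, not SkaLean's actual router: the names `Request`, `Health`, `Route`, and `select_route` are hypothetical, and two behaviors are assumptions made for the example: requests with no PII and the default `auto` preference go to Tier 1, and a request that requires sovereignty stays in Tier 3 by stepping down the GPU/CPU cascade instead of moving to a cloud tier.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GLOBAL_CLOUD = 1    # Tier 1: global cloud APIs
    REGIONAL_CLOUD = 2  # Tier 2: regional sovereign cloud
    SELF_HOSTED = 3     # Tier 3: self-hosted infrastructure


@dataclass
class Request:
    has_critical_pii: bool = False  # outcome of the PII scan (step 1)
    hipaa_mode: bool = False        # client-level HIPAA flag (step 2)
    gpu_preference: str = "auto"    # "always" | "auto" | "never" (step 3)


@dataclass
class Health:
    p95_latency_s: float = 1.0
    consecutive_errors: int = 0


@dataclass
class Route:
    tier: Tier
    backend: str


P95_LIMIT_S = 10.0   # from the text: GPU counts as loaded when P95 > 10s
BREAKER_ERRORS = 5   # from the text: breaker after 5 consecutive errors
CASCADE = ["high-performance-gpu", "standard-gpu", "sovereign-cpu"]


def is_healthy(h: Health) -> bool:
    return h.p95_latency_s <= P95_LIMIT_S and h.consecutive_errors < BREAKER_ERRORS


def select_route(req: Request, health: dict[str, Health]) -> Route:
    # Steps 1 and 2: critical PII or HIPAA mode force sovereign Tier 3,
    # regardless of the client preference.
    sovereign_required = req.has_critical_pii or req.hipaa_mode

    # Step 3: the client preference only matters when sovereignty is not forced.
    if not sovereign_required and req.gpu_preference != "always":
        return Route(Tier.GLOBAL_CLOUD, "cloud-api")  # assumption: Tier 1 by default

    # Step 4: degradation. Backends without health data are treated as healthy.
    if sovereign_required:
        for backend in CASCADE:
            if is_healthy(health.get(backend, Health())):
                return Route(Tier.SELF_HOSTED, backend)
        return Route(Tier.SELF_HOSTED, CASCADE[-1])  # never leave Tier 3

    if is_healthy(health.get(CASCADE[0], Health())):
        return Route(Tier.SELF_HOSTED, CASCADE[0])
    return Route(Tier.REGIONAL_CLOUD, "regional-cloud")


overloaded = {"high-performance-gpu": Health(p95_latency_s=12.0)}
print(select_route(Request(hipaa_mode=True), overloaded))
print(select_route(Request(gpu_preference="always"), overloaded))
```

In this sketch the order of the checks carries the policy: a `never` preference cannot override a PII or HIPAA trigger, because those are evaluated first.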
Features

The most complete AI engine

Built for teams with compliance requirements who won't sacrifice performance.

9 providers, 20+ models
OpenAI, Anthropic, Mistral, Google + self-hosted sovereign infrastructure (CPU and GPU). Automatic routing with fallback. Zero vendor lock-in.
Intelligent document search
Your documents are ingested, segmented, and indexed automatically. Search combines semantic and keyword matching, then ranks results by relevance before generating answers with citations.
Sovereign high-performance GPU
GPU infrastructure on your territory with hardware optimization. 2–4x faster than standard inference. Market-standard compatible API. Service included in onboarding · tokens billed per use.
Industry fine-tuning
Model fine-tuning on your business data. A law firm fine-tunes on its case files. Data encrypted, deleted after training.
Sovereign medical model
Specialized self-hosted medical model. Outperforms general models on health data. Non-disableable medical guardrails. Zero diagnosis, zero prescription.
PII protection: 15 types
15 types of sensitive data detected and masked. Automatic routing to sovereign infrastructure if medical data detected. Re-substitution after response.
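The masking and re-substitution described in the PII protection feature can be illustrated with a small sketch. Everything here is an assumption for illustration: the two regular expressions stand in for the 15 detected types (which this page does not list), and `mask` and `unmask` are hypothetical helper names, not SkaLean's API. Production PII detection generally needs far more than regular expressions.

```python
import re

# Illustrative patterns only: two stand-ins for the 15 PII types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with placeholder tokens; return the masked text and the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def repl(match: re.Match, label: str = label) -> str:
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(repl, text)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Re-substitute the original values into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


prompt = "Contact jane@example.com or +1 514 555 0199."
masked, mapping = mask(prompt)
print(masked)                                  # only this text would be sent to the model
print(unmask("Reply to <EMAIL_1>", mapping))   # tokens in the response are restored
```

The mapping never leaves the caller, so the model only sees placeholder tokens while the final answer still contains the real values.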
RAG Pipeline

6 steps from your document to the answer

Target: under 800ms P95. Each step is independent, observable, and auditable.

1. Ingestion: PDF, DOCX, URLs, Notion
2. Segmentation: intelligent chunking into coherent blocks
3. Vectorization: cloud or sovereign embedding models
4. Search: hybrid search (semantic + keywords)
5. Reranking: relevance reranking, top 5
6. Generation: response with verifiable source citations
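The hybrid search step merges a semantic ranking with a keyword ranking. The comparison table on this page names RRF (reciprocal rank fusion) for that merge, so the sketch below shows the standard RRF formula. The function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration; the page does not state the parameters actually used.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]   # hypothetical results from vector search
keyword_hits = ["b", "d", "a"]    # hypothetical results from keyword search
print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Documents that rank well in both lists rise to the top. The fused list would then go to the reranking step, which keeps the top 5.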
GPU Performance

Sovereign GPU: 2–4x faster

Optimized sovereign GPU inference. Our engine accelerates throughput to multiply performance without leaving your infrastructure.

Standard GPU: 35 t/s · P50 1.2s · 8 req max
High-performance GPU: 100 t/s · P50 0.6s · 18 req max
API GPT-4o (ref): 50 t/s · P50 0.8s · 100+ req
ZERO Data transit
80 GB GPU VRAM
99.5% Enterprise GPU SLA
Auto Automatic fallback
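As a rough check on the 2–4x claim, the throughput figures in this section can be compared directly. The sketch below is plain arithmetic on those figures. The 500-token response length is an illustrative assumption, and the calculation ignores queueing and time to first token.

```python
# Throughput figures quoted in this section, in tokens per second.
THROUGHPUT_TPS = {
    "standard_gpu": 35,
    "high_performance_gpu": 100,
    "gpt4o_api_reference": 50,
}


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a steady rate."""
    return tokens / tokens_per_second


speedup = THROUGHPUT_TPS["high_performance_gpu"] / THROUGHPUT_TPS["standard_gpu"]
print(f"High-performance vs standard GPU: {speedup:.2f}x")
for name, tps in THROUGHPUT_TPS.items():
    print(f"{name}: {generation_seconds(500, tps):.1f}s for a 500-token response")
```

With these numbers the high-performance GPU is a little under 3x the standard GPU and twice the quoted API reference.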
Why SkaLean

No competitor combines all 3 tiers

OpenAI, Azure, and Mistral each offer a piece of the puzzle. SkaLean is the only AI engine that integrates them all, with automatic routing, sovereign GPU, native RAG, and zero commission.

Criterion | OpenAI / Anthropic API (API only) | Azure OpenAI · Bedrock · Vertex (Regional cloud) | Open-source DIY (Self-hosted) | SkaLean AI Engine (All-in-one)
Data sovereignty | US servers | Region of choice | On your infrastructure | 3 automatic tiers
Number of providers / models | 1 provider | 1-2 providers | Free models only | 9 providers · 20+ models
Automatic PII routing | - | - | - | 15 types · sensitivity score
PII protection before LLM send | - | - | - | Pseudonymization + re-substitution
TensorRT-LLM (2-4x acceleration) | - | - | Complex DIY | Native · no competing AIaaS
LoRA fine-tuning per client (NeMo) | OpenAI fine-tuning (expensive) | Azure fine-tuning (expensive) | DIY · no client isolation | NeMo · encrypted dataset · isolated
Sovereign medical model | - | - | - | Outperforms general models on health data
Integrated 6-step RAG | - | - | DIY · no turnkey pipeline | Hybrid + RRF + reranking + citations
Breaker + automatic fallback | - | - | - | Automatic cascade fallback · 5 retries
OWASP LLM Top 10 | Basic | Partial | - | 10/10 · non-disableable
Activatable HIPAA compliance | - | BAA available (Azure, AWS) | Manual configuration required | Client-activatable HIPAA compliance
Token commission | Public rate | Public rate + regional markup | DIY infrastructure cost | 0% · Exact provider rate
Managed service | Self-service | Self-service | Everything to configure | Building · maintenance · SkaLean expertise
25+ players analyzed: none combine all 3
Botpress and Voiceflow do agents but not automation. Third-party tools do automation but not agents. ChatGPT Team and Copilot do workspace but without real sovereignty. SkaLean is the only AI engine that combines multi-provider routing, sovereign GPU, native RAG, and managed service in a single platform.
Compliance & Sovereignty

Your data never leaves your region

Local infrastructure · native regulatory compliance · GDPR + CCPA · HIPAA activatable per tenant. SkaLean configures and maintains your sovereign infrastructure.

13 GDPR + CCPA mechanisms
0 data transit out of region
100% configurable per tenant

Frequently asked questions

Each model has different strengths, like different medical specialists. GPT-4o (OpenAI) excels at versatile tasks and complex reasoning. Claude (Anthropic) is known for nuanced responses and caution on sensitive topics — ideal for legal and compliance. Mistral is a lighter European model, optimized for French, less computationally expensive. The good news: you don't need to choose manually — SkaLean's LLM router automatically selects the optimal model based on task type, language, compliance constraints, and cost target.
GPU inference is the computation performed to generate each AI response. Normally, with ChatGPT, this computation happens on OpenAI's servers in the United States, subject to the US Cloud Act — your data transits to a foreign country. Sovereign GPU means SkaLean performs this computation on GPUs physically located in your country. For a Quebec medical office (Bill 25), a law firm (professional secrecy), or a financial institution (OSFI), this is a legal requirement. If an audit asks "where is your data processed?", the answer is "on a physical server in your city."
The LLM router evaluates each request on 4 criteria: (1) Presence of personal data — if the request contains sensitive data, only sovereign models are authorized, (2) HIPAA compliance — if the account is in HIPAA mode, US cloud models are excluded, (3) Client preference — if you have defined a preferred model for a specific use case, it takes priority, (4) Intelligent degradation — if the preferred model is unavailable, automatic failover to the best available alternative without service interruption. This system eliminates vendor lock-in.
Yes, available on Enterprise plans. LoRA fine-tuning (Low-Rank Adaptation) adjusts a base model (Llama 3, Mistral) on your specific data in 2 to 5 days. Use cases: a dental office fine-tuning on RAMQ code nomenclature, an accounting firm on Quebec-specific tax regulations, a veterinary clinic on its animal medical terminology. Result: 15-30% higher accuracy on your specific tasks. Most clients don't need it, because the RAG pipeline suffices, but it's available if your vocabulary is highly specialized.
GPT-4o and Claude achieve very close performance in French and English (5-10% difference according to benchmarks). Mistral has been specifically optimized for French and often outperforms GPT-4 on French writing tasks. For Arabic, available models support Modern Standard Arabic (MSA) with good quality. Regional dialects (Darija, Levant, Gulf) are supported for simple conversational tasks. During your demonstration, SkaLean makes it easy to compare models side-by-side on your real use cases.
SkaLean manages updates in "blue-green" mode: the new version is tested in parallel for 72 hours before replacing the old one. If quality metrics regress on your use cases, the switch is automatically canceled. You are notified 7 days before any major update. For Enterprise plans, a "pinned model" (fixed version) can be configured to avoid any unplanned behavior change — unlike direct OpenAI/Anthropic APIs where an update can change your application overnight.
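The blue-green update rule described in the last answer (72 hours of parallel testing, with the switch canceled if quality regresses) can be sketched as a simple promotion gate. This is a hypothetical illustration: the function name, the use of a mean quality score, and the `tolerance` parameter are assumptions, since the page does not say which metrics are compared or how.

```python
from statistics import mean

PARALLEL_TEST_HOURS = 72  # from the text: the new version runs in parallel for 72 hours


def should_promote(
    baseline_scores: list[float],
    candidate_scores: list[float],
    hours_observed: float,
    tolerance: float = 0.0,
) -> bool:
    """Return True only if the candidate ran long enough and did not regress.

    Scores are per-use-case quality metrics on any consistent scale
    (higher is better). `tolerance` is the allowed drop in the mean.
    """
    if hours_observed < PARALLEL_TEST_HOURS:
        return False  # parallel test window not finished yet
    return mean(candidate_scores) >= mean(baseline_scores) - tolerance


print(should_promote([0.80, 0.90], [0.85, 0.90], hours_observed=72))  # no regression
print(should_promote([0.80, 0.90], [0.70, 0.80], hours_observed=72))  # regression: keep old version
```

A pinned model, as offered on Enterprise plans, would correspond to never calling this gate for that tenant.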
The SkaLean Ecosystem

The AI Engine powers the entire ecosystem

The sovereign AI Engine is the brain powering AI Studio, AI Automation, and AI Assistants: locally hosted, compliant with your regulations, with no imposed cloud dependency.

Transparent pricing

You pay for tokens. Nothing more.

SkaLean takes zero commission on LLM calls. You're billed exactly at the provider's published rate.

0% commission on LLM tokens
We charge exactly what the LLM provider charges — no markup, no hidden fees. Custom LLM deployment and development are included in the onboarding fee.
Provider | Model | Input / 1K tokens | Output / 1K tokens | Notes
OpenAI | gpt-4o | $0.0025 | $0.01 | 128K context · Tool calling
OpenAI | gpt-4o-mini | $0.00015 | $0.0006 | Ultra fast · economical
OpenAI | gpt-4.1 / gpt-4.1-mini | $0.002 / $0.0001 | $0.008 / $0.0004 | Latest generation
Anthropic | claude-opus-4 | $0.015 | $0.075 | 200K context · reasoning
Anthropic | claude-sonnet-4 | $0.003 | $0.015 | Balanced performance/cost
Anthropic | claude-haiku-4.5 | $0.00025 | $0.00125 | Very fast · low cost
Mistral | mistral-large-2 | $0.002 | $0.006 | European hosting (Paris)
Mistral | mistral-small-3.1 | $0.0002 | $0.0006 | Compact European model
Google | gemini-2.5-pro | $0.00125 | $0.005 | Very long context
Google | gemini-2.5-flash | $0.00015 | $0.0006 | Ultra fast · streaming
Prices in USD per 1,000 tokens. Billed directly at provider rates, no markup.
Vectorization is included in the plan: $0 extra.
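Because billing is per 1,000 tokens with no markup, the cost of a request follows directly from the provider price table above. The sketch below is an illustration: the `request_cost` helper and the token counts are hypothetical, and only three rows of the table are copied in.

```python
from decimal import Decimal

# USD per 1,000 tokens as (input, output), copied from the provider price table.
PRICES = {
    "gpt-4o": (Decimal("0.0025"), Decimal("0.01")),
    "claude-sonnet-4": (Decimal("0.003"), Decimal("0.015")),
    "mistral-small-3.1": (Decimal("0.0002"), Decimal("0.0006")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request at the listed per-1K-token rates."""
    price_in, price_out = PRICES[model]
    return (price_in * input_tokens + price_out * output_tokens) / 1000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
for model in PRICES:
    print(model, request_cost(model, 2000, 500))
```

`Decimal` is used instead of floats so that very small per-token amounts add up exactly when summed over many requests.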
Platform | Model | Input / 1K tokens | Output / 1K tokens | Sovereignty
Azure OpenAI | gpt-4o / gpt-4o-mini | $0.0025 / $0.00015 | $0.01 / $0.0006 | Sovereign region of your choice
Azure OpenAI | gpt-4.1 / gpt-4.1-mini | $0.002 / $0.0001 | $0.008 / $0.0004 | Data remain in your country
AWS Bedrock | Claude Opus 4 / Sonnet 4 | $0.015 / $0.003 | $0.075 / $0.015 | Sovereign Bedrock region
AWS Bedrock | Llama 3.1 70B / 8B | $0.00065 / $0.0003 | $0.00085 / $0.0006 | Open model via Bedrock
Vertex AI | Gemini 2.5 Pro / Flash | $0.00125 / $0.00015 | $0.005 / $0.0006 | Sovereign Vertex region
Vertex AI | Claude Sonnet 4 (via Vertex) | $0.003 | $0.015 | Anthropic via Google Model Garden
Same prices as Tier 1; your data stays in your region at no extra cost.
The sovereign region is chosen based on your country and regulatory requirements (GDPR, HIPAA, local laws).
Infrastructure | Models | Input / 1K tokens | Output / 1K tokens | Conditions
Ollama CPU | Llama, Mistral, Qwen and open-source models | $0 | $0 | Included in all plans
GPU Inference | Llama 70B+, Qwen 72B, specialized medical models | Billed at usage | Billed at usage | Configured by SkaLean · included in onboarding
Custom LLM | NeMo LoRA fine-tuning on your data | Included in onboarding | Included in onboarding | Enterprise plan
CPU inference is included in all SkaLean plans: no token fees, no volume limits.
GPU inference deployment and custom LLM development are included in the onboarding fee. Runtime tokens are billed at actual usage.
Zero data transit. Your data never leaves your infrastructure.
Ready for sovereign AI?

Your AI infrastructure, managed by SkaLean.

9 providers, 20+ models, 3 sovereignty levels. Deployment in 5 to 20 days.