AI Model Configuration
Configure access to various Large Language Models (LLMs) and AI services. At least one AI model provider (OpenAI, Azure OpenAI, Claude, Gemini or Ollama) must be configured for the application to function correctly.
The General AI Settings section below defines the default models used for background tasks such as document summarisation and embeddings. Any further AI models configured in the AI Model Providers section will make those models available for use in the chat interface.
General AI Settings
These settings define the default models used for background tasks such as document summarisation and embedding generation.
The provider set in the `DEFAULT_TEXT_GEN_MODEL` variable will be used for text generation tasks. This provider must have its configuration set in the section below:

- `GPT4` requires either the OpenAI configuration or the Azure OpenAI Service configuration.
- `Gemini` requires the Google Gemini & Vertex AI configuration.
- `Claude` requires the Anthropic Claude (Direct and AWS) configuration.
- `Ollama` requires the Ollama configuration.
The provider set in the `EMBEDDING_PROVIDER` variable will be used for embedding generation tasks. This provider must have its configuration set in the section below, and an embeddings model must be set for the provider to be used:

- `openAI` requires either the OpenAI configuration or the Azure OpenAI Service configuration.
- `google` requires the Google Gemini & Vertex AI configuration.
- `ollama` requires the Ollama configuration.
NOTE
SapienAI's development has mostly involved testing on OpenAI models. This means parameters such as token lengths and temperatures have been optimized for these models. While other providers are supported, they may not perform as well in all scenarios. Ollama models can be used, but performance may vary depending on the model size and capabilities. Please raise any issues you encounter.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
DEFAULT_TEXT_GEN_MODEL | Default model provider to use for background text generation. This provider will be used for tasks such as conversation naming or document summarisation. The app handles the logic of choosing between the mini and large version of each model to maximize performance (e.g. conversation naming uses the mini model while document summarisation uses the large model). | GPT4, Gemini, Claude, or Ollama | GPT4 |
EMBEDDING_PROVIDER | Default model provider to use for embedding generation. | openAI, google, or ollama | openAI |
VECTOR_LENGTH | Length of vectors from the embeddings endpoint. At present, this cannot be changed once set. | number | 1536 |
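For example, a minimal `.env` sketch using the defaults (illustrative only; set whichever provider you actually have configured):

```env
# Illustrative defaults - any configured provider can be substituted
DEFAULT_TEXT_GEN_MODEL=GPT4
EMBEDDING_PROVIDER=openAI
VECTOR_LENGTH=1536
```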
AI Model Providers
OpenAI
Uses OpenAI's official API.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
OPENAI_KEY | Your OpenAI API key. | string | null |
OPEN_AI_TEXT_GEN_MODEL | Default OpenAI model for text generation (e.g., gpt-4o). | string | gpt-4.1 |
OPEN_AI_TEXT_GEN_MINI_MODEL | Default OpenAI model for "mini" text generation tasks (e.g., gpt-4o-mini). | string | gpt-4.1-mini |
OPEN_AI_TEXT_GEN_REASONING_LRG_MODEL | OpenAI large reasoning model. | string | o3 |
OPEN_AI_TEXT_GEN_REASONING_MINI_MODEL | OpenAI mini reasoning model. | string | o4-mini |
OPEN_AI_IMG_GEN_MODEL | OpenAI image generation model. | string | gpt-image-1 |
OPEN_AI_EMBEDDING_MODEL | OpenAI embedding model. | string | text-embedding-3-small |
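As a sketch, an OpenAI configuration might look like this (the key is a placeholder; the model names simply restate the defaults above):

```env
OPENAI_KEY=sk-...                    # placeholder - use your own API key
OPEN_AI_TEXT_GEN_MODEL=gpt-4.1
OPEN_AI_TEXT_GEN_MINI_MODEL=gpt-4.1-mini
OPEN_AI_EMBEDDING_MODEL=text-embedding-3-small
```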
Azure OpenAI Service
To use an Azure OpenAI Service deployment, set the following variables. At present, due to the variable availability of different models with different capabilities, you have to set specific deployments for different functionalities.
There are five distinct Azure resources that can be set (only the first is absolutely required if using Azure OpenAI as the default provider):

1. A resource for text generation and embeddings. These functionalities need to be contained within the same resource, with a deployment for the `text-embedding-3-small` model, a deployment for a GPT-4-family model that supports 128k tokens, and a deployment for a 'mini' model.
2. A resource for a vision deployment endpoint. This must contain a vision-capable GPT-4-family model.
3. A resource for a DALL-E 3 deployment.
4. A resource for reasoning models (e.g. o3 and o4-mini).
5. A resource for real-time chat.
These resources can overlap. If you have a resource in a region that has availability for all features, you can set the env variables for each feature to the same resource (but you still have to set each env variable).
If you do not set the Azure vision resource, and no other vision-capable models are set, you will not be able to use vision capabilities in the chat.
This is somewhat unwieldy, but until Azure provides all models in all regions, this is the best way to maximise what can be achieved using Azure.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
USING_GPT4O | Set to false if not using a vision-capable GPT-4o model on Azure, leave the default otherwise. | boolean | true |
AZURE_OPENAI_API_VERSION | API version for Azure OpenAI services. | string | 2024-12-01-preview |
Text & Embedding Models (General) | |||
AZURE_OPENAI_KEY | API key for general Azure OpenAI text/embedding services. | string | null |
AZURE_OPENAI_RESOURCE | Resource name/endpoint for Azure OpenAI text/embedding. | string | null |
AZURE_OPENAI_TXT_DEPLOYMENT | Deployment name for the primary Azure OpenAI text generation model. | string | null |
AZURE_OPENAI_TXT_DEPLOYMENT_MINI | Deployment name for the "mini" Azure OpenAI text generation model. | string | null |
AZURE_OPENAI_EMBED_DEPLOYMENT | Deployment name for the Azure OpenAI embedding model. | string | null |
Vision Models | |||
AZURE_OPENAI_VISION_RESOURCE | Resource name/endpoint for Azure OpenAI vision models. | string | null |
AZURE_OPENAI_VISION_DEPLOYMENT | Deployment name for the Azure OpenAI vision model. | string | null |
AZURE_OPENAI_VISION_KEY | API key for Azure OpenAI vision services. | string | null |
Image Generation Models (DALL-E) | |||
AZURE_OPENAI_IMG_RESOURCE | Resource name/endpoint for Azure OpenAI image generation. | string | null |
AZURE_OPENAI_IMG_DEPLOYMENT | Deployment name for Azure OpenAI image generation. | string | null |
AZURE_OPENAI_IMG_KEY | API key for Azure OpenAI image generation. | string | null |
Realtime/Streaming Endpoints | |||
AZURE_REALTIME_URL | URL for Azure real-time (streaming) services. | string | null |
AZURE_REALTIME_KEY | API key for Azure real-time (streaming) services. | string | null |
Reasoning Models (e.g., "o3" series) | |||
AZURE_OPENAI_REASONING_KEY | API key for Azure OpenAI reasoning models. | string | null |
AZURE_OPENAI_REASONING_RESOURCE | Resource name/endpoint for Azure OpenAI reasoning models. | string | null |
AZURE_OPENAI_REASONING_DEPLOYMENT | Deployment name for the large Azure OpenAI reasoning model. | string | null |
AZURE_OPENAI_REASONING_MINI_DEPLOYMENT | Deployment name for the mini Azure OpenAI reasoning model. | string | null |
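As a sketch of the overlap described above, a single resource hosting the text, embedding, and reasoning deployments could be configured like this (the resource name, key, and deployment names are all hypothetical):

```env
# General text and embedding deployments (hypothetical names)
AZURE_OPENAI_KEY=your-azure-key
AZURE_OPENAI_RESOURCE=my-openai-resource
AZURE_OPENAI_TXT_DEPLOYMENT=gpt-4-1
AZURE_OPENAI_TXT_DEPLOYMENT_MINI=gpt-4-1-mini
AZURE_OPENAI_EMBED_DEPLOYMENT=text-embedding-3-small

# Reasoning models on the same resource - each variable still has to be set
AZURE_OPENAI_REASONING_KEY=your-azure-key
AZURE_OPENAI_REASONING_RESOURCE=my-openai-resource
AZURE_OPENAI_REASONING_DEPLOYMENT=o3
AZURE_OPENAI_REASONING_MINI_DEPLOYMENT=o4-mini
```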
Anthropic Claude (Direct and AWS)
Configure access to Anthropic's Claude models. Only one of the two options below (direct API or AWS Bedrock) needs to be set to use Claude models.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
General Settings | These first two variables need to be set regardless of the deployment method. | ||
CLAUDE_MODEL | Claude model name (e.g., claude-3-opus-20240229). | string | null |
CLAUDE_MINI_MODEL | Claude mini model name. | string | null |
Direct API | |||
CLAUDE_API_KEY | Your Anthropic API key. | string | null |
AWS Bedrock (Claude) | |||
CLAUDE_AWS_REGION | AWS region where the Bedrock model is hosted. | string | null |
CLAUDE_AWS_ACCESS_KEY | AWS Access Key ID for Bedrock access. | string | null |
CLAUDE_AWS_SECRET_KEY | AWS Secret Access Key for Bedrock access. | string | null |
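For example, a direct-API sketch (the key is a placeholder and the mini model name is a hypothetical example; check Anthropic's documentation for current model identifiers):

```env
CLAUDE_MODEL=claude-3-opus-20240229        # example name from the table above
CLAUDE_MINI_MODEL=claude-3-haiku-20240307  # hypothetical mini model name
CLAUDE_API_KEY=sk-ant-...                  # placeholder Anthropic API key
```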
Google Gemini & Vertex AI
Configure access to Google's Gemini models. Only one of the two options below (direct API or Vertex AI) needs to be set to use Gemini models.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
General Settings | These first three variables need to be set regardless of the deployment method. | ||
GEMINI_MODEL | Gemini model name (e.g., gemini-2.5-pro-latest). | string | null |
GEMINI_MINI_MODEL | Gemini mini model name. | string | null |
GEMINI_EMBEDDING_MODEL | Gemini embedding model name. | string | null |
Direct API | |||
GEMINI_API_KEY | Your Google AI Studio API key for Gemini. | string | null |
Vertex AI | |||
VERTEX_PROJECT | Google Cloud Project ID for Vertex AI. | string | null |
VERTEX_LOCATION | Location/Region for your Vertex AI resources (e.g., us-central1). | string | null |
GOOGLE_APPLICATION_CREDENTIALS | See below | string | null |
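A minimal direct-API sketch (the key is a placeholder; the mini and embedding model names are hypothetical examples):

```env
GEMINI_MODEL=gemini-2.5-pro-latest         # example name from the table above
GEMINI_MINI_MODEL=gemini-2.5-flash         # hypothetical mini model name
GEMINI_EMBEDDING_MODEL=text-embedding-004  # hypothetical embedding model name
GEMINI_API_KEY=AIza...                     # placeholder Google AI Studio key
```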
Setting Google Application Credentials
To set the `GOOGLE_APPLICATION_CREDENTIALS` variable with Docker and Docker Compose, you'll need to use a volume mount:

```yaml
volumes:
  - /pathToCredentialsOnHost/credentials.json:/pathToCredentialsInContainer/credentials.json:ro
```

For example, if the JSON credential file is in the root directory and is called `credentials.json`, your `.env` file would include:

```env
GOOGLE_APPLICATION_CREDENTIALS='/app/credentials.json'
```

The Docker Compose file, under the `acid_backend` service, would have the following:

```yaml
env_file:
  - .env
volumes:
  - ./credentials.json:/app/credentials.json:ro
```
Ollama
Environment Variable | Description | Type | Default Value |
---|---|---|---|
OLLAMA_URL | URL for the Ollama service. | string | null |
OLLAMA_MODEL | Ollama model name. | string | null |
OLLAMA_MODEL_MINI | Ollama mini model name. | string | null |
OLLAMA_KEEP_ALIVE | How long models stay loaded in memory after a request (Ollama keep-alive duration, e.g. 30m). | string | null |
OLLAMA_EMBEDDING_MODEL | Ollama embedding model name. | string | null |
OLLAMA_MAX_EMBEDDING_TEXT_TOKENS | Max embedding text tokens for Ollama. | number | null |
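An illustrative sketch (the URL assumes a default local Ollama install; the model names are examples, not requirements):

```env
OLLAMA_URL=http://localhost:11434        # default local Ollama endpoint (assumption)
OLLAMA_MODEL=llama3.1:70b                # example model name
OLLAMA_MODEL_MINI=llama3.1:8b            # example mini model name
OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # example embedding model name
OLLAMA_KEEP_ALIVE=30m
```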
Ollama performance can vary significantly
SapienAI has been designed to use function calls and structured data throughout. Smaller Ollama models may struggle with this, so your mileage may vary when using self-hosted models. If you find that Ollama models are not coping with function calls in the chat interface, you can turn off chat functions by setting the `NO_CHAT_FUNCTIONS` environment variable to `true`.
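For example:

```env
NO_CHAT_FUNCTIONS=true
```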
Setting Token Limits
These settings control the default token limits for different model families. With different models supporting different token limits, you may want to adjust these settings to optimize performance. For example, GPT-4.1 can support 1 million tokens, so you may want to raise the `MAX_CONVERSATION_TOKEN_COUNT` variable to make use of this larger context.

Each provider can be configured with its own token limits; for OpenAI, limits are further broken down across "mini", large, and reasoning models. Some limits default to the current maximum supported by the model, but it is up to you to ensure that the limits you set are within the valid range for the model you are using.
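For instance, to take advantage of a 1M-token context window on GPT-4.1 (a sketch; confirm the value is valid for the model you are actually running):

```env
MAX_CONVERSATION_TOKEN_COUNT=1000000   # assumes the configured model supports a 1M-token context
```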
OpenAI Family Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
MAX_OUTPUT_TOKENS | Max response tokens for standard GPT-4-family models. | number | 16384 (<= 30000) |
MAX_CONVERSATION_TOKEN_COUNT | Max conversation tokens for standard GPT-4-family models. | number | 124000 |
MAX_OUTPUT_TOKENS_SMALL | Max response tokens for "mini" GPT-4-family models. | number | 16384 (<= 30000) |
MAX_CONVERSATION_TOKEN_COUNT_SMALL | Max conversation tokens for "mini" GPT-4-family models. | number | 124000 |
reasoning_lrg_openAI_max_response | Max response tokens for large OpenAI reasoning models. | number | 50000 (<= 50000) |
reasoning_lrg_openAI_max_convo_tokens | Max conversation tokens for large OpenAI reasoning models. | number | 200000 (<= 200000) |
reasoning_mini_openAI_max_response | Max response tokens for mini OpenAI reasoning models. | number | 50000 (<= 50000) |
reasoning_mini_openAI_max_convo_tokens | Max conversation tokens for mini OpenAI reasoning models. | number | 120000 (<= 120000) |
Anthropic Claude Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
claude_max_response_tokens | Max response tokens for Claude models. | number | 8192 (<= 60000) |
claude_max_conversation_tokens | Max conversation tokens for Claude models. | number | 200000 (<= 200000) |
claude_mini_max_response_tokens | Max response tokens for mini Claude models. | number | 8192 (<= 60000) |
claude_mini_max_conversation_tokens | Max conversation tokens for mini Claude models. | number | 200000 (<= 200000) |
Google Gemini Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
gemini_max_response_tokens | Max response tokens for Gemini models. | number | 50000 (<= 50000) |
gemini_max_conversation_tokens | Max conversation tokens for Gemini models. | number | 1000000 (<= 1000000) |
gemini_mini_max_response_tokens | Max response tokens for mini Gemini models. | number | 50000 (<= 50000) |
gemini_mini_max_conversation_tokens | Max conversation tokens for mini Gemini models. | number | 1000000 (<= 1000000) |
Ollama Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
OLLAMA_MAX_RESPONSE_TOKENS | Max response tokens for Ollama models. | number | 8192 |
OLLAMA_MAX_CONVERSATION_TOKENS | Max conversation tokens for Ollama models. | number | 200000 |
OLLAMA_MINI_MAX_RESPONSE_TOKENS | Max response tokens for mini Ollama models. | number | 8192 |
OLLAMA_MINI_MAX_CONVERSATION_TOKENS | Max conversation tokens for mini Ollama models. | number | 64000 |