AI Model Configuration
Configure access to various Large Language Models (LLMs) and AI services. At least one AI model provider (OpenAI, Azure OpenAI, Claude, Gemini or Ollama) must be configured for the application to function correctly.
The General AI Settings section below defines the default models used for background tasks such as document summarisation and embeddings. Any further AI models configured in the AI Model Providers section will make those models available for use in the chat interface.
General AI Settings
These settings define the default models used for background tasks such as document summarisation and embedding generation.
The provider set in the `DEFAULT_TEXT_GEN_MODEL` variable will be used for text generation tasks. This provider must have its configuration set in the section below:

- `GPT4` requires either the OpenAI configuration or the Azure OpenAI Service configuration.
- `Gemini` requires the Google Gemini & Vertex AI configuration.
- `Claude` requires the Anthropic Claude (Direct and AWS) configuration.
- `Ollama` requires the Ollama configuration.
The provider set in the `EMBEDDING_PROVIDER` variable will be used for embedding generation tasks. This provider must have its configuration set in the section below, and an embeddings model must be set for the provider to be used:

- `openAI` requires either the OpenAI configuration or the Azure OpenAI Service configuration.
- `google` requires the Google Gemini & Vertex AI configuration.
- `ollama` requires the Ollama configuration.
NOTE
SapienAI's development has mostly involved testing on OpenAI models. This means parameters such as token lengths and temperatures have been optimized for these models. While other providers are supported, they may not perform as well in all scenarios. Ollama models can be used, but performance may vary depending on the model size and capabilities. Please raise any issues you encounter.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
DEFAULT_TEXT_GEN_MODEL | Default model provider to use for background text generation. This provider will be used for tasks such as conversation naming or document summarisation. The app handles the logic of choosing between the mini and large version of each model to maximize performance (e.g. conversation naming uses the mini model while document summarisation uses the large model). | GPT4, Gemini, Claude, or Ollama | GPT4 |
EMBEDDING_PROVIDER | Default model provider to use for embedding generation. | openAI, google, or ollama | openAI |
VECTOR_LENGTH | Length of vectors from the embeddings endpoint. At present, this cannot be changed once set. | number | 1536 |
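For example, a minimal `.env` sketch using the defaults (illustrative only; set whichever provider you actually have configured):

```env
# Illustrative defaults - any configured provider can be substituted
DEFAULT_TEXT_GEN_MODEL=GPT4
EMBEDDING_PROVIDER=openAI
VECTOR_LENGTH=1536
```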
AI Model Providers
OpenAI
Uses OpenAI's official API.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
OPENAI_KEY | Your OpenAI API key. | string | null |
OPEN_AI_TEXT_GEN_MODEL | Default OpenAI model for text generation (e.g., gpt-4o). | string | gpt-4.1 |
OPEN_AI_TEXT_GEN_MINI_MODEL | Default OpenAI model for "mini" text generation tasks (e.g., gpt-4o-mini). | string | gpt-4.1-mini |
OPEN_AI_TEXT_GEN_REASONING_LRG_MODEL | OpenAI large reasoning model. | string | o3 |
OPEN_AI_TEXT_GEN_REASONING_MINI_MODEL | OpenAI mini reasoning model. | string | o4-mini |
OPEN_AI_IMG_GEN_MODEL | OpenAI image generation model. | string | gpt-image-1 |
OPEN_AI_EMBEDDING_MODEL | OpenAI embedding model. | string | text-embedding-3-small |
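As a sketch, an OpenAI configuration might look like this (the key is a placeholder; the model names simply restate the defaults above):

```env
OPENAI_KEY=sk-...                    # placeholder - use your own API key
OPEN_AI_TEXT_GEN_MODEL=gpt-4.1
OPEN_AI_TEXT_GEN_MINI_MODEL=gpt-4.1-mini
OPEN_AI_EMBEDDING_MODEL=text-embedding-3-small
```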
Azure OpenAI Service
To use an Azure OpenAI Service deployment, set the following variables. At present, due to the variable availability of different models with different capabilities, you have to set specific deployments for different functionalities.
There are five distinct Azure resources that can be set (only the first is absolutely required if using Azure OpenAI as the default provider):

1. A resource for text generation and embeddings. These functionalities need to be contained within the same resource, with a deployment for the `text-embedding-3-small` model, a deployment for a GPT-4-family model that supports 128k tokens, and a deployment for a 'mini' model.
2. A resource for a vision deployment endpoint. This must contain a vision-capable GPT-4-family model.
3. A resource for a DALL-E 3 deployment.
4. A resource for reasoning models (e.g. o3 and o4-mini).
5. A resource for real-time chat.
These resources can overlap. If you have a resource in a region that has availability for all features, you can set the env variables for each feature to the same resource (but you still have to set each env variable).
If you do not set the Azure vision resource, and no other vision-capable models are set, you will not be able to use vision capabilities in the chat.
This is somewhat unwieldy, but until Azure provides all models in all regions, this is the best way to maximise what can be achieved using Azure.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
USING_GPT4O | Set to false if not using a vision-capable GPT-4o model on Azure, leave the default otherwise. | boolean | true |
AZURE_OPENAI_API_VERSION | API version for Azure OpenAI services. | string | 2024-12-01-preview |
Text & Embedding Models (General) | |||
AZURE_OPENAI_KEY | API key for general Azure OpenAI text/embedding services. | string | null |
AZURE_OPENAI_RESOURCE | Resource name/endpoint for Azure OpenAI text/embedding. | string | null |
AZURE_OPENAI_TXT_DEPLOYMENT | Deployment name for the primary Azure OpenAI text generation model. | string | null |
AZURE_OPENAI_TXT_DEPLOYMENT_MINI | Deployment name for the "mini" Azure OpenAI text generation model. | string | null |
AZURE_OPENAI_EMBED_DEPLOYMENT | Deployment name for the Azure OpenAI embedding model. | string | null |
Vision Models | |||
AZURE_OPENAI_VISION_RESOURCE | Resource name/endpoint for Azure OpenAI vision models. | string | null |
AZURE_OPENAI_VISION_DEPLOYMENT | Deployment name for the Azure OpenAI vision model. | string | null |
AZURE_OPENAI_VISION_KEY | API key for Azure OpenAI vision services. | string | null |
Image Generation Models (DALL-E) | |||
AZURE_OPENAI_IMG_RESOURCE | Resource name/endpoint for Azure OpenAI image generation. | string | null |
AZURE_OPENAI_IMG_DEPLOYMENT | Deployment name for Azure OpenAI image generation. | string | null |
AZURE_OPENAI_IMG_KEY | API key for Azure OpenAI image generation. | string | null |
Realtime/Streaming Endpoints | |||
AZURE_REALTIME_URL | URL for Azure real-time (streaming) services. | string | null |
AZURE_REALTIME_KEY | API key for Azure real-time (streaming) services. | string | null |
Reasoning Models (e.g., "o3" series) | |||
AZURE_OPENAI_REASONING_KEY | API key for Azure OpenAI reasoning models. | string | null |
AZURE_OPENAI_REASONING_RESOURCE | Resource name/endpoint for Azure OpenAI reasoning models. | string | null |
AZURE_OPENAI_REASONING_DEPLOYMENT | Deployment name for the large Azure OpenAI reasoning model. | string | null |
AZURE_OPENAI_REASONING_MINI_DEPLOYMENT | Deployment name for the mini Azure OpenAI reasoning model. | string | null |
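As a sketch of the overlap described above, a single resource hosting the text, embedding, and reasoning deployments could be configured like this (the resource name, key, and deployment names are all hypothetical):

```env
# General text and embedding deployments (hypothetical names)
AZURE_OPENAI_KEY=your-azure-key
AZURE_OPENAI_RESOURCE=my-openai-resource
AZURE_OPENAI_TXT_DEPLOYMENT=gpt-4-1
AZURE_OPENAI_TXT_DEPLOYMENT_MINI=gpt-4-1-mini
AZURE_OPENAI_EMBED_DEPLOYMENT=text-embedding-3-small

# Reasoning models on the same resource - each variable still has to be set
AZURE_OPENAI_REASONING_KEY=your-azure-key
AZURE_OPENAI_REASONING_RESOURCE=my-openai-resource
AZURE_OPENAI_REASONING_DEPLOYMENT=o3
AZURE_OPENAI_REASONING_MINI_DEPLOYMENT=o4-mini
```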
Anthropic Claude (Direct and AWS)
Configure access to Anthropic's Claude models. Only one of the two options below (direct API or AWS Bedrock) needs to be set to use Claude models.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
General Settings | These first two variables need to be set regardless of the deployment method. | ||
CLAUDE_MODEL | Claude model name (e.g., claude-3-opus-20240229). | string | null |
CLAUDE_MINI_MODEL | Claude mini model name. | string | null |
Direct API | |||
CLAUDE_API_KEY | Your Anthropic API key. | string | null |
AWS Bedrock (Claude) | |||
CLAUDE_AWS_REGION | AWS region where the Bedrock model is hosted. | string | null |
CLAUDE_AWS_ACCESS_KEY | AWS Access Key ID for Bedrock access. | string | null |
CLAUDE_AWS_SECRET_KEY | AWS Secret Access Key for Bedrock access. | string | null |
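For example, a direct-API sketch (the key is a placeholder and the mini model name is a hypothetical example; check Anthropic's documentation for current model identifiers):

```env
CLAUDE_MODEL=claude-3-opus-20240229        # example name from the table above
CLAUDE_MINI_MODEL=claude-3-haiku-20240307  # hypothetical mini model name
CLAUDE_API_KEY=sk-ant-...                  # placeholder Anthropic API key
```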
Google Gemini & Vertex AI
Configure access to Google's Gemini models. Only one of the two options below (direct API or Vertex AI) needs to be set to use Gemini models.
Environment Variable | Description | Type | Default Value |
---|---|---|---|
General Settings | These first three variables need to be set regardless of the deployment method. | ||
GEMINI_MODEL | Gemini model name (e.g., gemini-2.5-pro-latest). | string | null |
GEMINI_MINI_MODEL | Gemini mini model name. | string | null |
GEMINI_EMBEDDING_MODEL | Gemini embedding model name. | string | null |
Direct API | |||
GEMINI_API_KEY | Your Google AI Studio API key for Gemini. | string | null |
Vertex AI | |||
VERTEX_PROJECT | Google Cloud Project ID for Vertex AI. | string | null |
VERTEX_LOCATION | Location/Region for your Vertex AI resources (e.g., us-central1). | string | null |
GOOGLE_APPLICATION_CREDENTIALS | See below | string | null |
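A minimal direct-API sketch (the key is a placeholder; the mini and embedding model names are hypothetical examples):

```env
GEMINI_MODEL=gemini-2.5-pro-latest         # example name from the table above
GEMINI_MINI_MODEL=gemini-2.5-flash         # hypothetical mini model name
GEMINI_EMBEDDING_MODEL=text-embedding-004  # hypothetical embedding model name
GEMINI_API_KEY=AIza...                     # placeholder Google AI Studio key
```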
Setting Google Application Credentials
To set the `GOOGLE_APPLICATION_CREDENTIALS` variable with Docker and Docker Compose, you'll need to use a volume mount:

```yaml
volumes:
  - /pathToCredentialsOnHost/credentials.json:/pathToCredentialsInContainer/credentials.json:ro
```

For example, if the JSON credential file is in the root directory and is called `credentials.json`, your `.env` file would include:

```env
GOOGLE_APPLICATION_CREDENTIALS='/app/credentials.json'
```

The Docker Compose file, under the `acid_backend` service, would have the following:

```yaml
env_file:
  - .env
volumes:
  - ./credentials.json:/app/credentials.json:ro
```
Ollama
Environment Variable | Description | Type | Default Value |
---|---|---|---|
OLLAMA_URL | URL for the Ollama service. | string | null |
OLLAMA_MODEL | Ollama model name. | string | null |
OLLAMA_MODEL_MINI | Ollama mini model name. | string | null |
OLLAMA_KEEP_ALIVE | How long models stay loaded in memory after a request (Ollama keep-alive duration, e.g. 30m). | string | null |
OLLAMA_EMBEDDING_MODEL | Ollama embedding model name. | string | null |
OLLAMA_MAX_EMBEDDING_TEXT_TOKENS | Max embedding text tokens for Ollama. | number | null |
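An illustrative sketch (the URL assumes a default local Ollama install; the model names are examples, not requirements):

```env
OLLAMA_URL=http://localhost:11434        # default local Ollama endpoint (assumption)
OLLAMA_MODEL=llama3.1:70b                # example model name
OLLAMA_MODEL_MINI=llama3.1:8b            # example mini model name
OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # example embedding model name
OLLAMA_KEEP_ALIVE=30m
```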
Ollama performance can vary significantly
SapienAI has been designed to use function calls and structured data throughout. Smaller Ollama models may struggle with this, so your mileage may vary when using self-hosted models. If you find that Ollama models are not coping with function calls in the chat interface, you can turn off chat functions by setting the `NO_CHAT_FUNCTIONS` environment variable to `true`.
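For example:

```env
NO_CHAT_FUNCTIONS=true
```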
Setting Token Limits
These settings control the default token limits for different model families. With different models supporting different token limits, you may want to adjust these settings to optimize performance. For example, GPT-4.1 can support 1 million tokens, so you may want to raise the `MAX_CONVERSATION_TOKEN_COUNT` variable to make use of this larger context.

Each provider can be configured with its own token limits; for OpenAI, limits are further broken down across "mini", large, and reasoning models. Some limits default to the current maximum supported by the model, but it is up to you to ensure that the limits you set are within the valid range for the model you are using.
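For instance, to take advantage of a 1M-token context window on GPT-4.1 (a sketch; confirm the value is valid for the model you are actually running):

```env
MAX_CONVERSATION_TOKEN_COUNT=1000000   # assumes the configured model supports a 1M-token context
```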
OpenAI Family Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
MAX_OUTPUT_TOKENS | Max response tokens for standard GPT-4-family models. | number | 16384 (<= 30000) |
MAX_CONVERSATION_TOKEN_COUNT | Max conversation tokens for standard GPT-4-family models. | number | 124000 |
MAX_OUTPUT_TOKENS_SMALL | Max response tokens for "mini" GPT-4-family models. | number | 16384 (<= 30000) |
MAX_CONVERSATION_TOKEN_COUNT_SMALL | Max conversation tokens for "mini" GPT-4-family models. | number | 124000 |
reasoning_lrg_openAI_max_response | Max response tokens for large OpenAI reasoning models. | number | 50000 (<= 50000) |
reasoning_lrg_openAI_max_convo_tokens | Max conversation tokens for large OpenAI reasoning models. | number | 200000 (<= 200000) |
reasoning_mini_openAI_max_response | Max response tokens for mini OpenAI reasoning models. | number | 50000 (<= 50000) |
reasoning_mini_openAI_max_convo_tokens | Max conversation tokens for mini OpenAI reasoning models. | number | 120000 (<= 120000) |
Anthropic Claude Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
claude_max_response_tokens | Max response tokens for Claude models. | number | 8192 (<= 60000) |
claude_max_conversation_tokens | Max conversation tokens for Claude models. | number | 200000 (<= 200000) |
claude_mini_max_response_tokens | Max response tokens for mini Claude models. | number | 8192 (<= 60000) |
claude_mini_max_conversation_tokens | Max conversation tokens for mini Claude models. | number | 200000 (<= 200000) |
Google Gemini Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
gemini_max_response_tokens | Max response tokens for Gemini models. | number | 50000 (<= 50000) |
gemini_max_conversation_tokens | Max conversation tokens for Gemini models. | number | 1000000 (<= 1000000) |
gemini_mini_max_response_tokens | Max response tokens for mini Gemini models. | number | 50000 (<= 50000) |
gemini_mini_max_conversation_tokens | Max conversation tokens for mini Gemini models. | number | 1000000 (<= 1000000) |
Ollama Models
Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
---|---|---|---|
OLLAMA_MAX_RESPONSE_TOKENS | Max response tokens for Ollama models. | number | 8192 |
OLLAMA_MAX_CONVERSATION_TOKENS | Max conversation tokens for Ollama models. | number | 200000 |
OLLAMA_MINI_MAX_RESPONSE_TOKENS | Max response tokens for mini Ollama models. | number | 8192 |
OLLAMA_MINI_MAX_CONVERSATION_TOKENS | Max conversation tokens for mini Ollama models. | number | 64000 |