
AI Model Configuration

Configure access to various Large Language Models (LLMs) and AI services. At least one AI model provider (OpenAI, Azure OpenAI, Claude, Gemini, or Ollama) must be configured for the application to function correctly.

The General AI Settings section below defines the default models used for background tasks such as document summarisation and embeddings. Any additional AI models configured in the AI Model Providers section will also be available for use in the chat interface.

General AI Settings

These settings define the default models used for various text generation tasks such as document summarisation and embeddings.

The provider set in the DEFAULT_TEXT_GEN_MODEL variable will be used for text generation tasks. This provider must have its configuration set in the section below.

  • GPT4 requires either the OpenAI configuration or the Azure OpenAI Service configuration to be provided,
  • Gemini requires that the Google Gemini & Vertex AI configuration is provided,
  • Claude requires that the Anthropic Claude (Direct and AWS) configuration is provided, and
  • Ollama requires the Ollama configuration.

The provider set in the EMBEDDING_PROVIDER variable will be used for embedding generation tasks. This provider must have its configuration set in the section below. An embeddings model must be set for the provider to be used.

  • openAI requires either the OpenAI configuration or the Azure OpenAI Service configuration to be provided,
  • google requires that the Google Gemini & Vertex AI configuration is provided, and
  • ollama requires the Ollama configuration.

NOTE

SapienAI's development has mostly involved testing on OpenAI models. This means parameters such as token lengths and temperatures have been optimized for these models. While other providers are supported, they may not perform as well in all scenarios. Ollama models can be used, but performance may vary depending on the model size and capabilities. Please raise any issues you encounter.

| Environment Variable | Description | Type | Default Value |
| --- | --- | --- | --- |
| DEFAULT_TEXT_GEN_MODEL | The default model provider to use for background text generation. This provider will be used for tasks such as conversation naming or document summarisation. The app handles the logic of switching between the mini and large version of each model to maximize performance (e.g. conversation naming uses the mini model while document summarisation uses the large model). | GPT4, Gemini, Claude, or Ollama | GPT4 |
| EMBEDDING_PROVIDER | The default model provider to use for embedding generation. | openAI, google, or ollama | openAI |
| VECTOR_LENGTH | Length of vectors from the embeddings endpoint. At present, this cannot be changed once set. | number | 1536 |
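
As an illustration, a minimal .env fragment selecting OpenAI for both text generation and embeddings might look like this (values shown are examples, not requirements):

```env
# Use GPT4-family models for background text generation
DEFAULT_TEXT_GEN_MODEL=GPT4

# Use OpenAI for embedding generation (an embedding model must also be set)
EMBEDDING_PROVIDER=openAI

# Must match the output dimension of the chosen embedding model;
# text-embedding-3-small produces 1536-dimension vectors by default
VECTOR_LENGTH=1536
```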

AI Model Providers

OpenAI

Uses OpenAI's official API.

| Environment Variable | Description | Type | Default Value |
| --- | --- | --- | --- |
| OPENAI_KEY | Your OpenAI API key. | string | null |
| OPEN_AI_TEXT_GEN_MODEL | Default OpenAI model for text generation (e.g., gpt-4o). | string | gpt-4.1 |
| OPEN_AI_TEXT_GEN_MINI_MODEL | Default OpenAI model for "mini" text generation tasks (e.g., gpt-4o-mini). | string | gpt-4.1-mini |
| OPEN_AI_TEXT_GEN_REASONING_LRG_MODEL | OpenAI large reasoning model. | string | o3 |
| OPEN_AI_TEXT_GEN_REASONING_MINI_MODEL | OpenAI mini reasoning model. | string | o4-mini |
| OPEN_AI_IMG_GEN_MODEL | OpenAI image generation model. | string | gpt-image-1 |
| OPEN_AI_EMBEDDING_MODEL | OpenAI embedding model. | string | text-embedding-3-small |
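
For example, since every model variable has a sensible default, a minimal OpenAI setup can consist of just the API key, with optional overrides (the key below is a placeholder):

```env
# Required: your OpenAI API key
OPENAI_KEY=<your-openai-api-key>

# Optional overrides; the defaults in the table above apply if omitted
OPEN_AI_TEXT_GEN_MODEL=gpt-4.1
OPEN_AI_TEXT_GEN_MINI_MODEL=gpt-4.1-mini
```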

Azure OpenAI Service

To use an Azure OpenAI Service deployment, set the following variables. At present, because model availability and capabilities vary between regions, you have to set specific deployments for different functionalities.

There are five distinct Azure resources that can be set (only the first resource is strictly required if using Azure OpenAI as the default provider):

  1. The resource for text generation and embeddings. These functionalities need to be contained within the same resource. It needs a deployment for the text-embedding-3-small model, a deployment for a GPT4-family model that supports 128k tokens, and a deployment for a 'mini' model.
  2. A resource for a vision deployment endpoint. This must contain a vision-capable GPT4-family model.
  3. A resource for a DALL·E 3 deployment.
  4. A resource for reasoning models (e.g. o3 & o4-mini).
  5. A resource for real-time chat.

These resources can overlap. If a single resource in one region offers all features, you can point the env variables for each feature at the same resource (but you still have to set each env variable).

If you do not set the Azure vision resource, and no other vision-capable models are set, you will not be able to use vision capabilities in the chat.

This is somewhat unwieldy, but until Azure provides all models in all regions, this is the best way to maximise what can be achieved using Azure.
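
To illustrate the overlap, a hypothetical setup where one resource covers both text/embeddings and vision would simply repeat the same resource name and key for each feature:

```env
# One resource covering text/embeddings and vision: repeat it per feature
AZURE_OPENAI_RESOURCE=my-azure-resource
AZURE_OPENAI_KEY=<key-for-my-azure-resource>
AZURE_OPENAI_VISION_RESOURCE=my-azure-resource
AZURE_OPENAI_VISION_KEY=<key-for-my-azure-resource>
```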

| Environment Variable | Description | Type | Default Value |
| --- | --- | --- | --- |
| USING_GPT4O | Set to false if not using a vision-capable GPT-4o model on Azure; leave the default otherwise. | boolean | true |
| AZURE_OPENAI_API_VERSION | API version for Azure OpenAI services. | string | 2024-12-01-preview |
| **Text & Embedding Models (General)** | | | |
| AZURE_OPENAI_KEY | API key for general Azure OpenAI text/embedding services. | string | null |
| AZURE_OPENAI_RESOURCE | Resource name/endpoint for Azure OpenAI text/embedding. | string | null |
| AZURE_OPENAI_TXT_DEPLOYMENT | Deployment name for the primary Azure OpenAI text generation model. | string | null |
| AZURE_OPENAI_TXT_DEPLOYMENT_MINI | Deployment name for the "mini" Azure OpenAI text generation model. | string | null |
| AZURE_OPENAI_EMBED_DEPLOYMENT | Deployment name for the Azure OpenAI embedding model. | string | null |
| **Vision Models** | | | |
| AZURE_OPENAI_VISION_RESOURCE | Resource name/endpoint for Azure OpenAI vision models. | string | null |
| AZURE_OPENAI_VISION_DEPLOYMENT | Deployment name for the Azure OpenAI vision model. | string | null |
| AZURE_OPENAI_VISION_KEY | API key for Azure OpenAI vision services. | string | null |
| **Image Generation Models (DALL-E)** | | | |
| AZURE_OPENAI_IMG_RESOURCE | Resource name/endpoint for Azure OpenAI image generation. | string | null |
| AZURE_OPENAI_IMG_DEPLOYMENT | Deployment name for Azure OpenAI image generation. | string | null |
| AZURE_OPENAI_IMG_KEY | API key for Azure OpenAI image generation. | string | null |
| **Realtime/Streaming Endpoints** | | | |
| AZURE_REALTIME_URL | URL for Azure real-time (streaming) services. | string | null |
| AZURE_REALTIME_KEY | API key for Azure real-time (streaming) services. | string | null |
| **Reasoning Models (e.g., "o3" series)** | | | |
| AZURE_OPENAI_REASONING_KEY | API key for Azure OpenAI reasoning models. | string | null |
| AZURE_OPENAI_REASONING_RESOURCE | Resource name/endpoint for Azure OpenAI reasoning models. | string | null |
| AZURE_OPENAI_REASONING_DEPLOYMENT | Deployment name for the large Azure OpenAI reasoning model. | string | null |
| AZURE_OPENAI_REASONING_MINI_DEPLOYMENT | Deployment name for the mini Azure OpenAI reasoning model. | string | null |
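
A minimal configuration using only the first (required) resource might look like the following; the resource and deployment names are hypothetical and must match your own Azure deployments:

```env
AZURE_OPENAI_KEY=<your-azure-openai-key>
AZURE_OPENAI_RESOURCE=my-text-resource
AZURE_OPENAI_TXT_DEPLOYMENT=gpt-4-1-deployment
AZURE_OPENAI_TXT_DEPLOYMENT_MINI=gpt-4-1-mini-deployment
AZURE_OPENAI_EMBED_DEPLOYMENT=text-embedding-3-small-deployment
```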

Anthropic Claude (Direct and AWS)

Configure access to Anthropic's Claude models. Only one option needs to be set to use Claude models.

| Environment Variable | Description | Type | Default Value |
| --- | --- | --- | --- |
| **General Settings** | These first two variables need to be set regardless of the deployment method. | | |
| CLAUDE_MODEL | Claude model name (e.g., claude-3-opus-20240229). | string | null |
| CLAUDE_MINI_MODEL | Claude mini model name. | string | null |
| **Direct API** | | | |
| CLAUDE_API_KEY | Your Anthropic API key. | string | null |
| **AWS Bedrock (Claude)** | | | |
| CLAUDE_AWS_REGION | AWS region where the Bedrock model is hosted. | string | null |
| CLAUDE_AWS_ACCESS_KEY | AWS Access Key ID for Bedrock access. | string | null |
| CLAUDE_AWS_SECRET_KEY | AWS Secret Access Key for Bedrock access. | string | null |
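
To illustrate, a direct-API setup and an AWS Bedrock setup differ only in the credential variables (model names below are examples):

```env
# Required in both cases
CLAUDE_MODEL=claude-3-opus-20240229
CLAUDE_MINI_MODEL=claude-3-haiku-20240307

# Option A: direct Anthropic API
CLAUDE_API_KEY=<your-anthropic-api-key>

# Option B: AWS Bedrock (set these instead of CLAUDE_API_KEY)
# CLAUDE_AWS_REGION=us-east-1
# CLAUDE_AWS_ACCESS_KEY=<aws-access-key-id>
# CLAUDE_AWS_SECRET_KEY=<aws-secret-access-key>
```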

Google Gemini & Vertex AI

Configure access to Google's Gemini models. Only one option needs to be set to use Gemini models.

| Environment Variable | Description | Type | Default Value |
| --- | --- | --- | --- |
| **General Settings** | These first three variables need to be set regardless of the deployment method. | | |
| GEMINI_MODEL | Gemini model name (e.g., gemini-2.5-pro-latest). | string | null |
| GEMINI_MINI_MODEL | Gemini mini model name. | string | null |
| GEMINI_EMBEDDING_MODEL | Gemini embedding model name. | string | null |
| **Direct API** | | | |
| GEMINI_API_KEY | Your Google AI Studio API key for Gemini. | string | null |
| **Vertex AI** | | | |
| VERTEX_PROJECT | Google Cloud Project ID for Vertex AI. | string | null |
| VERTEX_LOCATION | Location/Region for your Vertex AI resources (e.g., us-central1). | string | null |
| GOOGLE_APPLICATION_CREDENTIALS | See below. | string | null |
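
For example, a direct-API configuration via Google AI Studio might look like this (model names are illustrative):

```env
GEMINI_MODEL=gemini-2.5-pro
GEMINI_MINI_MODEL=gemini-2.5-flash
GEMINI_EMBEDDING_MODEL=text-embedding-004
GEMINI_API_KEY=<your-google-ai-studio-key>
```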

Setting Google Application Credentials

To set the GOOGLE_APPLICATION_CREDENTIALS with Docker and Docker Compose, you'll need to use a volume mount.

```yaml
volumes:
  - /pathToCredentialsOnHost/credentials.json:/pathToCredentialsInContainer/credentials.json:ro
```

For example, if the JSON credentials file is in the root directory and is called credentials.json, your .env file would include:

```env
GOOGLE_APPLICATION_CREDENTIALS='/app/credentials.json'
```

The Docker Compose file, under the acid_backend service, would have the following:

```yaml
env_file:
  - .env
volumes:
  - ./credentials.json:/app/credentials.json:ro
```

Ollama

| Environment Variable | Description | Type | Default Value |
| --- | --- | --- | --- |
| OLLAMA_URL | URL for the Ollama service. | string | null |
| OLLAMA_MODEL | Ollama model name. | string | null |
| OLLAMA_MODEL_MINI | Ollama mini model name. | string | null |
| OLLAMA_KEEP_ALIVE | Keep-alive duration for the Ollama service. | string | null |
| OLLAMA_EMBEDDING_MODEL | Ollama embedding model name. | string | null |
| OLLAMA_MAX_EMBEDDING_TEXT_TOKENS | Max embedding text tokens for Ollama. | number | null |
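
As a sketch, a self-hosted Ollama setup might look like this (the URL and model names are examples, not recommendations):

```env
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:70b
OLLAMA_MODEL_MINI=llama3.1:8b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_KEEP_ALIVE=10m
OLLAMA_MAX_EMBEDDING_TEXT_TOKENS=2048
```

If you use Ollama for embeddings, remember that VECTOR_LENGTH must match the embedding model's output dimension (nomic-embed-text, for instance, produces 768-dimensional vectors, not the default 1536).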

Ollama performance can vary significantly

SapienAI has been designed to use function calls and structured data throughout. Smaller Ollama models may struggle with this, so your mileage may vary when using self-hosted models. If you find the Ollama models are not coping with function calls in the chat interface, you can turn off chat functions by setting the NO_CHAT_FUNCTIONS environment variable to true.

Setting Token Limits

These settings control the default token limits for different model families.

With different models supporting different token limits, you may want to adjust these settings to optimize performance.

For example, GPT-4.1 can support 1 million tokens, so you may want to increase the MAX_CONVERSATION_TOKEN_COUNT variable to make use of this larger context.

Each provider can be configured with its own token limits. For OpenAI, limits are further broken down into "mini" models, large models, and reasoning models.

Some limits default to the current maximum supported by the model, but it is up to you to ensure that the limits you set are within the valid range for the model you are using.
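
For instance, to take advantage of a larger context window you might raise the conversation limits in your .env file (values are illustrative; check the actual limits of the model you are using):

```env
# GPT-4.1 supports up to roughly 1M context tokens
MAX_CONVERSATION_TOKEN_COUNT=1000000
MAX_CONVERSATION_TOKEN_COUNT_SMALL=1000000
```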

OpenAI Family Models

| Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
| --- | --- | --- | --- |
| MAX_OUTPUT_TOKENS | Max response tokens for standard GPT-4-family models. | number | 16384 (<= 30000) |
| MAX_CONVERSATION_TOKEN_COUNT | Max conversation tokens for standard GPT-4-family models. | number | 124000 |
| MAX_OUTPUT_TOKENS_SMALL | Max response tokens for "mini" GPT-4-family models. | number | 16384 (<= 30000) |
| MAX_CONVERSATION_TOKEN_COUNT_SMALL | Max conversation tokens for "mini" GPT-4-family models. | number | 124000 |
| reasoning_lrg_openAI_max_response | Max response tokens for large OpenAI reasoning models. | number | 50000 (<= 50000) |
| reasoning_lrg_openAI_max_convo_tokens | Max conversation tokens for large OpenAI reasoning models. | number | 200000 (<= 200000) |
| reasoning_mini_openAI_max_response | Max response tokens for mini OpenAI reasoning models. | number | 50000 (<= 50000) |
| reasoning_mini_openAI_max_convo_tokens | Max conversation tokens for mini OpenAI reasoning models. | number | 120000 (<= 120000) |

Anthropic Claude Models

| Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
| --- | --- | --- | --- |
| claude_max_response_tokens | Max response tokens for Claude models. | number | 8192 (<= 60000) |
| claude_max_conversation_tokens | Max conversation tokens for Claude models. | number | 200000 (<= 200000) |
| claude_mini_max_response_tokens | Max response tokens for mini Claude models. | number | 8192 (<= 60000) |
| claude_mini_max_conversation_tokens | Max conversation tokens for mini Claude models. | number | 200000 (<= 200000) |

Google Gemini Models

| Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
| --- | --- | --- | --- |
| gemini_max_response_tokens | Max response tokens for Gemini models. | number | 50000 (<= 50000) |
| gemini_max_conversation_tokens | Max conversation tokens for Gemini models. | number | 1000000 (<= 1000000) |
| gemini_mini_max_response_tokens | Max response tokens for mini Gemini models. | number | 50000 (<= 50000) |
| gemini_mini_max_conversation_tokens | Max conversation tokens for mini Gemini models. | number | 1000000 (<= 1000000) |

Ollama Models

| Environment Variable | Description | Type | Default (Max if not set / Valid Range) |
| --- | --- | --- | --- |
| OLLAMA_MAX_RESPONSE_TOKENS | Max response tokens for Ollama models. | number | 8192 |
| OLLAMA_MAX_CONVERSATION_TOKENS | Max conversation tokens for Ollama models. | number | 200000 |
| OLLAMA_MINI_MAX_RESPONSE_TOKENS | Max response tokens for mini Ollama models. | number | 8192 |
| OLLAMA_MINI_MAX_CONVERSATION_TOKENS | Max conversation tokens for mini Ollama models. | number | 64000 |

AcademicID - Smart & ethical AI for academia