How to set up a custom OpenAI-compatible Server in PDF Pals

PDF Pals supports custom OpenAI-compatible servers, such as an OpenAI proxy server, LocalAI, or the LM Studio Local Inference Server.

OpenAI-compatible server

There are a few options for running a local OpenAI-compatible server.

1. LM Studio

The easiest way to do this is to use LM Studio. Follow this guide by Ingrid Stevens to start.

👉 Running a Local OpenAI-Compatible Mixtral Server with LM Studio
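Once LM Studio's local server is running, you can sanity-check it before pointing PDF Pals at it. Here is a minimal Python sketch, assuming the default endpoint http://localhost:1234/v1/chat/completions (adjust the port if you changed it in LM Studio):

```python
import json
import urllib.request

def build_chat_request(url, model_id, prompt):
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model_id,  # LM Studio serves whichever model is loaded
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Default LM Studio endpoint; "local-model" is a placeholder model id.
req = build_chat_request(
    "http://localhost:1234/v1/chat/completions",
    "local-model",
    "Say hello in one word.",
)

# Uncomment to send the request once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

If the request succeeds, the same URL is what you will enter in PDF Pals.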

2. Ollama + LiteLLM

Ollama is another fantastic option. It's open source and easy to use. Unfortunately, its server is not OpenAI-compatible, so you will need to run LiteLLM in front of it.

👉 Set up LiteLLM with Ollama
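With the LiteLLM proxy running in front of Ollama, requests use the same OpenAI format; the model name tells LiteLLM which Ollama model to route to. A hedged sketch (the proxy address and model name below are assumptions; use the address LiteLLM prints on startup and a model you have pulled with Ollama):

```python
import json
import urllib.request

# Assumed values: the LiteLLM proxy address and an Ollama model it routes to.
PROXY_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "ollama/llama2"  # LiteLLM's "ollama/<model>" naming convention

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "ping"}],
}
req = urllib.request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send once the proxy is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```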

3. LocalAI

LocalAI is another option if you're comfortable with Docker and building it yourself. Follow their guide here:

👉 LocalAI Build Instructions

How to use it in PDF Pals

Go to Settings > Models, click the (+) button, and choose "OpenAI-compatible Server".

Fill in the form:


  1. Give it a friendly name.
  2. Enter the exact URL of the chat completions endpoint. For LM Studio, the default is http://localhost:1234/v1/chat/completions
  3. (Optional) Enter the model ID. It is sent with each chat request (the "model" parameter in the OpenAI API spec).
  4. Enter the context length of this model. Refer to the original model's documentation to find this value. In LM Studio, look for the "Context Length" setting in the right pane.
  5. Enable streaming if the server supports it.

Click "Save Changes".

IMPORTANT: if you don't intend to use OpenAI, make sure to set this server as the default (6).


If you are new here, PDF Pals is a native macOS app that allows you to chat with local PDFs instantly. Download now.