Standardizing prompt templates

The library expects prompt templates to be stored as modular YAML or JSON files. They can live locally or in an HF Hub repository.

A prompt template YAML or JSON file must follow this standardized structure:

  • Top-level key (required): prompt. This key signals to the parser that the file contains a prompt template.
  • Second-level key (required): template. This can be either a simple string or a list of dictionaries following the OpenAI messages format. The messages format is recommended for use with LLM APIs or inference containers. Variable placeholders for populating the prompt template string are denoted with double curly brackets {{...}}.
  • Second-level keys (optional): (1) template_variables (list): variables for populating the prompt template. This is used for input validation and to make the required variables of long templates easily accessible; (2) metadata (dict): information about the template such as the source, date, author, etc.; (3) client_parameters (dict): parameters for the inference client (e.g. temperature, model_id); (4) custom_data (dict): any other data that does not fit into the other categories.

Example prompt template following the standard in YAML:

prompt:
  template:
    - role: "system"
      content: "You are a coding assistant who explains concepts clearly and provides short examples."
    - role: "user"
      content: "Explain what {{concept}} is in {{programming_language}}."
  template_variables:
    - concept
    - programming_language
  metadata:
    name: "Code Teacher"
    description: "A simple chat prompt for explaining programming concepts with examples"
    tags:
      - programming
      - education
    version: "0.0.1"
    author: "Karl Marx"
  client_parameters:
    temperature: 0
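
The same template expressed as JSON (structurally identical to the YAML above, just a different serialization):

{
  "prompt": {
    "template": [
      {"role": "system", "content": "You are a coding assistant who explains concepts clearly and provides short examples."},
      {"role": "user", "content": "Explain what {{concept}} is in {{programming_language}}."}
    ],
    "template_variables": ["concept", "programming_language"],
    "metadata": {
      "name": "Code Teacher",
      "description": "A simple chat prompt for explaining programming concepts with examples",
      "tags": ["programming", "education"],
      "version": "0.0.1",
      "author": "Karl Marx"
    },
    "client_parameters": {"temperature": 0}
  }
}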

Repository types on the HF Hub: Prompt template files can be shared in any HF repo type (dataset/model/space repo). We recommend sharing collections of prompt templates in dataset repos by default.

Naming convention: We call a file a "prompt template" when it has placeholders ({{...}}) for dynamically populating the template, similar to an f-string. This makes files more useful and reusable by others for different use-cases. Once the placeholders in the template are populated with specific values, we call it a "prompt".

Templating: Jinja2 is the default templating engine for populating the variables in the template.
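
For illustration, populating such a template is equivalent to rendering it with Jinja2 directly. A minimal sketch, not the library's internal code (the library may configure Jinja2 differently, e.g. with a sandboxed environment):

from jinja2 import Template

# Render the double-curly-bracket placeholders with concrete values
template = Template("Explain what {{concept}} is in {{programming_language}}.")
prompt = template.render(concept="list comprehension", programming_language="Python")
print(prompt)
# Explain what list comprehension is in Python.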

The following example illustrates how the prompt template becomes a prompt.

>>> # 1. Download a prompt template:
>>> from prompt_templates import PromptTemplateLoader
>>> prompt_template = PromptTemplateLoader.from_hub(
...     repo_id="MoritzLaurer/example_prompts",
...     filename="code_teacher.yaml"
... )

>>> # 2. Inspect the template and its variables:
>>> prompt_template.template
[{'role': 'system', 'content': 'You are a coding assistant who explains concepts clearly and provides short examples.'}, {'role': 'user', 'content': 'Explain what {{concept}} is in {{programming_language}}.'}]
>>> prompt_template.template_variables
['concept', 'programming_language']

>>> # 3. Populate the template with its variables
>>> prompt = prompt_template.populate_template(
...     concept="list comprehension",
...     programming_language="Python"
... )
>>> print(prompt)
[{'role': 'system', 'content': 'You are a coding assistant who explains concepts clearly and provides short examples.'}, {'role': 'user', 'content': 'Explain what list comprehension is in Python.'}]
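
Since the populated prompt is a list of messages in the OpenAI format, it can be passed directly to any compatible client. A sketch using the openai SDK (the model name is illustrative; an API key is assumed to be configured):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=prompt,  # the populated messages list from step 3
)
print(response.choices[0].message.content)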

Pros/Cons for different file formats for sharing prompt templates

Pro/Con YAML files

  • Existing prompt hubs use YAML (or JSON): LangChain Hub; Haystack Prompt Hub
  • In my experience with practitioners, YAML (or JSON) is the standard for working with prompts in production settings.
  • Managing individual prompt templates in separate YAML files makes each prompt template an independent modular unit.
    • This makes it easier, for example, to add metadata and production-relevant information to the respective prompt YAML file.
    • Keeping prompt templates in individual YAML files also enables users to add individual prompts to any HF repo abstraction (Dataset, Model, Space repos), while tabular dataset file types are only compatible with one specific repo type.

Pro/Con JSON files

  • The same pro arguments for YAML also apply to JSON.
  • Directly parsable as Python dict, similar to YAML
  • More verbose to type and less pretty than YAML, but probably more familiar to some users

Pro/Con Jinja2 files

  • Richer functionality for populating prompt templates (e.g. loops and conditionals)
  • Can be directly integrated into YAML or JSON, so it can always be added on top of the common YAML/JSON standard (see the sketch after this list)
  • Issue: Jinja2 allows arbitrary code execution, so templates from untrusted sources are less safe
  • Harder to read for beginners
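
For illustration, a Jinja2 loop embedded in a YAML template string (a sketch; whether such constructs are rendered depends on how the loader configures Jinja2):

prompt:
  template:
    - role: "user"
      content: |
        Summarize the following documents:
        {% for doc in documents %}
        - {{ doc }}
        {% endfor %}
  template_variables:
    - documents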

Pro/Con tabular file formats (e.g. parquet)

  • Some tabular prompt datasets like awesome-chatgpt-prompts have received many likes on HF
  • The dataset viewer allows for easy and quick visualization
  • Main cons: the tabular data format is not well suited for reusing prompt templates and is not standard among practitioners
    • Extracting a single prompt from a tabular dataset with dataset/pandas-like operations is unnecessarily complicated (see the sketch after this list).
    • In industry practice, prompt templates are independent modular units that can be reused for different use-cases. Having multiple templates in the same dataset forces different templates to have the same column structure and prevents proper modular development.
    • Datasets on the HF Hub are stored as Parquet files, which are not easily editable by hand. Editing a prompt in JSON or YAML is much easier than editing a (Parquet) dataset.
    • Data viewers for tabular data are poorly suited to visualizing the structure of long prompt templates (where e.g. line breaks carry substantive meaning). Viewing and editing prompt templates in markdown-like editors is more standard in the ecosystem.
    • Saving prompt templates as datasets prevents them from being modular components of model or space repos (see example use-cases for this)
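
The difference is easy to see in code. A sketch contrasting the two approaches (file and column names are hypothetical):

# Tabular: locate one template among many rows (hypothetical file and column names)
import pandas as pd

df = pd.read_parquet("prompts.parquet")
template = df.loc[df["name"] == "code_teacher", "template"].iloc[0]

# Modular file: the template is the whole file
import yaml

with open("code_teacher.yaml") as f:
    template = yaml.safe_load(f)["prompt"]["template"]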

Compatibility with LangChain

LangChain is a great library for creating interoperability between different LLM clients. This library is inspired by LangChain's PromptTemplate and ChatPromptTemplate classes. One difference is that LangChain's ChatPromptTemplate expects a "messages" key instead of a "template" key for prompt templates in the messages format. For simplicity, this HF library uses the "template" key for both its TextPromptTemplate and its ChatPromptTemplate. If you load a YAML/JSON file with a "messages" key, it is automatically renamed to "template". You can also always convert an HF PromptTemplate to a LangChain template with .to_langchain_template(). The objective of this library is not to reproduce the full functionality of a library like LangChain, but to enable the community to share prompts on the HF Hub and load and reuse them with any of their favourite libraries.

A PromptTemplate from prompt_templates can easily be converted to a LangChain template:

from prompt_templates import PromptTemplateLoader
prompt_template = PromptTemplateLoader.from_hub(
    repo_id="MoritzLaurer/example_prompts",
    filename="code_teacher.yaml"
)
prompt_template_langchain = prompt_template.to_langchain_template()
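
The converted object can then be used like any other LangChain chat prompt template. A usage sketch, assuming standard LangChain semantics:

messages = prompt_template_langchain.invoke({
    "concept": "list comprehension",
    "programming_language": "Python",
})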

Notes on compatibility with transformers

  • transformers provides partial prompt input standardization via chat_templates following the OpenAI messages format (see the sketch after this list).
  • Limitations:
    • The original purpose of these chat_templates is to easily add special tokens that a specific open-source model requires under the hood. The prompt_templates library is designed for prompt templates for any LLM, not just open-source LLMs.
    • VLMs seem to require special pre-processors that are not directly compatible with the standardized messages format. New VLMs like InternVL or Molmo also often require non-standardized remote code for image preprocessing.
    • LLMs like command-r have special prompts, e.g. for grounded generation, but provide their own custom remote code for preparing these special prompts properly.
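
For open-weight models, the populated messages list can be passed to a tokenizer's chat_template. A sketch (the model name is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # illustrative model
text = tokenizer.apply_chat_template(
    prompt,  # the populated messages list from the example above
    tokenize=False,
    add_generation_prompt=True,
)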

Existing prompt template repos

  • LangChain Hub for prompts (the main hub is proprietary; see the old public OSS repo, which uses JSON or YAML with {...} for template variables)
  • LangGraph Templates (underlying data structure unclear, does not seem to have a collaborative way of sharing templates)
  • LlamaHub (seems to use GitHub as backend)
  • Deepset Prompt Hub (seems to be no longer maintained; used YAML with {...} for template variables)
  • distilabel templates and tasks (source) (using pure jinja2 with {{ ... }} for template variables)
  • Langfuse (no public prompt repo; uses JSON internally with {{...}} for template variables)
  • Promptify (not maintained anymore, used jinja1 and {{ ... }} for template variables)