Standardizing and Sharing Tools
What are LLM tools?
Imagine you want to build a financial chatbot. For a good chatbot, generating convincing text is not enough: you might also want it to fetch recent financial information or do calculations. While LLMs can only generate text, their text output can be used as input to external code, which performs some useful action. This external code is called a "function" or a "tool".
Different companies use slightly different terminology and implementations for this same idea. OpenAI uses the term "function calling" when text input for a single function is produced, and the term "tool use" when an LLM assistant has some autonomy to produce text input for one out of several functions. Anthropic primarily uses the term "tool use" (with "function calling" as a synonym), as does Mistral. Open-source inference engines like TGI and vLLM follow the same pattern and have converged on OpenAI's API specification. (Note that these APIs all follow the JsonAgent paradigm, which differs slightly from the CodeAgent paradigm.)
Main components of tools
LLM tools have the following main components:
- A textual description of the tool, including its inputs and outputs. This description is passed into the LLM's prompt so that the LLM can produce text outputs that match the tool's expected inputs. For closed-source LLMs, this integration of the tool description into the prompt is hidden.
- Code that implements the tool. For example, a simple Python function that takes a search query text as input, makes an API call, and returns ranked search results as output.
- A compute environment in which the tool's code is executed. This can be, for example, the development environment on your local computer or a Docker container running on a cloud CPU.
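The three components above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's exact format: the `search` tool, its description dictionary, and the toy document list are all hypothetical, and the "API call" is replaced by a local keyword match so the example runs offline.

```python
import json

# Component 1: textual description of the tool (hypothetical, JSON-schema style).
SEARCH_TOOL_DESCRIPTION = {
    "name": "search",
    "description": "Return documents ranked by relevance to a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query text."}
        },
        "required": ["query"],
    },
}

# Toy corpus standing in for an external search API (assumption for this sketch).
_DOCUMENTS = [
    "Central bank raises interest rates",
    "Tech stocks rally after earnings",
    "Bond yields fall as interest rates hold",
]


# Component 2: code that implements the tool.
def search(query: str) -> list[str]:
    """Rank the toy documents by how many query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words & set(doc.lower().split())), doc) for doc in _DOCUMENTS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]


# Component 3: the compute environment is simply this local Python process.
print(json.dumps(SEARCH_TOOL_DESCRIPTION, indent=2))
print(search("interest rates"))
```

The description can be serialized to a JSON string or kept as a Python dictionary; either way, it is decoupled from where the function actually runs.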
Current formats for sharing tools
The tutorials of LLM API providers format tools either as Python dictionaries or JSON strings (OpenAI, Anthropic, Mistral, TGI, vLLM), which are integrated into example scripts.
LLM agent libraries all have their own implementations of tools for their library: LangChain Tools, LangChain Community Tools and Agent Toolkits; LlamaHub; CrewAI Tools (including a wrapper for using LangChain and LlamaHub tools); AutoGen (including a LangChain tool wrapper); Transformers Agents; etc.
As all of these libraries and their tool collections are hosted on GitHub, GitHub has indirectly become the main platform for sharing LLM tools today, although it was not designed for this purpose.
The main standardizing force for LLM tools is the set of API specifications and expected JSON input formats of LLM API providers. As OpenAI is the main standard setter, most libraries are compatible with the JSON input format specified in the OpenAI function/tool calling guide and docs. In the field of agents, this has led to the JSON agent paradigm. (Note that this requirement of LLM API compatibility is unnecessary in the code agent paradigm, where the LLM writes executable code itself instead of only writing the structured input for existing code.)
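The difference between the two paradigms can be sketched as follows. The LLM outputs here are hard-coded strings standing in for real model responses, the `{"name": ..., "arguments": ...}` shape is a simplified, hypothetical version of provider formats, and `exec` is used only for illustration: real code agents need a sandboxed interpreter.

```python
import json


def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b


def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b


# Registry mapping tool names to their implementations.
TOOLS = {"add": add, "multiply": multiply}


def run_json_tool_call(llm_output: str):
    """JSON agent: the LLM only writes structured input for existing code."""
    call = json.loads(llm_output)
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])


def run_code(llm_code: str):
    """Code agent: the LLM writes code that calls the tools directly.

    WARNING: exec on untrusted text is unsafe outside a sandbox.
    """
    namespace = dict(TOOLS)
    exec(llm_code, namespace)
    return namespace["result"]


# Simulated model outputs (assumptions for this sketch).
json_output = '{"name": "multiply", "arguments": {"a": 6, "b": 7}}'
code_output = "result = add(multiply(6, 7), 3)"

print(run_json_tool_call(json_output))  # 42
print(run_code(code_output))  # 45
```

Note that the JSON agent needs one round trip per tool call, while the code agent composes two tools in a single output.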
Reflections on the best formats for standardizing tools
The most elegant and universal way of creating a tool is probably a .py file containing a function with a docstring (used e.g. by CrewAI, AutoGen, LangChain and Transformers Agents). This combines the executable function code with the textual description of the tool, via the docstring, in a standardized way.
For JsonAgents, the function's docstring can be parsed to construct the expected input for the LLM API, and the API then returns the required inputs for the function in the .py file. For CodeAgents, the function can be passed directly into the LLM's prompt, and the .py file is directly executable.
Alternatively, tools could be shared as .json files, but this would decouple the tool's description (in the .json file) from its code (e.g. in a .py file).
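As a rough sketch of the JsonAgent case, a tool description can be derived from a function's signature and docstring with the standard library alone. The helper `function_to_tool_schema` and the example tool `convert_currency` are hypothetical names for this illustration; the output shape imitates the OpenAI-style tool format, and only a few basic type hints are mapped.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python type hints to JSON schema types (deliberately incomplete).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(func):
    """Build an OpenAI-style tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    parameters = inspect.signature(func).parameters
    properties = {
        # Unannotated or unmapped types fall back to "string" in this sketch.
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
        for name in parameters
    }
    required = [
        name
        for name, param in parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def convert_currency(amount: float, rate: float, decimals: int = 2) -> float:
    """Convert an amount using a fixed exchange rate."""
    return round(amount * rate, decimals)


schema = function_to_tool_schema(convert_currency)
print(schema["function"]["parameters"]["required"])  # ['amount', 'rate']
```

Because the description is generated from the function itself, the two cannot drift apart, which is the main argument for the single .py file format.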
Current implementation in transformers.agents
transformers.agents currently has Tool.push_to_hub, which pushes tools to the Hub as a Space. Some tools and prompts have been stored on the Hub in this way. This makes sense if users want a hosted tool with compute. The modularity and interoperability of this approach, however, can probably be improved. Tools as single functions in .py files would be independent units that can be reused more easily by others and would be more interoperable with other libraries.