Autogen

22/11/2024

What is AutoGen

AutoGen is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks. AutoGen aims to provide an easy-to-use and flexible framework for accelerating development and research on agentic AI, much as PyTorch does for deep learning. It offers features such as agents that can converse with other agents, LLM and tool use support, autonomous and human-in-the-loop workflows, and multi-agent conversation patterns.

Image

Main Features

  • AutoGen enables building next-generation LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of complex LLM workflows, helping to maximize the performance of LLMs and mitigate their weaknesses.
  • It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns that vary in conversation autonomy, number of agents, and agent conversation topology.
  • It provides a collection of working systems spanning a wide range of applications, domains, and levels of complexity, which demonstrates how easily AutoGen supports diverse conversation patterns.

Install AutoGen

AutoGen requires Python >= 3.8 and < 3.13. It can be installed with pip:

pip install autogen-agentchat~=0.2

Agents

In AutoGen, an agent is an entity that can send and receive messages to and from other agents in its environment. An agent can be powered by models (such as a large language model like GPT-4), code executors (such as an IPython kernel), humans, or a combination of these and other pluggable and customizable components.

Image 1

LLMs, for example, enable agents to converse in natural languages and transform between structured and unstructured text. The following example shows a ConversableAgent with a GPT-4 LLM switched on and other components switched off:

import os

from autogen import ConversableAgent

agent = ConversableAgent(
    "chatbot",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ.get("OPENAI_API_KEY")}]},
    code_execution_config=False,  # Turn off code execution (it is off by default).
    function_map=None,  # No registered functions (None by default).
    human_input_mode="NEVER",  # Never ask for human input.
)

You can ask this agent to generate a response to a question using the generate_reply method:

reply = agent.generate_reply(messages=[{"content": "Tell me a joke.", "role": "user"}])
print(reply)
Sure, here's a light-hearted joke for you:

Why don't scientists trust atoms?

Because they make up everything!
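To make the idea of an agent that "sends and receives messages" concrete without an API key, here is a minimal pure-Python sketch of an agent whose reply source is pluggable. ToyAgent, reply_fn, and canned_joke are hypothetical names for illustration; this is not AutoGen's ConversableAgent, only a model of the idea that replies can come from an LLM, a code runner, a human, or, here, a canned function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a name plus a pluggable reply function."""

    name: str
    reply_fn: Callable[[list], str]  # could wrap an LLM call, a code runner, or input()
    inbox: list = field(default_factory=list)

    def receive(self, message: dict) -> None:
        # Keep every incoming message so the agent has its own history.
        self.inbox.append(message)

    def send(self, content: str, recipient: "ToyAgent") -> None:
        # Deliver a message to another agent, tagged with the sender's name.
        recipient.receive({"role": "user", "name": self.name, "content": content})

    def generate_reply(self, messages: list) -> str:
        # The reply comes from whatever component was plugged in.
        return self.reply_fn(messages)


def canned_joke(messages: list) -> str:
    # Stand-in for an LLM: ignores the prompt and returns a fixed joke.
    return "Why don't scientists trust atoms? Because they make up everything!"


bot = ToyAgent("chatbot", canned_joke)
user = ToyAgent("user", lambda messages: "")
user.send("Tell me a joke.", bot)
reply = bot.generate_reply(bot.inbox)
print(reply)
```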

Roles and Conversations

In AutoGen, you can assign roles to agents and have them participate in conversations or chat with each other. A conversation is a sequence of messages exchanged between agents, and you can use such conversations to make progress on a task. In the example below, we assign different roles to two agents by setting their system_message.

cathy = ConversableAgent(
    "cathy",
    system_message="Your name is Cathy and you are a part of a duo of comedians.",
    llm_config={"config_list": [{"model": "gpt-4", "temperature": 0.9, "api_key": os.environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",  # Never ask for human input.
)

joe = ConversableAgent(
    "joe",
    system_message="Your name is Joe and you are a part of a duo of comedians.",
    llm_config={"config_list": [{"model": "gpt-4", "temperature": 0.7, "api_key": os.environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",  # Never ask for human input.
)

Now that we have two comedian agents, we can ask them to start a comedy show using the initiate_chat method. We set max_turns to 2 to keep the conversation short.

result = joe.initiate_chat(cathy, message="Cathy, tell me a joke.", max_turns=2)
joe (to cathy):

Cathy, tell me a joke.

--------------------------------------------------------------------------------
cathy (to joe):

Sure, here's one for you:

Why don't scientists trust atoms?

Because they make up everything!

--------------------------------------------------------------------------------
joe (to cathy):

Haha, that's a good one, Cathy! Okay, my turn. 

Why don't we ever tell secrets on a farm?

Because the potatoes have eyes, the corn has ears, and the beans stalk.

--------------------------------------------------------------------------------
cathy (to joe):

Haha, that's a great one! A farm is definitely not the place for secrets. Okay, my turn again. 

Why couldn't the bicycle stand up by itself?

Because it was two-tired!

--------------------------------------------------------------------------------

Chat Termination

Parameters in initiate_chat

If we increase max_turns to, say, 3, notice that the conversation takes more rounds to terminate:

result = joe.initiate_chat(
    cathy, message="Cathy, tell me a joke.", max_turns=3
)  # increase the number of max turns before termination
joe (to cathy):

Cathy, tell me a joke.

--------------------------------------------------------------------------------
cathy (to joe):

Sure, here's one for you:

Why don't scientists trust atoms?

Because they make up everything!

--------------------------------------------------------------------------------
joe (to cathy):

Haha, that's a good one, Cathy! Okay, my turn. 

Why don't we ever tell secrets on a farm?

Because the potatoes have eyes, the corn has ears, and the beans stalk.

--------------------------------------------------------------------------------
cathy (to joe):

Haha, that's a great one! A farm is definitely not the place for secrets. Okay, my turn again. 

Why couldn't the bicycle stand up by itself?

Because it was two-tired!

--------------------------------------------------------------------------------
joe (to cathy):

Haha, that's a wheely good one, Cathy!

Why did the golfer bring two pairs of pants?

In case he got a hole in one!

--------------------------------------------------------------------------------
cathy (to joe):

Haha, that's a perfect swing of a joke!

Why did the scarecrow win an award?

Because he was outstanding in his field!

--------------------------------------------------------------------------------
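Judging from the two transcripts, one turn is one message from the initiator followed by one reply from the recipient, so max_turns=2 produced four messages and max_turns=3 produced six. The sketch below models only that counting with scripted speakers; run_chat and scripted are hypothetical helpers, not AutoGen's implementation, which also applies the agent-side termination checks described next.

```python
from typing import Callable

Reply = Callable[[list], str]


def run_chat(initiator_reply: Reply, recipient_reply: Reply, message: str, max_turns: int) -> list:
    """Simplified model of max_turns: one turn = one initiator message + one recipient reply."""
    history = []
    for turn in range(max_turns):
        if turn > 0:
            # After the opening message, the initiator answers the conversation so far.
            message = initiator_reply(history)
        history.append(("initiator", message))
        history.append(("recipient", recipient_reply(history)))
    return history


def scripted(speaker: str) -> Reply:
    # Hypothetical stand-in for an LLM: labels each line with the current history length.
    return lambda history: f"{speaker} line {len(history)}"


for turns in (2, 3):
    chat = run_chat(scripted("joe"), scripted("cathy"), "Cathy, tell me a joke.", turns)
    print(turns, "turns ->", len(chat), "messages")
```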

Agent-triggered termination

Using max_consecutive_auto_reply

In the example below, let’s set max_consecutive_auto_reply to 1 for Joe and notice how this ensures that Joe auto-replies only once: after Cathy answers his single reply, the chat ends.

joe = ConversableAgent(
    "joe",
    system_message="Your name is Joe and you are a part of a duo of comedians.",
    llm_config={"config_list": [{"model": "gpt-4", "temperature": 0.7, "api_key": os.environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",  # Never ask for human input.
    max_consecutive_auto_reply=1,  # Limit the number of consecutive auto-replies.
)

result = joe.initiate_chat(cathy, message="Cathy, tell me a joke.")
joe (to cathy):

Cathy, tell me a joke.

--------------------------------------------------------------------------------
cathy (to joe):

Sure, here's one for you:

Why don't scientists trust atoms?

Because they make up everything!

--------------------------------------------------------------------------------
joe (to cathy):

Haha, that's a good one, Cathy! Okay, my turn. 

Why don't we ever tell secrets on a farm?

Because the potatoes have eyes, the corn has ears, and the beans stalk.

--------------------------------------------------------------------------------
cathy (to joe):

Haha, that's a great one! A farm is definitely not the place for secrets. Okay, my turn again. 

Why couldn't the bicycle stand up by itself?

Because it was two-tired!

--------------------------------------------------------------------------------
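The stop point follows from counting: Joe's opening message is not an auto-reply, his first answer to Cathy uses up his single auto-reply, and when Cathy answers again he has none left. A minimal sketch of that counting, assuming only Joe has a limit and every reply is automatic (human_input_mode="NEVER"); speakers_with_limit is a hypothetical name, not part of AutoGen.

```python
def speakers_with_limit(max_consecutive_auto_reply: int, safety_cap: int = 20) -> list:
    """Who speaks when only the initiator has an auto-reply limit (simplified model)."""
    speakers = ["joe"]  # the opening message from initiate_chat is not an auto-reply
    auto_replies = 0
    while len(speakers) < safety_cap:
        speakers.append("cathy")  # Cathy has no limit, so she always answers
        if auto_replies >= max_consecutive_auto_reply:
            break  # Joe has used up his auto-replies, so the chat ends
        speakers.append("joe")
        auto_replies += 1
    return speakers


print(speakers_with_limit(1))
```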

Using is_termination_msg

Let’s make Joe treat any message containing “GOOD BYE” (in any letter case) as a termination message and see how the conversation terminates.

joe = ConversableAgent(
    "joe",
    system_message="Your name is Joe and you are a part of a duo of comedians.",
    llm_config={"config_list": [{"model": "gpt-4", "temperature": 0.7, "api_key": os.environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",  # Never ask for human input.
    is_termination_msg=lambda msg: "good bye" in msg["content"].lower(),
)

result = joe.initiate_chat(cathy, message="Cathy, tell me a joke and then say the words GOOD BYE.")
joe (to cathy):

Cathy, tell me a joke and then say the words GOOD BYE.

--------------------------------------------------------------------------------
cathy (to joe):

Why don't scientists trust atoms?

Because they make up everything!

GOOD BYE!

--------------------------------------------------------------------------------
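The lambda above assumes every message has a string under "content". A message that carries only a tool call can have its content set to None (the tool example later in this post guards against exactly this), and msg["content"].lower() would then raise an AttributeError. A slightly more defensive predicate, sketched here under the hypothetical name says_good_bye, could be passed as is_termination_msg=says_good_bye:

```python
def says_good_bye(msg: dict) -> bool:
    """Termination check: True if the message text contains "good bye" in any case."""
    content = msg.get("content") or ""  # treat a missing or None content as empty text
    return "good bye" in content.lower()


print(says_good_bye({"content": "GOOD BYE!"}))  # matches regardless of case
print(says_good_bye({"content": None}))  # a tool-call-only message does not crash
```

Note that a plain substring test also misses variants such as “Goodbye”, so the prompt and the predicate need to agree on the exact phrase.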

Human in the Loop

Many applications require putting humans in the loop with agents, for example to let human feedback steer agents in the right direction or specify goals. In this section, we show how AutoGen supports human intervention.

In AutoGen’s ConversableAgent, the human-in-the-loop component sits in front of the auto-reply components. It can intercept the incoming messages and decide whether to pass them to the auto-reply components or to provide human feedback. The figure below illustrates the design.

Image 2

Human Input Modes

Currently AutoGen supports three modes for human input. The mode is specified through the human_input_mode argument of the ConversableAgent. The three modes are:

  1. NEVER: human input is never requested.
  2. TERMINATE (default): human input is only requested when a termination condition is met. Note that in this mode if the human chooses to intercept and reply, the conversation continues and the counter used by max_consecutive_auto_reply is reset.
  3. ALWAYS: human input is always requested and the human can choose to skip and trigger an auto-reply, intercept and provide feedback, or terminate the conversation. Note that in this mode termination based on max_consecutive_auto_reply is ignored.
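The three modes reduce to a simple rule for when the human is consulted before an auto-reply. The sketch below encodes that rule as stated in the list above; should_request_human_input and the termination_condition_met flag are hypothetical, and the real agent additionally handles what happens after the human answers (skip, reply, or exit).

```python
def should_request_human_input(mode: str, termination_condition_met: bool) -> bool:
    """Decide whether to prompt the human before auto-replying (simplified model).

    termination_condition_met stands for "is_termination_msg matched or
    max_consecutive_auto_reply was reached".
    """
    if mode == "ALWAYS":
        return True  # every incoming message goes to the human first
    if mode == "TERMINATE":
        return termination_condition_met  # consult the human only at a stopping point
    if mode == "NEVER":
        return False  # fully autonomous
    raise ValueError(f"unknown human_input_mode: {mode!r}")


for mode in ("NEVER", "TERMINATE", "ALWAYS"):
    print(mode, [should_request_human_input(mode, met) for met in (False, True)])
```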

Human Input Mode = NEVER

Here is an example of using this mode to run a simple guess-a-number game between two agents. The termination check looks for the correct number (53) in the incoming message.

import os

from autogen import ConversableAgent

agent_with_number = ConversableAgent(
    "agent_with_number",
    system_message="You are playing a game of guess-my-number. You have the "
    "number 53 in your mind, and I will try to guess it. "
    "If I guess too high, say 'too high', if I guess too low, say 'too low'. ",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    is_termination_msg=lambda msg: "53" in msg["content"],  # terminate if the number is guessed by the other agent
    human_input_mode="NEVER",  # never ask for human input
)

agent_guess_number = ConversableAgent(
    "agent_guess_number",
    system_message="I have a number in my mind, and you will try to guess it. "
    "If I say 'too high', you should guess a lower number. If I say 'too low', "
    "you should guess a higher number. ",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

result = agent_with_number.initiate_chat(
    agent_guess_number,
    message="I have a number between 1 and 100. Guess it!",
)
agent_with_number (to agent_guess_number):

I have a number between 1 and 100. Guess it!

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 50?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Too low.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 75?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Too high.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 63?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Too high.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 57?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Too high.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 54?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Too high.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 52?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Too low.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 53?

--------------------------------------------------------------------------------
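The guessing agent above roughly follows a halving strategy. A deterministic sketch of the same game shows why only a handful of rounds are needed, and how the termination check behaves: like the original lambda, it looks for the secret number as a substring of the guesser's message. play_guess_my_number is a hypothetical helper; the LLM's actual guesses differ slightly from exact bisection.

```python
def play_guess_my_number(secret: int, low: int = 1, high: int = 100, max_rounds: int = 20) -> list:
    """Bisection guesser against a host that answers 'Too low.' or 'Too high.'."""

    def is_termination_msg(content: str) -> bool:
        # Same idea as the lambda above: stop once the secret appears in a message.
        return str(secret) in content

    transcript = []
    for _ in range(max_rounds):
        guess = (low + high) // 2  # halve the remaining range
        question = f"Is it {guess}?"
        transcript.append(question)
        if is_termination_msg(question):
            break  # the host stops replying, as in the transcript above
        if guess < secret:
            transcript.append("Too low.")
            low = guess + 1
        else:
            transcript.append("Too high.")
            high = guess - 1
    return transcript


print([line for line in play_guess_my_number(53) if line.startswith("Is it")])
print(play_guess_my_number(5))
```

The second call exposes a limitation of substring matching: with a secret of 5, the guess “Is it 50?” already contains “5” and ends the game early. For 53 in the range 1 to 100 the check is safe, but a word-boundary match such as re.search(rf"\b{secret}\b", content) is more robust.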

Human Input Mode = ALWAYS

In this mode, human input is always requested, and the human can choose to skip, intercept, or terminate the conversation. Let’s see this mode in action by playing the same game against agent_with_number from before, but this time taking part as a human: we act as the guesser through a human proxy agent.

human_proxy = ConversableAgent(
    "human_proxy",
    llm_config=False,  # no LLM used for human proxy
    human_input_mode="ALWAYS",  # always ask for human input
)

# Start a chat with the agent with number with an initial guess.
result = human_proxy.initiate_chat(
    agent_with_number,  # this is the same agent with the number as before
    message="10",
)
human_proxy (to agent_with_number):

10

--------------------------------------------------------------------------------
agent_with_number (to human_proxy):

Too low.

--------------------------------------------------------------------------------
human_proxy (to agent_with_number):

79

--------------------------------------------------------------------------------
agent_with_number (to human_proxy):

Too high.

--------------------------------------------------------------------------------
human_proxy (to agent_with_number):

76

--------------------------------------------------------------------------------
agent_with_number (to human_proxy):

Too high.

--------------------------------------------------------------------------------
human_proxy (to agent_with_number):

I give up

--------------------------------------------------------------------------------
agent_with_number (to human_proxy):

That's okay! The number I was thinking of was 53.

--------------------------------------------------------------------------------

Human Input Mode = TERMINATE

In this mode, human input is only requested when a termination condition is met. If the human chooses to intercept and reply, the max_consecutive_auto_reply counter is reset; if the human chooses to skip, the automatic reply mechanism is used; if the human chooses to terminate, the conversation ends.

agent_with_number = ConversableAgent(
    "agent_with_number",
    system_message="You are playing a game of guess-my-number. "
    "In the first game, you have the "
    "number 53 in your mind, and I will try to guess it. "
    "If I guess too high, say 'too high', if I guess too low, say 'too low'. ",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    max_consecutive_auto_reply=1,  # maximum number of consecutive auto-replies before asking for human input
    is_termination_msg=lambda msg: "53" in msg["content"],  # terminate if the number is guessed by the other agent
    human_input_mode="TERMINATE",  # ask for human input only when a termination condition is met
)

agent_guess_number = ConversableAgent(
    "agent_guess_number",
    system_message="I have a number in my mind, and you will try to guess it. "
    "If I say 'too high', you should guess a lower number. If I say 'too low', "
    "you should guess a higher number. ",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

result = agent_with_number.initiate_chat(
    agent_guess_number,
    message="I have a number between 1 and 100. Guess it!",
)
agent_with_number (to agent_guess_number):

I have a number between 1 and 100. Guess it!

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 50?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
agent_with_number (to agent_guess_number):

Too low.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 75?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

It is too high my friend. 

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 60?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
agent_with_number (to agent_guess_number):

Too high.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 55?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

still too high, but you are very close.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 52?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
agent_with_number (to agent_guess_number):

Too low.

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 54?

--------------------------------------------------------------------------------
agent_with_number (to agent_guess_number):

Almost there! 

--------------------------------------------------------------------------------
agent_guess_number (to agent_with_number):

Is it 53?

--------------------------------------------------------------------------------

Each time after one auto-reply from the agent with the number, the human was asked to provide feedback. Once the human provided feedback, the counter was reset. The conversation was terminated after the guessing agent correctly guessed “53”.
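The alternation between “USING AUTO REPLY” and human replies follows from the counter rule. A simplified sketch, assuming the human always types a reply when asked (never skips) and that the termination check runs on each incoming guess; reply_sources is a hypothetical name, not an AutoGen function.

```python
def reply_sources(incoming: list, max_consecutive_auto_reply: int, is_termination_msg) -> list:
    """Label each reply as 'auto' or 'human' under human_input_mode="TERMINATE" (simplified)."""
    sources = []
    counter = 0
    for content in incoming:
        if is_termination_msg(content) or counter >= max_consecutive_auto_reply:
            sources.append("human")  # termination condition met, so the human is asked
            counter = 0  # a human reply resets the consecutive auto-reply counter
        else:
            sources.append("auto")
            counter += 1
    return sources


guesses = ["Is it 50?", "Is it 75?", "Is it 60?", "Is it 55?", "Is it 52?", "Is it 54?", "Is it 53?"]
print(reply_sources(guesses, 1, lambda content: "53" in content))
```

The last entry is the termination message itself: the human is consulted once more, and the conversation above ended at that point.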

Code Executors

Overview

In AutoGen, a code executor is a component that takes input messages (e.g., those containing code blocks), performs execution, and outputs messages with the results. AutoGen provides two types of built-in code executors: the command line code executor, which runs code in a command line environment such as a UNIX shell, and the Jupyter executor, which runs code in an interactive Jupyter kernel.
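Before anything runs, an executor has to find the code blocks in a message. A minimal sketch of that step using a regular expression over Markdown-style fences; extract_code_blocks and the pattern are illustrative assumptions rather than AutoGen's own parser (which, per the transcripts, also infers a language when none is given). The fence string is built as "`" * 3 only so that this example can itself sit inside a fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this snippet nests inside a fence

# A fence, an optional language tag, a newline, then everything up to the next fence.
CODE_BLOCK_RE = re.compile(FENCE + r"[ \t]*(\w*)[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(message: str) -> list:
    """Return (language, code) pairs in order of appearance; language may be ''."""
    return CODE_BLOCK_RE.findall(message)


message = (
    f"Here is the code:\n{FENCE}python\nprint('hi')\n{FENCE}\n"
    f"and a shell step:\n{FENCE}sh\necho done\n{FENCE}\n"
)
for language, code in extract_code_blocks(message):
    print(language, repr(code))
```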

For each type of executor, AutoGen provides two ways to execute code: locally and in a Docker container. Local execution runs code directly on the host where AutoGen is running, i.e., the local operating system. It is suitable for development and testing but not ideal for production, because an LLM can generate arbitrary code. Docker execution runs the code inside a container instead. The table below shows the combinations of code executors and execution environments.

Code Executor (autogen.coding)   Environment                      Platform
LocalCommandLineCodeExecutor     Shell                            Local
DockerCommandLineCodeExecutor    Shell                            Docker
jupyter.JupyterCodeExecutor      Jupyter Kernel (e.g., python3)   Local/Docker

In this blog, we will focus on the command line code executors. For the Jupyter code executor, please refer to the topic page for Jupyter Code Executor.

Local Execution

Upon receiving a message with a code block, the local command line code executor first writes the code block to a code file, then starts a new subprocess to execute the code file. The executor reads the console output of the code execution and sends it back as a reply message.

Image 3

Here is an example of using the code executor to run a Python code block that saves a scatter plot of random numbers (the code block needs numpy and matplotlib installed). First, we create an agent with a code executor that uses a temporary directory to store the code files. We specify human_input_mode="ALWAYS" to manually validate the safety of the code being executed.

import tempfile

from autogen import ConversableAgent
from autogen.coding import LocalCommandLineCodeExecutor

# Create a temporary directory to store the code files.
temp_dir = tempfile.TemporaryDirectory()

# Create a local command line code executor.
executor = LocalCommandLineCodeExecutor(
    timeout=10,  # Timeout for each code execution in seconds.
    work_dir=temp_dir.name,  # Use the temporary directory to store the code files.
)

# Create an agent with code executor configuration.
code_executor_agent = ConversableAgent(
    "code_executor_agent",
    llm_config=False,  # Turn off LLM for this agent.
    code_execution_config={"executor": executor},  # Use the local command line code executor.
    human_input_mode="ALWAYS",  # Always take human input for this agent for safety.
)
message_with_code_block = """This is a message with code block.
The code block is below:
```python
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randint(0, 100, 100)
y = np.random.randint(0, 100, 100)
plt.scatter(x, y)
plt.savefig('scatter.png')
print('Scatter plot saved to scatter.png')
```
This is the end of the message.
"""

# Generate a reply for the given code.
reply = code_executor_agent.generate_reply(messages=[{"role": "user", "content": message_with_code_block}])
print(reply)

While generating the response, the agent requests human input, giving us an opportunity to intercept the code execution. In this case, we chose to continue (no input was entered), and the agent’s reply contains the output of the code execution.


>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...
exitcode: 0 (execution succeeded)
Code output: 
Scatter plot saved to scatter.png
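Stripped to its core, the write-then-run loop described above can be sketched with the standard library alone. run_python_block, the fixed file name tmp_code.py, and the exit code 124 for timeouts (borrowed from the coreutils timeout convention) are assumptions for illustration; this is not LocalCommandLineCodeExecutor's source, and like any local execution it provides no sandboxing.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python_block(code: str, work_dir: str, timeout: float = 10.0) -> tuple:
    """Write code to a file in work_dir, run it in a new process, return (exit_code, output)."""
    code_file = Path(work_dir) / "tmp_code.py"  # hypothetical fixed file name
    code_file.write_text(code)
    try:
        result = subprocess.run(
            [sys.executable, code_file.name],  # same interpreter that runs this script
            cwd=work_dir,  # files the code writes (e.g. scatter.png) land in work_dir
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 124, f"Timed out after {timeout} seconds"
    # The console output is what would be sent back as the reply message.
    return result.returncode, result.stdout + result.stderr


with tempfile.TemporaryDirectory() as work_dir:
    exit_code, output = run_python_block("print('Scatter plot saved to scatter.png')", work_dir)
    print(f"exitcode: {exit_code}")
    print(output)
```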

Docker Execution

To mitigate the security risk of running LLM-generated code locally, we can use the Docker command line code executor (autogen.coding.DockerCommandLineCodeExecutor) to execute code in a Docker container. This way, the generated code can only access resources that are explicitly given to it.

Image 4

Similar to the local command line code executor, the Docker executor extracts code blocks from input messages and writes them to code files. For each code file, it starts a Docker container to execute the file and reads the console output of the execution.

To use Docker execution, you need to install Docker on your machine. Once Docker is installed and running, you can set up your code executor agent as follows:

from autogen.coding import DockerCommandLineCodeExecutor

# Create a temporary directory to store the code files.
temp_dir = tempfile.TemporaryDirectory()

# Create a Docker command line code executor.
executor = DockerCommandLineCodeExecutor(
    image="python:3.12-slim",  # Execute code using the given docker image name.
    timeout=10,  # Timeout for each code execution in seconds.
    work_dir=temp_dir.name,  # Use the temporary directory to store the code files.
)

# Create an agent with code executor configuration that uses docker.
code_executor_agent_using_docker = ConversableAgent(
    "code_executor_agent_docker",
    llm_config=False,  # Turn off LLM for this agent.
    code_execution_config={"executor": executor},  # Use the docker command line code executor.
    human_input_mode="ALWAYS",  # Always take human input for this agent for safety.
)

# When the code executor is no longer used, stop it to release the resources.
# executor.stop()
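For intuition about what "only resources explicitly given to it" means, here is a sketch of a plain docker run invocation that mounts just the work directory. docker_run_command is a hypothetical helper and the /workspace container path is an assumption; AutoGen's executor manages its containers itself, so treat this as an illustration of the isolation idea rather than what it runs internally. The --network none option is extra hardening that the AutoGen example above does not use.

```python
from pathlib import Path


def docker_run_command(image: str, work_dir: str, code_file: str, allow_network: bool = False) -> list:
    """Build a `docker run` command that executes one Python file from work_dir in a throwaway container."""
    host_dir = str(Path(work_dir).resolve())  # bind mounts need an absolute host path
    cmd = [
        "docker", "run",
        "--rm",  # delete the container once the script exits
        "-v", f"{host_dir}:/workspace",  # expose only the work directory
        "-w", "/workspace",  # run from inside the mounted directory
    ]
    if not allow_network:
        cmd += ["--network", "none"]  # no network access for generated code
    cmd += [image, "python", code_file]
    return cmd


print(" ".join(docker_run_command("python:3.12-slim", ".", "tmp_code.py")))
# With Docker installed, the list could be passed to
# subprocess.run(cmd, capture_output=True, text=True, timeout=10).
```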

Use Code Execution in Conversation

Writing and executing code is necessary for many tasks such as data analysis, machine learning, and mathematical modeling. In AutoGen, coding can be a conversation between a code writer agent and a code executor agent, mirroring the interaction between a programmer and a code interpreter.

Image 5

The code writer agent can be powered by an LLM with code-writing capability, such as GPT-4, while the code executor agent is powered by a code executor.

The following is an agent with a code writer role specified using system_message. The system message contains important instructions on how to use the code executor in the code executor agent.

# The code writer agent's system message is to instruct the LLM on how to use
# the code executor in the code executor agent.
code_writer_system_message = """You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Reply 'TERMINATE' in the end when everything is done.
"""

code_writer_agent = ConversableAgent(
    "code_writer_agent",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    code_execution_config=False,  # Turn off code execution for this agent.
)
chat_result = code_executor_agent.initiate_chat(
    code_writer_agent,
    message="Write Python code to calculate the 14th Fibonacci number.",
)
code_executor_agent (to code_writer_agent):

Write Python code to calculate the 14th Fibonacci number.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
code_writer_agent (to code_executor_agent):

Sure, here is a Python code snippet to calculate the 14th Fibonacci number. The Fibonacci series is a sequence of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1.

```python
def fibonacci(n):
    if(n <= 0):
        return "Input should be a positive integer."
    elif(n == 1):
        return 0
    elif(n == 2):
        return 1
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib[n-1]

print(fibonacci(14))
```

This Python code defines a function `fibonacci(n)` which computes the n-th Fibonacci number. The function uses a list `fib` to store the Fibonacci numbers as they are computed, and then returns the (n-1)-th element as the n-th Fibonacci number due to zero-indexing in Python lists.

--------------------------------------------------------------------------------

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...
code_executor_agent (to code_writer_agent):

exitcode: 0 (execution succeeded)
Code output: 
233

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
code_writer_agent (to code_executor_agent):

Great, the execution was successful and the 14th Fibonacci number is 233. The sequence goes as follows: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233... and so on, where each number is the sum of the previous two. Therefore, the 14th number in the Fibonacci series is 233. 

I hope this meets your expectations. If you have any other concerns or need further computations, feel free to ask.

TERMINATE

--------------------------------------------------------------------------------
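The answer of 233 depends on an indexing convention: the generated function counts 0 as the 1st term. Under the common convention F(0) = 0, F(1) = 1, the 14th Fibonacci number F(14) is 377. A short check of both readings (the function names are illustrative):

```python
def fib(n: int) -> int:
    """Standard convention: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def nth_term_starting_from_zero(n: int) -> int:
    """The convention used by the generated code: the 1st term is 0, the 2nd is 1."""
    return fib(n - 1)


print(fib(14), nth_term_starting_from_zero(14))  # 377 233
```

Stating the convention in the task message removes the ambiguity.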

Tool Use

Tools are pre-defined functions that agents can use. Instead of writing arbitrary code, agents can call tools to perform actions, such as searching the web, performing calculations, reading files, or calling remote APIs. Because you can control what tools are available to an agent, you can control what actions an agent can perform.

Creating Tools

Tools can be created as regular Python functions. For example, let’s create a calculator tool which can only perform a single operation at a time.

from typing import Annotated, Literal

Operator = Literal["+", "-", "*", "/"]

def calculator(a: int, b: int, operator: Annotated[Operator, "operator"]) -> int:
    if operator == "+":
        return a + b
    elif operator == "-":
        return a - b
    elif operator == "*":
        return a * b
    elif operator == "/":
        return int(a / b)
    else:
        raise ValueError("Invalid operator")
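Two details of this tool are worth noting: it performs one operation per call, so a compound expression has to be split into several calls, and its "/" branch truncates toward zero with int(a / b). Evaluating the expression from the tool-use conversation later in this section one step at a time (one plausible decomposition, not necessarily the order the LLM will choose) shows how the truncation changes the result:

```python
from typing import Annotated, Literal

Operator = Literal["+", "-", "*", "/"]


def calculator(a: int, b: int, operator: Annotated[Operator, "operator"]) -> int:
    # Same tool as above: one operation per call, integer results.
    if operator == "+":
        return a + b
    elif operator == "-":
        return a - b
    elif operator == "*":
        return a * b
    elif operator == "/":
        return int(a / b)  # truncates toward zero, e.g. 66.56 -> 66
    else:
        raise ValueError("Invalid operator")


# (44232 + 13312 / (232 - 32)) * 5, one calculator call per operation.
step1 = calculator(232, 32, "-")  # 200
step2 = calculator(13312, step1, "/")  # 66
step3 = calculator(44232, step2, "+")  # 44298
step4 = calculator(step3, 5, "*")  # 221490

exact = (44232 + 13312 / (232 - 32)) * 5  # about 221492.8 with true division
print(step4, round(exact, 2))
```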

Registering Tools

Once you have created a tool, you can register it with the agents that are involved in the conversation.

import os

from autogen import ConversableAgent

# Let's first define the assistant agent that suggests tool calls.
assistant = ConversableAgent(
    name="Assistant",
    system_message="You are a helpful AI assistant. "
    "You can help with simple calculations. "
    "Return 'TERMINATE' when the task is done.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

# The user proxy agent is used for interacting with the assistant agent
# and executes tool calls.
user_proxy = ConversableAgent(
    name="User",
    llm_config=False,
    is_termination_msg=lambda msg: msg.get("content") is not None and "TERMINATE" in msg["content"],
    human_input_mode="NEVER",
)

# Register the tool signature with the assistant agent.
assistant.register_for_llm(name="calculator", description="A simple calculator")(calculator)

# Register the tool function with the user proxy agent.
user_proxy.register_for_execution(name="calculator")(calculator)

In the above code, we registered the calculator function as a tool with the assistant and user proxy agents. We also provided a name and a description for the tool so that the assistant agent can understand its usage.

Always provide a clear and concise description for the tool, as it helps the agent's underlying LLM understand the tool's usage.

Similar to code executors, a tool must be registered with at least two agents for it to be useful in conversation. The agent registered with the tool’s signature through register_for_llm can call the tool; the agent registered with the tool’s function object through register_for_execution can execute the tool’s function.
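The agent registered through register_for_llm sees the tool's signature, not its code. AutoGen derives a JSON schema for the LLM from the function's type hints, which is why the operator parameter is wrapped in Annotated. The sketch below imitates that idea with the standard typing module. sketch_tool_schema and its type mapping are hypothetical, and the schema AutoGen actually generates may differ in its details.

```python
import inspect
from typing import Annotated, Literal, get_args, get_origin, get_type_hints

Operator = Literal["+", "-", "*", "/"]


def calculator(a: int, b: int, operator: Annotated[Operator, "operator"]) -> int:
    ...  # body omitted: only the signature matters for the schema


# Hypothetical mapping from Python types to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_tool_schema(fn, name: str, description: str) -> dict:
    """Build an OpenAI-style function schema from type hints (illustrative only)."""
    hints = get_type_hints(fn, include_extras=True)  # keep Annotated metadata
    properties = {}
    for param in inspect.signature(fn).parameters:
        hint = hints[param]
        prop = {}
        if get_origin(hint) is Annotated:
            hint, *metadata = get_args(hint)
            prop["description"] = metadata[0]  # first annotation used as description
        if get_origin(hint) is Literal:
            choices = get_args(hint)
            prop["type"] = _JSON_TYPES[type(choices[0])]
            prop["enum"] = list(choices)  # allowed values become an enum
        else:
            prop["type"] = _JSON_TYPES[hint]
        properties[param] = prop
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": list(properties)},
    }


schema = sketch_tool_schema(calculator, "calculator", "A simple calculator")
print(schema["parameters"]["properties"]["operator"])
# {'description': 'operator', 'type': 'string', 'enum': ['+', '-', '*', '/']}
```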

Alternatively, you can use the autogen.register_function function to register a tool with both agents at once.

from autogen import register_function

# Register the calculator function to the two agents.
register_function(
    calculator,
    caller=assistant,  # The assistant agent can suggest calls to the calculator.
    executor=user_proxy,  # The user proxy agent can execute the calculator calls.
    name="calculator",  # By default, the function name is used as the tool name.
    description="A simple calculator",  # A description of the tool.
)

Using Tool

Once the tool is registered, we can use it in conversation. In the code below, we ask the assistant to perform some arithmetic calculation using the calculator tool.

chat_result = user_proxy.initiate_chat(assistant, message="What is (44232 + 13312 / (232 - 32)) * 5?")
User (to Assistant):

What is (44232 + 13312 / (232 - 32)) * 5?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_4rElPoLggOYJmkUutbGaSTX1): calculator *****
Arguments: 
{
  "a": 232,
  "b": 32,
  "operator": "-"
}
***************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_4rElPoLggOYJmkUutbGaSTX1) *****
200
**********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_SGtr8tK9A4iOCJGdCqkKR2Ov): calculator *****
Arguments: 
{
  "a": 13312,
  "b": 200,
  "operator": "/"
}
***************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_SGtr8tK9A4iOCJGdCqkKR2Ov) *****
66
**********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_YsR95CM1Ice2GZ7ZoStYXI6M): calculator *****
Arguments: 
{
  "a": 44232,
  "b": 66,
  "operator": "+"
}
***************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_YsR95CM1Ice2GZ7ZoStYXI6M) *****
44298
**********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_oqZn4rTjyvXYcmjAXkvVaJm1): calculator *****
Arguments: 
{
  "a": 44298,
  "b": 5,
  "operator": "*"
}
***************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_oqZn4rTjyvXYcmjAXkvVaJm1) *****
221490
**********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

The result of the calculation is 221490. TERMINATE

--------------------------------------------------------------------------------
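The assistant split the expression into four single-operation tool calls. If you replay those calls in plain Python, you can see why the final answer is 221490 and not the exact value. The calculator's "/" truncates 13312 / 200 = 66.56 to 66, and the error is then multiplied by 5. The snippet below repeats only the division logic of the tool.

```python
import math


def divide_truncated(a: int, b: int) -> int:
    # Mirrors the calculator's "/" branch: int(a / b) truncates toward zero.
    return int(a / b)


# The four tool calls from the transcript, in order.
step1 = 232 - 32                         # 200
step2 = divide_truncated(13312, step1)   # 66 (the exact quotient is 66.56)
step3 = 44232 + step2                    # 44298
step4 = step3 * 5                        # 221490

exact = (44232 + 13312 / (232 - 32)) * 5  # ordinary float arithmetic

print(step4)            # 221490
print(round(exact, 6))  # 221492.8
```

If you need the exact result, one option is to give the tool a float-returning division. The integer version keeps the example simple but loses the fractional part.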

Conversation Patterns

Overview

  1. Two-agent chat: the simplest form of conversation pattern where two agents chat with each other.
  2. Sequential chat: a sequence of chats between two agents, chained together by a carryover mechanism, which brings the summary of the previous chat to the context of the next chat.
  3. Group Chat: a single chat involving more than two agents. An important question in group chat is: What agent should be next to speak? To support different scenarios, we provide different ways to organize agents in a group chat:
    • We support several strategies to select the next agent: round_robin, random, manual (human selection), and auto (Default, using an LLM to decide).
    • We provide a way to constrain the selection of the next speaker (See examples below).
    • We allow you to pass in a function to customize the selection of the next speaker. With this feature, you can build a StateFlow model which allows a deterministic workflow among your agents. Please refer to this guide and this blog post on StateFlow for more details.
  4. Nested Chat: package a workflow into a single agent for reuse in a larger workflow.

Two-Agent Chat and Chat Result

Two-agent chat is the simplest form of conversation pattern. We start a two-agent chat with the initiate_chat method, which every ConversableAgent provides. We have already seen multiple examples of two-agent chats, but we haven’t covered the details.

[Figure: two-agent chat]

A two-agent chat takes two inputs: a message, which is a string provided by the caller, and a context, which specifies various parameters of the chat. The sender agent uses its chat initializer method (i.e., the generate_init_message method of ConversableAgent) to generate an initial message from the inputs and sends it to the recipient agent to start the chat. The sender agent is the agent whose initiate_chat method is called, and the recipient agent is the other agent.

Once the chat terminates, the history of the chat is processed by a chat summarizer. The summarizer summarizes the chat history and calculates the token usage of the chat. You can configure the type of summary using the summary_method parameter of the initiate_chat method. By default, it is the last message of the chat (i.e., summary_method='last_msg').
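Conceptually, the default last_msg summary is the content of the final message in the chat history. The sketch below shows that idea on a plain list of message dicts, shaped like the chat_history output printed later in this section. last_msg_summary is a hypothetical helper and is not AutoGen code. The reflection_with_llm method requires an LLM call, so it is not sketched here.

```python
from typing import Dict, List

Message = Dict[str, str]


def last_msg_summary(chat_history: List[Message]) -> str:
    """Return the content of the final message, mimicking summary_method='last_msg'."""
    if not chat_history:
        return ""  # assumption: an empty chat yields an empty summary
    return chat_history[-1]["content"]


# A short history like the Adder chat shown later (sender's messages have role 'assistant').
history = [
    {"role": "assistant", "content": "14"},
    {"role": "user", "content": "15"},
    {"role": "assistant", "content": "15"},
    {"role": "user", "content": "16"},
]
print(last_msg_summary(history))  # 16
```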

Example:

import os

from autogen import ConversableAgent

student_agent = ConversableAgent(
    name="Student_Agent",
    system_message="You are a student willing to learn.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)
teacher_agent = ConversableAgent(
    name="Teacher_Agent",
    system_message="You are a math teacher.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

chat_result = student_agent.initiate_chat(
    teacher_agent,
    message="What is triangle inequality?",
    summary_method="reflection_with_llm",
    max_turns=2,
)
Student_Agent (to Teacher_Agent):

What is triangle inequality?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Teacher_Agent (to Student_Agent):

Triangle inequality theorem is a fundamental principle in geometry that states that the sum of the lengths of any two sides of a triangle must always be greater than the length of the third side. In a triangle with sides of lengths a, b, and c, the theorem can be written as:

a + b > c
a + c > b
b + c > a

Each of these represents the condition for one specific side (a, b, or c). All must be true for a triangle to exist.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Student_Agent (to Teacher_Agent):

Thank you for the explanation. This theorem helps in understanding the basic properties of a triangle. It can also be useful when solving geometric problems or proving other mathematical theorems. Can you give me an example of how we can use the triangle inequality theorem?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Teacher_Agent (to Student_Agent):

Absolutely! Here's an example:

Suppose you're given three line segments with lengths 10, 7, and 3 units. The question is: "Can these three line segments form a triangle?"

To answer this, you would use the triangle inequality theorem. Adding any two side lengths together should be greater than the third:

- For sides 10 and 7: 10 + 7 = 17, which is larger than 3.
- For sides 10 and 3: 10 + 3 = 13, which is larger than 7.
- For sides 7 and 3: 7 + 3 = 10, which is equal to the length of the third side (10), but not greater.

So, these three lines cannot form a triangle, because not all pairs of sides satisfy the triangle inequality theorem.

--------------------------------------------------------------------------------

Let’s see what the summary looks like. The summary is stored in chat_result, the ChatResult object returned by the initiate_chat method.

print(chat_result.summary)
The triangle inequality theorem states that in a triangle, the sum of the lengths of any two sides must always be greater than the length of the third side. This principle is
significant in geometry and is used in solving problems or proving theorems. For
instance, if given three line segments, you can determine if they can form a triangle
using this theorem.

In the above example, the summary method is set to reflection_with_llm, which takes the list of messages from the conversation and summarizes them using a call to an LLM. The summary method first tries to use the recipient’s LLM. If that is not available, it uses the sender’s LLM. In this case, the recipient is “Teacher_Agent” and the sender is “Student_Agent”. The input prompt for the LLM is the following default prompt:

print(ConversableAgent.DEFAULT_SUMMARY_PROMPT)
Summarize the takeaway from the conversation. Do not add any introductory phrases.

The ChatResult object also contains other useful information, including the conversation history, human input, and token cost.

# Get the chat history.
import pprint

pprint.pprint(chat_result.chat_history)
[{'content': 'What is triangle inequality?', 'role': 'assistant'},
 {'content': 'Triangle inequality theorem is a fundamental principle in '
             'geometry that states that the sum of the lengths of any two '
             'sides of a triangle must always be greater than the length of '
             'the third side. In a triangle with sides of lengths a, b, and c, '
             'the theorem can be written as:\n'
             '\n'
             'a + b > c\n'
             'a + c > b\n'
             'b + c > a\n'
             '\n'
             'Each of these represents the condition for one specific side (a, '
             'b, or c). All must be true for a triangle to exist.',
  'role': 'user'},
 {'content': 'Thank you for the explanation. This theorem helps in '
             'understanding the basic properties of a triangle. It can also be '
             'useful when solving geometric problems or proving other '
             'mathematical theorems. Can you give me an example of how we can '
             'use the triangle inequality theorem?',
  'role': 'assistant'},
 {'content': "Absolutely! Here's an example:\n"
             '\n'
             "Suppose you're given three line segments with lengths 10, 7, and "
             '3 units. The question is: "Can these three line segments form a '
             'triangle?"\n'
             '\n'
             'To answer this, you would use the triangle inequality theorem. '
             'Adding any two side lengths together should be greater than the '
             'third:\n'
             '\n'
             '- For sides 10 and 7: 10 + 7 = 17, which is larger than 3.\n'
             '- For sides 10 and 3: 10 + 3 = 13, which is larger than 7.\n'
             '- For sides 7 and 3: 7 + 3 = 10, which is equal to the length of '
             'the third side (10), but not greater.\n'
             '\n'
             'So, these three lines cannot form a triangle, because not all '
             'pairs of sides satisfy the triangle inequality theorem.',
  'role': 'user'}]
# Get the cost of the chat.
pprint.pprint(chat_result.cost)
({'gpt-4-0613': {'completion_tokens': 399,
                 'cost': 0.04521,
                 'prompt_tokens': 709,
                 'total_tokens': 1108},
  'total_cost': 0.04521},
 {'total_cost': 0})
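The cost value above is plain Python data, so you can post-process it directly. The sketch below totals the token counts from the structure exactly as printed. It assumes the first element holds usage including cached responses and the second holds the cost actually incurred. If that assumption holds, a total_cost of 0 there would suggest every response came from the cache. This layout has changed between AutoGen versions, so check it against the version you run. summarize_usage is a hypothetical helper.

```python
# Cost structure copied verbatim from the output above.
cost = (
    {
        "gpt-4-0613": {"completion_tokens": 399, "cost": 0.04521, "prompt_tokens": 709, "total_tokens": 1108},
        "total_cost": 0.04521,
    },
    {"total_cost": 0},
)


def summarize_usage(usage: dict) -> dict:
    """Sum token counts over every model entry, skipping the 'total_cost' key."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for model, stats in usage.items():
        if model == "total_cost":
            continue  # not a model entry
        for key in totals:
            totals[key] += stats[key]
    totals["total_cost"] = usage["total_cost"]
    return totals


print(summarize_usage(cost[0]))
# {'prompt_tokens': 709, 'completion_tokens': 399, 'total_tokens': 1108, 'total_cost': 0.04521}
```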

Sequential Chats

A sequential chat is a sequence of chats between two agents, chained together by a mechanism called carryover, which brings the summary of the previous chat into the context of the next chat.

This pattern is useful for complex tasks that can be broken down into interdependent sub-tasks. The figure below illustrates how this pattern works.

[Figure: sequential chats]

In this pattern, a pair of agents first starts a two-agent chat, and the summary of that conversation becomes a carryover for the next two-agent chat. The next chat passes the carryover to the carryover parameter of the context to generate its initial message.

Carryover accumulates as the conversation moves forward, so each subsequent chat starts with all the carryovers from previous chats.

The figure above shows distinct recipient agents for all the chats. However, the recipient agents in the sequence are allowed to repeat.
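You can mimic the way carryover builds each chat's opening message in a few lines. build_initial_message is a hypothetical helper that reproduces the "Context:" layout visible in the log below. It is not AutoGen's own code.

```python
from typing import List


def build_initial_message(message: str, carryover: List[str]) -> str:
    """Append the accumulated carryover to the message, in the layout the log prints."""
    if not carryover:
        return message  # the first chat has no carryover
    return message + "\nContext: \n" + "\n".join(carryover)


# Summaries (last messages) of the first three chats in the arithmetic example below.
summaries = ["16", "64", "14\n62"]
openers = ["14", "These are my numbers", "These are my numbers"]

carryover: List[str] = []
messages = []
for opener, summary in zip(openers, summaries):
    messages.append(build_initial_message(opener, carryover))
    carryover.append(summary)  # carryover accumulates after each chat

print(messages[2])
# These are my numbers
# Context: 
# 16
# 64
```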

To illustrate this pattern, let’s consider a simple example of arithmetic operator agents. One agent (called the “Number_Agent”) is responsible for coming up with a number, and each of the other agents is responsible for performing a specific arithmetic operation on the number, e.g., add 1, multiply by 2, etc.

# The Number Agent always returns the same numbers.
number_agent = ConversableAgent(
    name="Number_Agent",
    system_message="You return me the numbers I give you, one number each line.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

# The Adder Agent adds 1 to each number it receives.
adder_agent = ConversableAgent(
    name="Adder_Agent",
    system_message="You add 1 to each number I give you and return me the new numbers, one number each line.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

# The Multiplier Agent multiplies each number it receives by 2.
multiplier_agent = ConversableAgent(
    name="Multiplier_Agent",
    system_message="You multiply each number I give you by 2 and return me the new numbers, one number each line.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

# The Subtracter Agent subtracts 1 from each number it receives.
subtracter_agent = ConversableAgent(
    name="Subtracter_Agent",
    system_message="You subtract 1 from each number I give you and return me the new numbers, one number each line.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

# The Divider Agent divides each number it receives by 2.
divider_agent = ConversableAgent(
    name="Divider_Agent",
    system_message="You divide each number I give you by 2 and return me the new numbers, one number each line.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

The Number Agent chats with the first operator agent, then the second operator agent, and so on. After each chat, the last message in the conversation (i.e., the result of the arithmetic operation from the operator agent) is used as the summary of the chat. This is specified by the summary_method parameter. In the end, we will have the results of all the arithmetic operations.

# Start a sequence of two-agent chats.
# Each element in the list is a dictionary that specifies the arguments
# for the initiate_chat method.
chat_results = number_agent.initiate_chats(
    [
        {
            "recipient": adder_agent,
            "message": "14",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
        {
            "recipient": multiplier_agent,
            "message": "These are my numbers",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
        {
            "recipient": subtracter_agent,
            "message": "These are my numbers",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
        {
            "recipient": divider_agent,
            "message": "These are my numbers",
            "max_turns": 2,
            "summary_method": "last_msg",
        },
    ]
)

********************************************************************************
Start a new chat with the following message: 
14

With the following carryover: 

********************************************************************************
Number_Agent (to Adder_Agent):

14

--------------------------------------------------------------------------------
Adder_Agent (to Number_Agent):

15

--------------------------------------------------------------------------------
Number_Agent (to Adder_Agent):

15

--------------------------------------------------------------------------------
Adder_Agent (to Number_Agent):

16

--------------------------------------------------------------------------------

********************************************************************************
Start a new chat with the following message: 
These are my numbers

With the following carryover: 
16

********************************************************************************
Number_Agent (to Multiplier_Agent):

These are my numbers
Context: 
16

--------------------------------------------------------------------------------
Multiplier_Agent (to Number_Agent):

32

--------------------------------------------------------------------------------
Number_Agent (to Multiplier_Agent):

32

--------------------------------------------------------------------------------
Multiplier_Agent (to Number_Agent):

64

--------------------------------------------------------------------------------

********************************************************************************
Start a new chat with the following message: 
These are my numbers

With the following carryover: 
16
64

********************************************************************************
Number_Agent (to Subtracter_Agent):

These are my numbers
Context: 
16
64

--------------------------------------------------------------------------------
Subtracter_Agent (to Number_Agent):

15
63

--------------------------------------------------------------------------------
Number_Agent (to Subtracter_Agent):

15
63

--------------------------------------------------------------------------------
Subtracter_Agent (to Number_Agent):

14
62

--------------------------------------------------------------------------------

********************************************************************************
Start a new chat with the following message: 
These are my numbers

With the following carryover: 
16
64
14
62

********************************************************************************
Number_Agent (to Divider_Agent):

These are my numbers
Context: 
16
64
14
62

--------------------------------------------------------------------------------
Divider_Agent (to Number_Agent):

8
32
7
31

--------------------------------------------------------------------------------
Number_Agent (to Divider_Agent):

8
32
7
31

--------------------------------------------------------------------------------
Divider_Agent (to Number_Agent):

4
16
3.5
15.5

--------------------------------------------------------------------------------

The first thing to note is that the initiate_chats method takes a list of dictionaries, where each dictionary contains the arguments for one call to the initiate_chat method.

Second, each chat in the sequence is limited to 2 turns by the setting max_turns=2, which means each arithmetic operation is performed twice. So in the first chat the number 14 becomes 15 and then 16, in the second chat the number 16 becomes 32 and then 64, and so on.

Third, the carryover accumulates as the chats go on. In the second chat, the carryover is the summary of the first chat, “16”. In the third chat, the carryover is the summaries of the first and second chats, “16” and “64”, and both numbers are operated upon. In the fourth and last chat, the carryover is the summaries of all previous chats, “16”, “64”, “14” and “62”, and all of these numbers are operated upon.
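In this log each operator agent behaves deterministically, so you can replay the whole sequence without an LLM and check it against the output above. run_sequence is a hypothetical sketch. It applies each operation max_turns=2 times to the chat's input numbers, records the result as the last_msg summary, and feeds the accumulated summaries into the next chat as carryover. Real LLM agents are not guaranteed to follow these rules.

```python
from typing import Callable, List

Op = Callable[[float], float]


def run_sequence(first_input: List[float], ops: List[Op], turns: int = 2) -> List[List[float]]:
    """Replay a sequential chat: returns the last_msg summary of each chat."""
    summaries: List[List[float]] = []
    carryover: List[float] = []
    for i, op in enumerate(ops):
        # The first chat starts from its message; later chats start from all carryover so far.
        numbers = list(first_input) if i == 0 else list(carryover)
        for _ in range(turns):
            numbers = [op(n) for n in numbers]
        summaries.append(numbers)   # last message of the chat = its summary
        carryover.extend(numbers)   # carryover accumulates across chats
    return summaries


results = run_sequence(
    [14],
    [lambda n: n + 1, lambda n: n * 2, lambda n: n - 1, lambda n: n / 2],
)
print(results)  # [[16], [64], [14, 62], [4.0, 16.0, 3.5, 15.5]]
```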

The final note is that the initiate_chats method returns a list of ChatResult objects, one for each chat in the sequence.

print("First Chat Summary: ", chat_results[0].summary)
print("Second Chat Summary: ", chat_results[1].summary)
print("Third Chat Summary: ", chat_results[2].summary)
print("Fourth Chat Summary: ", chat_results[3].summary)
First Chat Summary:  16
Second Chat Summary:  64
Third Chat Summary:  14
62
Fourth Chat Summary:  4
16
3.5
15.5

Group Chat

AutoGen provides a more general conversation pattern called group chat, which involves more than two agents. The core idea of group chat is that all agents contribute to a single conversation thread and share the same context. This is useful for tasks that require collaboration among multiple agents.

The figure below illustrates how group chat works.

[Figure: group chat]

A group chat is orchestrated by a special agent type, GroupChatManager. In the first step of the group chat, the Group Chat Manager selects an agent to speak. Then, the selected agent speaks and its message is sent back to the Group Chat Manager, which broadcasts the message to all other agents in the group. This process repeats until the conversation stops.

The Group Chat Manager can use several strategies to select the next agent. Currently, the following strategies are supported:

  1. round_robin: The Group Chat Manager selects agents in a round-robin fashion based on the order of the agents provided.
  2. random: The Group Chat Manager selects agents randomly.
  3. manual: The Group Chat Manager selects agents by asking for human input.
  4. auto: The default strategy, which selects agents using the Group Chat Manager’s LLM.

In this example, we use the auto strategy to select the next agent. To help the Group Chat Manager select the next agent, we also set the description of each agent. Without a description, the Group Chat Manager uses the agent’s system_message, which may not be the best choice.

# The `description` attribute is a string that describes the agent.
# It can also be set in `ConversableAgent` constructor.
adder_agent.description = "Add 1 to each input number."
multiplier_agent.description = "Multiply each input number by 2."
subtracter_agent.description = "Subtract 1 from each input number."
divider_agent.description = "Divide each input number by 2."
number_agent.description = "Return the numbers given."

We first create a GroupChat object and provide the list of agents. If we were to use the round_robin strategy, this list would specify the order in which the agents are selected. We also initialize the group chat with an empty message list and a max_round of 6. This means there will be at most 6 rounds, and each round consists of selecting a speaker, letting that agent speak, and broadcasting its message.
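The two non-LLM strategies and the max_round limit can be sketched with plain agent names. This is a hypothetical model and not AutoGen's implementation. round_robin follows the list order and wraps around. random picks uniformly from all agents, and AutoGen's own rule may differ, for example in whether the last speaker can repeat. The initiating agent's opening message counts as the first round, as the text below explains. auto and manual are left out because they need an LLM or a human.

```python
import random
from typing import List, Optional


def select_speaker(agents: List[str], last: Optional[str], method: str, rng: random.Random) -> str:
    """Pick the next speaker for the 'round_robin' or 'random' strategies (sketch)."""
    if method == "round_robin":
        if last is None:
            return agents[0]
        return agents[(agents.index(last) + 1) % len(agents)]  # follow list order, wrap around
    if method == "random":
        return rng.choice(agents)  # uniform choice over all agents
    raise ValueError(f"strategy not sketched here: {method}")


def run_rounds(agents: List[str], first: str, method: str, max_round: int, seed: int = 0) -> List[str]:
    """Return the speaker order; the initiating agent's message counts as round 1."""
    rng = random.Random(seed)
    speakers = [first]
    while len(speakers) < max_round:
        speakers.append(select_speaker(agents, speakers[-1], method, rng))
    return speakers


agents = ["Adder_Agent", "Multiplier_Agent", "Subtracter_Agent", "Divider_Agent", "Number_Agent"]
print(run_rounds(agents, "Number_Agent", "round_robin", max_round=6))
# ['Number_Agent', 'Adder_Agent', 'Multiplier_Agent', 'Subtracter_Agent', 'Divider_Agent', 'Number_Agent']
```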

from autogen import GroupChat

group_chat = GroupChat(
    agents=[adder_agent, multiplier_agent, subtracter_agent, divider_agent, number_agent],
    messages=[],
    max_round=6,
)

Now we create a GroupChatManager object and provide the GroupChat object as input. We also need to specify the llm_config of the Group Chat Manager so it can use the LLM to select the next agent (the auto strategy).

from autogen import GroupChatManager

group_chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

Finally, we have the Number Agent from before start a two-agent chat with the Group Chat Manager. The Group Chat Manager runs the group chat internally and terminates the two-agent chat when the internal group chat is done. Because we chose the Number Agent to speak, its opening message counts as the first round of the group chat.

chat_result = number_agent.initiate_chat(
    group_chat_manager,
    message="My number is 3, I want to turn it into 13.",
    summary_method="reflection_with_llm",
)
Number_Agent (to chat_manager):

My number is 3, I want to turn it into 13.

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

6

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

7

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

14

--------------------------------------------------------------------------------
Subtracter_Agent (to chat_manager):

13

--------------------------------------------------------------------------------
Number_Agent (to chat_manager):

13

--------------------------------------------------------------------------------

You can see that the Number Agent speaks first, because we started the chat with it. The Group Chat Manager then selects the Multiplier Agent, then the Adder Agent, and so on. The selected agents operate on the number in turn, and the final result is 13.

We can take a look at the summary of the group chat, provided by the ChatResult object returned by the initiate_chat method.

print(chat_result.summary)
The agents cooperatively manipulated the initial number (3) through multipliying,
adding, and subtracting operations to reach the target number (13).

Send Introductions

In the previous example, we set the description of the agents to help the Group Chat Manager select the next agent. However, this helps only the Group Chat Manager. It does not help the participating agents learn about each other. Sometimes it is useful to have each agent introduce itself to the other agents in the group chat. You can do this by setting send_introductions=True.

group_chat_with_introductions = GroupChat(
    agents=[adder_agent, multiplier_agent, subtracter_agent, divider_agent, number_agent],
    messages=[],
    max_round=6,
    send_introductions=True,
)

Under the hood, the Group Chat Manager sends a message containing the agents’ names and descriptions to all agents in the group chat before the group chat starts.
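Only the idea here comes from the text above: names and descriptions are sent to every agent before the chat starts. The wording of the message below is a hypothetical stand-in and is not AutoGen's actual introduction text. build_introduction is likewise an illustrative helper.

```python
from typing import Dict


def build_introduction(descriptions: Dict[str, str]) -> str:
    """Format a greeting plus one '<name>: <description>' line per agent (hypothetical format)."""
    lines = ["Hello everyone. The agents in this group chat are:", ""]
    lines += [f"{name}: {desc}" for name, desc in descriptions.items()]
    return "\n".join(lines)


# The descriptions set earlier in this section.
descriptions = {
    "Adder_Agent": "Add 1 to each input number.",
    "Multiplier_Agent": "Multiply each input number by 2.",
    "Subtracter_Agent": "Subtract 1 from each input number.",
    "Divider_Agent": "Divide each input number by 2.",
    "Number_Agent": "Return the numbers given.",
}
intro = build_introduction(descriptions)
print(intro)
```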

Group Chat in a Sequential Chat

Group chat can also be used as a part of a sequential chat. In this case, the Group Chat Manager is treated as a regular agent in the sequence of two-agent chats.

# Let's use the group chat with introduction messages created above.
group_chat_manager_with_intros = GroupChatManager(
    groupchat=group_chat_with_introductions,
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

# Start a sequence of two-agent chats between the number agent and
# the group chat manager.
chat_result = number_agent.initiate_chats(
    [
        {
            "recipient": group_chat_manager_with_intros,
            "message": "My number is 3, I want to turn it into 13.",
        },
        {
            "recipient": group_chat_manager_with_intros,
            "message": "Turn this number to 32.",
        },
    ]
)

********************************************************************************
Start a new chat with the following message: 
My number is 3, I want to turn it into 13.

With the following carryover: 

********************************************************************************
Number_Agent (to chat_manager):

My number is 3, I want to turn it into 13.

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

6

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

7

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

14

--------------------------------------------------------------------------------
Subtracter_Agent (to chat_manager):

13

--------------------------------------------------------------------------------
Number_Agent (to chat_manager):

Your number is 13.

--------------------------------------------------------------------------------

********************************************************************************
Start a new chat with the following message: 
Turn this number to 32.

With the following carryover: 
Your number is 13.

********************************************************************************
Number_Agent (to chat_manager):

Turn this number to 32.
Context: 
Your number is 13.

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

26

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

14

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

28

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

15

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

30

--------------------------------------------------------------------------------

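In the above example, the Group Chat Manager runs the group chat twice. The first time, the number 3 becomes 13, and the last message of that group chat is used as the carryover for the next group chat, which starts from 13. The second group chat ends at 30 instead of 32 because it reaches the max_round=6 limit. Note also that the LLM-driven agents do not always operate on the most recent number. For example, the Adder Agent replies 14 right after the Multiplier Agent's 26.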

Constrained Speaker Selection

Group chat is a powerful conversation pattern, but it can be hard to control if the number of participating agents is large. AutoGen provides a way to constrain the selection of the next speaker by using the allowed_or_disallowed_speaker_transitions argument of the GroupChat class.

The allowed_or_disallowed_speaker_transitions argument is a dictionary that maps a given agent to a list of agents that can (or cannot) be selected to speak next. The speaker_transitions_type argument specifies whether the transitions are allowed or disallowed.

Here is an example:

allowed_transitions = {
    number_agent: [adder_agent, number_agent],
    adder_agent: [multiplier_agent, number_agent],
    subtracter_agent: [divider_agent, number_agent],
    multiplier_agent: [subtracter_agent, number_agent],
    divider_agent: [adder_agent, number_agent],
}

In this example, the allowed transitions are specified for each agent. The Number Agent can be followed by the Adder Agent or the Number Agent, the Adder Agent can be followed by the Multiplier Agent or the Number Agent, and so on. Let’s put this into the group chat and see how it works. The speaker_transitions_type is set to "allowed", so the transitions are positive constraints.
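The two transition types can be modeled with plain agent names. This is a hypothetical sketch and not AutoGen's implementation. Under "allowed", the listed agents are the only candidates for the next speaker. Under "disallowed", every agent except the listed ones is a candidate. This means any allowed graph can be rewritten as its complement. The selection strategy then picks the next speaker from these candidates.

```python
from typing import Dict, List

AGENTS = ["Adder_Agent", "Multiplier_Agent", "Subtracter_Agent", "Divider_Agent", "Number_Agent"]

# Same graph as allowed_transitions above, keyed by agent name instead of agent object.
ALLOWED: Dict[str, List[str]] = {
    "Number_Agent": ["Adder_Agent", "Number_Agent"],
    "Adder_Agent": ["Multiplier_Agent", "Number_Agent"],
    "Subtracter_Agent": ["Divider_Agent", "Number_Agent"],
    "Multiplier_Agent": ["Subtracter_Agent", "Number_Agent"],
    "Divider_Agent": ["Adder_Agent", "Number_Agent"],
}


def candidates(last: str, transitions: Dict[str, List[str]], transitions_type: str) -> List[str]:
    """Agents eligible to speak after `last` under an allowed or disallowed graph."""
    listed = transitions.get(last, [])
    if transitions_type == "allowed":
        return list(listed)
    if transitions_type == "disallowed":
        return [a for a in AGENTS if a not in listed]
    raise ValueError(transitions_type)


def to_disallowed(allowed: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Express the same constraint as a disallowed graph (the complement)."""
    return {src: [a for a in AGENTS if a not in dsts] for src, dsts in allowed.items()}


print(candidates("Adder_Agent", ALLOWED, "allowed"))  # ['Multiplier_Agent', 'Number_Agent']
```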

constrained_graph_chat = GroupChat(
    agents=[adder_agent, multiplier_agent, subtracter_agent, divider_agent, number_agent],
    allowed_or_disallowed_speaker_transitions=allowed_transitions,
    speaker_transitions_type="allowed",
    messages=[],
    max_round=12,
    send_introductions=True,
)

constrained_group_chat_manager = GroupChatManager(
    groupchat=constrained_graph_chat,
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

chat_result = number_agent.initiate_chat(
    constrained_group_chat_manager,
    message="My number is 3, I want to turn it into 10. Once I get to 10, keep it there.",
    summary_method="reflection_with_llm",
)
Number_Agent (to chat_manager):

My number is 3, I want to turn it into 10. Once I get to 10, keep it there.

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

4

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

8

--------------------------------------------------------------------------------
Subtracter_Agent (to chat_manager):

7

--------------------------------------------------------------------------------
Divider_Agent (to chat_manager):

3.5

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

4.5

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

9

--------------------------------------------------------------------------------
Subtracter_Agent (to chat_manager):

8

--------------------------------------------------------------------------------
Divider_Agent (to chat_manager):

4

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

5

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

10

--------------------------------------------------------------------------------
Number_Agent (to chat_manager):

10

--------------------------------------------------------------------------------
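As a quick sanity check, the run above can be replayed in plain Python. Every change of speaker should be an edge in allowed_transitions, and every reply should follow from the previous number. The sketch below hard-codes the speakers and numbers copied from the transcript. The operation for each agent (add 1, multiply by 2, subtract 1, divide by 2, or repeat the number) is an assumption inferred from the transcript, not taken from the agents' actual system messages.

```python
# Speaker graph from the example, keyed by agent name rather than agent object.
ALLOWED = {
    "Number_Agent": ["Adder_Agent", "Number_Agent"],
    "Adder_Agent": ["Multiplier_Agent", "Number_Agent"],
    "Subtracter_Agent": ["Divider_Agent", "Number_Agent"],
    "Multiplier_Agent": ["Subtracter_Agent", "Number_Agent"],
    "Divider_Agent": ["Adder_Agent", "Number_Agent"],
}

# Operations inferred from the transcript (an assumption, not the agents' real prompts).
OPS = {
    "Adder_Agent": lambda x: x + 1,
    "Multiplier_Agent": lambda x: x * 2,
    "Subtracter_Agent": lambda x: x - 1,
    "Divider_Agent": lambda x: x / 2,
    "Number_Agent": lambda x: x,  # Number_Agent repeats the number it receives
}

# (speaker, number) pairs copied from the output; the first is the opening message.
TRANSCRIPT = [
    ("Number_Agent", 3), ("Adder_Agent", 4), ("Multiplier_Agent", 8),
    ("Subtracter_Agent", 7), ("Divider_Agent", 3.5), ("Adder_Agent", 4.5),
    ("Multiplier_Agent", 9), ("Subtracter_Agent", 8), ("Divider_Agent", 4),
    ("Adder_Agent", 5), ("Multiplier_Agent", 10), ("Number_Agent", 10),
]


def check_transcript(transcript, allowed, ops):
    """Return a list of problems; an empty list means the run is consistent."""
    problems = []
    for (prev_speaker, prev_value), (speaker, value) in zip(transcript, transcript[1:]):
        if speaker not in allowed[prev_speaker]:
            problems.append(f"illegal transition {prev_speaker} -> {speaker}")
        if ops[speaker](prev_value) != value:
            problems.append(f"{speaker} turned {prev_value} into {value}")
    return problems


print(check_transcript(TRANSCRIPT, ALLOWED, OPS))  # []
```

Every transition in this run is permitted by the graph, and the number reaches 10 before Number_Agent takes over to keep it there.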

Changing the select speaker role name

As part of the group chat process, when speaker_selection_method is set to ‘auto’ (the default value), a select speaker message is sent to the LLM to determine the next speaker.

Each message in the chat sequence has a role attribute, typically user, assistant, or system. When used, the select speaker message is the last message in the chat sequence and, by default, has the role system.

Some models, such as Mistral through Mistral.AI’s API, require the role of the last message in the chat sequence to be user.

To change the default behaviour, AutoGen lets you set the select speaker message’s role to any string value through the role_for_select_speaker_messages parameter of the GroupChat constructor. The default value is system; setting it to user satisfies the last-message role requirement of Mistral.AI’s API.
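In AutoGen itself the change is a single constructor argument, for example GroupChat(agents=[...], messages=[], role_for_select_speaker_messages="user"). The plain-Python sketch below only illustrates why the argument matters. The helpers append_select_speaker_prompt and last_role_is_user are hypothetical, and the prompt string is a placeholder rather than AutoGen's real select speaker prompt.

```python
from typing import Dict, List

Message = Dict[str, str]


def append_select_speaker_prompt(history: List[Message], prompt: str, role: str = "system") -> List[Message]:
    """Hypothetical helper: add the select-speaker prompt as the final message."""
    return history + [{"role": role, "content": prompt}]


def last_role_is_user(messages: List[Message]) -> bool:
    """The constraint described above for Mistral.AI's API."""
    return bool(messages) and messages[-1]["role"] == "user"


history = [
    {"role": "user", "content": "My number is 3, I want to turn it into 10."},
    {"role": "assistant", "content": "4"},
]
prompt = "Select the next speaker."  # placeholder, not AutoGen's actual prompt text

default_msgs = append_select_speaker_prompt(history, prompt)          # role "system"
mistral_msgs = append_select_speaker_prompt(history, prompt, "user")  # role "user"

print(last_role_is_user(default_msgs))  # False
print(last_role_is_user(mistral_msgs))  # True
```

Only the role of the final message changes; the conversation history sent to the model is the same in both cases.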

Nested Chats

The previous conversation patterns (two-agent chat, sequential chat, and group chat) are useful for building complex workflows; however, they do not expose a single conversational interface, which is often needed for scenarios like question-answering bots and personal assistants. In other cases, it is also useful to package a workflow into a single agent for reuse in a larger workflow. AutoGen supports both needs through nested chats.

Nested chats are powered by the nested chats handler, a pluggable component of ConversableAgent. The figure below illustrates how the nested chats handler triggers a sequence of nested chats when a message is received.

[Figure: the nested chats handler triggering a sequence of nested chats when a message is received]

When a message comes in and passes the human-in-the-loop component, the nested chats handler checks if the message should trigger a nested chat based on conditions specified by the user. If the conditions are met, the nested chats handler starts a sequence of nested chats specified using the sequential chats pattern. In each of the nested chats, the sender agent is always the same agent that triggered the nested chats. In the end, the nested chat handler uses the results of the nested chats to produce a response to the original message. By default, the nested chat handler uses the summary of the last chat as the response.
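The control flow just described can be sketched without any framework. This is a hypothetical illustration, not AutoGen's implementation: a "chat" is modeled as a plain function that receives a message and the summaries collected so far and returns a summary. The sender names and chat functions are made up for the example. Having a chat without its own message reuse the incoming message is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Optional

# In this sketch a "chat" is any function (message, carryover) -> summary string.
ChatFn = Callable[[str, List[str]], str]


def nested_chats_reply(
    message: str,
    sender: str,
    trigger: Callable[[str], bool],
    chats: List[Dict],
) -> Optional[str]:
    """Hypothetical nested-chats handler, not AutoGen's implementation."""
    if not trigger(sender):
        return None  # not triggered: the agent would use its other reply logic
    carryover: List[str] = []
    summary = ""
    for chat in chats:
        # Assumption: a chat without its own "message" reuses the incoming message.
        run: ChatFn = chat["run"]
        summary = run(chat.get("message", message), carryover)
        carryover.append(summary)  # later chats see every earlier summary
    return summary  # the last chat's summary becomes the reply


def solve(msg: str, carry: List[str]) -> str:
    return "3 + 1 = 4, 4 * 2 = 8, 8 - 1 = 7"


def verify(msg: str, carry: List[str]) -> str:
    return "verified: " + carry[-1]


def poem(msg: str, carry: List[str]) -> str:
    return "A poem about: " + " | ".join(carry)


chats = [
    {"run": solve},
    {"run": verify, "message": "Verify the operations."},
    {"run": poem, "message": "Write a poem about it."},
]


def not_from_inner_agents(sender: str) -> bool:
    # Skip senders that are nested-chat recipients, so the handler cannot recurse.
    return sender not in {"solver", "verifier", "poet"}


print(nested_chats_reply("Turn 3 into 7.", "user", not_from_inner_agents, chats))
print(nested_chats_reply("Turn 3 into 7.", "poet", not_from_inner_agents, chats))  # None
```

The trigger check comes first, so a message from one of the inner participants never starts another round of nested chats.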

Here is an example of using nested chats to build an arithmetic agent that packages arithmetic operations, code-based validation, and poetry into a single agent. This arithmetic agent takes a number transformation request like “turn number 3 into 13” and returns a poem that describes a transformation attempt.

First, we define the agents. We reuse group_chat_manager_with_intros from the previous example to orchestrate the arithmetic operations.

import tempfile

temp_dir = tempfile.gettempdir()

arithmetic_agent = ConversableAgent(
    name="Arithmetic_Agent",
    llm_config=False,
    human_input_mode="ALWAYS",
    # This agent will always require human input to make sure the code is
    # safe to execute.
    code_execution_config={"use_docker": False, "work_dir": temp_dir},
)

code_writer_agent = ConversableAgent(
    name="Code_Writer_Agent",
    system_message="You are a code writer. You write Python script in Markdown code blocks.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

poetry_agent = ConversableAgent(
    name="Poetry_Agent",
    system_message="You are an AI poet.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
    human_input_mode="NEVER",
)

Now we define the nested chats using the sequential chat pattern. In every nested chat, the sender is arithmetic_agent.

nested_chats = [
    {
        "recipient": group_chat_manager_with_intros,
        "summary_method": "reflection_with_llm",
        "summary_prompt": "Summarize the sequence of operations used to turn " "the source number into target number.",
    },
    {
        "recipient": code_writer_agent,
        "message": "Write a Python script to verify the arithmetic operations is correct.",
        "summary_method": "reflection_with_llm",
    },
    {
        "recipient": poetry_agent,
        "message": "Write a poem about it.",
        "max_turns": 1,
        "summary_method": "last_msg",
    },
]
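The two summary_method values used above behave differently. "last_msg" takes the final message of a chat as its summary, while "reflection_with_llm" asks a model to summarize the chat. Each summary is then carried over into the following chats. The plain-Python sketch below imitates that behavior. summarize and with_carryover are hypothetical helpers, the LLM is replaced by a caller-supplied stub, and the "Context:" layout is copied from the transcript later in this section rather than from AutoGen's source.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def summarize(
    messages: List[Message],
    method: str,
    llm: Optional[Callable[[List[Message]], str]] = None,
) -> str:
    """Hypothetical stand-in for the two summary_method values used above."""
    if method == "last_msg":
        # Use the content of the final message as-is.
        return messages[-1]["content"]
    if method == "reflection_with_llm":
        # Ask a model to summarize the whole chat; `llm` is a caller-supplied stub here.
        if llm is None:
            raise ValueError("reflection_with_llm needs an llm callable")
        return llm(messages)
    raise ValueError(f"unknown summary method: {method}")


def with_carryover(message: str, carryover: List[str]) -> str:
    """Hypothetical helper: append earlier summaries in the layout seen in the transcript."""
    if not carryover:
        return message
    return message + "\nContext: \n" + "\n".join(carryover)


chat = [
    {"role": "user", "content": "Write a poem about it."},
    {"role": "assistant", "content": "Three was the number from where we began..."},
]
print(summarize(chat, "last_msg"))
print(summarize(chat, "reflection_with_llm", llm=lambda msgs: f"{len(msgs)} messages exchanged"))
print(with_carryover("Write a poem about it.", ["Added 1, doubled, subtracted 1.", "Script verified."]))
```

This matches the run shown later: the poetry chat receives the summaries of both earlier chats as context, and because it uses "last_msg", the poem itself becomes the final reply.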

Now we register the nested chats handler to the arithmetic_agent and set the conditions for triggering the nested chats.

arithmetic_agent.register_nested_chats(
    nested_chats,
    # The trigger function is used to determine if the agent should start the nested chat
    # given the sender agent.
    # In this case, the arithmetic agent will not start the nested chats if the sender is
    # from the nested chats' recipient to avoid recursive calls.
    trigger=lambda sender: sender not in [group_chat_manager_with_intros, code_writer_agent, poetry_agent],
)

Finally, we call generate_reply to get a response from arithmetic_agent. This triggers the sequence of nested chats and returns the summary of the last nested chat as the response.

# Instead of using the `initiate_chat` method to start another conversation,
# we can use the `generate_reply` method to get a single reply to a message directly.
reply = arithmetic_agent.generate_reply(
    messages=[{"role": "user", "content": "I have a number 3 and I want to turn it into 7."}]
)

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

********************************************************************************
Start a new chat with the following message: 
I have a number 3 and I want to turn it into 7.

With the following carryover: 

********************************************************************************
Arithmetic_Agent (to chat_manager):

I have a number 3 and I want to turn it into 7.

--------------------------------------------------------------------------------
Adder_Agent (to chat_manager):

To give you the result, I'll add 1 to the number you gave me. So your new number is 4.

--------------------------------------------------------------------------------
Multiplier_Agent (to chat_manager):

8

--------------------------------------------------------------------------------
Subtracter_Agent (to chat_manager):

7

--------------------------------------------------------------------------------
Number_Agent (to chat_manager):

7

--------------------------------------------------------------------------------
Number_Agent (to chat_manager):

7

--------------------------------------------------------------------------------

********************************************************************************
Start a new chat with the following message: 
Write a Python script to verify the arithmetic operations is correct.

With the following carryover: 
First, 1 was added to the initial number 3 to make it 4. Then it was multiplied by 2
which resulted in 8. Finally, 1 was subtracted from 8 to reach the target number 7.

********************************************************************************
Arithmetic_Agent (to Code_Writer_Agent):

Write a Python script to verify the arithmetic operations is correct.
Context: 
First, 1 was added to the initial number 3 to make it 4. Then it was multiplied by 2
which resulted in 8. Finally, 1 was subtracted from 8 to reach the target number 7.

--------------------------------------------------------------------------------
Code_Writer_Agent (to Arithmetic_Agent):

Here is a Python script to verify the aforementioned arithmetic operations:

```python
# defining the initial value
initial_number = 3

# Adding 1 to initial number
initial_number += 1
assert initial_number == 4, "The first operation failed!"

# Multiplying the result by 2
initial_number *= 2
assert initial_number == 8, "The second operation failed!"

# Subtracting 1 from the result
initial_number -= 1
assert initial_number == 7, "The final operation failed!"

print("All operations were carried out successfully!")
```
In the script, the entire process is broken down into steps. The `assert` function is
used to verify the result at every step. If any of the operations doesn't yield the
expected result, an `AssertionError` exception will be raised. If all operations pass,
the message "All operations were carried out successfully!" will be printed.

--------------------------------------------------------------------------------

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
Arithmetic_Agent (to Code_Writer_Agent):

exitcode: 0 (execution succeeded)
Code output: 
All operations were carried out successfully!

--------------------------------------------------------------------------------
Code_Writer_Agent (to Arithmetic_Agent):

Certainly, that means the python script was successful and every arithmetic operation performed correctly given the initial input and the steps performed.

--------------------------------------------------------------------------------

********************************************************************************
Start a new chat with the following message: 
Write a poem about it.

With the following carryover: 
First, 1 was added to the initial number 3 to make it 4. Then it was multiplied by 2 which resulted in 8. Finally, 1 was subtracted from 8 to reach the target number 7.
The Python script successfully performed and verified the arithmetic operations on the initial number provided. The steps included adding 1 to the initial number, multiplying the result by 2, and finally subtracting 1. The assert function was used to check the result at each step, and confirmed that all operations were carried out correctly.

********************************************************************************
Arithmetic_Agent (to Poetry_Agent):

Write a poem about it.
Context: 
First, 1 was added to the initial number 3 to make it 4. Then it was multiplied by 2 which resulted in 8. Finally, 1 was subtracted from 8 to reach the target number 7.
The Python script successfully performed and verified the arithmetic operations on the initial number provided. The steps included adding 1 to the initial number, multiplying the result by 2, and finally subtracting 1. The assert function was used to check the result at each step, and confirmed that all operations were carried out correctly.

--------------------------------------------------------------------------------
Poetry_Agent (to Arithmetic_Agent):

From numbers, logic, pure mathematical creation,
Ponder this tale of numeric manipulation.
In the universe of Python where operations exist,
A story of integers and functions persist.

Three was the number from where we began,
Oblivious to the journey and its grandiosely plan.
Added with 1, the sum it adorned,
A sweet quadruple in the dawn was formed.

The saga continued with a twist of the tale,
The four was multiplied, while the winds wail.
The duo of four unfolded its wings,
An octet presence in our midst it brings.

Then enters subtraction, sly and clever,
Removing one to alter the endeavor.
From eight, subtracted one in delight,
To finally bask in the glow of seven's light.

Each operation, together they conspired,
In this tale of integers, creatively inspired.
Through life's equation, the script ran so free,
Amidst the language of Python, a symphony, you see.

Tested with assert, cross-checked the chain,
Confirming accuracy in program's domain.
Each move calculated, each step so right,
In the maze of coding, found was the light. 

Such is the tale, of numbers and operations, 
A dance among digits, logical iterations,
Just another day, in this AI poet's life,
Cutting through ambiguity, like a razor-sharp knife.

--------------------------------------------------------------------------------

Nested chat is a powerful conversation pattern that lets you package a complex workflow into a single agent. For example, you can hide tool usage inside a single agent by having the tool-caller agent start a nested chat with a tool-executor agent and then use the result of the nested chat to generate a response.
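The tool-hiding idea can be shown with plain functions standing in for agents. In AutoGen itself, tools are usually registered for the caller with register_for_llm and for the executor with register_for_execution. The sketch below does not use the library. TOOLS, tool_caller, tool_executor, and outer_agent are hypothetical names, and the tool caller's "plan" is hard-coded where a real agent would ask an LLM. The point it shows is that the outer conversation only sees the final answer, while the caller/executor exchange stays inside the agent.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools that only the executor side knows how to run.
TOOLS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda value, arg: value + arg,
    "multiply": lambda value, arg: value * arg,
}


def tool_caller(request: str) -> List[Tuple[str, float]]:
    """Stand-in for the LLM-backed tool-caller agent: plans which tools to call."""
    plans = {"double then add 1": [("multiply", 2), ("add", 1)]}
    return plans.get(request, [])


def tool_executor(value: float, calls: List[Tuple[str, float]], log: List[str]) -> float:
    """Stand-in for the tool-executor agent: runs each planned call in order."""
    for name, arg in calls:
        value = TOOLS[name](value, arg)
        log.append(f"{name}({arg}) -> {value}")
    return value


def outer_agent(start: float, request: str) -> str:
    """The only interface the outer conversation sees."""
    inner_log: List[str] = []  # the nested exchange stays private to this agent
    calls = tool_caller(request)
    result = tool_executor(start, calls, inner_log)
    return f"The result is {result}."


print(outer_agent(3, "double then add 1"))  # The result is 7.
```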

Conclusion

AutoGen is a powerful and flexible framework for building AI agents and orchestrating multi-agent conversations. Its key features include:

  • Easy-to-use API for creating and managing agents
  • Support for various conversation patterns, including group chats and nested chats
  • Integration with LLMs and external tools
  • Human-in-the-loop capabilities
  • Customizable agent behaviors and workflows

By leveraging AutoGen, developers and researchers can rapidly prototype and deploy complex AI systems that involve multiple agents working together to solve tasks. The framework's versatility makes it suitable for a wide range of applications, from simple chatbots to sophisticated AI assistants and autonomous problem-solving systems.

As agent-based AI continues to evolve, AutoGen provides a solid foundation for exploring and building new multi-agent applications across many domains.
