This will change the way you think about AI and its capabilities
As an African proverb states:
Alone, we go faster. Together, we go further.
This also relates to the idea that no one can be an expert in every field. Teamwork and effective delegation of tasks become crucial to achieve great things.
Similar principles apply to Large Language Models (LLMs). Instead of prompting a single LLM to handle a complex task, we can combine multiple LLMs or AI Agents, each one specializing in a specific area.
This strategy can lead to a more robust system with higher-quality results.
In this article you will learn:
- What AI Agents are
- Why it is worth considering them for solving real-world use cases
- How to create a complete AI Agents system from scratch
Before diving into any coding, let’s have a clear understanding of the main components of the system being built in this article.
- The workflow has four agents overall, and each one has a specialized skill set.
- First, the user's request is submitted to the system.
- Agent 1, the Video Analyzer, performs deep research on the internet to find relevant information about the user's request, using external tools like the YouTube Channel Search. Its result is sent to the next agents for further processing.
- Agent 2, the Blog Post Writer, leverages the previous result to write a comprehensive blog post.
- Similarly to Agent 2, Agent 3 and Agent 4 create an engaging LinkedIn post and tweets, respectively.
- The responses from Agent 2, Agent 3, and Agent 4 are saved into different markdown files, which can be used by end users.
If you remember the article I wrote about using LLMs for document parsing, you will notice that the requested task there was monolithic, meaning the LLM was tasked with a single goal: Data Extraction.
Such an approach's limitations become obvious when dealing with more complex, multi-step tasks. Some of these limitations are illustrated below:
1. Flexibility of task execution
- Single-prompted LLMs require carefully written prompts for each task, and they may be difficult to update when expectations change from the initial task requirements.
- AI Agents break down those complexities into subtasks and adapt their behavior without requiring an extensive prompt-engineering effort.
2. Task continuity and context retention
- Single-prompted LLMs may lose important context from previous interactions, because they mainly operate within the constraints of a single conversation turn.
- AI Agents can maintain context across interactions, and each agent can refer back to previous agents' responses to complete its own task.
3. Specialization and interaction
- Single-prompted LLMs may gain specialized domain expertise only after extensive fine-tuning, which can be expensive in both time and money.
- AI Agents, on the other hand, can be designed as a crew of specialized models, where each model focuses on a specific task such as researcher, blog writer, or social media expert.
4. Internet access
- Single-prompted LLMs rely on a predefined knowledge base, which may not be up to date, leading to hallucinations or limited answers.
- AI Agents can have access to the internet, which allows them to provide more up-to-date information and make better decisions.
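The contrast above can be sketched in a few lines of plain Python (no framework; every function name here is a hypothetical stand-in for a specialized agent): each step does one job, and later steps reuse the earlier step's output instead of a single prompt doing everything.

```python
# Minimal sketch of the multi-agent idea in plain Python (no framework).
# All function names are hypothetical stand-ins for specialized agents.

def researcher(topic):
    # In the real workflow this step would search YouTube for the topic.
    return f"Research notes about {topic}"

def blog_writer(notes):
    # Reuses the researcher's output instead of starting from scratch.
    return f"Blog post based on: {notes}"

def linkedin_writer(notes):
    return f"LinkedIn post based on: {notes}"

notes = researcher("AI Agents")
outputs = {
    "blog": blog_writer(notes),
    "linkedin": linkedin_writer(notes),
}
```

Here the shared `notes` value plays the role of the context handed from the first agent to the others.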
In this section, we explore how to leverage an agentic workflow to create a system that writes blog posts, LinkedIn content, and Twitter posts after extensive research on the user's topic.
If you are more of a video oriented person, I will be waiting for you on the other side.
Structure of the code
The code is structured as follows:
project
|
|---Multi_Agents_For_Content_Creation.ipynb
|
|---data
    |
    |---- blog-post.md
    |---- linkedin-post.md
    |---- tweet.md
- The project folder is the root folder and contains the data folder and the notebook.
- The data folder is currently empty but should contain the following three markdown files after the execution of the overall workflow: blog-post.md, linkedin-post.md, and tweet.md.
- Each markdown file contains the result of the task performed by the corresponding agent.
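Since the tasks write their outputs into ./data, it can be worth creating that folder up front (a small optional precaution; the framework may also create the paths itself):

```python
import os

# Create the output folder the tasks write to; safe to re-run if it exists.
os.makedirs("data", exist_ok=True)
```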
Now that we have explored what each agent’s role is in the previous sections, let’s see how to actually create them, along with their roles and tasks.
Before that, let's set up the prerequisites so we can better implement those roles and tasks.
Prerequisites
The code is run from a Google Colab notebook, and the libraries required for our use case are crewai[tools], together with youtube-transcript-api and yt_dlp for the YouTube tooling. They can be installed as follows.
%%bash
pip -qqq install 'crewai[tools]'
pip -qqq install youtube-transcript-api
pip -qqq install yt_dlp
After successfully installing the libraries, the next step is to import the following necessary modules:
import os
from google.colab import userdata
from crewai import Agent, Task, Crew, Process
from crewai_tools import YoutubeChannelSearchTool
Each agent leverages the OpenAI gpt-4o model, and we need to set up access to the model via our OpenAI API key as follows:
OPENAI_API_KEY = userdata.get('OPEN_AI_KEY')
model_ID = userdata.get('GPT_MODEL')
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["OPENAI_MODEL_NAME"] = model_ID
- First, we access both the OPEN_AI_KEY and GPT_MODEL from Google Colab secrets using the built-in userdata.get function.
- Then we expose both the OpenAI key and the gpt-4o model name through os.environ.
- After these two steps, there should not be any issue using the model to create the agents.
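If you run the notebook outside Google Colab, userdata is unavailable; a common alternative (an assumption on my part, not part of the original setup) is to read the same values from ordinary environment variables:

```python
import os

# Outside Colab there is no google.colab.userdata, so fall back to plain
# environment variables. "gpt-4o" mirrors the model used in this article.
os.environ.setdefault("OPENAI_MODEL_NAME", "gpt-4o")

# True if the API key still needs to be exported in your shell.
key_missing = "OPENAI_API_KEY" not in os.environ
```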
Agents and their Roles
With the Agent class, we can create an agent mainly by providing the following attributes: role, goal, backstory, and memory.
Only Agent 1, the Video Analyzer, has these additional attributes: tools and allow_delegation.
Most of those attributes are self-explanatory, but let’s understand what they mean.
- role is like a job title and defines the exact role of an agent. For instance, Agent 1's role is Topic Researcher.
- goal defines what the agent aims to accomplish, in line with its role.
- backstory elaborates on the agent's role, making it more specific.
- memory is a boolean. When set to True, it allows the agent to remember, reason, and learn from past interactions.
- tools is the list of tools the agent uses to perform its task.
- allow_delegation is a boolean telling whether the result of an agent must be delegated to other agents for further processing.
Now, let’s create each one of our agents.
But before that, let's set up the tool that the first agent needs to explore my personal YouTube channel. This is achieved using the YoutubeChannelSearchTool class, by providing the handle @techwithzoum.
youtube_tool = YoutubeChannelSearchTool(youtube_channel_handle='@techwithzoum')
- Agent 1 — Topic Researcher
topic_researcher = Agent(
    role='Topic Researcher',
    # Goal and backstory strings below are reconstructed from the
    # description of this agent given in the article.
    goal='Search for relevant videos on the topic {topic} from the provided YouTube channel',
    verbose=True,
    memory=True,
    backstory='Expert in finding and analyzing relevant content about AI, Data Science, '
              'Machine Learning, and Generative AI using the YouTube search tool.',
    tools=[youtube_tool],
    allow_delegation=True
)
- We start by defining the role of the agent as a Topic Researcher.
- Then we ask the agent to use the topic provided to find relevant videos mentioning that topic.
- Finally, our first agent is described as an expert in finding and analyzing relevant content about AI, Data Science, Machine Learning, and Generative AI topics using the YouTube search tool.
The great thing is that every agent's definition follows exactly the same pattern, with minor changes in the attributes' values. Once this one is understood, no further explanation is required for the next agents' definitions.
2. Agent 2 — Blog Writer
blog_writer = Agent(
    role='Blog Writer',
    # Goal and backstory strings reconstructed from the article's description.
    goal='Write a comprehensive blog post from the transcription provided by the Topic Researcher',
    verbose=True,
    memory=True,
    backstory='Skilled at turning technical video transcriptions into engaging, '
              'well-structured blog posts.',
    allow_delegation=False
)
3. Agent 3 — LinkedIn Post Creator
# LinkedIn Post Agent
linkedin_post_agent = Agent(
    role='LinkedIn Post Creator',
    # Goal and backstory strings reconstructed from the article's description.
    goal='Create a concise LinkedIn post summary from the transcription provided by the Topic Researcher',
    verbose=True,
    memory=True,
    backstory='Expert in writing engaging LinkedIn posts with relevant hashtags '
              'for a professional audience.',
    allow_delegation=False
)
4. Agent 4 — Twitter Post Creator
twitter_agent = Agent(
    role='Twitter Content Creator',
    # Goal and backstory strings reconstructed from the article's description.
    goal='Create a short tweet from the transcription provided by the Topic Researcher',
    verbose=True,
    memory=True,
    backstory='Specialist in crafting catchy tweets with relevant hashtags.',
    allow_delegation=False
)
We notice that none of the last three agents delegates its task. This is because their results are not processed by any other agent.
Perfect! Now our agents are ready to learn what is expected from them, and that is done via the Task class.
Tasks
A human being performs a task to deliver something after receiving instructions. The same applies to agents, and the attributes required for successfully performing those tasks and delivering results are the following:
- description is a clear description of what needs to be performed by the agent. The clearer the description, the better the output of the model.
- expected_output is the textual description of the result expected from the agent.
- agent is the placeholder for the agent responsible for executing that specific task.
- tools is similar to the attribute in the agent definition, and not every agent uses a tool. In our case, only the Topic Researcher does.
- output_file is the filename and its format. This is only specified for agents that need to save their task's result to a file, like a markdown file, in which case the filename could be blog-post.md for the Blog Writer.
Let’s dive into the Python implementation of those tasks for each agent.
- Agent 1 — Topic Researcher
research_task = Task(
description="Identify and analyze videos on the topic {topic} from the specified YouTube channel.",
expected_output="A complete word by word report on the most relevant video found on the topic {topic}.",
agent=topic_researcher,
tools=[youtube_tool]
)
2. Agent 2 — Blog Writer
blog_writing_task = Task(
description=""" Write a comprehensive blog post based on the transcription provided by the Topic Researcher.
The article must include an introduction, step-by-step guides, and a conclusion.
The overall content must be about 1200 words long.""",
expected_output="A markdown-formatted version of the blog post",
agent=blog_writer,
output_file='./data/blog-post.md'
)
3. Agent 3 — LinkedIn Post Creator
linkedin_post_task = Task(
description="Create a LinkedIn post summarizing the key points from the transcription provided by the Topic Researcher, including relevant hashtags.",
expected_output="A markdown-formatted version of the LinkedIn post",
agent=linkedin_post_agent,
output_file='./data/linkedin-post.md'
)
4. Agent 4 — Twitter Post Creator
twitter_task = Task(
description="Create a tweet from the transcription provided by the Topic Researcher, including relevant hashtags.",
expected_output="A markdown-formatted version of the tweet",
agent=twitter_agent,
output_file='./data/tweets.md'
)
For each agent, the attributes are self-explanatory, and the result of the:
- first agent is raw text used by the last three agents
- second agent is a markdown file called blog-post.md
- third agent is also a markdown file, called linkedin-post.md
- last agent is the same, with the name tweets.md
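Once the crew has run, the generated files can be collected back into Python, for instance with a small helper like this (a convenience sketch, not part of the workflow itself):

```python
from pathlib import Path

def load_outputs(folder="data"):
    # Map each generated markdown filename to its content.
    return {p.name: p.read_text() for p in Path(folder).glob("*.md")}
```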
This last step orchestrates our agents as a team to properly execute their tasks as follows:
my_crew = Crew(
agents=[topic_researcher, linkedin_post_agent, twitter_agent, blog_writer],
tasks=[research_task, linkedin_post_task, twitter_task, blog_writing_task],
verbose=True,
process=Process.sequential,
memory=True,
cache=True,
max_rpm=100,
share_crew=True
)
- agents is the list of all our agents.
- tasks is the list of tasks to be performed by the agents.
- We set verbose to True to get the full execution trace.
- max_rpm is the maximum number of requests per minute our crew can perform, to avoid rate limits.
- Finally, share_crew means that agents share resources to execute their tasks; here, the first agent shares its response with the other agents.
After orchestrating the agents, it is time to trigger them using the kickoff function, which takes a dictionary of inputs as a parameter. Here we are searching for a video I recorded about GPT-3.5 Turbo fine-tuning with a graphical interface.
topic_of_interest = 'GPT3.5 Turbo Fine-tuning and Graphical Interface'
result = my_crew.kickoff(inputs={'topic': topic_of_interest})
The successful execution of the above code generates all three markdown files that we specified above.
Here is the result showing the content of each file. As the person who recorded the video, I can confirm it perfectly corresponds to what was covered in the tutorial.
The blog agent did not provide a fully step-by-step guide that users could execute without facing any issue, but it gives a great understanding of the scope of the tutorial.
For the LinkedIn and Twitter posts, the results are brilliant!
You can check the content of all the files within the ai_agents_outputs folder.
My above assessment of the agents is based on my personal familiarity with the content. However, more objective and scalable evaluation methods become crucial before deploying into production.
Below are some strategies for efficiently evaluating those AI agents.
- Benchmark testing can be used to evaluate each agent's performance across diverse tasks using established datasets like GLUE and FLASK, allowing standardized comparisons with state-of-the-art models.
- Factual accuracy measurement evaluates an agent's capability to provide factual responses to a diverse range of questions across multiple domains.
- Context-awareness relevance scoring can be used to quantify how well the agent's responses align with a given prompt.
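As a toy illustration of relevance scoring (far simpler than a real embedding-based metric), one can compare an agent's response to the prompt with standard-library string similarity:

```python
from difflib import SequenceMatcher

def relevance_score(prompt, response):
    # Returns a 0..1 similarity ratio; a real system would use embeddings
    # or an LLM-based judge rather than character-level matching.
    return SequenceMatcher(None, prompt.lower(), response.lower()).ratio()
```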
Multiple frameworks can be leveraged to perform these evaluations, and some of them include:
- DeepEval, an open-source tool for quantifying LLMs' performance across diverse metrics.
- MMLU (Massive Multitask Language Understanding), a benchmark that tests models on a diverse range of subjects in zero-shot and few-shot settings.
- OpenAI Evals, a framework for evaluating LLMs or any system built with LLMs.
This article provided a brief overview of how to leverage AI agents to effectively accomplish advanced tasks, instead of prompting a single Large Language Model to perform the same work.
I hope this short tutorial helped you acquire new skills. The complete code is available on my GitHub, and please subscribe to my YouTube channel to support my content.
Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.
Would you like to buy me a coffee ☕️? → Here you go!
Feel free to follow me on Twitter, or say Hi on LinkedIn. It is always a pleasure to discuss AI stuff!