AI Agents — From Concepts to Practical Implementation in Python


This will change the way you think about AI and its capabilities


As an African proverb states:

Alone, we go faster. Together, we go further.

This also relates to the idea that no one can be an expert in every field. Teamwork and effective delegation of tasks become crucial to achieving great things.

A similar principle applies to Large Language Models (LLMs). Instead of prompting a single LLM to handle a complex task, we can combine multiple LLMs, or AI Agents, each specializing in a specific area.

This strategy can lead to a more robust system with higher-quality results.

In this article you will learn:

  • What AI Agents are
  • Why it is worth considering them for solving real-world use cases
  • How to create a complete AI Agents system from scratch

Before diving into any coding, let’s have a clear understanding of the main components of the system being built in this article.

Autonomous AI Agents workflow (Image by Author)
  • The workflow has four agents overall, each with a specialized skillset.
  • First, the user’s request is submitted to the system.
  • Agent 1, or Video Analyzer, performs deep research on the internet to find relevant information about the user’s request, using external tools like YouTube Channel Search. The result from that agent is sent to the next agents for further processing.
  • Agent 2, or Blog Post Writer, leverages the previous result to write a comprehensive blog post.
  • Similarly to Agent 2, Agents 3 and 4 create an engaging LinkedIn post and tweets, respectively.
  • The responses from Agents 2, 3, and 4 are saved into different markdown files, which can be used by end users.
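The fan-out above can be sketched as plain Python functions. This is a toy illustration of the data flow only, not the CrewAI API; every function name here is made up for the sketch:

```python
# Toy sketch of the workflow: one researcher's output fans out to three writers,
# and each writer's result maps to its own markdown file.

def research(topic: str) -> str:
    # Agent 1: gather notes about the topic (stand-in for a real YouTube/web search)
    return f"Key findings about {topic}"

def write_blog(notes: str) -> str:
    # Agent 2: turn the notes into a blog post
    return f"# Blog post\n\n{notes}"

def write_linkedin(notes: str) -> str:
    # Agent 3: turn the notes into a LinkedIn post
    return f"LinkedIn: {notes} #AI"

def write_tweet(notes: str) -> str:
    # Agent 4: turn the notes into a tweet (length-limited)
    return f"{notes} #AI"[:280]

def run_workflow(topic: str) -> dict:
    notes = research(topic)  # Agent 1 runs first; its output feeds the other three
    return {
        "blog-post.md": write_blog(notes),
        "linkedin-post.md": write_linkedin(notes),
        "tweet.md": write_tweet(notes),
    }
```

The real system delegates each step to an LLM-backed agent, but the shape of the pipeline is the same: one research step, three independent writing steps.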

If you remember the article I wrote about using LLMs for document parsing, you will notice that the requested task was monolithic, meaning the LLM was tasked with a single goal: data extraction.

Such an approach’s limitations become obvious when dealing with more complex, multi-step tasks. Some of these limitations are illustrated below:

1. Flexibility of task execution

  • Single-prompted LLMs require carefully written prompts for each task, which may be difficult to update when expectations change from the initial task requirements.
  • AI Agents break those complexities down into subtasks and adapt their behavior without requiring extensive prompt engineering.

2. Task continuity and context retention

  • Single-prompted LLMs may lose important context from previous interactions, because they mainly operate within the constraints of a single conversation turn.
  • AI Agents can maintain context across different interactions, and each agent can refer back to previous agents’ responses to complete its own work.

3. Specialization and interactions

  • Single-prompted LLMs can gain specialized domain expertise only after extensive fine-tuning, which may be expensive in both time and money.
  • AI Agents, on the other hand, can be designed as a crew of specialized models, where each model focuses on a specific task such as researcher, blog writer, or social media expert.

4. Internet access

  • Single-prompted LLMs rely on a predefined knowledge base, which may not be up to date, leading to hallucinations or limited answers.
  • AI Agents can have access to the internet, allowing them to provide more up-to-date information and make better decisions.

In this section, we explore how to leverage an agentic workflow to create a system that writes blog posts, LinkedIn content, and Twitter posts after extensive research on the user’s topic.

If you are more of a video-oriented person, I will be waiting for you on the other side.

Structure of the code

The code is structured as follows:

project
   |
   |---Multi_Agents_For_Content_Creation.ipynb
   |
  data
   |
   |---- blog-post.md
   |---- linkedin-post.md
   |---- tweet.md
  • project folder is the root folder and contains the data folder, and the notebook
  • data folder is currently empty but should contain the following three markdown files after the execution of the overall workflow: blog-post.md , linkedin-post.md , and tweet.md
  • Each markdown file contains the result of the task performed by the corresponding agent
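Since the data folder starts out empty, you can create it up front so that the agents’ output_file paths resolve when the crew runs (a minimal stdlib sketch):

```python
from pathlib import Path

# Create the output folder the agents will write their markdown files into.
# exist_ok=True makes this safe to re-run.
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
```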

Now that we have explored what each agent’s role is in the previous sections, let’s see how to actually create them, along with their roles and tasks.

Before that, let’s set up the prerequisites so we can better implement those roles and tasks.

Prerequisites

The code is run from a Google Colab notebook, and only a few libraries are required for our use case: crewai[tools], youtube-transcript-api, and yt_dlp. They can be installed as follows.

%%bash
pip -qqq install 'crewai[tools]'
pip -qqq install youtube-transcript-api
pip -qqq install yt_dlp

After successfully installing the libraries, the next step is to import the following necessary modules:

import os
from crewai import Agent
from google.colab import userdata
from crewai import Crew, Process
from crewai_tools import YoutubeChannelSearchTool
from crewai import Task

Each agent leverages the OpenAI gpt-4o model, and we need to set up access to the model via our OpenAI API key as follows:

OPENAI_API_KEY = userdata.get('OPEN_AI_KEY')
model_ID = userdata.get('GPT_MODEL')
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["OPENAI_MODEL_NAME"] = model_ID

Environment variables and their values (Image by Author)
  • First, we access both the OPEN_AI_KEY and GPT_MODEL values from Google Colab secrets using the built-in userdata.get function.
  • Then we expose the OpenAI key and the gpt-4o model name to the libraries via the os.environ mapping.
  • After these two steps, there should not be any issue using the model to create the agents.
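Outside Colab there is no userdata helper. A hedged equivalent using plain environment variables (the helper name and the commented-out usage are assumptions, not part of any library):

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the workflow.")
    return value

# Equivalent of the Colab-secrets setup above (uncomment to use):
# os.environ["OPENAI_API_KEY"] = require_env("OPENAI_API_KEY")
# os.environ["OPENAI_MODEL_NAME"] = "gpt-4o"
```

Failing fast here is preferable to letting the first API call fail later with a less obvious authentication error.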

Agents and their Roles

With the Agent class, we can create an agent by mainly providing the following attributes: role , goal , backstory , and memory .

Only Agent 1, or Video Analyzer, has these additional attributes: tools and allow_delegation.

Most of those attributes are self-explanatory, but let’s understand what they mean.

  • role is like a job title and defines the exact role of an agent. For instance, Agent 1’s role is Topic Researcher
  • goal defines what the agent is expected to achieve, in line with its role
  • backstory elaborates on the agent’s role, making it more specific
  • memory is a boolean. When set to True, it allows the agent to remember, reason, and learn from past interactions.
  • tools is a list of tools the agent uses to perform its task
  • allow_delegation is a boolean indicating whether the agent’s result should be delegated to other agents for further processing

Now, let’s create each one of our agents.

But before that, let’s set up the tool the first agent needs to explore my personal YouTube channel.

This is achieved using the YoutubeChannelSearchTool class, by providing the handle @techwithzoum.

youtube_tool = YoutubeChannelSearchTool(youtube_channel_handle='@techwithzoum')
1. Agent 1 — Topic Researcher

topic_researcher = Agent(
    # goal/backstory wording below is illustrative, based on the description in this section
    role="Topic Researcher",
    goal="Find relevant videos mentioning the topic {topic} on the specified YouTube channel",
    verbose=True,
    memory=True,
    backstory="Expert in finding and analyzing relevant content about AI, Data Science, "
              "Machine Learning, and Generative AI topics using the YouTube search tool.",
    tools=[youtube_tool],
    allow_delegation=True
)
  • We start by defining the role of the Agent as a Topic Researcher
  • Then we ask the agent to use the topic provided to find relevant videos mentioning that topic.
  • Finally, our first agent is defined as an expert in finding and analyzing relevant content about AI, Data Science, Machine Learning, and Generative AI topics using the YouTube search tool.

The great thing is that the definition of every agent follows exactly the same pattern, with minor changes in the attributes’ values. Once you understand one, no further explanation is needed for the next agents’ definitions.

2. Agent 2 — Blog Writer

blog_writer = Agent(
    # goal/backstory wording below is illustrative, based on the task description later in this article
    role="Blog Writer",
    goal="Write a comprehensive blog post about the topic {topic} from the transcription provided by the Topic Researcher",
    verbose=True,
    memory=True,
    backstory="Experienced technical writer who turns video transcriptions into clear, engaging blog posts.",
    allow_delegation=False
)

3. Agent 3 — LinkedIn Post Creator

# LinkedIn Post Agent
linkedin_post_agent = Agent(
    # goal/backstory wording below is illustrative, based on the task description later in this article
    role="LinkedIn Post Creator",
    goal="Summarize the key points of the transcription provided by the Topic Researcher in an engaging LinkedIn post",
    verbose=True,
    memory=True,
    backstory="Social media expert who crafts professional, engaging LinkedIn content.",
    allow_delegation=False
)

4. Agent 4 — Twitter Post Creator

twitter_agent = Agent(
    # goal/backstory wording below is illustrative, based on the task description later in this article
    role="Twitter Post Creator",
    goal="Write a concise, engaging tweet from the transcription provided by the Topic Researcher",
    verbose=True,
    memory=True,
    backstory="Social media expert who distills content into catchy tweets with relevant hashtags.",
    allow_delegation=False
)

We notice that none of the last three agents delegates its task. This is because their results are not processed by any other agent.

Perfect! Now our agents are ready to learn about what is expected from them, and that’s performed via the Task class.

Tasks

A human being performs tasks to deliver something after receiving instructions on how to execute them.

The same applies to agents, and the attributes required for successfully performing those tasks and delivering results are the following:

  • description is a clear description of what the agent needs to perform. The clearer the description, the better the output of the model
  • expected_output is a textual description of the result expected from the agent
  • agent is the placeholder for the agent responsible for executing that specific task
  • tools is similar to its definition in the agents section; not every agent uses a tool. In our case, only the Topic Researcher does
  • output_file is the filename and its format. It is only specified for agents whose task result must be written to a file, such as a markdown file; for the Blog Writer the filename could be blog-post.md

Let’s dive into the Python implementation of those tasks for each agent.

  1. Agent 1 — Topic Researcher
research_task = Task(
    description="Identify and analyze videos on the topic {topic} from the specified YouTube channel.",
    expected_output="A complete word by word report on the most relevant video found on the topic {topic}.",
    agent=topic_researcher,
    tools=[youtube_tool]
)

2. Agent 2 — Blog Writer

blog_writing_task = Task(
    description="""Write a comprehensive blog post based on the transcription provided by the Topic Researcher.
                    The article must include an introduction, step-by-step guides, and a conclusion.
                    The overall content must be about 1200 words long.""",
    expected_output="A markdown-formatted version of the blog post",
    agent=blog_writer,
    output_file='./data/blog-post.md'
)

3. Agent 3 — LinkedIn Post Creator

linkedin_post_task = Task(
    description="Create a LinkedIn post summarizing the key points from the transcription provided by the Topic Researcher, including relevant hashtags.",
    expected_output="A markdown-formatted version of the LinkedIn post",
    agent=linkedin_post_agent,
    output_file='./data/linkedin-post.md'
)

4. Agent 4 — Twitter Post Creator

twitter_task = Task(
    description="Create a tweet from the transcription provided by the Topic Researcher, including relevant hashtags.",
    expected_output="A markdown-formatted version of the Twitter post",
    agent=twitter_agent,
    output_file='./data/tweets.md'
)

For each agent, the attributes are self-explanatory, and the result of the

  • first agent is raw text used by the last three agents
  • second agent is a markdown file called blog-post.md
  • third agent is also a markdown file, called linkedin-post.md
  • last agent is the same, with the name tweets.md

This last step orchestrates our agents as a team to properly execute their tasks as follows:

my_crew = Crew(
    agents=[topic_researcher, linkedin_post_agent, twitter_agent, blog_writer],
    tasks=[research_task, linkedin_post_task, twitter_task, blog_writing_task],
    verbose=True,
    process=Process.sequential,
    memory=True,
    cache=True,
    max_rpm=100,
    share_crew=True
)
  • agents corresponds to the list of all our agents
  • tasks is the list of the tasks to be performed by each agent
  • We set verbose to True to have the full execution trace
  • max_rpm is the maximum number of requests per minute our crew can perform to avoid rate limits
  • Finally, share_crew means that agents are sharing resources to execute their tasks. This corresponds to the first agent sharing its response with other agents.

After orchestrating the agents, it is time to trigger them using the kickoff function, which takes a dictionary of inputs as a parameter. Here we are searching for a video I recorded about GPT-3.5 Turbo fine-tuning with a graphical interface.

topic_of_interest = 'GPT3.5 Turbo Fine-tuning and Graphical Interface'
result = my_crew.kickoff(inputs={'topic': topic_of_interest})
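Once the crew finishes, you can collect and preview whatever files landed in the data folder. A small stdlib sketch (the helper name is an assumption; it simply reads the markdown files the tasks above declare):

```python
from pathlib import Path

def collect_outputs(folder: str = "data") -> dict:
    """Read every generated markdown file in the output folder, keyed by filename."""
    outputs = {}
    for path in sorted(Path(folder).glob("*.md")):
        outputs[path.name] = path.read_text(encoding="utf-8")
    return outputs

# Preview each generated file (prints nothing if the folder is empty)
for name, text in collect_outputs().items():
    print(f"--- {name} ({len(text)} chars) ---")
    print(text[:200])  # first 200 characters only
```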

The successful execution of the above code generates all three markdown files we specified above.

Here is the result showing the content of each file. As someone who recorded the video, I can say it perfectly corresponds to what was covered in the tutorial.

The blog agent did not provide a hundred-percent step-by-step guide that users could execute without facing any issue, but it provides a great understanding of the scope of the tutorial.

For the LinkedIn and Twitter posts, the results are brilliant!

You can check the content of all the files within the ai_agents_outputs folder.

My assessment of the agents above is based on my personal familiarity with the content. However, more objective and scalable evaluation methods become crucial before deploying into production.

Below are some strategies for efficiently evaluating those AI agents.

  • Benchmark testing can be used to evaluate each agent’s performance across diverse tasks using established datasets like GLUE and FLASK, allowing standardized comparisons with different state-of-the-art models.
  • Factual accuracy measurement evaluates an agent’s capability to provide factual responses to a diverse range of questions across multiple domains.
  • Context-awareness relevance scoring can be used to quantify how well the agent’s responses align with a given prompt.
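As a toy illustration of relevance scoring, here is a lexical-overlap (Jaccard) score between a prompt and a response. Real evaluations would use embeddings or an LLM judge; this sketch only shows the shape of the metric:

```python
def relevance_score(prompt: str, response: str) -> float:
    """Toy context-relevance metric: Jaccard overlap of lowercase word sets.

    Returns a value in [0, 1]; 1.0 means the word sets are identical.
    """
    p = set(prompt.lower().split())
    r = set(response.lower().split())
    if not p or not r:
        return 0.0
    return len(p & r) / len(p | r)
```

A word-overlap score misses paraphrases entirely, which is exactly why production evaluations prefer semantic measures, but it is enough to flag responses that ignore the prompt altogether.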

Multiple frameworks can be leveraged to perform these evaluations, and some of them include:

  • DeepEval, an open-source tool for quantifying LLMs’ performance across diverse metrics.
  • MMLU (Massive Multitask Language Understanding), a benchmark that tests models on a diverse range of subjects in both zero-shot and few-shot settings.
  • OpenAI Evals, a framework for evaluating LLMs or any system built using LLMs.

This article provided a brief overview of how to leverage AI agents to effectively accomplish advanced tasks, instead of prompting a single Large Language Model to perform the same work.

I hope this short tutorial helped you acquire new skills. The complete code is available on my GitHub, and please subscribe to my YouTube channel to support my content.

Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.

Would you like to buy me a coffee ☕️? → Here you go!

Feel free to follow me on Twitter, or say Hi on LinkedIn. It is always a pleasure to discuss AI stuff!
