AI Agent SDK: Data Extraction Best Practices for Function Tools

These are lessons I learned while building AI agents with the Google Agent SDK and Google Gemini. This article covers best practices for extracting data from the parameters of tools (known as `FunctionTool` in the Agent SDK).

The Challenge: Unpredictable LLM Output

When an LLM like Gemini calls one of your defined functions (tools), it populates the function's parameters based on the conversation. Imagine 10 users trying to make a payment: each of them may phrase the same information differently during the conversation. As a result, a parameter you expect to be a simple String might arrive as a Map, a List, or even a Number. This variability can easily break your tool's logic if you don't handle it gracefully.

Example Scenario: A Payment Agent

Let's consider an agent designed to process payments. It has two primary tools: processAuthorization and processPayment. These tools require several pieces of information, such as a billing ID, credit card details, and an amount. The agent is expected to gather this information from the user over several turns of the conversation. As the example below shows, the getStringFromObject method checks the type of the input and extracts the string value accordingly, so the tool can handle unexpected formats without crashing. Even though the input parameters are declared as String, at runtime they can arrive as a Map, a List, a Number, or another type.

Some of the Best Practices that helped me

1. Implement Resilient and Flexible Parameter Handling

The key to building a robust agent is to create a utility function that can safely extract parameter values regardless of the input type. Instead of assuming a String, your code should be prepared to handle different data structures that the LLM might send. For example, consider the following utility method: getStringFromObject(Optional obj). This method checks if the input is a String, Map, List, or Number, and extracts the string value accordingly. If the input is of an unexpected type, it returns an empty string or a default value. This approach ensures that your tool can gracefully handle a variety of input formats without throwing errors.
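A minimal sketch of such a utility could look like the following. The class name ParamExtractor, the depth limit, and the rule of taking the first non-empty value from a Map or List are my own assumptions for illustration; they are not part of the Agent SDK, and your agent may need a different rule (for example, looking up a specific key).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the Agent SDK. */
final class ParamExtractor {

    private static final int MAX_DEPTH = 3; // assumption: stop on deeply nested input

    private ParamExtractor() {}

    /** Returns a trimmed string for the supported shapes, or "" when nothing usable is found. */
    static String getStringFromObject(Optional<Object> obj) {
        if (obj == null || !obj.isPresent()) {
            return "";
        }
        return extract(obj.get(), 0);
    }

    private static String extract(Object value, int depth) {
        if (value == null || depth > MAX_DEPTH) {
            return "";
        }
        if (value instanceof String) {
            return ((String) value).trim();
        }
        if (value instanceof Number) {
            return String.valueOf(value);
        }
        if (value instanceof Map) {
            // Assumption: the first non-empty value in the map is the one we want.
            for (Object nested : ((Map<?, ?>) value).values()) {
                String candidate = extract(nested, depth + 1);
                if (!candidate.isEmpty()) {
                    return candidate;
                }
            }
            return "";
        }
        if (value instanceof List) {
            // Assumption: the first non-empty element in the list is the one we want.
            for (Object nested : (List<?>) value) {
                String candidate = extract(nested, depth + 1);
                if (!candidate.isEmpty()) {
                    return candidate;
                }
            }
            return "";
        }
        return ""; // unexpected type: fall back to an empty string instead of throwing
    }
}
```

With this sketch, a string, a number, a single-entry map, and a list all reduce to a plain string, and anything unrecognized becomes an empty string that the tool can then report as a missing parameter.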

2. Provide Specific, Actionable Feedback from Tools

When a tool fails because of missing or malformed parameters, the response should guide the LLM on what to do next. Vague error messages like "Invalid input" are not helpful. Instead, specify exactly which parameter is missing or incorrect and what format is expected. This helps the LLM to correct its output in subsequent calls. For example, if the billingId is missing, the tool's response could be: "Error: Missing required parameter billingId. Please provide a valid billing ID as a string."
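As a rough sketch, the tool can build that feedback in one place. This assumes the tool returns its result as a Map that is passed back to the LLM. The class name ToolResponses and the keys status, missingParameter, and message are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; the response keys are assumptions. */
final class ToolResponses {

    private ToolResponses() {}

    /** Builds an error response that names the missing parameter and the expected format. */
    static Map<String, Object> missingParameter(String name, String expectedFormat) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("status", "error");
        response.put("missingParameter", name);
        response.put("message",
            "Error: Missing required parameter " + name + ". Please provide " + expectedFormat + ".");
        return response;
    }

    /** Validates billingId and returns either a specific error or the cleaned value. */
    static Map<String, Object> validateBillingId(String billingId) {
        if (billingId == null || billingId.trim().isEmpty()) {
            return missingParameter("billingId", "a valid billing ID as a string");
        }
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("status", "ok");
        response.put("billingId", billingId.trim());
        return response;
    }
}
```

Keeping the parameter name in its own field, in addition to the message, gives the LLM (and your logs) a value that does not depend on parsing free text.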

3. Instruct the Agent to Review History and Context before re-prompting the user

A common failure mode for agents is re-asking for information that has already been provided by the user. To mitigate this, design your tools to instruct the LLM to first review the conversation history before asking the user for information again. This can be done by including a note in the tool's response, such as: "Please review the previous messages in the conversation to find the required information before asking the user again."
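One way to attach that note consistently is a small wrapper around error responses. This is only a sketch: it assumes tool responses are Maps with a status key, and the class name HistoryHint and the nextStep key are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; the "status" and "nextStep" keys are assumptions. */
final class HistoryHint {

    static final String REVIEW_HISTORY_NOTE =
        "Please review the previous messages in the conversation to find the required "
            + "information before asking the user again.";

    private HistoryHint() {}

    /** Returns a copy of the response; error responses also get the review-history note. */
    static Map<String, Object> withReviewHint(Map<String, Object> response) {
        Map<String, Object> copy = new LinkedHashMap<>(response);
        if ("error".equals(copy.get("status"))) {
            copy.put("nextStep", REVIEW_HISTORY_NOTE);
        }
        return copy;
    }
}
```

Successful responses pass through unchanged, so the hint only appears when the LLM actually has something to look up.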

4. Leverage Session State for Persistent Context

For information that needs to persist across multiple turns, consider using session state or a database to store what the user has provided. This way, even if the LLM forgets or misinterprets previous inputs, your tool can retrieve the necessary data from a reliable source. Also store user-specific data such as email, name, and userId in session state to avoid repeatedly asking for the same information. For example, once the user provides their billingId, store it in the session state. Subsequent tool calls can then read the stored value without asking the user again.
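The store-then-fall-back pattern can be sketched as follows. To keep the example self-contained, session state is modeled here as a plain mutable Map; in a real agent you would use whatever state object the SDK hands to your tool. The class name SessionValues and the method resolve are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper for illustration; session state is modeled as a plain Map. */
final class SessionValues {

    private SessionValues() {}

    /**
     * Prefers the value supplied in the current tool call and remembers it;
     * otherwise falls back to the value stored in an earlier turn, or "" if none.
     */
    static String resolve(Map<String, Object> state, String key, String fromLlm) {
        if (fromLlm != null && !fromLlm.trim().isEmpty()) {
            String value = fromLlm.trim();
            state.put(key, value); // keep the most recent value for later turns
            return value;
        }
        Object stored = state.get(key);
        return stored instanceof String ? (String) stored : "";
    }
}
```

For example, a first call such as resolve(state, "billingId", "B-1") stores the value, and a later call with a null or blank argument returns the stored "B-1" instead of forcing another question to the user.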

5. Explicitly Define Tool-Chaining Logic in Prompts

For a multi-step workflow where the output of one tool is the input to another, explicitly define this chaining logic in your prompts. This helps the LLM understand the flow of information and reduces the chance of it skipping necessary steps.
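As an illustration, the chaining rules can live in the agent's instruction text as an ordered list. The wording below is a sketch, not a tested prompt. The class name PaymentAgentInstructions and the authorizationId field are hypothetical, since the exact output of processAuthorization depends on your implementation.

```java
/** Hypothetical instruction text for illustration; adapt the field names to your tools. */
final class PaymentAgentInstructions {

    static final String INSTRUCTION = String.join("\n",
        "You are a payment agent. Follow these steps in order and do not skip any step:",
        "1. Collect the billingId, credit card details, and amount from the user.",
        "2. Call processAuthorization with those values.",
        // Assumption: processAuthorization returns a field named authorizationId.
        "3. Only if processAuthorization succeeds, call processPayment and pass the",
        "   authorizationId from the processAuthorization response.",
        "4. If a tool returns an error, correct the parameter it names and call the same tool again.");

    private PaymentAgentInstructions() {}
}
```

Numbering the steps and naming the exact field that moves from one tool to the next leaves less room for the LLM to call processPayment before authorization has succeeded.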

Below is a complete example of a PaymentAgent class that demonstrates this resilient approach. The most important method to study is getStringFromObject, which is designed to safely parse the Optional