AI Agent SDK: Data Extraction Best Practices for Function Tools
Here are some of my personal learnings from working with the Google Agent SDK and building AI agents with Google Gemini. This article explains best practices for extracting data from the parameters of tools (implemented as `FunctionTool` in the Agent SDK).
The Challenge: Unpredictable LLM Output
When an LLM like Gemini calls one of your defined functions (tools), it populates the function's parameters based on the conversation. Imagine 10 different users making a payment: each of them may phrase and structure the same information differently, and the arguments the LLM produces vary accordingly. A parameter you expect to be a simple `String` might arrive as a `Map`, a `List`, or even a `Number`. If you don't handle this variability gracefully, it can easily break your tool's logic.
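To make the problem concrete, here is a minimal sketch of the kind of argument maps a tool might receive for the same `billingId`, even though the schema declares it as a string. The class name `ArgumentShapes`, the example values, and the exact shapes are hypothetical illustrations, not captured model output. The naive extractor trusts the schema and casts directly.

```java
import java.util.List;
import java.util.Map;

// Hypothetical argument maps an LLM might produce for the same "billingId"
// parameter, even though the declared schema type is STRING.
class ArgumentShapes {

    static final Map<String, Object> AS_STRING = Map.of("billingId", "B-1001");
    static final Map<String, Object> AS_MAP = Map.of("billingId", Map.of("value", "B-1001"));
    static final Map<String, Object> AS_LIST = Map.of("billingId", List.of("B-1001"));
    static final Map<String, Object> AS_NUMBER = Map.of("billingId", 1001);

    // Naive extractor that trusts the schema and casts directly.
    // It works for AS_STRING but throws ClassCastException for the other shapes.
    static String naiveBillingId(Map<String, Object> args) {
        return (String) args.get("billingId");
    }
}
```

Calling `naiveBillingId(ArgumentShapes.AS_MAP)` fails at runtime with a `ClassCastException`, which is exactly the kind of crash a resilient extractor is meant to prevent.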
Example Scenario: A Payment Agent
Let's consider an agent designed to process payments. It has two primary tools: `processAuthorization` and `processPayment`. These tools require various pieces of information, such as a billing ID, credit card details, and an amount, which the agent is expected to gather from the user over several turns of conversation. In the full example at the end of this article, the `getStringFromObject` method checks the type of each input and extracts the string value accordingly. Even though the parameters are declared as `String` in the tool schema, in practice they can arrive as a `Map`, a `List`, a `Number`, or another type, and this approach lets the tool handle such unexpected formats without crashing.
Some of the Best Practices that helped me
1. Implement Resilient and Flexible Parameter Handling
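The key to building a robust agent is to create a utility function that can safely extract parameter values regardless of the input type. Instead of assuming a `String`, your code should be prepared to handle the different data structures the LLM might send. For example, consider the utility method `getStringFromObject(Optional<Object> obj)` in the `PaymentAgent` class below: it returns strings unchanged, unwraps a `value` or `text` key from a `Map`, takes the first element of a `List`, and converts numbers and any other type with `String.valueOf`.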
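The same idea applies to non-string parameters. Below is a minimal sketch of a numeric counterpart for the `amount` parameter, which the schema declares as `NUMBER` but which may still arrive as text such as `"$1,250.00"` or wrapped in a map or list. The names `AmountParser` and `getDoubleFromObject`, and the `"amount"` wrapper key, are hypothetical assumptions for illustration, not part of the Agent SDK.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical numeric counterpart to getStringFromObject for parameters such as
// "amount", which is declared as NUMBER but may arrive as "$1,250.00", 12.5,
// {"value": 12.5} or [12.5].
class AmountParser {

    static Optional<Double> getDoubleFromObject(Optional<Object> obj) {
        if (obj.isEmpty()) {
            return Optional.empty();
        }
        Object value = obj.get();
        if (value instanceof Number) {
            return Optional.of(((Number) value).doubleValue());
        }
        if (value instanceof Map) {
            // Unwrap a {"value": ...} or {"amount": ...} wrapper and parse its contents.
            Map<?, ?> map = (Map<?, ?>) value;
            Object inner = map.containsKey("value") ? map.get("value") : map.get("amount");
            return getDoubleFromObject(Optional.ofNullable(inner));
        }
        if (value instanceof List) {
            // Use the first element, mirroring getStringFromObject.
            List<?> list = (List<?>) value;
            if (list.isEmpty()) {
                return Optional.empty();
            }
            Object first = list.get(0);
            return getDoubleFromObject(Optional.ofNullable(first));
        }
        // Strings and other types: keep only digits, '.' and '-', then parse.
        String cleaned = String.valueOf(value).replaceAll("[^0-9.\\-]", "");
        try {
            return Optional.of(Double.parseDouble(cleaned));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

Returning `Optional.empty()` instead of a default such as `0.0` lets the tool report a missing or unreadable amount rather than silently charging zero. Note that stripping non-digit characters assumes `.` as the decimal separator; an input like `"12,50"` would be read as `1250`, so adjust the cleanup for your locale.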
2. Provide Specific, Actionable Feedback from Tools
When a tool fails because of missing or malformed parameters, the response should guide the LLM on what to do next. Vague error messages like "Invalid input" are not helpful. Instead, specify exactly which parameter is missing or incorrect and what format is expected. This helps the LLM to correct its output in subsequent calls. For example, if the `billingId` is missing, the tool's response could be: "Error: Missing required parameter 'billingId'. Please provide a valid billing ID as a string."
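A minimal sketch of how a tool could produce such a message before doing any work. It assumes the tool returns a `Map<String, Object>` response; the helper name `checkRequired` and the `status`/`message` keys are hypothetical choices, not an SDK contract.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of the Agent SDK): validates required parameters and
// returns an error payload naming each missing one and the format it should have.
class ToolFeedback {

    // requiredFormats maps a parameter name to a human-readable description of its format.
    // Returns Optional.empty() when every required parameter is present.
    static Optional<Map<String, Object>> checkRequired(
            Map<String, Object> args, Map<String, String> requiredFormats) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : requiredFormats.entrySet()) {
            Object value = args.get(entry.getKey());
            if (value == null || String.valueOf(value).isBlank()) {
                problems.add("Missing required parameter '" + entry.getKey()
                        + "'. Please provide " + entry.getValue() + ".");
            }
        }
        if (problems.isEmpty()) {
            return Optional.empty();
        }
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("status", "error");
        response.put("message", "Error: " + String.join(" ", problems));
        return Optional.of(response);
    }
}
```

With `billingId` mapped to `"a valid billing ID as a string"` and absent from the arguments, the message reads exactly like the example above. Passing a `LinkedHashMap` for `requiredFormats` keeps the messages in a stable order when several parameters are missing.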
3. Instruct the Agent to Review History and Context Before Re-prompting the User
A common failure mode for agents is re-asking for information that the user has already provided. To mitigate this, design your tools to instruct the LLM to review the conversation history before asking the user for information again. You can do this by including a note in the tool's response, such as: "Please review the previous messages in the conversation to find the required information before asking the user again."
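One way to attach that note consistently is a small response builder that every tool uses when information is missing. This is a sketch assuming `Map<String, Object>` tool responses; the class `HistoryHint` and the keys `status`, `missingParameters`, and `note` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds a "missing information" tool response with an
// explicit instruction to check the conversation history before re-asking the user.
class HistoryHint {

    static final String REVIEW_HISTORY_NOTE =
            "Please review the previous messages in the conversation to find the required "
            + "information before asking the user again.";

    static Map<String, Object> missingInfoResponse(List<String> missingParams) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("status", "needs_input");
        response.put("missingParameters", missingParams);
        response.put("note", REVIEW_HISTORY_NOTE);
        return response;
    }
}
```

Keeping the note in one constant means every tool gives the LLM the same instruction, which makes the behavior easier to tune from a single place.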
4. Leverage Session State for Persistent Context
For information that needs to persist across multiple turns, consider storing it in session state or a database. This way, even if the LLM forgets or misinterprets earlier inputs, your tool can retrieve the necessary data from a reliable source. Session state is also the right place for user-specific data such as email, name, or userId, so the agent doesn't repeatedly ask for the same information. For example, once the user provides their `billingId`, store it in session state; subsequent tool calls can then read the stored value without asking the user again.
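The lookup order can be sketched as: prefer the value from the current tool call, otherwise fall back to what was stored earlier. In this sketch a plain `Map<String, Object>` stands in for the SDK's session state; how you obtain the real state object (for example from the tool's context) is an assumption not shown here, and `SessionParams.resolve` is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: prefer a value from the current tool call, otherwise fall
// back to session state. A plain Map stands in for the SDK's session state here.
class SessionParams {

    static Optional<String> resolve(String key, Map<String, Object> args, Map<String, Object> sessionState) {
        Object fromArgs = args.get(key);
        if (fromArgs != null && !String.valueOf(fromArgs).isBlank()) {
            String value = String.valueOf(fromArgs);
            // Remember the value so later turns and other tools can reuse it.
            sessionState.put(key, value);
            return Optional.of(value);
        }
        Object stored = sessionState.get(key);
        if (stored == null) {
            return Optional.empty();
        }
        return Optional.of(String.valueOf(stored));
    }
}
```

In a real tool you would normalize `fromArgs` with `getStringFromObject` before storing it, so a value that arrived as a `Map` is not saved as its `toString()` form.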
5. Explicitly Define Tool-Chaining Logic in Prompts
For a multi-step workflow where the output of one tool is the input to another, explicitly define this chaining logic in your prompts. This helps the LLM understand the flow of information and reduces the chance that it skips a required step.
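As an illustration, the payment agent's instruction could spell out the order of tool calls and the condition for moving on. The wording, the `PaymentInstructions` class, and the assumption that `processAuthorization` reports a `success` status are all hypothetical. Because `processPayment` in this example takes no field from `processAuthorization`, the chain is expressed as a success gate; if a later tool did consume an earlier tool's output (say, an authorization ID), name that field explicitly in the same way.

```java
// Hypothetical agent instruction that spells out the order of tool calls.
// The wording and the "success" status are illustrative assumptions.
class PaymentInstructions {

    static final String INSTRUCTION = String.join("\n",
            "You are a payment assistant. Follow these steps in order:",
            "1. Collect the billing ID, then call processAuthorization with billingId.",
            "2. Only if processAuthorization reports success, collect the credit card number,"
                    + " expiry date, CVV and amount.",
            "3. Call processPayment with those values. Never call processPayment before"
                    + " processAuthorization has succeeded.",
            "4. If a tool returns an error, fix the parameter it names and call the same tool"
                    + " again instead of skipping ahead.");
}
```

The string would then be supplied as the agent's instruction when the agent is configured.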
Below is a complete example of a `PaymentAgent` class that demonstrates this resilient approach. The most important method to study is `getStringFromObject`, which is designed to safely convert the `Optional<Object>` parameter values coming from the LLM into strings.
```java
package com.example.agents;

import com.google.adk.tools.FunctionTool;
import com.google.adk.tools.Tool;
import com.google.cloud.vertexai.api.FunctionDeclaration;
import com.google.cloud.vertexai.api.Schema;
import com.google.cloud.vertexai.api.Type;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PaymentAgent {

    public List<Tool> getTools() {
        return List.of(
                new FunctionTool(getProcessAuthorizationFunction()),
                new FunctionTool(getProcessPaymentFunction())
        );
    }

    private FunctionDeclaration getProcessAuthorizationFunction() {
        return FunctionDeclaration.newBuilder()
                .setName("processAuthorization")
                .setDescription("Authorizes a payment method for a given billing ID.")
                .setParameters(Schema.newBuilder()
                        .setType(Type.OBJECT)
                        .putProperties("billingId", Schema.newBuilder().setType(Type.STRING).setDescription("The billing ID to authorize.").build())
                        .addRequired("billingId")
                        .build())
                .build();
    }

    private FunctionDeclaration getProcessPaymentFunction() {
        return FunctionDeclaration.newBuilder()
                .setName("processPayment")
                .setDescription("Processes a payment using credit card details.")
                .setParameters(Schema.newBuilder()
                        .setType(Type.OBJECT)
                        .putProperties("creditCardNumber", Schema.newBuilder().setType(Type.STRING).setDescription("The credit card number.").build())
                        .putProperties("expiryDate", Schema.newBuilder().setType(Type.STRING).setDescription("The expiry date of the card.").build())
                        .putProperties("cvv", Schema.newBuilder().setType(Type.STRING).setDescription("The CVV of the card.").build())
                        .putProperties("amount", Schema.newBuilder().setType(Type.NUMBER).setDescription("The amount to be paid.").build())
                        .addRequired("creditCardNumber")
                        .addRequired("expiryDate")
                        .addRequired("cvv")
                        .addRequired("amount")
                        .build())
                .build();
    }

    // The resilient utility method to handle various input types from the LLM
    public static String getStringFromObject(Optional<Object> obj) {
        if (obj.isEmpty()) {
            return "";
        }
        Object value = obj.get();
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Map) {
            // If it's a map, try to find a "value" or "text" key, or just convert the whole map
            Map<?, ?> map = (Map<?, ?>) value;
            if (map.containsKey("value")) {
                return String.valueOf(map.get("value"));
            }
            if (map.containsKey("text")) {
                return String.valueOf(map.get("text"));
            }
            return map.toString();
        }
        if (value instanceof List) {
            // If it's a list, return the first element, or an empty string if the list is empty
            List<?> list = (List<?>) value;
            if (!list.isEmpty()) {
                return String.valueOf(list.get(0));
            }
            return "";
        }
        // For Numbers or any other type
        return String.valueOf(value);
    }
}
```