Part 3: Developing and Configuring the Gemini Agent
This is the core of our project where we bring the AI to life. We will use the Vertex AI Gemini API with the Java SDK to create our agent, provide it with the function declarations from Part 2, and process its responses.
This document provides a comprehensive guide to understanding, running, and deploying a sophisticated conversational agent for scheduling medical appointments. The application is built with Java/Spring Boot and leverages the power of Google's Gemini large language model for natural language understanding and function calling. It is designed for a seamless, automated deployment to Google Cloud Run using Cloud Build.
The core of this project is a smart agent that can understand user requests in plain English, ask clarifying questions, and perform actions on behalf of the user by calling external APIs.
Objective
The primary goal of this project is to provide a conversational AI assistant that helps users book doctor appointments. Unlike a traditional REST API with fixed endpoints, this agent engages in a multi-turn conversation to gather all necessary information before completing a task.
Its core capabilities, driven by Gemini's function-calling feature, include:
- New Member Onboarding: If a user is new, the agent can create a member profile by asking for their name and email.
- Finding Available Appointments: The agent can search for open appointment slots based on a user's location (zip code).
- Scheduling and Confirmation: Once a user confirms a time slot, the agent books the appointment and provides a confirmation.
- Contextual Conversation: The agent maintains conversation history to provide a seamless and intelligent user experience, picking up where the user left off.
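To make the "Contextual Conversation" capability concrete, here is a minimal sketch of a per-session history store. The class name `ConversationHistory`, the `Turn` record, and the in-memory map are all assumptions for illustration; the actual project may persist history differently (for example in the bucket named by `GS_BUCKET`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory history store, keyed by session ID. */
public class ConversationHistory {

    /** One turn of a conversation: who spoke ("user" or "model") and what was said. */
    public record Turn(String role, String text) {}

    private final Map<String, List<Turn>> sessions = new HashMap<>();

    /** Appends a turn to the session, creating the session on first use. */
    public synchronized void append(String sessionId, String role, String text) {
        sessions.computeIfAbsent(sessionId, id -> new ArrayList<>())
                .add(new Turn(role, text));
    }

    /** Returns an immutable snapshot of the session's turns, oldest first. */
    public synchronized List<Turn> history(String sessionId) {
        return List.copyOf(sessions.getOrDefault(sessionId, List.of()));
    }
}
```

On each request, the application would load the history for the incoming session ID, send it to the model together with the new prompt, and append both the prompt and the reply afterwards.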
How It Works: Conversational AI with Function Calling
This application is not just a set of API endpoints. It's an intelligent layer that sits between the user and your backend services.
- User Prompt: A user sends a natural language prompt, like "I'd like to book an appointment for next Tuesday."
- Gemini's Intelligence: The prompt, along with the conversation history, is sent to the Gemini model. The model has been given a "system instruction" (as seen in DataBroker.java) that tells it how to behave and what tools it has at its disposal.
- Function Calling: Based on the user's request, Gemini determines that it needs to perform an action. Instead of just replying with text, it generates a "function call", a structured request to one of the predefined tools. For example, it might decide to call the get_available_slots function.
- Executing the Function: The Java application receives this function call, executes the corresponding business logic (i.e., calls your actual backend API for appointment slots), and gets a result.
- Responding to the Model: The result of the function call is sent back to Gemini.
- Natural Language Response: Gemini processes the function's result and formulates a natural, human-readable response for the user, such as "I see a few available slots for next Tuesday at 10:00 AM and 2:00 PM. Which one works for you?"
This loop continues until the user's goal is achieved.
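The loop described above can be sketched in plain Java without the Vertex AI SDK. Everything here is an assumption for illustration: `Model` is a hypothetical stand-in for the Gemini client, `ModelReply` is a simplified reply type, and tools are modeled as functions from string arguments to a string result. The real SDK types differ.

```java
import java.util.Map;
import java.util.function.Function;

public class FunctionCallingLoop {

    /** Simplified model reply: either plain text or a request to call a named tool. */
    public record ModelReply(String text, String functionName, Map<String, String> args) {
        public static ModelReply text(String text) {
            return new ModelReply(text, null, Map.of());
        }
        public static ModelReply call(String functionName, Map<String, String> args) {
            return new ModelReply(null, functionName, args);
        }
        public boolean isFunctionCall() {
            return functionName != null;
        }
    }

    /** Hypothetical stand-in for the Gemini chat client (not the SDK's interface). */
    public interface Model {
        ModelReply send(String message);
    }

    private static final int MAX_TOOL_CALLS = 5; // guard against endless tool-call loops

    /** Runs one user turn: forwards the prompt, executes requested tools, returns the final text. */
    public static String runTurn(Model model,
                                 Map<String, Function<Map<String, String>, String>> tools,
                                 String userPrompt) {
        ModelReply reply = model.send(userPrompt);
        for (int i = 0; i < MAX_TOOL_CALLS && reply.isFunctionCall(); i++) {
            Function<Map<String, String>, String> tool = tools.get(reply.functionName());
            String result = (tool == null)
                    ? "error: unknown function " + reply.functionName()
                    : tool.apply(reply.args());
            // Send the tool result back so the model can continue or answer in plain language.
            reply = model.send(result);
        }
        return reply.isFunctionCall()
                ? "Sorry, I could not complete that request."
                : reply.text();
    }
}
```

The cap on tool calls is a design choice in this sketch: it keeps a misbehaving model from looping indefinitely, at the cost of giving up on unusually long tool chains.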
Prerequisite APIs: The "Tools" for the Agent
For the agent to function, it relies on a set of pre-existing backend APIs that it can call. These are the "tools" it uses to perform tasks. In this project, these tools are defined in FunctionsDefinitions.java and must be implemented as actual, callable API endpoints that your Spring Boot application can reach.
The agent requires the following APIs to be available:
- create_member(firstName, lastName, email): Creates a new member and returns a unique member ID.
- get_available_slots(zipCode): Finds and returns a list of open appointment slots for a given zip code.
- schedule_appointment(member_id, firstName, lastName, email, ...): Books a confirmed appointment slot for the user and returns a confirmation number.
Without these underlying APIs, the agent will know what to do but will have no way to actually do it.
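While the real endpoints are being built, an in-memory stand-in can be handy for exercising the agent locally. The sketch below is hypothetical throughout: the class name, the ID formats, and the `addSlot` helper are invented for illustration, and because this guide elides the full parameter list of schedule_appointment, the `zipCode` and `slot` arguments are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the three backend APIs. */
public class InMemoryAppointmentApi {

    private final Map<String, String> memberIdByEmail = new HashMap<>();
    private final Map<String, List<String>> openSlotsByZip = new HashMap<>();
    private int nextMemberNumber = 1;
    private int nextConfirmationNumber = 1;

    /** Test helper (not one of the agent's tools): registers an open slot for a zip code. */
    public void addSlot(String zipCode, String slot) {
        openSlotsByZip.computeIfAbsent(zipCode, z -> new ArrayList<>()).add(slot);
    }

    /** create_member: returns the existing ID if the email is already registered. */
    public String createMember(String firstName, String lastName, String email) {
        // The names are accepted to mirror the tool signature but not stored in this sketch.
        return memberIdByEmail.computeIfAbsent(email, e -> "M-" + nextMemberNumber++);
    }

    /** get_available_slots: returns a snapshot of the open slots for the zip code. */
    public List<String> getAvailableSlots(String zipCode) {
        return List.copyOf(openSlotsByZip.getOrDefault(zipCode, List.of()));
    }

    /** schedule_appointment: removes the slot from the open list and returns a confirmation number. */
    public String scheduleAppointment(String memberId, String zipCode, String slot) {
        if (!memberIdByEmail.containsValue(memberId)) {
            throw new IllegalArgumentException("unknown member: " + memberId);
        }
        List<String> open = openSlotsByZip.get(zipCode);
        if (open == null || !open.remove(slot)) {
            throw new IllegalStateException("slot not available: " + slot);
        }
        return "C-" + nextConfirmationNumber++;
    }
}
```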
Tech Stack
- Framework: Java 21 & Spring Boot 3
- AI/LLM: Google Cloud Vertex AI (Gemini 1.5 Flash)
- Build Tool: Maven
- Containerization: Docker
- Deployment: Google Cloud Run, Google Cloud Build
- Secrets Management: Google Secret Manager
Prerequisites
Before you begin, ensure you have the following tools installed and configured.
Local Tools
- Java 21+: The project is built on Java 21.
- Maven: To manage dependencies and build the project.
- Docker: To containerize the application.
- Google Cloud SDK: To interact with your Google Cloud project. See the official installation guide.
Google Cloud Setup
- GCP Project: Create a new Google Cloud Project or use an existing one. Make sure billing is enabled.
- Set Project ID: Set your project ID in your terminal to simplify gcloud commands.
    export PROJECT_ID="your-gcp-project-id"
    gcloud config set project $PROJECT_ID
- Enable APIs: Enable the necessary APIs for the project to function.
    gcloud services enable run.googleapis.com \
      cloudbuild.googleapis.com \
      secretmanager.googleapis.com \
      aiplatform.googleapis.com \
      iam.googleapis.com
Running Locally
To run the application locally for development, you'll need to provide the necessary environment variables that are normally injected by Cloud Run.
- Clone the Repository
    git clone <your-repository-url>
    cd vertex-multi-functions
- Authenticate for Local Development
The application uses Application Default Credentials (ADC) to connect to Google Cloud services like Vertex AI.
gcloud auth application-default login
- Set Environment Variables
Set the environment variables required by the application. These are the same variables defined in cloudbuild.yaml.
    export PROJECT_ID="your-gcp-project-id"
    export GS_BUCKET="your-gcs-bucket-for-history"
    export THREAD_SLEEP_TIME="1500"
    export ATLAS_URI="your-atlas-uri-value"
    # ... set all other required environment variables
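A missing variable is easier to diagnose at startup than in the middle of a conversation. The following is a minimal sketch of a fail-fast check over the four variables named above. The `AgentConfig` record is hypothetical, the real application may read its configuration through Spring properties instead, and THREAD_SLEEP_TIME is parsed as a plain number because its unit is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the environment variables listed in this guide. */
public record AgentConfig(String projectId, String gsBucket, long threadSleepTime, String atlasUri) {

    private static final List<String> REQUIRED =
            List.of("PROJECT_ID", "GS_BUCKET", "THREAD_SLEEP_TIME", "ATLAS_URI");

    /** Builds the config from an environment map, reporting every missing variable at once. */
    public static AgentConfig fromEnv(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing environment variables: " + missing);
        }
        return new AgentConfig(
                env.get("PROJECT_ID"),
                env.get("GS_BUCKET"),
                Long.parseLong(env.get("THREAD_SLEEP_TIME").trim()),
                env.get("ATLAS_URI"));
    }
}
```

At startup this would be called as `AgentConfig.fromEnv(System.getenv())`. Taking the map as a parameter keeps the check testable without touching the real environment.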
- Run the Application
Use the Maven wrapper to run the application. The Dockerfile specifies the dev profile, so you can activate it locally as well.
    ./mvnw spring-boot:run -Dspring-boot.run.profiles=dev
The API will be running at http://localhost:8080.
Deployment to Google Cloud Run
We use Google Cloud Build to automate the process of building the container image and deploying it to Cloud Run. This is the recommended approach for CI/CD.
The One-Command Deploy
The cloudbuild.yaml file in the root of the project defines the entire build and deployment pipeline. To trigger it, run the following command from your project's root directory:
    gcloud builds submit .
What's Happening Behind the Scenes?
When you run gcloud builds submit, Cloud Build orchestrates the following steps defined in cloudbuild.yaml:
- Upload: Your code is packaged and uploaded to a Cloud Storage bucket.
- Build: Cloud Build starts a new build process.
- Step 1 - Docker Build: Cloud Build uses the Dockerfile to execute a multi-stage build. It compiles the Java code and packages it into a lean, production-ready container image.
- Step 2 - Docker Push: The newly built image is tagged and pushed to Google Container Registry (GCR).
- Step 3 - GCloud Deploy: Cloud Build uses the gcloud command-line tool to deploy the container image from GCR to Cloud Run. During this step, it performs several critical configurations:
  - Sets the service name to agent-service.
  - Injects secrets from Secret Manager as environment variables (e.g., ATLAS_URI).
  - Injects plain-text environment variables (e.g., PROJECT_ID).
  - Configures the service to be publicly accessible (--allow-unauthenticated).
Post-Deployment
Find Your Service URL: After the deployment succeeds, Cloud Build will print the URL of your service. You can also retrieve it anytime with:
    gcloud run services describe agent-service \
      --platform managed \
      --region us-central1 \
      --format "value(status.url)"
Testing the Deployed Agent: Interact with your agent by sending a POST request to its chat endpoint. Replace [SERVICE_URL] with the URL you obtained.
    # Start a conversation
    curl -X POST "[SERVICE_URL]/api/v1/chat" \
      -H "Content-Type: application/json" \
      -d '{ "prompt": "Hi, I need to make an appointment.", "id": "user-session-12345" }'

    # The agent might respond asking for your name. Continue the conversation:
    curl -X POST "[SERVICE_URL]/api/v1/chat" \
      -H "Content-Type: application/json" \
      -d '{ "prompt": "My name is Jane Doe and my email is jane.doe@example.com", "id": "user-session-12345" }'
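The same request can be issued from Java with the JDK's built-in HTTP client. This is a sketch under assumptions: the `ChatClient` class and its hand-rolled JSON escaping are illustrative only (a JSON library would be the usual choice), and the response is returned as a raw string because the reply format is not specified here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client for the /api/v1/chat endpoint shown in the curl examples. */
public class ChatClient {

    /** Escapes quotes, backslashes, and control characters for use inside a JSON string. */
    public static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }

    /** Builds the JSON body with the same two fields as the curl examples. */
    public static String chatBody(String prompt, String sessionId) {
        return "{\"prompt\": \"" + jsonEscape(prompt) + "\", \"id\": \"" + jsonEscape(sessionId) + "\"}";
    }

    /** Builds (but does not send) the POST request to the chat endpoint. */
    public static HttpRequest chatRequest(String serviceUrl, String prompt, String sessionId) {
        return HttpRequest.newBuilder(URI.create(serviceUrl + "/api/v1/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(prompt, sessionId)))
                .build();
    }

    /** Sends the request and returns the raw response body. */
    public static String send(HttpRequest request) throws IOException, InterruptedException {
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

Reusing the same session ID across calls is what lets the agent pick up the conversation where it left off.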
Troubleshooting
IAM Policy Errors: If the deployment fails with an IAM error, the Cloud Build service account ([PROJECT_NUMBER]@cloudbuild.gserviceaccount.com) may lack permission to deploy to Cloud Run or to act as the Cloud Run service account. Grant the Cloud Run Admin (roles/run.admin) and Service Account User (roles/iam.serviceAccountUser) roles to the Cloud Build service account in the IAM console.
Service Not Accessible: If the service deploys but you can't access the URL, ensure the IAM policy allows public access.
    gcloud run services add-iam-policy-binding agent-service \
      --region=us-central1 \
      --member="allUsers" \
      --role="roles/run.invoker"
Java SDK Example
Below is a conceptual example of how to configure the model with tools. This code would typically live in a backend service that communicates with the Gemini API.
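    import com.google.cloud.vertexai.VertexAI;
    import com.google.cloud.vertexai.api.FunctionDeclaration;
    import com.google.cloud.vertexai.api.Schema;
    import com.google.cloud.vertexai.api.Tool;
    import com.google.cloud.vertexai.api.Type;
    import com.google.cloud.vertexai.generativeai.GenerativeModel;
    import java.util.List;

    // Define the tool (function) the model can call.
    // The parameter mirrors get_available_slots(zipCode) from the tool list above.
    FunctionDeclaration getSlots = FunctionDeclaration.newBuilder()
        .setName("get_available_slots")
        .setDescription("Get a list of available appointment slots for a given zip code.")
        .setParameters(
            Schema.newBuilder()
                .setType(Type.OBJECT)
                .putProperties("zipCode", Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("Zip code to search for open slots")
                    .build())
                .addRequired("zipCode")
                .build())
        .build();

    // The model builder needs a VertexAI client. The region here is an assumption;
    // us-central1 matches the Cloud Run region used elsewhere in this guide.
    VertexAI vertexAi = new VertexAI(System.getenv("PROJECT_ID"), "us-central1");

    // Create the model with the tool configured
    GenerativeModel model = new GenerativeModel.Builder()
        .setModelName("gemini-1.5-pro")
        .setVertexAi(vertexAi)
        .setTools(List.of(Tool.newBuilder().addFunctionDeclarations(getSlots).build()))
        .build();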
Once the model responds with a function call, our backend code will execute that call against our Spring Boot API, get the result, and send it back to the model so it can formulate a natural language response for the user.
This concludes the setup for the AI Agent. In the next part, we will build the frontend application in Angular that will allow users to interact with the AI Agent and schedule appointments.