Part 3: Developing and Configuring the Gemini Agent


This is the core of our project where we bring the AI to life. We will use the Vertex AI Gemini API with the Java SDK to create our agent, provide it with the function declarations from Part 2, and process its responses.

This document explains how to understand, run, and deploy a conversational agent for scheduling medical appointments. The application is built with Java and Spring Boot and uses Google's Gemini large language model for natural language understanding and function calling. Deployment to Google Cloud Run is automated with Cloud Build.

The core of this project is a smart agent that can understand user requests in plain English, ask clarifying questions, and perform actions on behalf of the user by calling external APIs.

Objective

The primary goal of this project is to provide a conversational AI assistant that helps users book doctor appointments. Unlike a traditional REST API with fixed endpoints, this agent engages in a multi-turn conversation to gather all necessary information before completing a task.

Its core capabilities, driven by Gemini's function-calling feature, include:

  • New Member Onboarding: If a user is new, the agent can create a member profile by asking for their name and email.
  • Finding Available Appointments: The agent can search for open appointment slots based on a user's location (zip code).
  • Scheduling and Confirmation: Once a user confirms a time slot, the agent books the appointment and provides a confirmation.
  • Contextual Conversation: The agent maintains conversation history to provide a seamless and intelligent user experience, picking up where the user left off.
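
The contextual-conversation capability can be pictured with a minimal sketch like the one below. It assumes an in-memory map keyed by the session id (the "id" field sent with each chat request); the class and record names are hypothetical, and the real project may well persist history elsewhere, for example in the bucket named by GS_BUCKET.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory conversation store, keyed by session id. */
final class ConversationHistorySketch {

    /** One turn of the conversation; role is "user" or "model". */
    record Turn(String role, String text) {}

    private final Map<String, List<Turn>> sessions = new ConcurrentHashMap<>();

    /** Appends a turn, creating the session's list on first use. */
    void append(String sessionId, String role, String text) {
        sessions
            .computeIfAbsent(sessionId, id -> Collections.synchronizedList(new ArrayList<>()))
            .add(new Turn(role, text));
    }

    /** Returns an immutable snapshot of the turns recorded so far. */
    List<Turn> history(String sessionId) {
        return List.copyOf(sessions.getOrDefault(sessionId, List.of()));
    }
}
```

On each request the stored turns would be sent to the model together with the new prompt, which is what lets the agent pick up where the user left off.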

How It Works: Conversational AI with Function Calling

This application is not just a set of API endpoints. It's an intelligent layer that sits between the user and your backend services.

  1. User Prompt: A user sends a natural language prompt, like "I'd like to book an appointment for next Tuesday."
  2. Gemini's Intelligence: The prompt, along with the conversation history, is sent to the Gemini model. The model has been given a "system instruction" (as seen in DataBroker.java) that tells it how to behave and what tools it has at its disposal.
  3. Function Calling: Based on the user's request, Gemini determines that it needs to perform an action. Instead of just replying with text, it generates a "function call"—a structured request to one of the predefined tools. For example, it might decide to call the get_available_slots function.
  4. Executing the Function: The Java application receives this function call, executes the corresponding business logic (i.e., calls your actual backend API for appointment slots), and gets a result.
  5. Responding to the Model: The result of the function call is sent back to Gemini.
  6. Natural Language Response: Gemini processes the function's result and formulates a natural, human-readable response for the user, such as "I see a few available slots for next Tuesday at 10:00 AM and 2:00 PM. Which one works for you?"

This loop continues until the user's goal is achieved.
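
The six steps above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the Model interface is a hypothetical stand-in for the Gemini client, the transcript is simplified to a list of strings, and the maxToolCalls guard is an addition that is not described in the original design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Sketch of the function-calling loop with the model hidden behind an interface. */
final class FunctionCallingLoopSketch {

    /** A model reply is either final text or a request to call a tool. */
    sealed interface Reply permits TextReply, ToolCall {}
    record TextReply(String text) implements Reply {}
    record ToolCall(String name, Map<String, Object> args) implements Reply {}

    /** Hypothetical stand-in for the Gemini client. */
    interface Model {
        Reply next(List<String> transcript);
    }

    static String run(Model model,
                      Map<String, Function<Map<String, Object>, String>> tools,
                      String prompt,
                      int maxToolCalls) {
        List<String> transcript = new ArrayList<>();
        transcript.add("user: " + prompt);                       // step 1: user prompt
        int toolCalls = 0;
        while (true) {
            Reply reply = model.next(List.copyOf(transcript));   // step 2: ask the model
            if (reply instanceof TextReply t) {
                return t.text();                                 // step 6: natural language answer
            }
            ToolCall call = (ToolCall) reply;                    // step 3: the model wants a tool
            if (++toolCalls > maxToolCalls) {
                throw new IllegalStateException("Too many tool calls: " + toolCalls);
            }
            Function<Map<String, Object>, String> tool = tools.get(call.name());
            String result = (tool == null)
                ? "error: unknown function " + call.name()
                : tool.apply(call.args());                       // step 4: execute the function
            transcript.add("function " + call.name() + ": " + result); // step 5: report back
        }
    }
}
```

Hiding the model behind an interface keeps the loop testable without network access; a fake Model can return a ToolCall on the first turn and a TextReply on the second.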

Prerequisite APIs: The "Tools" for the Agent

For the agent to function, it relies on a set of pre-existing backend APIs that it can call. These are the "tools" it uses to perform tasks. In this project, these tools are defined in FunctionsDefinitions.java and must be implemented as actual, callable API endpoints that your Spring Boot application can reach.

The agent requires the following APIs to be available:

  • create_member(firstName, lastName, email): Creates a new member and returns a unique member ID.
  • get_available_slots(zipCode): Finds and returns a list of open appointment slots for a given zip code.
  • schedule_appointment(member_id, firstName, lastName, email, ...): Books a confirmed appointment slot for the user and returns a confirmation number.

Without these underlying APIs, the agent will know what to do but will have no way to actually do it.
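
One way to connect the three tool names to backend calls is a small registry that checks required arguments before dispatching. The sketch below is an assumption-laden illustration, not the project's FunctionsDefinitions.java: the class name, the stub handlers, and the placeholder return values are all hypothetical, and only the parameters listed above are treated as required (the elided ones for schedule_appointment are left out).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry mapping tool names to handlers, with a required-argument check. */
final class ToolRegistrySketch {

    record Tool(List<String> required, Function<Map<String, Object>, Map<String, Object>> handler) {}

    private final Map<String, Tool> tools = new LinkedHashMap<>();

    void register(String name, List<String> required,
                  Function<Map<String, Object>, Map<String, Object>> handler) {
        tools.put(name, new Tool(required, handler));
    }

    /** Errors are returned as data so they can be passed back to the model instead of crashing. */
    Map<String, Object> invoke(String name, Map<String, Object> args) {
        Tool tool = tools.get(name);
        if (tool == null) {
            return Map.of("error", "unknown function: " + name);
        }
        List<String> missing = tool.required().stream()
            .filter(param -> !args.containsKey(param))
            .toList();
        if (!missing.isEmpty()) {
            return Map.of("error", "missing arguments: " + missing);
        }
        return tool.handler().apply(args);
    }

    /** Stub handlers with placeholder results; real handlers would call the backend APIs. */
    static ToolRegistrySketch withStubs() {
        ToolRegistrySketch registry = new ToolRegistrySketch();
        registry.register("create_member", List.of("firstName", "lastName", "email"),
            args -> Map.of("member_id", "placeholder-member-id"));
        registry.register("get_available_slots", List.of("zipCode"),
            args -> Map.of("slots", List.of("10:00 AM", "2:00 PM")));
        registry.register("schedule_appointment", List.of("member_id", "firstName", "lastName", "email"),
            args -> Map.of("confirmation", "placeholder-confirmation"));
        return registry;
    }
}
```

Returning an error map rather than throwing gives the model a chance to ask the user for the missing detail on the next turn.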

Tech Stack

  • Framework: Java 21 & Spring Boot 3
  • AI/LLM: Google Cloud Vertex AI (Gemini 1.5 Flash)
  • Build Tool: Maven
  • Containerization: Docker
  • Deployment: Google Cloud Run, Google Cloud Build
  • Secrets Management: Google Secret Manager

Prerequisites

Before you begin, ensure you have the following tools installed and configured.

Local Tools

  • Java 21+: The project is built on Java 21.
  • Maven: To manage dependencies and build the project.
  • Docker: To containerize the application.
  • Google Cloud SDK (gcloud CLI): To interact with your Google Cloud project; see Google's official installation guide.

Google Cloud Setup

  1. GCP Project: Create a new Google Cloud Project or use an existing one. Make sure billing is enabled.
  2. Set Project ID: Set your project ID in your terminal to simplify gcloud commands.
    
    export PROJECT_ID="your-gcp-project-id"
    gcloud config set project $PROJECT_ID
    
  3. Enable APIs: Enable the necessary APIs for the project to function.
    
    gcloud services enable run.googleapis.com \
        cloudbuild.googleapis.com \
        secretmanager.googleapis.com \
        aiplatform.googleapis.com \
        iam.googleapis.com
    

    Running Locally

    To run the application locally for development, you'll need to provide the necessary environment variables that are normally injected by Cloud Run.

    1. Clone the Repository
      
      git clone <your-repository-url>
      cd vertex-multi-functions
      
    2. Authenticate for Local Development

      The application uses Application Default Credentials (ADC) to connect to Google Cloud services like Vertex AI.

      
      gcloud auth application-default login
      
    3. Set Environment Variables

      Set the environment variables required by the application. These are the same variables defined in cloudbuild.yaml.

      
      export PROJECT_ID="your-gcp-project-id"
      export GS_BUCKET="your-gcs-bucket-for-history"
      export THREAD_SLEEP_TIME="1500"
      export ATLAS_URI="your-atlas-uri-value"
      # ... set all other required environment variables
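
Because a missing variable otherwise tends to surface as a confusing failure deep inside a request, it can help to validate the environment once at startup. The sketch below is hypothetical: the class and record names are invented, only the four variables shown above are checked, and THREAD_SLEEP_TIME is assumed to be a whole number (its unit is not stated here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical fail-fast loader for the environment variables listed above. */
final class EnvConfigSketch {

    record Config(String projectId, String bucket, long threadSleepTime, String atlasUri) {}

    static Config load(Map<String, String> env) {
        // Collect every missing name so the error reports all of them at once.
        List<String> missing = new ArrayList<>();
        for (String name : List.of("PROJECT_ID", "GS_BUCKET", "THREAD_SLEEP_TIME", "ATLAS_URI")) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing environment variables: " + missing);
        }
        return new Config(
            env.get("PROJECT_ID"),
            env.get("GS_BUCKET"),
            Long.parseLong(env.get("THREAD_SLEEP_TIME").trim()),
            env.get("ATLAS_URI"));
    }
}
```

At startup this would be called as EnvConfigSketch.load(System.getenv()); taking the map as a parameter keeps the check easy to exercise in a unit test.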
      
    4. Run the Application

      Use the Maven wrapper to run the application. The Dockerfile specifies the dev profile, so you can activate it locally as well.

      
      ./mvnw spring-boot:run -Dspring-boot.run.profiles=dev
      

      The API will be running at http://localhost:8080.

    Deployment to Google Cloud Run

    We use Google Cloud Build to automate the process of building the container image and deploying it to Cloud Run. This is the recommended approach for CI/CD.

    The One-Command Deploy

    The cloudbuild.yaml file in the root of the project defines the entire build and deployment pipeline. To trigger it, simply run the following command from your project's root directory:

    
    gcloud builds submit .
    

    What's Happening Behind the Scenes?

    When you run gcloud builds submit, Cloud Build orchestrates the following steps defined in cloudbuild.yaml:

    1. Upload: Your code is packaged and uploaded to a Cloud Storage bucket.
    2. Build: Cloud Build starts a new build process.
    3. Step 1 - Docker Build: Cloud Build uses the Dockerfile to execute a multi-stage build. It compiles the Java code and packages it into a lean, production-ready container image.
    4. Step 2 - Docker Push: The newly built image is tagged and pushed to Google Container Registry (GCR).
    5. Step 3 - GCloud Deploy: Cloud Build uses the gcloud command-line tool to deploy the container image from GCR to Cloud Run. During this step, it performs several critical configurations:
      • Sets the service name to agent-service.
      • Injects secrets from Secret Manager as environment variables (e.g., ATLAS_URI).
      • Injects plain-text environment variables (e.g., PROJECT_ID).
      • Configures the service to be publicly accessible (--allow-unauthenticated).

    Post-Deployment

    Find Your Service URL: After the deployment succeeds, Cloud Build will print the URL of your service. You can also retrieve it anytime with:

    
    gcloud run services describe agent-service \
      --platform managed \
      --region us-central1 \
      --format "value(status.url)"
    

    Testing the Deployed Agent: Interact with your agent by sending a POST request to its chat endpoint. Replace [SERVICE_URL] with the URL you obtained.

    
    # Start a conversation
    curl -X POST "[SERVICE_URL]/api/v1/chat" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Hi, I need to make an appointment.",
        "id": "user-session-12345"
      }'
    
    # The agent might respond asking for your name. Continue the conversation:
    curl -X POST "[SERVICE_URL]/api/v1/chat" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "My name is Jane Doe and my email is jane.doe@example.com",
        "id": "user-session-12345"
      }'
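
The same request can be issued from Java with the JDK's built-in HTTP client. This is a minimal sketch: the class and method names are hypothetical, the JSON body is assembled by hand to avoid extra dependencies, and the response is returned as a raw string because the response format is not described here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client for the chat endpoint, using only the JDK. */
final class ChatClientSketch {

    /** Quotes a string as a JSON string literal, escaping quotes, backslashes, and control characters. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the JSON body with the "prompt" and "id" fields used in the curl examples. */
    static String chatBody(String prompt, String sessionId) {
        return "{\"prompt\":" + jsonString(prompt) + ",\"id\":" + jsonString(sessionId) + "}";
    }

    static HttpRequest buildChatRequest(String serviceUrl, String prompt, String sessionId) {
        return HttpRequest.newBuilder(URI.create(serviceUrl + "/api/v1/chat"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(chatBody(prompt, sessionId)))
            .build();
    }

    /** Sends the request and returns the raw response body. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}
```

Reusing the same session id across calls, as in the curl examples, is what ties the second message to the first conversation turn.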
    

    Troubleshooting

    IAM Policy Errors: If the deployment fails with an IAM error, it might mean the Cloud Build service account ([PROJECT_NUMBER]@cloudbuild.gserviceaccount.com) doesn't have permission to deploy to Cloud Run or act as the Cloud Run service account. Grant the Cloud Run Admin (roles/run.admin) and Service Account User (roles/iam.serviceAccountUser) roles to the Cloud Build service account in the IAM console.

    Service Not Accessible: If the service deploys but you can't access the URL, ensure the IAM policy allows public access.

    
    gcloud run services add-iam-policy-binding agent-service \
      --region=us-central1 \
      --member="allUsers" \
      --role="roles/run.invoker"
    

    Java SDK Example

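    Below is a conceptual example of how to configure the model with tools. This code would typically live in a backend service that communicates with the Gemini API. The tool shown matches the get_available_slots(zipCode) API listed earlier, and the model name matches the Gemini 1.5 Flash entry in the tech stack.

    // Imports assume the Vertex AI Java SDK (com.google.cloud:google-cloud-vertexai)
    import com.google.cloud.vertexai.VertexAI;
    import com.google.cloud.vertexai.api.FunctionDeclaration;
    import com.google.cloud.vertexai.api.Schema;
    import com.google.cloud.vertexai.api.Tool;
    import com.google.cloud.vertexai.api.Type;
    import com.google.cloud.vertexai.generativeai.GenerativeModel;
    import java.util.List;

    // Define the tool (function) the model can call
    FunctionDeclaration getSlots = FunctionDeclaration.newBuilder()
        .setName("get_available_slots")
        .setDescription("Get a list of available appointment slots for a given zip code.")
        .setParameters(
            Schema.newBuilder()
                .setType(Type.OBJECT)
                .putProperties("zipCode", Schema.newBuilder().setType(Type.STRING).setDescription("Zip code to search for open slots").build())
                .addRequired("zipCode")
                .build())
        .build();

    // The model builder needs a VertexAI client; the region is assumed to be us-central1, as in the deploy commands
    VertexAI vertexAi = new VertexAI(System.getenv("PROJECT_ID"), "us-central1");

    // Create the model with the tool configured
    GenerativeModel model = new GenerativeModel.Builder()
        .setModelName("gemini-1.5-flash")
        .setVertexAi(vertexAi)
        .setTools(List.of(Tool.newBuilder().addFunctionDeclarations(getSlots).build()))
        .build();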
    

    Once the model responds with a function call, our backend code will execute that call against our Spring Boot API, get the result, and send it back to the model so it can formulate a natural language response for the user.


    This concludes the setup for the AI Agent. In the next part, we will build the frontend application in Angular that will allow users to interact with the AI Agent and schedule appointments.