# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Using Gemini Long Context Window for Text¶
Author(s) | Vijay Reddy |
Reviewer(s) | Rajesh Thallam, Skander Hannachi |
Overview¶
Gemini 1.5 Pro supports up to 2 million input tokens. This is roughly equivalent to:
- ~2000 pages of text
- ~19 hours of audio
- ~2 hours of video
- ~60K lines of code
This long context window (LCW) opens up possibilities for prompting on large contexts that previously could only be approximated using pre-processing steps such as Retrieval Augmented Generation (RAG). Long context windows in LLMs are enabling new use cases and optimizing standard use cases such as:
- Summarizing, analyzing and question-answering on large documents
- Analyzing large code repositories
- Maintaining agent state in agentic workflows
- Many-shot in-context learning, where hundreds or thousands of examples in the prompt can lead to performance comparable to fine-tuned models
In this notebook we demonstrate the long context window (LCW) using the text modality. We show three approaches to long context prompting and compare them on accuracy, latency, and cost. We also compare LCW to a RAG approach.
Below is the summary of results observed at the time these experiments were run. Continue on for a detailed analysis of each.
Trial | Accuracy | Latency | Cost |
---|---|---|---|
Baseline | 38% (3/8) | 0.5 min | $0.004 |
LCW - Naive | 100% (8/8) | 4.7 min | $19.68 |
LCW - Batched | 88% (7/8) | 1.7 min | $2.47 |
LCW - Cached | 100% (8/8) | 2.9 min | $10.22 |
RAG | 63% (5/8) | 0.5 min | $0.30 |
Getting Started¶
The following steps are necessary to run this notebook, no matter what notebook environment you're using.
If you're entirely new to Google Cloud, get started here.
Google Cloud Project Setup¶
- Select or create a Google Cloud project. When you first create an account, you get a $300 free credit towards your compute/storage costs.
- Make sure that billing is enabled for your project.
- Enable the Service Usage API
- Enable the Vertex AI API.
- Enable the Cloud Storage API.
Google Cloud Permissions¶
To run the complete Notebook, including the optional section, you will need to have the Owner role for your project.
If you want to skip the optional section, you need at least the following roles:
- roles/serviceusage.serviceUsageAdmin to enable APIs
- roles/iam.serviceAccountAdmin to modify service agent permissions
- roles/aiplatform.user to use AI Platform components
- roles/storage.objectAdmin to modify and delete GCS buckets
Install Vertex AI SDK for Python and other dependencies (If Needed)¶
The cell below installs or upgrades the Vertex AI SDK and the other packages this notebook needs, quietly and for the current user.
! pip install pandas google-cloud-aiplatform langchain langchain-community langchain-google-vertexai faiss-cpu --upgrade --quiet --user
Restart Runtime¶
To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.
# Restart kernel after installs so that your environment can access the new packages
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
Authenticate¶
If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud project.
If you're running this notebook somewhere besides Colab, make sure your environment has the right Google Cloud access. If that's a new concept to you, consider looking into Application Default Credentials for your local environment and initializing the Google Cloud CLI. In many cases, running gcloud auth application-default login in a shell on the machine running the notebook kernel is sufficient.
More authentication options are discussed here.
# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()
    print("Authenticated")
Set Google Cloud project information and Initialize Vertex AI SDK¶
To get started using Vertex AI, you must have an existing Google Cloud project and enable the Vertex AI API.
Learn more about setting up a project and a development environment.
Make sure to change PROJECT_ID in the next cell. You can leave the value for REGION unless you have a specific reason to change it.
import vertexai
PROJECT_ID = "[your-project-id]" # @param {type:"string"}
REGION = "us-central1" # @param {type:"string"}
vertexai.init(project=PROJECT_ID, location=REGION)
print("Vertex AI SDK initialized.")
print(f"Vertex AI SDK version = {vertexai.__version__}")
Vertex AI SDK initialized.
Vertex AI SDK version = 1.63.0
Import Libraries¶
import datetime
import pandas as pd
from IPython.display import Markdown
from vertexai.generative_models import (GenerativeModel, HarmBlockThreshold,
HarmCategory, Part)
pd.set_option("display.max_colwidth", None)
Initialize Gemini¶
# Gemini Config
GENERATION_CONFIG = dict(temperature=0)
SAFETY_CONFIG = {
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}
gemini_pro_model = GenerativeModel(
    model_name="gemini-1.5-pro-001",
    generation_config=GENERATION_CONFIG,
    safety_settings=SAFETY_CONFIG,
)
Long Context for Question and Answering¶
To demonstrate Gemini's long context capabilities in the text modality, we will run question answering over the novel David Copperfield by Charles Dickens. At ~360K words and ~540K tokens, it is long enough to evaluate Gemini's long context capabilities.
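As a rough sanity check before sending a long document, the figures above (~540K tokens for ~360K words) imply about 1.5 tokens per word for this novel. The sketch below uses that ratio as a heuristic. The names `TOKENS_PER_WORD`, `CONTEXT_LIMIT`, `estimate_tokens`, and `fits_in_context` are illustrative and not part of any SDK, and the estimate is no substitute for a real token count from the model's tokenizer.

```python
# Heuristic only: ratio derived from this notebook's figures
# (~540K tokens / ~360K words). Other texts and tokenizers will differ.
TOKENS_PER_WORD = 540_000 / 360_000
CONTEXT_LIMIT = 2_000_000  # Gemini 1.5 Pro input limit quoted in the overview


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the whitespace-separated word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_in_context(text: str, limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if the estimated token count is within `limit`."""
    return estimate_tokens(text) <= limit


sample = "Whether I shall turn out to be the hero of my own life"
print(estimate_tokens(sample), fits_in_context(sample))
```

In the notebook itself, the `input_token_count` column returned by the evaluation runs gives the authoritative number.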
The questions were sourced manually from the novel, without any prior knowledge of how they would perform in the various tests. The answers for these questions are roughly evenly distributed throughout the source material (beginning, middle, end) so as to evaluate performance across the full context window.
questions = [
"What are the first objects that David can recall from his infancy?",
"At the inn where the mail stops, what is painted on the door?",
"What name does David's aunt suggest to Mr. Dick they call him by?",
"What is the name of the chapter in which Mr. Jorkins is first mentioned?",
"Describe the room in which Mr. Copperfield meets Steerforth for breakfast.",
"After his engagement to Dora, David write Agnes a letter. What is the letter about?",
"What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night?",
"Who are David's final thoughts about in the book?",
]
answers = [
"His mother with her pretty hair, and Peggotty.",
"DOLPHIN",
"Trotwood Copperfield",
"CHAPTER XXIII: I Corroborate Mr. Dick, and Choose a Profession.",
"A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard.",
"He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision.",
"A letter stating that he would appear in the morning at half past nine.",
"Agnes",
]
Baseline Behavior: Without adding any context¶
As David Copperfield is a popular classic novel, it is likely Gemini already knows something about it from its internal knowledge. So that we can later measure how adding the novel explicitly as context helps, let's first ask our sample questions without any added context (aka zero-shot).
prompt_template = "Answer the following question about {context}: {question}"
context = "the book 'David Copperfield' by Charles Dickens"
def evaluate(
    questions, answers, prompt_template, context, model, is_context_cached=False
):
    df = pd.DataFrame(
        columns=[
            "question",
            "ground_truth",
            "model_response",
            "input_token_count",
            "output_token_count",
        ]
    )
    for i in range(len(questions)):
        # With a cached context the document is already attached to the model,
        # so the prompt only needs the question.
        if is_context_cached:
            prompt = prompt_template.format(question=questions[i])
        else:
            prompt = prompt_template.format(context=context, question=questions[i])
        response = model.generate_content(prompt)
        res = response.text
        input_token_count = 0
        output_token_count = 0
        if response.usage_metadata:
            input_token_count = response.usage_metadata.prompt_token_count
            output_token_count = response.usage_metadata.candidates_token_count
        df.loc[len(df)] = {
            "question": questions[i],
            "model_response": res,
            "ground_truth": answers[i],
            "input_token_count": input_token_count,
            "output_token_count": output_token_count,
        }
    return df
%%time
df_zeroshot = evaluate(questions, answers, prompt_template, context, gemini_pro_model)
df_zeroshot
CPU times: user 76.4 ms, sys: 36.3 ms, total: 113 ms Wall time: 27.9 s
index | question | ground_truth | model_response | input_token_count | output_token_count |
---|---|---|---|---|---|
0 | What are the first objects that David can recall from his infancy? | His mother with her pretty hair, and Peggotty. | The first objects David Copperfield remembers from his infancy are **the dressing-table with its silver inkstand, and his mother's face reflected in the mirror above it.** \n\nThis memory is significant because it highlights the importance of his mother in his early life and establishes a sense of domestic peace that will be shattered by her remarriage. \n | 29 | 71 |
1 | At the inn where the mail stops, what is painted on the door? | DOLPHIN | At the inn where the mail coach stops in "David Copperfield," the door features a painting (rather crudely done, we can assume!) of a **blue lion**. \n\nThis detail is mentioned in Chapter 5, when young David is on his journey to Yarmouth with the Peggotys. \n | 31 | 64 |
2 | What name does David's aunt suggest to Mr. Dick they call him by? | Trotwood Copperfield | David's aunt, Betsey Trotwood, suggests that Mr. Dick call David by the name **"Trotwood"**. \n\nShe dislikes the name "David" because it was the name of David's deceased father, whom she strongly disapproved of. She believes that calling David "Trotwood" will help distance him from his father's memory and allow him to forge his own identity under her care. \n | 33 | 88 |
3 | What is the name of the chapter in which Mr. Jorkins is first mentioned? | CHAPTER XXIII: I Corroborate Mr. Dick, and Choose a Profession. | Mr. Jorkins is first mentioned in Chapter 8, titled **"My Holidays. Especially One Happy Afternoon."** \n\nAlthough he doesn't physically appear in this chapter, Mr. Jorkins is introduced as Mr. Spenlow's business partner at the law firm where David is taken to begin work. He is described as a meek and mild man who is constantly dominated by Mr. Spenlow. \n | 34 | 89 |
4 | Describe the room in which Mr. Copperfield meets Steerforth for breakfast. | A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard. | While Dickens describes many rooms in detail throughout "David Copperfield," he doesn't specifically describe the room where Mr. Copperfield and Steerforth have breakfast. \n\nIt's likely you're thinking of the breakfast scene at the **Inn in Yarmouth**, where David first meets Steerforth as a young boy. However, the text focuses more on the characters and their interactions than the room itself. \n\nWe can infer some details about the room from the context:\n\n* **It's an inn:** This suggests a common room used for meals by multiple guests, rather than a private dining room.\n* **It's likely basic but comfortable:** The inn seems respectable but not luxurious, reflecting the modest means of David and Peggotty.\n* **The atmosphere is lively:** The presence of the other guests, including the outspoken carrier, suggests a bustling and convivial atmosphere.\n\nAlthough Dickens doesn't provide a detailed description of the room, he masterfully uses dialogue and character interaction to paint a vivid picture of the scene and the dynamics between young David, the confident Steerforth, and the devoted Peggotty. \n | 31 | 233 |
5 | After his engagement to Dora, David write Agnes a letter. What is the letter about? | He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision. | In Charles Dickens's "David Copperfield," David does indeed write to Agnes Wickfield after his engagement to Dora Spenlow. However, the letter isn't about his love for Agnes, as some readers might anticipate. \n\nHere's a breakdown of the letter's content:\n\n* **Sharing the news:** David's primary purpose is to inform Agnes of his engagement to Dora. He describes his joy and Dora's charming qualities.\n* **Seeking approval and reassurance:** Deep down, David seeks Agnes's validation of his choice. He values her opinion and wants her blessing on his happiness, even though he might not fully realize the extent of his reliance on her.\n* **Hinting at doubts:** While expressing happiness, the letter subtly reveals David's underlying concerns about his compatibility with Dora. He acknowledges their differences and hints at her lack of practicality. \n* **Confiding in Agnes:** Despite his engagement, David still turns to Agnes for emotional support and understanding. He confides in her about his anxieties and hopes for the future.\n\nThe letter is significant because it highlights:\n\n* **David's naivety:** He's blinded by Dora's beauty and charm, overlooking their fundamental differences.\n* **The complexity of his feelings:** While in love with Dora, he still depends on Agnes for emotional support and guidance.\n* **Foreshadowing:** The letter foreshadows the challenges and eventual disillusionment David will face in his marriage to Dora.\n\nIn essence, the letter to Agnes is a mixture of joyful announcement, subconscious plea for approval, and an unconscious revelation of David's underlying doubts about his choice of wife. \n | 34 | 347 |
6 | What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night? | A letter stating that he would appear in the morning at half past nine. | David finds Mr. Micawber in a state of emotional distress, having already consumed a significant amount of punch. He's about to expose the villainy of his employer, Uriah Heep. \n | 40 | 43 |
7 | Who are David's final thoughts about in the book? | Agnes | In the closing lines of "David Copperfield," David's final thoughts are about **Agnes Wickfield**. \n\nHe reflects on their shared past, her unwavering love and support, and the happiness their future holds. He realizes that his restless pursuit of other loves was misguided, and that true happiness resided with Agnes all along. \n\nHere's a snippet from the final paragraph:\n\n"And now, my own beloved husband, I am going to tell you of the greatest change that ever happened in my life... when something whispered to me, 'This is the man who can help me best!'"\n\nThe book ends with David's thoughts turning towards their future together, filled with peace and contentment. \n | 28 | 146 |
Analysis¶
- Latency: 28 seconds
- Cost: $0.004
- Accuracy: 3/8
Accuracy is determined manually by comparing the ground truth to the model response. There is some subjectivity in this evaluation, and at the time of this writing Gemini is non-deterministic, so your results may vary slightly.
In our analysis only two answers are unambiguously correct, and another two we consider close enough to give partial credit, for a total of 1+1+0.5+0.5=3 out of 8, or 38% accuracy.
The cost is negligible.
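The partial-credit scoring described above can be sketched as a small helper. This is illustrative only: the grades are assigned by hand, the list below just reproduces the totals from the analysis (two correct, two partial, four incorrect) without claiming which question received which grade, and `accuracy` is a hypothetical helper name.

```python
# Manual grades: 1.0 = correct, 0.5 = partial credit, 0.0 = incorrect.
# The order is NOT tied to the question order; only the totals come from
# the analysis above.
baseline_grades = [1.0, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]


def accuracy(grades):
    """Return the mean grade, i.e. accuracy with partial credit."""
    if not grades:
        raise ValueError("grades must not be empty")
    return sum(grades) / len(grades)


score = accuracy(baseline_grades)
print(f"{sum(baseline_grades):g}/{len(baseline_grades)} = {score:.1%}")
```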
Long Context Window¶
Now let's take advantage of the 2M-token context window of Gemini 1.5 Pro and see if accuracy improves when we feed the entire novel text as context.
Download Novel¶
import requests
url = "https://www.gutenberg.org/ebooks/766.txt.utf-8"
try:
    response = requests.get(url)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx and 5xx)
except requests.exceptions.RequestException as e:
    print(f"Error downloading file: {e}")
    raise  # Stop here: without a successful download there is no text to use
novel = response.text
Naive approach¶
We will first construct our prompt in a naive way by stuffing the entirety of the novel into the prompt and asking it one question at a time.
💡 TIP
When working with long context prompts, you can follow a few prompting strategies:
- Structure your prompt so that input data (documents) is separated from the instructions. In the prompt template, we use XML tags to separate the document from the instructions. This helps Gemini 1.5 Pro disambiguate data from instructions and process the prompt optimally.
- The location of the instructions and user input matters! Documents are added first, followed by the instructions and the user input/question. This placement helps the model address the question better.
%%time
prompt_template = """
Your task is to read the full text of the novel David Copperfield and then answer the questions below.
<document>
{context}
</document>
Based on the novel text provided, answer the following:
{question}
"""
df_noncache = evaluate(questions, answers, prompt_template, novel, gemini_pro_model)
df_noncache
CPU times: user 4.78 s, sys: 341 ms, total: 5.12 s Wall time: 4min 41s
index | question | ground_truth | model_response | input_token_count | output_token_count |
---|---|---|---|---|---|
0 | What are the first objects that David can recall from his infancy? | His mother with her pretty hair, and Peggotty. | The first objects David Copperfield remembers from his infancy are his mother, with her pretty hair and youthful shape, and Peggotty, with "no shape at all" and dark eyes, red cheeks, and hard arms. \n | 539714 | 48 |
1 | At the inn where the mail stops, what is painted on the door? | DOLPHIN | The door has **DOLPHIN** painted on it. \n | 539716 | 14 |
2 | What name does David's aunt suggest to Mr. Dick they call him by? | Trotwood Copperfield | David's aunt suggests to Mr. Dick that they call him "Trotwood". \n | 539718 | 20 |
3 | What is the name of the chapter in which Mr. Jorkins is first mentioned? | CHAPTER XXIII: I Corroborate Mr. Dick, and Choose a Profession. | Mr. Jorkins is first mentioned in **Chapter 23, "I Corroborate Mr. Dick, and Choose a Profession".** \n | 539719 | 32 |
4 | Describe the room in which Mr. Copperfield meets Steerforth for breakfast. | A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard. | The room where David Copperfield meets Steerforth for breakfast is described as a "snug private apartment," a welcome contrast to the dingy, shared coffee-room where David had spent the previous night. It is decorated with red curtains and has a Turkey carpet. A bright fire burns in the fireplace, and a "fine hot breakfast" is laid out on a table covered with a clean tablecloth. The scene is reflected in a "cheerful miniature" in the small, round mirror hanging over the sideboard. \n | 539716 | 103 |
5 | After his engagement to Dora, David write Agnes a letter. What is the letter about? | He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision. | After David gets engaged to Dora, he writes Agnes a long letter telling her how happy he is and how wonderful Dora is. He describes his love for Dora as profound and unlike anything ever known, trying to convince Agnes that this is not a passing fancy like his childhood infatuations. He wants her to understand that his love for Dora is serious and lasting. \n | 539719 | 73 |
6 | What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night? | A letter stating that he would appear in the morning at half past nine. | In the hotel where Mr. Micawber requested to meet David in the middle of the night, David finds a **letter**. \n\nThis letter informs David that Mr. Micawber will be appearing in the morning at precisely half past nine. \n | 539725 | 52 |
7 | Who are David's final thoughts about in the book? | Agnes | At the end of the book, David's final thoughts are about **Agnes**. He reflects on their journey together through life, surrounded by their children and friends. He recognizes her as the guiding force that has always led him to be a better person, and expresses his enduring love for her. He imagines her by his side as he closes his life, a constant source of solace and inspiration. \n | 539713 | 81 |
Analysis¶
- Latency: 4.7min
- Cost: $19.68
- Accuracy: 8/8
Unsurprisingly, latency is much greater since we're increasing our prompt size by roughly 540K tokens.
Cost is also significantly increased. At the time of this writing (refer to the pricing page for the latest rates) Gemini 1.5 Pro costs $0.00125/1k input characters. The novel is 1,970,730 characters, which amounts to $2.46 per invocation, or $19.68 across all eight questions.
However, we now have 100% accuracy.
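The cost arithmetic above can be reproduced with a few lines. This is a simplified sketch: it counts only the novel's characters at the quoted input rate and ignores the short question text and the output tokens, which is an assumption rather than a full billing model. The constant names are illustrative.

```python
NOVEL_CHARS = 1_970_730             # character count quoted above
INPUT_PRICE_PER_1K_CHARS = 0.00125  # USD, input rate quoted above
NUM_QUESTIONS = 8

# One invocation sends the whole novel as input.
cost_per_call = NOVEL_CHARS / 1000 * INPUT_PRICE_PER_1K_CHARS
# The naive approach repeats that for every question.
total_cost = cost_per_call * NUM_QUESTIONS

print(f"per call: ${cost_per_call:.2f}, total: ${total_cost:.2f}")
```

The unrounded total comes out a few cents above the $19.68 reported above, which corresponds to rounding the per-call cost to $2.46 before multiplying by 8.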
Batching multiple questions¶
One way to save on cost and latency when dealing with long contexts is by batching multiple questions into one prompt. Let's try asking all 8 of our questions at once.
%%time
prompt_template = """
Your task is to read the full text of the novel David Copperfield and then answer the questions below.
<document>
{context}
</document>
Based on the novel text provided, answer the following:
{question}
"""
prompt = prompt_template.format(context=novel, question=questions)
response = gemini_pro_model.generate_content(prompt).text
Markdown(response)
CPU times: user 649 ms, sys: 103 ms, total: 752 ms Wall time: 1min 41s
Here are the answers to your questions, based on the provided text of David Copperfield:
What are the first objects that David can recall from his infancy? The first objects David remembers are his mother, with her pretty hair and youthful shape, and Peggotty, with her dark eyes and hard, red cheeks and arms.
At the inn where the mail stops, what is painted on the door? The door of David's room at the inn has "DOLPHIN" painted on it.
What name does David's aunt suggest to Mr. Dick they call him by? David's aunt suggests they call him "Trotwood," later shortening it to "Trot."
What is the name of the chapter in which Mr. Jorkins is first mentioned? Mr. Jorkins is first mentioned in Chapter 23, "I Corroborate Mr. Dick, and Choose a Profession."
Describe the room in which Mr. Copperfield meets Steerforth for breakfast. David meets Steerforth for breakfast in a "snug private apartment, red-curtained and Turkey-carpeted," where a fire burns brightly and a hot breakfast is laid out. A miniature of the cozy scene is reflected in a small, round mirror over the sideboard.
After his engagement to Dora, David writes Agnes a letter. What is the letter about? In his letter to Agnes, David tries to convey how happy he is and how much he loves Dora. He insists that his love for Dora is not a passing fancy and asks Agnes not to see it as similar to the boyish infatuations they used to joke about. He also mentions the sadness in Yarmouth over Emily's disappearance, saying it is a double wound for him due to the circumstances.
What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night? David finds a letter from Mr. Micawber at the hotel. In the letter, Mr. Micawber dramatically declares himself "Crushed" and facing financial ruin. He also hints at impending legal trouble and the imminent arrival of another child.
Who are David's final thoughts about in the book? David's final thoughts are about Agnes. He reflects on how she has always been his guiding light and how his love for her has sustained him. The book ends with him imagining her by his side as he dies, "pointing upward."
Analysis¶
- Latency: 1.7min
- Cost: $2.47
- Accuracy: 7/8
While we save considerably on latency and cost, the model now hallucinates the answer to question 7. Asking several questions in a single prompt, while efficient in cost and latency, can sacrifice accuracy.
You can treat the number of questions per prompt as a sort of hyperparameter, reducing it to prioritize accuracy and increasing it to prioritize cost and/or latency.
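One way to make the questions-per-prompt setting tunable is to split the question list into fixed-size batches and number the questions within each batch. The sketch below is a minimal illustration; `batch_questions` and `format_batch` are hypothetical helper names and are not used elsewhere in this notebook.

```python
from typing import List


def batch_questions(questions: List[str], batch_size: int) -> List[List[str]]:
    """Split `questions` into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        questions[i : i + batch_size] for i in range(0, len(questions), batch_size)
    ]


def format_batch(batch: List[str]) -> str:
    """Render a batch as a numbered list, one question per line."""
    return "\n".join(f"{n}. {q}" for n, q in enumerate(batch, start=1))


demo_questions = ["Question A?", "Question B?", "Question C?"]
for batch in batch_questions(demo_questions, batch_size=2):
    print(format_batch(batch))
    print("---")
```

Each formatted batch could then be passed as the `question` field of the prompt template, with one `generate_content` call per batch. A batch size of 1 reproduces the naive approach and a batch size of 8 reproduces the single batched prompt.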
Context Caching¶
In cases where we anticipate multiple model invocations about the same long context, instead of passing the whole context in the prompt each time we can take advantage of context caching.
Caching can be combined with batching to further reduce cost and latency. For example if you have 100 questions, ask in 10 batches of 10 questions. However for the sake of comparison we will forgo batching and ask only one question per prompt.
A few things to note with context caching:
- The minimum size of a context cache is 32K tokens.
- By default, each context cache has an expiration time of 60 minutes, which can be updated either at or after cache creation.
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel
system_instruction = """
Your task is to read the full text of the novel David Copperfield and then answer the questions below.
"""
contents = [Part.from_text(novel)]
cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    system_instruction=system_instruction,
    contents=contents,
    ttl=datetime.timedelta(minutes=10),
)
cached_content = caching.CachedContent(cached_content_name=cached_content.name)
model_cached = GenerativeModel.from_cached_content(
    cached_content=cached_content,
    generation_config=GENERATION_CONFIG,
    safety_settings=SAFETY_CONFIG,
)
%%time
cached_prompt_template = "Answer the following question from the full text: {question}"
df_cache = evaluate(
questions,
answers,
cached_prompt_template,
novel,
model_cached,
is_context_cached=True,
)
df_cache
CPU times: user 210 ms, sys: 143 ms, total: 353 ms Wall time: 2min 52s
index | question | ground_truth | model_response | input_token_count | output_token_count |
---|---|---|---|---|---|
0 | What are the first objects that David can recall from his infancy? | His mother with her pretty hair, and Peggotty. | The first objects David Copperfield remembers from his infancy are his mother, with her pretty hair and youthful shape, and Peggotty, with "no shape at all" and dark eyes, red cheeks, and hard arms. \n | 539701 | 48 |
1 | At the inn where the mail stops, what is painted on the door? | DOLPHIN | The door has **DOLPHIN** painted on it. \n | 539703 | 14 |
2 | What name does David's aunt suggest to Mr. Dick they call him by? | Trotwood Copperfield | David's aunt suggests to Mr. Dick that they call David "Trotwood". \n | 539705 | 20 |
3 | What is the name of the chapter in which Mr. Jorkins is first mentioned? | CHAPTER XXIII: I Corroborate Mr. Dick, and Choose a Profession. | Mr. Jorkins is first mentioned in Chapter 23, "I Corroborate Mr. Dick, and Choose a Profession". \n | 539706 | 30 |
4 | Describe the room in which Mr. Copperfield meets Steerforth for breakfast. | A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard. | Mr. Copperfield describes the room where he has breakfast with Steerforth as a "snug private apartment." It is decorated with red curtains and has a Turkey carpet. A bright fire burns in the fireplace, and a "fine hot breakfast" is laid out on a table covered with a clean tablecloth. The scene is reflected in a "little round mirror over the sideboard," creating a cozy and inviting atmosphere. \n | 539703 | 84 |
5 | After his engagement to Dora, David write Agnes a letter. What is the letter about? | He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision. | After getting engaged to Dora, David writes a long letter to Agnes telling her about his engagement and how blissful he is. He describes how much he adores Dora and tries to convince Agnes that this love is different from the boyish fancies they used to joke about, claiming its depth is unfathomable. He avoids mentioning Steerforth and only tells her about the sadness in Yarmouth caused by Emily's disappearance, which has deeply affected him. \n | 539706 | 91 |
6 | What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night? | A letter stating that he would appear in the morning at half past nine. | David finds a letter from Mr. Micawber stating that he will appear in the morning at half past nine. \n | 539712 | 25 |
7 | Who are David's final thoughts about in the book? | Agnes | At the end of the book, David's final thoughts are about **Agnes**. He reflects on their journey together through life, surrounded by their children and friends. He acknowledges her unwavering support and guidance, and expresses his enduring love for her. He sees her as a guiding light, a source of solace and inspiration, and hopes that her presence will remain with him until the end of his life. \n | 539700 | 82 |
Analysis¶
- Latency: 2.9 minutes
- Cost: $10.22 ($9.85 query cost + $0.37 storage for 10 minutes)
- Accuracy: 8/8
Cached input is discounted by 50% to $0.000625/1k input characters (> 128K context window), plus a storage charge of $0.001125/1k characters/hour.
The more often you query, the more the caching approach saves. Latency also improves relative to the naive approach (2.9 min vs. 4.7 min), and accuracy is back at 100%.
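To see how the savings grow with the number of queries, the rates quoted above can be plugged into a small cost model. This sketch is a simplification under stated assumptions: it counts only the novel's characters, ignores the question text, output tokens, and any charge for creating the cache, and the function and constant names are illustrative.

```python
NOVEL_CHARS = 1_970_730
STANDARD_PRICE = 0.00125   # USD per 1k input characters, uncached
CACHED_PRICE = 0.000625    # USD per 1k input characters, cached
STORAGE_PRICE = 0.001125   # USD per 1k characters per hour of cache storage


def naive_cost(num_queries: int) -> float:
    """Cost of sending the full novel with every query."""
    return num_queries * NOVEL_CHARS / 1000 * STANDARD_PRICE


def cached_cost(num_queries: int, storage_hours: float) -> float:
    """Cost of querying a cached novel plus storing the cache."""
    query = num_queries * NOVEL_CHARS / 1000 * CACHED_PRICE
    storage = NOVEL_CHARS / 1000 * STORAGE_PRICE * storage_hours
    return query + storage


# The experiment above: 8 queries against a cache kept for 10 minutes.
print(f"cached: ${cached_cost(8, 10 / 60):.2f}  naive: ${naive_cost(8):.2f}")
```

Under this model the per-query saving is fixed while storage scales with time, so caching pays off when queries are frequent relative to how long the cache is kept alive.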
RAG¶
Lastly we will implement a retrieval augmented generation (RAG) approach. Prior to the introduction of Gemini's long context window, this was the only way to do question and answer on text of this length. For the RAG approach we will assume a max input token length of 30K, which corresponds to the limit for the previous version of Gemini (1.0).
Google offers an out-of-the-box, enterprise-grade RAG experience via Vertex AI Search. While we would recommend that for production use cases, to keep this notebook self-contained we will implement an in-memory RAG approach using langchain.
As this is not a RAG tutorial, detailed implementation instructions are omitted. For detailed instructions see the langchain docs.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langchain.schema.document import Document
# 1. Load and Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
)
docs = [Document(page_content=x) for x in text_splitter.split_text(novel)]
# 2. Create Embeddings and Vectorstore
embeddings = VertexAIEmbeddings("text-embedding-004")
vectorstore = FAISS.from_documents(docs, embeddings)
# 3. Set up RetrievalQA Chain
llm = ChatVertexAI(model="gemini-1.5-pro-001")
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever(search_kwargs={"k": 7}))
%%time
df_rag = pd.DataFrame(
    columns=[
        "question",
        "ground_truth",
        "model_response",
    ]
)
for i in range(len(questions)):
    res = qa.run(questions[i])
    df_rag.loc[len(df_rag)] = {
        "question": questions[i],
        "ground_truth": answers[i],
        "model_response": res,
    }
df_rag
CPU times: user 228 ms, sys: 52.8 ms, total: 281 ms Wall time: 31 s
question | ground_truth | model_response | |
---|---|---|---|
0 | What are the first objects that David can recall from his infancy? | His mother with her pretty hair, and Peggotty. | The first objects David Copperfield remembers are his mother, with her pretty hair and youthful shape, and Peggotty, with no shape at all, and very dark eyes and red cheeks. \n |
1 | At the inn where the mail stops, what is painted on the door? | DOLPHIN | DOLPHIN \n |
2 | What name does David's aunt suggest to Mr. Dick they call him by? | Trotwood Copperfield | Trotwood \n |
3 | What is the name of the chapter in which Mr. Jorkins is first mentioned? | CHAPTER XXIII: I Corroborate Mr. Dick, and Choose a Profession. | This passage mentions that Mr. Jorkins is first mentioned in the context of Mr. Copperfield trying to cancel his articles, but it does not specify the chapter number. Therefore, I cannot answer your question. \n |
4 | Describe the room in which Mr. Copperfield meets Steerforth for breakfast. | A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard. | The room is described as a "snug private apartment" that's "red-curtained and Turkey-carpeted." It has a bright, burning fire, and a hot breakfast is laid out on a table covered with a clean cloth. A small, round mirror over the sideboard reflects the cozy scene. \n |
5 | After his engagement to Dora, David write Agnes a letter. What is the letter about? | He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision. | |
6 | What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night? | A letter stating that he would appear in the morning at half past nine. | This answer is not available in the provided text. \n |
7 | Who are David's final thoughts about in the book? | Agnes | David's final thoughts are about Agnes, whom he pictures as a guiding light by his side. \n |
Analysis¶
- Latency: 0.5 minutes
- Cost: \$0.30
- Accuracy: 5/8
In terms of accuracy, RAG does well when:
- The question is semantically similar to the answer.
- The answer is self-contained in a single chunk/passage.
If a question violates either of these conditions, RAG will struggle and long context will likely be more accurate. The question "What is the name of the chapter in which Mr. Jorkins is first mentioned?" illustrates this, as it violates the second condition: the chapter name does not appear close to the mention of Mr. Jorkins, so the two never land in the same chunk and the retriever fails to retrieve the necessary information.
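The chunk-boundary problem can be reproduced on a toy document. The sketch below is an illustration under stated assumptions: it uses a simple fixed-size character splitter (not LangChain's `RecursiveCharacterTextSplitter`), made-up filler text, and a keyword match standing in for the retriever. The function name and sizes are hypothetical.

```python
def split_fixed(text, chunk_size, overlap):
    """Split text into fixed-size character chunks with the given overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Toy document: a chapter heading, then filler, then the first mention of the
# name far away from the heading. Filler and sizes are made up for illustration.
heading = "CHAPTER XXIII: I Corroborate Mr. Dick, and Choose a Profession. "
filler = "Lorem ipsum dolor sit amet. " * 50
mention = "Mr. Jorkins was mentioned here. "
document = heading + filler + mention + filler

chunks = split_fixed(document, chunk_size=400, overlap=20)

# Keyword match as a stand-in for what a retriever could plausibly return
# for a question about Mr. Jorkins.
jorkins_chunks = [c for c in chunks if "Jorkins" in c]
heading_in_same_chunk = any("CHAPTER XXIII" in c for c in jorkins_chunks)

print(f"chunks mentioning Jorkins: {len(jorkins_chunks)}")
print(f"chapter heading in any of them: {heading_in_same_chunk}")
```

Even a perfect retriever can only return chunks that exist, so when the heading and the mention are further apart than the chunk size, no retrieved chunk can contain both.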
It's important to note that there are several ways to implement RAG, each of which affects cost, latency and accuracy. Here we opted for a simple in-memory implementation, which is cheap but won't scale well. Production-grade approaches would incur additional overhead costs for a persistent vector store database, but could potentially improve accuracy.
💡 RAG and Long Context Window are NOT mutually exclusive
By adjusting the chunk size and number of chunks in RAG, you can use as large a context window as the LLM supports.
If cost and latency are more important, prioritize a curated retrieval (small chunk size/number of chunks).
If accuracy is the priority, use a larger chunk size/number of chunks. If the entire context can fit in the prompt, consider bypassing RAG altogether.
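The trade-off above can be expressed as a small decision helper. This is a minimal sketch, not part of the notebook: `plan_retrieval` and its parameters are hypothetical names, all sizes are measured in characters for simplicity, and the document length and context limits in the usage lines are assumed values.

```python
def plan_retrieval(doc_chars, context_limit_chars, chunk_size, k):
    """Choose between full-context prompting and RAG for a given budget.

    All sizes are in characters. Returns a (strategy, chars_sent) tuple.
    """
    if doc_chars <= context_limit_chars:
        # The whole document fits in the prompt: bypass RAG altogether.
        return "full_context", doc_chars
    retrieved = chunk_size * k
    if retrieved > context_limit_chars:
        raise ValueError("chunk_size * k exceeds the context limit")
    return "rag", retrieved


# The chunk settings used in this notebook, with assumed document size and limits.
small_window = plan_retrieval(1_970_000, context_limit_chars=100_000, chunk_size=4000, k=7)
large_window = plan_retrieval(1_970_000, context_limit_chars=8_000_000, chunk_size=4000, k=7)
print(small_window)
print(large_window)
```

Raising `chunk_size` or `k` moves the `rag` branch closer to the full-context case, which is the sense in which the two approaches sit on one continuum.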
Conclusion¶
Trial | Accuracy | Latency | Cost |
---|---|---|---|
Baseline | 38% (3/8) | 0.5 min | $0.004 |
LCW - Naive | 100% (8/8) | 4.7 min | $19.68 |
LCW - Batched | 88% (7/8) | 1.7 min | $2.47 |
LCW - Cached | 100% (8/8) | 2.9 min | $10.22 |
RAG | 63% (5/8) | 0.5 min | $0.30 |
We have demonstrated various approaches to long context prompting and compared them across the dimensions of latency, cost and accuracy.
Whenever you use the same long context across multiple prompts, caching is a great option to reduce cost. You can amplify the cost savings and reduce latency by batching, but be careful: batching too many questions will start to negatively impact accuracy. RAG is still useful in many cases, but is weaker at retrieving answers that require analyzing large chunks or multiple disparate chunks of text.