Have you ever, in a tough situation, asked yourself: what would my idol do? You can now get an answer from the Be Your Idol project, available through the Be Your Idol Kaggle Notebook and created as a capstone project for the Kaggle 2025 GenAI Intensive course.
This project combines technologies from Google AI and Chroma DB to build a Retrieval Augmented Generation (RAG) application using Gemini embeddings and a Gemini model. Biographies are pulled from http://www.gutenberg.org and chunked into paragraph-sized documents. Embeddings are generated for these documents and stored in a vector database. When you describe a situation you want help with, the project retrieves the biography passages whose embeddings best match it and passes them to a prompt that plays the role of a personal coach.
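To make the chunking step concrete, here is a minimal sketch of splitting a plain-text biography into paragraph-sized documents. It is not the notebook's code: the function name chunk_into_paragraphs and the min_chars threshold are assumptions made for illustration, and the notebook may split the text differently.

```python
import re


def chunk_into_paragraphs(text: str, min_chars: int = 200) -> list[str]:
    """Split text on blank lines and merge short fragments into the next paragraph.

    Hypothetical helper: min_chars is an illustrative threshold, not a value
    taken from the notebook.
    """
    # Plain-text files often use Windows line endings; normalize them first.
    text = text.replace("\r\n", "\n")
    # A blank line (possibly containing whitespace) separates paragraphs.
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]

    chunks: list[str] = []
    buffer = ""
    for paragraph in paragraphs:
        if not paragraph:
            continue
        # Accumulate short paragraphs (headings, one-line quotes) until the
        # chunk is long enough to carry useful context.
        buffer = f"{buffer} {paragraph}".strip()
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # Keep whatever is left over as a final, possibly short, chunk.
        chunks.append(buffer)
    return chunks
```

The resulting list of strings is the kind of input a vector database expects as its documents. Merging short fragments forward is one possible design choice; it keeps chapter headings attached to the text that follows them instead of embedding them on their own.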
The following code from the notebook shows how to use Chroma DB to create a collection associated with an embedding function using the get_or_create_collection function, and then add documents to it in batches using the add function. Chroma applies the embedding function to each document as it is added to generate that document's embedding.
def create_and_populate_chroma_db_batched(
    documents: list[str],
    db_name: str,
    embedding_function: EmbeddingFunction,
    chroma_client: Client,
    batch_size: int = 100  # Default batch size is 100
):
    """
    Creates or gets a ChromaDB collection and adds a list of documents to it in batches.

    Args:
        documents: A list of text documents to add to the database.
        db_name: The name of the ChromaDB collection.
        embedding_function: The ChromaDB EmbeddingFunction to use for the collection.
        chroma_client: An initialized ChromaDB Client object.
        batch_size: The number of documents to process and add in each batch.
            Defaults to 100.

    Returns:
        The ChromaDB Collection object.
    """
    db = chroma_client.get_or_create_collection(name=db_name, embedding_function=embedding_function)
    num_documents = len(documents)
    for i in range(0, num_documents, batch_size):
        end_index = min(i + batch_size, num_documents)
        batch_docs = documents[i:end_index]
        batch_ids = [str(j) for j in range(i, end_index)]
        db.add(documents=batch_docs, ids=batch_ids)
        print(f"Added batch {i // batch_size + 1}/{(num_documents + batch_size - 1) // batch_size} containing {len(batch_docs)} documents to '{db_name}'.")
    print(f"Finished adding {db.count()} documents to the collection '{db_name}'.")
    return db
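Once the collection is populated, the retrieval half of RAG looks up the passages closest to your situation and folds them into the coach prompt. The sketch below is an illustration under assumptions, not the notebook's code: retrieve_passages and build_coach_prompt are hypothetical helpers, and the prompt wording is invented. The only Chroma call used is the collection's query method with the query_texts and n_results arguments.

```python
def retrieve_passages(collection, situation: str, n_results: int = 3) -> list[str]:
    """Return the stored passages most similar to the described situation.

    `collection` is assumed to be a Chroma collection such as the one returned
    by create_and_populate_chroma_db_batched.
    """
    # Chroma embeds the query text with the collection's embedding function
    # and returns the nearest stored documents.
    result = collection.query(query_texts=[situation], n_results=n_results)
    # "documents" holds one list of passages per query text; we sent one query.
    return result["documents"][0]


def build_coach_prompt(idol: str, situation: str, passages: list[str]) -> str:
    """Assemble a coaching prompt grounded in the retrieved passages.

    The wording here is a hypothetical example, not the notebook's prompt.
    """
    numbered = "\n".join(f"{n}. {p}" for n, p in enumerate(passages, start=1))
    return (
        f"You are a personal coach who advises people by drawing on the life of {idol}.\n"
        f"Use only the biography passages below as your source material.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Situation: {situation}\n"
        f"Advice:"
    )
```

The returned string would then be sent to the Gemini model as the generation prompt. Keeping retrieval and prompt assembly in separate functions makes it easy to inspect which passages were retrieved before any text is generated.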
The notebook currently works for only one idol, Abraham Lincoln, but it will be expanded to support more. Future versions will show you how to use an AI agent to help you choose an idol, and even to choose whether the advice is delivered in the same tone and writing style as your idol. Stay tuned!