In [None]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.141 --progress-bar off
!pip install -qqq openai==0.27.4 --progress-bar off
!pip install -Uqqq watermark==2.3.1 --progress-bar off
!pip install -Uqqq chromadb==0.3.21 --progress-bar off
!pip install -Uqqq tiktoken==0.3.3 --progress-bar off
!pip install -Uqqq youtube-transcript-api==0.5.0 --progress-bar off
!pip install -Uqqq pytube==12.1.3 --progress-bar off
!pip install -Uqqq unstructured[local-inference]==0.5.12 --progress-bar off

In [None]:
import os
import textwrap

import chromadb
import langchain
import openai
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader, UnstructuredPDFLoader, YoutubeLoader
from langchain.embeddings import HuggingFaceEmbeddings, OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

In [None]:
%load_ext watermark
%watermark --iversions -v -m

Python implementation: CPython
Python version       : 3.9.16
IPython version      : 7.34.0

Compiler    : GCC 9.4.0
OS          : Linux
Release     : 5.10.147+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

chromadb : 0.3.21
openai   : 0.27.4
langchain: 0.0.141



In [None]:
def print_response(response: str) -> None:
    """Print an LLM response wrapped at 100 characters per line."""
    print("\n".join(textwrap.wrap(response, width=100)))

In [None]:
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI KEY"

In [None]:
!gdown 1eetuan04uj9-QKu_Vok2mbSK23G0H7yN
!gdown 1MVIhlCJS5RjVDy_s93Jb4vkHt6jAmgaa

Downloading...
From: https://drive.google.com/uc?id=1MVIhlCJS5RjVDy_s93Jb4vkHt6jAmgaa
To: /content/Andrej_Karpathy_Resume.pdf
100% 46.9k/46.9k [00:00<00:00, 48.9MB/s]


In [None]:
txt_loader = TextLoader("./the-need-to-read.txt", encoding="utf8")

In [None]:
index = VectorstoreIndexCreator().from_loaders([txt_loader])



In [None]:
query = "Why someone in todays world would read? Answer in 3 sentences."
result = index.query_with_sources(query)
result

{'question': 'Why someone in todays world would read? Answer in 3 sentences.',
 'answer': ' Reading helps to develop critical thinking skills, encourages creativity, and allows for the discovery of new ideas. It also helps to develop writing skills, which is important for expressing and exploring ideas.\n',
 'sources': './the-need-to-read.txt'}

In [None]:
print_response(result["answer"])

 Reading helps to develop critical thinking skills, encourages creativity, and allows for the
discovery of new ideas. It also helps to develop writing skills, which is important for expressing
and exploring ideas.


## Loaders

In [None]:
yt_loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=n2uY3-2Goek", add_video_info=True
)

In [None]:
yt_documents = yt_loader.load()
yt_documents

[Document(page_content="you get more out of reading one book that's great five times than out of reading five mediocre books if your behavior doesn't change as a result of reading a book and it means you learn nothing which means it was a waste of time and many people who read books are just wasting their time because their behavior doesn't change and so I consolidate once I find something that's good I plug everything I possibly can into it and suck the juice out of it so that I can change my behavior as a result which comes from the Frameworks and how I think about it so I read one thing that's very good many times rather than trying to brag about the fact that I read a book a week because I'm like what was the book last week not that good that it wasn't worth rereading", metadata={'source': 'n2uY3-2Goek', 'title': 'How to get the most out of reading', 'description': "WE'RE BUYING! $1M-10M EBITDA Founders - We invest and help you scale faster. To find out more, apply here: https://ac

In [None]:
document = yt_documents[0]
document.page_content

"you get more out of reading one book that's great five times than out of reading five mediocre books if your behavior doesn't change as a result of reading a book and it means you learn nothing which means it was a waste of time and many people who read books are just wasting their time because their behavior doesn't change and so I consolidate once I find something that's good I plug everything I possibly can into it and suck the juice out of it so that I can change my behavior as a result which comes from the Frameworks and how I think about it so I read one thing that's very good many times rather than trying to brag about the fact that I read a book a week because I'm like what was the book last week not that good that it wasn't worth rereading"

In [None]:
# Tip: use OnlinePDFLoader instead to load a PDF from a URL
pdf_loader = UnstructuredPDFLoader("./Andrej_Karpathy_Resume.pdf")
pdf_pages = pdf_loader.load_and_split()



In [None]:
len(pdf_pages[0].page_content)

1434

In [None]:
pdf_pages[0]

Document(page_content='Andrej Karpathy\n\nandrej.karpathy@gmail.com\n\nhttp://cs.stanford.edu/~karpathy/\n\nEDUCATION\n\nStanford University (PhD), 2011 –\n\nComputer Science, studying Machine Learning and Computer Vision\n\nUniversity of British Columbia (Master’s degree), 2009 - 2011\n\nComputer Science graduate studies in Machine Learning, Vision, Motor Control\n\nAverage course grade: 94.4%\n\nUniversity of Toronto (Bachelor’s degree), 2005 - 2009\n\nDouble major in Computer Science and Physics, minor in Mathematics\n\nCumulative GPA: 3.74\n\nWORK EXPERIENCE\n\nGoogle Research (internship), June 2011 – September 2011\n\nDeveloped learning algorithms for video classification tasks\n\nWorked on a large-scale learning framework for video analysis\n\nTeaching Assistant\n\n\n\n\n\n\n\n2011: Assisted with online offering of the Machine Learning class at Stanford\n\n2011: Graduate Probabilistic Machine Learning class\n\n2009-2010: Taught tutorial sections for a first year Discrete Mathema

## Text Splitters

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
texts = text_splitter.split_documents(pdf_pages)
len(texts)

2

In [None]:
len(texts[0].page_content), len(texts[1].page_content)

(987, 486)

In [None]:
texts[0]

Document(page_content='Andrej Karpathy\n\nandrej.karpathy@gmail.com\n\nhttp://cs.stanford.edu/~karpathy/\n\nEDUCATION\n\nStanford University (PhD), 2011 –\n\nComputer Science, studying Machine Learning and Computer Vision\n\nUniversity of British Columbia (Master’s degree), 2009 - 2011\n\nComputer Science graduate studies in Machine Learning, Vision, Motor Control\n\nAverage course grade: 94.4%\n\nUniversity of Toronto (Bachelor’s degree), 2005 - 2009\n\nDouble major in Computer Science and Physics, minor in Mathematics\n\nCumulative GPA: 3.74\n\nWORK EXPERIENCE\n\nGoogle Research (internship), June 2011 – September 2011\n\nDeveloped learning algorithms for video classification tasks\n\nWorked on a large-scale learning framework for video analysis\n\nTeaching Assistant\n\n\n\n\n\n\n\n2011: Assisted with online offering of the Machine Learning class at Stanford\n\n2011: Graduate Probabilistic Machine Learning class\n\n2009-2010: Taught tutorial sections for a first year Discrete Mathema

In [None]:
texts[1]

Document(page_content='four consecutive semesters\n\nCOURSE WORK\n\n\n\nStanford: Machine Learning, Computer Vision, Convex Optimization,\n\nProbabilistic Graphical Models (I and II)\n\nUniversity of British Columbia: Machine Learning (I and II), Computer Vision (I\n\nand II)\n\nHACKING SKILLS\n\n\n\nPython, C++, MATLAB, Java, Objective C, Javascript/HTML/CSS, PHP, SQL\n\nINTERESTS\n\nHobbies include Ping Pong, Ice skating, Scuba diving, PC strategy/fps games,\n\nProgramming, and solving the Rubik’s cube in less than 20 seconds', metadata={'source': './Andrej_Karpathy_Resume.pdf'})
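
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of a fixed-width sliding window over characters. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on natural separators such as blank lines, which is likely why the chunks above come out at 987 and 486 characters rather than exactly 1024. The function name `split_with_overlap` and the sample string are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # hypothetical toy input
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.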

## Embeddings

In [None]:
MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
hf_embeddings = HuggingFaceEmbeddings(model_name=MODEL_NAME)

In [None]:
text = texts[0].page_content
text

'Andrej Karpathy\n\nandrej.karpathy@gmail.com\n\nhttp://cs.stanford.edu/~karpathy/\n\nEDUCATION\n\nStanford University (PhD), 2011 –\n\nComputer Science, studying Machine Learning and Computer Vision\n\nUniversity of British Columbia (Master’s degree), 2009 - 2011\n\nComputer Science graduate studies in Machine Learning, Vision, Motor Control\n\nAverage course grade: 94.4%\n\nUniversity of Toronto (Bachelor’s degree), 2005 - 2009\n\nDouble major in Computer Science and Physics, minor in Mathematics\n\nCumulative GPA: 3.74\n\nWORK EXPERIENCE\n\nGoogle Research (internship), June 2011 – September 2011\n\nDeveloped learning algorithms for video classification tasks\n\nWorked on a large-scale learning framework for video analysis\n\nTeaching Assistant\n\n\n\n\n\n\n\n2011: Assisted with online offering of the Machine Learning class at Stanford\n\n2011: Graduate Probabilistic Machine Learning class\n\n2009-2010: Taught tutorial sections for a first year Discrete Mathematics class on\n\nfour 

In [None]:
hf_embedding = hf_embeddings.embed_documents([text])
len(hf_embedding[0])

768

In [None]:
hf_embedding[0][:10]

[-0.0012547640362754464,
 0.05444266274571419,
 -0.041984450072050095,
 -0.019023854285478592,
 0.007353615947067738,
 -0.012013374827802181,
 0.06387557089328766,
 -0.02246193215250969,
 -0.04335080459713936,
 -0.04206854850053787]

In [None]:
embeddings = OpenAIEmbeddings()

In [None]:
openai_embedding = embeddings.embed_documents([text])
len(openai_embedding[0])

1536

In [None]:
openai_embedding[0][:10]

[-0.0034319146679993133,
 0.016217479770247397,
 0.020403068874950882,
 -0.03693009233481942,
 0.01301435869943405,
 0.025678797149630162,
 -0.00714645780273548,
 0.017321074689020152,
 -0.03157361652884209,
 -0.020618405559186648]
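
The two models return vectors of different lengths (768 vs. 1536), so embeddings are only comparable when they come from the same model. A common way to compare two such vectors is cosine similarity. The sketch below uses tiny made-up vectors instead of real embeddings, and the helper name `cosine_similarity` is an assumption for illustration rather than part of LangChain.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional "embeddings"
v_query = [3.0, 4.0]
v_close = [4.0, 3.0]   # points in a similar direction
v_far = [4.0, -3.0]    # orthogonal to the query

sim_close = cosine_similarity(v_query, v_close)
sim_far = cosine_similarity(v_query, v_far)
print(sim_close > sim_far)
```

Values near 1 mean the vectors point in nearly the same direction, and values near 0 mean they are unrelated.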

## Vectorstores

In [None]:
db = Chroma.from_documents(texts, embeddings)



In [None]:
db.similarity_search_with_score("What is the candidate work experience?", k=2)

[(Document(page_content='four consecutive semesters\n\nCOURSE WORK\n\n\n\nStanford: Machine Learning, Computer Vision, Convex Optimization,\n\nProbabilistic Graphical Models (I and II)\n\nUniversity of British Columbia: Machine Learning (I and II), Computer Vision (I\n\nand II)\n\nHACKING SKILLS\n\n\n\nPython, C++, MATLAB, Java, Objective C, Javascript/HTML/CSS, PHP, SQL\n\nINTERESTS\n\nHobbies include Ping Pong, Ice skating, Scuba diving, PC strategy/fps games,\n\nProgramming, and solving the Rubik’s cube in less than 20 seconds', metadata={'source': './Andrej_Karpathy_Resume.pdf'}),
  0.4737962484359741),
 (Document(page_content='Andrej Karpathy\n\nandrej.karpathy@gmail.com\n\nhttp://cs.stanford.edu/~karpathy/\n\nEDUCATION\n\nStanford University (PhD), 2011 –\n\nComputer Science, studying Machine Learning and Computer Vision\n\nUniversity of British Columbia (Master’s degree), 2009 - 2011\n\nComputer Science graduate studies in Machine Learning, Vision, Motor Control\n\nAverage cours
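
Conceptually, a similarity search embeds the query, scores every stored vector against it, and returns the `k` best matches. The sketch below is a brute-force version of that idea. It assumes the score is a distance (squared Euclidean here), so lower means closer; the metric Chroma actually uses depends on its configuration. `toy_similarity_search`, the store contents, and the 2-dimensional vectors are all made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_similarity_search(query_vec, store, k=2):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = better match
    return scored[:k]


# Hypothetical store of (text, embedding) pairs
toy_store = [
    ("work experience", [0.9, 0.1]),
    ("hobbies", [0.1, 0.9]),
    ("education", [0.7, 0.3]),
]

results = toy_similarity_search([1.0, 0.0], toy_store, k=2)
for text, distance in results:
    print(f"{distance:.2f}  {text}")
```

A real vector store replaces the linear scan with an index so that the search stays fast for large collections.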

### Storing and Loading Embeddings

In [None]:
PERSIST_DIRECTORY = "db"

db = Chroma.from_documents(
    documents=texts, embedding=embeddings, persist_directory=PERSIST_DIRECTORY
)



In [None]:
db.similarity_search_with_score("What is the candidate work experience?", k=2)

[(Document(page_content='four consecutive semesters\n\nCOURSE WORK\n\n\n\nStanford: Machine Learning, Computer Vision, Convex Optimization,\n\nProbabilistic Graphical Models (I and II)\n\nUniversity of British Columbia: Machine Learning (I and II), Computer Vision (I\n\nand II)\n\nHACKING SKILLS\n\n\n\nPython, C++, MATLAB, Java, Objective C, Javascript/HTML/CSS, PHP, SQL\n\nINTERESTS\n\nHobbies include Ping Pong, Ice skating, Scuba diving, PC strategy/fps games,\n\nProgramming, and solving the Rubik’s cube in less than 20 seconds', metadata={'source': './Andrej_Karpathy_Resume.pdf'}),
  0.47347402572631836),
 (Document(page_content='Andrej Karpathy\n\nandrej.karpathy@gmail.com\n\nhttp://cs.stanford.edu/~karpathy/\n\nEDUCATION\n\nStanford University (PhD), 2011 –\n\nComputer Science, studying Machine Learning and Computer Vision\n\nUniversity of British Columbia (Master’s degree), 2009 - 2011\n\nComputer Science graduate studies in Machine Learning, Vision, Motor Control\n\nAverage cour

In [None]:
db.persist()

Load the persisted data from disk:

In [None]:
vectordb = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings)



In [None]:
vectordb.similarity_search_with_score("What is the candidate work experience?", k=2)

[(Document(page_content='four consecutive semesters\n\nCOURSE WORK\n\n\n\nStanford: Machine Learning, Computer Vision, Convex Optimization,\n\nProbabilistic Graphical Models (I and II)\n\nUniversity of British Columbia: Machine Learning (I and II), Computer Vision (I\n\nand II)\n\nHACKING SKILLS\n\n\n\nPython, C++, MATLAB, Java, Objective C, Javascript/HTML/CSS, PHP, SQL\n\nINTERESTS\n\nHobbies include Ping Pong, Ice skating, Scuba diving, PC strategy/fps games,\n\nProgramming, and solving the Rubik’s cube in less than 20 seconds', metadata={'source': './Andrej_Karpathy_Resume.pdf'}),
  0.47347402572631836),
 (Document(page_content='Andrej Karpathy\n\nandrej.karpathy@gmail.com\n\nhttp://cs.stanford.edu/~karpathy/\n\nEDUCATION\n\nStanford University (PhD), 2011 –\n\nComputer Science, studying Machine Learning and Computer Vision\n\nUniversity of British Columbia (Master’s degree), 2009 - 2011\n\nComputer Science graduate studies in Machine Learning, Vision, Motor Control\n\nAverage cour
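
The reason to persist is that embeddings are computed once and then reused, instead of re-embedding the documents on every run. As a rough analogy only (this is not how Chroma stores data), the round trip can be sketched with a JSON file. The helper names, the file name `vectors.json`, and the records are hypothetical.

```python
import json
import os
import tempfile


def save_vectors(path, records):
    """Write a list of {"text", "vector"} records to disk as JSON."""
    with open(path, "w", encoding="utf8") as f:
        json.dump(records, f)


def load_vectors(path):
    """Read the records back without recomputing any embeddings."""
    with open(path, encoding="utf8") as f:
        return json.load(f)


# Hypothetical records standing in for embedded chunks
records = [
    {"text": "chunk one", "vector": [0.1, 0.2]},
    {"text": "chunk two", "vector": [0.3, 0.4]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vectors.json")
    save_vectors(path, records)
    restored = load_vectors(path)

print(restored == records)
```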

## Use a Chain

In [None]:
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
)
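
With `chain_type="stuff"`, the retrieved chunks are placed ("stuffed") together into a single prompt that is sent to the model along with the question. The sketch below shows the idea in plain Python. The prompt wording and the `build_stuff_prompt` helper are assumptions made for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Two short snippets standing in for the k=2 retrieved chunks
retrieved_chunks = [
    "Developed learning algorithms for video classification tasks",
    "Python, C++, MATLAB, Java",
]

prompt = build_stuff_prompt(
    "What is the work experience of the candidate?", retrieved_chunks
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit into the model's context window, which is one reason to keep `k` and `chunk_size` small.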

In [None]:
query = "What is the work experience of the candidate? Use no more than 2 sentences."
response = chain.run(query)

In [None]:
print_response(response)

The candidate has worked as a Google Research intern, developing learning algorithms for video
classification tasks and worked on a large-scale learning framework for video analysis. They have
also worked as a teaching assistant for various classes.


In [None]:
query = "Give a background summary of the candidate. Use no more than 3 sentences."
response = chain.run(query)
print_response(response)

Andrej Karpathy is a PhD student in Computer Science at Stanford University, studying Machine
Learning and Computer Vision. He has a Master's degree in Computer Science from the University of
British Columbia and a Bachelor's degree in Computer Science and Physics from the University of
Toronto. He has worked as a Teaching Assistant and interned at Google Research.


In [None]:
query = """
How likely is this candidate to be a top-tier DL researcher 2 years from now? 
Use 0-10 scale, where
0 - chance is nonexistent
10 - beyond reasonable doubt

You must choose a number and explain why
"""
response = chain.run(query)
print_response(response)

As an AI language model, I cannot predict the future or make assumptions about individuals. However,
based on the candidate's educational background, work experience, and course work, it seems that
they have a strong foundation in machine learning and computer vision. Additionally, their
experience as a teaching assistant and their interest in programming and problem-solving suggest
that they have a passion for the field. Therefore, I would rate their chances of becoming a top-tier
DL researcher 2 years from now as 7 out of 10.


## References

- [Paul Graham - The Need to Read](http://www.paulgraham.com/read.html)
- [Alex Hormozi - How to get the most out of reading](https://www.youtube.com/watch?v=n2uY3-2Goek)
- [Andrej Karpathy (very old) Resume](https://cs.stanford.edu/~karpathy/Andrej_Karpathy_Resume.pdf)
- [LangChain Docs](https://python.langchain.com/en/latest/index.html)