Harrison/self hosted runhouse #1154

Merged on Feb 19, 2023 (2 commits)
31 changes: 31 additions & 0 deletions docs/ecosystem/runhouse.md
@@ -0,0 +1,31 @@
# Runhouse

This page covers how to use the [Runhouse](https://github.com/run-house/runhouse) ecosystem within LangChain.
It is broken into three parts: installation and setup, LLMs, and Embeddings.

## Installation and Setup
- Install the Python SDK with `pip install runhouse`
- If you'd like to use an on-demand cluster, check your cloud credentials with `sky check`
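
Before launching a cluster, it can help to confirm that the packages are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of Runhouse or LangChain.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be imported here."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Report on the two packages this page relies on.
    for name in ("runhouse", "langchain"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

This only checks importability; it does not validate cloud credentials, which is what `sky check` is for.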

## Self-hosted LLMs
For a basic self-hosted LLM, you can use the `SelfHostedHuggingFaceLLM` class. For more
custom LLMs, you can use the `SelfHostedPipeline` parent class.

```python
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
```

For a more detailed walkthrough of the self-hosted LLMs, see [this notebook](../modules/llms/integrations/self_hosted_examples.ipynb).
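
The custom route pairs a function that builds a pipeline with a function that turns a prompt into text. The sketch below runs that pairing locally against a stand-in pipeline, so the contract is visible without a GPU or Runhouse. `fake_pipeline` and `truncate_at_stop` are hypothetical names used for illustration, and the stop-sequence handling is an assumption about how the `stop` argument might be used, not LangChain's implementation.

```python
from typing import Callable, Dict, List, Optional


def fake_pipeline(prompt: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a text-generation pipeline.

    Echoes the prompt plus a canned continuation, in the
    [{"generated_text": ...}] shape that such pipelines return.
    """
    return [{"generated_text": prompt + " Berlin.\nNext question"}]


def truncate_at_stop(text: str, stop: Optional[List[str]]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for token in stop:
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def inference_fn(
    pipeline: Callable, prompt: str, stop: Optional[List[str]] = None
) -> str:
    # Drop the echoed prompt, then apply any stop sequences.
    completion = pipeline(prompt)[0]["generated_text"][len(prompt):]
    return truncate_at_stop(completion, stop)


result = inference_fn(fake_pipeline, "What is the capital of Germany?", stop=["\n"])
print(repr(result))
```

With a real model, the same `inference_fn` shape would run on the remote hardware, with the stand-in replaced by the pipeline built there.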

## Self-hosted Embeddings
There are several ways to use self-hosted embeddings with LangChain via Runhouse.

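For a basic self-hosted embedding from a Hugging Face Transformers model, you can use
the `SelfHostedHuggingFaceEmbeddings` class. For more custom embeddings, you can use
the `SelfHostedEmbeddings` parent class.

```python
from langchain.embeddings import SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings
```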

For a more detailed walkthrough of the self-hosted embeddings, see [this notebook](../modules/utils/combine_docs_examples/embeddings.ipynb).
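
LangChain embedding classes expose `embed_documents` for a list of texts and `embed_query` for a single text. As a rough local illustration of that interface shape only, the toy class below derives a small vector from a hash instead of running a model. `ToyEmbeddings` is a hypothetical name, and its vectors carry no semantic meaning.

```python
import hashlib
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in that mimics the two-method embedding interface."""

    def __init__(self, dim: int = 8) -> None:
        # SHA-256 yields 32 bytes, so at most 32 dimensions are available.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dim` bytes to floats in [0, 1].
        return [b / 255 for b in digest[: self.dim]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


embeddings = ToyEmbeddings(dim=8)
doc_vectors = embeddings.embed_documents(["first document", "second document"])
query_vector = embeddings.embed_query("first document")
print(len(doc_vectors), len(query_vector))
```

A self-hosted embedding class would present the same two methods while running the model on the remote hardware.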

2 changes: 2 additions & 0 deletions docs/modules/llms/integrations.rst
@@ -27,6 +27,8 @@ The examples here are all "how-to" guides for how to integrate with various LLM

`Anthropic <./integrations/anthropic_example.html>`_: Covers how to use Anthropic models with Langchain.

`Self-Hosted Models (via Runhouse) <./integrations/self_hosted_examples.html>`_: Covers how to run models on existing or on-demand remote compute with Langchain.


.. toctree::
:maxdepth: 1
296 changes: 296 additions & 0 deletions docs/modules/llms/integrations/self_hosted_examples.ipynb
@@ -0,0 +1,296 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# Self-Hosted Models via Runhouse\n",
"This example goes over how to use LangChain and [Runhouse](https://github.com/run-house/runhouse) to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.\n",
"\n",
"For more information, see [Runhouse](https://github.com/run-house/runhouse) or the [Runhouse docs](https://runhouse-docs.readthedocs-hosted.com/en/latest/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fb585dd",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM\n",
"from langchain import PromptTemplate, LLMChain\n",
"import runhouse as rh"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06d6866e",
"metadata": {},
"outputs": [],
"source": [
"# For an on-demand A100 with GCP, Azure, or Lambda\n",
"gpu = rh.cluster(name=\"rh-a10x\", instance_type=\"A100:1\", use_spot=False)\n",
"\n",
"# For an on-demand A10G with AWS (no single A100s on AWS)\n",
"# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')\n",
"\n",
"# For an existing cluster\n",
"# gpu = rh.cluster(ips=['<ip of the cluster>'], \n",
"# ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},\n",
"# name='rh-a10x')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "035dea0f",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f3458d9",
"metadata": {},
"outputs": [],
"source": [
"llm = SelfHostedHuggingFaceLLM(model_id=\"gpt2\", hardware=gpu, model_reqs=[\"pip:./\", \"transformers\", \"torch\"])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a641dbd9",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "6fb6fdb2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO | 2023-02-17 05:42:23,537 | Running _generate_text via gRPC\n",
"INFO | 2023-02-17 05:42:24,016 | Time to send message: 0.48 seconds\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nLet's say we're talking sports teams who won the Super Bowl in the year Justin Beiber\""
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "c88709cd",
"metadata": {},
"source": [
"You can also load more custom models through the SelfHostedHuggingFaceLLM interface:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22820c5a",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"llm = SelfHostedHuggingFaceLLM(\n",
" model_id=\"google/flan-t5-small\",\n",
" task=\"text2text-generation\",\n",
" hardware=gpu,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "1528e70f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO | 2023-02-17 05:54:21,681 | Running _generate_text via gRPC\n",
"INFO | 2023-02-17 05:54:21,937 | Time to send message: 0.25 seconds\n"
]
},
{
"data": {
"text/plain": [
"'berlin'"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm(\"What is the capital of Germany?\")"
]
},
{
"cell_type": "markdown",
"id": "7a0c3746",
"metadata": {},
"source": [
"Using a custom load function, we can load a custom pipeline directly on the remote hardware:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "893eb1d3",
"metadata": {},
"outputs": [],
"source": [
"def load_pipeline():\n",
" from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline # Need to be inside the fn in notebooks\n",
" model_id = \"gpt2\"\n",
" tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
" model = AutoModelForCausalLM.from_pretrained(model_id)\n",
" pipe = pipeline(\n",
" \"text-generation\", model=model, tokenizer=tokenizer, max_new_tokens=10\n",
" )\n",
" return pipe\n",
"\n",
"def inference_fn(pipeline, prompt, stop = None):\n",
" return pipeline(prompt)[0][\"generated_text\"][len(prompt):]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "087d50dc",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"llm = SelfHostedHuggingFaceLLM(model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "feb8da8e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO | 2023-02-17 05:42:59,219 | Running _generate_text via gRPC\n",
"INFO | 2023-02-17 05:42:59,522 | Time to send message: 0.3 seconds\n"
]
},
{
"data": {
"text/plain": [
"'john w. bush'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm(\"Who is the current US president?\")"
]
},
{
"cell_type": "markdown",
"id": "af08575f",
"metadata": {},
"source": [
"You can send your pipeline directly over the wire to your model, but this only works for small models (<2 GB) and will be fairly slow:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d23023b9",
"metadata": {},
"outputs": [],
"source": [
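"pipeline = load_pipeline()\n",
"# Assumed requirements, matching the first example in this notebook\n",
"model_reqs = [\"pip:./\", \"transformers\", \"torch\"]\n",
"llm = SelfHostedPipeline.from_pipeline(\n",
" pipeline=pipeline, hardware=gpu, model_reqs=model_reqs\n",
")"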
]
},
{
"cell_type": "markdown",
"id": "fcb447a1",
"metadata": {},
"source": [
"Alternatively, we can send the pipeline to the hardware's filesystem, which is much faster."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7206b7d6",
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"\n",
"# Serialize the pipeline and copy it to the cluster's filesystem\n",
"rh.blob(pickle.dumps(pipeline), path=\"models/pipeline.pkl\").save().to(gpu, path=\"models\")\n",
"\n",
"llm = SelfHostedPipeline.from_pipeline(pipeline=\"models/pipeline.pkl\", hardware=gpu)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}