
can I use "micropip" to install "langchain"?? if not, I request the python package "langchain". #3711

Open
cosmosanalytics opened this issue Mar 29, 2023 · 10 comments · May be fixed by #4549

@cosmosanalytics

cosmosanalytics commented Mar 29, 2023

🐍 Package Request

@rth
Member

rth commented Mar 29, 2023

If you look at the list of requirements here for this package, it has around 40 first-level dependencies, some of which are large and difficult to build (e.g. torch #1625, tensorflow #3121, redis, psycopg2 for WASM, etc.). Overall it pulls in a lot of heavy dependencies at once, which is not something that makes sense to do in a browser.

So overall I think it's very unlikely to ever be available in the browser, and I will close this issue as we will not work on it, sorry. However, people are of course very welcome to try to make it work, in particular by trying to build some of its dependencies linked above.

Alternatively, if the langchain developers make a release with a smaller list of dependencies, that would make it more feasible to run in the browser.

@hoodmane
Member

I just started looking into this, and in fact langchain doesn't seem to have that many base dependencies. The only missing thing is aiohttp:

>>> await micropip.install("langchain", keep_going=True)
ValueError: Can't find a pure Python 3 wheel for: 'aiohttp<4.0.0,>=3.8.3'

It also doesn't use much of aiohttp's API surface, so it should be reasonable to shim it in terms of fetch.
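For a sense of what such a shim looks like, here is a minimal sketch using Pyodide's pyodide.http.pyfetch wrapper around the browser fetch API. FetchSession is a hypothetical name, and this covers only a sliver of aiohttp's surface:

import pyodide.http

class FetchSession:
    # Hypothetical, minimal stand-in for aiohttp.ClientSession, backed by
    # the browser's fetch API via pyodide.http.pyfetch.
    async def get(self, url, headers=None):
        return await pyodide.http.pyfetch(url, method="GET", headers=headers or {})

# Usage (top-level await works in the Pyodide console):
#   resp = await FetchSession().get("https://example.com/api")
#   print(resp.status, await resp.string())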

@hoodmane
Member

Though maybe what happened is that they just moved these deps out of their hard requirements into "peer" requirements, and the package is useless without them.

@hoodmane
Member

hoodmane commented Oct 18, 2023

Okay, it is possible to make langchain do stuff in Pyodide:

[Screenshot from 2023-10-17: langchain running in Pyodide]
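(For context, a smoke test along these lines exercises the async OpenAI path, which is the part that needs aiohttp. This is a sketch assuming the pre-0.1 langchain API current at the time, with a placeholder API key:)

from langchain.llms import OpenAI

llm = OpenAI(openai_api_key="sk-...")  # placeholder key
# agenerate() is the async entry point that goes through aiohttp
result = await llm.agenerate(["Say hello from Pyodide"])
print(result.generations[0][0].text)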

@ryanking13
Member

Yes, the openai package requires aiohttp, but it is not a hard dependency, so I think people can just patch the aiohttp part (or ask the maintainers to make it optional) and use it.
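(A hedged sketch of the "patch the aiohttp part" route: register a stub module before anything imports aiohttp, so the import succeeds and only code paths that actually touch it fail at runtime. The stub's attributes here are illustrative, not exhaustive:)

import sys, types

# Hypothetical stub: lets `import aiohttp` succeed without a real build.
stub = types.ModuleType("aiohttp")
stub.ClientSession = None  # placeholder; code that really uses it will still fail
sys.modules["aiohttp"] = stub

import openai  # now imports (assuming its other deps are installed); only the aiohttp code paths are dead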

@hoodmane
Member

Well, the aiohttp path is the better of the two; the other option is that it uses requests, which just doesn't work at all. Ideally I think we should package aiohttp with a patch.

@rth
Member

rth commented Oct 19, 2023

Nice! Ah, so what must be happening is that when you install langchain and openai, it downloads the old pure-Python aiohttp package but never uses it.

> the other option is that it uses requests, which just doesn't work at all.

Even with https://github.com/koenvo/pyodide-http?

> Ideally I think we should package aiohttp with a patch.

I would also be fine with it being done in pyodide-http; cc @koenvo, who was interested in working on this as far as I remember. Also, FYI, there is already an upstream issue: aio-libs/aiohttp#7253
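(For reference, pyodide-http's entry point is patch_all(), which rewires requests and urllib onto browser transports:)

import micropip
await micropip.install(["pyodide-http", "requests"])

import pyodide_http
pyodide_http.patch_all()  # patch requests/urllib to use XMLHttpRequest/fetch

import requests
print(requests.get("https://example.com").status_code)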

@Dynathresh

Dynathresh commented Oct 20, 2023

I am dealing with this as well, trying to use stlite to make an executable of a Streamlit app. @hoodmane, how did you get langchain to use an older version of aiohttp, assuming that's how you did it? My pip shows aiohttp [required: >=3.8.3,<4.0.0; installed: 3.8.6], which Pyodide (micropip) tries to follow. I can't find any requirements file that defines it inside my langchain package. I'm happy with a quick hack for now.

@hoodmane
Member

hoodmane commented Oct 20, 2023

I built aiohttp-3.8.6 for use with Pyodide. I also used the following monkeypatch:

aiohttp patch
# Monkeypatch: replace aiohttp's ClientSession._request with a version that
# delegates the network transfer to the browser's fetch API.
from multidict import CIMultiDict, istr
from aiohttp import payload, CookieJar, InvalidURL, hdrs, ClientSession, ClientTimeout
from aiohttp.client_reqrep import _merge_ssl_params
from aiohttp.helpers import TimeoutHandle, strip_auth_from_url, get_env_proxy_for_url
from contextlib import suppress
from typing import Any, Optional, Iterable
from yarl import URL

class Content:
    # Minimal stand-in for the aiohttp response content object: reads the
    # body from the JS Response via arrayBuffer() instead of a StreamReader.
    __slots__ = ("_jsresp", "_exception")
    def __init__(self, _jsresp):
        self._jsresp = _jsresp
        self._exception = None

    async def read(self):
        if self._exception:
            raise self._exception
        buf = await self._jsresp.arrayBuffer()
        self._jsresp = None
        return buf.to_bytes()

    def exception(self):
        return self._exception

    def set_exception(self, exc: BaseException) -> None:
        self._exception = exc

async def _request(
    self,
    method: str,
    str_or_url,
    *,
    params = None,
    data: Any = None,
    json: Any = None,
    cookies = None,
    headers = None,
    skip_auto_headers: Optional[Iterable[str]] = None,
    auth = None,
    allow_redirects: bool = True,
    max_redirects: int = 10,
    compress: Optional[str] = None,
    chunked: Optional[bool] = None,
    expect100: bool = False,
    raise_for_status = None,
    read_until_eof: bool = True,
    proxy = None,
    proxy_auth = None,
    timeout = None,
    verify_ssl: Optional[bool] = None,
    fingerprint: Optional[bytes] = None,
    ssl_context = None,
    ssl = None,
    proxy_headers = None,
    trace_request_ctx = None,
    read_bufsize: Optional[int] = None,
):
    # NOTE: timeout clamps existing connect and read timeouts.  We cannot
    # set the default to None because we need to detect if the user wants
    # to use the existing timeouts by setting timeout to None.

    if self.closed:
        raise RuntimeError("Session is closed")
    
    ssl = _merge_ssl_params(ssl, verify_ssl, ssl_context, fingerprint)

    if data is not None and json is not None:
        raise ValueError(
            "data and json parameters can not be used at the same time"
        )
    elif json is not None:
        data = payload.JsonPayload(json, dumps=self._json_serialize)


    redirects = 0
    history = []
    version = self._version
    params = params or {}

    # Merge with default headers and transform to CIMultiDict
    headers = self._prepare_headers(headers)
    proxy_headers = self._prepare_headers(proxy_headers)

    try:
        url = self._build_url(str_or_url)
    except ValueError as e:
        raise InvalidURL(str_or_url) from e

    skip_headers = set(self._skip_auto_headers)
    if skip_auto_headers is not None:
        for i in skip_auto_headers:
            skip_headers.add(istr(i))

    if proxy is not None:
        try:
            proxy = URL(proxy)
        except ValueError as e:
            raise InvalidURL(proxy) from e

    if timeout is None:
        real_timeout = self._timeout
    else:
        if not isinstance(timeout, ClientTimeout):
            real_timeout = ClientTimeout(total=timeout)  # type: ignore[arg-type]
        else:
            real_timeout = timeout
    # timeout is cumulative for all request operations
    # (request, redirects, responses, data consuming)
    tm = TimeoutHandle(self._loop, real_timeout.total)
    handle = tm.start()

    if read_bufsize is None:
        read_bufsize = self._read_bufsize

    traces = []

    timer = tm.timer()
    try:
        with timer:
            url, auth_from_url = strip_auth_from_url(url)
            if auth and auth_from_url:
                raise ValueError(
                    "Cannot combine AUTH argument with "
                    "credentials encoded in URL"
                )

            if auth is None:
                auth = auth_from_url
            if auth is None:
                auth = self._default_auth
            # It would be confusing if we support explicit
            # Authorization header with auth argument
            if auth is not None and hdrs.AUTHORIZATION in headers:
                raise ValueError(
                    "Cannot combine AUTHORIZATION header "
                    "with AUTH argument or credentials "
                    "encoded in URL"
                )

            all_cookies = self._cookie_jar.filter_cookies(url)

            if cookies is not None:
                tmp_cookie_jar = CookieJar()
                tmp_cookie_jar.update_cookies(cookies)
                req_cookies = tmp_cookie_jar.filter_cookies(url)
                if req_cookies:
                    all_cookies.load(req_cookies)

            if proxy is not None:
                proxy = URL(proxy)
            elif self._trust_env:
                with suppress(LookupError):
                    proxy, proxy_auth = get_env_proxy_for_url(url)

            req = self._request_class(
                method,
                url,
                params=params,
                headers=headers,
                skip_auto_headers=skip_headers,
                data=data,
                cookies=all_cookies,
                auth=auth,
                version=version,
                compress=compress,
                chunked=chunked,
                expect100=expect100,
                loop=self._loop,
                response_class=self._response_class,
                proxy=proxy,
                proxy_auth=proxy_auth,
                timer=timer,
                session=self,
                ssl=ssl,
                proxy_headers=proxy_headers,
                traces=traces,
            )

            req.response = resp = req.response_class(
                req.method,
                req.original_url,
                writer=None,
                continue100=req._continue,
                timer=req._timer,
                request_info=req.request_info,
                traces=req._traces,
                loop=req.loop,
                session=req._session,
            )
            # Hand the request off to the browser's fetch API instead of
            # opening a socket (which is impossible under WASM).
            from js import fetch, Headers
            from pyodide.ffi import to_js
            body = None
            if req.body:
                body = to_js(req.body._value)
            jsresp = await fetch(
                str(req.url),
                method=req.method,
                headers=Headers.new(headers.items()),
                body=body,
            )
            # Copy status and headers from the JS Response onto the
            # aiohttp response object.
            resp.version = version
            resp.status = jsresp.status
            resp.reason = jsresp.statusText
            # This is not quite correct in handling of repeated headers
            resp._headers = CIMultiDict(jsresp.headers)
            resp._raw_headers = tuple(tuple(e) for e in jsresp.headers)
            resp.content = Content(jsresp)


        # check response status
        if raise_for_status is None:
            raise_for_status = self._raise_for_status

        if raise_for_status is None:
            pass
        elif callable(raise_for_status):
            await raise_for_status(resp)
        elif raise_for_status:
            resp.raise_for_status()

        # register connection
        if handle is not None:
            if resp.connection is not None:
                resp.connection.add_callback(handle.cancel)
            else:
                handle.cancel()

        resp._history = tuple(history)

        for trace in traces:
            await trace.send_request_end(
                method, url.update_query(params), headers, resp
            )
        return resp

    except BaseException as e:
        # cleanup timer
        tm.close()
        if handle:
            handle.cancel()
            handle = None

        for trace in traces:
            await trace.send_request_exception(
                method, url.update_query(params), headers, e
            )
        raise

# Install the patched request method on all ClientSession instances.
ClientSession._request = _request
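To try this end to end, a hedged sketch (the wheel URL below is a placeholder for wherever the custom aiohttp-3.8.6 build is hosted):

import micropip
# Placeholder URL: point this at the custom Pyodide build of aiohttp-3.8.6.
await micropip.install("https://example.com/aiohttp-3.8.6-py3-none-any.whl")

# ... run the monkeypatch above, then:
from aiohttp import ClientSession

async with ClientSession() as session:
    resp = await session.get("https://example.com/api")
    print(resp.status)
    body = await resp.content.read()  # Content.read() from the shim above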

@hoodmane
Member

I'll try to put up a demo on GitHub soon.

@hoodmane hoodmane reopened this Jan 31, 2024
@hoodmane hoodmane linked a pull request Feb 23, 2024 that will close this issue