- The system's architecture is designed to mitigate online toxicity by transforming text inputs into less provocative forms using Large Language Models (LLMs), which handle the analysis and rewriting of the text.
- Several workers, or LLM interfaces, are defined, each suited to a specific operational environment.
- The HTTP server worker is optimised for development, enabling dynamic updates without requiring server restarts; it can run offline, with or without a GPU, using the `llama-cpp-python` library, provided a model has been downloaded (a minimal loading sketch follows this list).
- An in-memory worker is used by the serverless worker.
- For on-demand, scalable processing, the system includes a RunPod API worker that leverages serverless GPU functions.
- Additionally, the Mistral API worker offers a paid service alternative for text processing tasks.
- A set of environment variables is predefined to configure the LLM workers' functionality (see the configuration sketch after this list).
- The `LLM_WORKER` environment variable selects the active LLM worker.
- The `N_GPU_LAYERS` environment variable specifies how many model layers are offloaded to the GPU, defaulting to the maximum available; it applies when the LLM worker is run with a GPU.
- `CONTEXT_SIZE` is an adjustable parameter that defines how much text the LLM can process at once.
- The `LLM_MODEL_PATH` environment variable indicates where the LLM model is stored, which can be a local path or a model sourced from the HuggingFace Hub.
- The system enforces some rate limiting to maintain service integrity and equitable resource distribution.
- The `LAST_REQUEST_TIME` and `REQUEST_INTERVAL` global variables are used for rate limiting calls to the Mistral API (sketched after this list).
- The system's worker architecture is somewhat modular, enabling easy integration or replacement of components such as LLM workers.
- The system is capable of streaming responses in some modes, allowing for real-time interaction with the LLM.
- The `llm_streaming` function handles communication with the LLM via HTTP streaming when the server worker is active (a streaming sketch follows this list).
- The `llm_stream_sans_network` function provides an alternative for local LLM inference without any network dependency.
- For serverless deployment, the `llm_stream_serverless` function interfaces with the RunPod API.
- The `llm_stream_mistral_api` function handles interaction with the Mistral API for text processing (sketched below).
- The system includes a utility function, `replace_text`, for template-based text replacement.
- A scoring function, `calculate_overall_score`, combines several metrics to evaluate how effective a text transformation is.
- The `query_ai_prompt` function serves as a dispatcher, directing text-processing requests to the chosen LLM worker (see the dispatcher sketch after this list).
- The `inference_binary_check` function within `app.py` ensures compatibility with the available hardware, particularly GPU presence.
- The system provides a user interface through Gradio, enabling end users to interact with the text transformation service.
- The `chill_out` function in `app.py` is the entry point for processing user inputs through the Gradio interface.
- The `improvement_loop` function in `chill.py` controls the iterative refinement of the text using the LLM (a combined UI and refinement-loop sketch closes this section).
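
As a rough illustration of the configuration described above, the sketch below reads the documented environment variables with Python's `os` module. The default values shown are assumptions for illustration, not the project's actual defaults.

```python
import os

# Selects the active LLM worker; the accepted values shown here are assumptions.
LLM_WORKER = os.environ.get("LLM_WORKER", "in_memory")

# Number of model layers offloaded to the GPU; -1 is llama-cpp-python's
# convention for "offload every layer".
N_GPU_LAYERS = int(os.environ.get("N_GPU_LAYERS", "-1"))

# How much text (in tokens) the model can attend to at once.
CONTEXT_SIZE = int(os.environ.get("CONTEXT_SIZE", "2048"))

# Either a local file path or a model pulled from the HuggingFace Hub.
LLM_MODEL_PATH = os.environ.get("LLM_MODEL_PATH", "./model.gguf")
```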
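
The offline, in-process worker can be approximated with `llama-cpp-python`. This is a minimal sketch, assuming a GGUF model has already been downloaded to `LLM_MODEL_PATH`; the function name and parameter values are illustrative, not the project's exact API.

```python
import os
from llama_cpp import Llama

# Load the model once at startup. n_gpu_layers=-1 offloads all layers to the
# GPU when one is available; n_gpu_layers=0 keeps inference on the CPU.
llm = Llama(
    model_path=os.environ.get("LLM_MODEL_PATH", "./model.gguf"),
    n_ctx=int(os.environ.get("CONTEXT_SIZE", "2048")),
    n_gpu_layers=int(os.environ.get("N_GPU_LAYERS", "-1")),
)

def local_stream_sketch(prompt: str) -> str:
    """Stream tokens from the in-process model, with no network dependency."""
    reply = ""
    for chunk in llm.create_completion(prompt, max_tokens=512, stream=True):
        token = chunk["choices"][0]["text"]
        reply += token
        print(token, end="", flush=True)
    return reply
```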
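
Interval-based throttling of the kind implied by `LAST_REQUEST_TIME` and `REQUEST_INTERVAL` can be sketched as follows; the one-second interval is an assumption.

```python
import time

LAST_REQUEST_TIME = 0.0   # monotonic timestamp of the most recent request
REQUEST_INTERVAL = 1.0    # minimum seconds between requests (assumed value)

def wait_for_rate_limit() -> None:
    """Block until at least REQUEST_INTERVAL seconds have passed since the last request."""
    global LAST_REQUEST_TIME
    elapsed = time.monotonic() - LAST_REQUEST_TIME
    if elapsed < REQUEST_INTERVAL:
        time.sleep(REQUEST_INTERVAL - elapsed)
    LAST_REQUEST_TIME = time.monotonic()
```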
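
For the HTTP server worker, streaming can be consumed with `requests`. The sketch below assumes an OpenAI-compatible `/v1/completions` endpoint such as the one served by `llama-cpp-python`'s bundled server; the URL, payload fields, and function name are assumptions.

```python
import json
import requests

SERVER_URL = "http://localhost:8000/v1/completions"  # assumed address and route

def http_stream_sketch(prompt: str) -> str:
    """Stream a completion over HTTP, accumulating tokens as they arrive."""
    payload = {"prompt": prompt, "max_tokens": 512, "stream": True}
    reply = ""
    with requests.post(SERVER_URL, json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # Server-sent events: each "data:" line carries one JSON chunk.
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            token = json.loads(data)["choices"][0]["text"]
            reply += token
            print(token, end="", flush=True)
    return reply
```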
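
The Mistral worker can be approximated with a plain HTTPS call to Mistral's chat completions endpoint. This non-streaming sketch assumes an API key in `MISTRAL_API_KEY`; the model name is an assumption.

```python
import os
import requests

def mistral_api_sketch(prompt: str) -> str:
    """Send one prompt to the Mistral API and return the reply text."""
    # A real worker would throttle here using LAST_REQUEST_TIME / REQUEST_INTERVAL.
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-small-latest",  # assumed model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```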
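
The dispatcher role played by `query_ai_prompt` can be sketched as a lookup from worker name to worker function, reusing the worker sketches above; the names in the table are assumptions.

```python
from typing import Callable, Dict

# Maps LLM_WORKER values to worker functions; names are illustrative only.
WORKERS: Dict[str, Callable[[str], str]] = {
    "http_server": http_stream_sketch,
    "in_memory": local_stream_sketch,
    "mistral": mistral_api_sketch,
}

def dispatch_prompt_sketch(prompt: str, worker: str) -> str:
    """Route a prompt to whichever LLM worker is configured."""
    try:
        return WORKERS[worker](prompt)
    except KeyError:
        raise ValueError(f"Unknown LLM_WORKER value: {worker!r}")
```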
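
Finally, a hedged sketch of how the scoring, refinement loop, and Gradio front end could fit together. The toy scorer, the round limit, and the `rewrite_once` placeholder (standing in for a call to the configured worker) are all assumptions rather than the project's actual logic.

```python
import gradio as gr

def overall_score_sketch(original: str, candidate: str) -> float:
    """Toy stand-in for calculate_overall_score: reward shorter, calmer-looking rewrites."""
    length_reward = max(0.0, 1.0 - len(candidate) / (len(original) + 1))
    shouting_penalty = sum(c.isupper() for c in candidate) / (len(candidate) + 1)
    return length_reward - shouting_penalty

def rewrite_once(text: str) -> str:
    """Placeholder for a call to the configured LLM worker (see the dispatcher sketch)."""
    return text  # a real implementation would return the LLM's calmer rewrite

def improvement_loop_sketch(text: str, max_rounds: int = 3) -> str:
    """Iteratively refine the text, keeping the best-scoring rewrite so far."""
    best, best_score = text, overall_score_sketch(text, text)
    for _ in range(max_rounds):
        candidate = rewrite_once(best)
        score = overall_score_sketch(text, candidate)
        if score <= best_score:
            break
        best, best_score = candidate, score
    return best

def chill_out_sketch(text: str) -> str:
    """Gradio entry point: map raw user input to its de-escalated rewrite."""
    return improvement_loop_sketch(text)

if __name__ == "__main__":
    gr.Interface(fn=chill_out_sketch, inputs="text", outputs="text").launch()
```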