Objective
The primary objective is to develop a legal LLM trained on both our proprietary contracts and public datasets such as CUAD (Contract Understanding Atticus Dataset). The model will assist us in drafting and reviewing contracts and will keep learning from user feedback.
Description
We are a boutique legal and commercial consulting firm with highly experienced partners. To stay competitive, we want to reduce redundant work by training our own model on the work we do. This will let us spend less time on new projects that overlap with work we have already delivered to other clients.
Accordingly, we are looking to train an existing open-source LLM (e.g. Llama 2 or another open-source model) on our repository of contracts as well as on existing datasets such as CUAD.
The goal is to create an LLM with a baseline understanding of the contracts we draft and review in our daily work. Building on this baseline, we want to use the model and continuously train it, either by scoring its suggestions or by providing new or better solutions when drafting or reviewing specific clauses, so that it keeps learning from our expertise.
In addition, to the extent feasible, we would like to feed the model laws, regulations, guidelines and specific legal books to deepen its understanding. It is key for us to understand, and to strike, the right balance between the hardware required to run and train the model and the model's accuracy. This matters because the model will not be allowed to run in a public cloud once we start using it with our clients.
We welcome suggestions on how to approach this project so that we can start "small" and keep building the model while using it in our daily work.
Key Responsibilities
Train an existing open-source LLM (e.g., Llama 2 or another open-source model) using our contract repository and datasets such as CUAD.
Integrate laws, regulations, guidelines, and specific legal literature to enhance the model's comprehension (with guidance from us).
Balance the hardware required to run and train the model against its accuracy, given that the model will run in a dedicated hosting environment with client data.
Collaborate with our team to gather feedback and continuously refine the model.
Provide insights and suggestions for a phased approach to the project, starting "small" and progressively building the model.
Qualifications
Experience with LLMs and machine learning.
Familiarity with fine-tuning LLMs on additional datasets such as CUAD.
Strong understanding of hardware requirements for model training and deployment.
Ability to work collaboratively and gather feedback for continuous improvement.
Position Types
Student: Part-time employment as a student worker
Freelance or part-time: Entry-level or experienced professionals interested in part-time or freelance engagement
Future Prospects
After the initial project is complete, the engagement/employment will continue in order to support the ongoing development and refinement of the model.
Interested candidates are encouraged to share their approach to the project, especially how they would start on a smaller scale and progressively expand while integrating the model into our daily operations.
Apply now and be a part of our innovative journey!