Fine-Tuning Large Language Models Made Easier: The ScaleXI Python Library
Learning Outcomes
1- Simplification of LLM Usage with ScaleXI: Understanding how ScaleXI streamlines the use of Large Language Models for various skill levels.
2- Practical Applications of ScaleXI: Gaining knowledge of how to effectively employ ScaleXI for dataset formatting, model fine-tuning, and performance evaluation.
3- Addressing AI Development Challenges: Recognizing the common challenges in AI model development and how ScaleXI provides solutions to these issues.
I. Introduction
The landscape of artificial intelligence is constantly evolving, and with the introduction of Large Language Models (LLMs) like OpenAI’s GPT series, we’ve seen significant strides in natural language processing. These models offer impressive capabilities in generating and understanding text that mirrors human communication. However, harnessing the full potential of these technologies often requires a level of programming expertise that can be a barrier for many.
For those without extensive coding experience, navigating the complexities of LLMs can be a challenging endeavor. This includes understanding the nuances of machine learning, as well as the intricacies of programming required to effectively deploy and fine-tune these models. As a result, a gap exists between the capabilities of these advanced AI tools and the accessibility to a wider audience.
Addressing this challenge is ScaleXI, an open-source Python library. ScaleXI is crafted to make the utilization of LLMs more approachable for a diverse user base. It aims to simplify the process of managing and customizing LLMs, providing a more user-friendly pathway for those interested in exploring AI technologies. This library offers tools and features that are designed to lower the barrier to entry, making it feasible for professionals, students, and AI enthusiasts with varying levels of coding expertise to engage with these powerful models.
In this blog, we will explore how ScaleXI is contributing to the democratization of AI tools and examine its role in making advanced natural language processing more accessible.
II. Why ScaleXI is a Game-Changer
- User-Friendly Low-Code Interface: ScaleXI stands out with its intuitive low-code interface, significantly lowering the entry barrier for LLM development. This feature is particularly beneficial for those who may have great ideas but limited coding expertise, making it possible for a wider audience to engage in AI development.
- Automated Dataset Generation: Preparing datasets is often one of the most labor-intensive aspects of training LLMs. ScaleXI addresses this by automating the dataset generation process. It efficiently transforms raw data into structured formats that are optimized for fine-tuning, saving valuable time and effort.
- Versatile Dataset Support: ScaleXI demonstrates its flexibility by supporting various data formats, including CSV, JSON, and JSONL. This versatility is crucial for users who work with diverse data sources, simplifying the task of dataset management.
- Streamlined Fine-Tuning Process: The library enhances the process of customizing and optimizing LLMs for specific data sets. This streamlined approach not only improves model performance but also makes these advanced customizations more accessible to a broader range of users.
- Efficient Model Evaluation Tools: Ensuring the effectiveness and reliability of fine-tuned models is crucial. ScaleXI includes automated tools for model evaluation, providing users with insights into the performance and accuracy of their models.
- Cost and Token Usage Estimation: A unique feature of ScaleXI is its ability to estimate token usage and the associated costs in LLM projects. This function aids in efficient resource management, helping users to plan and budget their projects more effectively.
III. Installation
Getting started with ScaleXI is straightforward. Here’s how you can install it:
pip install -U scalexi
This command will download and install ScaleXI along with its necessary dependencies. The pip package manager makes the installation process smooth and efficient. Once the installation is complete, ScaleXI is ready for use in your Python environment.
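To confirm that the installation succeeded, you can run a quick import check. This is simply a smoke test verifying that the package is importable in the environment where pip installed it:
# Quick smoke test: raises ImportError if ScaleXI is not installed
import scalexi
print("ScaleXI imported successfully")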
IV. Building Large Fine-Tuning Datasets: Tackling the Challenge
A. The Problem: Cumbersome and Time-Consuming Dataset Preparation
One of the most daunting tasks in the process of fine-tuning Large Language Models (LLMs) is the creation of a substantial and effective fine-tuning dataset. This process typically involves collecting, organizing, and formatting vast amounts of data — a task that is not only cumbersome but also incredibly time-consuming. For many, especially those without extensive resources or technical teams, this step can be a significant barrier to leveraging the full potential of LLMs. The challenges include:
- Data Collection: Gathering relevant and diverse data that can effectively train the LLM.
- Data Organization: Structuring the data in a way that aligns with the requirements of the LLM.
- Data Formatting: Ensuring the data is in a format that is compatible with the LLM’s training process.
B. ScaleXI’s Solution: Streamlining Dataset Generation
ScaleXI emerges as a powerful tool in this context, significantly easing the burden of dataset preparation. Here’s how ScaleXI simplifies each step of the process:
- Automated Dataset Generation: ScaleXI introduces automation in generating datasets, which drastically reduces the manual effort involved in data preparation. Users can provide basic context inputs in a simple CSV format, and ScaleXI handles the complex task of transforming these inputs into a structured dataset optimized for LLM training.
- Context File Setup: To facilitate this process, ScaleXI requires users to set up a ‘context file’ — a CSV file with a single column titled ‘context’. This file should contain various context entries, each representing a potential training input for the LLM. ScaleXI’s design ensures that these entries adhere to the token limits of the LLM, thus maintaining the efficiency of the dataset. Here is an example of a context.csv file.
context
"Your first context entry goes here. It can be a paragraph or a document that you want to use as the basis for generating questions or prompts."
"Your second context entry goes here. Make sure that each entry is not too lengthy to stay within the token limits of your LLM."
- User-Friendly Scripting for Dataset Creation: With ScaleXI, users can easily generate a fine-tuning dataset through a simple Python script. The script allows for specifying parameters such as the number of questions, types of questions (e.g., yes-no, open-ended), and model specifications. This process not only saves time but also makes the task of dataset creation more accessible to those with limited coding expertise.
Here’s an example of how ScaleXI transforms the complex process of dataset preparation into a more manageable task:
import os
from scalexi.dataset_generation.prompt_completion import PromptCompletionGenerator
# Ensure your OpenAI API key is set as an environment variable
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
# Instantiate the generator with desired settings
generator = PromptCompletionGenerator(enable_timeouts=True)
# Specify the path to your context file and the desired output file for the dataset
context_file = 'path/to/your/context.csv'
output_dataset_file = 'path/to/your/generated_dataset.csv'
# Call the create_dataset method with your parameters
generator.create_dataset(context_file, output_dataset_file,
num_questions=1,
question_types=["yes-no", "open-ended", "reflective"],
model="gpt-3.5-turbo-1106",
temperature=0.3,
detailed_explanation=True)
This script will generate a dataset with 'yes-no', 'open-ended', and 'reflective' type questions based on the context provided in your CSV file.
In summary, ScaleXI not only simplifies the process of dataset preparation but also democratizes the ability to fine-tune LLMs, making it a practical option for a broader range of users.
V. Cost Estimation with ScaleXI
A. The Challenge of Cost Management in LLM Projects
A major concern when working with Large Language Models, particularly for individuals or organizations with limited budgets, is the ability to accurately estimate and manage costs. These costs are primarily based on the number of tokens processed during fine-tuning and inference stages, and inaccurate estimations can lead to significant budget overruns.
B. ScaleXI’s Solution: OpenAIPricing for Accurate Cost Estimation
ScaleXI addresses this challenge with its OpenAIPricing class, designed to provide clear and precise cost estimations for both fine-tuning and inference stages. This tool helps users forecast expenses accurately, ensuring better budget management. Here's how it works:
import json
import pkgutil
from scalexi.openai.pricing import OpenAIPricing
# Load the pricing data
data = pkgutil.get_data('scalexi', 'data/openai_pricing.json')
pricing_info = json.loads(data)
# Create an OpenAIPricing instance
pricing = OpenAIPricing(pricing_info)
# Estimate cost for fine-tuning
number_of_tokens = 10000 # Replace with your actual token count
estimated_cost = pricing.estimate_finetune_training_cost(number_of_tokens, model_name="gpt-3.5-turbo")
print(f"Estimated cost for fine-tuning with {number_of_tokens} tokens: ${estimated_cost:.2f}")
# Estimate cost for inference
input_tokens = 10000 # Replace with your actual input token count
output_tokens = 5000 # Replace with your actual output token count
estimated_cost = pricing.estimate_inference_cost(input_tokens, output_tokens, model_name="gpt-3.5-turbo")
print(f"Estimated inference cost: ${estimated_cost:.2f}")
With OpenAIPricing, users can enter their expected token usage and receive an instant cost estimate, enabling more informed financial planning for LLM projects.
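If you do not yet know your token counts, you can estimate them with OpenAI's tiktoken library (a separate dependency, independent of ScaleXI):
import tiktoken
# Load the tokenizer used by gpt-3.5-turbo
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
sample_text = "A prompt you plan to send during fine-tuning or inference."
print(f"Token count: {len(encoding.encode(sample_text))}")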
VI. Dataset Formatting with ScaleXI
A. The Problem of Dataset Preparation for LLMs
Preparing a dataset in the correct format for training LLMs is a technical and meticulous task. It requires not only structuring the data correctly but also ensuring compatibility with the specific requirements of the chosen LLM. This can be a daunting task, especially for those who are not deeply versed in data science or machine learning.
B. How ScaleXI Simplifies Dataset Formatting
ScaleXI offers tools and functionalities that ease the burden of dataset formatting. These tools help users convert common data types (such as CSV, JSON, and JSONL) into the formats required by different LLMs, streamlining one of the most intricate aspects of model training, as the following subsections show.
C. Converting Datasets with DataFormatter
A key component of ScaleXI's toolkit is the DataFormatter class, designed to facilitate the conversion of datasets into formats compatible with LLMs, such as OpenAI's models. One common requirement for fine-tuning datasets on OpenAI is converting CSV files into JSONL format. The DataFormatter class makes this process straightforward. Here's a brief guide on how to use it:
from scalexi.utilities.data_formatter import DataFormatter
# Initialize the DataFormatter
dfm = DataFormatter()
# Convert a CSV dataset to JSONL format
csv_dataset_path = "path/to/your/dataset.csv" # Replace with your actual CSV file path
jsonl_dataset_path = "path/to/your/dataset.jsonl" # Replace with your desired JSONL file path
dfm.csv_to_jsonl(csv_dataset_path, jsonl_dataset_path)
With these simple steps, you can efficiently convert your CSV datasets to the JSONL format required for fine-tuning, streamlining one of the most critical steps in preparing your data for LLM training.
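For reference, each line of the resulting JSONL file is a standalone JSON object. In OpenAI's prompt-completion fine-tuning format, a line looks like this (the field values here are illustrative, and assume your CSV supplies prompt and completion columns):
{"prompt": "Is the sky blue?", "completion": "Yes, under clear daytime conditions the sky appears blue."}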
D. Fine-Tuning Dataset Conversion
Transforming Datasets for Conversational Model Fine-Tuning: When it comes to fine-tuning OpenAI’s GPT-based conversational models, preparing your dataset in the right format is crucial. ScaleXI provides an efficient method to transform a dataset from a prompt completion format to a conversation format, which is more suitable for these specific models. This conversion is key for anyone looking to train a conversational AI model.
Here’s how you can use ScaleXI to perform this conversion:
# Convert prompt completion dataset to conversation format
prompt_completion_dataset_path = "path/to/your/generated_dataset.jsonl" # Replace with your actual JSONL file path
conversation_dataset_path = "path/to/your/conversation_dataset.jsonl" # Replace with your desired JSONL file path
dfm.convert_prompt_completion_to_conversation(prompt_completion_dataset_path, conversation_dataset_path)
This process restructures your dataset, making it more aligned with the requirements of conversational AI training.
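For context, OpenAI's conversational fine-tuning format wraps each example in a messages array, one JSON object per line. A typical converted line looks like this (the message contents are illustrative):
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Is the sky blue?"}, {"role": "assistant", "content": "Yes, under clear daytime conditions the sky appears blue."}]}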
Estimating Token Usage and Fine-Tuning Costs: Beyond dataset conversion, understanding and managing the token usage and associated costs of fine-tuning is essential. ScaleXI aids in this regard as well:
from scalexi.openai.pricing import OpenAIPricing
# Initialize OpenAIPricing
pricing = OpenAIPricing(pricing_info)
# Calculate the token usage for your conversation dataset
number_of_tokens = pricing.calculate_token_usage_for_dataset(conversation_dataset_path)
print(f"Number of tokens in the conversation dataset: {number_of_tokens}")
# Estimate the fine-tuning cost for your conversation dataset
estimated_cost = pricing.estimate_finetune_training_cost(number_of_tokens, model_name="gpt-3.5-turbo")
print(f"Estimated fine-tuning cost for the conversation dataset: ${estimated_cost:.2f}")
VII. Fine-Tuning OpenAI Models with ScaleXI
A. The Challenge of Fine-Tuning LLMs
Fine-tuning Large Language Models (LLMs) like those offered by OpenAI is a critical step in tailoring these models to specific needs or datasets. However, this process can be daunting due to its complexity and the technical expertise required. Challenges include setting up the environment, managing API keys, starting and monitoring fine-tuning jobs, and interpreting the results. For those without extensive background in AI or programming, these tasks can be overwhelming, often acting as a barrier to the effective use of LLMs.
B. ScaleXI’s Solution: Streamlining Fine-Tuning with FineTuningAPI
ScaleXI addresses these challenges by simplifying the fine-tuning process. It provides a user-friendly interface and tools that make fine-tuning more accessible and manageable. Here’s how it works:
C. Setting up FineTuningAPI
The first step in this simplified process is to set up the FineTuningAPI class, which requires your OpenAI API key. ScaleXI lets you configure this directly within your script; prompting for the key rather than hard-coding it enhances both security and convenience:
import os
from getpass import getpass
# Prompt for the OpenAI API key without echoing it, then set it as an environment variable
api_key = getpass("Please enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = api_key
# Confirm that the API key has been set
print(f"OpenAI API key set: {os.getenv('OPENAI_API_KEY') is not None}")
D. Running the Fine-Tuning Dashboard
After setting up the API key, ScaleXI offers a fine-tuning dashboard that serves as a control center for your fine-tuning activities. This dashboard is an invaluable tool for managing and monitoring your fine-tuning jobs:
from scalexi.openai.fine_tuning_api import FineTuningAPI
# Initialize the FineTuningAPI
api = FineTuningAPI(api_key=os.getenv("OPENAI_API_KEY"))
# Launch the dashboard
api.run_dashboard()
This intuitive dashboard provides options for various tasks:
Menu:
1. Create a fine-tune file
2. Create a fine-tuning job
3. List fine-tune files
4. List 10 fine-tuning jobs
5. Retrieve the state of a fine-tune
6. Cancel a job
7. List up to 10 events from a fine-tuning job
8. Use a fine-tuned model
9. Delete a fine-tuned model
10. Exit
Through these features, ScaleXI makes the fine-tuning process more transparent and manageable, enabling users to effectively customize LLMs for their specific requirements without needing extensive technical expertise.
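If you prefer to script these steps rather than use the interactive dashboard, the menu actions map onto operations in OpenAI's official Python SDK (shown here with the openai>=1.0 client, independently of ScaleXI); a minimal sketch:
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
# Upload the fine-tuning file (menu item 1)
training_file = client.files.create(
    file=open("path/to/your/conversation_dataset.jsonl", "rb"),
    purpose="fine-tune"
)
# Create a fine-tuning job (menu item 2)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo"
)
# Retrieve the state of the fine-tune (menu item 5)
print(client.fine_tuning.jobs.retrieve(job.id).status)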
VIII. Fine-Tuned LLM Evaluation with ScaleXI
A. The Challenge of Evaluating Fine-Tuned LLMs
Evaluating the performance of a fine-tuned Large Language Model (LLM) is a critical step in ensuring its effectiveness and suitability for specific tasks. This process involves testing the model’s responses against a variety of prompts to assess accuracy, relevance, and consistency. However, creating an effective evaluation framework can be intricate, requiring the generation of representative test samples, rephrasing of prompts for robust testing, and systematic comparison against ground truth.
B. ScaleXI’s Approach to LLM Evaluation
ScaleXI’s LLMEvaluation module provides a user-friendly and efficient method for evaluating the performance of fine-tuned Large Language Models (LLMs). This evaluation process is crucial for understanding how well a model has adapted to specific tasks or datasets, ensuring its effectiveness in practical applications.
The LLMEvaluation module in ScaleXI is designed to simplify and systematize this evaluation process, making it accessible even to those with limited experience in machine learning.
Step 1: Random Sample Creation
The evaluation journey begins with the creation of a random sample from your dataset. This step is fundamental in ensuring that the evaluation is comprehensive and unbiased: a representative sample is essential to gauge the model's performance across various scenarios and data points, providing a realistic overview of its capabilities and limitations. ScaleXI's LLMEvaluation class simplifies this process:
from scalexi.llm_evaluation.evaluate import LLMEvaluation
import os
# Set your OpenAI API key
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
# Initialize LLMEvaluation
llm_evaluation = LLMEvaluation(enable_timeouts=False)
# Define your dataset path
conversation_dataset_jsonl = 'path/to/your/files/conversation_dataset.jsonl'
# Set up output details for the sample dataset
output_folder = 'path/to/output/folder/'
output_file = output_folder + 'random_prompts.csv'
# Generate a CSV with random prompts for evaluation
llm_evaluation.save_random_prompts(conversation_dataset_jsonl, output_file, output_format='csv', n_samples=10)
Step 2: Rephrase Prompts
In the second step of the evaluation process, ScaleXI offers a unique feature to rephrase prompts. This is crucial for testing the model’s ability to generalize and understand variations in language. The tool not only rephrases prompts but also allows for optional classification into categories like ‘ACADEMIC’, ‘RESEARCH’, or ‘ADMIN’. This step is essential in creating diverse scenarios that mimic real-world applications, ensuring the model’s responses are robust and versatile. By rephrasing and classifying prompts, users gain deeper insights into the nuanced capabilities of their fine-tuned LLMs.
# Rephrase prompts and classify them
rephrased_dataset_csv = output_folder + 'rephrased_dataset.csv'
llm_evaluation.rephrase_and_classify_prompts_in_dataset(output_file, rephrased_dataset_csv,
classify=True,
classes=['ACADEMIC', 'RESEARCH', 'ADMIN', 'SCIENCE', 'OTHERS'])
Step 3: Evaluate LLM
The final step in ScaleXI’s evaluation process involves a comprehensive assessment of the fine-tuned Large Language Model’s performance. This is achieved by comparing the LLM’s responses to the rephrased prompts generated in the previous step.
This comparison is pivotal in understanding how well the model has adapted to the nuances and variations of language presented in different prompts. It helps in evaluating the model’s ability to generate relevant, accurate, and contextually appropriate responses. This step is crucial for gauging the effectiveness of the fine-tuning process, ensuring that the model not only performs well in standard scenarios but also maintains its reliability and accuracy in varied and potentially unforeseen situations.
# Define the fine-tuned model for evaluation
finetuned_model = 'ft:gpt-3.5-turbo-1106:your-org::your-model-id'  # Replace with your fine-tuned model ID
evaluation_results_csv = output_folder + 'evaluation_results.csv'
# Evaluate using the rephrased dataset
llm_evaluation.evaluate_model(finetuned_model,
rephrased_dataset_csv,
evaluation_results_csv,
temperature=0.3, max_tokens=250, top_p=1.0,
frequency_penalty=0, presence_penalty=0,
llm_evaluator_model_name='gpt-3.5-turbo',
experiment_id=1, # Experiment identifier
save_immediately=False)
The evaluation_results.csv file generated by ScaleXI's evaluation process offers a detailed assessment of your fine-tuned LLM's performance. Each response is scored between 0 and 5 by an evaluator LLM, such as GPT-4 or GPT-3.5-Turbo. The file includes statistical metrics such as average score and standard deviation, providing insights into the overall accuracy and consistency of the model's responses. This quantitative analysis is crucial for understanding the model's effectiveness and identifying areas for improvement.
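To dig into these results yourself, you can load the CSV with pandas. A small sketch follows (the 'score' column name is an assumption, so check the header your ScaleXI version actually writes):
import pandas as pd
# Load the evaluation results produced above
results = pd.read_csv('path/to/output/folder/evaluation_results.csv')
# Summary statistics; 'score' is an assumed column name
print(f"Average score: {results['score'].mean():.2f}")
print(f"Standard deviation: {results['score'].std():.2f}")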
Conclusions
ScaleXI emerges as a practical tool in the AI landscape, particularly for those working with Large Language Models like OpenAI’s GPT series. It simplifies several aspects of AI model development, including dataset preparation, cost estimation, and model fine-tuning and evaluation. For researchers, developers, and AI enthusiasts, ScaleXI offers a set of tools that can make the process of working with advanced AI models more accessible and manageable. Its contribution lies in facilitating a more streamlined approach to engaging with complex AI technologies.
References and Further Reading
For more information on ScaleXI and to deepen your understanding of its capabilities and applications, the following resources are invaluable:
- ScaleXI GitHub Repository: Visit ScaleXI on GitHub for the source code, detailed documentation, and latest updates.
- ScaleXI Official Documentation: Access comprehensive guides and tutorials on ScaleXI’s Documentation, which offer valuable insights into how to effectively use the library.
- ScaleXI Website: For an overview of ScaleXI, its features, and its impact on AI and machine learning, explore the ScaleXI Official Website.
Stay Updated and Connect
To stay updated with the latest developments and connect with the community, consider subscribing to ScaleXI’s social media channels:
- ScaleXI on LinkedIn: Connect with the professional community and engage with like-minded individuals and experts in the field.
- ScaleXI on Twitter: Follow for quick updates, tips, and news related to ScaleXI and AI technology.