FINETUNE_PREP.AI

v1.0.0

[SYS] OpenAI fine-tuning dataset preparation system: Multi-format converter + JSONL processor

$ cat description.txt

Advanced n8n workflow for preparing OpenAI fine-tuning datasets. Automatically processes various input formats into OpenAI-ready JSONL files. Supports tone of voice customization, regular instruction tuning, and function calling models with complete tool usage formatting.

CORE_FEATURES:

Multi Format Processing

> Tone of voice datasets
> Instruction-following datasets
> Function-calling datasets

Tool Formatting

> Function call JSON structure
> Tool response mapping
> Multi-turn conversations

JSONL Generation

> OpenAI spec-compliant format
> Proper line formatting
> Validation & error checking

Variable Substitution

> System prompt templating
> Response variable mapping
> Dynamic content insertion
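
As a rough illustration of system prompt templating, the Python sketch below fills named placeholders in a system prompt. The `$name` placeholder syntax, the template text, and the `render_system_prompt` helper are assumptions made for this example; the workflow's own placeholder syntax may differ.

```python
from string import Template

# Hypothetical template; the $placeholder syntax is an assumption for
# illustration, not necessarily what the workflow uses internally.
SYSTEM_TEMPLATE = Template(
    "You are a helpful customer service agent for $company. "
    "Answer in a $tone tone."
)


def render_system_prompt(variables: dict) -> str:
    """Fill every placeholder; raises KeyError if a variable is missing."""
    return SYSTEM_TEMPLATE.substitute(variables)


prompt = render_system_prompt({"company": "Acme", "tone": "friendly"})
print(prompt)
```

Using `substitute` rather than `safe_substitute` means a missing variable fails loudly instead of leaving a raw placeholder in the training data.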

EXAMPLE_OUTPUT:

Workflow Input:

"Fine-tune Type": "Reg_Func (Function Calling)",

"Format Type": "Multi-turn conversations",

"Input Dataset": "customer_service_examples.xlsx",

"Tool Definitions": "Automatically extracted"

RESULTING JSONL OUTPUT

Finetune Dataprep - JSONL Example

ℹ️
This document demonstrates the standardized JSONL format created by the Finetune Dataprep workflow. It transforms various input formats into OpenAI-ready training data for fine-tuning different model types.

Document Metadata

Source: Customer Service Dataset
Format: JSONL with Tool Calls
Tags: #CustomerService #ToolCalling #FunctionCalling #OpenAI #FineTuning

Finetune Dataprep Output Examples

Input Formats

📋
The workflow supports three primary data preparation formats:

1. TOV (Tone of Voice) - For customizing the model's tone and style
2. Reg (Regular Instruction) - For general instruction-following capabilities
3. Reg_Func (Function Calling) - For training models to use tools/functions

Example Function Calling Spreadsheet Input

Below is an example of how you would format your data in a spreadsheet for function calling fine-tuning:

| Customer Query | System Context | Agent Response | Function Call | System Return | Final Agent Response |
|---------------|----------------|----------------|--------------|--------------|---------------------|
| What is your return policy for electronics? | You are a helpful customer service agent with access to company policies via vector store. | Let me check our policy database for you. | {"name": "query_vector_store", "args": {"query": "return policy electronics"}} | Vector store returns: Electronics can be returned within 30 days with original packaging and receipt. Items must be undamaged and include all accessories. | Our electronics return policy allows returns within 30 days of purchase. The item must be in its original packaging with all accessories included, and you'll need your receipt. The item should be undamaged and in working condition. Would you like me to provide more specific details about any part of this policy? |
                    

JSONL Output Format

After processing your spreadsheet data, the workflow generates a properly formatted JSONL file. One record is shown here pretty-printed for readability; in the actual file each record occupies a single line:

{
  "messages": [
    {"role": "system", "content": "You are a helpful customer service agent with access to company policies via vector store."},
    {"role": "user", "content": "What is your return policy for electronics?"},
    {"role": "assistant", "content": "Let me check our policy database for you."},
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "call_123456",
          "type": "function",
          "function": {
            "name": "query_vector_store",
            "arguments": "{\"query\": \"return policy electronics\"}"
          }
        }
      ]
    },
    {"role": "tool", "tool_call_id": "call_123456", "content": "Vector store returns: Electronics can be returned within 30 days with original packaging and receipt. Items must be undamaged and include all accessories."},
    {"role": "assistant", "content": "Our electronics return policy allows returns within 30 days of purchase. The item must be in its original packaging with all accessories included, and you'll need your receipt. The item should be undamaged and in working condition. Would you like me to provide more specific details about any part of this policy?"}
  ]
}
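
The mapping from a spreadsheet row to the record above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual implementation: the `row_to_example` helper and the fixed `call_id` are hypothetical, and the column names are taken from the example table.

```python
import json


def row_to_example(row: dict, call_id: str = "call_123456") -> dict:
    """Map one spreadsheet row to a chat-format training example."""
    call = json.loads(row["Function Call"])  # {"name": ..., "args": {...}}
    return {
        "messages": [
            {"role": "system", "content": row["System Context"]},
            {"role": "user", "content": row["Customer Query"]},
            {"role": "assistant", "content": row["Agent Response"]},
            {
                "role": "assistant",
                "tool_calls": [{
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": call["name"],
                        # The arguments field holds a JSON-encoded string.
                        "arguments": json.dumps(call["args"]),
                    },
                }],
            },
            {"role": "tool", "tool_call_id": call_id,
             "content": row["System Return"]},
            {"role": "assistant", "content": row["Final Agent Response"]},
        ]
    }


row = {
    "Customer Query": "What is your return policy for electronics?",
    "System Context": "You are a helpful customer service agent.",
    "Agent Response": "Let me check our policy database for you.",
    "Function Call": '{"name": "query_vector_store", '
                     '"args": {"query": "return policy electronics"}}',
    "System Return": "Electronics can be returned within 30 days.",
    "Final Agent Response": "You can return electronics within 30 days.",
}

# One record per line: json.dumps emits no raw newlines by default.
line = json.dumps(row_to_example(row), ensure_ascii=False)
print(line)
```

Note that the spreadsheet's `args` object becomes a string in `arguments`, which is why the quotes appear escaped in the record above.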
                    

Tool Definitions

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query_vector_store",
        "description": "Searches the company knowledge base for relevant information",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string",
              "description": "The search query to find information in the knowledge base"
            }
          },
          "required": ["query"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "query_order_database",
        "description": "Retrieves order information from the database",
        "parameters": {
          "type": "object",
          "properties": {
            "order_id": {
              "type": "string",
              "description": "The order ID to look up"
            },
            "query_type": {
              "type": "string",
              "enum": ["shipping_status", "item_status", "order_details"],
              "description": "Type of information to retrieve about the order"
            }
          },
          "required": ["order_id"]
        }
      }
    }
  ]
}
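
To show how these definitions relate to the conversation messages, the Python sketch below checks that every function called in a training example has a matching tool definition. It assumes the `tools` array is stored in the same JSON object as `messages`, which matches our understanding of OpenAI's chat fine-tuning format but should be verified against the current documentation. `undefined_tool_calls` is a hypothetical helper, not part of the workflow.

```python
def undefined_tool_calls(example: dict) -> list:
    """Return names of called functions that lack a tool definition."""
    defined = {tool["function"]["name"] for tool in example.get("tools", [])}
    called = [
        tool_call["function"]["name"]
        for message in example["messages"]
        for tool_call in message.get("tool_calls", [])
    ]
    return [name for name in called if name not in defined]


# Abbreviated example: only the fields the check needs are included.
example = {
    "messages": [
        {"role": "user", "content": "Where is order 42?"},
        {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "query_order_database",
                    "arguments": '{"order_id": "42"}',
                },
            }],
        },
    ],
    "tools": [
        {"type": "function", "function": {"name": "query_vector_store"}},
    ],
}

missing = undefined_tool_calls(example)
print(missing)
```

Here the example calls `query_order_database` but only defines `query_vector_store`, so the call is reported as undefined.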
                    

Workflow Process

The Finetune Dataprep workflow processes your data through these key steps:

1. INPUT PROCESSING: The workflow accepts XLS or CSV files with customer service conversations organized in columns.
2. FORMAT SELECTION: Based on the job type (TOV, Reg, or Reg_Func), the workflow applies appropriate data transformation rules.
3. TOOL DEFINITIONS: For function calling datasets, tool definitions are automatically extracted and formatted according to OpenAI's schema.
4. CONVERSATION FORMATTING: Conversations are properly structured with user, assistant, and tool messages in the correct sequence.
5. JSONL GENERATION: The workflow outputs a properly formatted JSONL file with each JSON object on its own line, ready for upload to OpenAI's fine-tuning API.
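
The final step, one JSON object per line, can be sketched in Python as below. The `write_jsonl` and `read_jsonl` helpers are hypothetical names used for illustration, and an in-memory buffer stands in for the downloadable file.

```python
import io
import json


def write_jsonl(examples, fh) -> int:
    """Write each example as a single line of JSON; return the line count."""
    count = 0
    for example in examples:
        # json.dumps escapes newlines inside strings, so one record never
        # spills onto a second line.
        fh.write(json.dumps(example, ensure_ascii=False) + "\n")
        count += 1
    return count


def read_jsonl(fh) -> list:
    """Parse a JSONL stream back into a list, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


examples = [
    {"messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi!\nHow can I help?"},
    ]},
    {"messages": [
        {"role": "user", "content": "Thanks"},
        {"role": "assistant", "content": "You're welcome."},
    ]},
]

buffer = io.StringIO()
written = write_jsonl(examples, buffer)
buffer.seek(0)
round_trip = read_jsonl(buffer)
```

Reading the file back and comparing it with the input is a cheap check that no record was split or truncated.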

💡
The workflow handles all the complex formatting requirements for different OpenAI fine-tuning types, including:

- Proper message sequence and role attribution
- Correct JSON structure for function/tool calls
- Tool definition schema validation
- Multi-turn conversation support with contextual awareness
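
A check for message sequence and role attribution might look like the Python sketch below. The specific rules (known roles only, every tool response refers to an earlier tool call, the conversation ends with an assistant message) are illustrative assumptions and not an exhaustive statement of OpenAI's requirements; `validate_messages` is a hypothetical helper.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages: list) -> list:
    """Return a list of human-readable problems; empty means no issues found."""
    errors = []
    seen_call_ids = set()  # ids of tool calls issued so far
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            errors.append(f"message {index}: unknown role {role!r}")
        elif role == "assistant":
            for tool_call in message.get("tool_calls", []):
                seen_call_ids.add(tool_call["id"])
        elif role == "tool":
            call_id = message.get("tool_call_id")
            if call_id not in seen_call_ids:
                errors.append(
                    f"message {index}: no earlier tool call with id {call_id!r}"
                )
    # Assumed rule: a training conversation should end with the assistant.
    if not messages or messages[-1].get("role") != "assistant":
        errors.append("last message should come from the assistant")
    return errors


good = [
    {"role": "user", "content": "What is your return policy?"},
    {"role": "assistant", "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "query_vector_store", "arguments": "{}"},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "30 days."},
    {"role": "assistant", "content": "Returns are accepted within 30 days."},
]
bad = [
    {"role": "user", "content": "What is your return policy?"},
    {"role": "tool", "tool_call_id": "call_9", "content": "30 days."},
]

print(validate_messages(good))
print(validate_messages(bad))
```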

This is an example of a fine-tuning JSONL file created with our template.

$ system_requirements

MODELS: none required
STORAGE: google sheets
SERVICES: none required
OUTPUT: downloadable .jsonl file
PRICING: google sheets - free
EST. PER RUN COST: free

PROCESS_FLOW:

AUTOMATION_BENEFITS:

> Save hours of manual data formatting
> Error-free JSONL conversion
> Support for complex tool/function calling
> Process multi-turn conversations
> OpenAI compliant data structure

€119

PURCHASE_TEMPLATE

* Compatible with all n8n installations v1.0.0+

*Superflowz is a subsidiary of CARDUME ESBELTO UNIP. LDA. Your purchase will be from, and your receipt will list, CARDUME ESBELTO