MPCA: Multi-Party Conversation Agent

2026-02-04

This project extracts and validates human-like conversational parameters by analyzing Ubuntu IRC channel logs — then uses those parameters to make LLM agents converse more naturally.

GitHub Repository

Project Structure

The project is split into two parts:

Project 1 (Preprocessing) handles IRC data parsing, thread disentanglement, and parameter extraction. Project 2 (LLM Test) runs comparison experiments with metrics toggled on and off to measure how much these parameters actually matter.

Thread Disentanglement

IRC channels are chaotic — dozens of conversations happening simultaneously in a single stream. The first challenge was separating this mess into individual threads.

I built a semantic time-decay algorithm that weighs three signals:

| Signal | Behavior |
| --- | --- |
| Participant continuity | Maintained regardless of time gap |
| Semantic similarity | Decays exponentially over time: exp(-Δt/τ) |
| Mentions (@user) | Always merged |

With a time scale (τ) of 120 seconds and a merge threshold of 1.0, this turned 8,951 raw utterances into 1,318 coherent threads.
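The scoring logic can be sketched roughly as follows. This is a minimal illustration of a semantic time-decay merge score under the stated parameters (τ = 120 s, threshold 1.0); the field names, data shapes, and exact weighting are assumptions, not the project's actual API.

```python
import math

TAU = 120.0           # time scale in seconds, from the post
MERGE_THRESHOLD = 1.0  # a message joins a thread if its score reaches this

def merge_score(new_msg, thread, similarity):
    """Score how strongly a new utterance belongs to an existing thread.

    `new_msg` and `thread` are assumed to be dicts with 'author'/'mentions'/
    'timestamp' and 'participants'/'last_timestamp' fields; `similarity` is a
    precomputed semantic similarity in [0, 1]. All names are illustrative.
    """
    # Mentions (@user): always merge when a thread participant is mentioned.
    if new_msg["mentions"] & thread["participants"]:
        return float("inf")
    score = 0.0
    # Participant continuity: full weight regardless of the time gap.
    if new_msg["author"] in thread["participants"]:
        score += 1.0
    # Semantic similarity: decays exponentially with the time gap.
    dt = new_msg["timestamp"] - thread["last_timestamp"]
    score += similarity * math.exp(-dt / TAU)
    return score
```

Each incoming utterance would be scored against every open thread and attached to the best-scoring one if that score reaches the threshold; otherwise it starts a new thread.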

Human-like Timing

Real humans don't respond instantly. They pause, think, type. I extracted timing distributions from the IRC data:

| Response Type | Instant Reply Rate | Delay Formula |
| --- | --- | --- |
| Quick | 71% | 3–10 seconds |
| Normal | 69% | 3–10 seconds |
| Detailed | 62% | 10 + words×1.0 + tech×20 |
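One plausible way to apply the table above is to flip a coin against the instant-reply rate, then fall back to the delay formula. The sampling logic here is my guess at how such parameters would be used, not the project's actual code:

```python
import random

# Instant-reply rates per response type, from the extracted timing table.
TIMING = {"quick": 0.71, "normal": 0.69, "detailed": 0.62}

def sample_delay(kind, word_count=0, tech_terms=0, rng=random):
    """Sample a reply delay in seconds for a given response type."""
    if rng.random() < TIMING[kind]:
        return 0.0  # instant reply
    if kind == "detailed":
        # 10 + words x 1.0 + tech x 20 seconds, per the extracted formula
        return 10.0 + word_count * 1.0 + tech_terms * 20.0
    # quick/normal: a uniform 3-10 second pause
    return rng.uniform(3.0, 10.0)
```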

Chunking Policy

Humans don't dump everything into one message. They chunk:

67% of messages are single-chunk, 21% are split into two, and 12% into three or more. Each chunk maxes out at about 15 words (the 75th percentile of human messages).
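A chunking policy built on these numbers might first draw a target chunk count from the empirical distribution, then split the text into roughly even pieces capped at 15 words. This is an illustrative sketch under those assumptions, not the project's implementation:

```python
import math
import random

# From the post: 67% single-chunk, 21% two chunks, 12% three or more;
# each chunk capped at ~15 words (75th percentile of human messages).
CHUNK_DIST = [(1, 0.67), (2, 0.21), (3, 0.12)]
MAX_CHUNK_WORDS = 15

def sample_chunk_count(rng=random):
    """Draw a target chunk count from the empirical distribution."""
    r = rng.random()
    cum = 0.0
    for count, prob in CHUNK_DIST:
        cum += prob
        if r < cum:
            return count
    return 3  # the "three or more" tail

def chunk_message(text, n_chunks, max_words=MAX_CHUNK_WORDS):
    """Split `text` into about `n_chunks` even chunks, capped at max_words."""
    words = text.split()
    if not words:
        return [""]
    size = min(max_words, math.ceil(len(words) / n_chunks))
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```

Note that the word cap takes priority: even a message "split" into one chunk gets broken up if it exceeds 15 words, which matches how the cap is described as a hard ceiling.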

These parameters — timing delays, chunking ratios, thread awareness — are what separate an LLM that sounds like a chatbot from one that feels like it belongs in the conversation.

Dataset

| Item | Value |
| --- | --- |
| Source | Ubuntu IRC #ubuntu (Libera.chat) |
| Period | January 2024 (1 month) |
| Utterances | 8,951 |
| Threads | 1,318 |
