
Parameter Tuning Guide

Learn how to optimize logpare's parameters for different log types and use cases.

Core Parameters

logpare has four key parameters that control template generation:

Parameter      Default   Range       Effect
depth          4         2-8         Parse tree depth - higher = more specific templates
simThreshold   0.4       0.0-1.0     Similarity threshold - higher = more templates
maxChildren    100       10-500      Max children per node - affects tree width
maxClusters    1000      100-10000   Max total templates - limits memory
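
For reference, all four options can be passed to compress in one call. The snippet below simply spells out the defaults from the table (logs is assumed to be your array of raw log lines, and compress is assumed to be importable from the package entry point like the other helpers shown later in this guide):

import { compress } from 'logpare';

// logs: string[] of raw log lines
const result = compress(logs, {
  depth: 4,           // parse tree depth
  simThreshold: 0.4,  // similarity threshold
  maxChildren: 100,   // max children per node
  maxClusters: 1000,  // max total templates
});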

Tuning Strategy

Problem: Too Many Templates

Symptoms:

  • Hundreds or thousands of templates for a moderate log file
  • Similar-looking templates that should be grouped together
  • High compression ratio (e.g., 0.8 or 0.9), meaning the output is nearly as large as the input

Solutions:

1. Lower the Similarity Threshold

Make template matching more lenient:

compress(logs, {
  simThreshold: 0.3, // More aggressive grouping
});

When to use:

  • Logs with high variability in non-critical tokens
  • Similar messages with minor differences
  • Noisy application logs

2. Add Custom Preprocessing

Mask domain-specific variables that aren't caught by default patterns:

import { defineStrategy, DEFAULT_PATTERNS, WILDCARD } from 'logpare';
 
const strategy = defineStrategy({
  preprocess(line: string): string {
    let result = line;
 
    // Apply default patterns
    for (const [, pattern] of Object.entries(DEFAULT_PATTERNS)) {
      result = result.replace(pattern, WILDCARD);
    }
 
    // Add custom patterns
    result = result.replace(/order-[A-Z0-9]{8}/g, WILDCARD);
    result = result.replace(/user_\d+/g, WILDCARD);
    result = result.replace(/session-[a-f0-9]+/gi, WILDCARD);
 
    return result;
  }
});
 
compress(logs, { preprocessing: strategy });

Problem: Templates Too Generic

Symptoms:

  • Very few templates (e.g., 5-10 for thousands of lines)
  • Templates grouping unrelated log types together
  • Loss of important diagnostic information

Solutions:

1. Raise the Similarity Threshold

Make template matching more strict:

compress(logs, {
  simThreshold: 0.5, // More conservative grouping
});

2. Increase Tree Depth

Allow the algorithm to consider more tokens:

compress(logs, {
  depth: 5, // or 6
});

When to use:

  • Structured logs with many informative tokens
  • When you need fine-grained template separation
  • Logs with consistent formatting

Problem: High Memory Usage

Symptoms:

  • Out of memory errors on large log files
  • Slow processing times
  • System becomes unresponsive

Solutions:

1. Limit Maximum Clusters

Cap the total number of templates:

compress(logs, {
  maxClusters: 500,
});

2. Reduce Max Children

Prevent tree explosion:

compress(logs, {
  maxChildren: 50,
});

3. Process in Batches

Use incremental processing for very large files:

import { createDrain } from 'logpare';
 
const drain = createDrain({
  maxClusters: 500,
  maxChildren: 50,
});
 
// Process in chunks
const chunkSize = 10000;
for (let i = 0; i < logs.length; i += chunkSize) {
  const chunk = logs.slice(i, i + chunkSize);
  drain.addLogLines(chunk);
}
 
const result = drain.getResult();
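
If the log file is too large to hold in memory at all, the same drain can be fed from a line-by-line stream. A minimal sketch using Node's readline (the batch size is a placeholder; only createDrain, addLogLines, and getResult come from the API shown above):

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';
import { createDrain } from 'logpare';

async function compressFile(path: string) {
  const drain = createDrain({ maxClusters: 500, maxChildren: 50 });
  const rl = createInterface({
    input: createReadStream(path),
    crlfDelay: Infinity,
  });

  let batch: string[] = [];
  for await (const line of rl) {
    batch.push(line);
    if (batch.length >= 10000) {
      drain.addLogLines(batch); // same incremental API as above
      batch = [];
    }
  }
  if (batch.length > 0) drain.addLogLines(batch); // flush the final partial batch

  return drain.getResult();
}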

Recommended Settings by Log Type

Structured Logs (JSON, CSV)

These logs have consistent fields and formatting:

compress(logs, {
  depth: 3,
  simThreshold: 0.5,
});

Why:

  • Structured logs have predictable token positions
  • Higher threshold prevents over-grouping
  • Shallow depth is sufficient

Noisy Application Logs

Logs with variable formatting and many unique values:

compress(logs, {
  depth: 5,
  simThreshold: 0.3,
});

Why:

  • Higher depth captures more context
  • Lower threshold groups similar messages
  • Handles inconsistent formatting

System Logs (syslog, journald)

Well-formatted system logs with standard patterns:

compress(logs, {
  depth: 4,        // Default
  simThreshold: 0.4, // Default
});

Why:

  • Default settings work well for standard formats
  • System logs have consistent structure
  • Good balance between grouping and specificity

High-Volume Logs (>1M lines)

Optimize for memory efficiency:

compress(logs, {
  depth: 4,
  simThreshold: 0.4,
  maxClusters: 500,
  maxChildren: 50,
});

Web Server Access Logs

HTTP request logs with standard formats:

compress(logs, {
  depth: 6,
  simThreshold: 0.5,
});

Why:

  • Access logs have many tokens (method, path, status, etc.)
  • Higher depth captures full request patterns
  • Higher threshold prevents grouping different endpoints

Advanced Tuning

Depth-Dependent Similarity Threshold

Adjust threshold based on tree depth:

const strategy = defineStrategy({
  getSimThreshold(depth: number): number {
    if (depth <= 2) return 0.3; // More lenient for early tokens
    if (depth <= 4) return 0.4; // Default for middle
    return 0.5; // Stricter for deeper levels
  }
});
 
compress(logs, { preprocessing: strategy });

Use case: When early tokens are highly variable but later tokens are consistent.

Diagnostic Tools

Check Compression Ratio

Monitor how well compression is working:

const result = compress(logs);
 
console.log(`Compression ratio: ${result.stats.compressionRatio}`);
console.log(`Token reduction: ${result.stats.estimatedTokenReduction}%`);
 
if (result.stats.compressionRatio > 0.5) {
  console.log('⚠️  Low compression - consider lowering simThreshold');
} else if (result.stats.compressionRatio < 0.05) {
  console.log('⚠️  Very high compression - templates may be too generic');
}
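
These checks can also drive a simple retry. A rough sketch that re-runs compression with the more lenient threshold suggested earlier and keeps whichever result grouped better (the 0.5 cut-off mirrors the warning above; the fallback value of 0.3 is only an assumption):

let result = compress(logs);

// If compression is poor, retry once with a more lenient threshold.
if (result.stats.compressionRatio > 0.5) {
  const retry = compress(logs, { simThreshold: 0.3 });
  if (retry.stats.compressionRatio < result.stats.compressionRatio) {
    result = retry; // lower ratio = smaller output, i.e. better grouping
  }
}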

Analyze Template Distribution

Check if templates are well-distributed:

const result = compress(logs, { format: 'json' });
 
const occurrences = result.templates.map(t => t.occurrences);
const avg = occurrences.reduce((a, b) => a + b, 0) / occurrences.length;
const max = Math.max(...occurrences);
 
console.log(`Average occurrences: ${avg}`);
console.log(`Max occurrences: ${max}`);
 
if (max > avg * 10) {
  console.log('⚠️  Skewed distribution - one template dominates');
}

Summary

Problem                 Solution                Parameter Change
Too many templates      Lower threshold         simThreshold: 0.3
Templates too generic   Raise threshold         simThreshold: 0.5
Missing grouping        Increase depth          depth: 5
Memory issues           Limit clusters          maxClusters: 500
Unmasked variables      Custom preprocessing    Add patterns

Remember: The best settings depend on your specific log format and use case. Start with defaults and adjust based on results.