# Parameter Tuning Guide
Learn how to optimize logpare's parameters for different log types and use cases.
## Core Parameters
logpare has four key parameters that control template generation:
| Parameter | Default | Range | Effect |
|---|---|---|---|
| `depth` | 4 | 2-8 | Parse tree depth; higher = more specific templates |
| `simThreshold` | 0.4 | 0.0-1.0 | Similarity threshold; higher = more templates |
| `maxChildren` | 100 | 10-500 | Max children per node; affects tree width |
| `maxClusters` | 1000 | 100-10000 | Max total templates; limits memory |
## Tuning Strategy

### Problem: Too Many Templates

**Symptoms:**
- Hundreds or thousands of templates for a moderate log file
- Similar-looking templates that should be grouped together
- High compression ratio (e.g., 0.8 or 0.9), meaning templates barely reduce the raw line count
**Solutions:**

#### 1. Lower the Similarity Threshold
Make template matching more lenient:
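A minimal sketch, assuming logpare is configured with a plain options object (the parameter names come from the Core Parameters table; the object shape is an assumption):

```javascript
// Hypothetical logpare options: lower simThreshold from the 0.4 default
// so lines that share fewer tokens still merge into one template.
const lenientOptions = {
  depth: 4,          // keep the default
  simThreshold: 0.3, // more lenient matching -> fewer, broader templates
};
```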
**When to use:**
- Logs with high variability in non-critical tokens
- Similar messages with minor differences
- Noisy application logs
#### 2. Add Custom Preprocessing
Mask domain-specific variables that aren't caught by default patterns:
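For example, a masking pass run over each line before it reaches the parser (the pattern list and the mask strings are illustrative; adapt them to your own identifiers):

```javascript
// Domain-specific masking patterns (hypothetical examples).
const customPatterns = [
  { regex: /\border-[0-9a-f]{8}\b/g, mask: '<ORDER_ID>' },  // e.g. order-3fa85f64
  { regex: /\bsession=[A-Za-z0-9]+\b/g, mask: '<SESSION>' },
];

// Replace every matching value with its mask so distinct IDs collapse
// into the same token during template mining.
function preprocess(line) {
  let masked = line;
  for (const { regex, mask } of customPatterns) {
    masked = masked.replace(regex, mask);
  }
  return masked;
}
```

Run `preprocess` on every line before handing it to logpare, so variable values no longer force separate templates.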
### Problem: Templates Too Generic

**Symptoms:**
- Very few templates (e.g., 5-10 for thousands of lines)
- Templates grouping unrelated log types together
- Loss of important diagnostic information
**Solutions:**

#### 1. Raise the Similarity Threshold
Make template matching more strict:
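A sketch using the same assumed options-object shape:

```javascript
// Hypothetical logpare options: raise simThreshold above the 0.4 default
// so lines must share more tokens before collapsing into one template.
const strictOptions = {
  simThreshold: 0.5, // stricter matching -> more, narrower templates
};
```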
#### 2. Increase Tree Depth
Allow the algorithm to consider more tokens:
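For instance (object shape assumed; the value matches the Summary table below):

```javascript
// Hypothetical logpare options: a deeper parse tree inspects more leading
// tokens, so structurally different lines separate earlier.
const deepOptions = {
  depth: 5, // up from the default 4; valid range is 2-8
};
```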
**When to use:**
- Structured logs with many informative tokens
- When you need fine-grained template separation
- Logs with consistent formatting
### Problem: High Memory Usage

**Symptoms:**
- Out of memory errors on large log files
- Slow processing times
- System becomes unresponsive
**Solutions:**

#### 1. Limit Maximum Clusters
Cap the total number of templates:
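A sketch (object shape assumed; what logpare does once the cap is reached is implementation-defined):

```javascript
// Hypothetical logpare options: bound total template count so memory stays
// bounded even on pathological input.
const cappedOptions = {
  maxClusters: 500, // down from the default 1000
};
```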
#### 2. Reduce Max Children
Prevent tree explosion:
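For example (object shape assumed; 50 is an illustrative value, not an official recommendation):

```javascript
// Hypothetical logpare options: fewer children per node keeps the parse
// tree narrow when a token position has many distinct values.
const narrowOptions = {
  maxChildren: 50, // down from the default 100
};
```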
#### 3. Process in Batches
Use incremental processing for very large files:
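One way to sketch the batching itself in plain JavaScript (the commented-out parser call is hypothetical):

```javascript
// Yield fixed-size slices of a line array so each batch can be parsed and
// released before the next one is loaded.
function* batches(lines, size = 10000) {
  for (let i = 0; i < lines.length; i += size) {
    yield lines.slice(i, i + size);
  }
}

// Hypothetical usage: feed every batch to the same parser instance so
// templates accumulate incrementally across batches.
// for (const batch of batches(allLines)) parser.parseMany(batch);
```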
## Recommended Settings by Log Type

### Structured Logs (JSON, CSV)
These logs have consistent fields and formatting:
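Illustrative starting values (the numbers follow the reasoning below; they are assumptions, not official logpare recommendations):

```javascript
// Hypothetical starting point for structured logs.
const structuredLogOptions = {
  depth: 3,          // shallow: token positions are predictable
  simThreshold: 0.5, // stricter: prevents over-grouping distinct fields
};
```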
**Why:**
- Structured logs have predictable token positions
- Higher threshold prevents over-grouping
- Shallow depth is sufficient
### Noisy Application Logs
Logs with variable formatting and many unique values:
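Illustrative starting values (assumed, per the reasoning below):

```javascript
// Hypothetical starting point for noisy application logs.
const noisyLogOptions = {
  depth: 5,          // deeper: capture more context per line
  simThreshold: 0.3, // lenient: group messages that differ in minor tokens
};
```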
**Why:**
- Higher depth captures more context
- Lower threshold groups similar messages
- Handles inconsistent formatting
### System Logs (syslog, journald)
Well-formatted system logs with standard patterns:
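These values are the defaults from the Core Parameters table (only the object shape is assumed):

```javascript
// Defaults are a good fit for standard system log formats.
const systemLogOptions = {
  depth: 4,          // default
  simThreshold: 0.4, // default
};
```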
**Why:**
- Default settings work well for standard formats
- System logs have consistent structure
- Good balance between grouping and specificity
### High-Volume Logs (>1M lines)
Optimize for memory efficiency:
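An illustrative memory-lean starting point (values assumed, combining the memory advice above):

```javascript
// Hypothetical starting point for very large inputs.
const highVolumeOptions = {
  maxClusters: 500, // cap total templates
  maxChildren: 50,  // keep the parse tree narrow
  depth: 4,         // default depth
};
```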
### Web Server Access Logs
HTTP request logs with standard formats:
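Illustrative starting values (assumed, per the reasoning below):

```javascript
// Hypothetical starting point for access logs.
const accessLogOptions = {
  depth: 6,          // deeper: method, path, status, size, referrer...
  simThreshold: 0.5, // stricter: keep different endpoints separate
};
```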
**Why:**
- Access logs have many tokens (method, path, status, etc.)
- Higher depth captures full request patterns
- Higher threshold prevents grouping different endpoints
## Advanced Tuning

### Depth-Dependent Similarity Threshold
Adjust threshold based on tree depth:
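One way to express this as a sketch (the scaling formula is illustrative, not logpare's built-in behavior):

```javascript
// Scale the similarity threshold with node depth: shallow (highly variable)
// token positions match leniently, deeper ones match strictly, capped at max.
function thresholdAt(depth, base = 0.4, step = 0.05, max = 0.7) {
  return Math.min(max, base + step * depth);
}
```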
Use case: When early tokens are highly variable but later tokens are consistent.
## Diagnostic Tools

### Check Compression Ratio
Monitor how well compression is working:
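As used in this guide, the ratio is unique templates divided by total lines (lower is better; values near 1 mean almost no compression). A sketch:

```javascript
// templateCount / lineCount; e.g. 50 templates for 10,000 lines -> 0.005.
// A ratio near 0.8-0.9 is the "too many templates" symptom described above.
function compressionRatio(templateCount, lineCount) {
  return lineCount === 0 ? 0 : templateCount / lineCount;
}
```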
### Analyze Template Distribution
Check if templates are well-distributed:
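A sketch, assuming per-template match counts are available as `{ template, count }` records (the record shape is an assumption):

```javascript
// Rank templates by how many lines matched them; one template absorbing
// nearly everything suggests the settings are too generic.
function topTemplates(stats, n = 5) {
  return [...stats].sort((a, b) => b.count - a.count).slice(0, n);
}
```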
## Summary
| Problem | Solution | Parameter Change |
|---|---|---|
| Too many templates | Lower threshold | `simThreshold: 0.3` |
| Templates too generic | Raise threshold | `simThreshold: 0.5` |
| Unrelated logs grouped together | Increase depth | `depth: 5` |
| Memory issues | Limit clusters | `maxClusters: 500` |
| Unmasked variables | Custom preprocessing | Add patterns |
Remember: The best settings depend on your specific log format and use case. Start with defaults and adjust based on results.