Custom Preprocessing
Learn how to create custom preprocessing strategies for domain-specific log formats.
Overview
Preprocessing transforms raw log lines before template extraction. It's crucial for:
- Masking variable data (IDs, tokens, values)
- Normalizing inconsistent formatting
- Handling domain-specific patterns
- Improving compression quality
Default Preprocessing
logpare includes built-in patterns for common variables (exported as DEFAULT_PATTERNS). These patterns automatically replace matching values with <*>.
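For illustration only (the exact built-in list ships as DEFAULT_PATTERNS, so the value classes masked below are assumptions), a raw line and its preprocessed form might look like:

```text
raw:          GET /api/users/550e8400-e29b-41d4-a716-446655440000 from 10.0.0.12 took 37ms
preprocessed: GET /api/users/<*> from <*> took <*>ms
```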
Creating Custom Strategies
Use defineStrategy() to create a custom preprocessing strategy:
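The sketch below assumes the options object mirrors the ParsingStrategy interface with preprocess, tokenize, and similarityThreshold hooks; those names are inferred from the sections that follow, so check the Types Reference for the exact signatures.

```ts
import { defineStrategy } from "logpare";

const myStrategy = defineStrategy({
  // Rewrite each raw line before template extraction.
  preprocess(line: string): string {
    return line.replace(/user-\d+/g, "<*>");
  },

  // Split a preprocessed line into tokens (whitespace by default).
  tokenize(line: string): string[] {
    return line.split(/\s+/).filter(Boolean);
  },

  // Similarity threshold applied when matching at a given tree depth.
  similarityThreshold(depth: number): number {
    return depth < 2 ? 0.5 : 0.6;
  },
});
```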
All three methods are optional. Only override what you need.
Common Patterns
Adding Custom ID Patterns
Mask application-specific identifiers:
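For example, this sketch masks hypothetical internal request and ticket IDs on top of the defaults (DEFAULT_PATTERNS is assumed here to be an iterable of regular expressions; adjust the ID formats to your own logs):

```ts
import { defineStrategy, DEFAULT_PATTERNS } from "logpare";

// Application-specific identifiers, e.g. "REQ-2024-001234" or "TKT-98765".
const CUSTOM_PATTERNS: RegExp[] = [/REQ-\d{4}-\d+/g, /TKT-\d+/g];

const idStrategy = defineStrategy({
  preprocess(line: string): string {
    let out = line;
    // Apply the built-in masks first, then the custom ones.
    for (const pattern of [...DEFAULT_PATTERNS, ...CUSTOM_PATTERNS]) {
      out = out.replace(pattern, "<*>");
    }
    return out;
  },
});
```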
E-commerce Logs
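A sketch for storefront logs that masks order numbers, SKUs, prices, and cart identifiers; all of the formats below are hypothetical, so tune the regexes to your own data:

```ts
import { defineStrategy } from "logpare";

const ecommerceStrategy = defineStrategy({
  preprocess(line: string): string {
    return line
      .replace(/ORD-\d{6,}/g, "<*>")              // order numbers, e.g. ORD-20240001
      .replace(/SKU-[A-Z0-9]{6,}/g, "<*>")        // product SKUs
      .replace(/\$\d+(\.\d{2})?/g, "<*>")         // prices like $19.99
      .replace(/cart[_-][a-f0-9]{12,}/gi, "<*>"); // cart/session identifiers
  },
});
```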
Multi-tenant SaaS Logs
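A sketch that strips tenant-identifying values so the same template is shared across tenants; the tenant ID, organization ID, and API-key formats are made up for illustration:

```ts
import { defineStrategy } from "logpare";

const saasStrategy = defineStrategy({
  preprocess(line: string): string {
    return line
      .replace(/tenant[_-][a-z0-9]{6,}/gi, "<*>")   // tenant identifiers
      .replace(/org_[A-Za-z0-9]+/g, "<*>")          // organization IDs
      .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "<*>")   // user email addresses
      .replace(/(api[_-]?key=)[^\s&]+/gi, "$1<*>"); // API keys in query strings
  },
});
```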
Kubernetes/Container Logs
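A sketch for container logs where generated pod suffixes and image digests dominate the variable content; the regexes approximate common Kubernetes naming and are assumptions, not a complete list:

```ts
import { defineStrategy } from "logpare";

const k8sStrategy = defineStrategy({
  preprocess(line: string): string {
    return line
      // Pod names with generated suffixes, e.g. "api-7d9f8b6c5-xk2lp".
      .replace(/\b[a-z0-9-]+-[a-z0-9]{8,10}-[a-z0-9]{5}\b/g, "<*>")
      // Image digests.
      .replace(/sha256:[a-f0-9]{64}/g, "<*>")
      // Short or full-length hex container IDs.
      .replace(/\b[a-f0-9]{12}(?:[a-f0-9]{52})?\b/g, "<*>");
  },
});
```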
Custom Tokenization
CSV Logs
Split on commas instead of whitespace:
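A minimal sketch, assuming the tokenize hook receives each line after preprocessing and returns the token list used for matching:

```ts
import { defineStrategy } from "logpare";

const csvStrategy = defineStrategy({
  // One token per comma-separated field instead of per whitespace-separated word.
  tokenize(line: string): string[] {
    return line.split(",").map((field) => field.trim());
  },
});
```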
Tab-Separated Logs
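Same idea for tab-delimited exports; only the delimiter changes:

```ts
import { defineStrategy } from "logpare";

const tsvStrategy = defineStrategy({
  // One token per tab-separated column.
  tokenize(line: string): string[] {
    return line.split("\t");
  },
});
```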
JSON Logs
Extract specific fields for tokenization:
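A sketch that parses each line as JSON and templates only the level and message fields; those field names are hypothetical, so map them to whatever your logger emits:

```ts
import { defineStrategy } from "logpare";

const jsonStrategy = defineStrategy({
  preprocess(line: string): string {
    try {
      const entry = JSON.parse(line);
      // Keep only the fields that matter for templating; drop the rest as noise.
      return `${entry.level ?? ""} ${entry.message ?? ""}`.trim();
    } catch {
      // Not valid JSON - fall back to the raw line.
      return line;
    }
  },
  tokenize(line: string): string[] {
    return line.split(/\s+/).filter(Boolean);
  },
});
```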
Depth-Based Similarity Thresholds
Adjust matching strictness by tree depth:
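A sketch, assuming the similarityThreshold hook is called with the current tree depth and returns the minimum similarity required to join an existing group (the hook name and semantics are assumptions; see the Types Reference):

```ts
import { defineStrategy } from "logpare";

const depthAwareStrategy = defineStrategy({
  // Be lenient near the root, where tokens vary a lot, and stricter deeper in.
  similarityThreshold(depth: number): number {
    if (depth <= 1) return 0.4;
    if (depth <= 3) return 0.5;
    return 0.6;
  },
});
```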
Use case: When initial tokens are highly variable but later tokens are consistent.
Testing Custom Strategies
Verify your strategy works as expected:
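A quick unit test sketch (vitest here, but any runner works); it assumes the object returned by defineStrategy() exposes the preprocess method you passed in:

```ts
import { describe, expect, it } from "vitest";
import { defineStrategy } from "logpare";

// Strategy under test - masks hypothetical request IDs and durations.
const strategy = defineStrategy({
  preprocess: (line: string) =>
    line.replace(/REQ-\d+/g, "<*>").replace(/\d+ms/g, "<*>"),
});

describe("custom preprocessing", () => {
  it("masks request IDs and durations", () => {
    expect(strategy.preprocess?.("Handled REQ-001234 in 120ms")).toBe(
      "Handled <*> in <*>"
    );
  });

  it("leaves static text untouched", () => {
    expect(strategy.preprocess?.("Server started")).toBe("Server started");
  });
});
```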
Best Practices
- Apply defaults first - Start with DEFAULT_PATTERNS, then add custom patterns
- Test incrementally - Add patterns one at a time and verify results
- Be specific - Use precise regex to avoid over-matching
- Cache patterns - Compile regex once, reuse many times
- Document patterns - Comment what each pattern matches
- Validate input - Handle malformed logs gracefully
- Monitor performance - Complex regex can slow processing
Debugging Tips
Inspect Preprocessing Output
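Run a handful of representative lines through your preprocess step and eyeball the output before compressing anything; ./strategy below is a placeholder for wherever you defined your strategy:

```ts
import { myStrategy } from "./strategy"; // hypothetical module exporting your strategy

const samples = [
  "Handled REQ-001234 for user-42 in 120ms",
  "Handled REQ-001235 for user-43 in 95ms",
];

// Print raw vs. preprocessed lines side by side.
for (const line of samples) {
  console.log("raw:", line);
  console.log("pre:", myStrategy.preprocess?.(line) ?? line);
}
```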
Check Pattern Matches
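Test each regex against sample lines in isolation to catch over- or under-matching; the patterns and sample line below are placeholders:

```ts
const patterns: Record<string, RegExp> = {
  requestId: /REQ-\d+/g,
  duration: /\d+ms/g,
};

const line = "Handled REQ-001234 in 120ms";

// Logs every match (or null) for each named pattern.
for (const [name, pattern] of Object.entries(patterns)) {
  console.log(name, line.match(pattern));
}
```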
Compare Results
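Compress the same lines with and without your strategy and compare how many templates come out; the compress() options object and the templates field on its result are assumptions here, so check the compress() API reference for the actual shape:

```ts
import { readFileSync } from "node:fs";
import { compress } from "logpare";
import { myStrategy } from "./strategy"; // hypothetical module exporting your strategy

const lines = readFileSync("app.log", "utf8").split("\n").filter(Boolean);

const baseline = compress(lines);                          // default preprocessing
const custom = compress(lines, { strategy: myStrategy });  // custom strategy

console.log("templates (default):", baseline.templates.length);
console.log("templates (custom): ", custom.templates.length);
```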
See Also
- Parameter Tuning Guide - Optimize algorithm parameters
- Types Reference - ParsingStrategy interface
- compress() API - Using custom strategies