| 1 | +# RatOS Unified Logging System |
| 2 | + |
| 3 | +This document describes the unified logging system for the RatOS-configurator project. The system consolidates all RatOS logs into a single main log file and provides tools for viewing and analyzing logs by source, including update scripts and other system operations. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The unified logging system consists of four main components: |
| 8 | + |
| 9 | +1. **Structured Bash Logging Library** - Captures errors from shell scripts in JSON format, writing to the main RatOS log |
| 10 | +2. **CLI Log Management Commands** - Command-line tools for viewing and analyzing logs with source filtering |
| 11 | +3. **Web UI Integration** - Browser-based log viewer with filtering and analysis capabilities |
| 12 | +4. **Debug Integration** - Automatic inclusion of logs in debug packages |
| 13 | + |
| 14 | +## Architecture |
| 15 | + |
| 16 | +### 1. Bash Logging Library (`configuration/scripts/ratos-logging.sh`) |
| 17 | + |
| 18 | +The bash logging library provides structured logging capabilities for shell scripts, outputting logs in JSON format compatible with the pino logging system used throughout the application. **All logs are written to the main RatOS log file** (`/var/log/ratos-configurator.log`) with a `source: "ratos-update"` field for filtering. |
| 19 | + |
| 20 | +#### Features: |
| 21 | +- **JSON-formatted logs** compatible with pino |
| 22 | +- **Multiple log levels**: trace, debug, info, warn, error, fatal |
| 23 | +- **Unified log file** - writes to main RatOS log instead of separate files |
| 24 | +- **Source identification** - all entries tagged with `source: "ratos-update"` |
| 25 | +- **Error trapping** with stack trace capture |
| 26 | +- **Command execution logging** with automatic error handling |
| 27 | +- **Timestamped entries** with process information |
| 28 | + |
| 29 | +#### Usage Example: |
| 30 | +```bash |
| 31 | +#!/bin/bash |
| 32 | +source "$(dirname "$0")/ratos-logging.sh" |
| 33 | + |
| 34 | +# Set up error trapping |
| 35 | +setup_error_trap "my-script" |
| 36 | + |
| 37 | +# Log script start |
| 38 | +log_script_start "my-script.sh" "1.0.0" |
| 39 | + |
| 40 | +# Log various levels |
| 41 | +log_info "Starting operation" "main" |
| 42 | +log_warn "This is a warning" "main" "WARN_CODE" |
| 43 | +log_error "This is an error" "main" "ERROR_CODE" |
| 44 | + |
| 45 | +# Execute commands with logging |
| 46 | +execute_with_logging "apt-get update" "package_update" "APT_UPDATE_FAILED" |
| 47 | + |
| 48 | +# Log script completion |
| 49 | +log_script_complete "my-script.sh" "$?" |
| 50 | +``` |
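The implementation of `execute_with_logging` is not shown in this document. As a rough sketch of the pattern only, a wrapper along these lines could run a command string, capture its exit status, and emit a JSON error entry on failure. The name `run_logged`, the `exitCode` field, and writing to stderr are assumptions made for illustration, and no JSON escaping is attempted.

```bash
#!/bin/bash
# Hypothetical wrapper (not the library's code): run a command string and
# emit a JSON error line on stderr if it fails.
run_logged() {
    local cmd=$1 context=$2 code=$3 status=0
    # "|| status=$?" captures the failure without tripping `set -e`.
    bash -c "$cmd" || status=$?
    if [ "$status" -ne 0 ]; then
        printf '{"level":50,"msg":"command failed","source":"ratos-update","context":"%s","errorCode":"%s","exitCode":%d}\n' \
            "$context" "$code" "$status" >&2
    fi
    # Propagate the original exit status to the caller.
    return "$status"
}
```

Called as `run_logged "apt-get update" "package_update" "APT_UPDATE_FAILED"`, this sketch would return the exit status of `apt-get` so the caller can still decide whether to abort.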
| 51 | + |
| 52 | +#### Configuration: |
| 53 | +- `RATOS_LOG_LEVEL`: Set minimum log level (default: info) |
| 54 | +- `RATOS_LOG_FILE`: Log file path (default: uses `${LOG_FILE}` from environment, typically `/var/log/ratos-configurator.log`) |
| 55 | +- `RATOS_LOG_MAX_SIZE`: Maximum log file size before rotation (default: 0 = disabled when using main log) |
| 56 | +- `RATOS_LOG_BACKUP_COUNT`: Number of backup files to keep (default: 0 = disabled when using main log) |
| 57 | + |
| 58 | +**Note**: When using the unified logging system, log rotation is handled by the main RatOS log configuration, not by individual scripts. |
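To illustrate how a minimum level such as `RATOS_LOG_LEVEL` might be applied, the sketch below maps level names to the numeric values listed under Log Format and compares them against the threshold. The function names `level_to_num` and `should_log` are hypothetical, and the fallback to `info` for unknown names is an assumption rather than documented behavior.

```bash
#!/bin/bash
# Map a level name to its numeric (pino-style) value.
level_to_num() {
    case "$1" in
        trace) echo 10 ;;
        debug) echo 20 ;;
        info)  echo 30 ;;
        warn)  echo 40 ;;
        error) echo 50 ;;
        fatal) echo 60 ;;
        *)     echo 30 ;;  # assumption: unknown names are treated as info
    esac
}

# Succeed (exit 0) if a message at level $1 meets the RATOS_LOG_LEVEL threshold.
should_log() {
    local threshold
    threshold=$(level_to_num "${RATOS_LOG_LEVEL:-info}")
    [ "$(level_to_num "$1")" -ge "$threshold" ]
}
```

With `RATOS_LOG_LEVEL=warn`, `should_log error` succeeds while `should_log info` fails, so a logging function could return early before formatting anything.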
| 59 | + |
| 60 | +### 2. CLI Log Management (`src/cli/commands/update-logs.tsx`) |
| 61 | + |
| 62 | +The CLI provides several commands for viewing and analyzing update logs. **Update logs are now a subcommand of the main `logs` command** and automatically filter the main log file to show only entries with `source: "ratos-update"`. |
| 63 | + |
| 64 | +#### Commands: |
| 65 | + |
| 66 | +**`ratos logs update-logs summary`** |
| 67 | +- Shows a summary of the most recent update attempt from the main log |
| 68 | +- Displays success/failure status, error counts, and timing information |
| 69 | +- Automatically filters by `source: "ratos-update"` |
| 70 | + |
| 71 | +**`ratos logs update-logs show`** |
| 72 | +- Shows detailed log entries with filtering options from the main log |
| 73 | +- Options: |
| 74 | + - `-n, --lines <number>`: Number of recent lines to show (default: 50) |
| 75 | + - `-l, --level <level>`: Minimum log level (trace, debug, info, warn, error, fatal) |
| 76 | + - `-c, --context <context>`: Filter by context |
| 77 | + - `-d, --details`: Show detailed information |
| 78 | + |
| 79 | +**`ratos logs update-logs errors`** |
| 80 | +- Shows only errors and warnings from the most recent update |
| 81 | +- Options: |
| 82 | + - `-d, --details`: Show detailed information |
| 83 | + |
| 84 | +#### Usage Examples: |
| 85 | +```bash |
| 86 | +# Show update summary (note the new command structure) |
| 87 | +ratos logs update-logs summary |
| 88 | + |
| 89 | +# Show last 100 log entries at debug level |
| 90 | +ratos logs update-logs show -n 100 -l debug |
| 91 | + |
| 92 | +# Show only errors with details |
| 93 | +ratos logs update-logs errors -d |
| 94 | + |
| 95 | +# Show logs from specific context |
| 96 | +ratos logs update-logs show -c "update_symlinks" -d |
| 97 | + |
| 98 | +# Other log commands remain available: |
| 99 | +ratos logs tail # Tail the main log file |
| 100 | +ratos logs rotate # Force log rotation |
| 101 | +``` |
| 102 | + |
| 103 | +### 3. Web UI Integration |
| 104 | + |
| 105 | +The web interface provides a comprehensive log viewer accessible at `/configure/update-logs`. |
| 106 | + |
| 107 | +#### Features: |
| 108 | +- **Log Summary Dashboard**: Overview of recent update attempts |
| 109 | +- **Interactive Log Viewer**: Browse and filter log entries |
| 110 | +- **Real-time Filtering**: Filter by log level, context, and search terms |
| 111 | +- **Error Highlighting**: Visual distinction for different log levels |
| 112 | +- **Download Capability**: Download raw log files |
| 113 | +- **Auto-refresh**: Automatic updates when new logs are available |
| 114 | + |
| 115 | +#### Components: |
| 116 | +- `UpdateLogsViewer`: Main component for displaying logs |
| 117 | +- `UpdateLogsErrorBoundary`: Error boundary for graceful error handling |
| 118 | +- `LogSummaryCard`: Summary statistics and controls |
| 119 | +- `LogEntryComponent`: Individual log entry display |
| 120 | + |
| 121 | +### 4. API Endpoints |
| 122 | + |
| 123 | +#### tRPC Endpoints (`src/server/routers/update-logs.ts`): |
| 124 | +- `update-logs.summary`: Get log summary statistics (filtered by `source: "ratos-update"`) |
| 125 | +- `update-logs.entries`: Get filtered log entries (filtered by `source: "ratos-update"`) |
| 126 | +- `update-logs.errors`: Get only errors and warnings (filtered by `source: "ratos-update"`) |
| 127 | +- `update-logs.contexts`: Get available log contexts (filtered by `source: "ratos-update"`) |
| 128 | +- `update-logs.clear`: **Disabled** - Cannot clear main log file (use log rotation instead) |
| 129 | +- `update-logs.download`: Download main log file (contains all sources) |
| 130 | + |
| 131 | +#### REST Endpoints: |
| 132 | +- `GET /api/update-logs/download`: Download log file as attachment |
| 133 | + |
| 134 | +### 5. Debug Integration |
| 135 | + |
| 136 | +Update logs are automatically included in debug packages as part of the main log file: |
| 137 | +- Main log file (`/var/log/ratos-configurator.log`) is added to debug packages |
| 138 | +- Rotated log files (`.1`, `.2`, etc.) are included |
| 139 | +- All log sources (including update logs) are included in a single file |
| 140 | +- Logs are categorized appropriately in the debug package |
| 141 | + |
| 142 | +## Log Format |
| 143 | + |
| 144 | +All logs follow a consistent JSON format: |
| 145 | + |
| 146 | +```json |
| 147 | +{ |
| 148 | + "level": 30, |
| 149 | + "time": "2024-01-01T10:00:00.000Z", |
| 150 | + "msg": "Log message", |
| 151 | + "source": "ratos-update", |
| 152 | + "context": "update_symlinks", |
| 153 | + "errorCode": "SYMLINK_CREATE_FAILED", |
| 154 | + "pid": 1234, |
| 155 | + "hostname": "ratos-pi" |
| 156 | +} |
| 157 | +``` |
| 158 | + |
| 159 | +### Fields: |
| 160 | +- `level`: Numeric log level (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal) |
| 161 | +- `time`: ISO 8601 timestamp |
| 162 | +- `msg`: Human-readable log message |
| 163 | +- `source`: Source component (e.g., "ratos-update") |
| 164 | +- `context`: Function or operation context (optional) |
| 165 | +- `errorCode`: Standardized error code (optional) |
| 166 | +- `pid`: Process ID |
| 167 | +- `hostname`: System hostname |
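As an illustration of the format, a single-line entry with these fields could be assembled in bash roughly as follows. This is a minimal sketch, not the library's code: `json_escape` and `emit_log` are hypothetical names, only backslashes and double quotes are escaped (newlines in messages are not handled), and the milliseconds are fixed at `.000` because sub-second `date` formats are not portable.

```bash
#!/bin/bash
# Escape backslashes and double quotes so the value is safe inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# emit_log LEVEL_NUMBER MESSAGE [CONTEXT] [ERROR_CODE]
emit_log() {
    local level=$1 msg=$2 context=${3:-} code=${4:-} line
    line=$(printf '{"level":%d,"time":"%s","msg":"%s","source":"ratos-update"' \
        "$level" "$(date -u +%Y-%m-%dT%H:%M:%S.000Z)" "$(json_escape "$msg")")
    # Optional fields are only added when a value was passed.
    if [ -n "$context" ]; then
        line="$line,\"context\":\"$(json_escape "$context")\""
    fi
    if [ -n "$code" ]; then
        line="$line,\"errorCode\":\"$(json_escape "$code")\""
    fi
    printf '%s,"pid":%d,"hostname":"%s"}\n' "$line" "$$" "$(uname -n)"
}
```

For example, `emit_log 50 "Failed to create link" "update_symlinks" "SYMLINK_CREATE_FAILED"` prints one line to stdout; a real logger would presumably append it to the file named by `RATOS_LOG_FILE`.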
| 168 | + |
| 169 | +## Error Codes |
| 170 | + |
| 171 | +Standardized error codes help identify common issues: |
| 172 | + |
| 173 | +### Update Script Error Codes: |
| 174 | +- `SCRIPT_ERROR`: General script failure |
| 175 | +- `SCRIPT_SUCCESS`: Script completed successfully |
| 176 | +- `SYMLINK_CREATE_FAILED`: Failed to create symbolic link |
| 177 | +- `SYMLINK_REMOVE_FAILED`: Failed to remove symbolic link |
| 178 | +- `NODE_INSTALL_FAILED`: Node.js installation failed |
| 179 | +- `APT_UPDATE_FAILED`: Package list update failed |
| 180 | +- `EXTENSION_SYMLINK_FAILED`: Extension symlinking failed |
| 181 | +- `OWNERSHIP_CHANGE_FAILED`: File ownership change failed |
| 182 | + |
| 183 | +### System Error Codes: |
| 184 | +- `FILE_NOT_FOUND`: Required file not found |
| 185 | +- `PERMISSION_DENIED`: Insufficient permissions |
| 186 | +- `NETWORK_ERROR`: Network connectivity issue |
| 187 | +- `DISK_FULL`: Insufficient disk space |
| 188 | + |
| 189 | +## Error Handling and Retry Logic |
| 190 | + |
| 191 | +### Bash Scripts: |
| 192 | +- Automatic error trapping with `set -eE` |
| 193 | +- Stack trace capture on script failure |
| 194 | +- Graceful error reporting with context |
| 195 | +- Exit codes indicate success/failure status |
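The trapping behavior listed above can be sketched with bash's `ERR` trap. This is an assumption-laden illustration rather than the library's `setup_error_trap`: the handler name `on_error`, the `exitCode` and `line` fields, and the demo function are all hypothetical, and no stack trace is captured here.

```bash
#!/bin/bash
# Hypothetical ERR handler: reports exit status and line number as a JSON line.
on_error() {
    local status=$1 line=$2
    printf '{"level":50,"msg":"command failed","source":"ratos-update","errorCode":"SCRIPT_ERROR","exitCode":%d,"line":%d}\n' \
        "$status" "$line"
}

# The body runs in a subshell, so the options and the trap stay local to it.
guarded_demo() (
    set -eE                               # -E lets functions inherit the ERR trap
    trap 'on_error "$?" "$LINENO"' ERR
    echo "step 1 ok"
    false                                 # fails: the trap fires, then -e exits
    echo "never reached"
)
```

Running `guarded_demo` prints the first message followed by the JSON error line, and the subshell exits with a non-zero status before the last `echo`. Note that bash ignores `-e` and the `ERR` trap when the function is called as part of an `if` condition or an `&&`/`||` list.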
| 196 | + |
| 197 | +### Web UI: |
| 198 | +- Error boundaries prevent UI crashes |
| 199 | +- Automatic retry with exponential backoff |
| 200 | +- Graceful degradation when logs unavailable |
| 201 | +- User-friendly error messages |
| 202 | + |
| 203 | +### CLI: |
| 204 | +- Robust error handling for missing files |
| 205 | +- Clear error messages with suggested actions |
| 206 | +- Non-zero exit codes for scripting |
| 207 | + |
| 208 | +## Monitoring and Alerting |
| 209 | + |
| 210 | +### Log Rotation: |
| 211 | +- Automatic rotation when files exceed 10MB |
| 212 | +- Keeps 5 backup files by default |
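| 213 | +- Configurable via environment variables; script-level rotation (`RATOS_LOG_MAX_SIZE`, `RATOS_LOG_BACKUP_COUNT`) is disabled by default when writing to the main log |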
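Size-based rotation of the kind described above can be sketched as a small bash function. The name `rotate_log` and its argument order are hypothetical, the numbered-suffix scheme (`.1`, `.2`, ...) follows the rotated files mentioned under Debug Integration, and treating `0` as "disabled" mirrors the defaults listed under Configuration; the actual rotation mechanism used by RatOS is not shown in this document.

```bash
#!/bin/bash
# rotate_log FILE MAX_BYTES BACKUP_COUNT
# Rotates FILE to FILE.1, FILE.1 to FILE.2, and so on, once FILE exceeds MAX_BYTES.
rotate_log() {
    local file=$1 max=$2 keep=$3 size i prev
    # A max size or backup count of 0 disables rotation.
    if [ "$max" -le 0 ] || [ "$keep" -le 0 ]; then return 0; fi
    if [ ! -f "$file" ]; then return 0; fi
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -le "$max" ]; then return 0; fi
    # Shift older backups up by one; the oldest is overwritten.
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$file.$prev" ]; then mv -f "$file.$prev" "$file.$i"; fi
        i=$prev
    done
    mv -f "$file" "$file.1"
    : > "$file"   # start a fresh, empty log
}
```

A call such as `rotate_log /tmp/example.log 10485760 5` would correspond to a 10MB limit with five backups; the path here is a placeholder.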
| 214 | + |
| 215 | +### Performance: |
| 216 | +- Efficient JSON parsing with error recovery |
| 217 | +- Indexed log entries for fast filtering |
| 218 | +- Lazy loading for large log files |
| 219 | + |
| 220 | +## Troubleshooting |
| 221 | + |
| 222 | +### Common Issues: |
| 223 | + |
| 224 | +**Log file not found:** |
| 225 | +- Ensure update scripts have been run at least once |
| 226 | +- Check `RATOS_DATA_DIR` environment variable |
| 227 | +- Verify directory permissions |
| 228 | + |
| 229 | +**Permission errors:** |
| 230 | +- Ensure log directory is writable by the RatOS user |
| 231 | +- Check file ownership and permissions |
| 232 | +- Run scripts with appropriate privileges |
| 233 | + |
| 234 | +**Large log files:** |
| 235 | +- Log rotation should handle this automatically |
| 236 | +- Clearing the main log is disabled; force a rotation with `ratos logs rotate` instead |
| 237 | +- Adjust `RATOS_LOG_MAX_SIZE` if needed |
| 238 | + |
| 239 | +**Missing log entries:** |
| 240 | +- Check `RATOS_LOG_LEVEL` setting |
| 241 | +- Ensure scripts are using the logging library correctly |
| 242 | +- Verify JSON format of log entries |
| 243 | + |
| 244 | +### Debug Commands: |
| 245 | +```bash |
| 246 | +# Check main log file location and size |
| 247 | +ls -la /var/log/ratos-configurator.log* |
| 248 | + |
| 249 | +# View raw log file (all sources) |
| 250 | +cat /var/log/ratos-configurator.log |
| 251 | + |
| 252 | +# View only update logs |
| 253 | +grep -E '"source": ?"ratos-update"' /var/log/ratos-configurator.log |
| 254 | + |
| 255 | +# Test log parsing |
| 256 | +ratos logs update-logs summary |
| 257 | + |
| 258 | +# Force log rotation (instead of clearing) |
| 259 | +ratos logs rotate |
| 260 | + |
| 261 | +# Validate bash scripts with ShellCheck |
| 262 | +shellcheck -ax -s bash configuration/scripts/ratos-logging.sh |
| 263 | +shellcheck -ax -s bash configuration/scripts/ratos-update.sh |
| 264 | +``` |
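When the CLI is unavailable, the source and level filtering it performs can be approximated by hand with `grep` and `awk`. The sketch below assumes one JSON object per line with a numeric `level` field, as in the Log Format section; `filter_update_logs` is a hypothetical helper name, and the pattern matching is a shortcut, not a real JSON parser.

```bash
#!/bin/bash
# filter_update_logs [MIN_LEVEL] < logfile
# Prints ratos-update entries whose numeric level is at least MIN_LEVEL (default 40 = warn).
filter_update_logs() {
    local min=${1:-40}
    grep -E '"source": ?"ratos-update"' | awk -v min="$min" '
        match($0, /"level": ?[0-9]+/) {
            lvl = substr($0, RSTART, RLENGTH)   # e.g. "level":50
            sub(/^[^0-9]*/, "", lvl)            # keep only the digits
            if (lvl + 0 >= min) print
        }'
}
```

For example, `filter_update_logs 50 < /var/log/ratos-configurator.log` would list only error and fatal entries written by the update scripts.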
| 265 | + |
| 266 | +## Development |
| 267 | + |
| 268 | +### Adding New Log Sources: |
| 269 | +1. Source the logging library: `source "$(dirname "$0")/ratos-logging.sh"` |
| 270 | +2. Set up error trapping: `setup_error_trap "script-name"` |
| 271 | +3. Use logging functions: `log_info`, `log_error`, etc. |
| 272 | +4. Add appropriate error codes to documentation |
| 273 | + |
| 274 | +### Code Quality Standards: |
| 275 | +- **ShellCheck Compliance**: All bash scripts must pass ShellCheck validation |
| 276 | +- **Error Handling**: Use proper error trapping with selective `set +e`/`set -e` |
| 277 | +- **Variable Quoting**: Always quote variables and use `read -r` for input |
| 278 | +- **Exit Codes**: Use proper exit code handling and propagation |
| 279 | + |
| 280 | +### Testing: |
| 281 | +- Unit tests in `src/__tests__/update-logs.test.ts` |
| 282 | +- Integration tests for CLI commands |
| 283 | +- End-to-end tests for web UI |
| 284 | +- ShellCheck validation in CI/CD pipeline |
| 285 | + |
| 286 | +### Contributing: |
| 287 | +- Follow existing log format and error code conventions |
| 288 | +- Run ShellCheck on all bash scripts before committing |
| 289 | +- Add tests for new functionality |
| 290 | +- Update documentation for new features |
| 291 | +- Ensure backward compatibility |
| 292 | + |
| 293 | +## Security Considerations |
| 294 | + |
| 295 | +- Log files may contain sensitive information |
| 296 | +- Automatic inclusion in debug packages with user consent |
| 297 | +- No credentials or secrets should be logged |
| 298 | +- File permissions restrict access to RatOS user |
| 299 | +- Log rotation prevents unbounded disk usage |
| 300 | + |
| 301 | +## Future Enhancements |
| 302 | + |
| 303 | +- Real-time log streaming via WebSocket |
| 304 | +- Log aggregation from multiple sources |
| 305 | +- Advanced filtering and search capabilities |
| 306 | +- Integration with external monitoring systems |
| 307 | +- Automated error pattern detection |
| 308 | +- Performance metrics and trending |