Ops Profile · Work Characteristics

How I Work

Systematic. Evidence-driven. Built to repeat.

This isn't a list of tools I've used — it's how I actually operate. The way I handle a ticket on day one is the same way I handle one in year five. These are the patterns that show up in everything I do.

Ticket Workflow
01 Triage: impact · scope · time
02 Confirm: reproduce · validate
03 Correlate: logs · tools · changes
04 Hypothesize: 1–3 likely causes
05 Test Safely: smallest move first
06 Fix: with controls
07 Verify: prove + no regression
08 Repeatable Playbook: story · evidence · reuse
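Treated as a data structure, the workflow is an ordered pipeline where no stage advances until its exit criterion is met. A minimal Python sketch of that idea; the check callables are illustrative placeholders, not a real ticketing API:

```python
# Illustrative sketch: the eight stages as an ordered pipeline.
STAGES = [
    ("Triage",              "impact, scope, and time recorded"),
    ("Confirm",             "issue reproduced or validated"),
    ("Correlate",           "logs, tools, and recent changes cross-referenced"),
    ("Hypothesize",         "one to three likely causes written down"),
    ("Test Safely",         "smallest move tried first"),
    ("Fix",                 "remediation applied with controls"),
    ("Verify",              "fix proven, no regression"),
    ("Repeatable Playbook", "story, evidence, and reuse captured"),
]

def work_ticket(ticket: dict, checks: dict) -> None:
    """Advance through the stages in order; a stage that has not met
    its exit criterion stops the pipeline rather than being skipped."""
    for stage, criterion in STAGES:
        if not checks[stage](ticket):
            raise RuntimeError(f"{stage} incomplete: {criterion}")
        ticket.setdefault("history", []).append(stage)
```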
Evidence-First Troubleshooting
🔗 Log Correlation
RMM (ConnectWise Automate, NinjaOne), EDR, backup console, event logs, and network — I cross-reference all of them before I trust any single one.
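A minimal sketch of the correlation step, assuming events have already been exported and normalized into dicts (the sources, timestamps, and messages here are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical normalized events; in practice these would be exports
# from the RMM, EDR, backup console, and Windows event logs.
events = [
    {"source": "RMM",      "time": "2024-03-01T09:13:55", "msg": "patch job started"},
    {"source": "EDR",      "time": "2024-03-01T09:14:02", "msg": "process blocked"},
    {"source": "EventLog", "time": "2024-03-01T09:14:10", "msg": "service crashed"},
]

def unified_timeline(events: list[dict]) -> list[dict]:
    """Put every source on one clock before trusting any of them."""
    return sorted(events, key=lambda e: datetime.fromisoformat(e["time"]))

def corroborating(events: list[dict], anchor: dict,
                  window: timedelta = timedelta(seconds=30)) -> list[dict]:
    """Events from other sources near an anchor event: candidates
    for cause and effect, not proof by themselves."""
    t = datetime.fromisoformat(anchor["time"])
    return [e for e in events if e["source"] != anchor["source"]
            and abs(datetime.fromisoformat(e["time"]) - t) <= window]
```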
🗺 Scope Mapping
One device, one site, or everywhere? Knowing the blast radius before touching anything changes everything about how I respond.
📸 Before / After Snapshots
I establish a clean baseline before making any change. That snapshot is the proof the fix worked — and the proof I didn't break anything else.
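A minimal sketch of the snapshot idea in Python. File hashes stand in here for whatever state actually matters in a given environment (service states, config versions, and so on):

```python
import hashlib
import pathlib

def snapshot(paths: list[str]) -> dict[str, str]:
    """Baseline as file hashes; the simplest stand-in for real state."""
    return {p: hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
            for p in paths}

def diff(before: dict, after: dict) -> dict:
    """The proof artifact: exactly what changed between snapshots."""
    return {
        "added":   sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys()
                          if before[k] != after[k]),
    }
```

An empty diff outside the intended change is the "didn't break anything else" half of the proof.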
🧭 Root Cause Narrative
I don't close a ticket with "issue resolved." I close it with the full chain of what happened, why, and what proved the fix held.
Verification Checks
Post-fix validation isn't optional. I confirm the fix did what it was supposed to do — and nothing else changed that shouldn't have.
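One way to make that concrete is a set of named yes/no checks that must all pass before the ticket closes. A sketch, with a hypothetical hostname:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Cheapest liveness probe: can the service's port be reached?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def verify(checks: dict[str, bool]) -> None:
    """Every named check must pass; any failure blocks closure."""
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        raise RuntimeError(f"post-fix verification failed: {failed}")

# Usage sketch; the hostname is hypothetical.
verify({
    "app port reachable": port_open("app01.internal.example", 443),
    "no unintended changes": True,  # e.g. an empty snapshot diff
})
```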
Rollback-Ready Fixes
The rollback plan is defined before I deploy — not after something breaks. Every change I make has a clear way back.
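One way to encode that habit: every forward action registers its inverse before it runs, so the way back always exists first. A minimal sketch with placeholder actions:

```python
# Sketch: rollback is registered before the change executes.
undo_stack = []

def apply_change(action, undo) -> None:
    undo_stack.append(undo)   # the way back exists before the change
    action()

def rollback() -> None:
    while undo_stack:
        undo_stack.pop()()    # unwind in reverse order

# Hypothetical usage: the real actions would be tool-specific.
apply_change(lambda: print("set DNS to 10.0.0.53"),
             lambda: print("restore DNS to 10.0.0.2"))
```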
Production-Safe Scripting
Before anything I write touches production, I run through this checklist internally, even for "quick" scripts. A sketch of the full pattern follows the list.
Pre-check validation — confirm the environment is what I expect before execution begins
Verbose logging — every action is recorded so I can prove what ran and when
Guardrails — no risky defaults, no silent failures, no assumptions about target state
Post-check verification — the script confirms its own success before reporting done
Rollback logic — defined before I run it, not discovered after something breaks
Idempotent behavior — safe to re-run without causing damage if it fires twice
Even when it's "just a script" — I write it like it might be audited later. Because sometimes it is.
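Here is a minimal skeleton of that checklist in Python. A config-file change stands in for any production action; the target path and desired state are illustrative, not a recommendation:

```python
"""Skeleton of the production-safety checklist above."""
import logging
import pathlib
import shutil
import sys

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[logging.FileHandler("change.log"), logging.StreamHandler()],
)  # verbose logging: prove what ran and when
log = logging.getLogger("change")

TARGET = pathlib.Path("/etc/app/app.conf")  # hypothetical target
DESIRED = "max_connections = 200\n"

def precheck() -> None:
    """Pre-check validation: refuse to run if the environment is not
    what the script expects (guardrail: no assumptions, no guessing)."""
    if not TARGET.exists():
        log.error("pre-check failed: %s is missing", TARGET)
        sys.exit(1)

def apply() -> None:
    if TARGET.read_text() == DESIRED:
        log.info("already in desired state; nothing to do")  # idempotent
        return
    backup = TARGET.with_name(TARGET.name + ".bak")
    shutil.copy2(TARGET, backup)  # rollback defined before the change
    log.info("rollback point written to %s", backup)
    try:
        TARGET.write_text(DESIRED)
        if TARGET.read_text() != DESIRED:  # post-check verification
            raise RuntimeError("post-check mismatch after write")
        log.info("change applied and verified")
    except Exception:
        shutil.copy2(backup, TARGET)  # rollback logic fires
        log.exception("change failed; original state restored")
        raise

if __name__ == "__main__":
    precheck()
    apply()
```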
Process vs. Procedure
Process — Always the Same
The Consistent Method
Triage the signal
Validate scope and impact
Remediate with controls
Verify the resolution
Document for reuse
This never changes. It doesn't matter what tools are in use, what client it is, or what the stack looks like.
vs
Procedure — Tool-Specific
How It Gets Done
NinjaOne RMM script
ConnectWise Automate script
SentinelOne exclusion rule
Firewall rule change
GPO enforcement
This changes with every environment. It's just the execution layer — not the thinking behind it.
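The split maps cleanly onto an interface: the process is a fixed sequence, and each tool plugs in underneath as a procedure. A sketch, where the NinjaOne class is a placeholder, not the real vendor API:

```python
from abc import ABC, abstractmethod

class Procedure(ABC):
    """The execution layer; concrete classes wrap whatever the
    environment provides. Method names here are illustrative."""
    @abstractmethod
    def remediate(self, target: str) -> None: ...
    @abstractmethod
    def verify(self, target: str) -> bool: ...

class NinjaOneProcedure(Procedure):
    # Placeholder body, not the real NinjaOne API.
    def remediate(self, target: str) -> None:
        print(f"queue remediation script for {target}")
    def verify(self, target: str) -> bool:
        return True

def run_process(proc: Procedure, target: str) -> None:
    """The consistent method: the same steps regardless of which
    Procedure is plugged in underneath."""
    # triage and scope validation happen before this point
    proc.remediate(target)               # remediate with controls
    if not proc.verify(target):          # verify the resolution
        raise RuntimeError("verification failed")
    # document for reuse would follow here
```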
Severity & Response Posture
● P1 — Critical
Restore. Contain. Communicate.
Restore service or contain the blast — pick one, move fast
Communicate status early and often — no one likes silence
Document everything as it happens, not after
Post-incident review within 24 hours
● P2 — Elevated
Stabilize. Prevent Recurrence.
Stabilize first — make it stop getting worse
Identify the pattern and break it
Flag automation opportunity if this is repeatable
Full root cause before closure
● P3 — Standard
Optimize. Prevent. Document.
Fix it right, not just fast
Check if there's a preventative control worth adding
Add to backlog if it needs a proper solution
Turn the fix into a reusable note
Noise Reduction & Signal Tuning
Mean Time to Identify
Faster signal recognition through alert discipline
Reduced false positive noise · ↓ Alert fatigue
Tuned alert thresholds per client · ↑ Fidelity
Cross-tool correlation on intake · ↓ MTTI
Scope confirmed before escalating · Structured
Mean Time to Resolve
Faster resolution through reusable ops patterns
Playbook-driven repeat incidents · ↓ MTTR
Evidence-built hypothesis list · ↓ Guesswork
Layered triage, basic pass first · ↑ Efficiency
Post-fix verification before close · No callbacks
Repeatability Engine
🎫 Ticket
Problem lands. Box it immediately.
📝 Notes
Evidence logged as I go — not reconstructed after.
🔍 Pattern
Seen this before? Check the archive first.
📚 Playbook
If a pattern exists — follow it. If not, build it.
↓   output feeds back into the system   ↓
⚙️ Script / Automation
If it ran manually twice — it should run automatically.
👁 Monitoring Check
Add a watch signal so the next occurrence is caught early.
📋 Standard
Document the resolution as a team reference.
🔁 Faster Next Time
Every ticket makes the next one cheaper to solve.
Audit-Ready Documentation
My documentation standard
🕐 Chain-of-custody mindset
📅 Timeline building
🔎 Evidence-backed conclusions
📜 Compliance mapping awareness
🗂 Defensible ticket narratives
📖 Reusable KB format
Toolchain as Systems Thinking
EDR / MDR / Email
SentinelOne
Huntress
Webroot
Proofpoint
KnowBe4
Cisco Umbrella
RMM / PSA
ConnectWise Automate
ConnectWise Control
ConnectWise Manage
NinjaOne
Autotask PSA
Network
Cisco Meraki MX
SonicWall
pfSense
WatchGuard
Wireshark
LANSurveyor
Backup / DR
Datto
Axcient
Veeam
StorageCraft
Monitoring / SIEM
Zabbix
Grafana
Nagios
Graylog
OpenVAS
Identity / Cloud
Azure / Entra ID
Microsoft 365
Active Directory
Group Policy (GPO)
Exchange Server
How I Communicate Under Pressure
My incident update format — every status message follows this structure so stakeholders always know exactly where things stand.
Current Impact
What is broken right now and who is affected — users, systems, functions. No jargon.
What We Know
Confirmed facts only. What the evidence says. No speculation in a status update.
What We're Testing Next
Next action and why. Shows I have a plan and I'm not just waiting for something to fix itself.
ETA for Next Update
A specific time. Even "30 minutes" is better than silence. People can work around a timeline.
Mitigation in Place
What's been done to reduce impact while the root fix is in progress. Workarounds, containment, manual fallback.
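To show the shape, here is the same five-field format as a small structure that renders a status message. The incident content is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class IncidentUpdate:
    """The five fields above, as a structure every update must fill."""
    current_impact: str
    what_we_know: str
    testing_next: str
    next_update_eta: str
    mitigation: str

    def render(self) -> str:
        return "\n".join([
            f"CURRENT IMPACT: {self.current_impact}",
            f"WHAT WE KNOW: {self.what_we_know}",
            f"TESTING NEXT: {self.testing_next}",
            f"NEXT UPDATE: {self.next_update_eta}",
            f"MITIGATION IN PLACE: {self.mitigation}",
        ])

# Example with hypothetical incident content.
print(IncidentUpdate(
    current_impact="Outbound email delayed for all Site A users.",
    what_we_know="Queue backlog began 09:14, right after the 09:00 filter rule change.",
    testing_next="Rolling back the filter rule to confirm it is the cause.",
    next_update_eta="10:30, or sooner if the rollback completes.",
    mitigation="Outbound mail rerouted through the secondary gateway.",
).render())
```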