Cloud / Platform / Observability Engineer
Backend → Kubernetes → Production Reliability
Now teaching cloud-native engineering (2024.03 ~ now)
I build reliable infra, observable systems, and engineers who can run them.
About Me [Eng]
I’m not “someone who deploys YAML.”
I’m someone who has killed production by accident, revived it at 3AM, and then made sure it won’t wake anyone up again.
- 2 years of hands-on work as a Software / Platform Engineer
 (backend services, AWS, containers, Kubernetes monitoring products)
- Currently a technical instructor for cloud-native engineering bootcamps
 (Java/Spring, Cloud, Kubernetes, IaC, CI/CD, GitOps, Ops culture)
I focus on:
- Production reliability (not just “it runs,” but “it survives”)
- Observability as a first-class requirement
- Teaching people how to operate, not just deploy
I build environments you can trust, even if you’re half-asleep and on-call.
Software / Platform Engineer
- Built and operated e-commerce backend microservices on AWS.
- Joined a Kubernetes monitoring product team:
  - Provisioned and maintained multiple K8s environments for agent developers and QA.
  - Ran agent load tests to measure CPU/memory/network impact.
  - Tuned resource usage so customer clusters stayed stable.
 
- Reverse-engineered competitors, identified feature gaps, and turned them into roadmap items and demos.
- Supported on-prem / customer installs and led hands-on troubleshooting sessions.
Technical Instructor (Mar 2024 – present)
- Teach Java / Spring Boot, Cloud (AWS/Azure/GCP), Docker/K8s, IaC, CI/CD, GitOps.
- Train students on delivery the way real product teams ship: planning → release → monitoring → RCA → iteration.
- Mission-driven: proving (with execution, not degrees) that nontraditional engineers can deliver production-grade work, and even mentor others.
A personal lab where I rebuild “serious” infra from scratch, as code.
- IaC + GitOps + Observability stack bootstrapped end-to-end
- SLO / SLI / Error Budget culture simulated like a real on-call team
- FinOps / DR / Policy / Supply Chain Security wired in from day 0
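
To make the SLO / SLI / Error Budget bullet concrete, here is a minimal sketch of the arithmetic an on-call team might track. It is an illustration only: the `ErrorBudget` class, its field names, and the example numbers are hypothetical and are not taken from any repo listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: request-based error budget over one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLI

    @property
    def sli(self) -> float:
        """Measured success ratio (the SLI) over the window."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing has failed
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """How many failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left; negative means overspent."""
        allowed = self.allowed_failures
        if allowed == 0:
            # No traffic or a 100% target: there is no budget to spend.
            return 1.0 if self.failed_requests == 0 else float("-inf")
        return 1 - self.failed_requests / allowed


if __name__ == "__main__":
    # Assumed example numbers, not real measurements.
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {budget.sli:.5f}")
    print(f"Allowed failures: {budget.allowed_failures:.0f}")
    print(f"Budget remaining: {budget.budget_remaining:.0%}")
```

With a 99.9% target over one million requests, roughly 1,000 failures are allowed, so 250 failures leaves about three quarters of the budget. A team could use a number like this to decide whether to keep shipping or to pause for reliability work.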
| Project | Link | What it is | 
|---|---|---|
| Digital Asset Exchange Infra | 2025-demo-01 | 24/365 “never stop trading” infra: EKS, MSK (Kafka), Aurora, ClickHouse, Istio, ArgoCD. Multi-repo, infra-as-code, Binance-style reliability simulation. |
| PromQL Assistant CLI (Text → PromQL via LLM) | promql-assistant-cli | Turn “show pods above 90% CPU” into valid PromQL. Built for on-call humans who don't want to fight dashboards at 2AM. |
| hello-ebpf-demo (Kernel tracing / eBPF) | hello-ebpf-demo | Trace kernel-level events with eBPF and ship them to user space via a Go loader. For performance / runtime visibility. |
| LLM Observability Stack (Datadog + OTel + Llama3 + Grafana) | datadog-llm-workshop | Treat LLM pipelines like production systems: latency, token cost, RAG path, failure hotspots, all observable. |
| Hybrid MLOps Platform | hybrid-mlops-demo | Cloud + On-Prem ML pipeline. Airflow, MLflow, Ray Serve (GPU), EKS. Training + inference + metrics in one workflow. |
| Area | Repo | Notes | 
|---|---|---|
| Terraform / IaC Lab | terraform-playground | Terraform experiments for network/cluster provisioning and multi-env patterns. | 
| Kubernetes Lab | kubernetes-playground | Namespace strategy, ArgoCD sync, multi-cluster ops patterns. | 
“Perfect software doesn’t exist.
Reliable infrastructure does, and it must be code.
And that code must prove itself through observability.”
- Sophie
About Me [Kor]
I’m not someone who says “I’ve played around with Kubernetes a bit.”
I’m someone who has broken a service at least once, revived it myself,
and now stakes everything on becoming the person who makes sure nobody has to wake up before dawn again!
- 2 years of experience as a Backend Engineer & Platform Engineer
 (e-commerce MSA on AWS → moved to a Kubernetes monitoring product team)
- Teaching technical courses at hands-on bootcamps since 2024.03
 (Java/Spring, cloud, Kubernetes, IaC, CI/CD, GitOps, and ops culture)
What I relentlessly obsess over:
- Not “it runs for now” but “it survives even when it blows up”
- Treating observability as a requirement, not a feature
- Teaching how to operate, not just how to deploy
“An environment you can trust even when someone wakes you up at 3AM”
I’m the person who designs it, builds it, and passes it on.
Backend Engineer & Platform Engineer
- Built and operated e-commerce MSA backends on AWS
- Later moved to a K8s monitoring product team
  - Provided development environments for agent development by building out multiple K8s products
  - Load-tested agent resource usage (memory/CPU/network)
  - Tuned resources to keep customer clusters stable
 
- Analyzed competitor solutions → defined feature gaps → carried them through to improvement proposals and demos
- Supported installs in on-prem customer environments and joined live troubleshooting (a real battlefield)
Bootcamp Instructor (2024.03 ~ present)
- Teach Java / Spring Boot backend, public cloud (AWS/Azure/GCP), Docker / Kubernetes, IaC, CI/CD, GitOps
- Teach the sequence real product teams actually work in:
 planning → release → monitoring → RCA → improvement
- The goal is to produce not “someone who can press the deploy button”
 but “someone who can take responsibility for a service”
I want my own career to prove that people from non-CS backgrounds can take responsibility for real production too.
I’m especially interested in roles where I can show that I deliver results through execution and impact while also teaching others. My work as an instructor is part of that mission. I help students reach real production-level competence, and I have built up experience in the classroom to prove that technical depth and leadership are not limited just because someone’s “background is different.”
SophieLabs is the personal research environment I build myself.
The goal is simple:
“Don’t imitate a real company. Just run things like one myself.”
- Automate the full IaC + GitOps + Observability pipeline
- Simulate ops culture such as SLO / SLI / Error Budget in code
- Research ways to build FinOps / DR / Policy / Supply Chain Security into the structure from the start
| Project | Link | Description |
|---|---|---|
| Digital Asset Exchange Infra | 2025-demo-01 | Experiment in 24/365 non-stop digital asset exchange infra. The full stack (EKS, MSK (Kafka), Aurora, ClickHouse, Istio, ArgoCD, and more) is managed as code. (10+ repo structure) |
| PromQL Assistant CLI (Natural language → PromQL) | promql-assistant-cli | A CLI that turns sentences like “Which pods are over 90% CPU?” straight into PromQL. A tool that saves on-call humans before dawn. |
| hello-ebpf-demo (Kernel tracing / eBPF) | hello-ebpf-demo | Traces kernel-level events with eBPF and passes them to user space via a Go loader. Aimed at performance/security visibility. |
| LLM Observability Stack (Datadog + OTel + Llama3 + Grafana) | datadog-llm-workshop | Instead of leaving the LLM call chain as “AI magic,” makes latency / token cost / RAG path / failure points all visible. |
| Hybrid MLOps Platform | hybrid-mlops-demo | Mixed on-premises + cloud ML pipeline. Ties Airflow, MLflow, Ray Serve (GPU), and EKS into one workflow for training/inference/monitoring. |
| Area | Repo | Description |
|---|---|---|
| Terraform / IaC Lab | terraform-playground | Network/cluster provisioning with Terraform and multi-environment pattern experiments. |
| Kubernetes Lab | kubernetes-playground | Validates namespace strategy, ArgoCD sync patterns, and multi-cluster operations approaches. |

