SITE RELIABILITY ENGINEER (MID-LEVEL)
Hybrid Work Pattern Available
ABOUT THE ROLE
We are looking for a mid-level Site Reliability Engineer (SRE) to help transition our incident management from reactive firefighting to proactive reliability engineering.
You will play a key role in improving observability, reducing incident frequency, and helping engineering teams understand how systems behave in production.
KEY RESPONSIBILITIES
• Own and improve monitoring, alerting, and observability for production systems
• Lead or contribute to incident investigations and postmortems
• Design alerts based on symptoms and user impact rather than infrastructure noise
• Use observability tools to analyze performance, errors, and traffic patterns
• Identify reliability risks before they turn into incidents
• Improve runbooks, on-call processes, and operational readiness
• Work closely with software teams to improve system resilience
• Automate repetitive operational tasks
REQUIRED SKILLS AND EXPERIENCE
• Strong Linux experience in production environments
• Hands-on experience with at least one major cloud provider (AWS preferred)
• Solid understanding of monitoring, alerting, and incident response
• Experience with observability tools (New Relic, Prometheus, Datadog, etc.)
• Scripting or automation experience (Bash, Python, or similar)
• Understanding of distributed systems fundamentals
• Comfortable participating in on-call rotations
NICE TO HAVE
• Experience with Infrastructure as Code (Terraform, CloudFormation, etc.)
• Experience with containers or orchestration (ECS, Kubernetes, Docker)
• Experience supporting PHP, Node.js, or similar application stacks
• Familiarity with SRE concepts such as SLIs, SLOs, and error budgets
WHAT SUCCESS LOOKS LIKE
• Reduced number of repeat incidents
• Clear and actionable alerts
• Faster detection and resolution of incidents
• Improved visibility into system health and performance
• Engineering teams that trust monitoring data
WHY JOIN US
• Real influence over how reliability is implemented across the company
• Work on systems operating at meaningful scale
• Opportunity to grow into Senior SRE or SRE Tech Lead roles
• Strong focus on engineering quality rather than ticket volume
Flexible working arrangements: our hybrid model lets you work both from our Sofia office and from home.
Knowledge of foreign languages:
Proficiency in English at level B1 or higher of the Common European Framework of Reference for Languages (CEFR).
- Department: Engineering
- Locations: Sofia
- Remote status: Hybrid
About Shkolo
With over 1,700 schools and more than 1 million users, Shkolo is Bulgaria's leading Management Information System (MIS) provider.
Now a proud member of the Juniper Education group, Shkolo is expanding its products to over 16,000 schools worldwide.
At Shkolo, we are revolutionizing education by leveraging cloud-based technology to enhance school efficiency, reduce teacher workload, and improve student outcomes.
Our passionate team is dedicated to making a meaningful and lasting impact on education at every level.