Site Reliability Engineer
We are seeking an experienced and highly motivated Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in cloud infrastructure management, automation, and monitoring, with a specific focus on tools like Grafana and Terraform. In this role, you will collaborate with software engineering teams to build and maintain reliable, scalable, and secure infrastructure that supports our growing user base. This is an excellent opportunity to work on cutting-edge technologies while ensuring optimal system performance and reliability.
Essential functions
Design, implement, and maintain automated infrastructure using Terraform to manage cloud resources, ensuring scalability and reliability across multiple environments.
Build and maintain real-time monitoring dashboards and alerting systems using Grafana, ensuring that key metrics (e.g., uptime, latency, error rates) are tracked and reported across services.
Lead incident response efforts, troubleshoot issues, and perform root cause analysis (RCA) to minimize service downtime and improve system reliability.
Leverage automation tools to optimize infrastructure provisioning and management, reducing manual interventions and operational overhead.
Work closely with engineering teams to ensure that infrastructure supports application needs, such as performance, scalability, and high availability.
Continuously monitor, analyze, and improve system performance across production environments, ensuring minimal latency and optimal resource utilization.
Ensure infrastructure meets security best practices and compliance requirements.
Maintain clear and comprehensive documentation related to infrastructure, automation scripts, and system configurations.
Qualifications
Strong experience using Terraform for infrastructure provisioning and automation in cloud environments (AWS, Azure, GCP).
Proven experience using Grafana for creating monitoring dashboards, setting up alerts, and providing actionable insights into system health.
Strong background in automating operational workflows, especially using tools like Terraform and scripting languages (e.g., Python or Bash).
Familiarity with continuous integration and continuous delivery (CI/CD) pipelines and tools.
Proficiency in monitoring tools, especially Grafana, for creating and maintaining system observability solutions.
Would be a plus
Experience with Java or Python for scripting, automation, or software integration tasks.
Solid understanding of SQL and experience with performance tuning and managing databases (MySQL, PostgreSQL, or similar).
Familiarity with Quantum Metric for user session analytics and performance insights.
Experience with Splunk for log management and analysis.
Familiarity with Jira for issue tracking and project management.
Previous experience in an SRE or DevOps role within a fast-paced production environment.
Familiarity with New Relic or similar APM tools for monitoring application performance.
We offer
- 100% payroll scheme, benefits by law (IMSS, INFONAVIT, 12+ vacation days)
- Benefits above the law: Vacation premium 50%, 5 PTOs, 3 sick days, 10 guaranteed public holidays per year
- Major medical insurance, dental and vision plan for the employee and direct family members
- Minor medical insurance (Multiservicios Médicos Santander) for the employee and direct family members
- Life Insurance and funeral expenses
- 5% savings fund, uncapped (matched by the company at the end of the year)
- Grocery cards/vouchers (Vales de Despensa)
- 30 days End of the Year Bonus (Aguinaldo)
- Opportunity to work on bleeding-edge projects with a highly motivated and dedicated team all over the world
- Individual career development plan and support from the best experts
- Professional development opportunities (LinkedIn Learning, Cloud certification programs, access to corporate LMS integrated with other learning platforms)
- Well-equipped office in a business area of Guadalajara (quiet room, games room, air hockey, PS5, Nintendo Switch and Xbox Series X, pool table, ping pong, snacks, smoothies, and much more)
- Corporate social events (yoga, massages, sport tournaments, discussion panels, technical talks, lunch & learns)
- Flexible working hours
- Opportunity to relocate to another country where the company has offices
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.
Don’t see the right opportunity?
Contact us anyway and let’s talk! To apply, send your resume and cover letter to jobs@griddynamics.com
Grid Dynamics is an equal opportunity employer. We are committed to creating an inclusive environment for all employees during their employment and for all candidates during the application process.
All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on, age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. All employment is decided on the basis of qualifications, merit, and business need.