Staff SRE

Atlanta, GA, US

We are looking for a highly experienced, Splunk-focused Staff Site Reliability Engineer to join our team and work on exciting projects. This position requires you to be local and work on-site 4 days a week.

Essential functions

  1. Splunk Administration & Management
    Deploy, configure, and manage Splunk Cloud and Splunk Observability Suite.
    Ensure proper indexing, data ingestion, parsing, and retention policies.
    Manage user roles, authentication, and security controls within Splunk.

  2. Monitoring & Observability
    Implement and maintain dashboards, alerts, and reports using Splunk Observability tools (APM, Infrastructure Monitoring, RUM).
    Optimize log ingestion and monitoring strategies for real-time insights.
    Work with development and operations teams to enhance application and infrastructure observability.

  3. Performance Optimization
    Fine-tune Splunk queries, reports, and dashboards to ensure optimal performance.
    Optimize data models, searches, and indexing strategies to improve efficiency.
    Troubleshoot slow searches and resolve performance bottlenecks.

  4. Automation & Integration
    Automate Splunk administration tasks using scripts (Python, Bash, etc.).
    Integrate Splunk with third-party tools, including cloud services (AWS, Azure, GCP).
    Develop automation for log onboarding and data normalization.

  5. Collaboration & Support
    Work closely with DevOps, SRE, Security, and Application teams to understand logging and monitoring needs.
    Provide training and documentation for internal users to maximize Splunk usage.
    Support troubleshooting efforts for production incidents using Splunk insights.

  6. Capacity Planning & Upgrades
    Monitor system health and plan for scaling as log volumes grow.
    Manage Splunk upgrades, patches, and new feature rollouts.
    Stay updated with Splunk best practices and emerging technologies.

Qualifications

Splunk Expertise: 

  • 3+ years of hands-on experience in Splunk administration and management.
  • Strong understanding of Splunk Cloud, Splunk Observability Suite, and log ingestion pipelines.
  • Experience configuring and managing Splunk indexing, parsing, and retention policies.
  • Experience optimizing Splunk services to improve performance and cost efficiency.
  • Experience implementing governance and guardrails to ensure compliance with service limits.
  1. Monitoring & Observability:
    Proficiency in implementing dashboards, alerts, and reports using Splunk APM, Infrastructure Monitoring, and RUM.
    Experience optimizing log ingestion and monitoring strategies for real-time insights.
    Familiarity with observability best practices and troubleshooting performance issues.
  2. Performance Optimization & Troubleshooting:
    Ability to fine-tune Splunk queries, dashboards, and reports for performance efficiency.
    Strong experience in optimizing data models, searches, and indexing strategies.
    Expertise in troubleshooting Splunk-related performance bottlenecks and slow searches.
  3. Automation & Scripting:
    Experience in automating Splunk administration tasks using Python, Bash, or similar scripting languages.
    Knowledge of integrating Splunk with cloud services such as Azure, AWS, or GCP.
    Ability to develop automation for log onboarding and data normalization.
  4. Cloud & Infrastructure Knowledge:
    Hands-on experience with Azure services, including AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, and Service Bus.
    Understanding of cloud-based monitoring and logging best practices.
  5. Collaboration & Support:
    Ability to work cross-functionally with DevOps, SRE, Security, and Application teams.
    Strong documentation skills for creating internal training and operational runbooks.
    Experience supporting production incident troubleshooting using Splunk insights.
  6. Capacity Planning & Upgrades:
    Ability to monitor system health, plan for scaling as log volumes grow, and manage Splunk upgrades.
    Familiarity with Splunk patches, new feature rollouts, and best practices.

Would be a plus

  • Experience with Other Observability Tools: Hands-on knowledge of Prometheus, Grafana, New Relic, and Splunk integrations.

  • Programming & Development Skills: Experience with Java, TypeScript, or Python for backend and observability enhancements.
    Familiarity with microservices architecture and API development.

  • Security & Compliance Knowledge: Understanding of security best practices in monitoring and logging.
    Experience implementing RBAC and authentication policies in Splunk.

  • Multi-Cloud & Hybrid Cloud Exposure: Experience with hybrid or multi-cloud environments, including on-premise Splunk deployments.

  • Certifications: Splunk Certified Admin or Splunk Certified Architect.
    Azure, AWS, or GCP certifications related to cloud observability or administration.

  • Kubernetes & Container Observability: Understanding of Kubernetes logging and monitoring within AKS.
    Experience managing logs from containerized environments.

  • Performance Engineering & Optimization: Experience optimizing Splunk resource consumption and query efficiency in high-volume environments.

We offer

  • Opportunity to work on cutting-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Benefits package - medical insurance, vision, dental, etc.
  • Corporate social events
  • Professional development opportunities
  • Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Grid Dynamics is an equal opportunity employer. We are committed to creating an inclusive environment for all employees during their employment and for all candidates during the application process.

All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on, age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. All employment is decided on the basis of qualifications, merit, and business need.
