Senior CloudOps Engineer

Pune, Maharashtra, India
Full Time
Research and Development
Mid Level

Onit, Inc. is looking for a Sr. CloudOps Engineer to join our team in Pune to help manage and maintain a diverse infrastructure across numerous geographical locations. To be successful in this role, great people skills are a must, as well as a passion for technology. The individual we seek is bright, creative and a problem solver. You must be able to multi-task in a fast-paced environment and be a self-starter with the ability to work independently. 

 

Responsibilities 

  • Responsible for optimizing performance, ensuring security, and driving innovation in our cloud environment while responding to infrastructure and security alerts in a 24x7x365 operation. 

  • Create automation, runbooks, and playbooks to help others support the infrastructure 

  • Troubleshoot infrastructure and application-level issues and collaborate with support specialists and Cloud Operations / SRE

  • Write and present a weekly report highlighting the previous week’s alerts, with detailed analysis, resolution, and any impact on SLAs.

  • Monitor performance and capacity of Onit systems. 

  • Monitor for hardware, software and environmental alerts or malfunctions. 

  • Monitor security alerts from multiple sources. 

  • Triage and troubleshoot problems as they arise, following runbooks and standard operating procedures. 

  • Track all issues from start to finish and document all resolutions in detail in both the trouble-ticketing system and the engineering runbooks.

  • Escalate issues to InfraOps/DevOps engineers and Onit management.

  • Be ready to work in shifts.

Requirements 

  • Bachelor’s degree in Computer Science or equivalent experience is required. 

  • 4+ years’ experience with Red Hat Enterprise Linux or Amazon Linux 2023 is required.

  • 3+ years’ hands-on experience with AWS (EC2, S3, RDS, VPC, CloudWatch, CloudTrail, IAM, EKS, ECS, Security, etc.)

  • A solid understanding of the components that make up production systems (Memory, CPU, Disk space, Disk i/o, Network i/o, etc.) is required. 

  • Strong experience with monitoring, alerting, and log aggregation tools: Datadog, AWS CloudWatch, PagerDuty, Statuspage. 

  • Experience with SIEM/event correlation systems such as Elastic, Splunk, ELK, etc. is required.

  • Strong understanding of AWS security and monitoring and experience implementing best practices. 

  • Ability to read and interpret application server logs, outputs, CloudTrail and other critical logging output 

  • Experience working with relational databases such as Postgres or AWS RDS is a plus

  • Hands-on experience working in Kubernetes is a plus 

  • Experience with Enterprise Web applications in production  

  • Experience with a programming language such as Python is a plus

  • Excellent troubleshooting skills required. 

  • Ensure resource availability and allocation 

  • Excellent written and verbal communication skills required. 

  • Experience using Git (GitLab a plus) and CI/CD pipelines (e.g., Jenkins)
