Site Reliability Engineer
Remote
Full Time
Engineering
Mid Level
Zenoss is seeking an experienced Site Reliability Engineer (SRE) to join a team of engineers and architects creating a breakthrough ITOps and AIOps platform. We are looking for individuals with experience and knowledge in building software and tools that support our Ops and Support teams, who provide SaaS offerings in cloud and microservices-based architectures.
Zenoss has a culture that encourages its employees to ask questions, challenge assumptions, and dig in to address the problems posed by highly distributed cloud and SaaS software systems. As an SRE, you will be responsible for the architecture, design, implementation, and delivery of features and functionality used in supporting, operating, and monitoring Zenoss Cloud. We offer a healthy work/life balance, a positive work environment, and a host of amenities to enable our teams to do their best work. This is an ideal role for a self-motivated professional with a passion for cutting-edge technology and creative problem solving.
This position can be remote (work from home) or based in our Austin, TX office.
Responsibilities:
- Develop, deploy, operate, and support cloud infrastructure primarily utilizing GCP
- Work with development, operations, and support personnel to identify, isolate, and diagnose issues; handle support escalations; and plan and deliver high-value monitoring and alerting features
- Review technical designs and information, automate processes through scripting, install and configure software, and validate technical environments
- Maintain the security and availability of several applications on an ongoing basis, based on business requirements and adhering to tight operations, security, and procedural models
- Ensure production-level systems are running at all times and have multiple levels of redundancy to meet committed SLAs
- Apply professional-level technical skill and judgment to provide non-routine technical support for production operations to drive optimal performance, reliability, redundancy, and scale
- Document environment topology and installation details along with incident reviews
- Automate tasks using scripting and configuration management systems
- Communicate highly technical information to both technical and non-technical personnel
- Work with customers to troubleshoot and resolve technical issues
- Troubleshoot network performance issues, perform intrusion monitoring, and maintain disaster recovery procedures
- Plan for and recommend capacity expansion, upgrades, patches, and new applications and equipment when necessary
- Participate in the development of information technology and infrastructure projects
- Document and thoroughly understand the application architecture and system configuration across platforms
- Determine the root cause and duration of an outage, and recommend steps to resolve the underlying issues
- Provide 24x7 support for all network and server systems that are pivotal to production
Qualifications:
- Bachelor's degree in Computer Science/Engineering or equivalent relevant experience
- 3-6 years of professional hands-on experience with cloud production environments hosted on GCP, using Bigtable, BigQuery, Dataflow, GKE, and other GCP services
- Experience with CI/CD tools such as Spinnaker and Jenkins, and with cloud-based software development and delivery processes and methodologies
- Strong scripting skills and demonstrated ability to automate tasks (SaltStack, Python, Terraform preferred)
- Strong understanding of networking, firewalls, load balancers, and databases
- Strong verbal and written communication skills
- Project- and task-oriented, with a focus on detail and an ability to proactively communicate detailed status to the customer and project team
- Strong organization skills and an ability to work both within a team and independently
- Ability to make sound decisions based on customer needs and technical knowledge
- Self-motivated and able to work under pressure to deliver high-quality solutions
- Ability to work after hours, including nights and weekends, when required