Site Reliability Engineer (SRE)
SICPA SA
Req ID: 29260
Posted on: 28 Mar 2025
Location: Huechuraba, Chile
Department: Customer Projects Deployment & Services
Job Family: Information Technology
AIM OF THE JOB:
As an SRE, you are responsible for responding to incidents and escalations. This includes on-call and escalation
support, which may be required after office hours and scheduled during weekends; a support duty roster will be
implemented. In Technical Support, you are competent in troubleshooting and investigating technical problems,
performing root cause analysis (RCA), recommending resolutions, and implementing workarounds when a software fix is
not yet available. In Solution and Observability Monitoring, you are competent in developing, customizing, and
implementing monitoring of the solution. In Continuous Delivery, you are responsible for deploying new versions of
applications. In Solution Quality Assurance, you participate with Product Dev and DevOps in development testing
activities (FAT) and drive solution testing during deployment (SAT). You proactively share knowledge with team
members and the SRE community, and you have a curious mindset, always learning new things and making improvements.
MAIN RESPONSIBILITIES AND ACTIVITIES:
- Implement solution monitoring and observability monitoring; automate detections and responses
- Implement SLI and SLO measurements and monitoring in our Solution Monitoring
- Conduct service improvement actions and review them with the team using SLI and SLO data
- Troubleshoot incidents, conduct post-incident analysis, and perform root cause analysis
- Implement workarounds to avoid recurrence of incidents, and improve monitoring detection
- Implement observability monitoring and perform distributed tracing analysis of applications
- Deploy new application releases to the preproduction and production environments
- Participate in and contribute to automation of deployment, testing, and monitoring detection
- Collaborate with the SQC team on the deployment of test automation and with DevOps on continuous delivery
- Participate in planning and review sessions with the Development, DevOps, and Platform teams
- Expand and grow the technical knowledge, skill sets, and expertise expected of an SRE
- Create and document artifacts related to SRE practices, for example good practices, patterns, customized
dashboards, workarounds, troubleshooting methods, and solution monitoring and observability improvements
PROFILE:
- College degree or technical training in Computer Science, Software Engineering, or an equivalent combination of
training and/or experience
- At least 5 years of working experience, of which at least 3 years involved software development and 2 years
involved IT operations, IT support, or basic system administration. Experience in application maintenance,
especially in application troubleshooting and in bug detection, fixing, testing, and application, is a must.
TECHNICAL SKILLS:
- Troubleshooting or debugging applications and complex systems
- Application tracing and log analysis
- Linux and virtual machines (VMs)
- Hands-on experience with shell scripting
- Application deployment and deployment tools (e.g. Jenkins)
- Competent knowledge of at least one database (understands the schema, able to perform DML using SQL)
- Programming and development in at least one programming language (e.g. Python, C, Java)
- Incident resolution, root cause analysis, and incident management
- JIRA, ITSM ticketing tools, and documentation tools (e.g. Wiki); Nagios, Splunk, Docker, OpenShift, Kubernetes;
automation (e.g. Ansible)
- English B2