Oracle Site Reliability Engineer - Oracle Ecosystem in Longmont, Colorado
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.
A BS or MS in Computer Science, or equivalent. Identifies and implements complex solutions, drawing on knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, and technology security and compliance. Identifies and implements complex solutions, drawing on an understanding of load balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting the technical architecture of complex and highly scalable products. A minimum of 8 years of experience running large-scale, customer-facing web services.
Oracle is an Affirmative Action-Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veterans status, age, or any other characteristic protected by law.
Site Reliability Engineer – Oracle Ecosystem
NOTE: We are unable to provide visa sponsorship for this role at this time. No candidates requiring visa sponsorship will be considered.
The Hospitality Cloud SRE team is focused on maximizing service reliability for our hotel product service offerings across global Oracle data centers. Our team runs with a start-up-like approach, leaving room for creative freedom. We have worked to assemble the smartest people in the industry to build and grow this revolutionary and disruptive team.
We are looking to add new members to this dynamic team and are seeking subject matter experts to design and continuously improve reliability for all components within our solution portfolio.
About The Job
As part of the SRE team, you will be continually challenged and will contribute directly, every day, to the success of our Oracle Hospitality cloud service offerings, working closely with Product and Infrastructure partners.
As an SRE, you will solve interesting technical challenges by defining, designing, deploying, and troubleshooting key Oracle Cloud services, platforms, and infrastructure, always thinking about reliability, scalability, resilience, security, and performance.
In this role, which is a mix of software, architecture and operational readiness, you will be responsible for the following:
Service Ownership– You will be part of the SRE team, whose mission is shared full-stack ownership, together with our Development partners, of a collection of services and/or technology areas.
Ownership Scope– As an SRE, you will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you own. Together with your Development partners, you will be responsible for ensuring that services are designed, delivered, and deployed to be mission critical, with a focus on security, resiliency, scale, and performance. SREs are accountable for the end-to-end performance and operability of the services they own.
Service Design– As Oracle Hospitality Cloud Services continually evolve, you will partner with development teams in defining and implementing improvements in service architecture, both current and future. As an SRE, you will be an expert at articulating the technical characteristics of your services and the dependencies between services, and you will guide Development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
Operations Engineering– You will understand and be able to communicate the scale, capacity, security, and performance attributes and requirements of the services you own. You will be expected to understand and communicate every characteristic of your service stack, such as:
• degradation and behavior under load of the services and their dependencies
• end-to-end tuning needs, optimizing resource utilization, as load patterns fluctuate
• instrumentation and metrics that clearly describe the service behaviors
• scaling requirements and patterns
• resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained
Automation– You will have a clear understanding of automation and orchestration principles, and will be eager to automate, wherever and whenever the possibility arises, while simultaneously eliminating technical debt. Automation must be part of your DNA.
Broad Interests– SREs are a rare mix of sysadmins and development engineers, and as such can understand and explain the effect of product architecture decisions on the ability of services to run as distributed systems. They are driven by professional curiosity and a desire to develop a deep understanding of their services and the technologies they depend upon.
Ideal Qualifications / Experience
BS or MS in Computer Science, or equivalent work experience
Must have hands-on experience developing and deploying large-scale HA enterprise solutions according to MAA best practices.
Ability to create, manage and administer Production/UAT/development environments for an enterprise application.
Good understanding and appreciation of the Cloud Native Computing Foundation (CNCF) Charter and cloud native technologies
Knowledge of networking and security, e.g. DNS records, load balancers (F5 / LBaaS / NGINX), subnets, TLS, SSL, SAML, etc.
Knowledge of Containers (Docker etc.), developing software to work in containers and Container orchestration technologies (Kubernetes)
Conducting performance tuning to maintain system reliability and stability.
CI/CD toolset experience a must.
Knowledge and experience of Observability and Observability enabling tools
Experience with automation/configuration management using either Puppet/Chef or an equivalent
Knowledge of areas outside your own specialty, keeping up to date with technologies and the direction in which the industry is moving
Oracle Fusion Middleware Products - Oracle Access Manager (OAM), Oracle Identity Manager (OIM) and Oracle Internet Directory (OID)
Oracle Database experience a distinct advantage
Experience / Knowledge of IT Security and compliance, PA-DSS, PCI DSS
Methodical approach to troubleshooting complex problems
Defining and documenting technical architecture of complex and highly scalable products
Most importantly, the aptitude to be a good team player and the willingness to learn and implement new Cloud technologies
At Oracle, we don’t just value differences; we celebrate them. We’re committed to creating a workplace where all kinds of people work together. We believe innovation starts with diversity and inclusion.
Job: Product Development
Title: Site Reliability Engineer - Oracle Ecosystem
Location: United States
Requisition ID: 20000RI3