DevOps/Site Reliability Engineer in Addison, TX at Signature Consultants

Date Posted: 7/21/2020

Job Description

DevOps/Site Reliability Engineer:

Signature Consultants has an opportunity for a DevOps/Site Reliability Engineer. As a member of the Site Reliability and Platform Engineering Team, this individual will act as a subject matter expert in the discipline of Site Reliability Engineering. The candidate will work closely with the DevOps Product Owner to define and establish a roadmap of activities to mature the capabilities of the Site Reliability Engineering practice. They will work with the application, infrastructure, and security teams to establish a robust monitoring and notification scheme that gives IT staff visibility into the health and availability of business-critical applications. The candidate will perform hands-on design, development, testing, documentation, and deployment of various monitors, automation, reports, and dashboards. They will collaborate closely with the Enterprise Monitoring and ServiceNow Group to evolve and develop the monitoring, alerting, and auto-remediation capabilities needed to support the Site Reliability Engineering discipline. They will also provide training and mentoring to other team members to develop the skills and competencies of our regional COE teams. This role will be expected to help triage major incidents, assist in root-cause analysis, and use that information to drive remediation activities that increase system stability and reliability. In addition to the Site Reliability discipline, this individual will also help manage, maintain, and support the Mulesoft API integration platform.

Qualifications:

  • Demonstrated ability to take a high-level objective, understand the "why" behind it, take ownership of it, break it down into execution tasks, identify dependencies, clarify and confirm critical aspects of the plan with a supervisor, and set reasonable commitment dates for delivery
  • Ability to take on a new technology platform or system, self-learn through generally available documentation or vendor support resources, set up a POC of the technology, and assess the critical features and functionality of the technology for fitness of use and purpose
  • Ability to provide practical, maintainable technical solutions that scale
  • Broad experience with the design and implementation of application and infrastructure monitoring through APMs, web synthetic monitors, ticketing, notification, reporting and dashboarding platforms
  • Excellent troubleshooting skills and knowledge of the infrastructure, middleware, and application layers
  • Very comfortable using diagnostic tools such as Fiddler, Chrome DevTools, SPLUNK, Dynatrace Synthetics, etc.
  • Strong background in SPLUNK is required.
  • Strong experience supporting complex, large-scale, service-oriented, web-based transactional systems, along with their common integration patterns and associated protocols and interfaces such as REST, SOAP, message queues, custom services, etc.
  • Solid foundation in the security aspects of authentication and authorization schemes: SSL certificates, cookies, Integrated Auth, Basic Auth, OAuth, SAML
  • Hands-on experience in application load testing, analyzing load testing data, and providing recommendations on capacity management, availability, and performance.
  • Deep experience in understanding and troubleshooting network layers including DNS, CDN, Firewalls, VPNs, MPLS, Proxies/Reverse Proxies, Load Balancers
  • Demonstrated ability to design, develop, test, and deploy automations related to maintaining and improving the health, stability, resiliency, and security of the application services and websites.
  • Experience managing automation code in repositories such as Git, Bitbucket, TFS, etc.
  • Experience defining relevant KPIs and producing reports and dashboards that provide the necessary insight into the health, stability, resiliency, and security of the application services, websites, and SDLC activities.
  • In addition to the Site Reliability discipline, we also want this individual to help augment the engineering and management of our Mulesoft API platform. Prior Mulesoft experience is not required, but experience with similar API gateway technologies would be helpful, and they will be expected to quickly come up to speed on the technology and implementation of the Mulesoft platform.
  • Exceptional interpersonal and communication skills
  • Ability to participate in a 24/7 on-call escalation rotation and respond to mission-critical issues as needed
  • Willing and able to attend early morning or late night meetings as required when interfacing with our Regional IT teams in Europe, LATAM and APAC.
  • Passion and drive to improve efficiencies in how we deliver IT services

Preferred Skills:

  • Mulesoft or other API Gateway platforms
  • AzureDevOps or other CI/CD platforms
  • Salesforce Service Cloud, Salesforce Marketing Cloud, Salesforce Community Cloud, Salesforce Commerce Cloud
  • Heroku and AWS
  • Mulesoft, CA API Gateways
  • Terraform
  • Let's Encrypt

About Signature Consultants, LLC

Headquartered in Fort Lauderdale, Florida, Signature Consultants was established in 1997 with a singular focus: to provide clients and consultants with superior staffing solutions. For the ninth consecutive year, Signature was voted as one of the "Best Staffing Firms to Work For" and is now the 14th largest IT staffing firm in the United States (source: Staffing Industry Analysts). With 28 locations throughout North America, Signature annually deploys thousands of consultants to support, run, and manage their clients' technology needs. Signature offers IT staffing, consulting, managed solutions, and direct placement services. For more information on the company, please visit www.sigconsult.com. Signature Consultants is the parent company to Hunter Hollis and Madison Gunn.

Job Requirements

No C2C candidates at this time.