Network Operations Center (NOC) Engineer I
RapidSOS
In the time it takes you to read this job description, RapidSOS will have handled ~1,380 emergencies.
At RapidSOS, we are committed to using technology to build a safer, stronger future and working together to save lives. We’re in an exciting phase of growth, welcoming new members from across the globe to our mission-driven, ambitious, and inclusive team. Our work is founded on our values of elevating purpose, inventing tomorrow, delivering with urgency, serving with integrity, and winning together, all of which support a company culture where people can innovate, collaborate, grow, and, above all, make an impact.
RapidSOS is the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. Real-time data from the world’s largest safety network of 700M+ devices, 200+ global enterprises, and 23,000+ federal, state and local agencies fuels the RapidSOS HARMONY AI engine that delivers this intelligence to those who need it most. Learn more at www.RapidSOS.com.
What this role is about:
As a Network Operations Center Engineer, you will help maintain the reliability of the RapidSOS platform and internal enterprise environments. We are looking for a candidate with a strong operations background to ensure continuous monitoring and analysis of a dynamic cloud-based environment. At a high level, the ideal candidate will have a solid understanding of incident and alert management, with an eye toward continuous monitoring and improvement through automation.
What you’ll do:
- Monitor production and enterprise infrastructure and respond to alarms according to documented SLAs
- Work with Engineering and Customer Support teams to remediate alarms and incidents
- Continually strive to improve the environment through optimization and automation
- Create and update documentation as necessary to share new methods and knowledge around troubleshooting
- Perform operational tasks as assigned by Engineering and Customer Support teams
- Support incident response, deployments, and infrastructure training as the role evolves
- Work with international teams to diagnose and resolve critical issues
- Build, tune, and maintain alerting rules and monitors so that every alert is actionable, including investigating root causes rather than only mitigating symptoms
- Participate in post-incident reviews and contribute to blameless post-mortems
What we’re looking for in our ideal candidate:
- 2+ years of experience in a help desk environment or NOC role, ideally in a cloud-based environment
- Experience creating and managing alerts and monitors using enterprise monitoring tools such as Nagios, Zabbix, SolarWinds, or Datadog (Datadog preferred)
- Experience with incident management platforms such as PagerDuty, Opsgenie, or FireHydrant
- Experience working with ticketing systems such as Jira and Zendesk
- Experience following runbooks and troubleshooting guides to remediate infrastructure or application issues
- Experience with infrastructure operations (cloud infrastructure on AWS or Azure preferred)
- Technical aptitude, with the ability and willingness to quickly learn and understand complex products or services
- Highly self-motivated, with a strong work ethic and the ability to multitask in a fast-paced environment
- Demonstrated problem-solving and organizational skills that lead to successful outcomes and efficient execution of incident response and other initiatives
- Strong written and verbal communication skills in English
- Ability to work flexible shifts and participate in a 24x7 on-call rotation
- Experience building log-based alert rules (e.g., ElastAlert or equivalent) and investigating issues using centralized logging platforms (e.g., ELK/Kibana or equivalent)
- Comfort with Kubernetes and Docker container-based environments, including pod-level health triage
- Comfort working in a command-line environment (Linux/bash, Windows CMD/PowerShell, or equivalent); the team regularly uses CLI tools for infrastructure triage, pod inspection, and operational scripts
Bonus:
- Experience with AWS, Apigee, or ELK (Elasticsearch would be a major plus!)
- Experience writing automation scripts in shell (e.g., bash) and/or Python
- Familiarity with CI/CD pipelines and how deployments relate to monitoring change windows
- Experience creating pull requests (PRs) in GitHub
What we offer:
- The chance to work with a passionate team on solving one of the largest challenges globally
- Competitive salary, benefits, and equity participation
- A dynamic, flexible and fun start-up work environment with a highly talented team
If you're curious to learn more about RapidSOS, you can check out https://rapidsos.com/blog/
Starting pay for a successful applicant will depend on a variety of job-related factors, which may include experience, relevant skills, training, education, location, business needs, or market demands. The salary range for this role is $78,000.00 - $85,000.00. This role will also be eligible to receive equity options.
If you are based in California, we encourage you to read this important information for California residents linked here: https://rapidsos.com/privacy/california/
#LI-Remote / #LI-Onsite / #LI-Hybrid
RapidSOS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status.
Interested in the role but you don’t meet 100% of the requirements? We’d love to hear from you! We encourage you to apply; we’d be excited to see if your unique skill set and experience could be a match.