J&M Group Inc

Site Reliability Engineer (SRE)

  • Job Type: Contract
  • Work Mode: Hybrid
  • Location: Toronto

Deadline: Apply by Jun 04, 2026
Job Description

Key Responsibilities

Observability, SRE, and DevOps role requiring proven expertise across infrastructure and application-level reliability: Dynatrace, ELK, Splunk, and PagerDuty; SLI/SLO frameworks; Azure Kubernetes Service, Terraform, and Azure managed services.

What you will do

Design and implement observability-as-code solutions using Terraform to deploy monitoring pipelines, dashboards, and alerting strategies across distributed systems.

Drive observability improvements leveraging industry-leading tools (Dynatrace, ELK, Splunk, PagerDuty) to achieve real-time performance insights and comprehensive system visibility.

Instrument applications for end-to-end observability, implementing distributed tracing, metrics collection, and log aggregation across Node.js and .NET microservices and event-driven architectures.

Troubleshoot complex incidents in production environments, diagnosing root causes across multiple service layers, databases, caches, and APIs under load, using SLI/SLO frameworks.

Investigate and resolve Azure Kubernetes Service (AKS) infrastructure issues, ensuring the reliability and scalability of containerized workloads, with deep proficiency in Terraform and Azure managed services (SQL MI, Redis, Functions, Event Grid).

Translate business requirements into observable, resilient systems that meet defined SLIs and SLOs, and drive reliability improvements.

Automate operational tasks to reduce toil and improve system resilience through infrastructure-as-code and CI/CD best practices.

Lead incident response and remediation for mission-critical systems, conducting blameless postmortems and building resilience through chaos engineering and tabletop exercises.

Collaborate cross-functionally with development, platform, and business teams to improve service availability, scalability, and operational excellence.

What you need to succeed

Must-have: 8 years of hands-on experience in observability, SRE, or DevOps roles, with proven expertise across infrastructure and application-level reliability.

Deep expertise in observability tooling (Dynatrace, ELK, Splunk, and PagerDuty) and a demonstrated understanding of observability principles (instrumentation, correlation IDs, SLI/SLO frameworks).

Advanced proficiency with Azure Kubernetes Service (AKS), Terraform, and Azure managed services (SQL MI, Redis, Functions, Event Grid), with a proven ability to design and implement infrastructure-as-code solutions.

Strong hands-on experience instrumenting applications for comprehensive observability (distributed tracing, metrics collection, and log aggregation) across Node.js and .NET applications in microservices and event-driven architectures.

Proven troubleshooting expertise in distributed systems, diagnosing root causes across multiple service layers, databases, caches, and APIs in production environments.

Excellent incident management skills, including hands-on experience with PagerDuty and ServiceNow and the ability to resolve high-severity incidents rapidly and conduct effective root cause analysis.

Knowledge of incident, problem, and change management processes, including SRE principles, blameless postmortems, and chaos engineering practices.

Exceptional communication and leadership skills to coordinate across business and IT teams; ability to lead remo

Urgent hiring
J&M Group Inc

Infrastructure Engineer

  • Job Type: Contract
  • Work Mode: Hybrid
  • Location: Mississauga

Deadline: Apply by Apr 21, 2027
Job Description

Key Responsibilities

  • Develop and maintain infrastructure automation, tools, and applications for cloud-based data platforms 
  • Support and optimize AWS services including EMR, Glue, and S3 for ingestion and analytics workloads 
  • Manage and monitor operational health, performance, and upgrade cycles of data platforms 
  • Work with data lake architectures, supporting ingestion (Zone 2 – Curated) and derived datasets (Zone 3 – Derived) 
  • Implement and manage CI/CD pipelines and infrastructure as code 
  • Perform incident management including triage, troubleshooting, and resolution 
  • Support data operations such as reruns, recovery, and schema-level changes in controlled environments 
  • Collaborate with cross-functional teams including cloud operations, delivery teams, and platform partners 
  • Drive cost optimization and performance improvements using metrics and data insights 
  • Contribute to next-generation platform transitions (e.g., EMR to Snowflake/Iceberg pipelines) 
  • Identify, escalate, and remediate cloud risks and vulnerabilities 

Required Skills & Qualifications

Core Technical Skills

  • Strong programming experience in Python (primary), Bash/Shell scripting, and SQL 
  • Hands-on experience with AWS services: EMR, EC2, Glue, Lambda, S3, SQS, SNS, CloudFormation 
  • Experience with data lake architectures and related technologies 
  • Knowledge of Snowflake and Apache Iceberg 
  • Understanding of Glue Catalog and Hive Metastore 
  • Familiarity with Data Mesh architecture 
  • Strong Linux system administration skills 

DevOps & Automation

  • Experience with CI/CD pipelines 
  • Infrastructure automation and scripting expertise 

Operational & Support Skills

  • Production support experience: incident management, diagnostics, escalation 
  • Experience with monitoring and alerting tools 
  • Understanding of data governance, security, and access controls (IAM) 

Additional Competencies

  • Strong problem-solving and debugging skills 
  • Excellent communication and stakeholder management abilities 
  • Ability to work independently and collaboratively 
  • Experience mentoring team members and handling escalations 
  • Proactive mindset with focus on continuous improvement and innovation

Urgent hiring

© 2026 iTRiders. All Rights Reserved.