Technology Operations Leadership
Unify the Technology System & Business Operations, SRE, SOC, Infra, Cloud, DB, Network, Security Ops, and Monitoring teams under a clear command-and-control structure.
Build operational discipline around change management, release governance, hotfix/rollback strategy, and production stability.
Establish and enforce operational SOPs and accountability frameworks for both internal staff and external partners.
Service Uptime & Reliability
Ensure maximum uptime through proactive risk identification, system health insights, and coordinated vendor engagement.
Implement SRE practices: SLIs, SLOs, capacity planning, error budgets, and reliability automation.
Drive proactive anomaly detection, log intelligence, and automated remediation across all internal and vendor-managed systems.
Infrastructure & Platform Operations
Coordinate infrastructure scaling across bare-metal, VM, cloud, and Kubernetes platforms in close collaboration with partners and vendors.
Advise on networking (L3–L6), WAF rules, PAM, firewalling, DDoS protection, and load balancing to minimize partner-induced latency and security risks.
Ensure observability and service health insights through advanced dashboards (Grafana, Kibana, ELK, Prometheus) and align with vendor telemetry systems.
Database & Application Operations
Advise Oracle/SQL teams on design to ensure optimal performance, HA/DR replication, backup discipline, and failover readiness.
Establish controls over vendors and partners to ensure quick operational support and strict SLA adherence for effective collaboration.
Own the operational and performance KPIs of the core DFS system. Define cross-functional KPIs and the performance record-keeping needed to calculate them.
IT Security, Governance & Compliance
Enforce application/server/network hardening across internal and vendor environments.
Ensure all partners comply with Bangladesh Bank MFS guidelines and industry-standard security frameworks.
Lead joint cybersecurity drills, vulnerability management, and incident response activities with vendors.
Monitoring, Incident Management & Command Center
Operate and own the 24×7 Command Center with integrated vendor escalation protocols ensuring immediate response.
Lead structured incident response with vendor war rooms, RCA exercises, and closure of action items.
Guide DR/BCP strategy and coordinate with partners for simulation testing, site failover drills, and crisis readiness.
Partner & Vendor Management
Own all technology-related vendor relationships from an operational perspective, including the core DFS provider, network providers, cloud partners, database vendors, IT security vendors, and monitoring service providers.
Define, negotiate, and enforce strict SLAs, OLAs, and KPIs that ensure measurable commitments to uptime, latency, and resolution timelines.
Conduct regular vendor performance reviews, quarterly business reviews (QBR), and compliance audits.
Establish an escalation matrix and ensure 24×7 responsiveness from all partners.
Contribute to capacity planning and future roadmap discussions with partners to support rapid growth.
Strategic Planning & Continuous Improvement
Define long-term modernization plans for automation, cloud maturity, observability, API security, and autonomous operations.
Build operational excellence programs including DevOps/SRE automation, partner-aligned improvement plans, and failure-mode analysis.
Provide MIS/reports to the CIO and management with insights on uptime, operational efficiency, RCA reports, vendor performance, and operational risks.
Skills Requirements
Deep expertise in large-scale distributed systems, low-latency architectures, and mission-critical platforms.
Proficiency with Oracle/SQL, Linux, shell scripting, Postman, and container orchestration (K8s, Docker).
Good to advanced knowledge of networking (L3–L6), WAF, PAM, IDS/IPS, routing, load balancing, and system hardening.
Advanced knowledge of Grafana, Kibana, ELK, Prometheus, and performance benchmarking tools (JMeter, Locust).
Strong cybersecurity acumen: encryption, certificate lifecycle, secure coding, API protection, vulnerability management.
High competence in incident handling, RCA, chaos engineering, capacity planning, and reliability automation.
Exceptional stakeholder management skills, with the ability to work with multiple vendors across SLA, roadmap, and crisis management.
Excellent leadership, communication, negotiation, and decision-making skills.
Ability to build operational models with vendor–internal team synergy at the center.
Expertise in preparing and reviewing DR/BCP, SOPs, SLAs, OLAs, risk registers, and audit documentation.
Experience Required
Minimum 12 years in technology operations, SRE, infrastructure, or platform engineering.
At least 5 years in a senior leadership role directly overseeing operations in BFSI, MFS, Telecom, or any high-availability digital platform.
Proven experience managing distributed systems with 1M+ daily users and high API transaction volume.
Demonstrated success in vendor and partner operational management, including SLA enforcement and major incident escalations.
Experience leading monitoring, SOC/NOC, DB Ops, and infrastructure teams in a 24×7 environment.
Other Requirements
Must be available for 24×7 leadership escalation, especially during high-severity incidents.
Strong operational discipline aligned with regulatory frameworks.
High integrity, confidentiality, and crisis-handling mindset.
Ability to drive vendor–Nagad joint improvement programs.
Ability to make data-driven decisions using operational analytics.
Strong collaboration skills to work with internal departments, partners, and regulators.
Commitment to a zero-downtime, always-on operations culture.
https://bdjobs.com/h/details/1494351
Category: Engineering
Published: 02 Jun 2026
Deadline: 13 Jun 2026