Operation And Maintenance Engineer

Best Web3

$7-8.5K [Monthly]
On-site - Kuala Lumpur, 1-3 Yrs Exp, Bachelor, Full-time
This job is open to Malaysian, Passholder - EP/TEP/PV/DP/RP-T/Others, Foreigner

Job Description

Job Responsibilities

I. Infrastructure and Server Operations (Core Responsibilities)

  • Responsible for the architecture design, setup, and optimization of the company's server clusters (OCI / AWS).
  • Manage Linux servers, system environments, user permissions, SSH keys, SFTP, Firewall, and Security Groups.
  • Responsible for Nginx, SSL, reverse proxy, domain name, and certificate management, maintaining high availability and security.
  • Maintain virtual machines, load balancers (LB), object storage, VPC/VCN networks, subnets, and security group policies.
  • Troubleshoot production environment issues: port conflicts, permission errors, service startup failures, full disks, network anomalies, etc.

II. CI/CD and Deployment Management

  • Design, build, and maintain CI/CD pipelines (GitHub Actions / GitLab CI / Jenkins).
  • Write and maintain deployment scripts, automated build scripts, environment variable management, and version release processes.
  • Responsible for deployment strategies, rollback strategies, blue-green deployments, and canary deployments in testing/UAT/production environments.
  • Collaborate with the R&D team for daily releases, emergency fixes, and configuration management.

III. System Stability and Availability (SRE Focus)

  • Establish an application monitoring system (Prometheus, Grafana, ELK, CloudWatch).
  • Responsible for building an alerting system: CPU/Memory/Disk, service anomalies, and interface anomalies.
  • Responsible for the formulation and implementation of SLAs, SLOs, and SLIs to improve system stability.
  • Perform regular capacity planning, performance optimization, and system load testing.

IV. Security and Access Control

  • Manage server accounts, cloud platform accounts, Git repository permissions, and Jira/Wiki system permissions.
  • Build/maintain bastion hosts (Jump Server/Bastion), adhering to the principle of least privilege.
  • Write security baseline policies and regularly perform patch upgrades, vulnerability scanning, and security inspections.
  • Cooperate with the security/risk control team to handle security incidents (brute-force attacks, abnormal traffic, service vulnerabilities, etc.).

V. Database and Middleware Maintenance

  • Maintain the deployment, backup, and master-slave configuration of services such as MySQL, PostgreSQL, Redis, and Kafka.
  • Perform database performance tuning, slow SQL analysis, and connection pool optimization.
  • Implement backup strategies, automatic backups, off-site disaster recovery, and regular recovery drills.

VI. Documentation and Asset Management

  • Maintain server ledgers, domain certificate ledgers, and permission lists.
  • Write and maintain operation and maintenance documentation: deployment instructions, deployment processes, security policies, and architecture diagrams.
  • Manage operation and maintenance assets: server specifications, monitoring panels, keys, environment configurations, and network topology diagrams.

VII. Team and Process Development

  • Responsible for the daily management and training of the operation and maintenance team.
  • Drive the implementation of production change processes, deployment procedures, permission management procedures, and disaster recovery procedures.
  • Coordinate across teams (R&D, backend, DBA, and security teams) to handle emergency failures.

Job Requirements

  • Proficient in Linux system administration, Shell scripting, and network basics (Layer 3/Layer 4/Layer 7).
  • Familiar with cloud platform operation and maintenance: OCI/AWS.
  • Proficient in Nginx, SSL, reverse proxy, Keepalived, and load balancing.
  • Familiar with Docker/Kubernetes (proficiency in at least Docker and Compose is required).
  • Familiar with CI/CD pipelines (GitHub Actions / GitLab CI / Jenkins).
  • Proficient in MySQL basics, master-slave replication, backup and recovery, and performance optimization.
  • Familiar with at least one commonly used middleware such as Redis, Kafka, or RabbitMQ.
  • Experience in building monitoring systems: Prometheus / Grafana / ELK / Loki.
  • Bonus points: Strong logical thinking and rapid troubleshooting abilities; able to independently handle online incidents.
  • A complete operational system mindset: monitoring, alerting, security, permissions, and processes.
  • Excellent documentation skills; able to organize asset tables, network topology, and process procedures.
  • Strong communication and cross-team collaboration skills.
  • Experience in operations and maintenance in the financial, exchange, and blockchain industries.
  • Familiar with high-concurrency and high-availability architecture design.
Monitoring Tools (Prometheus, Grafana), Database Management (SQL, MongoDB), Infrastructure as Code (Terraform, Ansible), Cloud Services (AWS, Azure, GCP), Linux, CI/CD, Git, Shell Scripting
Claire 12

HR Manager, Best Web3

Posted on 21 January 2026
