Infrastructure Solutions Architect with AI

Remote

About the role:

The Infrastructure Architect will have a strong background in designing, implementing, and scaling enterprise infrastructure with a focus on AI workloads and platforms. This role will be instrumental in shaping the technical foundation required to support AI/ML models, data pipelines, and high-performance compute environments across our organization. The role will combine expertise in cloud, networking, and storage architectures with knowledge of AI infrastructure needs such as GPU clusters, model training/deployment environments, and MLOps frameworks.

Responsibilities:

Key Responsibilities | Architecture & Strategy

Design and implement infrastructure architectures to support enterprise AI workloads (training, inference, and data processing)
Define scalable strategies for on-prem, cloud, or hybrid environments optimized for AI/ML performance
Develop roadmaps for AI infrastructure adoption and integration into existing IT landscapes

Key Responsibilities | Infrastructure Engineering

Architect GPU/accelerator-based compute clusters and storage solutions optimized for large-scale AI workloads
Collaborate with data scientists and ML engineers to understand infrastructure requirements for model training and deployment
Ensure high availability, scalability, and cost-efficiency of AI workloads

Key Responsibilities | Cloud & DevOps

Design cloud-native solutions leveraging services like AWS SageMaker, Azure ML, or GCP Vertex AI
Establish MLOps pipelines and CI/CD frameworks for AI/ML (GitLab CI/CD, etc.)
Automate provisioning, monitoring, and scaling of AI infrastructure

Key Responsibilities | Governance & Security

Define best practices for data governance, compliance, and security in AI systems
Ensure responsible usage of AI infrastructure with strong observability and governance controls
Optimize resource utilization and manage budgets for high-performance compute environments

Requirements:

7+ years of experience in IT operations, systems engineering, or a similar role
5+ years of experience in infrastructure architecture and cloud solutions
Proven experience with AI/ML infrastructure (GPU clusters, distributed training, containerization, Kubernetes, etc.)
Strong knowledge of cloud platforms (AWS, Azure, GCP) and their AI/ML services
Experience with MLOps tools (Kubeflow, MLflow, Airflow, etc.)
Solid understanding of networking, storage, and security principles
Ability to communicate complex technical concepts to both technical and non-technical stakeholders
Strong development / TL experience, preferably with Python

Nice to have:

Experience with HPC (High-Performance Computing) or large-scale distributed systems
Hands-on experience with deep learning frameworks (PyTorch)
Knowledge of data platforms (Hadoop, etc.)
Familiarity with emerging generative AI infrastructure technologies (LLM hosting, vector databases, retrieval-augmented generation)

We offer:

20 working days of paid vacation per year;
Ukrainian public holidays as days off;
Modern equipment for work;
Corporate events;
External and internal training: conferences, professional events, courses, TechTalks;
English speaking club.

Hiring process:

HR interview
Technical interview
Interview with client