Infrastructure Solutions Architect with AI

Remote
About the role:

The Infrastructure Architect will have a strong background in designing, implementing, and scaling enterprise infrastructure with a focus on AI workloads and platforms. This role will be instrumental in shaping the technical foundation required to support AI/ML models, data pipelines, and high-performance compute environments across our organization. The role will combine expertise in cloud, networking, and storage architectures with knowledge of AI infrastructure needs such as GPU clusters, model training/deployment environments, and MLOps frameworks.

Responsibilities:

Key Responsibilities | Architecture & Strategy

  • Design and implement infrastructure architectures to support enterprise AI workloads (training, inference, and data processing)
  • Define scalable strategies for on-prem, cloud, or hybrid environments optimized for AI/ML performance
  • Develop roadmaps for AI infrastructure adoption and integration into existing IT landscapes

Key Responsibilities | Infrastructure Engineering

  • Architect GPU/accelerator-based compute clusters and storage solutions optimized for large-scale AI workloads
  • Collaborate with data scientists and ML engineers to understand infrastructure requirements for model training and deployment
  • Ensure high availability, scalability, and cost-efficiency of AI workloads

Key Responsibilities | Cloud & DevOps

  • Design cloud-native solutions leveraging services like AWS SageMaker, Azure ML, or GCP Vertex AI
  • Establish MLOps pipelines and CI/CD frameworks for AI/ML (GitLab CI/CD, etc.)
  • Automate provisioning, monitoring, and scaling of AI infrastructure

Key Responsibilities | Governance & Security

  • Define best practices for data governance, compliance, and security in AI systems
  • Ensure responsible usage of AI infrastructure with strong observability and governance controls
  • Optimize resource utilization and manage budgets for high-performance compute environments
Requirements:
  • 7+ years of experience in IT operations, systems engineering, or a similar role
  • 5+ years of experience in infrastructure architecture and cloud solutions
  • Proven experience with AI/ML infrastructure (GPU clusters, distributed training, containerization, Kubernetes, etc.)
  • Strong knowledge of cloud platforms (AWS, Azure, GCP) and their AI/ML services
  • Experience with MLOps tools (Kubeflow, MLflow, Airflow, etc.)
  • Solid understanding of networking, storage, and security principles
  • Ability to communicate complex technical concepts to both technical and non-technical stakeholders
  • Strong development / TL experience, preferably with Python

Nice to Have

  • Experience with HPC (High-Performance Computing) or large-scale distributed systems
  • Hands-on experience with deep learning frameworks (PyTorch)
  • Knowledge of data platforms (Hadoop, etc.)
  • Familiarity with emerging generative AI infrastructure technologies (LLM hosting, vector databases, retrieval-augmented generation)
We offer:
  • 20 working days of paid vacation per year;
  • Days off on official Ukrainian public holidays;
  • Modern equipment for work;
  • Corporate events;
  • External and internal training: conferences, professional events, courses, TechTalks;
  • English speaking club.
Hiring process:
  • HR interview
  • Technical interview
  • Interview with client