Support Engineer (US Time Zone)

Experience: 1–3 years
Location: Remote
Work Hours: US Time Zone

About the Role

We’re looking for a coding-forward Support Engineer to own L2 investigations and fixes across our Intel OPEA-based GenAI services. You’ll dive into microservices (retriever/embedding/reranker/agent), APIs, and infra, reproducing issues, shipping small patches, and partnering with Platform/Dev teams to keep customer workloads healthy and fast. OPEA uses a composable, microservice architecture for enterprise GenAI (e.g., RAG blueprints, agents, OpenAI-compatible inference endpoints), which you’ll support and extend in production.

What You’ll Do

  • Own L2 incidents end-to-end: triage, root-cause, hotfix (small code changes), and drive long-term fixes for OPEA services (e.g., retriever/embedding/reranker services, agents, inference gateway).
  • Debug microservices & APIs: reproduce issues locally with Docker Compose/Kubernetes; verify health checks; trace requests across components (LLM, vector DB, tool/agent).
  • Code to unblock customers: write focused patches and scripts (Python/TypeScript, FastAPI/Node) for data prep, adapters, and service hardening.
  • Pipeline reliability: monitor and tune RAG/agent pipelines (token/latency budgets, timeouts, batching, retries, circuit-breakers); see the retry/circuit-breaker sketch after this list.
  • Observability first: build/run dashboards and alerts (logs, metrics, traces; OpenTelemetry where applicable).
  • CI/CD & IaC: maintain build/deploy for OPEA components; contribute to Terraform/Helm changes with DevOps.
  • Compatibility & model routing: validate OpenAI-compatible endpoints, model switches, and fallbacks (on-prem/cloud); see the routing sketch after this list.
  • Docs & learning loops: keep high-signal runbooks, RCAs, and “best known methods” for recurring issues.
  • Participate in US-hours on-call rotations; provide crisp stakeholder updates.
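
To make “pipeline reliability” concrete, here is the flavor of hardening patch this role ships: a retry loop with exponential backoff behind a simple circuit breaker. This is an illustrative Python sketch, not code from our services; the class names, thresholds, and delays are invented for the example.

    import random
    import time

    class CircuitOpenError(RuntimeError):
        """Raised when the breaker is open and downstream calls are skipped."""

    class CircuitBreaker:
        """Trip after max_failures consecutive errors; probe again after reset_after seconds.
        Illustrative defaults; not production values."""
        def __init__(self, max_failures=5, reset_after=30.0):
            self.max_failures = max_failures
            self.reset_after = reset_after
            self.failures = 0
            self.opened_at = None

        def call(self, fn, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after:
                    raise CircuitOpenError("breaker open; skipping downstream call")
                self.opened_at = None  # half-open: let one probe call through
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.monotonic()
                raise
            self.failures = 0
            return result

    def with_retries(fn, attempts=3, base_delay=0.5):
        """Exponential backoff with jitter; re-raises the last error."""
        for attempt in range(attempts):
            try:
                return fn()
            except CircuitOpenError:
                raise  # don't hammer an open breaker
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

The usual design choice is one breaker instance per downstream dependency (e.g., per vector DB or model endpoint), so a single failing dependency doesn’t stall the rest of the pipeline.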
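
Likewise, “model switches and fallbacks” often reduces to trying OpenAI-compatible routes in order until one answers. A minimal sketch assuming the standard openai Python client; the endpoint URLs, model names, and API key below are placeholders, not real deployments.

    from openai import OpenAI  # pip install openai

    # Hypothetical routes: an on-prem OPEA gateway first, a cloud fallback second.
    ROUTES = [
        {"base_url": "http://opea-gateway.internal:8080/v1", "model": "local-llm"},
        {"base_url": "https://cloud.example.com/v1", "model": "fallback-llm"},
    ]

    def chat(messages, timeout=10.0):
        """Try each OpenAI-compatible route in order; return the first success."""
        last_err = None
        for route in ROUTES:
            client = OpenAI(base_url=route["base_url"],
                            api_key="sk-placeholder",  # placeholder, not a real key
                            timeout=timeout)
            try:
                resp = client.chat.completions.create(
                    model=route["model"], messages=messages)
                return resp.choices[0].message.content
            except Exception as err:  # timeouts, connection errors, 5xx, ...
                last_err = err
        raise RuntimeError("all model routes failed") from last_err

Calling it is just chat([{"role": "user", "content": "ping"}]); in production you’d also log which route served each request, which feeds the token accounting mentioned below.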

Required Skills

  • 1–3 years in Support/Platform/Dev/DevOps roles with significant coding in Python (preferred) or TypeScript/Node.
  • Solid microservices debugging: REST/gRPC, auth, queues, caching, concurrency, rate limits.
  • Containers & orchestration: Docker, Docker Compose; working knowledge of Kubernetes.
  • Linux fluency and shell scripting.
  • Cloud familiarity: AWS/Azure/GCP (networking, IAM, storage, managed K8s).
  • Version control & CI/CD: Git + a common CI (GitHub Actions/Jenkins).
  • Strong troubleshooting, crisp written/verbal comms, and customer empathy.

Nice to Have

  • OPEA ecosystem familiarity (GenAIComps microservices like retriever/embedding/reranker; Agent service built on LangChain/LangGraph).
  • Vector databases (Milvus/pgvector/FAISS), RAG patterns, prompt/tool/agent debugging.
  • OpenAI-compatible API experience; gateway/proxy patterns; token accounting.
  • Observability: Grafana/Prometheus, ELK/Datadog, OpenTelemetry traces.
  • Infra & MLOps: Helm/Terraform; KServe/Ray/Airflow basics.
  • Intel stack awareness helpful (Xeon, Gaudi accelerators, OpenVINO), but not required.
  • Jira/ServiceNow/Zendesk for incident workflows; Agile practices.

What Success Looks Like

  • Can reproduce and fix common OPEA microservice issues locally (compose/k8s), validate via health endpoints (see the health-sweep sketch after this list), and contribute small PRs.
  • Ship/run dashboards + actionable alerts for latency, error budgets, and throughput across RAG/agent paths.
  • Improve customer-visible SLOs (availability, P50/P95 latency) through code/config changes.
  • Author clean runbooks and RCAs that prevent repeat incidents.
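
For instance, a local reproduction typically starts with a quick health sweep across the compose stack, along the lines of the sketch below. The service names, ports, and /v1/health_check path are illustrative; match them to the actual deployment.

    import sys
    import requests  # pip install requests

    # Hypothetical local compose stack; adjust names/ports/paths to the deployment.
    SERVICES = {
        "retriever": "http://localhost:7000/v1/health_check",
        "embedding": "http://localhost:6000/v1/health_check",
        "reranker":  "http://localhost:8000/v1/health_check",
    }

    def sweep(timeout=3.0):
        """Probe each health endpoint and report; non-zero exit on any failure."""
        failed = False
        for name, url in SERVICES.items():
            try:
                ok = requests.get(url, timeout=timeout).status_code == 200
            except requests.RequestException:
                ok = False
            print(f"{name:<10} {'OK' if ok else 'FAIL'}  {url}")
            failed = failed or not ok
        return 1 if failed else 0

    if __name__ == "__main__":
        sys.exit(sweep())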

 
