
Budgetly

AI Platform Engineer (Remote in AU)

Posted 16 Hours Ago
In-Office or Remote
Hiring Remotely in Melbourne, Victoria
Senior level
Company Description

Managing business expenses shouldn’t be a guessing game. Yet many SMEs still lack clear cash flow visibility and spend control. At Budgetly, we’re changing that.

We’re building an AI-first platform that simplifies expense management and helps businesses make better financial decisions, faster. The goal is simple: help you spend smarter, save time, and grow profitably.

Why this role exists (and why it’s different)

Six months ago, “agentic engineering” felt novel. Today, it’s rapidly becoming the default way teams ship software.

Most companies are using AI to help engineers type faster. We’re building something more ambitious:

An AI-first delivery system where agents ship product features end-to-end, while engineers build the platform (the harness) that makes it safe, reliable, and scalable.

If this works, feature delivery scales with product ambition, not headcount.

What we mean by “harness engineering”

A raw LLM isn’t an agent. The “agent” is the model plus the harness around it.

In this role, you’ll build that harness:

  • Agent runtimes & execution loops (plan → act → observe → reflect; retries; stop conditions)
  • Agentic loops & feedback loops that convert outcomes into improvements (evals, regressions, learnings)
  • Tooling & skills (MCP/tool integration, internal APIs, secure credentials, sandboxes)
  • Governance (permissions, policy, human-in-the-loop gates, audit trails)
  • Observability (traces, cost attribution, failure taxonomies, runbooks)
  • Evaluation harnesses (scenario suites, trajectory scoring, tool-arg correctness, “non-deterministic unit tests”)
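To make the first bullet concrete, here is a minimal sketch of an execution loop with per-call retries and stop conditions, written in TypeScript. Every name in it (`runAgentLoop`, `AgentModel`, `LoopConfig`, `Tool`) is hypothetical and not Budgetly’s actual harness. It assumes a synchronous model and tools to stay short, and it models “reflect” simply as the model re-reading the updated transcript on its next `plan()` call.

```typescript
// Hypothetical shapes for illustration only; not a real Budgetly or vendor API.
type Action =
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "finish"; answer: string };

interface AgentModel {
  // Plan: given the transcript so far, propose the next action.
  plan(transcript: string[]): Action;
}

type Tool = (args: Record<string, unknown>) => string;

interface LoopConfig {
  maxSteps: number; // stop condition: hard cap on loop iterations
  maxToolRetries: number; // extra attempts for a tool call that throws
}

interface LoopResult {
  status: "finished" | "step_limit";
  answer?: string;
  transcript: string[];
}

function runAgentLoop(
  model: AgentModel,
  tools: Record<string, Tool>,
  task: string,
  config: LoopConfig,
): LoopResult {
  const transcript: string[] = [`task: ${task}`];
  for (let step = 0; step < config.maxSteps; step++) {
    const action = model.plan(transcript); // plan (and implicitly reflect on the transcript)
    if (action.kind === "finish") {
      return { status: "finished", answer: action.answer, transcript }; // stop condition: model is done
    }
    const tool = tools[action.name];
    if (!tool) {
      // Observe: report the bad call so the model can re-plan on the next step.
      transcript.push(`error: unknown tool ${action.name}`);
      continue;
    }
    let observation = "";
    for (let attempt = 0; attempt <= config.maxToolRetries; attempt++) {
      try {
        observation = `ok: ${tool(action.args)}`; // act
        break;
      } catch (err) {
        observation = `error: ${(err as Error).message}`; // keep the last failure if retries run out
      }
    }
    transcript.push(`${action.name} -> ${observation}`); // observe
  }
  return { status: "step_limit", transcript }; // stop condition: step budget exhausted
}
```

Capping steps and retries separately keeps a stuck agent from looping forever while still tolerating transient tool failures.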

What success looks like in ~12 months

  • A managed agent platform that the Product team can rely on for meaningful, customer-facing delivery
  • Agent workflows ship features with high repeatability, not “poke-and-hope”
  • Clear quality gates: eval harnesses, review agents, regression suites, and rollout controls
  • Engineers spend less time in the loop and on reviews, and more time improving the factory (reliability, speed, safety)
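One way to read “eval harnesses” and “non-deterministic unit tests” as a quality gate is sketched below: run each scenario several times and gate on the observed pass rate rather than on a single run. This is a minimal sketch under assumed names (`Scenario`, `evaluate`, `minPassRate` are all hypothetical), with the agent run and its scorer injected as plain functions.

```typescript
// Hypothetical eval-harness shapes, for illustration only.
interface Scenario<Out> {
  name: string;
  run(): Out; // one (possibly non-deterministic) agent run
  check(output: Out): boolean; // scorer: did this run meet the acceptance criteria?
}

interface EvalReport {
  name: string;
  passes: number;
  trials: number;
  passRate: number;
  gatePassed: boolean;
}

// Runs each scenario `trials` times and gates on the pass rate,
// since one run of a non-deterministic agent proves little either way.
function evaluate<Out>(scenarios: Scenario<Out>[], trials: number, minPassRate: number): EvalReport[] {
  return scenarios.map((s) => {
    let passes = 0;
    for (let i = 0; i < trials; i++) {
      if (s.check(s.run())) passes++;
    }
    const passRate = passes / trials;
    return { name: s.name, passes, trials, passRate, gatePassed: passRate >= minPassRate };
  });
}
```

In CI, a failed `gatePassed` on any scenario would block the rollout; the threshold is a per-scenario judgment call, not a fixed rule.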

Job Description

This isn’t a traditional senior engineering role. You won’t spend most of your time implementing product features directly.

Your time will roughly split:

  • 50% building and evolving agent harnesses: orchestration, toolchains, approvals, secure execution, managed agents
  • 50% reviewing and improving outputs: tracing failures, improving prompts/steering, tightening eval harnesses, reducing loop count

Concretely, you’ll:

  • Design and implement agentic workflows that take a requirement from spec → code → review → deploy
  • Build agentic loops that turn mistakes into system-level improvements (not one-off fixes)
  • Develop evaluation harnesses (offline + CI) that catch regressions in agent behavior, not just failing code tests
  • Define and maintain review gates (human-in-the-loop + automated reviewers) for risky changes
  • Improve tool reliability: schemas, typed tool interfaces, retries, timeouts, safety checks
  • Build platform capabilities for managed agents: long-running sessions, checkpoints, state/memory boundaries, and recovery
  • Evolve the platform architecture (TypeScript, serverless architecture, shared codebase) with an eye for simplicity and maintainability
  • Partner with Product to reduce ambiguity and translate intent into testable, evaluable specs
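The tool-reliability and review-gate bullets can be illustrated with a typed tool wrapper: the model’s raw arguments are validated against a schema before anything runs, and tools marked risky are held until approved. This is a minimal sketch, not Budgetly’s tooling; `ToolDef`, `callTool`, and the `refund_expense` tool are hypothetical, and retries and timeouts are left out to keep it synchronous.

```typescript
// Hypothetical shapes for illustration only.
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

interface ToolDef<Args, Result> {
  name: string;
  risky: boolean; // risky tools require explicit approval (human-in-the-loop gate)
  parse(raw: unknown): Validation<Args>; // schema check on model-produced arguments
  run(args: Args): Result;
}

type ToolOutcome<Result> =
  | { status: "ok"; result: Result }
  | { status: "invalid_args"; error: string }
  | { status: "needs_approval" };

function callTool<Args, Result>(tool: ToolDef<Args, Result>, raw: unknown, approved: boolean): ToolOutcome<Result> {
  if (tool.risky && !approved) return { status: "needs_approval" }; // safety check first
  const parsed = tool.parse(raw);
  if (!parsed.ok) return { status: "invalid_args", error: parsed.error }; // fed back to the agent
  return { status: "ok", result: tool.run(parsed.value) };
}

// Hypothetical example tool: refund part of an expense, amount in integer cents.
interface RefundArgs {
  expenseId: string;
  amountCents: number;
}

const refundTool: ToolDef<RefundArgs, string> = {
  name: "refund_expense",
  risky: true,
  parse(raw) {
    if (typeof raw !== "object" || raw === null) return { ok: false, error: "args must be an object" };
    const { expenseId, amountCents } = raw as Record<string, unknown>;
    if (typeof expenseId !== "string" || expenseId.length === 0) {
      return { ok: false, error: "expenseId must be a non-empty string" };
    }
    if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents <= 0) {
      return { ok: false, error: "amountCents must be a positive integer" };
    }
    return { ok: true, value: { expenseId, amountCents } };
  },
  run(args) {
    return `refunded ${args.amountCents} cents on ${args.expenseId}`;
  },
};
```

Returning a typed outcome instead of throwing lets the harness log each failure class separately, which also feeds the failure taxonomies mentioned under observability.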

Qualifications

This role requires equal strength in two areas:

  1. Systems thinking for agent harnesses and loops. You can design the execution harness around agents: feedback loops, evaluation strategy, safety constraints, and the “glue code” that makes autonomy safe in production.
  2. Engineering taste. You can look at agent-generated code and immediately judge: conventions, simplicity, correctness, maintainability, security. Not just “does it work,” but “would I approve this PR in a regulated product?”

What we need from you

  • Strong TypeScript and React experience in production environments
  • You’ve shipped real software to real users (not just prototypes)
  • You can read a codebase and quickly identify its patterns, conventions, and architecture
  • You are comfortable working in ambiguity and turning fuzzy intent into clear acceptance criteria and evals
  • Familiarity with agent tooling concepts: tool calling, MCP/tool integration, guardrails, evals, tracing/observability, and permissioning
  • Nice to have: AWS serverless experience (CDK, Lambda, DynamoDB). Our backend is a mix of modern serverless microservices and a legacy Express/PostgreSQL monolith.

Who this role is not for

Be honest with yourself:

  • If you want to spend most of your time building features directly, this role will frustrate you
  • If you’re excited about AI but haven’t shipped production software, you won’t have the taste to judge agent output
  • If you prefer stable scope, established best practices, and minimal ambiguity, this environment won’t be a match

The team and company

You’ll join a small team (3–4 engineers) reporting to a hands-on CTO. The whole company is going all-in on this model, not just engineering: sales, marketing, and support are all building agentic workflows for their functions.

This isn’t a side experiment; it’s our operating model.

Additional Information

We’re guided by trust, respect, and ownership. Our values (Embrace Change, Carte Blanche, Find Wisdom in Data, and We All “Own It”) shape how we work.

  • Fully remote (work from anywhere in Australia)
  • 5 weeks annual leave and flexible working
  • Monthly Wellness Budget (mental & physical health)
  • Employee share options (ESOP) for all team members

How to apply?

Submit your CV via the application form. Note that background checks are required as part of our offer process. 

We welcome applications from all backgrounds, abilities, and identities. We value diversity and believe that it enhances our creativity, innovation, and overall success. Join us in creating a workplace where everyone can thrive.

Budgetly Sydney, New South Wales, AUS Office

Sydney, New South Wales, Australia, 2000
