Airtree

Founding Applied AI Engineer (Eval-Driven)

Posted 6 Days Ago
In-Office
Sydney, New South Wales
Mid level
This role is for one of our portfolio companies, not internally at Airtree. Your application will be reviewed by the founder, not an Airtree employee.

This early-stage company, still operating in stealth, is on a mission to eradicate cost overruns in construction, a $3 trillion problem that slows down cities, destroys margins, and erodes trust between builders and clients.

Over 90% of construction projects go over budget, eroding builder margins and stalling progress across the industry. Our platform takes a new approach to helping builders stay on budget by detecting and preventing costly variations before they spiral. Our AI-powered early warning system gives builders control over project costs — protecting time, margin, and reputation on every build.

We are at the start of something big, and we are looking for an Applied AI Engineer (Eval-Driven) to build and ship design-audit workflows that consistently meet measurable quality bars. This role blends ML engineering and data science, with a heavy emphasis on problem definition, evaluation, and reliability in real customer workflows.

Responsibilities
  • Define evaluation problems: success criteria, failure modes, datasets, labelling guidelines, and score functions.
  • Build and maintain an evaluation harness: regression tests, edge-case suites, and quality dashboards to prevent backsliding.
  • Implement workflow systems end-to-end (data → model/LLM components → post-processing → acceptance testing) until they pass eval thresholds.
  • Partner with product and domain stakeholders to translate messy real-world requirements into testable specs.

Requirements
  • Strong Python skills and practical experience shipping ML/AI systems (not just experimentation).
  • Demonstrated experience designing evals for ML/LLM systems (offline metrics, gold sets, error analysis, regression testing, monitoring).
  • Comfort working across data science + engineering tasks: data wrangling, feature/label design, model/LLM iteration, and productionization.
  • High ownership and intensity: persistence in closing the loop from “fails eval” to “passes consistently.”

Nice to have
  • Experience with document understanding (OCR, parsing, classification/extraction) and structured outputs (schemas, validators).
  • Familiarity with AEC/construction workflows (design coordination, QA/compliance, BIM concepts like IFC/Revit).
  • Experience building human-in-the-loop review systems and adjudication processes to improve training/eval data.

Top Skills

AI
LLM
ML
OCR
Python

HQ

Airtree The Hills, New South Wales, AUS Office

131 Devonshire St, The Hills, New South Wales, Australia, 2010
