Principal Data Engineer
Data Science
St. Petersburg, FL, USA · Remote
Job Description: Principal Data Engineer
Employment Type: Full Time
Location: Remote/Hybrid
Overview
onPhase is a leading provider of accounting and finance process automation software, specializing in Accounts Payable and Accounts Receivable automation.
Position Overview
We are looking for a Principal Data Engineer to lead our Document Intelligence initiatives as a core member of our research and engineering organization. This is a high-impact individual contributor and technical leadership role focused on advancing machine learning, data science, NLP, and intelligent document processing within our platform. You will work closely with the CTO and with product, data, and engineering teams to design systems that turn unstructured document data into actionable intelligence at scale.
The ideal candidate brings deep hands-on expertise in machine learning (ML) and document AI, a strong data architecture background, and the ability to drive research from concept through production.
Key Responsibilities
- Lead research and engineering efforts in document intelligence, including OCR post-processing, document classification, information extraction, and layout understanding.
- Design and implement scalable machine learning pipelines and data architectures that support document AI workloads in production environments.
- Define the technical vision and roadmap for document intelligence capabilities across the organization.
- Collaborate with cross-functional teams to translate business requirements into ML system designs, model architectures, and data platform decisions.
- Evaluate, adapt, and extend state-of-the-art NLP and vision-language models for document understanding tasks.
- Establish best practices for ML experimentation, model versioning, evaluation, and deployment (MLOps).
- Mentor and provide technical guidance to engineers and researchers across the team.
- Drive data architecture decisions that support both model training pipelines and downstream analytics and reporting needs.
- Publish or present research findings internally and, where appropriate, externally.
Qualifications
Required
- 10+ years of professional experience in R&D, machine learning, applied research, or data engineering.
- Deep expertise in document intelligence, including OCR, document parsing, layout analysis, information extraction, and classification.
- Strong data architecture background, including experience designing data lakes, feature stores, and ML data pipelines.
- Proficiency in Python and relevant ML frameworks (PyTorch, TensorFlow, Hugging Face Transformers, etc.).
- Experience taking ML models from research and prototyping through to production deployment at scale.
- Solid understanding of NLP fundamentals and modern large language model (LLM) and vision-language model architectures.
- Experience with cloud-based ML platforms and infrastructure (AWS, GCP, or Azure).
- Strong written and verbal communication skills, including the ability to convey complex technical concepts to both technical and non-technical stakeholders.
Preferred
- PhD or Master's degree in Computer Science, Machine Learning, Computational Linguistics, or a closely related field.
- Experience with document AI frameworks such as LayoutLM, Donut, PaddleOCR, Amazon Textract, or similar.
- Publications or contributions to peer-reviewed research in NLP, computer vision, or document understanding.
- Familiarity with enterprise document workflows such as AP automation, contract processing, medical records, or similar domains.
- Prior experience in a principal, staff, or lead engineer capacity with ownership over a technical domain.