
From Predictive Models to LLM Agents: My Journey in AI Engineering

Reflections on 7+ years in AI engineering — from classical ML at enterprise scale to building LLM-powered agent systems.

2025-02-01 · 8 min read

Career · AI/ML · Reflections

Seven years ago, I started my career writing Java backend systems at a financial enterprise in Tokyo. Today I'm building LLM-powered agent pipelines that interview witnesses and automate ad operations. The path between those two points wasn't linear, and the lessons from each phase inform how I approach AI engineering now.

Phase 1: Backend Engineering (2017-2019)

My first two years at TIS Inc. taught me fundamentals that still matter daily — database optimization, API design, transaction handling, and the discipline of writing code that other people maintain. The biggest lesson: production systems need to be boring. Reliability beats cleverness every time.

The engineers who build the best ML systems aren't the ones with the deepest theoretical knowledge — they're the ones who understand software engineering fundamentals. Logging, error handling, monitoring, and testing are what separate research prototypes from production systems.

Phase 2: Applied ML (2019-2021)

At ALBERT Inc., I transitioned into applied machine learning — demand forecasting, customer behavior prediction, and Japanese NLP classification. This phase taught me the ML lifecycle end-to-end: data preprocessing, feature engineering, model selection, cross-validation, and deployment. The models were classical — LightGBM, XGBoost, scikit-learn — but the engineering challenges were real.

The most underrated skill in this phase was feature engineering. A well-crafted time-series feature could improve model performance more than switching from Random Forest to gradient boosting. Understanding the data deeply — its distributions, seasonality, and edge cases — mattered more than model architecture.
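As a concrete illustration of the kind of time-series features described above, here is a minimal pandas sketch with a made-up daily demand series (the column names and values are illustrative, not from any real project):

```python
import pandas as pd

# Hypothetical daily demand series (values are illustrative only)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14, freq="D"),
    "demand": [100, 120, 90, 110, 130, 150, 140,
               105, 125, 95, 115, 135, 155, 145],
})

# Lag features: what demand looked like 1 day and 7 days ago
df["lag_1"] = df["demand"].shift(1)
df["lag_7"] = df["demand"].shift(7)

# Rolling mean over the previous 7 days; shift(1) keeps the window
# strictly in the past so the feature doesn't leak the target
df["rolling_mean_7"] = df["demand"].shift(1).rolling(7).mean()

# Calendar feature capturing weekly seasonality
df["day_of_week"] = df["date"].dt.dayofweek
```

Simple lags and leak-free rolling statistics like these often move the validation metric more than a model swap does, which is the point of the paragraph above.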

Phase 3: Production ML Systems (2022-2024)

BrainPad Inc. is where I learned to build ML at enterprise scale. Forecasting models serving retail clients, optimization pipelines for manufacturing, and the infrastructure to keep it all running. I introduced MLflow for experiment tracking, standardized validation workflows, and learned that model monitoring is where most ML projects die — not in training.

Key realizations from this phase:

  • Model retraining pipelines matter more than initial model quality
  • Experiment tracking isn't optional — it's the backbone of reproducibility
  • Stakeholder communication is an engineering skill, not a soft skill
  • The gap between 'works in notebook' and 'works in production' is enormous
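To make the experiment-tracking point concrete: the essential idea behind a tracker like MLflow is that every run persists its parameters and metrics so results stay reproducible. A minimal stdlib stand-in (not MLflow itself; `log_run` and the JSON layout are hypothetical) might look like this:

```python
import hashlib
import json
import time
from pathlib import Path

def log_run(params: dict, metrics: dict, run_dir: str = "runs") -> Path:
    """Persist one experiment run as a JSON record.

    A toy stand-in for an experiment tracker: the run id is derived
    from the parameters, so identical configs map to the same id.
    """
    run_id = hashlib.sha1(repr(sorted(params.items())).encode()).hexdigest()[:8]
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    out = Path(run_dir)
    out.mkdir(exist_ok=True)
    path = out / f"run_{run_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Record one hypothetical training run and read it back
path = log_run({"model": "lightgbm", "num_leaves": 31}, {"rmse": 4.2})
loaded = json.loads(path.read_text())
```

A real tracker adds artifact storage, a UI, and model registry on top, but the backbone of reproducibility is exactly this: params and metrics written down for every run, automatically.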

Phase 4: LLM Applications (2025-Present)

The shift to LLM-based systems felt like a paradigm change, but the engineering discipline remained the same. RAG pipelines are data pipelines. Agent orchestration is workflow orchestration. Fine-tuning is model training with different tools. The abstractions changed; the principles didn't.
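The "RAG pipelines are data pipelines" claim can be sketched in a few lines. This is a deliberately toy version: keyword-overlap retrieval instead of embeddings, and the generation step stubbed out, since the structure (retrieve, assemble, generate) is what the paragraph is about. All names here are illustrative:

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy retrieval: rank documents by how many query words they contain.
    # A production pipeline would use embeddings and a vector index here.
    words = query.lower().split()
    scored = sorted(corpus, key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Assemble retrieved context into the prompt sent to the LLM
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Demand forecasting with LightGBM at retail scale",
    "Transaction handling in Java backend systems",
    "Japanese NLP classification pipelines",
]
docs = retrieve("demand forecasting", corpus, k=1)
prompt = build_prompt("demand forecasting", docs)
# prompt would then be passed to an LLM call (omitted here)
```

Seen this way, the retrieval step is an ETL job and the prompt assembly is a transform: the same data-pipeline disciplines (validation, monitoring, versioning) apply.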

What's genuinely new is the capability ceiling. With classical ML, I could predict demand or classify text. With LLMs, I'm building systems that reason, adapt, and generate — forensic interview agents that dynamically select questioning strategies, ad trafficking pipelines that parse unstructured emails and coordinate across APIs. The scope of what's possible expanded dramatically.

What I've Learned Across All Phases

Principles that compound:

  • Start simple, measure, iterate. The minimal viable pipeline teaches you more than any architecture diagram.
  • Data quality beats model sophistication. Always.
  • Production is the only environment that matters. Demo-ready is not production-ready.
  • Fine-tuning small models often beats prompting large ones for well-defined tasks.
  • Explicit state management prevents agent chaos. LangGraph over bare LLM loops.
  • Graph databases outperform vector stores for structured knowledge.
  • The best ML engineers are software engineers first.
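The "explicit state management" principle is worth unpacking. LangGraph models an agent as a graph of nodes reading and writing a shared, typed state; the sketch below captures that idea with the stdlib only (it is not LangGraph's API, and the node names and state fields are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class AgentState:
    """Explicit, inspectable state shared by all agent steps."""
    question: str
    evidence: list = field(default_factory=list)
    answer: Optional[str] = None
    step: int = 0

def retrieve(state: AgentState) -> str:
    # Hypothetical retrieval node; returns the name of the next node
    state.evidence.append(f"doc for: {state.question}")
    return "answer"

def answer(state: AgentState) -> str:
    state.answer = f"Based on {len(state.evidence)} document(s): ..."
    return "end"

NODES: dict[str, Callable[[AgentState], str]] = {
    "retrieve": retrieve,
    "answer": answer,
}

def run(state: AgentState, start: str = "retrieve", max_steps: int = 10) -> AgentState:
    # Explicit transitions plus a step cap: no unbounded bare LLM loop
    node = start
    while node != "end" and state.step < max_steps:
        state.step += 1
        node = NODES[node](state)
    return state

final = run(AgentState(question="Who signed the contract?"))
```

Because every transition and every field of the state is explicit, you can log it, replay it, and cap it, which is what prevents the agent chaos the bullet above warns about.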

What's Next

I'm focused on two areas: multi-agent systems where specialized models collaborate on complex tasks, and evaluation frameworks that can reliably measure whether AI systems are actually doing what we want. The field is moving fast, but the fundamentals — clean data, solid engineering, rigorous evaluation — remain the foundation everything else builds on.

If you're starting in AI engineering, invest in software engineering fundamentals before diving into model architectures. The ability to build reliable, maintainable systems is what separates hobby projects from production impact.