Your first AI project is not just a project. It is the operating template every project that follows will inherit.
If the first project ships, in production, with a measurable outcome, the next one gets funded with a different question — “what should we do next?” If the first project becomes a demo, a stalled POC, or a workflow nobody trusts, the next conversation is “is this stuff even real?” The cost is not just the wasted budget. It is the eighteen-month chill that follows.
Mid-market executives feel this acutely. The big-tech AI marketing budget paints a picture of frictionless capability. The reality is that the MIT NANDA project found 95% of generative AI deployments produced zero measurable return, and Gartner finds one in five AI projects in IT operations collapses entirely. The difference between the 5% and the 95% is almost never the model. It is the selection.
This guide is for the executive choosing the first one. It is part of the broader question of why your AI experiments are failing — and the answer, most of the time, starts with what you chose to pilot.
Why the first project matters more than the second
There is a temptation to treat the first AI project as exploratory — “let us try something and learn.” That framing is wrong, and it predicts the outcome.
The first project does three things at once, whether you plan for it or not:
- It builds the operating muscle. Your team learns how to instrument workflows, run evals, deploy guardrails, monitor production, and roll back bad changes. That muscle is reusable. The next project costs a fraction because the foundation already exists.
- It sets the executive expectation. The board, the CFO, and the rest of the C-suite watch this one. What they see becomes their mental model of what AI does at your company.
- It funds the next project. A first project that moves a number gets the second project approved without a debate. A first project that does not gets the second project killed before it starts.
So the first project does not need to be the biggest. It needs to be the one most likely to ship and most likely to be defended at the next budget cycle. The companion piece Best AI Teams Go Narrow and Deep Before They Scale makes this case in detail; this article gives you the executive-level selection criteria.
What makes a good first AI pilot: the criteria
A good first AI pilot meets seven criteria. They are not nice-to-haves. They are the difference between a project that compounds and one that quietly disappears.
Criterion 1: There is a real workflow underneath it
The candidate is not “use AI in marketing.” It is a workflow you can describe in plain English: trigger, sources, rules, judgment, output, review, next action. If the description is vague, the pilot is not ready. The first conversation is not with a vendor. It is with the person who does the work today.
A useful diagnostic: ask the operating lead to walk you through one example of the workflow, end to end, with real names and real systems. If they cannot, the workflow is not legible enough to automate yet. That is fine. It is a finding, not a failure.
Criterion 2: There is a named owner with budget authority
The owner is a specific person — usually a director or VP of the function whose workflow is being changed. They sign off on the pilot scope. They run the operating review. They get credit if it works.
Pilots that report to “the AI committee” or “the innovation team” tend to die at the first integration request. The owner has to be operationally accountable for the workflow’s output, not for the project’s success. Those are different things, and accountability for project success alone is what gets you a demo instead of a production change.
Criterion 3: The data exists in a usable form today
Not “we have the data somewhere.” Usable form means accessible via supported APIs, schema-aligned, recent enough to be useful, and clean enough that the team responsible for the workflow already trusts it for non-AI reporting.
If the data exists but lives in PDFs, email attachments, or a system without an API, the first project is the data foundation, not the AI workflow. That is a legitimate first project — it just has to be scoped, measured, and funded as such. Pretending the data is ready when it is not is the single most common reason AI pilots fail. Gartner predicts organizations will abandon 60% of AI projects unsupported by AI-ready data through 2026. Do not be in that 60%.
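The four data-readiness conditions above can be written down as an explicit check, so the answer is a list of gaps rather than a feeling. This is a minimal sketch, not a prescribed tool: the `DataSource` fields, the `data_ready` helper, the sample sources, and the 30-day freshness threshold are all hypothetical, and "recent enough" will depend on the workflow's own cadence.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataSource:
    name: str
    has_supported_api: bool      # reachable through a supported API, not PDFs or inboxes
    schema_aligned: bool         # fields map cleanly onto what the workflow needs
    last_updated: date
    trusted_for_reporting: bool  # the owning team already relies on it for non-AI reporting


def data_ready(source: DataSource, today: date, max_age_days: int = 30) -> list[str]:
    """Return the readiness checks this source fails; an empty list means ready."""
    failures = []
    if not source.has_supported_api:
        failures.append("no supported API")
    if not source.schema_aligned:
        failures.append("schema not aligned")
    # The 30-day default is an assumption; set it from the workflow's cadence.
    if today - source.last_updated > timedelta(days=max_age_days):
        failures.append("stale")
    if not source.trusted_for_reporting:
        failures.append("not trusted for reporting")
    return failures


# Hypothetical sources for illustration.
today = date(2024, 5, 31)
crm = DataSource("CRM accounts", True, True, date(2024, 5, 1), True)
invoices = DataSource("Invoice PDFs", False, False, date(2024, 5, 20), False)

for src in (crm, invoices):
    gaps = data_ready(src, today)
    print(src.name, "->", "ready" if not gaps else f"data foundation first: {gaps}")
```

Any non-empty result means Criterion 3 is not met yet, and the work in front of you is the data foundation.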
Criterion 4: There is a measurable baseline you can defend
Before the pilot starts, you can answer: what is the current value of the number this workflow moves? Cycle time, error rate, cost per case, hours per report — pick the metric and measure the pre-state.
If you cannot, your first deliverable is the baseline, not the AI workflow. A project that “improves things” without a baseline cannot be defended at budget review. It will be killed by the next CFO who looks at the line item, regardless of how good the work was. This is the argument made in The Baseline Is the Strategy: without a baseline, the strategy is not measurable, which means it is not really a strategy.
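A baseline can be as simple as the median of a pre-pilot sample. The sketch below assumes cycle time in hours per case is the chosen metric; the sample values and the `median_cycle_time` and `percent_change` helpers are hypothetical. It uses the median rather than the mean so that one unusually slow case does not distort the pre-state.

```python
from statistics import median


def median_cycle_time(hours_per_case: list[float]) -> float:
    """Median hours per case; robust to a few unusually slow cases."""
    if not hours_per_case:
        raise ValueError("no pre-state data: measuring the baseline is the first deliverable")
    return median(hours_per_case)


def percent_change(before: float, after: float) -> float:
    """Signed change relative to the baseline; negative means the metric went down."""
    return (after - before) / before * 100


# Hypothetical samples: hours per case before the pilot and after go-live.
pre_pilot = [6.0, 5.5, 7.0, 6.5, 12.0]
post_pilot = [4.0, 4.5, 3.5, 5.0, 4.0]

baseline = median_cycle_time(pre_pilot)   # 6.5
current = median_cycle_time(post_pilot)   # 4.0
print(f"baseline {baseline}h, now {current}h, change {percent_change(baseline, current):.1f}%")
```

The pre-pilot sample has to be collected before the workflow changes; once the pilot is live, the pre-state is much harder to reconstruct credibly.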
Criterion 5: The output is concrete and consumed
“Insights” is not an output. A concrete output is something specific, in a known format, delivered to a known consumer, on a known cadence: “a weekly forecast variance summary delivered to the CFO by 8 a.m. Monday with three flagged accounts and a draft narrative.”
The output has to be something a real person already wants or already needs. If no consumer is waiting for it, the workflow will not get reviewed, the eval set will not get maintained, and the project will quietly drift into shelfware. This is the failure mode named in Why Impressive AI Pilots Become Shelfware.
Criterion 6: The blast radius is bounded
The first project should not have to be perfect to be safe. A pilot where a wrong answer could cost the company a customer or a regulatory fine, or cost an employee their job, is the wrong first project. Pick a workflow where the human stays in the loop, the cost of being wrong is small, and the escalation path is short.
This is not risk aversion. It is operational realism. Your team is going to learn how production AI breaks. They should learn it on a workflow where the consequence of breakage is “we sent a draft back for revision,” not “we miscalculated a payroll run.”
Criterion 7: It ships in 8–16 weeks, in production, to real users
Eight to sixteen weeks is the window where the project still has executive attention, the team still remembers why they started, and the operating muscle gets built without burning out. Beyond that, scope creep wins and the pilot becomes a program — at which point you have a different problem.
“In production” means real users depending on the output, not a sandbox. “Real users” means people in the function whose workflow you are changing. If the first version is only used by the team that built it, it is not a pilot. It is a demo.
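Taken together, the seven criteria act as a gate. A minimal sketch, assuming each criterion is recorded as a yes/no judgment by the selection team: the `Candidate` fields, the `assess` function, and the sample candidates are hypothetical, and the ordering (structural gaps first, then data, then baseline) is one reasonable reading of Criteria 3 and 4 rather than a fixed rule.

```python
from dataclasses import dataclass, fields

# Criteria that, when they are the only gaps, become the first deliverable
# instead of disqualifying the candidate.
FOUNDATIONAL = ("usable_data", "baseline_measured")


@dataclass
class Candidate:
    name: str
    real_workflow: bool           # 1: trigger, sources, rules, judgment, output, review, next action
    named_owner: bool             # 2: a specific director or VP with budget authority
    usable_data: bool             # 3: accessible, schema-aligned, recent, already trusted
    baseline_measured: bool       # 4: pre-state metric measured and defensible
    concrete_output: bool         # 5: known format, consumer, cadence
    bounded_blast_radius: bool    # 6: a wrong answer is cheap and caught by a human
    ships_in_8_to_16_weeks: bool  # 7: production, real users, within the window


def assess(c: Candidate) -> tuple[str, list[str]]:
    """Return a next step plus the unmet criteria, in criterion order."""
    unmet = [f.name for f in fields(c) if f.name != "name" and not getattr(c, f.name)]
    structural = [u for u in unmet if u not in FOUNDATIONAL]
    if structural:
        return "not a first pilot yet", unmet
    if "usable_data" in unmet:
        return "scope the data foundation as the first project", unmet
    if "baseline_measured" in unmet:
        return "measure the baseline first", unmet
    return "ready to pilot", unmet


# Hypothetical candidates.
variance = Candidate("Month-end variance summary", True, True, True, True, True, True, True)
chatbot = Candidate("Chatbot for everything", False, False, True, False, False, True, False)
onboarding = Candidate("Onboarding status digest", True, True, True, False, True, True, True)

for cand in (variance, chatbot, onboarding):
    print(cand.name, "->", assess(cand))
```

Returning the full list of unmet criteria alongside the verdict keeps every gap visible, so a missing owner is not hidden behind a data problem.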
The single most predictive criterion
Of the seven, the most predictive of success is Criterion 4 — measurable baseline. Projects that start with a defensible pre-state metric ship three times more often than projects that do not. Measure the baseline first, even if it takes two weeks. It will save you from killing the project later.
Anti-patterns: what not to pick first
Some candidates look like good first projects and are not. Knowing the anti-patterns saves quarters.
The chatbot for everything. “We want an AI assistant employees can use for any internal question.” Too broad, no owner, no output, no measurable baseline. This is the project that produces a 1,000-document Confluence dump, a generic RAG pipeline, and a Slack channel of disappointed users. Pick one repeated expert answer pattern instead, the way Five Signals to Help Pick Your First AI Workflow describes.
The strategic moonshot. A multi-quarter cross-functional transformation pitched as the first AI project. Executive attention will move on before it ships, and the project will fail at the integration boundary it was always going to fail at. Save it for project three or four, when the operating muscle is built.
The demo-driven pick. The candidate that wins because a vendor demo made it feel real. Vendor demos are designed to be impressive. They are not designed to be ship-ready in your environment. The fact that a vendor has a slick demo for it is not evidence that you can ship it.
The “save hours” project with no destination for the saved hours. Time savings are real, but if those hours do not convert to a tracked business outcome — capacity redeployed, headcount avoided, response time reduced — they will not survive the next budget review. Either pick a workflow where the savings translate to an operating number or stop calling it ROI.
The pilot without an owner who wants it. A pilot the IT or innovation team is excited about, but the function whose workflow is being changed is indifferent to. Indifferent owners do not maintain eval sets, do not review production behavior, and do not defend the project. The owner has to want this.
A practical first-project shortlist for mid-market executives
In our experience working with mid-market companies, four patterns repeat as good first projects. None of them is glamorous. All of them ship.
| Pattern | Why it works | What to measure |
|---|---|---|
| Finance: month-end variance summary | Repeating cadence, structured data, owner cares, output already required | Hours per close, variance items flagged, narrative quality |
| Customer support: escalation triage and draft response | Real workflow, measurable cost-per-case, bounded blast radius, human still sends | Time-to-first-response, escalation rate, CSAT on assisted tickets |
| Customer onboarding: status digest and missing-info detection | Concrete output, named consumer, data exists in CRM | Time-to-activation, churn during onboarding, owner workload |
| Operations: weekly operating report with narrative draft | Recurring, measurable, output already consumed by exec team | Time to assemble, leadership questions answered in the draft |
Each of these is covered in depth in the function-specific playbooks linked from our AI workflow examples by function hub. The examples are not generic; they are the workflows we see ship first across mid-market teams, with the metrics that defend them.
How to run the selection process
The selection process itself matters. Three steps:
Step 1: Surface the candidates from the work, not from the strategy deck. Ask the directors and VPs of each function: where is your team already doing the same painful work over and over, using scattered context, manual coordination, and human judgment to produce a usable output or next action? You will get five to ten candidates per function. None of them will look like an AI use case at first. That is the point.
Step 2: Score them honestly. Use the four-axis framework from the AI use case prioritization guide: impact, effort, data readiness, measurability. Be especially honest about effort. The cost is not the prototype. It is the production version — evals, guardrails, observability, oversight, rollback paths.
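The four-axis scoring can be made explicit so that the ranking is auditable. This sketch assumes 1–5 scales for each axis, equal weights, and effort inverted so that a harder production build lowers the score; none of these choices comes from the prioritization guide itself, and the `Scored` class, `rank` helper, candidate names, and scores are hypothetical. Candidates scoring 1 on data readiness are held out, since for them the first project is the data foundation.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    name: str
    impact: int          # 1 (low) .. 5 (high)
    effort: int          # 1 (low) .. 5 (high); production effort, not prototype effort
    data_readiness: int  # 1 (not ready) .. 5 (ready today)
    measurability: int   # 1 (no baseline possible) .. 5 (baseline already tracked)

    def score(self) -> int:
        # Invert effort so lower effort raises the score; equal weights are an assumption.
        return self.impact + (6 - self.effort) + self.data_readiness + self.measurability


def rank(candidates: list[Scored]) -> list[tuple[str, int]]:
    # Data readiness of 1 means the data foundation comes first,
    # so those candidates are excluded from the AI-pilot ranking.
    eligible = [c for c in candidates if c.data_readiness > 1]
    eligible.sort(key=lambda c: c.score(), reverse=True)
    return [(c.name, c.score()) for c in eligible]


# Hypothetical candidates surfaced in Step 1.
candidates = [
    Scored("Escalation triage", impact=4, effort=2, data_readiness=4, measurability=5),
    Scored("Weekly operating report", impact=3, effort=2, data_readiness=4, measurability=4),
    Scored("Contract review from PDFs", impact=5, effort=4, data_readiness=1, measurability=2),
]
print(rank(candidates))
```

The output is an input to Step 3, not the decision itself: the top-ranked candidate still has to be sequenced with a second project that reuses what the first one builds.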
Step 3: Sequence, do not just rank. Pick the first project and the second project at the same time. The second project should reuse most of the operating muscle the first built. Sequencing this way turns the first project into the first installment of an operating model, not a one-off demo.
If you want help running this process, it is the entry point of our AEMI assessment and the Operational AI Opportunity Mapping engagement — an honest baseline of where your engineering and data foundations are, what workflows are operationally ready, and what should ship first.
What success looks like after 90 days
If you pick the first project well, here is what you should see at the 90-day mark:
- The workflow runs in production with real users depending on it.
- The baseline metric has moved. The CFO accepts the measurement.
- The team has built the operating muscle — evals, observability, guardrails, oversight — that the next project will inherit.
- The function owner is asking for the next workflow, because they have seen what is possible in theirs.
That is the standard. Not “AI was used.” Not “users engaged with the assistant.” Not “the demo went well at the board.” The standard is: a real workflow changed, a real number moved, and the next project costs less because the first one built the foundation.
Pick a First Project That Ships
We will run the selection process with your team — surfacing candidates from the work, scoring them honestly, and sequencing the first two projects so the second one inherits the operating muscle the first one builds.
This is the system underneath the chat box at its most decisive layer: the choice of which workflow gets the first version. The model matters less than the choice. The choice matters less than the discipline of running it through criteria the project can be held to. Pick well, ship in 90 days, and the second project becomes a different conversation. Pick poorly and you spend a year explaining the first one.
Frequently Asked Questions
How do I choose my first AI project?
Start from the workflow, not the AI idea. Ask each function lead where their team is doing repeated painful work using scattered context and manual coordination. Surface 5–10 candidates per function, score them on impact, effort, data readiness, and measurability, and pick the one that meets seven criteria: real workflow, named owner, usable data, measurable baseline, concrete output, bounded blast radius, and an 8–16 week production path.
What makes a good first AI pilot?
A good first AI pilot has a defined workflow with a named owner, data that already exists in a usable form, a measurable baseline, a concrete output consumed by a known person, a bounded blast radius (a wrong answer is not catastrophic), and a path to ship in production in 8–16 weeks. Most failed pilots violate at least three of these. The single most predictive criterion is a defensible baseline measurement before the pilot starts.
How long should the first AI project take?
Eight to sixteen weeks to a production version that real users depend on. Shorter than that usually means the build skipped production essentials: evals, guardrails, observability, oversight. Longer than that usually means scope creep set in and the pilot became a program. That window is where executive attention and team focus are highest and where the operating muscle gets built.
Should the first AI project be the highest-impact use case?
No. For the first project, prefer ship-ability and measurability over raw impact. The first project's job is to build the operating muscle (evals, observability, guardrails, oversight) and prove that a real workflow can change. Once that foundation exists, the second and third projects can chase bigger impact because the marginal cost is lower. Compounding ships before the moonshot.
What should I avoid for a first AI pilot?
Avoid the all-purpose internal chatbot, the multi-quarter strategic moonshot, the vendor-demo-driven pick, and the project where the function whose workflow is being changed is indifferent to the outcome. Also avoid any candidate scoring 1 on data readiness — the first project becomes the data foundation in that case, not the AI workflow.
How do I know if a workflow is data-ready for AI?
The data is accessible through supported APIs, is schema-aligned, is recent enough to be useful, and is already trusted by the team for non-AI reporting. If the data lives in PDFs, emails, or systems without APIs, or if the function does not yet trust it for reporting, the first project is the data foundation. Gartner predicts organizations will abandon 60% of AI projects unsupported by AI-ready data through 2026; data readiness is the criterion most teams underweight.