• From Tool Access to Workflow Advantage: A Q&A with Professor Timothy DeStefano on Making GenAI Work in Companies

    How GenAI can achieve more than efficiency gains.

    Generative AI (GenAI) is quickly becoming a general-purpose business capability, one that can help organizations work faster, improve quality, and pursue new applications that were previously too costly, time intensive, or difficult to execute at scale. Yet despite high expectations around both productivity gains and new use cases, many organizations still struggle to move from experimentation to reliable, measurable impact.


    Timothy DeStefano: Associate Research Professor, Georgetown University McDonough School of Business

    So what differentiates organizations that realize real gains from those that don’t?

    To help answer that question, Analysis Group is partnering with Professor Timothy DeStefano – associate research professor at the Georgetown University McDonough School of Business and an applied economist specializing in digital technology, AI, and firm productivity – on a series of scientifically designed field experiments. In December 2025, Analysis Group launched an in-house randomized experiment to test not only whether GenAI could improve performance on demanding tasks – for example, systematic extraction of specific information from published literature – but also the impact of the structures surrounding the tool: task redesign, structured guidance, training, and quality-control mechanisms that help unlock productivity without compromising quality.

    In this Q&A, Managing Principal Lisa Pinheiro invites Professor DeStefano to share what he’s seeing so far in the firm’s experiment, why many firms remain AI hesitant, and what it takes to turn GenAI into an established firm resource rather than a set of ad-hoc prompts.

    Ms. Pinheiro: Many leaders believe GenAI will be transformative, but adoption is still uneven. What’s holding companies back?

    Professor DeStefano: Let’s start with what economists call the barriers to diffusion – the reasons a promising technology isn’t being used within organizations, even when its potential is obvious.

    In the case of GenAI, there are a few big themes. First, there are risk concerns: intellectual property, confidentiality, and the fear of employees pasting sensitive material into public tools. Second, there’s quality uncertainty: Even strong models can hallucinate or deliver output that looks plausible but isn’t correct. That’s not a reason to avoid the tools entirely, but it is a reason to be cautious about how they’re used in real workflows. The same was true in the early days of search engines: Not everything you found online could be taken at face value, but that didn’t make these tools any less valuable.

    Third, and this is more subtle but also critical, many organizations haven’t yet built the scaffolding that turns GenAI into sustained performance gains. You can buy access to a tool in a day. But turning that access into measurable value requires decisions about where the tool should be used, how quality will be protected, and how performance will be measured over time.


    Lisa Pinheiro: Managing Principal, Analysis Group

    You’re part of Analysis Group’s effort to launch an in-house experiment on GenAI and structured guidance. What’s the value of taking an experimental approach?

    It’s the most direct way to move beyond anecdotes and to design for real-world implementation at the same time. A lot of the public conversation about GenAI is based on demonstrations, individual success stories, or broad projections about macro-level productivity. Those are meaningful motivators for potential use, but they don’t tell a firm what it needs most: what works in our workflows, for our people, under our quality standards.

    For Analysis Group, we didn’t want to test only whether the model could perform the task (a time-consuming extraction activity that demands both accuracy and speed). We wanted to test whether a redesigned version of the task itself could work in practice. In other words, we didn’t want to find out just about AI capability; we also wanted to know about the structure, guidance, and guardrails that make that capability usable inside a real organization.

    So our experiment was designed to measure two things at once: the impact of the tool, and the impact of embedding that tool inside a carefully designed workflow with training, quality checks, and clear human-AI handoffs. That’s what allows the results to translate into implementation.

    Why focus the experiment on data extraction tasks? 

    Because systematic extraction and synthesis are foundational to high-quality analysis – and they’re also among the harder real-world tests of GenAI in an enterprise setting. These tasks require much more than summarization. They demand accurate extraction, reconciliation across potentially conflicting sources, and defensible judgments about what is credible and relevant. That makes them a natural stress test for the core enterprise question: Can GenAI increase efficiency without lowering standards?

    The specific task we used for the experiment focused on extracting highly specific information from a set of scientific articles reporting on clinical studies. Some data elements were readily available, while others had to be calculated from figures and tables. Some tasks were simple – for example, identifying the name and date of a particular clinical trial. Others required interpretation of potentially subtle criteria for inclusion or exclusion. And some involved inference and technical terminology specific to that literature, which could require additional domain knowledge.

    We also chose this task because its core requirements (accuracy, transparency, traceability, and consistency) show up in many enterprise workflows, including compliance, risk, strategy, and due diligence.

    [Figure: How Information Extraction Tasks Work]

    What are the early results you’re seeing from the experiment?

    So far, the results reinforce themes that are both intuitive and a bit surprising once you see them.

    First, GenAI can drive meaningful efficiency gains in the right tasks. But the more important insight is that training and structured guidance can change how people use the tool, and that can change outcomes in a meaningful way. It’s not only about saving time; it’s about reshaping the work so that people spend less effort on mechanical steps and can direct their efforts to higher-value judgments. One of the signals that’s emerged clearly is that GenAI combined with structured guidance helped people shift their cognitive focus – away from manual extraction and toward critical review, reconciliation, and decision making. In a quality-sensitive environment, that shift is exactly what you want.

    Second, the guidance isn’t just “how to prompt.” The real leverage often comes from redefining the task: being explicit about what the model is allowed to do, what the human must do, how outputs should be checked, and what “good” looks like.

     



    You’ve used the phrase “structured, task-specific guidance.” What does that mean in practice?

    Think of it as workflow design rather than a set of tool tips.

    For example, structured guidance can include task decomposition: clear steps that separate what GenAI drafts or extracts from what a human verifies, reconciles, and finalizes. It can also include prompt and output templates, which standardize inputs and make outputs easier to evaluate.

    Another ingredient is explicit verification steps, so that all participants are on the same page when it comes to confirming results. For data extraction tasks, this means cross-checking key facts, confirming citations, comparing across sources, and flagging uncertainty. Decision rules and escalation paths also matter: They can offer guidance on when the model output is “good enough,” when it must be re-run, and when it requires subject-matter escalation. And finally, feedback loops can help participants capture recurring failure modes and update their playbook so the organization improves over time.

    The key point is that the value doesn’t come from telling people to “use AI” but from designing workflows that specify where AI adds value, how outputs are validated, and who remains accountable.
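
    As an illustration only, and not the workflow used in the Analysis Group experiment, the decision rules and escalation paths described in this answer might be sketched in Python as follows. Every name here (`Extraction`, `Decision`, `decide`, `MAX_ATTEMPTS`) and the specific rules are hypothetical assumptions chosen for the example; a real playbook would define its own checks and thresholds.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"      # output is "good enough" after human verification
    RERUN = "rerun"        # send the item back through the model
    ESCALATE = "escalate"  # route to a subject-matter expert


@dataclass
class Extraction:
    """One extracted data element plus the results of human verification."""
    field_name: str
    value: str
    citation_confirmed: bool  # reviewer found the value in the cited passage
    sources_agree: bool       # value is consistent across the sources compared
    flagged_uncertain: bool   # model or reviewer flagged uncertainty
    attempts: int = 1         # how many times the model has been run on this item


MAX_ATTEMPTS = 2  # hypothetical limit on re-runs before escalating


def decide(item: Extraction) -> Decision:
    """Apply illustrative decision rules to one verified extraction."""
    if not item.sources_agree:
        # Conflicting sources call for judgment, not another model run.
        return Decision.ESCALATE
    if item.citation_confirmed and not item.flagged_uncertain:
        return Decision.ACCEPT
    if item.attempts < MAX_ATTEMPTS:
        return Decision.RERUN
    return Decision.ESCALATE


def summarize(items: list[Extraction]) -> Counter:
    """Feedback loop: count decisions so recurring failure modes are visible."""
    return Counter(decide(item) for item in items)


if __name__ == "__main__":
    batch = [
        Extraction("trial_name", "Trial A", True, True, False),
        Extraction("sample_size", "412", False, True, False),
        Extraction("inclusion_criteria", "adults 18-65", True, False, True),
    ]
    for item in batch:
        print(item.field_name, "->", decide(item).value)
```

    The design choice worth noting is that the human verification results, not the model's output alone, drive the decision: the model drafts, the reviewer records what was checked, and the rules determine who acts next.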

    [Figure: Structured Guidance Workflow Design]

    You’ve worked with companies on AI strategy and value measurement. What’s the most common mistake you see?

    The most common mistake is assuming that providing the technology guarantees the outcome.

    Even before the GenAI wave, a central lesson from AI adoption was that performance gains come not from the technology itself but from how effectively it’s adopted – whether it’s useful to employees, embedded in real workflows, and supported by clear norms of accountability. My engagement with companies has often focused on building strategies for implementation and measurement, because without measurement you don’t know what’s working, and without workflow integration you can’t scale what works.

    This is also why the Analysis Group experiment includes structured guidance. The point is to test the idea that performance gains are not automatic; they’re shaped by how the tool is deployed and how the human work is designed around it.

     



    If a company is unsure how to start offering thoughtful guidance on GenAI use, what advice would you give them?

    Start with experimentation – but do it deliberately.

    A practical starting point is to identify use cases where value and risk are both clear. Look for tasks that are text- or knowledge-intensive, repeatable, and currently time-consuming. Then pilot GenAI in a way that protects quality: Define what “good” means, build verification into the process, and measure outcomes.

    And don’t stop at the tool. Experiment with training and structured guidance, because that’s where many organizations will find the difference between “interesting examples” and “repeatable gains.” During this process, businesses often uncover new, unexpected ways to embed the technology into their workflows, which can result in additional performance gains.

    Finally, build a habit of transparency and measurement. The firms that win won’t just be the ones that adopt GenAI early – they’ll be the ones that can prove that it works, understand why, and scale it responsibly. ■