dave v

Researching contextual AI frameworks to evaluate user-AI interactions and support better creative outcomes

My Role

UX Researcher

Team

1 PhD Lead
4 UX Researchers

Skills

Interaction Design
Prototyping
UX Research

Timeline

10 months
Dec. 2023 - Sep. 2024

Officially published!

Click to read our paper

01 - Background

AI tools are major disrupters in creative work

Cursor AI
Adobe Generative Fill

It's true. They're not just accelerating productivity, but redefining how ideas are produced and refined altogether. Embedding these features directly into the workflow can reduce friction and spark new directions for creative problem-solving.

The Current Space

Although these features are changing the game…

They also raise more questions about their long-term usefulness.

Bigger picture questions about AI tools in a creative context.

Especially for creative work, today's first pick might not be the same a year from now. Take image generation, for example: first it was DALL-E, then Midjourney took over, and now Adobe Firefly is built right into the Adobe Suite.

Tools come and go fast, and creatives have learned to be more selective with their toolkit. Adopting a tool means changing habits, workflows, and sometimes even their creative voice.

Identifying a Gap

Speed and seamlessness aren’t enough to secure users' trust


AI has to help people nurture their creativity instead of providing a shortcut to it. For this reason, it's risky to focus only on AI performance or user satisfaction.

While so many evaluations focus on outputs, fewer study how the structure of user-AI interaction shapes the experience in terms of cognitive engagement and creative depth.

Research Question

How does the positioning of AI within a creative workflow influence creative outcomes, cognitive effort, and how users perceive their own agency and the value of the AI?

5 Whys template
Competitive Analysis template

To explore the role of structure (the timing and placement of AI support), we looked at integrating an LLM into design templates, a well-established strategy for guiding creative problem-solving. This gave us a tangible way to observe how people think and develop creative ideas.

02 - Solution

How our plugin works

POV: Do you find brainstorming a potential solution to a problem challenging? Imagine where generative AI could support you in your workflow.

Scoping the Problem

Engage with different perspectives by generating reflective questions.


Consolidate your line of reasoning by generating a root cause to the problem.


Researching the Space

Jumpstart your idea validation by exploring competitors through focused comparisons that build toward meaningful synthesis.


Uncover overlooked players and factors by expanding your table with AI-suggested competitors and dimensions.

03 - Methods

How we constructed our research methodology

User Study Setup

Defining our conditions

We ran a between-subjects experiment that randomly assigned N=47 users to one of the following three conditions:

No-AI

Users approach a problem/solution without LLM assistance, manually filling out the templates based only on their current knowledge.


Co-Led

Users gain access to LLM generation features in specific parts of the templates, assisting with reflection or proposing alternative ideas.


AI-Led

Templates are already filled out by AI. Users don't initiate any writing; they only read and process what was generated.
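
To make the setup concrete, here is a minimal Python sketch of how a balanced random assignment into these three conditions could be done (the participant IDs, seed, and helper function are hypothetical, not our actual assignment script):

```python
import random

CONDITIONS = ["No-AI", "Co-Led", "AI-Led"]

def assign_conditions(participant_ids, seed=42):
    """Shuffle participants, then deal them round-robin into the three
    conditions so group sizes stay balanced (e.g., 47 -> 16/16/15)."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(shuffled)}

# Hypothetical usage with N=47 participants
assignments = assign_conditions([f"P{i:02d}" for i in range(1, 48)])
```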

User Study Setup

Outlining the exercise protocol

An overview of our user study protocol

The research team and I led user studies that had participants walk through the problem-solving process using the template plugin, followed by surveys and structured interviews.

Metrics

How we measured user outcomes across conditions


In our interviews and surveys, we were curious how the plugin shaped participants' thinking and creative outcomes, using literature-backed metrics and frameworks that assessed:

  • Reflective thinking
  • Creative quality
  • Cognitive load
  • Usability
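
As one concrete example of what literature-backed scoring can look like, here is a short Python sketch of System Usability Scale (SUS) scoring. SUS is a common usability instrument used here purely as an illustration; it is an assumption, not a claim about the exact instruments in our study.

```python
def sus_score(responses):
    """Standard SUS scoring: 10 items rated 1-5. Odd items contribute
    (rating - 1), even items contribute (5 - rating); the sum is scaled
    by 2.5 to yield a 0-100 score."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Hypothetical responses from one participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```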

Analyzing the effects of the tool, not the tool itself


This was something I learned while defining the methodology. Similar to how you establish benchmarks for usability testing, we made sure our metrics provided insight into our research question rather than into the tool itself.

This was even reflected in how we reworked the plugin prototype for this study.

04 - Design

Building on the original plugin prototype

In reviewing the PhD lead's earlier version of the plugin, we identified two major limitations in how it integrated AI into the problem-solving workflow:

Lacks user guidance

Users got stuck on how to reflect on AI-generated responses and extract key insights.

Not immediately usable

Users found AI-generated responses to be long-winded, inaccurate, repetitive, and hard to leverage.

In the interest of our research question,


We needed the interactions to keep users thinking so we could focus on how they’re affected by the tool. The following design principles guided our redesign process:

Design Decisions

Interactions with AI should be easy to comprehend and actionable.


The initial plugin used AI-generated Q&As to prompt reflection. However, previous participants felt these read like repetitive summaries that offered no specific angles or concrete ways to explore a competitor further.

Before and after: the old plugin's AI generation features for the Competitive Analysis exercise.

Sometimes, the best interface is no interface. We realized previous users struggled to act on AI insights because those insights were disconnected from where the work happened. Rather than adding more UI, we revised the backend to generate shorter, more targeted insights.

From a research perspective, this also gave us more chances to see how users actually synthesized and applied AI input, rather than just skimming it.
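
As a rough illustration of the kind of backend change this involved, here is a hedged Python sketch assuming an OpenAI-style chat API; the model name, prompt wording, and function are hypothetical, and the plugin's actual backend may differ.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_insight(competitor: str, dimension: str) -> str:
    """Request one short, targeted insight instead of a long Q&A summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": ("You help designers with competitive analysis. "
                         "Reply with ONE specific, actionable insight in 1-2 "
                         "sentences. No summaries, no repeating the prompt.")},
            {"role": "user",
             "content": f"Competitor: {competitor}. Dimension: {dimension}."},
        ],
        max_tokens=80,  # hard cap keeps responses short enough to use in a table cell
    )
    return response.choices[0].message.content
```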

Design Decisions

The plugin should guide users to keep thinking and exploring across sections without dictating exactly what to do next.


As mentioned previously, there was no smooth handoff between reading AI insights and applying them. Users were often left unsure of what to do next, breaking their momentum instead of advancing their analysis.

Before and after: the gap between interacting with the plugin and the template, addressed by adding more guidance screens.

So we decided to expand the interface to include guidance screens for every interactive element in the table, not just competitor headers or add-buttons.

These screens provide light context on what each element is, how users can interact with it, and what role it plays in the overall exercise, keeping exploration fluid without defining a fixed path.

Design Decisions

Keep users focused on their main goals by ensuring every interaction directly supports their progress.


To observe how users perceived AI-generated content, the previous plugin allowed them to mark any generated material for validity. But most skipped this feature since it felt like an extra task that interrupted their workflow, taking time away from research and synthesis.

Before and after pictures of the validity-marking feature.

We decided not to move forward with this feature and instead embedded simple, contextual language like “Review and edit if needed” or “This may or may not reflect your own analysis.”

This subtly encourages reflection without pulling users out of the task.

Rejected Explorations

Possible actions to keep the flow going

We explored adding an idle state to the plugin that suggests possible actions.

Something we considered was creating an idle state that suggested next steps for exploration. Due to how non-linear the exercise is, we wanted to prevent decision paralysis by offering possible actions to keep them actively engaged.

We later realized this risked narrowing or biasing their decision-making, especially in an open-ended task where we wanted to observe how AI organically influences their thinking.

Rejected Explorations

Populate competitors as well

We explored letting users auto-generate competitor rows as well.

In addition to populating dimensions, we explored letting users auto-generate entire rows for new competitors. The idea was to see whether giving users a full view of a single competitor might shift how they approach comparisons or structure their analysis.

We realized this risked over-structuring the analysis, turning it into a one-by-one review of each competitor (just like the old plugin) instead of encouraging comparison and synthesis.

05 - Results

Results & Takeaways

As expected, AI integration influenced participants to incorporate more topics into their ideas. Co-Led and AI-Led users generated more categories than those in the No-AI condition.

But quantity wasn’t the whole story.

Differences emerged in how participants engaged with the template.

No-AI

They spent the most time revisiting and editing earlier responses, refining half-formed ideas as their understanding evolved.

No-AI participants' engagement behavior

Co-Led

In contrast, these participants treated the AI as a dialogue partner. Their focus was on shaping responses in the moment, responding rather than revising.

Co-Led participants' engagement behavior

AI-Led

On the other hand, they spent their time digesting generated content. Their process leaned less on imagination and more on remixing the information they were given, rather than constructing a new line of reasoning.

AI-Led participants' engagement behavior

As a result,

While No-AI participants expressed more confidence and ownership,

No-AI participants' feeling about the process

They were burdened with keeping track of everything as their understanding evolved and as they gathered more context, which left less time for actually synthesizing ideas.

While AI-led participants had exposure to more perspectives and topics early on,

AI-Led participants' feeling about the process

Users found it hard to explore beyond the AI’s suggestions because they seemed complete and convincing. And so, they spent more time deciding on their idea’s direction, leaving less room for depth and creativity.

And due to the AI’s perceived comprehensiveness, some users even accepted surface-level ideas without fully questioning them.

Co-Led participants struck more of a happy medium, but...

Co-Led participants' feeling about the process

Users expanded their thinking with AI while staying active in shaping/challenging ideas without being overwhelmed by information.

However, having a more complex understanding made them more self-critical toward their idea, as they grappled with unanswered “what-ifs” they felt weren't easy to resolve.

Product Strategy

What might creative AI support look like moving forward?


The patterns we saw in our study echoed a broader trend in today’s AI tools.

Much of the market still leans into one of two extremes:

1️⃣

Generate insights fast, connect later

AI-Led users synthesized the most ideas, but this may leave users prematurely satisfied with the first few convincing ideas rather than staying curious enough to develop them further.

2️⃣

Exclusively serve a supporting role

Co-Led users balanced exposure to new ideas with agency. But AI that only reacts to user input can be limiting at times, especially if the user gets stuck or leaves their own assumptions unspoken.

If someone read our paper, how might they design a better AI product?


Even after we submitted the paper, I couldn’t stop wondering what an answer could look like. So I put myself in the shoes of someone working on an AI tool, with the goal of taking our current plugin and improving it even further.

Proof-of-Concepts

And so I decided to sketch out a bunch of ideas


  • Responsive workspaces - get help anywhere
  • Contextual pop-ups when you need them
  • Proactive guardrails - don't settle too soon
  • Branching off - go beyond the template

After filling out a few spreads, these were some of the themes that emerged from my ideation. They tie back to the gaps we saw in the study (users feeling boxed in, feeling stuck, or needing a bit more help) while keeping the experience lightweight and leveraging AI's flexibility.

But two of them stood out...

Sketches

During this process, I bumped into some of the risks of making AI proactive. Too invasive and users will tune it out, too heavy-handed and you’ll flatten their process. Striking the right balance matters.

I went with these two since they’re lightweight enough not to overwhelm, can gently nudge the user at the right moments, and can draw on the template’s context to offer relevant, well-timed insights or tips.

Feed-Forward Prompting

Proactively help users drill deeper and expand their perspectives

Jumps in after a user finishes a thought, highlighting gaps or alternative ideas to encourage users to keep going even if they have a convincing idea.


Gentle Troubleshooting

Leveraging Figma chat to extend AI to anywhere in the workspace

Help users test assumptions to flesh out the best version of their ideas, transforming self-critical reflection into tangible validation.


06 - Our Journey

From design student to design researcher

Data Analysis

The trials of storytelling in research

Showing a messy collage of data visuals.

Our research process began with coding timestamps for key task activities and running thematic analyses on user interview responses to pinpoint patterns.

However, we initially struggled to build a cohesive, data-driven narrative—partly because we were treating qualitative and quantitative data as separate silos.

Data isn’t just words and numbers


The turning point came when we reframed them not as “words” and “numbers,” but as reflections of “what participants thought” and “what they did.”

And so, we reversed our approach


Rather than beginning with thought processes and searching for matching behaviors, we started with statistically significant behaviors and traced them back to supporting quotes.

This clarified the why behind users’ actions and gave us a clearer lens to explain the various effects on their thinking.
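
For readers curious what this behavior-first pass can look like in practice, here is a minimal Python sketch. The file, column names, and choice of a Kruskal-Wallis test are hypothetical illustrations; the exact variables and tests from our analysis aren't reproduced here.

```python
import pandas as pd
from scipy import stats

# Hypothetical behavior log: one row per participant, coded from task timestamps
df = pd.read_csv("behavior_codes.csv")  # columns: participant, condition, revisit_edits

# Compare a coded behavior across the No-AI / Co-Led / AI-Led groups
groups = [g["revisit_edits"].to_numpy() for _, g in df.groupby("condition")]
stat, p = stats.kruskal(*groups)  # nonparametric test shown as one option
print(f"H = {stat:.2f}, p = {p:.3f}")

# If the difference is significant, pull the heaviest revisers and trace
# them back to their interview transcripts for supporting quotes
top_revisers = df.sort_values("revisit_edits", ascending=False).head(5)["participant"]
```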

A bar chart and icons for supporting quotes

An Unexpected Turn

Reconciling with unexpected results


In a pre-survey, we asked users if they had any design thinking background. Then, experts blindly rated user responses across Likert-scale benchmarks, which we graphed against experience levels.

We initially anticipated a clear correlation—but the data suggested something more complex.
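
For illustration, a short sketch of that kind of correlation check, using placeholder numbers and Spearman's rho as one reasonable choice for ordinal data (not necessarily the exact test we ran):

```python
from scipy import stats

# Placeholder pairs: self-reported design-thinking experience vs. mean expert rating
experience = [0, 0, 1, 2, 2, 3, 4, 5, 6, 8]                          # e.g., years
expert_rating = [3.2, 4.0, 2.8, 3.5, 4.1, 3.0, 3.9, 3.3, 4.4, 3.1]   # 1-5 Likert means

# A clear correlation would show up as a large |rho| with a small p-value
rho, p = stats.spearmanr(experience, expert_rating)
print(f"rho = {rho:.2f}, p = {p:.3f}")
```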

Solution quality is not black and white – what qualities define a strong solution?

Shows the example of a participant's solution and its blind expert rating.

Most expert ratings aligned with the quality of the participants’ solutions—but a few outliers reminded us that strong ideas don’t always look polished.

One participant, for instance, proposed a loosely structured solution around adaptive cultural change in education.

Experts blindly rated their solution low in several areas because it lacked detail on how the solution would be implemented and how it would affect its audience.

Shows their interview response explaining their process

Yet in their interview, they demonstrated clear signs of deep reflection: they questioned the AI’s logic, merged multiple ideas, and grounded their decisions in a human-centered way.

The Lightbulb Moment

Our most meaningful findings were about how users learned and felt during the process—not just what they produced.


Cases like those highlighted a key nuance in evaluating creativity: strong thinking isn’t always neatly packaged.

This further drove our final thesis: AI tools should go beyond generating outputs. Instead, they should anchor their value in providing structures for nuanced thinking and fluid ways to elevate users' creative workflows.

That’s the real value.

07 - Reflection

Contributing to HCI in the era of AI

Thank you UCSD Design Lab for bringing me on! I valued the opportunity to contribute to the growth of creativity support tools (CSTs) and take initiative in various aspects of the research process.

Learning

Humanizing our key findings with storytelling

I thought research was only about pushing the envelope and finding novelty. Over these 10 months, I realized that results only matter if they reflect something about us, the people who use and are influenced by this type of technology.

Challenge

Doing academic research for the first time

It’s no secret that academic research demands rigor and wearing many hats. While it took time to get up to speed with the literature and methods, the biggest lesson was staying present, being a self-starter, and staying hungry for more.

What I would have done differently

Seeking out more mentorship

Academic research meant the ceiling was high and often felt hard to reach on my own. For me, the most invaluable part was getting advice from all the researchers I met, so I wish I had taken more chances to accelerate my growth by looping in more perspectives whenever I hit a new stage or hurdle.

Have more questions about our paper?


I’m happy to walk through our research process in more depth or talk about bigger-picture items. Feel free to reach out to me at [email protected] or on LinkedIn. Thanks for reading!

Message me!