CHIIR ’26 proceedings in the ACM Digital Library — full citations and abstracts.
Data reuse expedites scientific progress and conserves resources, yet connecting researchers with datasets available for reuse remains a challenge. The scientific community has proposed several recommendation systems to help identify relevant data and maximize reuse. However, test collections or benchmarks for evaluating the performance of dataset recommendation systems are rare, particularly for social science dataset retrieval. To address this gap, we created a novel test collection for evaluating social science dataset recommendation systems. Our collection includes 249,102 query–dataset pairs with relevance judgments, featuring 262 unique search queries and 10,749 datasets. We describe how we created this collection using datasets archived at the Inter-university Consortium for Political and Social Research (ICPSR) and the ICPSR Bibliography of Data-related Literature. Additionally, we demonstrate a potential use case by evaluating the performance of embedding-based recommendation models on our test collection. The test collection is available through ICPSR at https://doi.org/10.3886/E238682V1.
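As a hedged illustration of the demonstrated use case, the sketch below scores an embedding-based recommender against a collection of this shape: queries and datasets are embedded, datasets are ranked per query by cosine similarity, and the ranking is scored with nDCG against the relevance judgments. The function names, matrix layout, and cutoff k are assumptions for illustration, not the authors' evaluation code.

```python
# Minimal sketch: score an embedding-based recommender on a
# query-dataset test collection (illustrative, not the authors' code).
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance grades."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(ranked_rels, k=10):
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

def evaluate(query_embs, dataset_embs, qrels, k=10):
    """qrels: {query_row_index: {dataset_col_index: graded relevance}}."""
    # Cosine similarity between every query and every dataset description.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = dataset_embs / np.linalg.norm(dataset_embs, axis=1, keepdims=True)
    scores = q @ d.T
    ndcgs = []
    for qid, judged in qrels.items():
        ranking = np.argsort(-scores[qid])           # best-first dataset ids
        rels = [judged.get(did, 0) for did in ranking]
        ndcgs.append(ndcg_at_k(rels, k))
    return float(np.mean(ndcgs))                     # mean nDCG@k
```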
Understanding and classifying query intents can improve retrieval effectiveness by helping align search results with the motivations behind user queries. However, existing intent taxonomies are typically derived from system log data and capture mostly isolated information needs, while the broader task context often remains unaddressed. This limitation becomes increasingly relevant as interactions with Large Language Models (LLMs) expand user expectations from simple query answering toward comprehensive task support, for example, with purchasing decisions or travel planning. At the same time, current LLMs still struggle to fully interpret complex and multifaceted tasks. To address this gap, we argue for a stronger task-based perspective on query intent. Drawing on a grounded-theory-based interview study with airport information clerks, we present a taxonomy of task-based information request intents that bridges the gap between traditional query-focused approaches and the emerging demands of AI-driven task-oriented search.
When undertaking complex search tasks that cannot be completed in a single session, resuming the search task can be challenging. When the search is conducted over multiple platforms, this challenge increases. While aggregated search interfaces support the synthesis of search results from multiple sources, cross-session support in such settings remains underexplored. In this research, we study a set of novel workspace approaches to this problem within a digital humanities search context. We introduce an interactive cluster refinement (ICR) workspace that automatically organizes saved resources using thematic clustering, preserves their provenance, and provides an interactive method for refining the clusters to match the searcher’s mental model. Building on this foundation, we investigate two augmentations to facilitate reacquaintance with previous search activities: interactive passage highlighting (ICR+IPH), which lets searchers mark salient passages as a reminder of why resources were saved; and AI-generated summarization (ICR+AIS), which produces concise overviews of saved resources with inline citations. We combine these into a fourth interface (ICR+IPH+AIS), where the AI-generated summarization is instructed to make use of the passage highlighting. We conducted a cross-session laboratory study with n = 36 participants to compare these interface alternatives with respect to measures of utility, perceived value, metacognition, exploratory search behaviour, and cross-session support. Both ICR+IPH and ICR+AIS improved outcomes relative to ICR. The combined ICR+IPH+AIS interface was most effective in enabling cross-session searching, yielding faster task resumption, higher resource precision, greater perceived value and knowledge gain, and stronger support for metacognitive planning and evaluation as well as exploratory search outcomes.
Conversational commerce uses digital assistants to support the search process and decision-making in e-commerce. Effective communication in these interactions can be facilitated by assistants adapting their communication style to users and supporting shared understanding. An open challenge in this context is adapting the presentation of complex product information to users with varying levels of domain knowledge. To investigate strategies for such knowledge-level adaptation, we set up a chatbot-assisted laptop search scenario. In a between-subjects experiment (n = 251), we examined novice and expert perceptions of product attribute recommendations presented as technical information only (T), or augmented with performance categories (TC), attribute explanations (TE), or both (TCE). For novices, approaches with explanations (TE, TCE) were perceived as more helpful and led to higher perceived learning than those without. Novices also rated the combined approach (TCE) more appropriate than the baseline (T) and TC in terms of information quantity, indicating that explanations are crucial to understand and benefit from performance categories. Critically, experts showed no significant differences across conditions, suggesting that providing supplementary information beneficial to novices did not detract from their experience. We distill these findings into four concrete design guidelines for inclusive text-based product advisors in technical domains: use TCE by default; keep a single inclusive interface; avoid standalone categories; and support user agency and personalize to the stated use case.
Compared to search engine result pages (SERPs), AI-generated podcasts represent a relatively new and more passive modality of information consumption, delivering narratives in a naturally engaging format. As these two media increasingly converge in everyday information-seeking behavior, it is essential to explore how their interaction influences user attitudes, particularly in contexts involving controversial, value-laden, and often debated topics. Addressing this need, we aim to understand how the information mediums of present-day SERPs and AI-generated podcasts interact to shape the opinions of users. To this end, through a controlled user study (N = 483), we investigated the attitudinal effects of consuming information via SERPs and AI-generated podcasts, focusing on how the sequence and modality of exposure shape user opinions. A majority of users in our study exhibited attitude change, and we found an effect of sequence on attitude change. Our results further revealed a role of viewpoint bias and the degree of topic controversiality in shaping attitude change, although we found no effect of individual moderators.
Deepfake technologies are powerful tools that can be misused for malicious purposes such as spreading disinformation on social media. The effectiveness of such malicious applications depends on the ability of deepfakes to deceive their audience. Therefore, researchers have investigated human abilities to detect deepfakes in various studies. However, most of these studies were conducted with participants who focused exclusively on the detection task; hence, the studies may not provide a complete picture of human abilities to detect deepfakes under realistic conditions: social media users are exposed to cognitive load on the platform, which can impair their detection abilities. In this paper, we investigate the influence of cognitive load on human abilities to detect voice-based deepfakes in an empirical study with 30 participants. Our results suggest that low cognitive load does not generally impair detection abilities, and that the simultaneous exposure to a secondary stimulus can actually benefit people in the detection task.
LLMs have great potential for shaping how people find and understand information. However, current tools can fail to provide authoritative sources, fabricate plausible references, and make it difficult to assess the truthfulness of their outputs. Understanding how users verify LLM outputs is particularly important in scholarly disciplines, where the information produced becomes the foundation of future knowledge. We investigated the factors that influence academic researchers’ decisions to verify LLM responses, their verification strategies, and the effectiveness of those strategies. We conducted a naturalistic think-aloud study, followed by a semi-structured interview, where we observed 16 researchers across disciplines using LLMs of their choice to conduct a research information-seeking task. Our findings highlight that prevailing LLM design can hamper users’ ability to satisfy their information needs for several reasons, such as lack of transparency about sources used in LLM outputs and lack of faithfulness of LLM outputs to the source. Based on these findings, we discuss how future LLMs can better support users in effective verification.
Tag genome is widely used in recommender systems research to, for example, measure item similarity, make recommendations, and generate recommendation explanations. Applying tag genome to problems in cross-domain recommendation, however, is complicated by the limited item overlap between cross-domain recommendation data sets and the available tag genomes. Furthermore, existing tag prediction models rely on content-based features that are not readily available in a majority of recommendation data sets. To address these issues, we generated tag genomes for both movies and books based on the Amazon data set, which is widely used in cross-domain recommendation research. These new tag genomes are over 200× larger than the previous versions and can support comparative evaluation of tag-based and collaborative methods, facilitate the development of new cross-domain recommendation algorithms, and provide a foundation for studying phenomena, such as serendipity and diversity, across multiple domains. Both data sets and the data generation pipeline are freely available at https://github.com/Bionic1251/Expanded-Tag-Genomes.
Understanding how working memory (WM) capacity influences cognitive load (CL) during interactive information retrieval (IIR) tasks is critical for designing effective user interfaces. This knowledge supports minimizing cognitive overload, optimizing performance, and tailoring systems to individual cognitive profiles. This study examines CL dynamics and emotional responses during two types of search tasks, fact-checking (FC) and decision-making (DM), among individuals with varying WM capacity, and explores the relationship between CL and emotional states. CL was estimated in near-real-time using pupil diameter signals and analyzed across task phases (begin, mid, end). Participants were divided into high and low WM groups based on N-back task performance. Results show that CL was higher during DM tasks, especially for low-WM individuals, while no group differences emerged during FC tasks. CL was highest at the beginning of the tasks and decreased over time, suggesting cognitive adaptation. Analysis of emotions revealed that high-WM individuals exhibited positive correlations between CL and valence, joy, and engagement, whereas low-WM individuals showed stronger associations with confusion and surprise, particularly during DM tasks. These findings highlight the importance of WM capacity in shaping cognitive and emotional experiences, offering insights for personalized and adaptive system design.
Providing educational services to students with disabilities requires a unique combination of professional knowledge, and adherence to complex compliance and auditing goals. This study investigates the information access gaps within special education health services. Through think-aloud activities with therapy supervisors, practitioners, and administrative staff, we elicited explicit and implicit information needs, and identified significant challenges in current information seeking and documentation practices in this highly regulated environment. Thematic analysis of 329 coded segments (including 120 task descriptions, 123 pain points, and 86 desired features) reveals systematic mismatches between information capture and retrieval practices, which practitioners navigate daily, highlighting a necessity for information seeking beyond traditional search and demonstrating use cases for AI-augmented information retrieval. We present design implications, developed in collaboration with these stakeholders, focused on enhancing task documentation and designing interaction to support these needs, offering a roadmap for future information interaction systems in this complex, multi-stakeholder domain.
The diversification of information access systems, from RAG to autonomous agents, creates a critical need for comparative user studies. However, the technical overhead to deploy and manage these distinct systems is a major barrier. We present UXLab, an open-source system for web-based user studies that addresses this challenge. Its core is a web-based dashboard enabling the complete, no-code configuration of complex experimental designs. Researchers can visually manage the full study, from recruitment to comparing backends like traditional search, vector databases, and LLMs. We demonstrate UXLab’s value via a micro case study comparing user behavior with RAG versus an autonomous agent. UXLab allows researchers to focus on experimental design and analysis, supporting future multi-modal interaction research.
Generative AI (gen AI) tools are rapidly reshaping higher education, influencing research, teaching, learning, and administrative work. This study investigates when, how, and why gen AI is used and not used in a research-intensive university. We conducted in-depth surveys with faculty and staff in mid-2024 (n=102) and mid-2025 (n=101) to capture evolving adoption and perception. Across both years, over 90% of respondents reported intentional use of gen AI, though some described unintentional use or intentional non-use. The most frequently mentioned tools included ChatGPT, Copilot, Gemini, Claude, and Adobe Firefly.
Through content analysis of gen AI uses described in 2024 (n=528) and 2025 (n=616), we identified a diverse range of gen AI-supported tasks: writing and editing text or code, summarizing text, creating objects, generating ideas, finding information, problem-solving, and processing or transforming objects, often during data-intensive tasks. Respondents expressed slightly negative perceptions of gen AI output quality and credibility, especially when compared to human work, but acknowledged modest benefits for productivity and performance. Privacy, confidentiality, misinformation and bias, intellectual property, and environmental impact emerged as key concerns. By examining patterns of use, non-use, and perceptions across two years, this study provides an empirical understanding of how faculty and staff engage with gen AI tools for seeking, processing, and producing information, highlighting implications for policy, training, and ethical guidance in higher education.
Users often turn to online forums when searching for known books, movies, or games that they cannot identify through conventional search engines. These “tip-of-the-tongue” requests present a unique challenge, appearing highly variable in formulation, context, and specificity. Until now, such requests could mostly only be solved by other humans answering in forums; generative AI is thought to be well suited to helping with these specific questions. In this work, we manually annotated 150 requests each for books, games, and movies in the casual leisure domain to study the differences between solved and unsolved requests and identify factors that influence their difficulty. We compare human responses in forum threads with the performance of a Large Language Model (LLM) under similar conditions. Specifically, we investigate how the formulation of requests affects human and LLM success; how item properties impact LLM retrieval; how interaction and feedback within a thread shape human and LLM performance; and whether increasing the information provided to an LLM improves its chances of solving the request. Our findings offer new insights into what makes these known-item search problems easier or harder to solve. This study contributes to a better understanding of complex search behavior and the role of LLMs in helping with difficult casual-leisure information needs.
Generative AI (GenAI) tools are transforming information seeking, but their fluent, authoritative responses risk overreliance and discourage independent verification and reasoning. Rather than replacing the cognitive work of users, GenAI systems should be designed to support and scaffold it. Therefore, this paper introduces an LLM-based conversational copilot designed to scaffold information evaluation rather than provide answers and foster digital literacy skills. In a pre-registered, randomised controlled trial (N=261) examining three interface conditions including a chat-based copilot, our mixed-methods analysis reveals that users engaged deeply with the copilot, demonstrating metacognitive reflection. However, the copilot did not significantly improve answer correctness or search engagement, largely due to a “time-on-chat vs. exploration” trade-off and users’ bias toward positive information. Qualitative findings reveal tension between the copilot’s Socratic approach and users’ desire for efficiency. These results highlight both the promise and pitfalls of pedagogical copilots, and we outline design pathways to reconcile literacy goals with efficiency demands.
Community workers play a central role in connecting marginalized populations to housing, health, and social services, translating complex policies and organizational systems into actionable support. Digital tools are increasingly embedded in these workflows, yet little is known about how they are adopted, adapted, and sustained in everyday practice. This study examines the socio-technical practices of community workers in two urban, low-income organizations, focusing on how organizational constraints, such as policies, staffing, and funding, and operational constraints, such as workload pressures, fragmented data, and time-sensitive tasks, shape technology use. Drawing on semi-structured interviews with 16 community workers, we show that effective engagement with digital systems depends on human expertise, workflow alignment, and institutional context. Our findings reframe community workers as socio-technical actors and provide actionable insights for designing human-centered digital infrastructures and smart city technologies that align with operational realities and enable effective, context-sensitive service delivery.
Personality traits influence how individuals engage, behave, and make decisions during the information-seeking process. However, few studies have linked personality to observable search behaviors. This study aims to characterize personality traits through a multimodal time-series model that integrates eye-tracking data and gaze missingness (periods when the user’s gaze is not captured). This approach is based on the idea that people often look away when they think, signaling disengagement or reflection. We conducted a user study with 25 participants, who used an interactive application on an iPad, allowing them to engage with digital artifacts from a museum. We rely on raw gaze data from an eye tracker, minimizing preprocessing so that behavioral patterns can be preserved without substantial data cleaning. From this perspective, we trained models to predict personality traits using gaze signals. Our results from a five-fold cross-validation study demonstrate strong predictive performance across all five dimensions: Neuroticism (Macro F1 = 77.69%), Conscientiousness (74.52%), Openness (77.52%), Agreeableness (73.09%), and Extraversion (76.69%). The ablation study examines whether the absence of gaze information affects the model performance, demonstrating that incorporating missingness improves multimodal time-series modeling. The full model, which integrates both time-series signals and missingness information, achieves 10–15% higher accuracy and macro F1 scores across all Big Five traits compared to the model without time-series signals and missingness. These findings provide evidence that personality can be inferred from search-related gaze behavior and demonstrate the value of incorporating missing gaze data into time-series multimodal modeling.
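The modeling idea here, keeping lost gaze samples as an explicit signal rather than cleaning them away, can be sketched as a simple input encoding; the three-channel layout below is an assumption for illustration, not the authors' pipeline.

```python
# Minimal sketch, under assumptions: encode raw gaze as (x, y, missing)
# so a downstream sequence model can learn from look-away periods.
import numpy as np

def gaze_to_model_input(samples):
    """samples: list of (x, y) tuples, with None where the tracker lost gaze.

    Returns an array of shape (T, 3): x, y, and a missingness flag that
    is 1.0 whenever the gaze sample is absent (e.g., the user looked away).
    """
    out = np.zeros((len(samples), 3), dtype=np.float32)
    for t, s in enumerate(samples):
        if s is None:
            out[t, 2] = 1.0           # missing: flag on, coordinates stay 0
        else:
            out[t, 0], out[t, 1] = s  # valid sample, flag stays 0
    return out

# Example: two valid fixations followed by a gap in tracking.
seq = gaze_to_model_input([(0.41, 0.57), (0.43, 0.55), None, None])
```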
Despite increasing interest in data storytelling, it remains unclear how the choice of communication medium shapes its effectiveness, particularly for audiences with varying levels of data literacy. This paper reports on a controlled, longitudinal experiment comparing verbal to written storytelling alongside a baseline data visualization condition. Each condition employed simple graphs and an author-driven narrative to examine their effects on recall and attitude change. The results for data storytelling were mixed: while storytelling did not improve recall, verbal storytelling and the no-storytelling baseline facilitated long-term attitude change, whereas written storytelling did not. Higher data literacy supported long-term recall but was associated with smaller immediate attitude shifts, an effect that diminished over time. These findings challenge assumptions about the universal advantages of narrative-based communication, demonstrating that medium, topic familiarity, and audience characteristics jointly determine outcomes. The study contributes empirical evidence to the field and calls for further research into how narrative structures and visualization complexity affect the effectiveness of data storytelling.
The creation of a financial pitch book is a complex and information-intensive task, which requires analysts to gather data from disparate sources and synthesize it into a compelling evidence-based narrative. Current tools support document creation but not the underlying cognitive processes of information synthesis and narrative structuring. We introduce “BayesDeck,” an interactive IR/NLP system designed to assist financial analysts in this narrative construction process.
BayesDeck’s core innovation is a Bayesian narrative model that probabilistically reasons about the relevance and impact of various data points (e.g., market size, competitive landscape, financial metrics) on a proposed investment thesis. The system proactively retrieves and suggests data from a multi-source corpus, not merely based on keyword matching, but on the inferred narrative contribution to the evolving pitch. The interactive interface allows analysts to: (1) specify a high-level investment thesis, (2) view the system’s Bayesian “reasoning trail” for suggested data points, visualizing the strength of evidence for and against the thesis, and (3) accept, reject, or critique suggestions, thereby continuously refining the model and the emerging narrative structure.
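Although the abstract does not publish the model's internals, the kind of evidence accumulation it describes can be sketched with textbook Bayesian updating: each data point contributes a likelihood ratio for or against the thesis, and the posterior odds are the prior odds times the product of those ratios. The numbers below are purely illustrative, not BayesDeck's implementation.

```python
# Minimal sketch (assumptions, not BayesDeck's code) of Bayesian
# evidence accumulation over data points bearing on a thesis.
def posterior_probability(prior, likelihood_ratios):
    """prior: P(thesis) before evidence; likelihood_ratios:
    P(evidence | thesis) / P(evidence | not thesis) per data point."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr                    # each data point updates the odds
    return odds / (1.0 + odds)

# Example: a supportive market-size figure (LR=3.0), a neutral metric
# (LR=1.0), and a competitive-landscape red flag (LR=0.4).
p = posterior_probability(0.5, [3.0, 1.0, 0.4])   # ~0.545
```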
We evaluated BayesDeck through a controlled laboratory study with 18 financial analysts, comparing it to a baseline system without the Bayesian reasoning component and a traditional web-search-driven workflow. Our findings indicate that the Bayesian model significantly enhances the human-AI collaboration. Analysts using BayesDeck produced narratives rated 24% higher in argumentative coherence and evidential support by expert blind reviewers. Furthermore, qualitative feedback revealed that the transparent, probabilistic reasoning of the Bayesian model fostered greater user trust and a sense of agency, transforming the process from one of document assembly to one of collaborative reasoning and storytelling. This work demonstrates the profound potential of embedding scrutable, probabilistic models into interactive IR systems to support complex human information synthesis and narrative tasks.
The rise of Large Language Models (LLMs) has ushered in a wave of conversational search engines that allow people to engage in dialogues with LLM-infused chatbots to seek information. As people tend to infer personalities from digital social interactions, and given that personality cues have been shown to affect credibility, these perceptions of chatbot design may shape how users assess the credibility of information in conversational search. In this study, we conducted a controlled online study with 190 participants who assessed conversational search results with chatbots designed to exhibit different levels of personality traits. We found that in conversational search, personality can affect perceptions of credibility. Specifically, perceived conscientiousness and agreeableness of a chatbot can increase credibility, while perceived extraversion and neuroticism can decrease the credibility of the information. This research contributes to our understanding of how conversational interfaces and their personality and persona designs can impact credibility. We also provide design implications for conversational search interfaces based on our findings.
Online digital library searchers who find themselves engaged in multiple concurrent complex search tasks may encounter potentially serendipitous information for one task while pursuing another. This can propel them in unexpected and exciting directions that enhance their knowledge. However, existing search interfaces provide limited support for searchers to handle such serendipitous information encounters, forcing them into an often disruptive choice between continuing their current search or pursuing the new discovery. To address this issue, we introduce Revelio: an academic digital library search interface based on a ‘save now, organize later’ workflow. Revelio provides a low-effort mechanism for deferring potentially serendipitous information encounters, and a semantic similarity approach for subsequently reviewing and saving such search results within a multi-workspace structure. We evaluated Revelio in a between-subjects controlled study with 28 participants using a novel experimental design that creates the opportunity for serendipitous information encounters within a concurrent complex academic search task context. Compared to those who used a baseline search interface, participants who used Revelio were able to effectively defer and subsequently review encountered information, resulting in positive opinions about the interface, increased performance in both prescribed search tasks, and greater perceived knowledge gain. These findings demonstrate the value of search interfaces that explicitly support the deferral and subsequent review of potentially serendipitous information.
People frequently use search engines in the lead-up to elections, and the results they encounter can influence their voting decisions. In this study, we analyzed the stances of search results from Google and Bing in Germany related to the 2024 European Parliament elections, as well as the types of sources presented to users. We collected 760 search results for 38 political queries and had jurors assess their stances, with each result being evaluated by five jurors. The findings reveal that public authority and journalistic pages dominate the search results, while political party pages are the least represented source category. Furthermore, we found that Google and Bing predominantly display neutral search results, with neutral stances being more common in Google than in Bing. Additionally, in both search engines, search results that indicate a political leaning tend to align more closely with left-leaning parties than with right-leaning ones. While acknowledging that the findings may be influenced by how the queries were formulated, the results highlight the importance of using multiple search engines to access diverse political viewpoints. They also raise questions about whether both agreeing and disagreeing results should be displayed for controversial topics.
With the rapid proliferation of large language models (LLMs), users are increasingly turning to these systems to fulfill their everyday information needs. Unlike traditional search engines, which rely on structured query-response mechanisms, LLMs offer direct answer synthesis—fundamentally altering how users access and interact with information. However, the effectiveness of such direct responses, and how users interact with them, has not been well studied. In this work, we present CollabSearch, a user-LLM collaborative search system that enables users to interact with LLMs through adaptive role-playing prompts designed to guide and refine search outcomes. Users leverage the internal knowledge of LLMs while also benefiting from retrieval-augmented generation (RAG) to ensure relevance and currency. We evaluate our system through a task-based study involving 24 participants. The results demonstrate the effectiveness of our approach and offer nuanced insights into user–LLM interaction dynamics in search contexts.
Politeness is a core dimension of human communication, yet its role in human–AI information seeking remains underexplored. We investigate how user politeness behaviour shapes conversational outcomes in a cooking-assistance setting. First, we annotated 30 dialogues, identifying four distinct user clusters ranging from Hyperpolite to Hyperefficient. We then scaled up to 18,000 simulated conversations across five politeness profiles (including impolite) and three open-weight models. Results show that politeness is not only cosmetic: it systematically affects response length, informational gain, and efficiency. Engagement-seeking prompts produced up to 90% longer replies and 38% more information nuggets than hyper-efficient prompts, but at markedly lower density. Impolite inputs yielded verbose but less efficient answers, with up to 48% fewer nuggets per watt-hour compared to polite input. These findings highlight politeness as both a fairness and sustainability issue: conversational styles can advantage or disadvantage users, and “polite” requests may carry hidden energy costs. We discuss implications for inclusive and resource-aware design of information agents.
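The efficiency measures reported here (information nuggets per response and per unit of energy) reduce to simple ratios; the sketch below shows the arithmetic with placeholder numbers, not figures from the study.

```python
# Minimal sketch of the reported efficiency measures; all numbers
# below are placeholders, not values from the paper.
def nugget_density(nuggets, tokens):
    """Information nuggets per token of response text."""
    return nuggets / tokens if tokens else 0.0

def nuggets_per_watt_hour(nuggets, watt_hours):
    """Informational yield per unit of inference energy."""
    return nuggets / watt_hours if watt_hours else 0.0

# An engagement-seeking prompt may yield more nuggets overall but at
# lower density and energy efficiency than a hyper-efficient prompt.
print(nugget_density(11, 420), nuggets_per_watt_hour(11, 0.9))
print(nugget_density(8, 180), nuggets_per_watt_hour(8, 0.4))
```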
Our research in this paper lies at the intersection of Generative AI (GenAI) and search-as-learning (SAL). GenAI technologies (e.g., ChatGPT) have revolutionized how people search for and interact with information. However, we do not yet fully understand how people use GenAI systems to learn about complex topics. SAL research has studied how different tools can support learning with traditional document retrieval systems. Our research closely relates to SAL work that has investigated the effects of goal-setting on learning during search. We explore the influence of goal-setting on learning during information-seeking sessions with a GenAI system. We report on a between-subjects crowdsourced study (N = 120) in which participants were asked to learn about a complex topic using a GenAI system. The study had four conditions that varied along two factors (a 2 × 2 design). The first factor involved displaying related web results in addition to the GenAI output. The second factor involved giving participants access to the Subgoal Manager (SM), a tool designed to help people develop subgoals and take notes. We investigated the effects of both factors on: (RQ1) perceptions; (RQ2) behaviors; (RQ3) learning and retention; (RQ4) the types of requests issued to the system; and (RQ5) participants’ motivations for engaging (or not engaging) with the related web results. We found that participants with access to the SM had higher post-task learning outcomes, did less copy/pasting into their notes, perceived the task as more difficult, and requested more examples and support for differentiating concepts from the GenAI system.
Online discussions have become integral to how people exchange ideas, form opinions, and participate in collective deliberation. While sighted users can comfortably engage with online discussions, blind users who are dependent on screen readers are forced to listen to long threads narrated in a single, monotonic voice that lacks prosodic variation, rhythm, or emotion. This robotic auditory experience not only deteriorates the user engagement with the content but also increases cognitive strain, by making it difficult to remain attentive and discern meaning beyond literal words. In an interview study, most blind participants reported that monotonous narration hindered their ability to detect salient information, perceive emotional cues, and comprehend content authors’ intents in discussions. Many described experiencing mental fatigue when listening to ‘flat’, ‘uninspiring’ voices, noting that their attention tended to diminish quickly over time. The participants also indicated that they often tried to ‘add’ prosodic variation or emotional inflection themselves in their minds, but characterized this compensatory effort as mentally taxing and cognitively demanding. To address this issue, we introduce VoxVista, a multi-voice design framework driven by a large language model that leverages a custom voice-preference dataset to assign personalized voice profiles to user posts in discussions, thereby replacing the traditional monotone narration in screen readers with a more expressive, dynamic, and contextually-aware narration. In a study with 20 blind participants, we observed that VoxVista significantly improved user engagement, comprehension, and willingness to continue listening to longer discussions.
While online shopping platforms provide convenience and autonomy to blind users, their non-visual interactions remain underexplored at a micro-behavioral level. Existing studies have primarily emphasized accessibility and usability challenges but have overlooked how fine-grained, screen reader-driven keystroke-level behaviors reflect users’ cognitive strategies. In this paper, we present the findings of a longitudinal study with 25 blind participants to examine their micro-behavioral patterns, using keyboard activity and screen reader logs on both familiar and unfamiliar e-commerce websites. We complemented this study with semi-structured interviews to contextualize the uncovered micro-behavioral patterns. Our results revealed patterns in how blind users draw upon cognitive maps and well-established shortcut routines developed on familiar websites to streamline navigation on unfamiliar platforms. However, unfamiliar websites, even when structurally accessible, often introduced elevated navigation entropy, increased shortcut failures, and induced more exploratory behavior, as users worked to reconstruct new mental models. We also identified a strong preferential structure in keyboard shortcut use, where users maintain a personalized and often chronologically-ranked sequence of keystrokes. Furthermore, most users approached shopping with pre-planned objectives, relying on targeted search queries rather than broad ad-hoc product exploration for securing the ‘best deals’. Based on the study insights, we discuss design considerations for assistive technology developers and e-commerce websites to further improve the online shopping experience for blind users.
Information access systems such as search engines and generative AI are central to how people seek, evaluate, and interpret information. Yet most systems are designed to optimise retrieval rather than to help users develop better search strategies or critical awareness. This paper introduces a pedagogical perspective on information access, conceptualising search and conversational systems as instructive interfaces that can teach, guide, and scaffold users’ learning. We draw on seven didactic frameworks from education and behavioural science to analyse how existing and emerging system features, including query suggestions, source labels, and conversational or agentic AI, support or limit user learning. Using two illustrative search tasks, we demonstrate how different design choices promote skills such as critical evaluation, metacognitive reflection, and strategy transfer. The paper contributes a conceptual lens for evaluating the instructional value of information access systems and outlines design implications for technologies that foster more effective, reflective, and resilient information seekers.
Attention is a crucial construct in Interactive Information Retrieval (IIR) activities and has become an area of growing interest among CHIIR researchers. Despite this importance, explicit definitions are rare. Many studies refer to “attention” without specifying which process or aspect is under examination, and often operationalize it using practical indicators (e.g., eye fixations, dwell time, cursor traces, interaction logs) without a clear conceptual framework guiding measurement choices, raising concerns about the comparability of findings. This paper reviews how attention has been defined, operationalized, and measured in CHIIR publications from the conference’s inception in 2016 through 2025. We searched the ACM Digital Library for "attention" within CHIIR proceedings, finding 296 results. After filtering and using semantic similarity, we narrowed it to 45 relevant papers, from which 19 were selected for in-depth review based on their clear relevance to attention. Drawing on theoretical frameworks and empirical findings from psychology and cognitive science, we analyze how CHIIR researchers define and apply the concept of attention in various contexts. Our review uncovers a variety of interpretations, aspects, and measurement approaches to attention, which reflect broader challenges in bridging cognitive theory and information interaction research. We discuss key issues underlying these differing uses of the term “attention,” and outline possible directions for advancing conceptual clarity and methodological robustness of attention research within CHIIR. We hope this perspective paper raises researchers’ awareness of the need to clearly define attention, thereby promoting greater rigor and reproducibility in future IIR studies.
The classic paradigms of Berry Picking and Information Foraging Theory have framed users as gatherers, opportunistically searching across distributed sources to satisfy evolving information needs. However, the rise of Generative AI (GenAI) is driving a fundamental transformation in how people produce, structure, and reuse information—one that these paradigms no longer fully capture. This transformation is analogous to the Neolithic Revolution, when societies shifted from hunting and gathering to cultivation. Generative technologies empower users to “farm” information by planting seeds in the form of prompts, cultivating workflows over time, and harvesting richly structured, relevant yields within their own plots, rather than foraging across other people’s patches. In this perspectives paper, we introduce the notion of Information Farming as a conceptual framework and argue that it represents a natural evolution in how people engage with information. Drawing on historical analogy and empirical evidence, we examine the benefits and opportunities of information farming, its implications for design and evaluation, and the accompanying risks posed by this transition. We hypothesize that as GenAI technologies proliferate, cultivating information will increasingly supplant transient, patch-based foraging as a dominant mode of engagement, marking a broader shift in human-information interaction and its study.
The integration of artificial intelligence agents into information retrieval systems has prompted two dominant narratives: AI as replacement for human information seeking, and AI as collaborative partner. This perspective paper challenges both framings by proposing a third alternative – that human information seeking behavior and AI agent functionality represent fundamentally incompatible epistemological paradigms. Drawing on information behavior theory and critical AI scholarship, we argue that this incompatibility stems from three dimensions: epistemic orientation (learning vs. pattern reproduction), temporal structure (processual vs. instantaneous), and agentic purpose (uncertainty resolution vs. task execution). Rather than forcing integration through collaboration metaphors or accepting wholesale replacement, we advocate for system designs that acknowledge and preserve these fundamental differences. This reframing has profound implications for the design of information interaction systems, evaluation methodologies, and research directions in human information interaction and retrieval.
Question asking is a crucial human skill, influencing social cognition, creative problem solving, and information seeking. Yet, its cognitive mechanisms remain poorly understood due to challenges in studying it naturally. We developed The Martian Game, an open-ended online question-asking game that simulates creative problem solving in realistic contexts. Players design a solar energy system for a Martian city through two stages: (1) a problem finding phase where they ask an AI chatbot (“Mark”) questions to gather information, and (2) a solution-planning phase producing written and visual designs. Questions are coded for complexity, originality, and relevance; solutions are rated for originality and appropriateness. This game offers an ecologically valid, interdisciplinary tool to study question asking and supports the hypothesis that complex questions promote effective problem solving. A pilot study validates its potential for examining open-ended cognition beyond the lab.
During a think-aloud study, participants verbalize their thoughts as they complete a specified task. Think-aloud comments provide insights into what a participant is doing and experiencing in the moment. Interactive information retrieval studies have used think-aloud protocols to explore different research questions. However, think-aloud protocols can also influence participants. We report on a qualitative analysis of data collected during a think-aloud study. Participants were asked to learn about a complex topic by searching online and taking notes. After the task, participants were asked whether and how thinking aloud influenced their approach to the task. Open-ended responses revealed 21 (positive and negative) ways in which thinking aloud influenced participants. Additionally, during the study, we measured participants’ working memory (WM) capacity. We did not find that thinking aloud had different influences for low- vs. high-WM participants.
Online shoppers increasingly care about social and environmental aspects, yet labels and logos often fail to enter the decision-making process because they struggle to win attention. With conversational shopping assistants shifting from product lists to purchase conversations, how ethical considerations are explained becomes a communication problem as much as a retrieval problem. In an eye-tracking study (N=20, 27 presentation formats), long and medium explainers, i.e., sentences adding ethical information about the product (≈40 and 20 words), were awarded a higher willingness-to-pay than short ones (10 words). Gaze showed a conservation of attention: fixation counts stayed roughly stable as length increased, while fixation duration rose in longer explainers, yielding more diluted attention per word yet deeper processing. These results provide attention-linked evidence that longer messages about ethical aspects can be effectively integrated in explanation-forward product conversations. We argue that conversation-first systems are particularly suited to contextualise responsibility-related product information within a purchase decision, supporting informed choices without overloading users.
The shift from traditional search engines to LLM-based conversational AI systems has transformed how people interact with information. However, the role of AI systems in supporting interest-driven rather than goal-directed searches remains underexplored. Through a one-week naturalistic study with 19 participants and 95 audio diaries, we examine interest-driven search, in which search is initiated and sustained by user interest, and the direction and depth of how exploration shifts as that interest develops. We present the AI-Mediated Interest-Driven Search (AIM-IDS) model, which reveals how phases of interest development correspond to search behaviors in AI-mediated environments. Our findings show that while AI-mediated search lowers barriers to initiating exploration, the same features that support early-phase interest development can hinder sustained engagement in later phases. We conclude with design implications for AI systems that support interest development across different interest types.
We present the SRL Perceptions Questionnaire (SPQ), developed to measure perceptions of self-regulated learning (SRL) after information seeking and learning sessions. In a crowd-sourced study (N = 127), participants completed the SPQ after searching to learn about a complex topic. The SPQ asked participants to report their perceptions of particular SRL constructs (e.g., planning, monitoring, strategy use, adapting). A principal component analysis supported a five-factor structure with high reliability (α ≥ .87). Perceived SRL did not correlate with normalized learning gains, yet pre-task and post-task perceptions showed correlations with several SPQ dimensions. We offer both the SPQ as an instrument for measuring SRL (processes critical to supporting human learning) after information seeking and insights into how perceptions of SRL constructs align with objective learning outcomes, pre-task perceptions, and post-task perceptions while learning during search.
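The reported reliability refers to Cronbach's alpha; per SPQ factor it can be computed from a respondents-by-items matrix as below. This is a generic sketch of the statistic, not the authors' analysis code.

```python
# Minimal sketch: Cronbach's alpha for one questionnaire factor,
# computed from a (n_respondents, n_items) matrix of item scores.
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)
```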
Understanding the motivations behind people’s web searches is a central challenge at the intersection of human-computer interaction and information retrieval. While some studies analyze behavior and others examine motivations, capturing both simultaneously remains difficult. We present a survey design that collects participants’ actual search histories and reflections on their underlying motivations, providing a unified view of both. Applying Functional Attitude Theory (FAT), we find that while most searches are knowledge- or utilitarian-driven, even routine searches can serve deeper psychological functions, such as affirming identity, protecting self-image, or maintaining social connections. This approach reveals how everyday searches serve both practical and deeper psychological purposes.
Video games produced in the Middle East remain largely absent from global catalogs and repositories, despite their significance as cultural artifacts shaped by censorship, propaganda, and resistance. This paper analyzes metadata, discourse, and narratives in 38 Iranian video games. Drawing on metadata schemas such as Dublin Core and the Video Game Metadata Schema (VGMS), along with culturally specific fields like ESRA ratings and policy context, I show how infrastructural omissions reproduce geopolitical inequality in information retrieval systems. Integrating metadata analysis, discourse study, and close reading, I propose design principles for equitable cataloging and preservation of non-Western games, framing game metadata as a site where information infrastructures, cultural power, and intellectual freedom intersect [5, 6, 25].
Conversational agents and related systems enable more flexible and elaborate information seeking and retrieval than earlier search strategies, such as conventional web search engines. However, these systems sometimes produce responses that are excessively long or complex, not always aligning with the needs of their human interaction partners and potentially leading to information overload (IO). IO is associated with negative affect in humans and is known to reduce information effectiveness, thereby limiting the usability of information-seeking systems. In this paper, I briefly discuss psychological effects of IO and present the results of a survey examining self-reported experiences of IO when using conversational agents for information seeking in a German sample (n = 50). The paper concludes with suggestions for how conversational agents could be adapted and modified to become more sensitive to humans’ affective experiences, with the goal of mitigating the effects of IO.
Large Language Models (LLMs) exhibit remarkable generative and reasoning capabilities, yet their outputs often reflect systematic cognitive biases analogous to those observed in human judgment. This paper investigates three interrelated forms of bias: confirmation bias, position bias, and framing bias. Through a series of controlled prompting experiments, we demonstrate that LLMs tend to reinforce the premises embedded in user queries (confirmation bias), favor initial or prominent elements within a prompt (position bias), and vary their conclusions depending on the positive or negative framing of the input (framing bias). We analyze these effects across different open LLMs: Qwen, Mistral, Gemma, Olmo, and Llama. These insights can inform better prompt engineering practices, strengthen evaluation benchmarks, and support the responsible use of LLMs in education, research, and decision-making.
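As a hedged sketch of how one such controlled probe might be run, the snippet below compares approval rates under logically equivalent positive and negative framings. The prompts are illustrative, and `ask` stands in for whatever chat-completion call is available; neither is an API from the paper.

```python
# Minimal sketch of a framing-bias probe; prompts and the `ask`
# callable are illustrative assumptions, not the paper's setup.
POSITIVE = "This policy succeeded for 70% of participants. Should it continue?"
NEGATIVE = "This policy failed for 30% of participants. Should it continue?"

def framing_probe(ask, n_trials=20):
    """ask: callable prompt -> response text. Returns approval rates
    under the two logically equivalent framings and their gap."""
    def approval_rate(prompt):
        # Crude yes-detection over repeated sampled responses.
        votes = ["yes" in ask(prompt).lower() for _ in range(n_trials)]
        return sum(votes) / n_trials
    pos, neg = approval_rate(POSITIVE), approval_rate(NEGATIVE)
    return pos, neg, pos - neg     # a large gap suggests framing bias
```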
We study how consumers express sustainability-related needs in online product search and how query autocompletion (QAC) systems mediate this intent. Using a 1% random sample (3.95M queries) from the AmazonQAC dataset, we identify sustainable- and consumption-oriented vocabulary through a hybrid lexicon-based approach and analyse how QAC preserves, removes, or introduces such terms. Only about 1% of queries contain explicit sustainability intent, concentrated in categories like Food & Grocery and Health & Beauty. QAC preserves users’ sustainable intent in 60% of cases and adds sustainability-related tokens in a further 40%, indicating that it can reinforce rather than suppress ethical consumption cues. Regression analyses show that these additions occur more often in longer and more frequent queries. Our findings challenge the assumption that digital search infrastructures inherently bias users toward unsustainable consumption and highlight opportunities for QAC design to support responsible shopping behaviour.
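A minimal sketch of the lexicon-tagging step, assuming a toy single-token vocabulary (the paper's hybrid lexicon is far larger and also handles phrases): each query-suggestion pair is classified by whether sustainability cues are preserved, removed, or introduced.

```python
# Toy sustainability lexicon; the paper's hybrid lexicon is much
# larger and covers multi-word phrases, which are omitted here.
SUSTAINABLE = {"sustainable", "eco", "recycled", "organic",
               "biodegradable", "reusable", "compostable"}

def has_sustainability_intent(text):
    return bool(set(text.lower().split()) & SUSTAINABLE)

def qac_effect(query, suggestion):
    """Does the autocompletion preserve, remove, or introduce
    sustainability cues relative to the typed query?"""
    before = has_sustainability_intent(query)
    after = has_sustainability_intent(suggestion)
    if before and after:
        return "preserved"
    if before:
        return "removed"
    if after:
        return "introduced"
    return "absent"

print(qac_effect("recycled paper towels", "recycled paper towels bulk"))
```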
Reddit is a major venue for mental-health information interaction and peer support, where privacy concerns increasingly surface in user discourse. Thus, we analyze privacy-related discussions across 14 mental-health and regulatory subreddits, comprising 10,119 posts and 65,385 comments collected with a custom web scraper. Using lexicon-based sentiment analysis, we quantify emotional alignment between communities via cosine similarity of sentiment distributions, observing high similarity for Bipolar and ADHD (0.877), Anxiety and Depression (0.849), and MentalHealthSupport and MentalIllness (0.989) subreddits. We also construct keyword dictionaries to tag privacy-related themes (e.g., HIPAA, GDPR) and perform temporal analysis from 2020 to 2025, finding a 50% increase in privacy discourse with intermittent regulatory spikes. A chi-square test of independence across subreddit domains indicates significant distributional differences (χ² = 5596.67, p = 0.03, df = 50). The results characterize how privacy-oriented discussion co-varies with user sentiment in online mental-health communities.
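The community-comparison step reduces to cosine similarity between per-community sentiment distributions; the sketch below reproduces that computation with illustrative proportions, not the paper's data.

```python
# Minimal sketch: cosine similarity between two communities'
# sentiment distributions. Proportions below are illustrative.
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical (negative, neutral, positive) proportions per subreddit.
bipolar = [0.42, 0.33, 0.25]
adhd    = [0.38, 0.36, 0.26]
print(cosine(bipolar, adhd))   # near 1.0: highly similar, as reported
```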
Recent adoption of conversational information systems has expanded the scope of user queries to include complex tasks such as personal advice-seeking. However, we identify a specific type of sought advice—a request for a moral judgment (i.e. “who was wrong?”) in a social conflict—as an implicitly humanizing query which carries potentially harmful anthropomorphic projections. In this study, we examine the reinforcement of these assumptions in the responses of four major general-purpose LLMs through the use of linguistic, behavioral, and cognitive anthropomorphic cues. We also contribute a novel dataset of simulated user queries for moral judgments. We find current LLM system responses reinforce implicit humanization in queries, potentially exacerbating risks like overreliance or misplaced trust. We call for future work to expand the understanding of anthropomorphism to include implicit user-side humanization and to design solutions that address user needs while correcting misaligned expectations of model capabilities.
Information credibility is a central concept in information science research and is closely linked to how people evaluate and use information. Credibility perceptions also play an important role in the adoption and continued use of technologies such as generative AI (gen AI). As gen AI becomes increasingly integrated into everyday life, it is essential to understand how people perceive the credibility of information produced by these systems. As part of a larger survey study on the use (and non-use) and perceptions of gen AI in higher education conducted in mid-2024 and mid-2025, we evaluated a set of questionnaire items to assess the perceived credibility of information created by gen AI. In this paper, we present the questionnaire items, descriptive statistics, and correlation matrices from the two surveys, and the results of our exploratory factor analysis examining the underlying structure of the credibility measure. Across both datasets, we identified two distinct factors—output credibility, which refers to the perceived credibility of AI-produced output itself, and relative credibility, which refers to perceptions of AI-produced output relative to human-produced information. We share the instrument and findings to support future refinement and adaptation in studying credibility in the context of gen AI.
Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable — instruction-following retrieval models promise such a capability. In this work, we evaluate instructed retrievers for the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. However, we also find that instruction following performance, crucial to the user experience of interacting with models, does not mirror ranking relevance improvements and displays insensitivity or counter-intuitive behavior to instructions. Our results indicate that while users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions requiring greater sensitivity to instructions.
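For readers unfamiliar with Pairwise Ranking Prompting, a minimal sketch: the LLM is asked, for each candidate pair, which better satisfies the query under the instruction, and wins are aggregated into a ranking. The template is an assumption, and `ask` stands in for any chat-completion call; this is not the paper's implementation.

```python
# Minimal sketch of Pairwise Ranking Prompting over a small candidate
# set; template and `ask` callable are illustrative assumptions.
PROMPT = ("Query: {query}\nInstruction: {instr}\n"
          "Passage A: {a}\nPassage B: {b}\n"
          "Which passage better satisfies the query under the "
          "instruction? Answer 'A' or 'B'.")

def prp_rank(ask, query, instr, candidates):
    """ask: callable prompt -> response text. O(n^2) LLM calls:
    one per unordered pair; wins determine the final ranking."""
    wins = {c: 0 for c in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            answer = ask(PROMPT.format(query=query, instr=instr, a=a, b=b))
            winner = a if answer.strip().upper().startswith("A") else b
            wins[winner] += 1
    return sorted(candidates, key=lambda c: -wins[c])
```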
Large Language Models (LLMs) and generative search systems are increasingly used for information seeking by diverse populations with varying preferences for knowledge sourcing and presentation. While users can customize LLM behavior through custom instructions and behavioral prompts, no mechanism exists to evaluate whether these instructions are being followed effectively. We present Offscript, an agent-based automated auditing tool that efficiently identifies potential instruction-following failures in LLMs. In a pilot study analyzing custom instructions sourced from Reddit, Offscript detected potential deviations from instructed behavior in 84.6% of conversations, 22.2% of which were confirmed as material violations through human review. Our findings suggest that agentic auditing serves as a viable approach for evaluating compliance to behavioral instructions related to information seeking.
Collaborative information from user-item interactions is a fundamental source of signal in successful recommender systems. Recently, researchers have attempted to incorporate this knowledge into large language model-based recommender approaches (LLMRec) to enhance their performance. However, there has been little fundamental analysis of whether LLMs can effectively reason over collaborative information. In this paper, we analyze the ability of LLMs to reason about collaborative information in recommendation tasks, comparing their performance to traditional matrix factorization (MF) models. We propose a simple and effective method to improve LLMs’ reasoning capabilities using retrieval-augmented generation (RAG) over the user-item interaction matrix with four different prompting strategies. Our results show that the LLM outperforms the MF model whenever we provide relevant information in a clear and easy-to-follow format, and prompt the LLM to reason based on it. We observe that with this strategy, in almost all cases, the more information we provide, the better the LLM performs.
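A minimal sketch, under assumptions, of the RAG-style prompting described: retrieve the target user's nearest neighbors in the interaction matrix and serialize their ratings into a clear, easy-to-follow prompt that asks the LLM to reason before answering. The matrix layout, neighbor count, and prompt wording are illustrative, not the paper's exact strategy.

```python
# Minimal sketch: retrieval-augmented prompting over a user-item
# rating matrix (layout and wording are illustrative assumptions).
import numpy as np

def top_k_neighbors(R, user, k=3):
    """R: (n_users, n_items) rating matrix with 0 for unrated items."""
    norms = np.linalg.norm(R, axis=1) * np.linalg.norm(R[user])
    sims = R @ R[user] / np.where(norms == 0, 1, norms)  # cosine per user
    sims[user] = -np.inf                  # exclude the target user
    return np.argsort(-sims)[:k]

def build_prompt(R, user, item, item_names):
    lines = [f"Predict user {user}'s rating for '{item_names[item]}'.",
             "Similar users rated items as follows:"]
    for n in top_k_neighbors(R, user):
        rated = [f"{item_names[j]}: {R[n, j]}" for j in np.nonzero(R[n])[0]]
        lines.append(f"- user {n}: " + ", ".join(rated))
    lines.append("Reason over these ratings, then answer with a number 1-5.")
    return "\n".join(lines)
```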
In this study, we conducted semi-structured interviews with 21 IIR researchers to investigate their data reuse practices. The study expands upon current findings by exploring how IIR researchers obtain information when considering data reuse. We identified the information about shared data characteristics that IIR researchers needed when evaluating data reusability, as well as the sources they typically consulted to obtain this information. We consider this work an initial step towards revealing IIR researchers’ data reuse practices and identifying what the community needs to do to promote data reuse. We hope that this study and future research will inspire more individuals to contribute to ongoing efforts to design the standards, infrastructures, and policies for data sharing and reuse in this field, and to foster a sustainable culture around them.
Conversational search systems increasingly provide source citations, yet how citation or source presentation formats influence user engagement remains unclear. We conducted a crowdsourcing user experiment with 394 participants comparing four source presentation designs that varied citation visibility and accessibility: collapsible lists, hover cards, footer lists, and aligned sidebars. High-visibility interfaces generated more hovering on sources, though clicking remained infrequent across all conditions. While interface design showed limited effects on user experience and perception measures, it significantly influenced changes in knowledge, interest, and agreement. High-visibility interfaces initially reduced knowledge gain and interest, but positive effects emerged as source usage increased. The sidebar condition uniquely increased agreement change. Our findings demonstrate that source presentation alone may not enhance engagement and can even reduce it when insufficient sources are provided.
Taxonomies organize knowledge into hierarchical structures that support effective information seeking behaviors. However, developing taxonomies in fast-evolving domains like e-commerce remains a labor-intensive process. In this paper, we present an interactive system that assists users in expanding taxonomies through automated knowledge discovery from large text corpora. On the back end, our hybrid methods combine topic modeling and large language models (LLMs) to uncover emerging concepts, generate concise summaries, and suggest mappings to taxonomy nodes. On the front end, we develop an interactive web-based interface that supports iterative, human-in-the-loop taxonomy expansion. We demonstrate the system’s versatility through two scenarios using publicly available datasets: expanding a preliminary taxonomy in the e-commerce domain and refining a mature taxonomy in the medical domain.
Students often face complex academic search tasks that cannot be resolved by issuing a single query and selecting among the top few search results, or by a single answer provided by a generative AI interface. The exploratory search process has been proposed as a useful framework for enabling searchers to undertake complex search tasks. In recent years, a variety of interfaces have been developed to support such searching. While such approaches generally focus on enabling searchers to make sense of what has been found and supporting them in managing their search activities, an underlying assumption is that searchers have good reading skills, working memory, and attention. However, these assumptions do not hold for all users; there can be a wide range of neurodiversity among students with respect to these important cognitive functions. Some searchers, such as those with dyslexia, need more support in some or all of these aspects than academic search interfaces offer. Taking inspiration from prior research on multi-workspace search environments, visually representing histories of past search activities, tag-based approaches to saving search results, interactive mechanisms for filtering what has been saved, and accessibility, we have developed a novel approach we call SearchPath. Herein we explain the key features of SearchPath, highlighting how they have been chosen and fine-tuned to serve as interactive memory aids that can help students with dyslexia maintain orientation during their search journey.
In high-stakes domains like legal or regulatory compliance, user satisfaction is secondary to trust. A single hallucination can have severe consequences, making trust the paramount user-experience (UX) metric. This presents a methodological challenge: how do we evaluate for this core need, and how do technical metrics relate to a user’s perception of trust?
This demonstration paper presents "Eloh," a novel GraphRAG conversational agent for ESG compliance, evaluated using a dual framework: (1) a quantitative LLM-as-a-judge benchmark (Ragas, LangSmith) measuring key metrics such as Groundedness, Faithfulness, and Context Recall, and (2) a qualitative study with 15 industry experts measuring perceived trust.
We show a direct link between technical performance and user perception. The system’s perfect 1.0 Groundedness score (zero hallucination) was mirrored in the user study, where 93.4% of experts reported that the agent "Always" or "Most of the time" correctly referenced specific regulations. We argue that for high-stakes domains, "Groundedness" should be reframed as a primary, user-centered metric, as it is the essential and measurable foundation for user trust.
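For readers unfamiliar with the metric, the following is a minimal sketch of claim-level groundedness; the frameworks named above (Ragas, LangSmith) use an LLM judge for the support check, which is stubbed out here with token overlap purely for illustration:

    # Toy groundedness: share of answer claims supported by the retrieved
    # context. A score of 1.0 corresponds to "zero hallucination".
    def supported(claim, context, threshold=0.6):
        """Stub judge: fraction of claim tokens appearing in the context."""
        tokens = claim.lower().split()
        return sum(t in context.lower() for t in tokens) / max(len(tokens), 1) >= threshold

    def groundedness(answer, context):
        """Treat sentences as claims; real systems extract claims with an LLM."""
        claims = [c.strip() for c in answer.split(".") if c.strip()]
        return sum(supported(c, context) for c in claims) / max(len(claims), 1)

    context = "Article 8 of the SFDR requires disclosure of ESG characteristics."
    print(groundedness("Article 8 of the SFDR requires ESG disclosure.", context))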
Multimodal approaches have shown great promise for searching and navigating digital collections held by libraries, archives, and museums. In this paper, we introduce mapRAS: a retrieval-augmented search system for historic maps. In addition to introducing our framework, we detail our publicly-hosted demo for searching 101,233 map images held by the Library of Congress. With our system, users can multimodally query the map collection via ColPali, summarize search results using Llama 3.2, and upload their own collections to perform inter-collection search. We articulate potential use cases for archivists, curators, and end-users, as well as future work with our system in user-centered research, machine learning, and the digital humanities. Our demo can be viewed at: http://www.mapras.com.
We demonstrate Cleo, a transparent and controllable conversational product advisor that addresses the challenges of opacity, unpredictability of LLMs, and the complexity of comparisons in conversational commerce. With our chatbot system, we make four contributions: First, we introduce transparency by prompting the LLM to reflect on interpreted user needs, while an auditable ranking mechanism reveals loss values per attribute, explaining ranking decisions. Second, we propose controllability through a hybrid architecture separating deterministic ranking from language generation. A ranker applies categorical filters and numeric loss functions over 3,638 product specifications. Meanwhile, a constrained LLM generates descriptions grounded in catalog evidence, thus mitigating the risk of hallucinated or persuasive content. Third, we provide decision support in the form of natural-language comparisons and a highlights feature. These aim to reduce mental workload by contextualizing specifications relative to user needs. Fourth, we contribute an extensible experimental system for IR and HCI researchers, as well as practitioners of conversational search and recommendation. Unlike traditional faceted search or opaque LLM-only recommenders, our approach allows for fluid conversation while maintaining algorithmic transparency. In a live demonstration, attendees will experience information needs elicitation and reflection, conversational refinement with real-time re-ranking, inspection of per-attribute loss explanations, and AI-generated multi-item comparisons. The system aims to advance the design of transparent and controllable conversational systems that provide support for decision-making during online product search.
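To make the ranking mechanism concrete, here is a hedged sketch under invented attribute names, weights, and loss forms (the system’s actual filters and loss functions are not specified in this abstract):

    # Deterministic, auditable ranking: categorical filters plus normalized
    # numeric losses per attribute; per-attribute losses are kept so the
    # ranking decision can be explained to the user.
    from dataclasses import dataclass

    @dataclass
    class Product:
        name: str
        specs: dict

    def rank(products, needs, weights):
        results = []
        for p in products:
            losses = {}
            for attr, target in needs.items():
                value = p.specs.get(attr)
                if isinstance(target, str):            # categorical filter
                    losses[attr] = 0.0 if value == target else 1.0
                else:                                  # numeric loss, normalized
                    losses[attr] = abs(value - target) / max(abs(target), 1)
            total = sum(weights.get(a, 1.0) * l for a, l in losses.items())
            results.append((total, losses, p))
        return sorted(results, key=lambda r: r[0])

    laptops = [Product("A", {"price": 799, "battery_h": 10, "os": "linux"}),
               Product("B", {"price": 1299, "battery_h": 18, "os": "linux"})]
    needs, weights = {"price": 800, "battery_h": 16, "os": "linux"}, {"battery_h": 2.0}
    for total, losses, p in rank(laptops, needs, weights):
        print(p.name, round(total, 3), losses)        # inspectable loss values

Separating this deterministic step from language generation is what makes such a ranking auditable: the LLM can only verbalize results it cannot alter.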
The Result Assessment Tool (RAT) is an open-source software toolkit for conducting research with results from commercial search engines and other web-based information retrieval systems. Conducting such research is challenging due to the “black-box” nature of these systems and limited data access. RAT addresses this by providing an integrated software environment that unifies modules for study design, automated data collection, manual assessment in a dedicated interface, and automated analysis via an extensible classifier framework. The software is designed to assist with various research tasks, including comparative assessments of result quality, investigations of source variety, and content analysis. It emphasizes transparency in methodologies, reproducibility of outcomes, and responsible data collection.
We introduce PulseSearch, a GenAI-based approach for generating query suggestions in a music search platform. Designed to anticipate users’ intent during a search session, PulseSearch combines long- and short-term user signals by conditioning generation on both recent user queries and pre-generated listener profiles. To further enhance contextual relevance, it generates suggestions tailored to different times of the day. We conduct both online and offline evaluations, showing that PulseSearch consistently improves suggestion quality over dense retrieval baselines across dimensions such as personalization, diversity, and freshness. A demo of our results is available at the provided URL.
Social engineering through email and phishing is an ever-present threat to users’ daily digital security. To combat the lack of accessible, adaptive training tools, we developed SEA-u-lator, a privacy-preserving Chrome extension that simulates phishing attacks directly within Gmail. The extension includes features such as real-time feedback, adjustable phishing frequency, and an optional post-detection survey. The system inserts realistic phishing emails and provides real-time, on-device feedback after user interaction without modifying actual mailbox data. All data are locally accessible in a JSON file for reflection or research use. This demo shows how integrated simulations can support practical training while maintaining privacy.
A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a viable in-browser alternative. We introduce a hybrid architecture that functions entirely on the client side, combining two components: (1) an adaptive probabilistic model that learns a user’s behavioral policy from direct feedback, and (2) a Small Language Model (SLM), running in the browser, which is grounded by the probabilistic model to generate context-aware suggestions. To evaluate this approach, we conducted a three-week longitudinal user study with 18 participants. Our results show that this privacy-preserving approach is highly effective at adapting to individual user behavior, leading to measurably improved search efficiency. This work demonstrates that sophisticated AI assistance is achievable without compromising user privacy or data control.
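One conceivable shape for such a hybrid, sketched here under assumed names and with the in-browser SLM stubbed out (a conceptual illustration, not the authors’ code), is a per-category accept/reject model that gates which generated suggestions are surfaced:

    # Conceptual sketch: a Beta-Bernoulli behavioural model, learned from
    # accept/reject feedback, grounds which SLM suggestions are shown.
    from collections import defaultdict

    class BehaviourModel:
        """Tracks P(accept | suggestion category) with a Beta(1, 1) prior."""
        def __init__(self):
            self.accepts = defaultdict(lambda: 1)
            self.rejects = defaultdict(lambda: 1)

        def update(self, category, accepted):
            (self.accepts if accepted else self.rejects)[category] += 1

        def p_accept(self, category):
            a, b = self.accepts[category], self.rejects[category]
            return a / (a + b)

    def slm_generate(query):
        # Placeholder for the in-browser Small Language Model.
        return [("refine: " + query + " tutorial", "refinement"),
                ("related: async runtimes compared", "exploration")]

    model = BehaviourModel()
    model.update("refinement", accepted=True)  # feedback never leaves the device
    shown = [(text, cat) for text, cat in slm_generate("rust async io")
             if model.p_accept(cat) >= 0.6]
    print(shown)  # only categories this user tends to accept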
While large language models (LLMs) are increasingly used to summarize long documents, this trend poses significant challenges in the legal domain, where the factual accuracy of deposition summaries is crucial. Nugget-based methods have been shown to be extremely helpful for the automated evaluation of summarization approaches. In this work, we translate these methods to the user side and explore how nuggets could directly assist end users. Although prior systems have demonstrated the promise of nugget-based evaluation, its potential to support end users remains underexplored. Focusing on the legal domain, we present a prototype that leverages a factual nugget-based approach to support legal professionals in two concrete scenarios: (1) determining which of two summaries is better, and (2) manually improving an automatically generated summary.
How people seek, request, and exchange information in social interactions is shaped by personality and situational context, connecting the fields of interactive information science and attribution theory in social psychology. In everyday life, people seek information to achieve goals, collaborate, and manage social conflicts. Understanding how individual traits and contextual factors influence information-seeking behavior remains a challenge. Recent advances with large language models (LLMs) enable the simulation of socially grounded information-seeking behaviors in realistic and controllable ways. We introduce CHARISMA, a simulation framework that uses LLMs to examine how personality traits and situational factors influence information seeking as a form of social behavior. CHARISMA leverages movie characters and public figures as personality anchors, drawing on LLMs’ knowledge to simulate human-like interaction. CHARISMA’s utility is demonstrated in two studies: (1) agreeable pairs resolve conflicts more successfully, and (2) low-agreeable agents compete for information, while high-agreeable agents cooperate through prosocial exchange.
Many users struggle with effective online search and critical evaluation, especially in high-stakes domains like health, while often overestimating their digital literacy. Thus, in this demo, we present an interactive search companion that seamlessly integrates expert search strategies into existing search engine result pages. Providing context-aware tips on clarifying information needs, improving query formulation, encouraging result exploration, and mitigating biases, our companion aims to foster reflective search behaviour while minimising cognitive burden. A user study shows that the companion successfully encourages more active and exploratory search, leading users to submit 75% more queries and view roughly twice as many results, and yields performance gains on difficult tasks. This demo illustrates how lightweight, contextual guidance can enhance search literacy and empower users through micro-learning opportunities. While the vision involves real-time LLM adaptivity, this study utilises a controlled implementation to test the underlying intervention strategies.
Digital cultural heritage (CH) platforms largely replicate the search paradigms of web engines, privileging precise retrieval over exploratory engagement. This presents a challenge for casual visitors, who often approach art collections with curiosity rather than specific queries. We present Sensing a Vibe, a demo system that replaces the traditional search box with browsing through Internet Aesthetics—user-generated, affective categories (e.g., Cottagecore, Dark Academia) reflecting contemporary cultural literacies. The system operationalizes aesthetic-based exploration, demonstrated through interactive user scenarios and preliminary usability findings that illustrate how alternative organizational logics can support serendipitous discovery and engagement in CH collections.
AskAda is a conversational agent designed to lower barriers to accessing AI-driven tools and to support university students in accessing authoritative scholarly resources for their academic activities. It enables topic identification and retrieval of definitions for appropriate terms, helping students locate trustworthy and current scholarly information that is inaccessible as a commodity through public search engines. Operating within the WhatsApp platform, AskAda reduces the learning burden of adopting a new system and GUI and allows students to use the AI tool directly from a smartphone. Unlike many AI tools, AskAda emphasizes accountability and transparency that instructors can rely on and trust, by enabling validation of the audit trails generated by students’ search journeys while completing their assignments. Two authors conducted an autobiographical study and evaluated the performance of AskAda on two dimensions: efficiency and usability. The system facilitates topic identification and retrieval of scholarly resources from library databases while maintaining an auditable session history and fostering transparency in AI interaction in the educational context.
This demo paper presents ImmiGo, an adaptive multilingual chatbot designed to effectively address user needs by providing accurate and timely responses to visa and immigration inquiries in users’ preferred languages. A system supporting such important inquiries must remain up to date and accurate. To this end, ImmiGo is built on a sophisticated agentic Retrieval-Augmented Generation (RAG) pipeline, seamlessly integrating the Groq-accelerated OpenAI GPT-OSS-120B model for faster inference. The system also incorporates an advanced embedding model and a dynamic storage mechanism, leveraging a vector database to ensure efficient real-time retrieval, contextual accuracy, and robust handling of out-of-knowledge-base queries. Additionally, an ensemble document retrieval method and composite grading mechanism refine response quality by filtering less relevant context, ensuring that the language model generates precise, reliable, and factually accurate responses. The system underwent rigorous evaluation and testing for its multi-turn, multilingual, and conversational capabilities, demonstrating its ability to perform consistently across multiple scenarios.
As the global population ages, artificial intelligence (AI)-powered information agents have emerged as potential tools to support older adults’ health information seeking. However, ensuring that agents align with older adults’ autonomy preferences remains a critical challenge. Drawing on interdisciplinary conceptualizations of autonomy, this proposal aims to 1) understand older adults’ conceptions of autonomy, especially in the health information-seeking context, and 2) identify opportunities for information agent design and evaluation to better support older adults.
Large language models (LLMs) offer opportunities for experimenting with the idea of building digital twins of humans, as they are able to generate human-like responses. This doctoral research leverages personas to develop digital persona twins as LLM-enabled conversational agents that simulate human actors in probation interview encounters for criminal justice training. By framing officers as information seekers and personas as dynamic information sources, my work enables scalable, ethical training tools for studying information interactions. A key innovation extends digital twin theory from continuous tracking to temporal snapshot instantiation, capturing a human actor’s information state at a specific moment to enable privacy-preserving simulation. This approach builds on personas to create diverse, individualized digital twins, opening new applications in social computing and information interaction. This dissertation aims to contribute empirical insights into professional information-seeking behavior and to validate new approaches for designing AI-mediated information sources that support skill development in high-stakes professional contexts.
Artificial intelligence (AI) summaries appear in many Web searches, but their responses and source attributions have shown errors and questionable quality. This dissertation project consists of two studies that explore the impact of generative AI summary features on user perceptions and decision-making. The first, experimental study examines the effects of two designs of generative AI search responses (narrative point of view, source authority) on user perceptions and information adoption for consumer health topics. AI summaries were collected, modified, and used in a randomised, within-subjects experiment with crowdsourced workers (N=234). The Information Adoption Model (IAM) informed the conceptual model of the experiment, and the Comprehensive Model of Information Seeking (CMIS) guided the health task design. User perceptions are assessed through self-reported measures of persuasive communication (argument quality, source credibility), information usefulness, domain knowledge, and involvement. The second, qualitative study explores how AI summaries and source attributions are utilised alongside traditional Web search results for tasks of personal interest. This is an in-person study involving search sessions, retrospective think-alouds, and semi-structured interviews with graduate students (N=16).
Menopause is the natural cessation of women’s reproductive life. Hormone replacement therapy (HRT) is the current standard method for managing menopause symptoms. However, HRT is associated with significant risks, including breast cancer and cardiovascular disease. Consequently, many women turn to complementary and alternative medicines (CAM) as a safer alternative. Although previous research has explored women’s CAM-related information-seeking behaviors in the general population, limited attention has been given to migrant women, who may demonstrate atypical information-seeking patterns. In this proposed study, we explore which sources migrant women use to seek CAM information, and why. Data will be collected through questionnaire surveys and interviews and analyzed using both quantitative and qualitative methods. It is hoped that the results of this study will enable relevant authorities, such as healthcare providers, to proactively fulfill CAM information needs among migrant women.
The dialogic capabilities of LLM-based AI create new possibilities for supporting human thinking during information seeking. This three-step dissertation research examines how reversing traditional information-seeking dynamics—where AI initiates questions and humans respond—can foster epistemic curiosity and promote active interaction with information. Specifically, it conceptualizes question-asking as a behavioral expression of epistemic curiosity in human–AI information interaction. Through naturalistic observation and empirical evaluation of the system design, the research develops and evaluates a Question-Asking AI system that generates questions aimed at stimulating epistemic curiosity while strengthening users’ question-asking abilities. By positioning epistemic curiosity and question-asking as central design goals and measurable outcomes, this research advances understanding of how AI systems can encourage curiosity-driven learning and enable deeper thinking with information.
Generative AI (GenAI) systems are reshaping how people search for and use information. Unlike traditional search engines that return ranked lists, these systems generate synthesized answers and conversational explanations. While this shift enhances efficiency, it also impacts users’ cognition, potentially reducing the reflective, evaluative, and exploratory processes that support learning, understanding, and creativity. My research aims to conceptualize, design, and evaluate GenAI search systems that augment rather than replace human cognitive capabilities, with a focus on higher-order processes such as sensemaking, critical thinking, and creativity. The paper presents related work, synthesizes prior collaborative studies, introduces the proposed research framework, and outlines ongoing and planned empirical work toward this goal. The long-term vision is to develop a generalizable design and evaluation framework for GenAI search systems that systematically support cognitive augmentation.
Although many find great utility in LLM chatbots such as ChatGPT for their information needs, current systems are not well suited to all tasks. Particularly in knowledge work, hallucinations undermine trustworthiness, and effortless answers may undermine long-term learning. We argue that browsing is a promising paradigm to complement traditional technologies for tasks where trust and engagement are highly important. However, unlike search or question answering, which benefit from the general technologies of search engines and RAG, there is not yet a general technology for making any collection browsable. Previous and ongoing work has focused on articulating the Hypergraph of Text (HoT) as a framework to support semantic organization and browsing for arbitrary text collections. We have also proposed the Text Information Navigation Kit (TINK), a toolkit for turning the HoT data structure into a browsing system. Building on this, planned and future research focuses on how to evaluate such systems and how to overcome the information overload issues observed in preliminary work. Towards this end, we propose a user-focused study of both TINK and a conversationally enhanced TINK system. We believe that building such a system and evaluating it with user studies is a key step towards better supporting a wide variety of information needs, especially when trust and learning are of high importance.
In an era of exponential scholarly output, researchers face mounting challenges navigating vast academic literature. Artificial intelligence (AI) is reshaping academic search systems through features such as semantic search, generative summaries, and citation-context tools. While these innovations enhance research efficiency, empirical understanding of how researchers engage with and perceive AI-integrated academic search systems (AISS) remains limited. This study investigates how graduate students and early-career researchers interact with AISS, examining how disciplinary background, domain and search expertise, and attitudes toward AI influence trust and information-seeking strategies. Using a mixed-methods design that combines a survey with contextual inquiry, the study provides both quantitative trends and in-depth behavioral insights. Preliminary findings reveal substantial variation in transparency across AISS, highlighting the need for responsible and human-centered system design. This work advances understanding of AI’s impact on scholarly information behavior and informs the design of trustworthy academic search tools.
Agentic AI systems are changing how people seek and use information. Yet, the research community has not fully adapted – methods for studying, building, and assessing such systems often remain static, missing the interactive, temporal, and evidence-driven dynamics that characterize real information seeking. This hands-on tutorial equips the CHIIR community with a concise, practice-oriented methodology for leveraging and evaluating information-seeking agents. We define a shared vocabulary for agents and connect it to user-centered IR constructs; we show how to design agentic workflows that elicit effective evidence seeking under temporal change (planning, tool choice, grounding); and we introduce log-based rubrics that score correctness, evidence support, adequacy, and cost. Short case studies and optional demonstrations using open frameworks (for example, Perplexica, local LLMs via Ollama, and metasearch engines such as SearXNG) illustrate how these ideas map to real systems. Attendees receive reusable materials, including slides and selected supplemental resources (e.g., example traces and optional demo notebooks), suitable for research and teaching. The tutorial assumes familiarity with core IR concepts but does not require prior experience with agent frameworks.
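To give one concrete flavor of the log-based rubrics mentioned above, a toy scorer over a single agent trace might look as follows; the trace schema and scoring rules are illustrative assumptions, not the tutorial’s materials:

    # Toy log-based rubric: score one agent run on correctness, evidence
    # support, adequacy (crude length proxy), and cost. Field names are
    # invented for illustration.
    def score_trace(trace, key):
        cited = {c for step in trace["steps"] for c in step.get("citations", [])}
        return {
            "correctness": float(trace["answer"] == key["answer"]),
            "evidence_support": len(cited & set(key["gold_sources"])) / max(len(cited), 1),
            "adequacy": min(len(trace["answer"].split()) / 30, 1.0),
            "cost": sum(step.get("tokens", 0) for step in trace["steps"]),
        }

    trace = {"answer": "Paris", "steps": [{"tool": "search", "tokens": 120,
                                           "citations": ["doc1"]}]}
    print(score_trace(trace, {"answer": "Paris", "gold_sources": ["doc1", "doc2"]}))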
This half-day tutorial provides participants with a framework and hands-on experience for conducting controlled experiments on model search behaviour using an open-source toolkit. Participants will learn how to design, run, and analyse experiments to investigate behavioural differences across large language models. By integrating techniques from human user studies with LLM experimentation, the tutorial strengthens CHIIR’s methodological foundations and broadens its scope to include behavioural analysis of generative and agentic systems.
Interactive information retrieval (IIR) systems, including search engines and conversational systems, are increasingly central to user experiences. However, rigorously evaluating their performance, particularly as interactions become highly personalized, remains a scientific challenge. While user simulation offers a powerful methodology for reproducible evaluation, its adoption is hindered by a steep learning curve and a fragmented landscape of complex tools. This half-day tutorial provides a practical, hands-on introduction to user simulation at varying levels of complexity, from foundational statistical models to advanced, LLM-driven frameworks. Through a series of guided problems, participants will acquire practical skills in using popular libraries, learning user models from data, and applying large language models (LLMs) to simulate user behavior. The tutorial concludes with evaluating the simulators themselves, providing participants with guidance on appropriate use cases and fidelity assessment.
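At the foundational-statistical-model end of the spectrum described above, a user simulator can be as small as a cascade-style scan over a ranked list; the click and stop probabilities below are illustrative assumptions, not values from the tutorial:

    # Cascade-style simulated user: scan a ranked list top-down, click with a
    # relevance-dependent probability, possibly stop once satisfied.
    import random

    P_CLICK = {0: 0.05, 1: 0.5, 2: 0.95}  # click probability by relevance grade

    def simulate_session(relevances, p_stop_after_click=0.7, seed=None):
        rng = random.Random(seed)
        clicks = []
        for rank, rel in enumerate(relevances):
            if rng.random() < P_CLICK[rel]:
                clicks.append(rank)
                if rng.random() < p_stop_after_click:
                    break  # satisfied user abandons the list
        return clicks

    sessions = [simulate_session([2, 0, 1, 0, 1], seed=i) for i in range(1000)]
    print(sum(len(s) for s in sessions) / len(sessions))  # mean clicks per session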
The rapid development of artificial intelligence (AI) is reshaping how people seek, access, and use information, with significant implications for researchers, educators, and students. Increasingly, academic search engines, bibliographic databases, and digital libraries are integrating AI features, including generated and synthesized content, conversational interfaces, and intelligent recommendation. These tools promise to support discovery, synthesis, and learning, yet they also raise critical questions about search integrity, fairness, accountability, transparency, and ethics (FATE). In academic contexts, where reliability and credibility are paramount, the design and use of AI-mediated search systems require novel ideas and approaches. Building on previous work in interactive information retrieval (IIR), search as learning, and search user interface design, this workshop invites the CHIIR community to examine opportunities and challenges in developing and using AI-powered academic search systems for research and higher education.
As AI agents become more capable of anticipating intent and taking initiative, the ways humans seek, interpret, and act on information are being quietly reshaped. Yet at the heart of every interaction lies a human: curious, uncertain, and contextually situated, whose goals and boundaries cannot be fully captured by data alone. This workshop centers on the human experience of proactivity and personalization in interactive information access, asking how agents can assist without overriding agency, adapt without imposing assumptions, and anticipate without eroding trust. Building on CHIIR’s tradition of bridging information retrieval and human–computer interaction, the workshop will explore when and how proactivity supports human information behavior – enhancing exploration, sense-making, and learning – and when it risks diminishing transparency or control. Through co-design sessions and participatory discussions, we will interrogate concrete design and evaluation dimensions of proactive systems, including timing of initiative, transparency of intent, user control, and their effects on exploration, sense-making, and trust. Ultimately, this workshop seeks to reimagine proactivity not as automation of the search process, but as a collaborative partnership where agents act as companions in the human pursuit of understanding. All resources related to this workshop are available at https://proactive-chiir.github.io/.