Conversational Agents in Enterprise Software Workflows: Usability, Trust Calibration, and Productivity Impact of Chatbot Integration in Knowledge Work Environments
Enterprise chatbot deployments have proliferated across knowledge work environments, yet rigorous evaluation of their usability, trust calibration accuracy, and measurable productivity impact remains sparse relative to the volume of deployment activity. This paper presents a mixed-methods study of enterprise conversational agent integration across five organizations in the legal, financial, and healthcare knowledge work domains, combining a 12-week longitudinal experiment (n=214) with qualitative interviews and log analysis of 340,000 conversational interactions. We develop and validate the Conversational Agent Usability Scale (CAUS), which evaluates chatbot usability across 11 dimensions, including intent recognition accuracy, response coherence, context retention, and error recovery behavior. The longitudinal experiment finds that well-designed chatbot integration reduces time spent on information retrieval tasks by 31% and on routine document generation by 44%, but increases task completion time by 18% for complex multi-step reasoning tasks, where chatbot error rates are highest. A central finding is trust miscalibration: 67% of users exhibit overtrust in chatbot outputs for factual queries within their domain of expertise, leading to unchecked propagation of erroneous information. We propose a Trust Calibration Interface Design framework comprising four evidence-presentation patterns, which reduced overtrust incidence by 48% in a controlled follow-up study.