
Application Card: Microsoft Copilot Studio

What is an Application or Platform card?

Microsoft’s Application and Platform Cards are intended to help you understand how our AI technology works, the choices application owners can make that influence application performance and behavior, and the importance of considering the whole application, including the technology, the people, and the environment. Application Cards are created for AI applications and Platform Cards are created for AI platform services. These resources can support the development or deployment of your own applications and can be shared with users or stakeholders impacted by them.

As part of its commitment to responsible AI, Microsoft adheres to six core principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. These principles are embedded in the Responsible AI Standard, which guides teams in designing, building, and testing AI applications. Application and Platform Cards play a key role in operationalizing these principles by offering transparency around capabilities, intended uses, and limitations. For further insight, readers are encouraged to explore Microsoft’s Responsible AI Transparency Report. Customers are required to use services in compliance with the Microsoft Enterprise AI Services Code of Conduct for organizations, which outlines how to engage with AI responsibly.

Overview

Copilot Studio is a platform that empowers organizations to build, customize, and deploy AI-driven agents to automate tasks, answer questions, and streamline business processes. Its purpose is to make advanced artificial intelligence accessible to a wide range of users, from business analysts and IT professionals to developers. By providing intuitive tools for creating conversational agents and integrating them into websites, Microsoft Teams, and other channels, it helps solve common challenges such as repetitive manual work, customer support automation, and data-driven decision-making. It offers benefits like increased efficiency, improved user experiences, and scalable automation.

The platform is designed for enterprise customers, solution developers, and IT teams who want to leverage generative AI without deep coding expertise. Intended users include organizations seeking to automate workflows, enhance customer engagement, or provide self-service support. Copilot Studio supports both low-code and pro-code approaches, enabling users to create agents using natural language instructions or by extending functionality with custom tools and connectors. This flexibility makes it suitable for industries ranging from finance and healthcare to education and retail.

Key terms

The following list defines key terms related to Microsoft Copilot Studio.

  • Agent: An AI‑powered conversational or autonomous component created in Copilot Studio that can answer questions, take actions, or automate business processes using configured instructions, tools, and data sources.
  • Agent orchestration: The process by which Copilot Studio plans, routes, and coordinates agent actions, tools, and sub‑agents to fulfill a user request or task.
  • User prompt: Input that a user provides to an agent, typically in natural language, that initiates a conversation or task.
  • Grounding: The process of providing relevant enterprise or external data to a model so agent responses are contextually accurate and aligned with organizational information.
  • Knowledge source: A data source connected to an agent—such as Microsoft Graph data, Dataverse, uploaded documents, or external systems—that you can use to ground responses.
  • Tool: A callable capability that an agent uses to take actions beyond text generation, such as invoking APIs, running Power Automate flows, or querying systems.
  • Connector: A prebuilt or custom integration that enables Copilot Studio agents to access external services or enterprise systems.
  • Generative AI: AI techniques that enable agents to generate text, summaries, plans, or decisions based on large language models and provided context.
  • Large language model (LLM): A machine‑learning model trained on large datasets that enables natural language understanding and generation within Copilot Studio agents.
  • 1P model: A base model that Microsoft trains (pre-training and initial fine tuning completed by any Microsoft team).
  • 3P model: A base model that Microsoft doesn't train (pre-training and initial fine tuning completed by a party other than Microsoft).
  • Copilot Credits: The usage‑based currency that measures consumption of Copilot Studio capabilities, including agent interactions and tool execution.
  • Channel: A surface where you deploy and use an agent, such as Microsoft Teams, a website, or Microsoft 365 Copilot.
  • Evaluation: The process of assessing agent quality, performance, safety, and reliability by using predefined or custom testing methods.
  • Responsible AI: Microsoft’s policy, research, and engineering practices that are grounded in our AI principles and operationalized through our Responsible AI Standard.
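As a rough illustration of the grounding and knowledge-source concepts above, the following sketch shows how passages retrieved from a knowledge source can be combined with a user prompt so the model answers from organizational data. All names here are hypothetical; Copilot Studio configures knowledge sources declaratively in the designer rather than in code.

```python
# Minimal sketch of grounding: combine retrieved knowledge-source
# passages with the user prompt so the model answers from
# organizational data rather than from its training alone.
# All names are illustrative, not Copilot Studio APIs.

def retrieve_passages(knowledge_source: dict, query: str, top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a real search index."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), text)
        for text in knowledge_source["documents"]
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

def build_grounded_prompt(user_prompt: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the provided context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. If the context is "
        "insufficient, say you don't know.\n"
        f"Context:\n{context}\n"
        f"Question: {user_prompt}"
    )

source = {"documents": [
    "Leave requests are submitted through the HR portal.",
    "Expense reports are due by the 5th of each month.",
]}
passages = retrieve_passages(source, "How do I submit a leave request?")
prompt = build_grounded_prompt("How do I submit a leave request?", passages)
```

The key property this sketches is that only passages the retrieval step surfaces reach the model, which is what makes grounded responses verifiable against the connected source.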

Key features or capabilities

The key features and capabilities below describe what Copilot Studio is designed to do and how it performs across supported tasks.

  • Agent creation and customization: Copilot Studio enables users to build AI-powered agents—often called “copilots”—that automate tasks or answer questions. These agents can be tailored to reflect your organization’s tone, workflows, and business rules.
  • Low-code and pro-code flexibility: Users can start with intuitive, low-code tools and later extend functionality using APIs, custom connectors, and scripts. This flexibility makes the platform accessible to business users while offering depth for developers. For more information, see FAQ for the agent creation from natural language experience and FAQ for Copilot.
  • Integration with Microsoft ecosystem: Copilot Studio works seamlessly with Microsoft services like Teams, Power Platform, and Microsoft Graph. Agents can securely access organizational data and deliver context-aware responses.
  • Connect to external sources: Beyond Microsoft services, Copilot Studio supports connectors and APIs for third-party systems. This capability ensures agents can pull data from diverse sources, making them useful across multiple business processes.
  • Triggers and actions (autonomous agents): Agents can respond to events—such as a new record or a customer inquiry—and execute tasks automatically. This event-driven design reduces manual effort and enables end-to-end automation. For more information, see FAQ for using generative orchestration.
  • Planning and adaptability: Agents can plan multistep workflows and adjust actions based on changing inputs or conditions. This adaptability helps them handle dynamic business scenarios rather than rigid, predefined tasks. For more information, see FAQ for using generative orchestration, FAQ for prompts, and FAQ for generative answers.
  • Memory for contextual responses: Copilot Studio agents can retain context from previous interactions, allowing them to provide coherent, personalized answers and maintain continuity across conversations.
  • Extensibility: Developers can enhance agents with custom plugins, connectors, and advanced logic. This extensibility ensures the platform can scale to meet specialized or complex requirements.
  • Multichannel deployment: Deploy agents across websites, Teams, and other apps, so users can interact with them wherever they work.
  • Responsible AI and safety features: Built-in privacy, security, and compliance checks align with Microsoft's Responsible AI principles, helping organizations deploy agents safely and ethically. To learn more about using AI-powered features responsibly, see Responsible AI FAQs and Security FAQs for Copilot Studio.
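The trigger-and-action model behind autonomous agents can be pictured as an event loop that maps triggers to actions. The sketch below is purely illustrative, with hypothetical names; in Copilot Studio, triggers and actions are configured in the designer, not written as code.

```python
# Illustrative event-driven agent: map event types (triggers) to
# actions, as an autonomous agent does declaratively in Copilot
# Studio. All names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    handlers: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def on(self, event_type: str, action):
        """Register an action to run when a trigger fires."""
        self.handlers[event_type] = action

    def handle(self, event: dict) -> str:
        action = self.handlers.get(event["type"])
        if action is None:
            return "ignored"  # event outside the agent's configured scope
        result = action(event)
        self.log.append((event["type"], result))
        return result

agent = Agent("inventory-monitor")
agent.on("stock_low", lambda e: f"reorder {e['sku']}")
agent.on("customer_inquiry", lambda e: f"route to support: {e['topic']}")

result = agent.handle({"type": "stock_low", "sku": "A-42"})
```

Note that unregistered event types are ignored rather than handled, which mirrors the scope boundaries organizations set for their agents.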

Intended uses

Copilot Studio can be used in multiple scenarios across a variety of industries. Some examples of use cases include:

  • Customer support automation: A retail company can create an agent that answers common customer questions, tracks orders, and processes returns through a website or chat. This approach reduces wait times and frees human agents to handle complex issues, improving customer satisfaction and operational efficiency.
  • Employee self-service in government: A government agency can deploy an internal agent to help employees find HR policies, submit leave requests, and check compliance guidelines. By automating routine inquiries, the agency saves time and ensures employees have quick access to accurate information.
  • Financial services workflow automation: A bank can use Copilot Studio to build an agent that monitors and provides updates on incoming queries and applications. This approach speeds up processing, reduces manual errors, and enhances transparency for customers.
  • Education and student assistance: Universities can create agents that guide students through course registration, answer questions about deadlines, and provide links to learning resources. This approach helps students navigate administrative tasks easily and reduces the workload on academic staff.
  • Media content management: A media company can deploy an agent to assist editors by pulling metadata from multiple sources, organizing assets, and suggesting relevant tags for articles. This approach accelerates publishing workflows and ensures consistency across platforms.
  • Consumer goods inventory alerts: A manufacturer can set up an agent that monitors inventory levels and automatically triggers restocking actions when thresholds are reached. This proactive approach prevents stockouts and keeps supply chains running smoothly.

Agent scope: These agents are generally purpose-built for defined tasks within their domain (for example, HR inquiries, loan processing, or student support). While they can handle multiple related actions, they operate within boundaries set by the organization to maintain control and compliance.

Models and training data

Copilot Studio leverages a variety of AI models to power the experience that users see. Some examples include OpenAI’s GPT‑5 series (provided by Azure OpenAI Service), Anthropic’s Claude Sonnet 4, and xAI's Grok 4.1 Fast.

Microsoft does not train foundation models using customer data. For models developed and trained by third parties, you can find more information in Select a primary model for your agent.

Performance

Copilot Studio is designed to work reliably in enterprise environments where agents handle both conversational interactions and automated actions. Agents can answer questions, provide guidance, and also execute workflows triggered by events, such as updating records, sending notifications, or orchestrating multistep business processes. This combination of real-time assistance and background automation makes Copilot Studio suitable for diverse business scenarios.

Supported modalities

  • Input: Text prompts from users and event-based triggers from connected systems (for example, a new entry in a database or a form submission).

  • Output: Text responses for conversational scenarios and automated actions such as creating tasks, updating data, or initiating workflows.

Multilingual capabilities

  • Optimized for English, with support for additional languages based on configuration. Conversational accuracy is highest in English, while automation features remain language-independent because they rely on structured triggers and actions.

Conditions for reliable operation

  • Agents perform best when prompts are clear and connectors to data sources and systems are properly configured.

  • Stable connectivity and adherence to organizational data policies are essential for safe and consistent performance.

  • Performance might degrade if external systems are unavailable or if unsupported languages are used for conversational tasks.
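The unavailable-system case in the last bullet is usually handled with a fallback path. Below is a minimal sketch, assuming a connector is simply a callable that may fail; in Copilot Studio itself this behavior is configured with fallback topics and error handling rather than custom code.

```python
# Sketch of a fallback path: when an external connector is
# unavailable, the agent degrades gracefully instead of failing.
# All names are illustrative.

def call_connector(connector, query: str) -> str:
    """Try the configured connector; fall back to a safe message."""
    try:
        return connector(query)
    except ConnectionError:
        return (
            "The system needed to answer this is currently unavailable. "
            "Please try again later or contact support."
        )

def healthy_connector(query: str) -> str:
    return f"result for {query!r}"

def broken_connector(query: str) -> str:
    raise ConnectionError("order database unreachable")

ok = call_connector(healthy_connector, "order 123")
fallback = call_connector(broken_connector, "order 123")
```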

Safety and reliability

  • Built-in responsible AI checks ensure privacy, security, and compliance for both conversational and automated workflows. These safeguards help organizations deploy agents confidently across their scenarios.

Limitations

Understanding Copilot Studio's limitations is crucial to ensuring it is used within safe and effective boundaries. While we encourage customers to leverage Copilot Studio in their innovative solutions or applications, it's important to note that Copilot Studio was not designed for every possible scenario. We encourage users to refer to either the Microsoft Enterprise AI Services Code of Conduct (for organizations) or the Code of Conduct section in the Microsoft Services Agreement (for individuals), as well as the following considerations, when choosing a use case:

  • Integration and compatibility: While Copilot Studio integrates with Microsoft services and supports connectors for external systems, there might be limitations with third-party apps or highly customized environments. Advanced automation scenarios might require additional development or might not be fully supported.

  • Customization and flexibility: Copilot Studio offers low-code and pro-code options, but customization has boundaries. Some workflows or responses might be rigid or not fully aligned with unique organizational requirements, especially in regulated domains.

  • Dependence on connectivity and data access: Agents rely on stable internet connectivity and access to configured data sources. Disruptions in connectivity or changes to external APIs and connectors can impact performance or cause workflows to fail.

  • User training and adoption: Effective use requires users to understand both capabilities and limitations. There might be a learning curve, and users should review automated actions for accuracy.

  • Resource intensity: Running advanced AI models and automations can require significant computational resources. Performance might be affected in resource-constrained environments or during peak usage.

  • Bias, stereotyping, and ungrounded content: Despite responsible AI controls, AI-generated content might still reflect biases, stereotypes, or ungrounded information. Users should always review responses and actions, especially in sensitive or high-stakes scenarios.

  • Multilingual support: Copilot Studio supports multiple languages but was developed and tested primarily in English. Using unsupported languages might impact performance, and users should exercise caution when operating outside the intended language scope.

Evaluations

Performance and safety evaluations ensure AI systems operate reliably and securely by assessing factors like groundedness, relevance, and coherence while identifying the risks of generating harmful content. The following evaluations were conducted with safety components already in place, which are described in the "Safety components and mitigations" section below.

Performance and quality evaluations

Performance evaluations for AI applications are essential to improving their reliability in real-world applications. Metrics like groundedness, relevance, and coherence help assess the accuracy and consistency of AI-generated outputs, so that they are factually supported in grounded content scenarios, contextually appropriate, and logically structured. For Copilot Studio, we conducted performance evaluations for the following metrics, which are available through Microsoft Foundry:

  • Groundedness

  • Coherence

  • Fluency

  • Similarity
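Metrics like these are typically produced by scoring each prompt-and-response pair in a test set and averaging per metric. The harness below is a minimal sketch: the stub judge stands in for the LLM-based evaluators that Microsoft Foundry provides, and all names are illustrative.

```python
# Sketch of a metric-aggregation harness: each (prompt, response)
# pair receives per-metric scores from a judge, and results are
# averaged. The judge here is a stub standing in for an LLM-based
# evaluator; real judges score on a rubric (for example, 1-5).

from statistics import mean

METRICS = ["groundedness", "coherence", "fluency", "similarity"]

def stub_judge(prompt: str, response: str) -> dict:
    """Placeholder judge: returns a top score for any non-empty response."""
    return {m: 5 if response else 1 for m in METRICS}

def evaluate(dataset: list[tuple]) -> dict:
    scores = [stub_judge(p, r) for p, r in dataset]
    return {m: mean(s[m] for s in scores) for m in METRICS}

dataset = [
    ("How do I reset my password?", "Use the self-service portal."),
    ("What is the leave policy?", "See the HR handbook, section 3."),
]
report = evaluate(dataset)
```

The aggregation shape (per-example scores rolled up into per-metric averages) is what evaluation dashboards ultimately display, regardless of which judge produces the raw scores.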

Risk and safety evaluations

Evaluating potential risks associated with AI-generated content is essential for safeguarding against content risks with varying degrees of severity. This includes evaluating an AI application's predisposition towards generating harmful content or testing vulnerabilities to jailbreak attacks. For Copilot Studio, we conducted risk and safety evaluations for the following metrics available through Microsoft Foundry:

  • Hate and unfairness

  • Sexual

  • Violence

  • Self-harm

  • Protected material

  • Indirect jailbreak

  • Direct jailbreak

  • Code vulnerability

  • Ungrounded attributes

Risk and safety evaluation methods

Copilot Studio evaluates only text‑based interactions, and all internal testing focuses on how the system handles text inputs and generates text outputs, using automated pipelines and LLM‑based judges to measure quality, safety, and grounding. These evaluations use curated or synthetic text datasets that are replayed in offline pipelines and validated in online shadow and A/B tests, with every response scored for relevance, groundedness, completeness, and appropriate abstention based on the judge criteria described in Copilot Studio's evaluation documentation.

The system also applies automated safety evaluators that check for hate and unfairness, sexual content, violence, self‑harm, protected material, jailbreak attempts, code-related harms, and ungrounded attributes. These categories are explicitly listed in Copilot Studio’s internal safety assessment framework. An ideal evaluation result is one in which responses are accurate, fully grounded in provided knowledge, complete, safe across all nine harm categories, and free of regressions in task success or quality compared with previous model versions. The evaluation dashboards and analytics used in product‑team workflows measure these results. A suboptimal result is one in which responses are irrelevant, incomplete, or hallucinated; contain unsafe content; misuse tools; fail groundedness checks; or regress on quality or latency compared with baselines. All of these signals are identified as failure modes in Copilot Studio’s evaluation reports and quality scoring features.

Evaluation data for quality and safety

Our evaluation data is custom-built to assess AI system performance across key areas of safety and quality, simulating real-world scenarios and risks. We begin by identifying relevant evaluation aspects of concern based on multidisciplinary research and expert input. We translate these concerns into targeted evaluation objectives that guide the formulation of evaluation metrics.

For safety, we create adversarial prompts to elicit undesirable or edge-case responses. AI-assisted annotators, trained to assess alignment with Microsoft’s safety standards, score these responses. For quality, we craft rubric-based prompts relevant to scenarios such as evaluating retrieval-augmented generation (RAG) applications and agents. We curate datasets from diverse sources, including synthetic and public datasets, to simulate real-world user scenarios.

Using the curated datasets, both evaluations undergo iterative refinement and human alignment to improve metric efficacy and reliability. This methodology forms the foundation of repeatable, rigorous assessments that reflect how customers use evaluations to build better and safer AI.

Safety components and mitigations

As we identify potential risks and misuse through evaluations and testing, Microsoft implements mitigations to reduce harm and improve reliability. Copilot Studio is built with safety, fairness, and security at its core, and we continually monitor and update these safeguards as technology and user needs evolve. Below are key components and measures designed to help organizations deploy agents responsibly:

  • Responsible AI Checks: Every interaction undergoes privacy, security, and compliance checks aligned with Microsoft’s Responsible AI principles. These include AI-based classifiers to detect harmful content and metaprompting to guide model behavior toward safe, ethical outputs.

  • Grounding in Trusted Data: Responses and actions are anchored in organizational data sources that the user has permission to access. This reduces the risk of ungrounded or fabricated content and ensures outputs are relevant and verifiable.

  • Content Safety Filters: Built-in classifiers flag potentially harmful content such as hate speech, violence, sexual content, or copyrighted material. When flagged, the system can block the response or redirect the user to safer alternatives.

  • Prompt Enrichment and Guardrails: Ambiguous prompts are refined to reduce misinterpretation. Guardrails prevent agents from executing actions outside their defined scope, minimizing unintended consequences.

  • Human Oversight Guidance: Users are advised to review AI-generated outputs and automated actions before applying them in high-stakes scenarios. This practice mitigates risks of overreliance and ensures accountability.

  • Cybersecurity Measures: Data is encrypted in transit and at rest. Copilot Studio adheres to Microsoft’s enterprise-grade security standards. Protections include role-based access controls, secure API integrations, and continuous vulnerability scanning to safeguard against threats such as injection attacks or unauthorized access.

  • Continuous Monitoring and Feedback: Organizations can monitor agent performance and submit feedback through admin dashboards. Feedback helps Microsoft improve safety features and address emerging risks. Learn how to provide feedback.
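As a rough picture of how a content safety filter gates responses, the sketch below classifies a candidate output against harm categories and blocks it when any category is flagged. The keyword classifier is a toy stand-in; production filters use trained AI classifiers, and all names are illustrative.

```python
# Toy sketch of a content-safety gate: classify a candidate output
# against harm categories and either pass or block it. Real filters
# use trained classifiers, not keyword lists.

HARM_KEYWORDS = {
    "violence": {"attack", "weapon"},
    "self_harm": {"self-harm"},
}

def classify(text: str) -> list[str]:
    """Return the harm categories flagged for this text."""
    words = set(text.lower().split())
    return [cat for cat, kws in HARM_KEYWORDS.items() if words & kws]

def safety_gate(candidate: str) -> tuple[bool, list[str]]:
    """Return (allowed, flagged_categories) for a candidate response."""
    flagged = classify(candidate)
    return (len(flagged) == 0, flagged)

allowed, flags = safety_gate("Here is your order summary.")
blocked_ok, flags2 = safety_gate("instructions to attack a server")
```

When a response is blocked, the system can substitute a safe refusal or redirect the user, as described in the Content Safety Filters bullet above.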

Certain models carry higher risks of producing harmful content. For example, experimental and preview models are not recommended for production, and customers should conduct their own evaluations before deploying them. See Select a primary model for your agent.

Best practices for integrating and deploying Copilot Studio

Responsible AI is a shared commitment between Microsoft and its customers. While Microsoft builds AI systems with safety, fairness, and transparency at the core, customers play a critical role in deploying and using these technologies responsibly within their own contexts. To support this partnership, we offer the following best practices for deployers and end users to help customers implement responsible AI effectively.

Deployers and end-users should:

  • Exercise caution when designing agentic AI in sensitive domains: Users should exercise caution when designing and/or deploying agentic AI applications in sensitive domains where agent actions are irreversible or highly consequential. Deployers and end users should comply with the requirements in the Microsoft Enterprise AI Services Code of Conduct for organizations.

  • Evaluate legal and regulatory considerations: Customers need to evaluate potential legal and regulatory obligations when using any AI services and solutions, which may not be appropriate for use in every industry or scenario. Additionally, AI services and solutions are not designed for, and may not be used in, ways prohibited by the applicable terms of service and code of conduct. Users must exercise caution and ensure compliance with applicable laws, regulations, and the Microsoft Enterprise AI Services Code of Conduct for organizations. See the best practices in this section for more guidance.

End-users should:

  • Define clear objectives:

    • Identify the specific business problem the agent solves. Avoid using "general-purpose" agents unless absolutely necessary. Focused agents are safer and easier to maintain. For example, an agent that interprets and records information from invoices solves the specific business problem of organizing data from invoices.

    • Document the scope and boundaries—what the agent should and should not do.

  • Map triggers and actions:

    • Design event-driven logic: what triggers the agent (e.g., a new record, a user query) and what actions it performs.

    • Use predefined connectors for common systems and validate API permissions for external integrations.

    • Include fallback actions for scenarios where data or connectivity is unavailable.

  • Incorporate guardrails:

    • Implement approval steps for high-impact actions, such as financial transactions or HR changes.

    • Limit automation to reversible tasks where possible.

    • Configure role-based access controls so agents only act within authorized boundaries.

    • Define clear boundaries for agent actions, especially in sensitive domains like finance, healthcare, or legal services. Use approval workflows for high-impact tasks.

  • Ground responses in trusted data:

    • Improve agent accuracy by connecting relevant external systems (such as Microsoft Graph, CRM, ERP) through secure APIs and Microsoft Graph connectors.

    • Avoid relying solely on model-generated content for critical decisions—always ground outputs in verifiable data.

    • Ensure agents only access data sources that comply with organizational policies. Apply role-based access controls and monitor permissions regularly.

  • Design for adaptability and extensibility:

    • Use conditional logic so agents can handle variations in workflows.

    • Build modular components that can be updated without redesigning the entire agent.

  • Test thoroughly:

    • Validate agents in a sandbox environment before deployment.

    • Include edge cases and stress tests to ensure stability under different conditions.

    • Simulate real-world scenarios to check for unintended behaviors.

  • Monitor and iterate:

    • Use dashboards to track agent performance, adoption, and error rates. Regular monitoring helps detect performance drift and maintain reliability.

    • Collect feedback from users and stakeholders regularly.

    • Update agents as business processes evolve or new connectors become available.

  • Apply responsible AI principles:

    • Review for bias and fairness in responses and actions.

    • Follow Microsoft’s AI Code of Conduct and organizational compliance policies.

In addition to the best practices outlined earlier, end-users should also:

  • Exercise human oversight when appropriate: Human oversight is an important safeguard when interacting with AI systems. While we continuously improve our AI systems, AI might still make mistakes. The outputs generated may be inaccurate, incomplete, biased, misaligned, or irrelevant to your intended goals. This could happen due to various reasons, such as ambiguity in the inputs or limitations of the underlying models. As such, users should review the responses generated by Copilot Studio and verify that they match their expectations and requirements.

  • Be aware of the risk of overreliance: Overreliance on AI happens when users accept incorrect or incomplete AI outputs, mainly because mistakes in AI outputs can be hard to detect. For the end-user, overreliance can result in decreased productivity, loss of trust, product abandonment, financial loss, psychological harm, or physical harm, among others (for example, a doctor accepting an incorrect AI output). For Copilot Studio, there is a risk of overreliance because AI-generated responses and automated actions can appear fluent and authoritative even when they are inaccurate, incomplete, or ungrounded, which makes errors difficult for users to spot. For mitigation guidance, see Overreliance on AI: Risk Identification and Mitigation Framework on Microsoft Learn.

  • Exercise caution when designing agentic AI in sensitive domains: Users should exercise caution when designing and/or deploying agentic AI systems in sensitive domains where agent actions are irreversible or highly consequential. Such domains include, but are not limited to, finance and insurance, healthcare, legal service, essential government service, employment, education, or housing. Additional precautions should also be taken when creating autonomous agentic AI as described further in Microsoft’s Code of Conduct.
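The approval-step guardrail recommended in the best practices above can be sketched as follows. The names and the set of high-impact actions are illustrative; in Copilot Studio, this pattern is typically configured with approval flows rather than custom code.

```python
# Sketch of a human-in-the-loop guardrail: high-impact actions are
# queued for approval instead of executing immediately. The action
# names and impact classification are illustrative.

HIGH_IMPACT = {"transfer_funds", "delete_record", "change_payroll"}

pending_approvals: list[dict] = []

def request_action(action: str, params: dict) -> str:
    """Execute low-impact actions; queue high-impact ones for review."""
    if action in HIGH_IMPACT:
        pending_approvals.append({"action": action, "params": params})
        return "pending approval"
    return f"executed {action}"

status = request_action("send_reminder", {"to": "team"})
held = request_action("transfer_funds", {"amount": 10_000})
```

Keeping the high-impact set explicit and small makes the agent's boundaries auditable, which is especially important in the sensitive domains listed above.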

Deployers should:

Microsoft provides tools such as Copilot Studio analytics, Power Platform admin center reports, and integrated monitoring capabilities to help organizations deploy and manage Copilot Studio agents. These tools are designed to help organizations monitor usage, evaluate agent quality, and assess the operational impact of Copilot Studio agents. For example:

  • Copilot Studio built‑in analytics enable deployers to view summaries of agent usage, conversation volume, and engagement to help manage deployments and optimize agent configurations. For more information, see Copilot Studio analytics and monitoring documentation.

  • Copilot Studio analytics and reporting provide makers, administrators, and organizational stakeholders access to reports about agent performance, usage patterns, and operational health to help measure effectiveness and impact. For more information, see Copilot Studio monitoring and governance guidance.

  • Copilot Studio telemetry and integrated monitoring tools allow deployers to analyze agent behavior over time and compare usage trends against organizational goals to understand business value and detect performance drift. For more information, see Copilot Studio performance monitoring and evaluation documentation.
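Performance drift of the kind these monitoring tools surface can be detected by comparing a recent window of task outcomes against a baseline window. Here is a minimal sketch, with an arbitrary illustrative threshold:

```python
# Sketch of drift detection: flag an agent when its recent task
# success rate drops noticeably below a baseline window. The
# 5-point threshold is an arbitrary illustrative choice.

def success_rate(outcomes: list[bool]) -> float:
    """Percentage of successful task outcomes in a window."""
    return 100 * sum(outcomes) / len(outcomes)

def drift_detected(baseline: list[bool], recent: list[bool],
                   threshold_pct: float = 5.0) -> bool:
    """True when the recent rate trails the baseline by more than the threshold."""
    return success_rate(baseline) - success_rate(recent) > threshold_pct

baseline = [True] * 95 + [False] * 5   # 95% success in the baseline window
recent = [True] * 80 + [False] * 20    # 80% success in the recent window

alert = drift_detected(baseline, recent)
```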

Learn more about Copilot Studio

For additional guidance on the responsible use of Copilot Studio, we recommend reviewing the following documentation:

Resources for data security and privacy

Learn more about responsible AI