Generative AI in Research

There is a common misconception that any use of AI in research is bad and must be avoided. On the contrary, AI can be used responsibly in research. The following general principles offer best practices and guidance on the use of generative AI in research. They are not exhaustive and are ever-evolving, but they offer a starting point. Researchers are responsible for knowing the norms and acceptable practices regarding the use of generative AI in their fields.

Core Responsibilities

Researchers must ensure AI use aligns with established standards for rigor, reproducibility, and integrity. Use of AI should enhance human thought, not replace it. Core responsibilities include:

  • Educating yourself and your team on AI’s capabilities and limits.
  • Documenting research processes including inputs, outputs, and AI prompts.
  • Taking primary responsibility for verifying all AI-generated content, data, and references.
  • Accounting for AI bias.
  • Disclosing use transparently in proposals, manuscripts, and publications.
  • Respecting Data Privacy and Security.
  • Being aware of the impact on stakeholders.
  • Following existing policies.

These responsibilities are further discussed in the guidance below.

Generally, appropriate use of generative AI in research includes augmentation (enhancement or refinement of a novel idea) or automation (basic analysis/computation or the summarizing of data). Appropriate use also ensures that outcomes can be verified. Cornell’s Task Force on Generative AI in Research provided the following examples of appropriate use of generative AI during different phases of the research cycle.

When defining a research project or gathering necessary background information, examples of appropriate uses may be to:

  • Triage, organize, and summarize information
  • Draft or improve literature reviews (with appropriate verification)

All researchers are expected to proactively educate themselves on the appropriate and responsible use of generative AI in research. Responsibility ultimately rests with the individual researcher to remain current on evolving guidance, seek clarification when needed, and apply best practices in their own work. Using AI responsibly requires ongoing self-education, not one-time training. At a minimum, researchers should:

  1. Establish expectations for use
  2. Know the limitations of using AI
  3. Be aware of potential risks, such as breaches of data privacy and confidentiality, biases, and deepfakes
  4. Consult with subject matter experts
  5. Understand the differences among AI tools

AI literacy includes knowing how to draft effective prompts. Creative CRAFT is a research-based framework that helps AI users write clearer, more effective prompts; the goal is to give the AI enough information to do the task well. Each element is described below, with a brief sketch of the framework in use after the list.

  • Context: Provide the necessary background information, setting, or sources for the task.
  • Role: Specify the persona you want the AI to adopt (e.g., “You are an expert content marketer,” “You are a seasoned history teacher”).
  • Action/Task: Clearly define what you want the AI to do (e.g., “Write a 300-word blog post,” “Create a lesson plan”).
  • Format: Define the output style (e.g., list, table, essay, JSON, email).
  • Tone/Steps/Constraints: Set the mood, define specific steps to follow, and list limitations (e.g., “use a friendly tone,” “avoid jargon,” “maximum 500 words”).
  • Creative Direction: Provide guidance on the style of thinking, such as asking for unique metaphors, specific perspectives, or innovative approaches.
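
The sketch below assembles the six CRAFT elements into a single prompt string. It is a minimal illustration in Python; the class name, field names, and example values are all hypothetical and should be adapted to your own task and tool.

    # A minimal sketch of a CRAFT-style prompt builder. The fields simply
    # mirror the framework above; every example value is hypothetical.
    from dataclasses import dataclass

    @dataclass
    class CraftPrompt:
        context: str             # background information, setting, or sources
        role: str                # persona the AI should adopt
        action: str              # the task you want performed
        output_format: str       # desired output style (list, table, essay, ...)
        tone_constraints: str    # mood, steps to follow, and limitations
        creative_direction: str  # guidance on the style of thinking

        def render(self) -> str:
            """Assemble the six CRAFT elements into one prompt."""
            return "\n".join([
                f"Context: {self.context}",
                f"Role: {self.role}",
                f"Task: {self.action}",
                f"Format: {self.output_format}",
                f"Tone and constraints: {self.tone_constraints}",
                f"Creative direction: {self.creative_direction}",
            ])

    prompt = CraftPrompt(
        context="I am synthesizing 20 abstracts on soil microbiome research.",
        role="You are an experienced research librarian.",
        action="Summarize the common themes across the abstracts.",
        output_format="A bulleted list with one theme per bullet.",
        tone_constraints="Neutral tone; maximum 200 words; avoid jargon.",
        creative_direction="Flag any surprising or contradictory findings.",
    )
    print(prompt.render())

Keeping the elements as named fields makes it easy to see at a glance which part of a prompt is missing before it is sent to the tool.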

Limits of Generative AI

Because AI was designed to predict the “next word” based on patterns or trends, it tends to agree with your framing, reinforce your assumptions, and produce responses that feel right, even though it does not reason or understand ethical issues. In particular, generative AI:

  • May hallucinate references, data, or results.
  • Quality of output depends heavily on the quality and recency of input data.
  • Lacks intuition and contextual understanding.
  • May embed systemic biases present in training data.

The NIH Data Glossary explains that data provenance is a documented trail that accounts for the origin of a piece of data, from where it was created to where it is presently. This trail of documentation ensures that data creators are transparent about their work and where it came from, and it provides a chain of information through which data can be tracked as researchers use other researchers’ data and adapt it for their own purposes.

A best practice when using generative AI in research is maintaining data provenance: documenting the workflow and input processes just as thoroughly as the metadata and generated results. Documentation should record the following items and indicate who was responsible for each (a minimal sketch of such a record follows the list):

  • Approvals
  • Consent
  • Input
  • Prompts
  • Workflow
  • Output
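
The sketch below shows one way such a record might look: an append-only JSON Lines log with one entry per AI-assisted step. All field names and values are hypothetical placeholders, not a mandated schema.

    # A minimal sketch of a provenance record for one AI-assisted step,
    # written to an append-only JSON Lines log. All values are placeholders.
    import json
    from datetime import datetime, timezone

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "responsible_party": "J. Smith (PI)",   # who performed or approved the step
        "approvals": ["IRB protocol #0000"],     # hypothetical approval reference
        "consent": "Participants consented to AI-assisted analysis",
        "input": "deidentified_survey_responses_v3.csv",
        "prompts": ["Summarize the open-ended responses by theme."],
        "workflow": "Tool X, model version Y, default settings",
        "output": "themes_summary_draft1.txt",
    }

    # Append to a project-level log so every AI use stays traceable.
    with open("ai_provenance_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")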

AI hallucinations are incorrect or misleading results that AI models generate. These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model (Google Cloud).

As such, researchers are responsible for verifying the accuracy of AI-generated output. There is always a need for human review; AI should never be relied upon alone. Verification should include checks for:

  • Accuracy
  • Completeness
  • Relevance

Always check the source material, fact-check key points, look for consistency and contradictions, and ensure the work is reproducible.
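
As one concrete example of such a check, the sketch below queries the public Crossref API to confirm that each DOI cited in AI-generated text resolves to a real record. This is only a first-pass filter (a real DOI can still be cited inaccurately, so the source must still be read), and the DOIs shown are placeholders. It assumes the requests package is installed.

    # A minimal sketch of one verification step: flagging cited DOIs that do
    # not resolve in Crossref as possible hallucinations.
    import requests

    def doi_exists(doi: str) -> bool:
        """Return True if Crossref has a record for this DOI."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    for doi in ["10.1000/placeholder-doi", "10.1038/s41586-020-2649-2"]:
        status = "found" if doi_exists(doi) else "NOT FOUND - possible hallucination"
        print(f"{doi}: {status}")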

Disclose Use Transparently

There is no fool-proof way to determine when AI was used. Researchers must be transparent about their use of AI. See the Libraries’ guidance on citing generative AI.

Just as AI cannot be fully relied upon without human verification of output, AI checkers also cannot be fully relied upon to determine whether AI was used to generate, analyze, or produce an output.

Failure to disclose could potentially lead to research misconduct allegations. However, not all use of AI requires disclosure to all stakeholders.

EDUCAUSE provides a framework for disclosure, as depicted in the chart below.

  • Minimal
    • Scope of GenAI contribution: Peripheral brainstorming, minor text edits
    • Content expert role: Optional
    • Documentation requirement: Internal note in design records
    • Disclosure requirement: None required
  • Moderate
    • Scope of GenAI contribution: Substantive input on components (e.g., quiz banks, prompts)
    • Content expert role: Approval required
    • Documentation requirement: Archive the original GenAI output and the human revision
    • Disclosure requirement: External note where learners encounter AI-assisted content
  • Significant
    • Scope of GenAI contribution: GenAI forms the backbone of a major component (e.g., full lecture draft)
    • Content expert role: Direction and guidance
    • Documentation requirement: Edit log with rationale and validation steps
    • Disclosure requirement: Public disclosure in course or faculty-facing materials
  • Comprehensive
    • Scope of GenAI contribution: GenAI produces most of the instructional artifacts or adaptive content
    • Content expert role: Full oversight and QA review
    • Documentation requirement: Complete prompt-to-final archive in stable repository
    • Disclosure requirement: Full transparency with detailed context for all stakeholders
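
For teams that track AI use in project scripts or templates, the chart can also be encoded as a simple lookup table, as in the illustrative sketch below. The level names and wording are taken directly from the chart; nothing about the framework requires code.

    # An illustrative lookup so a project script can surface the documentation
    # and disclosure requirements for a given level of GenAI use.
    DISCLOSURE_FRAMEWORK = {
        "minimal": {
            "documentation": "Internal note in design records",
            "disclosure": "None required",
        },
        "moderate": {
            "documentation": "Archive the original GenAI output and the human revision",
            "disclosure": "External note where learners encounter AI-assisted content",
        },
        "significant": {
            "documentation": "Edit log with rationale and validation steps",
            "disclosure": "Public disclosure in course or faculty-facing materials",
        },
        "comprehensive": {
            "documentation": "Complete prompt-to-final archive in stable repository",
            "disclosure": "Full transparency with detailed context for all stakeholders",
        },
    }

    requirements = DISCLOSURE_FRAMEWORK["moderate"]
    print("Documentation:", requirements["documentation"])
    print("Disclosure:", requirements["disclosure"])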

Use of generative AI in research involving human subjects presents unique ethical and regulatory concerns. IRB approval is needed before any human subjects research takes place. Researchers must carefully evaluate risks related to privacy, consent, and secondary use of data. In addition to the information below, refer to the IRB webpage for more details.

Deidentification and Privacy

  • Deidentification is not foolproof. Even when direct identifiers are removed, AI tools may process or infer personal information from residual data, patterns, or linked metadata.
  • Metadata is identifiable. File names, geolocation, timestamps, or other metadata can inadvertently disclose sensitive information. All metadata should be scrubbed or managed prior to AI use (see the sketch after this list).
  • Do not input identifiable or “restricted” data into generative AI systems unless explicitly approved by IRB and data agreements.
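
As a concrete illustration of metadata scrubbing, the sketch below re-saves an image from its pixel data only, discarding EXIF tags such as geolocation and timestamps, and writes it under a neutral identifier. It assumes the Pillow package, and the file names are hypothetical; similar care applies to documents, spreadsheets, and any other file type.

    # A minimal sketch of scrubbing image metadata before AI use: rebuild the
    # image from pixel data alone so EXIF tags are not carried over.
    from PIL import Image

    def strip_image_metadata(src: str, dst: str) -> None:
        """Re-save an image with pixel data only, discarding metadata."""
        with Image.open(src) as img:
            clean = Image.new(img.mode, img.size)
            clean.putdata(list(img.getdata()))
            clean.save(dst)

    strip_image_metadata("participant_photo.jpg", "record_0001.jpg")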

Informed Consent

  • Researchers must ensure that participants are informed if generative AI may be used to process or analyze their data.
  • Consent language should cover:
    • Whether AI tools will be used during or after the study.
    • Risks of reidentification from metadata or advanced computational methods.
    • How long data will be stored and whether future AI-based analyses are anticipated.

Use of AI After Study Completion

  • Using AI for future, undetermined research requires explicit participant consent or additional IRB approval.
  • Broad consent may be appropriate when repositories or datasets will be analyzed by AI for purposes not defined at the time of initial data collection.
  • Researchers must balance scientific opportunity with respect for participant autonomy and privacy.

Best Practices

  • Document all AI uses in data management plans and IRB protocols.
  • When in doubt, consult the IRB or research compliance office before applying AI tools to human subjects data.

Journals were among the first groups to establish standards on the use of generative AI in the research ecosystem. Most journals do not find it acceptable to include AI as an author. Per the COPE position statement, “AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work.” A Nature article explains that AI cannot be credited as an author because “attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility.”

Additionally, it is not acceptable to use generative AI to conduct peer reviews, as doing so breaches the confidentiality of the process.

Most granting agencies have the same expectations as journals: generative AI should not be used to review grant proposals, and if it is used to contribute to a proposal, researchers are still expected to verify the accuracy of the information.

Research Misconduct Concerns

Although generative AI has given rise to concerns about plagiarism, fabrication or falsification, use of generative AI alone is not research misconduct. There are many ways to acceptably use generative AI. Use of any new research tool requires an understanding of the strengths and weaknesses. The same principles of research integrity, such as reproducibility, appropriate citation, and following basic scientific practices, apply to the use of generative AI in research.

However, there are some common pitfalls when using generative AI that may lead to research misconduct issues. For example, generative AI often makes up (fabricates) sources and/or source data. It may also manipulate (falsify) data to the extent that it no longer accurately reflects the original outcome. Again, following basic scientific standards, such as verifying source data, can alleviate these issues.

As with any research, there is a need to know the applicable policies and regulations regarding the use of generative AI (e.g., AD69). For example, does inputting code into an AI tool violate the license agreement? Are the data elements being inputted or shared sensitive, or classified as “High” or “Restricted” information with privacy or security concerns (e.g., University Policies AD53 and AD95 and their corresponding Standards)? For research with human subjects or their data, does the use of generative AI deviate from the IRB protocol?

Starting Your Project

1. Tool Selection & Data Security

  • Verify Tool Approval: Have you checked if the AI tool is university-approved for your specific data type?
  • Assess Data Sensitivity: Are you handling public or confidential data? Never upload unpublished data, proprietary work, or identifiable participant details into public AI tools.
  • Check Compliance Requirements: If your research involves human subjects, does the use of AI for data processing align with your approved IRB protocol and consent documents? Does your intended AI use comply with the specific policies of your funding agency?

Conducting Your Research

2. Accuracy & Human Oversight

  • Human Verification: Have you verified all AI-generated text, statistics, citations, and images for accuracy?
  • Identify Hallucinations: Have you cross-checked citations and data points to ensure the AI hasn’t “hallucinated” or created false references?
  • Reproducibility: Can the AI-assisted analysis be reproduced by a human researcher using the raw data?
  • Bias Awareness: Have you critically evaluated the AI output for inherent biases or discriminatory patterns?

3. Transparency

  • Transparency: Have you documented and disclosed the use of AI?
  • Collaboration: Is your use of AI (e.g., editing, data summarization, or code generation) clearly explained to collaborators?
  • Image Integrity: If AI was used to alter or generate images, are they appropriately labeled? Unlabeled manipulation may be considered falsification.

Publishing Your Work

4.  Authorship & Publication Ethics

  • Authorship Check: Have you confirmed that AI is not listed as an author? AI cannot take legal or ethical responsibility for the work.
  • Journal Alignment: Does your AI use comply with the specific policies of your target journal (e.g., COPE, ICMJE)?
  • Peer Review Ethics: Have you refrained from uploading any confidential peer-review materials (manuscripts or comments) into AI tools?
  • Accountability: Do you accept full responsibility for the integrity of the final product? Accountability cannot be delegated to an AI tool.

General Best Practices Checklist

  • Verify the accuracy and validity of the output
  • Confirm the appropriate citation of sources and correct references
  • Disclose the use of AI
  • Document what parts of the research process included use of generative AI
  • Be knowledgeable of applicable policies, such as those related to data security and privacy

Office for Research Protections

Address

200 Innovation Blvd.
Suite 110
University Park, PA 16802

The Office for Research Protections (ORP) ensures that research at the University is conducted in accordance with federal, state, and local regulations and guidelines that protect human participants, animals, students, and personnel involved with research.

Contact

Phone: (814) 865-1775

Email: orp@psu.edu