Generative AI in Research

There is a common misconception that any use of AI in research is bad and must be avoided. On the contrary, AI can be used responsibly in research. The following general principles offer best practices and guidance on the use of generative AI in research. They are not exhaustive and are ever-evolving, but they offer a starting point. Researchers are responsible for knowing the norms and acceptable practices regarding the use of generative AI in their fields.

Core Responsibilities

Researchers must ensure AI use aligns with established standards for rigor, reproducibility, and integrity. Use of AI should enhance human thought, not replace it. Core responsibilities include:

  • Educating yourself and your team on AI’s capabilities and limits.
  • Documenting research processes including inputs, outputs, and AI prompts.
  • Taking primary responsibility for verifying all AI-generated content, data, and references.
  • Accounting for AI bias.
  • Disclosing use transparently in proposals, manuscripts, and publications.
  • Respecting Data Privacy and Security.
  • Being aware of the impact on stakeholders.
  • Following existing policies.

These responsibilities are further discussed in the guidance below.

Generally, appropriate use of generative AI in research includes augmentation (enhancement or refinement of a novel idea) or automation (basic analysis/computation or the summarizing of data). Appropriate use also ensures that outcomes can be verified. Cornell’s Task Force on Generative AI in Research provided the following examples of appropriate use of generative AI during different phases of the research cycle.

When defining a research project or gathering necessary background information, examples of appropriate uses may be to:

  • Triage, organize, and summarize information
  • Draft or improve literature reviews (with appropriate verification)

All researchers are expected to proactively educate themselves on the appropriate and responsible use of generative AI in research. Responsibility ultimately rests with the individual researcher to remain current on evolving guidance, seek clarification when needed, and apply best practices in their own work. Using AI responsibly requires ongoing self-education, not one-time training. At a minimum, researchers should:

  1. Establish expectations for use
  2. Know the limitations of using AI
  3. Be aware of potential risks, such as breaches of data privacy and confidentiality, biases, and deepfakes
  4. Consult with subject matter experts
  5. Understand the differences among AI tools

AI literacy includes knowing how to draft effective prompts. Creative CRAFT is a research-based framework that helps AI users write clearer, more effective prompts; the goal is to give the AI enough information to do the task well. Each element is described below, with a brief sketch of the framework in use after the list.

  • Context: Provide the necessary background information, setting, or sources for the task.
  • Role: Specify the persona you want the AI to adopt (e.g., “You are an expert content marketer,” “You are a seasoned history teacher”).
  • Action/Task: Clearly define what you want the AI to do (e.g., “Write a 300-word blog post,” “Create a lesson plan”).
  • Format: Define the output style (e.g., list, table, essay, JSON, email).
  • Tone/Steps/Constraints: Set the mood, define specific steps to follow, and list limitations (e.g., “use a friendly tone,” “avoid jargon,” “maximum 500 words”).
  • Creative Direction: Provide guidance on the style of thinking, such as asking for unique metaphors, specific perspectives, or innovative approaches.
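
The sketch below assembles the six CRAFT elements into a single prompt string. It is a minimal illustration in Python; the class name, field names, and example values are all hypothetical and should be adapted to your own task and tool.

    # A minimal sketch of a CRAFT-style prompt builder. The fields simply
    # mirror the framework above; every example value is hypothetical.
    from dataclasses import dataclass

    @dataclass
    class CraftPrompt:
        context: str             # background information, setting, or sources
        role: str                # persona the AI should adopt
        action: str              # the task you want performed
        output_format: str       # desired output style (list, table, essay, ...)
        tone_constraints: str    # mood, steps to follow, and limitations
        creative_direction: str  # guidance on the style of thinking

        def render(self) -> str:
            """Assemble the six CRAFT elements into one prompt."""
            return "\n".join([
                f"Context: {self.context}",
                f"Role: {self.role}",
                f"Task: {self.action}",
                f"Format: {self.output_format}",
                f"Tone and constraints: {self.tone_constraints}",
                f"Creative direction: {self.creative_direction}",
            ])

    prompt = CraftPrompt(
        context="I am synthesizing 20 abstracts on soil microbiome research.",
        role="You are an experienced research librarian.",
        action="Summarize the common themes across the abstracts.",
        output_format="A bulleted list with one theme per bullet.",
        tone_constraints="Neutral tone; maximum 200 words; avoid jargon.",
        creative_direction="Flag any surprising or contradictory findings.",
    )
    print(prompt.render())

Keeping the elements as named fields makes it easy to see at a glance which part of a prompt is missing before it is sent to the tool.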

Limits of Generative AI

Because AI was designed to predict the “next word” based on patterns or trends, it tends to agree with your framing, reinforce your assumptions, and produce responses that feel right, even though it does not reason or understand ethical issues. In particular, generative AI:

  • May hallucinate references, data, or results.
  • Quality of output depends heavily on the quality and recency of input data.
  • Lacks intuition and contextual understanding.
  • May embed systemic biases present in training data.

The NIH Data Glossary explains that data provenance is a documented trail that accounts for the origin of a piece of data, from where it was created to where it is presently. This trail of documentation ensures that data creators are transparent about their work and where it came from, and it provides a chain of information through which data can be tracked as researchers use other researchers’ data and adapt it for their own purposes.

A best practice when using generative AI in research is maintaining data provenance: documenting the workflow and input processes just as thoroughly as the metadata and generated results. Documentation should record the following items and indicate who was responsible for each (a minimal sketch of such a record follows the list):

  • Approvals
  • Consent
  • Input
  • Prompts
  • Workflow
  • Output
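
The sketch below shows one way such a record might look: an append-only JSON Lines log with one entry per AI-assisted step. All field names and values are hypothetical placeholders, not a mandated schema.

    # A minimal sketch of a provenance record for one AI-assisted step,
    # written to an append-only JSON Lines log. All values are placeholders.
    import json
    from datetime import datetime, timezone

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "responsible_party": "J. Smith (PI)",   # who performed or approved the step
        "approvals": ["IRB protocol #0000"],     # hypothetical approval reference
        "consent": "Participants consented to AI-assisted analysis",
        "input": "deidentified_survey_responses_v3.csv",
        "prompts": ["Summarize the open-ended responses by theme."],
        "workflow": "Tool X, model version Y, default settings",
        "output": "themes_summary_draft1.txt",
    }

    # Append to a project-level log so every AI use stays traceable.
    with open("ai_provenance_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")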

AI hallucinations are incorrect or misleading results that AI models generate. These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model (Google Cloud).

As such, researchers are responsible for verifying the accuracy of AI-generated output. There is always a need for human review; AI should never be relied upon alone. Verification should include checks for:

  • Accuracy
  • Completeness
  • Relevance

Always check the source material, fact-check key points, look for consistency and contradictions, and ensure the work is reproducible.
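
As one concrete example of such a check, the sketch below queries the public Crossref API to confirm that each DOI cited in AI-generated text resolves to a real record. This is only a first-pass filter (a real DOI can still be cited inaccurately, so the source must still be read), and the DOIs shown are placeholders. It assumes the requests package is installed.

    # A minimal sketch of one verification step: flagging cited DOIs that do
    # not resolve in Crossref as possible hallucinations.
    import requests

    def doi_exists(doi: str) -> bool:
        """Return True if Crossref has a record for this DOI."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    for doi in ["10.1000/placeholder-doi", "10.1038/s41586-020-2649-2"]:
        status = "found" if doi_exists(doi) else "NOT FOUND - possible hallucination"
        print(f"{doi}: {status}")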

Disclose Use Transparently

There is no fool-proof way to determine when AI was used. Researchers must be transparent about their use of AI. See the Libraries’ guidance on citing generative AI.

Just as AI cannot be fully relied upon without human verification of output, AI checkers also cannot be fully relied upon to determine whether AI was used to generate, analyze, or produce an output.

Failure to disclose could potentially lead to research misconduct allegations. However, not all use of AI requires disclosure to all stakeholders.

EDUCAUSE provides a framework for disclosure, as depicted in the chart below.

  • Minimal
    • Scope of GenAI contribution: Peripheral brainstorming, minor text edits
    • Content expert role: Optional
    • Documentation requirement: Internal note in design records
    • Disclosure requirement: None required
  • Moderate
    • Scope of GenAI contribution: Substantive input on components (e.g., quiz banks, prompts)
    • Content expert role: Approval required
    • Documentation requirement: Archive the original GenAI output and the human revision
    • Disclosure requirement: External note where learners encounter AI-assisted content
  • Significant
    • Scope of GenAI contribution: GenAI forms the backbone of a major component (e.g., full lecture draft)
    • Content expert role: Direction and guidance
    • Documentation requirement: Edit log with rationale and validation steps
    • Disclosure requirement: Public disclosure in course or faculty-facing materials
  • Comprehensive
    • Scope of GenAI contribution: GenAI produces most of the instructional artifacts or adaptive content
    • Content expert role: Full oversight and QA review
    • Documentation requirement: Complete prompt-to-final archive in stable repository
    • Disclosure requirement: Full transparency with detailed context for all stakeholders
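
For teams that track AI use in project scripts or templates, the chart can also be encoded as a simple lookup table, as in the illustrative sketch below. The level names and wording are taken directly from the chart; nothing about the framework requires code.

    # An illustrative lookup so a project script can surface the documentation
    # and disclosure requirements for a given level of GenAI use.
    DISCLOSURE_FRAMEWORK = {
        "minimal": {
            "documentation": "Internal note in design records",
            "disclosure": "None required",
        },
        "moderate": {
            "documentation": "Archive the original GenAI output and the human revision",
            "disclosure": "External note where learners encounter AI-assisted content",
        },
        "significant": {
            "documentation": "Edit log with rationale and validation steps",
            "disclosure": "Public disclosure in course or faculty-facing materials",
        },
        "comprehensive": {
            "documentation": "Complete prompt-to-final archive in stable repository",
            "disclosure": "Full transparency with detailed context for all stakeholders",
        },
    }

    requirements = DISCLOSURE_FRAMEWORK["moderate"]
    print("Documentation:", requirements["documentation"])
    print("Disclosure:", requirements["disclosure"])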

Use of generative AI in research involving human subjects presents unique ethical and regulatory concerns. IRB approval is needed before any human subjects research takes place. Researchers must carefully evaluate risks related to privacy, consent, and secondary use of data. In addition to the information below, refer to the IRB webpage for more details.

Deidentification and Privacy

  • Deidentification is not foolproof. Even when direct identifiers are removed, AI tools may process or infer personal information from residual data, patterns, or linked metadata.
  • Metadata is identifiable. File names, geolocation, timestamps, or other metadata can inadvertently disclose sensitive information. All metadata should be scrubbed or managed prior to AI use (see the sketch after this list).
  • Do not input identifiable or “restricted” data into generative AI systems unless explicitly approved by IRB and data agreements.
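
As a concrete illustration of metadata scrubbing, the sketch below re-saves an image from its pixel data only, discarding EXIF tags such as geolocation and timestamps, and writes it under a neutral identifier. It assumes the Pillow package, and the file names are hypothetical; similar care applies to documents, spreadsheets, and any other file type.

    # A minimal sketch of scrubbing image metadata before AI use: rebuild the
    # image from pixel data alone so EXIF tags are not carried over.
    from PIL import Image

    def strip_image_metadata(src: str, dst: str) -> None:
        """Re-save an image with pixel data only, discarding metadata."""
        with Image.open(src) as img:
            clean = Image.new(img.mode, img.size)
            clean.putdata(list(img.getdata()))
            clean.save(dst)

    strip_image_metadata("participant_photo.jpg", "record_0001.jpg")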

Informed Consent

  • Researchers must ensure that participants are informed if generative AI may be used to process or analyze their data.
  • Consent language should cover:
    • Whether AI tools will be used during or after the study.
    • Risks of reidentification from metadata or advanced computational methods.
    • How long data will be stored and whether future AI-based analyses are anticipated.

Use of AI After Study Completion

  • Using AI for future, undetermined research requires explicit participant consent or additional IRB approval.
  • Broad consent may be appropriate when repositories or datasets will be analyzed by AI for purposes not defined at the time of initial data collection.
  • Researchers must balance scientific opportunity with respect for participant autonomy and privacy.

Best Practices

  • Document all AI uses in data management plans and IRB protocols.
  • When in doubt, consult the IRB or research compliance office before applying AI tools to human subjects data.

Journals were among the first groups to establish standards on the use of generative AI in the research ecosystem. Most journals do not find it acceptable to include AI as an author. Per the COPE position statement, “AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work.” A Nature article explains that AI cannot be credited as an author because “attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility.”

Additionally, it is not acceptable to use generative AI to conduct peer reviews, as doing so breaches the confidentiality of the process.

Most granting agencies have the same expectations as journals: generative AI should not be used to review grant proposals, and if it is used to contribute to a proposal, researchers are still expected to verify the accuracy of the information.

Research Misconduct Concerns

Although generative AI has given rise to concerns about plagiarism, fabrication or falsification, use of generative AI alone is not research misconduct. There are many ways to acceptably use generative AI. Use of any new research tool requires an understanding of the strengths and weaknesses. The same principles of research integrity, such as reproducibility, appropriate citation, and following basic scientific practices, apply to the use of generative AI in research.

However, there are some common pitfalls when using generative AI that may lead to research misconduct issues. For example, generative AI often makes up (fabricates) sources and/or source data. It may also manipulate (falsify) data to the extent that it no longer accurately reflects the original outcome. Again, following basic scientific standards, such as verifying source data, can alleviate these issues.

As with any research, there is a need to know the applicable policies and regulations regarding the use of generative AI (e.g., AD69). For example, does inputting code into an AI tool violate the license agreement? Are the data elements being inputted or shared sensitive, or classified as “High” or “Restricted” information with privacy or security concerns (e.g., University Policies AD53 and AD95 and their corresponding Standards)? For research with human subjects or their data, does the use of generative AI deviate from the IRB protocol?

Starting Your Project

1. Tool Selection & Data Security

  • Verify Tool Approval: Have you checked if the AI tool is university-approved for your specific data type?
  • Assess Data Sensitivity: Are you handling public or confidential data? Never upload unpublished data, proprietary work, or identifiable participant details into public AI tools.
  • Check Compliance Requirements: If your research involves human subjects, does the use of AI for data processing align with your approved IRB protocol and consent documents? Does your intended AI use comply with the specific policies of your funding agency?

Conducting Your Research

2. Accuracy & Human Oversight

  • Human Verification: Have you verified all AI-generated text, statistics, citations, and images for accuracy?
  • Identify Hallucinations: Have you cross-checked citations and data points to ensure the AI hasn’t “hallucinated” or created false references?
  • Reproducibility: Can the AI-assisted analysis be reproduced by a human researcher using the raw data?
  • Bias Awareness: Have you critically evaluated the AI output for inherent biases or discriminatory patterns?

3. Transparency

  • Transparency: Have you documented and disclosed the use of AI?
  • Collaboration: Is your use of AI (e.g., editing, data summarization, or code generation) clearly explained to collaborators?
  • Image Integrity: If AI was used to alter or generate images, are they appropriately labeled? Unlabeled manipulation may be considered falsification.

Publishing Your Work

4.  Authorship & Publication Ethics

  • Authorship Check: Have you confirmed that AI is not listed as an author? AI cannot take legal or ethical responsibility for the work.
  • Journal Alignment: Does your AI use comply with the specific policies of your target journal (e.g., COPE, ICMJE)?
  • Peer Review Ethics: Have you refrained from uploading any confidential peer-review materials (manuscripts or comments) into AI tools?
  • Accountability: Do you accept full responsibility for the integrity of the final product? Accountability cannot be delegated to an AI tool.

General Best Practices Checklist

  • Verify the accuracy and validity of the output
  • Confirm the appropriate citation of sources and correct references
  • Disclose the use of AI
  • Document what parts of the research process included use of generative AI
  • Be knowledgeable of applicable policies, such as those related to data security and privacy

Office for Research Protections

Address

200 Innovation Blvd.
Suite 110
University Park, PA 16802

The Office for Research Protections (ORP) ensures that research at the University is conducted in accordance with federal, state, and local regulations and guidelines that protect human participants, animals, students, and personnel involved with research.

Contact

Phone: (814) 865-1775

Email: orp@psu.edu