Build Your Own AI Tool: Scripting with Google's PaLM and Python for Library
Presented by Eric Silverberg, Librarian at Queens College, City University of New York
Introduction
In this presentation, Eric Silverberg describes how he developed an automated tool to help faculty at Queens College deposit their scholarly articles in the institutional repository. Because few School of Education faculty were depositing their work, he set out to simplify the process using Google's PaLM API and Python scripting.
Background and Motivation
The Importance of Open Access
- Personal Commitment: Eric emphasizes the significance of making educational research openly accessible, aligning with his values and background as a classroom teacher.
- University Mission Alignment: As a public institution, the City University of New York aims to make its research available to the public.
- Impact on Education: Open access to research empowers policymakers, administrators, and teachers by providing them with valuable insights and data.
Challenges with Faculty Participation
- Faculty were generally unaware of the institutional repository or found the process too cumbersome.
- Understanding open access policies for each journal can be complex and time-consuming.
- Manually checking policies via Sherpa Romeo for numerous publications is inefficient.
Problem Statement
The core task was to extract journal names from faculty citations automatically, so that each journal's open access policy could be retrieved from Sherpa Romeo's API without manual lookup.
Initial Approach
- Coding APA Rules: Attempted to parse citations by coding the rules of APA formatting.
- Encountering Exceptions: Faculty citations varied significantly, with inconsistencies and creative deviations from standard formats.
- Limitations: The approach became impractical due to the numerous exceptions, leading to excessive coding for edge cases.
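To illustrate why the rule-based approach breaks down, here is a minimal sketch of what coding APA rules might look like. It is not Eric's original code: the pattern, the function name `journal_from_apa`, and both sample citations are made up for illustration. The regular expression only handles a citation that follows the expected order exactly, and a small deviation is enough to make it fail.

```python
import re

# Hypothetical pattern for a "well-behaved" APA journal citation:
#   Authors (Year). Title. Journal Name, volume(issue), pages.
APA_PATTERN = re.compile(
    r"\((?P<year>\d{4})\)\.\s+"   # (2020).
    r"(?P<title>.+?[.?!])\s+"     # article title ending in punctuation
    r"(?P<journal>[^,]+),\s*"     # journal name up to the first comma
    r"(?P<volume>\d+)"            # volume number
)

def journal_from_apa(citation):
    """Return the journal name if the citation matches the pattern, else None."""
    match = APA_PATTERN.search(citation)
    return match.group("journal").strip() if match else None

# Made-up citations: one follows the expected order, one does not.
clean = "Doe, J. (2020). Teaching with data. Journal of Example Studies, 12(3), 45-67."
messy = "Doe, J. Teaching with data, in Journal of Example Studies vol. 12 (2020)"

print(journal_from_apa(clean))
print(journal_from_apa(messy))
```

Each new deviation of this kind would need its own pattern or special case, which is the maintenance burden described above.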
Leveraging Google's PaLM API
Discovering PaLM
- He learned about Google's PaLM API, which provides programmatic access to the language model that powered Bard (now Gemini).
- Recognized its potential for natural language understanding and processing.
Implementing PaLM for Journal Extraction
- Simple Prompting: Used straightforward prompts like "What is the name of the journal in this citation?"
- High Accuracy: PaLM effectively extracted journal names even from inconsistently formatted citations.
- Automation: Enabled batch processing of citations without manually coding for formatting exceptions.
Technical Implementation
Setting Up the Environment
- API Key Connection: Established a connection to PaLM's API using a free API key.
- Selecting the Model: Chose the text generation model suitable for processing text inputs.
- Python Scripting: Used Python to write functions for automating the process.
Key Components of the Script
Part A: Connecting to PaLM
# Connect to PaLM API
import google.generativeai as palm
palm.configure(api_key='YOUR_API_KEY')
# Select the text generation model
models = [model for model in palm.list_models() if 'generateText' in model.supported_generation_methods]
model = models[0].name
Part B: Extracting Journal Names
# Function to get journal name
def get_journal_name(citation):
    prompt = f"What is the name of the journal in this citation?\n{citation}"
    completion = palm.generate_text(model=model, prompt=prompt, temperature=0, max_output_tokens=800)
    return completion.result
- Temperature Parameter: Set to 0 to minimize randomness and ensure consistent outputs.
- Max Output Tokens: Defined to control the length of the response.
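Even at temperature 0, a model's answer may come back empty or wrapped in quotation marks or extra whitespace. The sketch below shows one way the returned text could be tidied before it is passed to a policy lookup. The helper name `clean_journal_name` and its cleanup rules are assumptions for illustration and are not part of the script described in the talk.

```python
def clean_journal_name(raw):
    """Reduce a raw model answer to a bare journal name, or None if unusable."""
    if raw is None or not raw.strip():
        return None
    # Keep only the first line in case the model adds commentary after it.
    first_line = raw.strip().splitlines()[0]
    # Remove wrapping quotes or markdown emphasis, but keep periods,
    # since abbreviated titles can legitimately end with one.
    return first_line.strip(" \t\"'*_") or None

print(clean_journal_name('"African Journal of Teacher Education"\n'))
print(clean_journal_name(None))
```

Returning None for an unusable answer lets the calling code skip the policy lookup for that citation instead of sending an empty query.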
Automating the Entire Process
- Input Data: Collected faculty citations in a spreadsheet.
- Journal Extraction: Used the `get_journal_name` function to populate journal names next to citations.
- OA Policy Retrieval: Sent journal names to Sherpa Romeo's API to get open access policies.
- Output Report: Generated a comprehensive report detailing OA policies for each publication.
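The four steps above can be sketched as a small loop over the spreadsheet rows. This is a hedged outline rather than the actual script: it assumes the spreadsheet is exported as CSV with a column named `citation`, and it takes the extraction and policy lookup steps as function arguments so the example runs without any API keys. The stand-in functions and the sample citation are made up.

```python
import csv
import io

def build_report_rows(csv_text, extract_journal, lookup_policy):
    """Run each citation through journal extraction and an OA policy lookup."""
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        citation = record["citation"]
        journal = extract_journal(citation)
        # Skip the lookup when no journal name could be extracted.
        policy = lookup_policy(journal) if journal else None
        rows.append({"citation": citation, "journal": journal, "policy": policy})
    return rows

# Stand-ins for the real steps (a PaLM call and a Sherpa Romeo request).
def fake_extract(citation):
    return "Journal of Example Studies"

def fake_lookup(journal):
    return {"accepted": "[Policy Details]"}

sample_csv = (
    "citation\n"
    '"Doe, J. (2020). Teaching with data. Journal of Example Studies, 12(3), 45-67."\n'
)
rows = build_report_rows(sample_csv, fake_extract, fake_lookup)
print(rows[0]["journal"], rows[0]["policy"])
```

In a real run, `get_journal_name` would be passed in place of `fake_extract`, and a function wrapping the Sherpa Romeo request in place of `fake_lookup`.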
Example Output
An example of the output report includes:
- Citation: Full citation provided by the faculty.
- Journal Name: Extracted using PaLM.
- OA Policies: Detailed information on preprint, accepted manuscript, and final version policies.
Citation 4:
[Full Citation Here]
Journal: African Journal of Teacher Education
OA Policies:
- Submitted Manuscript: [Policy Details]
- Accepted Manuscript: [Policy Details]
- Final Version of Record: [Policy Details]
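A report entry in this layout could be produced by a small formatting function such as the one below. This is an illustrative sketch only: the function name, the dictionary keys (`submitted`, `accepted`, `published`), and the fallback message are assumptions, not details taken from the original script.

```python
# Assumed mapping from policy keys to the labels used in the report.
VERSION_LABELS = [
    ("submitted", "Submitted Manuscript"),
    ("accepted", "Accepted Manuscript"),
    ("published", "Final Version of Record"),
]

def format_report_entry(number, citation, journal, policies):
    """Build one plain-text report entry in the layout shown above."""
    lines = [f"Citation {number}:", citation, f"Journal: {journal}", "OA Policies:"]
    for key, label in VERSION_LABELS:
        detail = policies.get(key, "No policy information found")
        lines.append(f"- {label}: {detail}")
    return "\n".join(lines)

entry = format_report_entry(
    4,
    "[Full Citation Here]",
    "African Journal of Teacher Education",
    {"submitted": "[Policy Details]", "accepted": "[Policy Details]"},
)
print(entry)
```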
Challenges and Considerations
Dealing with Sherpa Romeo's API
- Data Structure: The API returns deeply nested data, so extracting the policy details for each article version requires careful parsing.
- Error Handling: Implemented to manage cases where OA data was missing or incomplete.
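The kind of defensive parsing described here might look like the sketch below. The nested field names (`items`, `publisher_policy`, `permitted_oa`, `article_version`, `location`) reflect an assumed shape for a Sherpa Romeo response and should be checked against the current API documentation. The sample dictionary is hand-written for illustration and is not real policy data.

```python
def summarize_policies(response):
    """Map each article version to the deposit locations found for it."""
    summary = {}
    items = response.get("items") or []
    if not items:
        # No matching journal, or an empty response: return an empty summary.
        return summary
    for policy in items[0].get("publisher_policy") or []:
        for pathway in policy.get("permitted_oa") or []:
            locations = (pathway.get("location") or {}).get("location") or []
            for version in pathway.get("article_version") or []:
                summary.setdefault(version, []).extend(locations)
    return summary

# Hand-written sample in the assumed response shape (not real data).
sample = {
    "items": [{
        "publisher_policy": [{
            "permitted_oa": [
                {"article_version": ["submitted"],
                 "location": {"location": ["any_website"]}},
                {"article_version": ["accepted"],
                 "location": {"location": ["institutional_repository"]}},
                {"article_version": ["published"]},  # location data missing
            ]
        }]
    }]
}

print(summarize_policies(sample))
print(summarize_policies({}))
```

Using `.get(...) or []` at every level means a missing or null field produces an empty result rather than an exception, which matches the goal of handling missing or incomplete OA data.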
Faculty Engagement
- Planned to share the generated reports with faculty to encourage repository deposits.
- Recognized the need for feedback to refine the tool and process.
Next Steps and Potential Enhancements
- User Feedback: Gather input from faculty like Professor N'Dri T. Assié-Lumumba, who agreed to pilot the tool.
- Automation of Deposits: Consider scripting the submission of articles into the repository, pending faculty permission.
- Exploring Other APIs: Investigate alternatives like OpenAlex for OA policy data, potentially simplifying the process.
- Improving PDF Handling: Explore methods to reverse engineer formatted PDFs back into Word documents for easier repository submissions.
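As a starting point for the OpenAlex idea, a journal lookup could begin by searching the OpenAlex sources endpoint by name. The sketch below only builds the request URL. The helper name is hypothetical, and whether the returned records contain enough policy detail to replace Sherpa Romeo is an open question that would need checking against the OpenAlex documentation.

```python
from urllib.parse import urlencode

OPENALEX_SOURCES_URL = "https://api.openalex.org/sources"

def openalex_search_url(journal_name):
    """Build a search URL for journals (sources) matching the given name."""
    return f"{OPENALEX_SOURCES_URL}?{urlencode({'search': journal_name})}"

url = openalex_search_url("African Journal of Teacher Education")
print(url)
```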
Audience Questions and Responses
Is there a template available?
Answer: Yes, the code shared is largely based on Google's documentation. You can access Eric's script on GitHub and modify it for your needs.
How are citations received from faculty?
Answer: Currently, citations are obtained directly from faculty CVs. The process may evolve based on faculty feedback and scalability considerations.
Does the tool handle abbreviated journal names?
Answer: Yes, PaLM effectively recognizes and extracts abbreviated journal names, which is particularly useful in fields where abbreviations are common.
Why use Sherpa Romeo instead of OpenAlex?
Answer: Familiarity with Sherpa Romeo's API led to its initial use. OpenAlex may offer a more streamlined API, and exploring it could be beneficial for future iterations.
Can ChatGPT be used for journal name extraction?
Answer: ChatGPT could perform similar tasks, but calling PaLM's API from within the script removes the need for manual input and makes it practical to process large batches of citations.
Could the process be further automated to deposit articles?
Answer: Automating the entire submission process is an intriguing idea. It would require careful consideration of repository submission protocols and faculty permissions.
Conclusion
Eric Silverberg's innovative approach demonstrates how AI tools like Google's PaLM can address practical challenges in academic libraries. By automating the extraction of journal names and retrieval of OA policies, the process becomes more efficient, encouraging greater faculty participation in open access initiatives.
The project underscores the potential of AI in streamlining workflows and enhancing access to scholarly research. Ongoing feedback and collaboration with faculty will be essential in refining the tool and maximizing its impact.
Resources and Contact Information
- GitHub Repository: Access the script on GitHub
- Email: eric.silverberg@example.com
Eric welcomes questions, collaborations, and feedback on the project.
Acknowledgments
Special thanks to Natalie Swanberg for participating in the pilot and to all attendees for their insightful questions and engagement.