Practical Application of AI: Evaluating Music to Build a Music Library
Presented by Jane Gilvin, NPR's Research Archives and Data Team
Introduction
Jane Gilvin delivered a presentation on how her team at NPR used artificial intelligence (AI) to automatically classify music as instrumental or vocal, so that the team could build a digital music library more efficiently. The session covered the practical application of AI in music cataloging, the challenges the team faced, and the solutions it implemented.
About Jane Gilvin and the RAD Team
- Jane Gilvin:
  - Member of NPR's Research Archives and Data (RAD) Team for nearly 13 years.
  - Educational background in music and library science.
  - Alumna of San Jose State University's Information Science program.
  - Experience in radio since she was a teenager.
- The RAD Team:
  - Formerly known as the NPR Library, established in the 1970s.
  - Responsible for collecting NPR programming archives.
  - Provides resources for production, including a comprehensive music collection.
NPR's Music Collection Evolution
The NPR music collection has evolved alongside technological advancements:
- Vinyl Records: The initial collection comprised vinyl records across various genres.
- Transition to CDs: Shifted to compact discs (CDs) as CD players became standard in production.
- Digital Music Files: Moved to digital files to meet users' expectations of quick, remote access to music.
Challenges in Digitizing the Collection
The transition to digital presented several challenges:
- Converting thousands of physical CDs into digital files for immediate access.
- Ensuring metadata accuracy and consistency, especially for instrumental and vocal classification.
- Lack of resources for continuous large-scale ingestion and cataloging of new music.
Solution: Automation with AI
The Robot and ORRIS
- The Robot: A batch processing system capable of ripping CDs, identifying metadata from online databases, and delivering MP3 and WAV files with embedded ID3 tags.
- ORRIS (Open Resource and Research Information System): A new database developed to allow users to search, stream, and download songs for production.
Implementing Essentia
- Essentia: An open-source library and collection of tools for audio and music analysis, description, and synthesis.
- Capabilities: Predicts genre, beats per minute, mood, and most importantly, classifies tracks as instrumental or vocal.
- Training the Algorithm: Used NPR's extensive archive of over 300,000 tracks with existing instrumental and vocal tags to train the algorithm.
Accuracy and Testing
- Human Cataloging Accuracy: Ranged from 90% to 98%, averaging around 90% due to human error and limitations.
- Algorithm Accuracy Goal: Set at 80% to balance the usefulness of the tags against the efficiency of the process.
- Results: The algorithm achieved an accuracy of 86%, exceeding the team's 80% goal.
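The comparison against the goal can be illustrated with a minimal sketch. It assumes a hold-out set of tracks that already carry human-assigned tags. The function name `accuracy`, the constant `ACCURACY_GOAL`, and the sample tags are all hypothetical; the talk summary does not describe how the team actually computed its figure.

```python
def accuracy(predicted, reference):
    """Return the fraction of tracks whose predicted tag matches the human tag."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be the same length")
    if not reference:
        raise ValueError("cannot score an empty test set")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


ACCURACY_GOAL = 0.80  # the 80% goal reported in the talk

# Hypothetical hold-out sample: human tags versus algorithm output.
human_tags = ["vocal", "instrumental", "vocal", "vocal",
              "instrumental", "instrumental", "vocal"]
model_tags = ["vocal", "instrumental", "vocal", "instrumental",
              "instrumental", "instrumental", "vocal"]

score = accuracy(model_tags, human_tags)   # 6 of 7 tags agree
meets_goal = score >= ACCURACY_GOAL
print(f"accuracy={score:.2%}, meets goal: {meets_goal}")
```

Because the human tags used as the reference are themselves imperfect, a score computed this way measures agreement with the catalogers rather than absolute correctness.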
Integration and Quality Control
Building into the Ingest Process
- Automated the instrumental/vocal tagging during the ingest process of new tracks.
- Applied the algorithm to existing tracks that lacked instrumental/vocal classification.
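Applying a classifier only to tracks that lack a tag might look like the sketch below. The `Track` record, the `tag_source` field, and `placeholder_classifier` are hypothetical names introduced for illustration. A real classifier would analyze the audio itself; the stand-in here only inspects the title so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Track:
    title: str
    vocal_tag: Optional[str] = None   # "vocal", "instrumental", or None if untagged
    tag_source: Optional[str] = None  # "human" or "algorithm"


def tag_untagged(tracks: Iterable[Track],
                 classify: Callable[[Track], str]) -> int:
    """Tag only tracks with no existing tag; return how many were updated."""
    updated = 0
    for track in tracks:
        if track.vocal_tag is None:
            track.vocal_tag = classify(track)
            track.tag_source = "algorithm"  # record provenance for later review
            updated += 1
    return updated


def placeholder_classifier(track: Track) -> str:
    # Stand-in for an audio-based model (assumption for this sketch only).
    return "instrumental" if "(instrumental)" in track.title.lower() else "vocal"


library = [
    Track("Opening Theme (Instrumental)"),
    Track("Morning Song", vocal_tag="vocal", tag_source="human"),
    Track("Evening Song"),
]
updated_count = tag_untagged(library, placeholder_classifier)
```

Recording where each tag came from keeps human-assigned tags untouched and makes algorithm-assigned tags easy to find during later quality checks.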
User Feedback Mechanism
- Added a feature allowing users to report incorrectly tagged songs directly from the ORRIS interface.
- Provided a quick way for the RAD team to receive notifications and correct metadata errors.
Quality Control Measures
- Spreadsheets generated automatically during each run of the algorithm allowed for immediate review of results.
- Periodic checks to ensure the algorithm continues to perform within the acceptable accuracy range.
- Addressed any shifts in algorithm performance due to changes in the type of music being ingested or other factors.
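A minimal sketch of these two measures follows, assuming results are exported as CSV and that a reviewer spot-checks a sample of predictions. The column names, the function names, and the 80% floor used as the default are assumptions for illustration, not details confirmed by the talk.

```python
import csv
import io


def write_review_sheet(results, out):
    """Write one row per classified track so a reviewer can scan the run."""
    writer = csv.DictWriter(out, fieldnames=["track_id", "title", "predicted_tag"])
    writer.writeheader()
    writer.writerows(results)


def within_range(reviewed, floor=0.80):
    """Check a spot-checked sample of (predicted, corrected) pairs against a floor."""
    if not reviewed:
        raise ValueError("need at least one reviewed track")
    correct = sum(predicted == corrected for predicted, corrected in reviewed)
    return correct / len(reviewed) >= floor


# Hypothetical results from one ingest run.
run_results = [
    {"track_id": 1, "title": "Opening Theme", "predicted_tag": "instrumental"},
    {"track_id": 2, "title": "Evening Song", "predicted_tag": "vocal"},
]
sheet = io.StringIO()  # an open file would work the same way
write_review_sheet(run_results, sheet)

# Hypothetical spot check: 4 of 5 predictions confirmed by a reviewer.
spot_check = [("vocal", "vocal"), ("vocal", "vocal"),
              ("instrumental", "instrumental"), ("instrumental", "vocal"),
              ("vocal", "vocal")]
still_acceptable = within_range(spot_check)
```

Running the same spot check on each new batch gives an early signal if accuracy drifts as the mix of ingested music changes.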
Demonstration
Jane provided a live demonstration of how the process works:
- Showed the ORRIS search interface and how users can search for and listen to tracks (e.g., Thelonious Monk, David Bowie).
- Demonstrated the ingestion of new albums and how the algorithm processes them to classify tracks as instrumental or vocal.
- Illustrated the use of the user feedback feature to report incorrect classifications.
Benefits and Outcomes
- Significantly reduced the time and resources required for music cataloging.
- Enabled continuous addition of new music to the library despite limited staff time.
- Improved user satisfaction by providing a reliable point of data for instrumental and vocal tracks.
Challenges and Considerations
- Training Data Limitations: Ensuring the training data was representative and free from bias or errors.
- Algorithm Bias: Addressing the overrepresentation of certain genres (e.g., jazz and classical) in the training data to avoid skewed results.
- Metadata Accuracy: Dealing with inconsistent or incorrect metadata from external sources.
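One common way to limit genre overrepresentation is to cap how many tracks each genre contributes to the training set. The sketch below shows that general technique only; the summary does not say which approach the RAD team used, and the function name `cap_per_genre`, the `genre` field, and the sample counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cap_per_genre(tracks, cap, seed=0):
    """Return a sample with at most `cap` tracks from each genre."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_genre = defaultdict(list)
    for track in tracks:
        by_genre[track["genre"]].append(track)

    sample = []
    for genre in sorted(by_genre):
        group = by_genre[genre]
        if len(group) > cap:
            group = rng.sample(group, cap)  # downsample overrepresented genres
        sample.extend(group)
    return sample


# Hypothetical archive skewed toward jazz and classical.
archive = (
    [{"id": f"jazz-{i}", "genre": "jazz"} for i in range(50)]
    + [{"id": f"classical-{i}", "genre": "classical"} for i in range(40)]
    + [{"id": f"rock-{i}", "genre": "rock"} for i in range(8)]
)
balanced = cap_per_genre(archive, cap=10)
genre_counts = Counter(track["genre"] for track in balanced)
```

Capping trades training-set size for balance; genres below the cap, such as the hypothetical rock tracks here, are kept in full.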
Future Plans
Jane discussed potential future projects:
- Revisiting other algorithms from Essentia, such as those predicting timbre and mood.
- Implementing user testing and UX projects to improve data research and user experience.
- Continuing to refine the algorithm and processes to maintain or improve accuracy.
Questions and Answers
During the Q&A session, several topics were addressed:
Copyright and Licensing Considerations
- NPR has licenses with major performing rights organizations for the use of music in production.
- Other libraries considering building a music collection should review legal permissions and terms of use.
Data Labeling and Ongoing QA/QC
- The team performs periodic quality checks but does not engage extensively in data labeling projects.
- Emphasis on monitoring algorithm performance and making adjustments as needed.
User Testing and UX Improvements
- Future plans include conducting user testing to evaluate the effectiveness of additional algorithms (e.g., mood taxonomy).
- Goal is to enhance the search and discovery experience for users.
Conclusion
Jane concluded by emphasizing how the application of AI allowed the RAD team to develop a less time-consuming ingest and cataloging process. This enabled the continuous growth of the music library, providing valuable resources to production staff while efficiently managing limited staff time.
Contact Information
For further information or inquiries, you can reach out to Jane Gilvin through NPR's Research Archives and Data Team.