KeywordExtractor Class
The KeywordExtractor class is the main entry point for YAKE (Yet Another Keyword Extractor), providing a simple API to extract meaningful keywords from textual content.
Module Overview
The KeywordExtractor class handles configuration, text preprocessing, and keyword extraction. It scores candidates using statistical features of the document itself, without relying on dictionaries or external corpora.
Constructor
Parameters:
- lan (str, optional): Language for stopwords (default: "en")
- n (int, optional): Maximum n-gram size (default: 3)
- dedup_lim (float, optional): Similarity threshold for deduplication (default: 0.9)
- dedup_func (str, optional): Deduplication function to use (default: "seqm")
- window_size (int, optional): Size of word window for co-occurrence (default: 1)
- top (int, optional): Maximum number of keywords to return (default: 20)
- features (list, optional): List of features to use for scoring (default: None = all features)
- stopwords (set, optional): Custom stopwords set (default: None, loads from language file)
Core Methods
Parameters:
- text (str): The text to extract keywords from
Returns:
- list: A list of tuples containing (keyword, score) pairs, sorted by relevance (lower scores are better)
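Because lower scores are better, the returned list can be filtered with a simple upper bound on the score. The sketch below works on a hand-written list in the documented (keyword, score) shape; the keywords, scores, and the `best_keywords` helper are invented for illustration and are not YAKE output.

```python
# Hypothetical results in the documented (keyword, score) shape.
# The values are made up for illustration, not produced by YAKE.
results = [
    ("keyword extraction", 0.021),
    ("statistical features", 0.047),
    ("text documents", 0.093),
]


def best_keywords(pairs, max_score):
    """Keep pairs scoring at or below max_score, lowest (best) score first."""
    kept = [(keyword, score) for keyword, score in pairs if score <= max_score]
    return sorted(kept, key=lambda pair: pair[1])


selected = best_keywords(results, max_score=0.05)
print(selected)
```

A suitable cutoff depends on the document, so the 0.05 used here is only a placeholder.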
Helper Methods
Similarity Functions
Usage Examples
Basic Usage
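A minimal sketch of the default workflow, assuming the package is installed and importable as `yake` and that the core method is exposed as `extract_keywords`. The `extract` and `format_keywords` helpers are illustrative names, not part of the library.

```python
def extract(text):
    """Run the extractor with its default settings and return (keyword, score) pairs."""
    import yake  # third-party package, assumed installed (pip install yake)

    extractor = yake.KeywordExtractor()
    return extractor.extract_keywords(text)


def format_keywords(pairs):
    """Render one 'score  keyword' line per pair, keeping the given order."""
    return [f"{score:.4f}  {keyword}" for keyword, score in pairs]
```

With the defaults listed in the Constructor section, `extract(text)` should return at most 20 pairs, and `print("\n".join(format_keywords(extract(text))))` lists them with the best (lowest) score first.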
Customized Usage
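A sketch of a customized extractor, assuming the constructor accepts the keyword names listed in the Constructor section. Some released versions of the package may spell these arguments differently, so check the installed version before copying the names. The specific values and the `extract_with_config` helper are illustrative choices.

```python
# Keyword arguments named after the documented constructor parameters.
# The values are arbitrary examples, not recommended settings.
custom_config = {
    "lan": "en",           # language used to load the stopword list
    "n": 2,                # consider n-grams of up to two words
    "dedup_lim": 0.7,      # stricter deduplication than the 0.9 default
    "dedup_func": "jaro",  # use Jaro-Winkler similarity for deduplication
    "window_size": 2,      # wider co-occurrence window than the default of 1
    "top": 10,             # return at most ten keywords
}


def extract_with_config(text, config):
    """Build an extractor from a dict of settings and return (keyword, score) pairs."""
    import yake  # third-party package, assumed installed (pip install yake)

    extractor = yake.KeywordExtractor(**config)
    return extractor.extract_keywords(text)
```

Keeping the settings in a plain dictionary makes it easy to store them alongside the results or to compare several configurations on the same text.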
Deduplication Functions
The KeywordExtractor supports multiple string similarity algorithms for deduplication:
- Jaro-Winkler ("jaro", "jaro_winkler"): Based on character matches with higher weights for prefix matches
- Levenshtein Ratio ("levs"): Based on Levenshtein edit distance normalized by string length
- SequenceMatcher ("seqm", "sequencematcher"): Based on Python's difflib sequence matching algorithm
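To make the role of the similarity threshold concrete, here is a standalone sketch of threshold-based deduplication using only the standard library. It is not the library's implementation: the function names are hypothetical, the Levenshtein ratio is normalized by the longer string's length as an assumption, and a candidate is dropped when its similarity to an already kept keyword exceeds the threshold.

```python
from difflib import SequenceMatcher


def seqm_similarity(a, b):
    """Similarity in [0, 1] computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def levenshtein_distance(a, b):
    """Edit distance via the classic row-by-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """One minus the edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def deduplicate(candidates, similarity, threshold=0.9):
    """Keep a candidate only if no already kept keyword is more similar than threshold."""
    kept = []
    for keyword, score in candidates:
        if all(similarity(keyword, other) <= threshold for other, _ in kept):
            kept.append((keyword, score))
    return kept


# Hypothetical candidates, already sorted by ascending score.
candidates = [
    ("keyword extraction", 0.02),
    ("keyword extractions", 0.03),
    ("text mining", 0.05),
]
unique = deduplicate(candidates, seqm_similarity, threshold=0.9)
print(unique)
```

Because candidates are visited in score order, the best-scoring member of each group of near-duplicates is the one that survives. Lowering the threshold removes more variants; raising it toward 1.0 keeps nearly everything.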
Dependencies
The module relies on:
- os: For file operations and path handling
- jellyfish: For Jaro-Winkler string similarity
- yake.data.DataCore: For core data representation
- .Levenshtein: For Levenshtein distance and ratio calculations