personalized_embedding

`PERSONALIZED_EMBEDDINGS`

A unified embedding manager combining Word2Vec and Doc2Vec capabilities.

This class wraps a Word2Vec model and a Doc2Vec model behind one interface for training, querying, saving, and loading word and document embeddings. It is intended for personalized recommendation systems that need both word-level and document-level semantics.

Attributes:

Name	Type	Description
`word2vec`	`WORD2VEC`	Instance of the Word2Vec model for word embeddings
`doc2vec`	`DOC2VEC`	Instance of the Doc2Vec model for document embeddings

Methods:

Name	Description
`train_word2vec`	Trains the Word2Vec model on a corpus of sentences
`train_doc2vec`	Trains the Doc2Vec model on a corpus of documents
`get_word_embedding`	Returns the embedding vector for a given word
`get_doc_embedding`	Returns the embedding vector for a given document ID
`save_models`	Persists both models to disk
`load_models`	Loads both models from disk
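
Every public method is a one-line delegation to one of the two wrapped models. The sketch below reproduces that shape so it can run on its own. `StubModel` and `PersonalizedEmbeddingsSketch` are hypothetical names, the stub's hash-derived vectors carry no meaning, and the stub implements only `train` and `get_embedding`, the two model methods this class calls for training and lookup (persistence is left out).

```python
import hashlib
from typing import Any, Dict, List, Optional, Union


class StubModel:
    """Hypothetical stand-in for WORD2VEC / DOC2VEC (same method names, no learning)."""

    def __init__(self, vector_size: int = 4) -> None:
        self.vector_size = vector_size
        self.epochs_trained = 0
        self.corpus_size = 0

    def train(self, corpus: List[List[str]], epochs: int = 1) -> None:
        # A real model would learn vectors here; the stub only records the call.
        self.corpus_size = len(corpus)
        self.epochs_trained = epochs

    def get_embedding(self, key: Union[str, int]) -> List[float]:
        # Deterministic pseudo-vector derived from the key; NOT a learned embedding.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.vector_size]]


class PersonalizedEmbeddingsSketch:
    """Mirrors the delegation in PERSONALIZED_EMBEDDINGS, using the stand-ins."""

    def __init__(self, word2vec_params: Optional[Dict[str, Any]] = None,
                 doc2vec_params: Optional[Dict[str, Any]] = None) -> None:
        # Each parameter dict configures only its own model.
        self.word2vec = StubModel(**(word2vec_params or {}))
        self.doc2vec = StubModel(**(doc2vec_params or {}))

    def train_word2vec(self, sentences: List[List[str]], epochs: int = 10) -> None:
        self.word2vec.train(sentences, epochs=epochs)

    def train_doc2vec(self, documents: List[List[str]]) -> None:
        self.doc2vec.train(documents)

    def get_word_embedding(self, word: str) -> List[float]:
        return self.word2vec.get_embedding(word)

    def get_doc_embedding(self, doc_id: int) -> List[float]:
        return self.doc2vec.get_embedding(doc_id)


emb = PersonalizedEmbeddingsSketch(word2vec_params={"vector_size": 3},
                                   doc2vec_params={"vector_size": 5})
emb.train_word2vec([["red", "shoes"], ["blue", "shoes"]], epochs=5)
emb.train_doc2vec([["red", "shoes", "sale"], ["blue", "jacket"]])
print(len(emb.get_word_embedding("shoes")))  # 3
print(len(emb.get_doc_embedding(0)))         # 5
```

Swapping the stubs for the real models leaves the calling code unchanged, which is the point of the facade: callers depend only on the methods listed in the table above.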

Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py

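from typing import Any, Dict, List, Optional

# WORD2VEC and DOC2VEC are defined elsewhere in the package; their import statements are not shown in this listing.


class PERSONALIZED_EMBEDDINGS: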
    """
    A unified embedding manager combining Word2Vec and Doc2Vec capabilities.

    This class provides a comprehensive interface for training and managing both word
    and document embeddings, making it suitable for personalized recommendation systems
    that need to understand both word-level and document-level semantics.

    Attributes:
        word2vec (WORD2VEC): Instance of the Word2Vec model for word embeddings
        doc2vec (DOC2VEC): Instance of the Doc2Vec model for document embeddings

    Methods:
        train_word2vec: Trains the Word2Vec model on a corpus of sentences
        train_doc2vec: Trains the Doc2Vec model on a corpus of documents
        get_word_embedding: Retrieves word vectors
        get_doc_embedding: Retrieves document vectors
        save_models: Persists both models to disk
        load_models: Loads both models from disk
    """

    def __init__(self, word2vec_params: Optional[Dict[str, Any]] = None, doc2vec_params: Optional[Dict[str, Any]] = None):
        """
        Initialize both Word2Vec and Doc2Vec models with customizable parameters.

        Args:
            word2vec_params (Dict[str, Any], optional): Configuration parameters for Word2Vec model.
                                                       Includes vector_size, window, min_count, workers.
            doc2vec_params (Dict[str, Any], optional): Configuration parameters for Doc2Vec model.
                                                      Includes vector_size, window, min_count, workers, epochs.

        Note:
            If no parameters are provided, models will be initialized with default values.
            See individual model documentation for default parameter details.
        """
        self.word2vec = WORD2VEC(**(word2vec_params if word2vec_params else {}))
        self.doc2vec = DOC2VEC(**(doc2vec_params if doc2vec_params else {}))

    def train_word2vec(self, sentences: List[List[str]], epochs: int = 10):
        """
        Train the Word2Vec model.

        Parameters:
        - sentences (List[List[str]]): A list of tokenized sentences.
        - epochs (int): Number of training iterations.
        """
        self.word2vec.train(sentences, epochs=epochs)

    def train_doc2vec(self, documents: List[List[str]]):
        """
        Train the Doc2Vec model.

        Parameters:
        - documents (List[List[str]]): A list of tokenized documents.
        """
        self.doc2vec.train(documents)

    def get_word_embedding(self, word: str) -> List[float]:
        """
        Get the embedding vector for a given word.

        Parameters:
        - word (str): The word to retrieve the embedding for.

        Returns:
        - List[float]: The embedding vector.
        """
        return self.word2vec.get_embedding(word)

    def get_doc_embedding(self, doc_id: int) -> List[float]:
        """
        Get the embedding vector for a given document ID.

        Parameters:
        - doc_id (int): The document ID.

        Returns:
        - List[float]: The embedding vector.
        """
        return self.doc2vec.get_embedding(doc_id)

    def save_models(self, word2vec_path: str, doc2vec_path: str):
        """
        Save both Word2Vec and Doc2Vec models.

        Parameters:
        - word2vec_path (str): File path to save the Word2Vec model.
        - doc2vec_path (str): File path to save the Doc2Vec model.
        """
        self.word2vec.save_model(word2vec_path)
        self.doc2vec.save_model(doc2vec_path)

    def load_models(self, word2vec_path: str, doc2vec_path: str):
        """
        Load pre-trained Word2Vec and Doc2Vec models.

        Parameters:
        - word2vec_path (str): File path of the saved Word2Vec model.
        - doc2vec_path (str): File path of the saved Doc2Vec model.
        """
        self.word2vec.load_model(word2vec_path)
        self.doc2vec.load_model(doc2vec_path)

`__init__(word2vec_params=None, doc2vec_params=None)`

Initialize both Word2Vec and Doc2Vec models with customizable parameters.

Parameters:

Name	Type	Description	Default
`word2vec_params`	`Dict[str, Any]`	Configuration parameters for Word2Vec model. Includes vector_size, window, min_count, workers.	`None`
`doc2vec_params`	`Dict[str, Any]`	Configuration parameters for Doc2Vec model. Includes vector_size, window, min_count, workers, epochs.	`None`

Note

If no parameters are provided, models will be initialized with default values. See individual model documentation for default parameter details.
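
Each parameter dictionary is unpacked with `**` into its model's constructor, so its keys must match that constructor's keyword arguments. Assuming the constructors declare explicit keywords and do not accept `**kwargs` (their source is not shown here), a misspelled key fails immediately with `TypeError`, while `None` or an empty dict leaves every default in place. The dataclass below is a hypothetical stand-in for the `WORD2VEC` constructor, and its default values are placeholders, not the real ones.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class HypotheticalWord2VecArgs:
    # Stand-in for the WORD2VEC constructor; these defaults are illustrative only.
    vector_size: int = 100
    window: int = 5
    min_count: int = 1
    workers: int = 4


def make_word2vec_args(params: Optional[Dict[str, Any]] = None) -> HypotheticalWord2VecArgs:
    # Same pattern as __init__: None (or {}) means "use every constructor default".
    return HypotheticalWord2VecArgs(**(params if params else {}))


print(make_word2vec_args())                     # all defaults
print(make_word2vec_args({"vector_size": 50}))  # override one field, keep the rest
try:
    make_word2vec_args({"vector_sise": 50})     # misspelled key
except TypeError as err:
    print("rejected:", err)
```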

`get_doc_embedding(doc_id)`

Get the embedding vector for a given document ID.

Parameters:

Name	Type	Description	Default
`doc_id`	`int`	The document ID.	required

Returns:

Type	Description
`List[float]`	The embedding vector for the document.
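
In a recommender, document vectors are typically compared by cosine similarity. The sketch below ranks unseen documents against a user profile built as the mean of the vectors of documents the user liked. It assumes `get_doc_embedding` returns equal-length sequences of floats; the toy vectors stand in for real `get_doc_embedding(doc_id)` results, and the mean-vector profile and the `rank_documents` helper are illustrative choices, not part of this class.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(liked: List[int], vectors: Dict[int, List[float]]) -> List[int]:
    """Rank documents not in `liked` by similarity to the mean of the liked vectors."""
    dim = len(next(iter(vectors.values())))
    profile = [sum(vectors[d][i] for d in liked) / len(liked) for i in range(dim)]
    candidates = [d for d in vectors if d not in liked]
    return sorted(candidates, key=lambda d: cosine(profile, vectors[d]), reverse=True)


# Toy vectors standing in for get_doc_embedding(doc_id) results.
doc_vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.7, 0.3]}
print(rank_documents([0], doc_vectors))  # [1, 3, 2]
```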

`get_word_embedding(word)`

Get the embedding vector for a given word.

Parameters:

Name	Type	Description	Default
`word`	`str`	The word whose embedding is returned.	required

Returns:

Type	Description
`List[float]`	The embedding vector for the word.
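
To score a short query or a user's text history against items, word vectors are often averaged into a single vector. How `WORD2VEC.get_embedding` handles an out-of-vocabulary word is not shown; the sketch assumes it raises `KeyError` (as gensim's `KeyedVectors` lookup does) and skips such words. The `toy` dictionary and the `mean_word_vector` helper are hypothetical; with a real instance, pass its `get_word_embedding` method as `lookup`.

```python
from typing import Callable, List, Optional


def mean_word_vector(tokens: List[str],
                     lookup: Callable[[str], List[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vectors = []
    for token in tokens:
        try:
            vectors.append(lookup(token))
        except KeyError:
            continue  # skip out-of-vocabulary tokens
    if not vectors:
        return None
    # zip(*vectors) walks the vectors dimension by dimension.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy vocabulary standing in for get_word_embedding; unknown words raise KeyError.
toy = {"red": [1.0, 0.0], "shoes": [0.0, 1.0]}
print(mean_word_vector(["red", "suede", "shoes"], toy.__getitem__))  # [0.5, 0.5]
```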

`load_models(word2vec_path, doc2vec_path)`

Load pre-trained Word2Vec and Doc2Vec models.

Parameters:

Name	Type	Description	Default
`word2vec_path`	`str`	File path of the saved Word2Vec model.	required
`doc2vec_path`	`str`	File path of the saved Doc2Vec model.	required
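
`load_models` loads the Word2Vec model first and then the Doc2Vec model, with no rollback. If the second load raises (for example, because the file is missing), the Word2Vec side will already have been reloaded while the Doc2Vec side is unchanged, assuming each `load_model` replaces its model in place. Checking both paths up front avoids that mixed state for the most common failure. The helper and file names below are hypothetical, and `emb` in the comment denotes a `PERSONALIZED_EMBEDDINGS` instance.

```python
import os
import tempfile
from typing import List


def missing_model_files(*paths: str) -> List[str]:
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]


with tempfile.TemporaryDirectory() as tmp:
    w2v_path = os.path.join(tmp, "word2vec.model")  # hypothetical file names
    d2v_path = os.path.join(tmp, "doc2vec.model")
    open(w2v_path, "wb").close()  # only the Word2Vec file exists
    missing = missing_model_files(w2v_path, d2v_path)
    print([os.path.basename(p) for p in missing])  # ['doc2vec.model']
    # With a real instance, call emb.load_models(w2v_path, d2v_path) only if `missing` is empty.
```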

`save_models(word2vec_path, doc2vec_path)`

Save both Word2Vec and Doc2Vec models.

Parameters:

Name	Type	Description	Default
`word2vec_path`	`str`	File path to save the Word2Vec model to.	required
`doc2vec_path`	`str`	File path to save the Doc2Vec model to.	required
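
`save_models` writes the two files in sequence, and whether the underlying `save_model` calls create missing parent directories is not shown. Creating the target directory first removes that uncertainty, and deriving both file names from one prefix keeps the pair together so the same tuple can later be passed to `load_models`. `model_paths` and its naming scheme are hypothetical, and `emb` in the comments denotes a `PERSONALIZED_EMBEDDINGS` instance.

```python
import os
import tempfile
from typing import Tuple


def model_paths(directory: str, prefix: str = "personalized") -> Tuple[str, str]:
    """Create `directory` if needed and return (word2vec_path, doc2vec_path) inside it."""
    os.makedirs(directory, exist_ok=True)
    return (os.path.join(directory, f"{prefix}_word2vec.model"),
            os.path.join(directory, f"{prefix}_doc2vec.model"))


with tempfile.TemporaryDirectory() as tmp:
    w2v_path, d2v_path = model_paths(os.path.join(tmp, "models", "v1"))
    dir_created = os.path.isdir(os.path.dirname(w2v_path))
    # With a real instance:
    #   emb.save_models(w2v_path, d2v_path)
    #   ...later...
    #   emb.load_models(w2v_path, d2v_path)

print(dir_created)  # True
print(os.path.basename(w2v_path), os.path.basename(d2v_path))
```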

`train_doc2vec(documents)`

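Train the Doc2Vec model. Unlike `train_word2vec`, this method takes no `epochs` argument; the epoch count presumably comes from the `epochs` entry of `doc2vec_params` passed to `__init__`.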

Parameters:

Name	Type	Description	Default
`documents`	`List[List[str]]`	A list of tokenized documents.	required
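
`train_doc2vec` expects documents that are already tokenized. The sketch below shows one simple way to turn raw text into that `List[List[str]]` shape; the lower-casing regex tokenizer is an assumption, not the project's preprocessing. If, as the `int` type of `doc_id` suggests, the `DOC2VEC` wrapper tags each document by its position in this list (its source is not shown), keep the order of `raw_docs` stable so IDs can be mapped back to items.

```python
import re
from typing import List

# Letters, digits, and apostrophes form tokens; everything else separates them.
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and return its tokens in order."""
    return TOKEN.findall(text.lower())


raw_docs = ["Red running shoes, 20% off!", "Blue denim jacket"]
documents = [tokenize(doc) for doc in raw_docs]
print(documents)
# [['red', 'running', 'shoes', '20', 'off'], ['blue', 'denim', 'jacket']]
```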

`train_word2vec(sentences, epochs=10)`

Train the Word2Vec model.

Parameters:

Name	Type	Description	Default
`sentences`	`List[List[str]]`	A list of tokenized sentences.	required
`epochs`	`int`	Number of training epochs (full passes over `sentences`).	`10`
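
Assuming the `WORD2VEC` wrapper performs standard Word2Vec training, the `window` value in `word2vec_params` decides which neighbours count as context for each word, and each epoch revisits every such pairing once. The sketch enumerates the (center, context) pairs a skip-gram pass would see with a fixed window. Real implementations such as gensim also shrink the window randomly per word and subsample frequent words, so this is illustrative only; `context_pairs` is a hypothetical helper.

```python
from typing import List, Tuple


def context_pairs(sentence: List[str], window: int) -> List[Tuple[str, str]]:
    """(center, context) pairs for every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        pairs.extend((center, sentence[j]) for j in range(lo, hi) if j != i)
    return pairs


print(context_pairs(["red", "running", "shoes"], window=1))
# [('red', 'running'), ('running', 'red'), ('running', 'shoes'), ('shoes', 'running')]
```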
