
personalized_embedding

PERSONALIZED_EMBEDDINGS

A unified embedding manager combining Word2Vec and Doc2Vec capabilities.

This class wraps a WORD2VEC model and a DOC2VEC model behind one interface for training, querying, saving, and loading word and document embeddings. It is intended for personalized recommendation systems that need both word-level and document-level semantics.
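The delegation described above can be sketched in plain Python. ToyWord2Vec, ToyDoc2Vec, and EmbeddingFacade are hypothetical stand-ins, not the project's WORD2VEC and DOC2VEC classes. Their "training" only assigns deterministic toy vectors, so the sketch shows how one object forwards calls to two models, nothing more.

```python
from typing import Any, Dict, List, Optional


class ToyWord2Vec:
    """Hypothetical stand-in for WORD2VEC; not a real embedding model."""

    def __init__(self, vector_size: int = 3):
        self.vector_size = vector_size
        self.vectors: Dict[str, List[float]] = {}

    def train(self, sentences: List[List[str]], epochs: int = 10) -> None:
        # Toy "training": each unseen word gets a vector filled with its arrival order.
        # epochs is accepted for interface parity but ignored here.
        for sentence in sentences:
            for word in sentence:
                if word not in self.vectors:
                    n = float(len(self.vectors) + 1)
                    self.vectors[word] = [n] * self.vector_size

    def get_embedding(self, word: str) -> List[float]:
        return self.vectors[word]


class ToyDoc2Vec:
    """Hypothetical stand-in for DOC2VEC; documents are keyed by list position."""

    def __init__(self, vector_size: int = 3):
        self.vector_size = vector_size
        self.vectors: Dict[int, List[float]] = {}

    def train(self, documents: List[List[str]]) -> None:
        # Toy "training": a document's vector is its token count, repeated.
        for doc_id, tokens in enumerate(documents):
            self.vectors[doc_id] = [float(len(tokens))] * self.vector_size

    def get_embedding(self, doc_id: int) -> List[float]:
        return self.vectors[doc_id]


class EmbeddingFacade:
    """Same forwarding pattern as PERSONALIZED_EMBEDDINGS, minus persistence."""

    def __init__(self, word2vec_params: Optional[Dict[str, Any]] = None,
                 doc2vec_params: Optional[Dict[str, Any]] = None):
        self.word2vec = ToyWord2Vec(**(word2vec_params or {}))
        self.doc2vec = ToyDoc2Vec(**(doc2vec_params or {}))

    def train_word2vec(self, sentences: List[List[str]], epochs: int = 10) -> None:
        self.word2vec.train(sentences, epochs=epochs)

    def train_doc2vec(self, documents: List[List[str]]) -> None:
        self.doc2vec.train(documents)

    def get_word_embedding(self, word: str) -> List[float]:
        return self.word2vec.get_embedding(word)

    def get_doc_embedding(self, doc_id: int) -> List[float]:
        return self.doc2vec.get_embedding(doc_id)


docs = [["cheap", "flights", "to", "rome"], ["rome", "hotel", "deals"]]
manager = EmbeddingFacade(word2vec_params={"vector_size": 2})
manager.train_word2vec(docs)
manager.train_doc2vec(docs)
print(manager.get_word_embedding("rome"))  # [4.0, 4.0]
print(manager.get_doc_embedding(1))        # [3.0, 3.0, 3.0]
```

The same corpus feeds both models here; the facade itself adds no logic beyond forwarding each call to the matching model.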

Attributes:

- word2vec (WORD2VEC): Instance of the Word2Vec model for word embeddings
- doc2vec (DOC2VEC): Instance of the Doc2Vec model for document embeddings

Methods:

- train_word2vec: Trains the Word2Vec model on a corpus of sentences
- train_doc2vec: Trains the Doc2Vec model on a corpus of documents
- get_word_embedding: Retrieves word vectors
- get_doc_embedding: Retrieves document vectors
- save_models: Persists both models to disk
- load_models: Loads both models from disk

Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
from typing import Any, Dict, List, Optional

# WORD2VEC and DOC2VEC are defined elsewhere in the project; their import lines are not shown here.


class PERSONALIZED_EMBEDDINGS:
    """
    A unified embedding manager combining Word2Vec and Doc2Vec capabilities.

    This class provides a comprehensive interface for training and managing both word
    and document embeddings, making it suitable for personalized recommendation systems
    that need to understand both word-level and document-level semantics.

    Attributes:
        word2vec (WORD2VEC): Instance of the Word2Vec model for word embeddings
        doc2vec (DOC2VEC): Instance of the Doc2Vec model for document embeddings

    Methods:
        train_word2vec: Trains the Word2Vec model on a corpus of sentences
        train_doc2vec: Trains the Doc2Vec model on a corpus of documents
        get_word_embedding: Retrieves word vectors
        get_doc_embedding: Retrieves document vectors
        save_models: Persists both models to disk
        load_models: Loads both models from disk
    """

    def __init__(
        self,
        word2vec_params: Optional[Dict[str, Any]] = None,
        doc2vec_params: Optional[Dict[str, Any]] = None,
    ):
        """
        Initialize both Word2Vec and Doc2Vec models with customizable parameters.

        Args:
            word2vec_params (Dict[str, Any], optional): Configuration parameters for the
                Word2Vec model, such as vector_size, window, min_count, workers.
            doc2vec_params (Dict[str, Any], optional): Configuration parameters for the
                Doc2Vec model, such as vector_size, window, min_count, workers, epochs.

        Note:
            If no parameters are provided, models will be initialized with default values.
            See individual model documentation for default parameter details.
        """
        # None (or an empty dict) falls back to each model's own defaults.
        self.word2vec = WORD2VEC(**(word2vec_params or {}))
        self.doc2vec = DOC2VEC(**(doc2vec_params or {}))

    def train_word2vec(self, sentences: List[List[str]], epochs: int = 10) -> None:
        """
        Train the Word2Vec model.

        Args:
            sentences (List[List[str]]): A list of tokenized sentences.
            epochs (int): Number of training passes over the corpus. Defaults to 10.
        """
        self.word2vec.train(sentences, epochs=epochs)

    def train_doc2vec(self, documents: List[List[str]]) -> None:
        """
        Train the Doc2Vec model.

        Args:
            documents (List[List[str]]): A list of tokenized documents.
        """
        self.doc2vec.train(documents)

    def get_word_embedding(self, word: str) -> List[float]:
        """
        Get the embedding vector for a given word.

        Args:
            word (str): The word to retrieve the embedding for.

        Returns:
            List[float]: The embedding vector.
        """
        return self.word2vec.get_embedding(word)

    def get_doc_embedding(self, doc_id: int) -> List[float]:
        """
        Get the embedding vector for a given document ID.

        Args:
            doc_id (int): The document ID.

        Returns:
            List[float]: The embedding vector.
        """
        return self.doc2vec.get_embedding(doc_id)

    def save_models(self, word2vec_path: str, doc2vec_path: str) -> None:
        """
        Save both Word2Vec and Doc2Vec models.

        Args:
            word2vec_path (str): File path to save the Word2Vec model.
            doc2vec_path (str): File path to save the Doc2Vec model.
        """
        self.word2vec.save_model(word2vec_path)
        self.doc2vec.save_model(doc2vec_path)

    def load_models(self, word2vec_path: str, doc2vec_path: str) -> None:
        """
        Load pre-trained Word2Vec and Doc2Vec models.

        Args:
            word2vec_path (str): File path of the saved Word2Vec model.
            doc2vec_path (str): File path of the saved Doc2Vec model.
        """
        self.word2vec.load_model(word2vec_path)
        self.doc2vec.load_model(doc2vec_path)

__init__(word2vec_params=None, doc2vec_params=None)

Initialize both Word2Vec and Doc2Vec models with customizable parameters.

Parameters:

- word2vec_params (Dict[str, Any], optional): Configuration parameters for the Word2Vec model, such as vector_size, window, min_count, and workers. Default: None.
- doc2vec_params (Dict[str, Any], optional): Configuration parameters for the Doc2Vec model, such as vector_size, window, min_count, workers, and epochs. Default: None.
Note

If no parameters are provided, models will be initialized with default values. See individual model documentation for default parameter details.
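The constructor forwards each dict as keyword arguments with `**(params or {})`, so omitted keys keep the model's defaults and present keys override only themselves. The sketch below shows that behavior with a hypothetical ToyWord2VecConfig class; its default values are invented for illustration and are not WORD2VEC's real defaults. If the real constructor does not accept `**kwargs`, a misspelled key raises TypeError rather than being silently ignored.

```python
from typing import Any, Dict, Optional


class ToyWord2VecConfig:
    """Hypothetical stand-in for WORD2VEC's constructor.

    The defaults below are made up for illustration; check the real
    WORD2VEC documentation for its actual defaults.
    """

    def __init__(self, vector_size: int = 100, window: int = 5,
                 min_count: int = 1, workers: int = 4):
        self.vector_size = vector_size
        self.window = window
        self.min_count = min_count
        self.workers = workers


def build(params: Optional[Dict[str, Any]] = None) -> ToyWord2VecConfig:
    # Mirrors PERSONALIZED_EMBEDDINGS.__init__: None (or {}) falls back to defaults,
    # and each key present in the dict overrides only that keyword argument.
    return ToyWord2VecConfig(**(params or {}))


default_model = build()
custom_model = build({"vector_size": 50, "window": 3})
print(default_model.vector_size, default_model.window)  # 100 5
print(custom_model.vector_size, custom_model.window, custom_model.min_count)  # 50 3 1

try:
    build({"vectorsize": 50})  # misspelled key
except TypeError as exc:
    print("rejected:", exc)
```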


get_doc_embedding(doc_id)

Get the embedding vector for a given document ID.

Parameters:
- doc_id (int): The document ID.

Returns:
- List[float]: The embedding vector.
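In a recommender, a typical use of the vectors returned by get_doc_embedding is ranking documents by similarity. The class does not provide such a ranking; the sketch below adds one with plain-Python cosine similarity. The dict of doc ID to vector is hypothetical toy data that, in practice, would be filled by calling get_doc_embedding for each ID.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_id: int, doc_vectors: Dict[int, List[float]]) -> List[Tuple[int, float]]:
    """Rank every other document by cosine similarity to the query document, best first."""
    query = doc_vectors[query_id]
    scores = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in doc_vectors.items()
        if doc_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for get_doc_embedding(0), (1), and (2).
vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
ranking = rank_documents(0, vectors)
print([doc_id for doc_id, _ in ranking])  # [1, 2]
```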


get_word_embedding(word)

Get the embedding vector for a given word.

Parameters:
- word (str): The word to retrieve the embedding for.

Returns:
- List[float]: The embedding vector.
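This page does not say what get_word_embedding does for a word that was never seen in training (or was dropped by min_count). If WORD2VEC is backed by gensim, which is an assumption, the lookup raises KeyError. A caller can guard against that as sketched below; safe_word_embedding and lookup_table are hypothetical names, and the dict stands in for a trained model.

```python
from typing import Callable, List, Optional


def safe_word_embedding(get_embedding: Callable[[str], List[float]],
                        word: str) -> Optional[List[float]]:
    """Return the word's vector, or None if the lookup raises KeyError.

    Assumption: the underlying model signals out-of-vocabulary words with
    KeyError (as gensim's KeyedVectors does); adjust if it raises something else.
    """
    try:
        return get_embedding(word)
    except KeyError:
        return None


# Hypothetical vocabulary standing in for a trained model.
lookup_table = {"rome": [0.1, 0.2], "hotel": [0.3, 0.4]}

print(safe_word_embedding(lookup_table.__getitem__, "rome"))   # [0.1, 0.2]
print(safe_word_embedding(lookup_table.__getitem__, "paris"))  # None
```

With a real instance, the bound method is passed directly, e.g. `safe_word_embedding(embeddings.get_word_embedding, "paris")`.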


load_models(word2vec_path, doc2vec_path)

Load pre-trained Word2Vec and Doc2Vec models.

Parameters:
- word2vec_path (str): File path of the saved Word2Vec model.
- doc2vec_path (str): File path of the saved Doc2Vec model.
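load_models loads the Word2Vec model before the Doc2Vec model, so if the second load fails, the instance is left holding a new word model next to an old document model. A minimal all-or-nothing alternative is sketched below. It assumes the models are pickled to disk, which is a stand-in: the real save_model/load_model file format is not documented here. load_pair and Holder are hypothetical names.

```python
import os
import pickle
import tempfile
from typing import Any, Tuple


def load_pair(word2vec_path: str, doc2vec_path: str) -> Tuple[Any, Any]:
    """Load both pickled models, raising before the caller's state is touched."""
    # Fail fast if either file is missing, before reading anything.
    for path in (word2vec_path, doc2vec_path):
        if not os.path.exists(path):
            raise FileNotFoundError(path)
    with open(word2vec_path, "rb") as f:
        word_model = pickle.load(f)
    with open(doc2vec_path, "rb") as f:
        doc_model = pickle.load(f)
    return word_model, doc_model


class Holder:
    """Hypothetical object with the same two attributes as PERSONALIZED_EMBEDDINGS."""

    def __init__(self):
        self.word2vec = "old word model"
        self.doc2vec = "old doc model"

    def load_models(self, word2vec_path: str, doc2vec_path: str) -> None:
        # Load into locals first; attributes change only if both loads succeed.
        word_model, doc_model = load_pair(word2vec_path, doc2vec_path)
        self.word2vec, self.doc2vec = word_model, doc_model


with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w2v.pkl")
    d_path = os.path.join(tmp, "d2v.pkl")
    with open(w_path, "wb") as f:
        pickle.dump("new word model", f)
    with open(d_path, "wb") as f:
        pickle.dump("new doc model", f)

    failed = Holder()
    try:
        failed.load_models(w_path, os.path.join(tmp, "missing.pkl"))
    except FileNotFoundError:
        pass
    print(failed.word2vec)  # old word model

    loaded = Holder()
    loaded.load_models(w_path, d_path)
    print(loaded.word2vec, "/", loaded.doc2vec)  # new word model / new doc model
```

Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.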


save_models(word2vec_path, doc2vec_path)

Save both Word2Vec and Doc2Vec models.

Parameters:
- word2vec_path (str): File path to save the Word2Vec model.
- doc2vec_path (str): File path to save the Doc2Vec model.
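If the process dies while save_models is writing, a model file can be left half-written. One common guard is to write to a temporary file in the target directory and then rename it over the destination with os.replace. The sketch below assumes pickle as a stand-in serializer (the real save_model format is not documented here); atomic_pickle and this save_models function are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any


def atomic_pickle(obj: Any, path: str) -> None:
    """Write obj to a temp file next to path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        # On POSIX, the rename is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def save_models(word_model: Any, doc_model: Any, word2vec_path: str, doc2vec_path: str) -> None:
    # Same order as PERSONALIZED_EMBEDDINGS.save_models: word model first, then doc model.
    atomic_pickle(word_model, word2vec_path)
    atomic_pickle(doc_model, doc2vec_path)


with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "models", "w2v.pkl")
    d_path = os.path.join(tmp, "models", "d2v.pkl")
    save_models({"rome": [0.1, 0.2]}, {0: [1.0, 0.0]}, w_path, d_path)
    with open(w_path, "rb") as f:
        restored_words = pickle.load(f)
    with open(d_path, "rb") as f:
        restored_docs = pickle.load(f)
    leftovers = [name for name in os.listdir(os.path.dirname(w_path)) if name.endswith(".tmp")]

print(restored_words["rome"], restored_docs[0], leftovers)  # [0.1, 0.2] [1.0, 0.0] []
```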


train_doc2vec(documents)

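Train the Doc2Vec model. Unlike train_word2vec, this method takes no epochs argument; the constructor documentation lists epochs among the doc2vec_params keys, which suggests the Doc2Vec epoch count is configured there.

Parameters:
- documents (List[List[str]]): A list of tokenized documents.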
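train_doc2vec accepts plain tokenized documents, yet get_doc_embedding takes an integer doc_id, so IDs must be assigned somewhere inside the wrapper. This page does not say how. A common convention, and the assumption in this sketch, is to tag each document with its position in the input list, similar to gensim's TaggedDocument (a words list plus a tags list). TaggedDoc and tag_documents are hypothetical names.

```python
from typing import List, NamedTuple


class TaggedDoc(NamedTuple):
    """Hypothetical stand-in for gensim's TaggedDocument(words, tags)."""
    words: List[str]
    tags: List[int]


def tag_documents(documents: List[List[str]]) -> List[TaggedDoc]:
    """Tag each tokenized document with its list index, so doc_id i refers to documents[i]."""
    return [TaggedDoc(words=tokens, tags=[i]) for i, tokens in enumerate(documents)]


documents = [
    ["wireless", "noise", "cancelling", "headphones"],
    ["budget", "gaming", "laptop"],
]
tagged = tag_documents(documents)
print(tagged[1])  # TaggedDoc(words=['budget', 'gaming', 'laptop'], tags=[1])
```

Under this convention, get_doc_embedding(1) would return the vector for the laptop document, so keep the original list (or a separate ID map) to translate IDs back into catalog items.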


train_word2vec(sentences, epochs=10)

Train the Word2Vec model.

Parameters:
- sentences (List[List[str]]): A list of tokenized sentences.
- epochs (int): Number of training passes (epochs) over the corpus. Default: 10.
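train_word2vec expects sentences that are already split into tokens, not raw strings; if the model is backed by gensim (an assumption), a list of plain strings would be iterated character by character. A minimal tokenizer that produces the expected List[List[str]] shape is sketched below. The lowercase-and-regex rule and the name tokenize_corpus are illustrative assumptions, not part of this project.

```python
import re
from typing import List

# Assumed rule: a token is a run of lowercase letters, digits, or apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize_corpus(texts: List[str]) -> List[List[str]]:
    """Turn raw strings into the List[List[str]] shape train_word2vec expects."""
    return [TOKEN_PATTERN.findall(text.lower()) for text in texts]


reviews = [
    "Great battery life, great screen!",
    "Screen cracked after 2 weeks.",
]
sentences = tokenize_corpus(reviews)
print(sentences)
# [['great', 'battery', 'life', 'great', 'screen'], ['screen', 'cracked', 'after', '2', 'weeks']]
```

The result can then be passed straight in, e.g. `embeddings.train_word2vec(sentences, epochs=20)`.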
