personalized_embedding
PERSONALIZED_EMBEDDINGS
A unified embedding manager combining Word2Vec and Doc2Vec capabilities.
This class provides a comprehensive interface for training and managing both word and document embeddings, making it suitable for personalized recommendation systems that need to understand both word-level and document-level semantics.
Attributes:
Name | Type | Description |
---|---|---|
word2vec | WORD2VEC | Instance of the Word2Vec model for word embeddings |
doc2vec | DOC2VEC | Instance of the Doc2Vec model for document embeddings |
Methods:
Name | Description |
---|---|
train_word2vec | Trains the Word2Vec model on a corpus of sentences |
train_doc2vec | Trains the Doc2Vec model on a corpus of documents |
get_word_embedding | Retrieves word vectors |
get_doc_embedding | Retrieves document vectors |
save_models | Persists both models to disk |
load_models | Loads both models from disk |
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
__init__(word2vec_params=None, doc2vec_params=None)
Initialize both Word2Vec and Doc2Vec models with customizable parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
word2vec_params | Dict[str, Any] | Configuration parameters for Word2Vec model. Includes vector_size, window, min_count, workers. | None |
doc2vec_params | Dict[str, Any] | Configuration parameters for Doc2Vec model. Includes vector_size, window, min_count, workers, epochs. | None |
Note
If no parameters are provided, models will be initialized with default values. See individual model documentation for default parameter details.
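Example
The fallback described in the note can be pictured as overlaying user-supplied values on a set of defaults. The sketch below is an illustration only: `resolve_params` and `ASSUMED_WORD2VEC_DEFAULTS` are hypothetical names, the default values are invented for the example, and none of it is taken from the class's source.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults, chosen only for illustration; the real defaults
# are documented with the individual models and may differ.
ASSUMED_WORD2VEC_DEFAULTS: Dict[str, Any] = {
    "vector_size": 100,
    "window": 5,
    "min_count": 1,
    "workers": 4,
}


def resolve_params(
    user_params: Optional[Dict[str, Any]],
    defaults: Dict[str, Any],
) -> Dict[str, Any]:
    """Return the defaults overlaid with any user-supplied values."""
    merged = dict(defaults)  # copy so the shared defaults are never mutated
    if user_params:
        merged.update(user_params)
    return merged


# Passing None keeps every default; a partial dict overrides only its keys.
all_defaults = resolve_params(None, ASSUMED_WORD2VEC_DEFAULTS)
smaller_vectors = resolve_params({"vector_size": 50}, ASSUMED_WORD2VEC_DEFAULTS)
```

Under this assumption, a caller who only wants smaller vectors would pass `{"vector_size": 50}` and leave the remaining keys unset.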
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
get_doc_embedding(doc_id)
Get the embedding vector for a given document ID.
Parameters:
- doc_id (int): The document ID.
Returns:
- List[float]: The embedding vector.
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
get_word_embedding(word)
Get the embedding vector for a given word.
Parameters:
- word (str): The word to retrieve the embedding for.
Returns:
- List[float]: The embedding vector.
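Example
Because the return value is a plain list of floats, two embeddings can be compared with ordinary Python. The sketch below computes cosine similarity with the standard library only; `cosine_similarity` is a hypothetical helper, not part of this class, and the vectors are made-up stand-ins for values that would come from `get_word_embedding`.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Stand-in vectors; real ones would be returned by get_word_embedding.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

same_direction = cosine_similarity(vec_a, vec_b)  # close to 1.0
orthogonal = cosine_similarity(vec_a, vec_c)      # exactly 0.0
```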
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
load_models(word2vec_path, doc2vec_path)
Load pre-trained Word2Vec and Doc2Vec models.
Parameters:
- word2vec_path (str): File path of the saved Word2Vec model.
- doc2vec_path (str): File path of the saved Doc2Vec model.
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
save_models(word2vec_path, doc2vec_path)
Save both Word2Vec and Doc2Vec models.
Parameters:
- word2vec_path (str): File path to save the Word2Vec model.
- doc2vec_path (str): File path to save the Doc2Vec model.
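Example
Both paths are ordinary strings, so it can be convenient to derive them from a single directory and use the same pair for `save_models` and `load_models`. The sketch below is one possible way to do that; `model_paths` and the file names `word2vec.model` and `doc2vec.model` are assumptions made for illustration, not names required by this class.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def model_paths(model_dir: str) -> Tuple[str, str]:
    """Create model_dir if needed and return (word2vec_path, doc2vec_path)."""
    directory = Path(model_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # File names are illustrative assumptions.
    return (
        str(directory / "word2vec.model"),
        str(directory / "doc2vec.model"),
    )


# Demonstrated against a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    w2v_path, d2v_path = model_paths(str(Path(tmp) / "models"))
    directory_created = Path(w2v_path).parent.is_dir()
```

The resulting pair could then be passed unchanged as `word2vec_path` and `doc2vec_path`.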
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
train_doc2vec(documents)
Train the Doc2Vec model.
Parameters:
- documents (List[List[str]]): A list of tokenized documents.
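Example
The expected input is a list of token lists, one inner list per document. The sketch below shows one simple way to get from raw strings to that shape; the `tokenize` helper and its lowercase, regex-based splitting are assumptions for illustration, and a real pipeline may tokenize differently.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


raw_documents = [
    "Space opera with a reluctant hero.",
    "A slow-burn detective story.",
]

# One token list per document, matching the List[List[str]] parameter type.
documents: List[List[str]] = [tokenize(doc) for doc in raw_documents]
```

The same structure applies to the `sentences` argument of `train_word2vec`, which is also typed as `List[List[str]]`.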
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py
train_word2vec(sentences, epochs=10)
Train the Word2Vec model.
Parameters:
- sentences (List[List[str]]): A list of tokenized sentences.
- epochs (int): Number of training iterations. Defaults to 10.
Source code in engines/contentFilterEngine/embedding_representation_learning/personalized_embeddings.py