Conference article

Decentralized Word2Vec Using Gossip Learning

Abdul Aziz Alkathiri

Lodovico Giaretta

Sarunas Girdzijauskas

Magnus Sahlgren

Part of: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021.

Linköping Electronic Conference Proceedings 178:40, p. 373-377

NEALT Proceedings Series 45:40, p. 373-377

Published: 2021-05-21

ISBN: 978-91-7929-614-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is therefore useful for a few large public and private organizations to pool their corpora during training. However, legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares with a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to those of a traditional centralized setting, with a loss of quality as low as 4.3%. Furthermore, the results are up to 54.8% better than those of independent local training.
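To make the gossip learning protocol more concrete, here is a minimal, hypothetical simulation of its general pattern: each node sends its current model to a random peer, and the receiver merges the incoming model with its own and then trains on its private data. Nothing here is taken from the paper. The round-based schedule, the averaging merge rule, and the toy squared-loss "model" that stands in for Word2Vec embeddings are all assumptions, and the names (`local_step`, `merge`, `gossip_round`, `simulate`) are illustrative. Raw data never leaves a node; only model parameters are exchanged.

```python
import random

def local_step(model, local_target, lr=0.1):
    """One gradient step on a toy local loss 0.5 * ||model - local_target||^2.
    (Assumption: stands in for a Word2Vec update on a node's private corpus.)"""
    return [w - lr * (w - t) for w, t in zip(model, local_target)]

def merge(a, b):
    """Hypothetical merge rule: element-wise average of two models."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def gossip_round(models, targets, rng, lr=0.1):
    """Each node sends its model to one random peer; the receiver averages
    the incoming model with its own and then trains on its local data.
    (Simplification: real gossip learning runs asynchronously, not in rounds.)"""
    n = len(models)
    outgoing = [list(m) for m in models]  # snapshot of the models sent this round
    for sender in range(n):
        receiver = rng.choice([i for i in range(n) if i != sender])
        merged = merge(models[receiver], outgoing[sender])
        models[receiver] = local_step(merged, targets[receiver], lr)
    return models

def simulate(targets, rounds=200, seed=0):
    """Run the toy protocol; targets[i] is what node i's private data 'pulls' toward."""
    rng = random.Random(seed)
    models = [[0.0] * len(targets[0]) for _ in targets]
    for _ in range(rounds):
        gossip_round(models, targets, rng)
    return models

final = simulate([[0.0], [10.0]])
print([round(m[0], 2) for m in final])  # prints [4.5, 5.5]
```

In this toy setting, independent local training would leave the two nodes at 0.0 and 10.0, whereas gossiping pulls both toward the shared optimum of 5.0 without either node revealing its data. This mirrors, only loosely, the abstract's comparison between gossip learning and independent local training.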

Keywords

gossip learning, decentralized machine learning, distributed machine learning, NLP, Word2Vec, data privacy
