What is LexVec?

LexVec is an advanced word embedding method that stands out in a variety of natural language processing tasks by factorizing the Positive Pointwise Mutual Information (PPMI) matrix using stochastic gradient descent. This approach places a stronger emphasis on penalizing errors that involve frequent co-occurrences while also taking into account negative co-occurrences. Pre-trained vectors are readily available, which include an extensive common crawl dataset comprising 58 billion tokens and 2 million words represented across 300 dimensions, along with a dataset from English Wikipedia 2015 and NewsCrawl that features 7 billion tokens and 368,999 words in the same dimensionality. Evaluations have shown that LexVec performs on par with or even exceeds the capabilities of other models like word2vec, especially in tasks related to word similarity and analogy testing. The implementation of this project is open-source and is distributed under the MIT License, making it accessible on GitHub and promoting greater collaboration and usage within the research community. The substantial availability of these resources plays a crucial role in propelling advancements in the field of natural language processing, thereby encouraging innovation and exploration among researchers. Moreover, the community-driven approach fosters dialogue and collaboration that can lead to even more breakthroughs in language technology.

Pricing

Price Starts At:
Free
Free Version:
Free Version available.

Integrations

No integrations listed.

Screenshots and Video

LexVec Screenshot 1

Company Facts

Company Name:
Alexandre Salle
Company Location:
Brazil
Company Website:
github.com/alexandres/lexvec

Product Details

Deployment
SaaS
Windows
Mac
Linux
On-Prem
Training Options
Documentation Hub
Support
Web-Based Support

Product Details

Target Company Sizes
Individual
1-10
11-50
51-200
201-500
501-1000
1001-5000
5001-10000
10001+
Target Organization Types
Mid Size Business
Small Business
Enterprise
Freelance
Nonprofit
Government
Startup
Supported Languages
English

LexVec Categories and Features