

Hello,

In a talk at the PyData Berlin meetup I saw this project: https://github.com/lusy/hora-de-decir-bye-bye , where Spanish articles are scraped and searched for English words. To identify English words, the author used the dictionaries from Open Office and compared the scraped words against them. She mentioned the problem that not all words were in the dictionaries.

So I thought this approach could be turned around to find (or at least help find) words that are missing from the dictionaries of any language. One could scrape, e.g., all Wikipedia articles in a given language and build a candidate list of missing words. It could also be used to find domain-specific words by scraping scientific articles, articles from certain types of websites, and so on.
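A minimal sketch of what I have in mind (the function names are my own, and it assumes a plain word-per-line wordlist such as the ones behind the Open Office / Hunspell dictionaries, with optional affix flags like "word/ABC"):

```python
import re
from collections import Counter

def load_wordlist(path):
    """Load a word-per-line dictionary file into a lowercase set,
    stripping Hunspell-style affix flags such as 'word/ABC'."""
    with open(path, encoding="utf-8") as f:
        return {line.split("/")[0].strip().lower()
                for line in f if line.strip()}

def missing_word_candidates(text, known_words):
    """Count tokens from scraped text that are absent from the
    dictionary; frequent candidates are the most likely to be
    genuinely missing words rather than typos."""
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return Counter(t for t in tokens if t not in known_words)
```

Sorting the resulting counter by frequency (`candidates.most_common()`) should push real missing vocabulary to the top, while one-off scraping noise and typos stay in the long tail.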

My question is whether this would be helpful at all, or whether missing words in dictionaries are no longer a problem. Also, I unfortunately don't have much spare time to work on this at the moment, so if anyone wants to pick it up, feel free to do so. I will let you know once I have implemented something myself.

I'm looking forward to your feedback.

Cheers,

Andrej



