Re: [libreoffice-users] testing out 2 new large word list English dictionaries.

"Winston C. Yang" <winston -AT- cs.wisc.edu>
Mon, 07 Nov 2011 20:27:34 -0500

On 11/06/2011 07:55 AM, webmaster for Kracked Press Productions wrote:

I would like to see a "simple" working code in Python though. If I ever decided to try to retrain the programming part of my brain, I was told to try Python.
On 11/05/2011 11:05 PM, Winston C. Yang wrote:
webmaster:
Possibly a convenient language for comparing the word lists would be Python.
--- Python has a data structure "dict" (dictionary, hash table, associative array).
--- Python has a data structure "set".

If you wish, I can email you short, working, example code.

Winston

On 11/05/2011 10:08 PM, webmaster for Kracked Press Productions wrote:
If I still could remember my basic C programming, I would write a program comparing the different word lists to see which words are not common, but after 3 strokes I have not programmed such a package in many years. Actually a few months after the last stroke.



webmaster:

Below is some example, elementary Python code that reads two files, with one word per line, and writes an output file with the words that are in exactly one of the files.


If you wish, you can use the code in LibreOffice.

If you have any comments or questions, email me.

Winston

It may help to see the results first. Then, if you are interested, you can read about how to generate them.


On a command line, you will type the following:

    python3.2 find_nonshared_words.py

(You can also type "python2.7" instead of "python3.2". But realize that Python 3.2 is not always backwards-compatible with Python 2.7.)


This command will generate the following output file:

output_file.txt:
a1
a2
a3
a4
a5
a6
b1
b2
b3
b4

Below, I show you how to create the results:

Create the following two input files. (Words starting with "a" appear in only file 1. Words starting with "b" appear in only file 2. Words starting with "c" appear in both files, and should be ignored by the code.)


input_file1.txt:
a1
a2
c1
a3
a4
c2
a5
c3
a6

input_file2.txt:
c1
b1
b2
b3
c2
c3
b4
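
If you prefer not to type the two input files by hand, the following is a minimal sketch that writes them for you. The helper name write_word_list and the two list names are my own choices for illustration; only the file names and the words come from the listings above.

```python
def write_word_list(file_name, words):
    """Write the given words to file_name, one word per line."""
    with open(file_name, "w") as output_file:
        for word in words:
            output_file.write(word + "\n")


# Same words, in the same order, as the two listings above.
FILE1_WORDS = ["a1", "a2", "c1", "a3", "a4", "c2", "a5", "c3", "a6"]
FILE2_WORDS = ["c1", "b1", "b2", "b3", "c2", "c3", "b4"]

if __name__ == "__main__":
    write_word_list("input_file1.txt", FILE1_WORDS)
    write_word_list("input_file2.txt", FILE2_WORDS)
```

Run it once, in the same directory where you will run find_nonshared_words.py.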

Then create a file called find_nonshared_words.py:

def create_set_from_file(input_file_name):
    """Return the set of words in the file, which has one word per line."""
    s = set()
    # The "with" statement closes the file, even if an error occurs.
    with open(input_file_name) as input_file:
        for line in input_file:
            # Delete any leading or trailing whitespace.
            word = line.strip()
            # Skip blank lines, so that the set has no empty "word".
            if word:
                s.add(word)
    return s

set1 = create_set_from_file("input_file1.txt")
set2 = create_set_from_file("input_file2.txt")
# The symmetric difference has the words that are in exactly one set.
set_of_words_in_exactly_one_file = set1.symmetric_difference(set2)

output_file_name = "output_file.txt"
with open(output_file_name, "w") as output_file:
    for word in sorted(set_of_words_in_exactly_one_file):
        output_file.write(word + "\n")
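
If you also want to know which file each non-shared word came from, here is a minimal sketch that uses set difference and set intersection. To keep it self-contained, it uses the example words directly instead of reading the files; the variable names are only for illustration.

```python
# The example words from input_file1.txt and input_file2.txt.
words_in_file1 = {"a1", "a2", "c1", "a3", "a4", "c2", "a5", "c3", "a6"}
words_in_file2 = {"c1", "b1", "b2", "b3", "c2", "c3", "b4"}

# "-" is set difference: the words in the left set but not the right set.
only_in_file1 = sorted(words_in_file1 - words_in_file2)
only_in_file2 = sorted(words_in_file2 - words_in_file1)

# "&" is set intersection: the words that are in both sets.
in_both_files = sorted(words_in_file1 & words_in_file2)

print("Only in file 1:", only_in_file1)
print("Only in file 2:", only_in_file2)
print("In both files:", in_both_files)
```

For real word lists, you could build the two sets from the files, one word per line, instead of typing the words in.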

