

On 11/06/2011 07:55 AM, webmaster for Kracked Press Productions wrote:

I would like to see a "simple" working code in Python though. If I ever decided to try to retrain the programming part of my brain, I was told to try Python.


On 11/05/2011 11:05 PM, Winston C. Yang wrote:
webmaster:

Possibly a convenient language for comparing the word lists would be Python.

--- Python has a data structure "dict" (dictionary, hashtable, associative array).

--- Python has a data structure "set".

If you wish, I can email you short, working, example code.

Winston

On 11/05/2011 10:08 PM, webmaster for Kracked Press Productions wrote:

If I still could remember my basic C programming, I would write a program comparing the different word lists to see which words are not common, but after 3 strokes I have not programmed such a package in many years. Actually a few months after the last stroke.






webmaster:

Below is some elementary example Python code that reads two files, each with one word per line, and writes an output file containing the words that appear in exactly one of the files.

If you wish, you can use the code in LibreOffice.

If you have any comments or questions, email me.

Winston



It may help to see the results first. Then, if you are interested, you can read about how to generate them.

On a command line, you will type the following:

    python3.2 find_nonshared_words.py

(You can also type "python2.7" instead of "python3.2"; this script runs under either. But realize that Python 3.2 is not always backward-compatible with Python 2.7.)

This command will generate the following output file:

output_file.txt:
a1
a2
a3
a4
a5
a6
b1
b2
b3
b4

Below, I show you how to create the results:

Create the following two input files. (Words starting with "a" appear only in file 1. Words starting with "b" appear only in file 2. Words starting with "c" appear in both files, so the code should leave them out of the output.)

input_file1.txt:
a1
a2
c1
a3
a4
c2
a5
c3
a6

input_file2.txt:
c1
b1
b2
b3
c2
c3
b4
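
To see why the "c" words drop out, the same comparison can be tried in memory on the sample words before any files are involved. This is only a minimal sketch: the variable names are made up for illustration, and the words are typed in by hand rather than read from the input files.

```python
# Sample words copied by hand from input_file1.txt and input_file2.txt.
# (The variable names here are illustrative, not part of the script.)
words_in_file1 = {"a1", "a2", "c1", "a3", "a4", "c2", "a5", "c3", "a6"}
words_in_file2 = {"c1", "b1", "b2", "b3", "c2", "c3", "b4"}

# Words that are in exactly one of the two sets. The ^ operator is
# shorthand for the set method symmetric_difference().
nonshared = words_in_file1 ^ words_in_file2

# Sets have no order, so sort before printing.
print(sorted(nonshared))
# → ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'b1', 'b2', 'b3', 'b4']
```

The printed list matches the contents of output_file.txt shown earlier.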

Then create a file called find_nonshared_words.py:

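def create_set_from_file(input_file_name):
    """Return the set of words in a file that has one word per line."""

    s = set()
    # The "with" statement closes the file, even if an error occurs.
    with open(input_file_name) as input_file:
        for line in input_file:
            # Delete any leading or trailing whitespace.
            word = line.strip()
            # Skip blank lines, so that no empty "word" is stored.
            if word:
                s.add(word)

    return s

set1 = create_set_from_file("input_file1.txt")
set2 = create_set_from_file("input_file2.txt")
# The words that are in set1 or in set2, but not in both.
set_of_words_in_exactly_one_file = set1.symmetric_difference(set2)

output_file_name = "output_file.txt"
# Mode "w" creates the file, or overwrites it if it already exists.
with open(output_file_name, "w") as output_file:
    for word in sorted(set_of_words_in_exactly_one_file):
        output_file.write(word + "\n")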
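
The script above mixes the words from both files into one output. If it would be more useful to know which file each non-shared word came from, one possible variation is to use two set differences instead of one symmetric difference. The function name split_nonshared_words below is hypothetical and is not part of the script; this is a sketch of the idea, not a finished tool.

```python
def split_nonshared_words(words1, words2):
    """Return two sorted lists: the words only in words1, and the
    words only in words2. (Hypothetical helper, for illustration.)"""
    set1 = set(words1)
    set2 = set(words2)
    # set1 - set2 keeps the words of set1 that are not in set2.
    return sorted(set1 - set2), sorted(set2 - set1)

# A small example with hand-typed words instead of files.
only_in_1, only_in_2 = split_nonshared_words(
    ["a1", "c1", "a2"], ["c1", "b1"])
print(only_in_1)  # → ['a1', 'a2']
print(only_in_2)  # → ['b1']
```

Taken together, the two lists hold the same words as the symmetric difference, only separated by source.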

