- 
                Notifications
    You must be signed in to change notification settings 
- Fork 53
Homework1
DUE: January 25, 2013 by 1pm
This homework asks you to write several Scala programs. The problems are of more or less increasing complexity, so you will probably be best off solving them in order.
IMPORTANT: You must use only immutable data structures and you must only use val variables (not var variables). The one exception is that Arrays (which are mutable) may be used when they are what Scala provides in some contexts (e.g. as command line arguments and as the return value for certain common functions).
The tutorials that were assigned should be very helpful for you. You can also check out the other resources on the course links page. There may be some concepts in this homework that have not been covered in the tutorials or in class. I have provided hints and guidelines for you to be able to use them, but you should not feel shy to ask for help. A big part of programming is trying to figure out some new things based on what you already know, so give it a try, but at the end of the day, I’m here to help smooth things out for you.
Tip: use the Scala REPL to try out code and see what happens immediately. This will make things go much faster as you come up with solutions.
If any of the instructions don’t make sense to you, please get in touch with the instructor right away.
Note: The following instructions assume you know how to do a number of things, including using git, etc. Given this terseness, please don't hesitate to ask for help on the class mailing list if you need any help with setup! (It isn't a test of what you know or don't know.)
Your starting point is a set of stub programs that are available in the course github repository. Since this is the first homework, you'll need to start by obtaining a GitHub account (if you don't already have one) and cloning the repository.
$ git clone
This will give you a directory called applied-nlp, and the stub programs for this first homework are in the sub-directory applied-nlp/homework/hw1. These are incomplete implementations that handle a few things for you, and have comments embedded in them to help you solve the problems.
You can now enter that directory and solve the homework problems by editing and running the appropriate files in it.
Note: If you have the ability to create private GitHub repositories, feel free to fork applied-nlp rather than clone it. No solutions should be included in your
Your submission will be a zipped up file with your implementations for each problem in it. Submit it on Blackboard as <lastname>_<firstname>_hw1.tgz.
$ tar czf baldridge_jason.tgz baldridge_jason_hw1
NOTE: If you do not submit things in this way, I'll ask you to resubmit and take one full grade off (e.g. an A becomes a B). (It may seem like a little thing, but having solutions submitted in this way saves me a lot of time because I use automatic scripts that check the correctness of your code, and it requires having things structured this way.)
Tip: Make sure your programs run as requested in each problem and that the output is consistent with the example output I give. This makes it much easier for me to check whether your solution is correct based on the output. Keep in mind that I will be providing new, novel inputs to your programs and these should be handled correctly.
IMPORTANT: REMEMBER THAT YOU CANNOT USE var VARIABLES FOR ANY OF THESE PROBLEMS! This will force you to quickly pick up several key concepts for functional programming in Scala. I will check your solutions for the use of varand knock one letter grade off if any are present in any solution.
For this problem, you will write a program that takes two numbers on the command line and manipulates them in various ways. Look at the Scala stub program variables.scala. It takes two arguments from the command line and initializes two integer (Int) variables called num1 and num2 with their values. You will perform operations and comparisons on these variables for this problem.
(a) Write code that adds the numbers together and prints the result in the format 2 + 5 = 7.
(b) Write code that multiplies the numbers together and prints the result in the format 2 * 5 = 10.
(c) Write an if expression that determines which of the numbers is larger and sets the variable smaller to be the smaller one and the variable larger to be the larger one. (If they are equal, let num1 be the "smaller" value.) Use pairs (Tuples with two elements) when doing this. Print out the smaller value and the larger value, e.g.: Smaller: 2 and Larger: 5.
(d) Write one line of code that adds the numbers together, multiplies that result by the smaller of the two numbers, and then prints the result in the format (2 + 5) * 2 = 14.
Here’s an example of how your program should run:
> scala variables.scala 2 5
2 + 5 = 7
2 * 5 = 10
Smaller: 2
Larger: 5
(2 + 5) * 2 = 14
Look at the Scala stub program variables.scala, using the comments as guidance.
Write a program that takes two numbers and the name of an arithmetical operation on the command line, performs that operation on the two integers and prints the result. You need to handle the operations plus, minus, times, and div. Your program should work as follows:
$ scala calculator.scala 2 plus 5
2 plus 5 = 7.0
$ scala calculator.scala 7 minus 3
7 minus 3 = 4.0
$ scala calculator.scala 2 times 9
2 times 9 = 18.0
$ scala calculator.scala 7 div 3
7 div 3 = 2.3333333333333335
Hint: you will need to convert the numerical arguments, which are Strings, into Ints. Also, you’ll need to convert one of the integers in a division to Double.
Look at the Scala stub program calculator.scala, using the comments as guidance.
Write a program that takes two numbers on the command line, adds all the numbers between them (including the two numbers themselves), and then prints the numbers which are being added and the result. The program should run as follows:
> scala addRange.scala 2 4
2 + 3 + 4 = 9
The first number must be smaller than the second. Your program should print out a warning and exit if the second is smaller, as follows:
> scala addRange.scala 4 2
Please make sure that the first number is smaller than the second.
Hint: use System.exit(0) to stop executing the program.
Look at the Scala stub program addRange.scala, using the comments as guidance.
It just wouldn’t be an introductory programming homework without this one. Write a program that takes a number n on the command line and computes its factorial, "n!". Your solution must use recursion.
The program should run as follows:
$ scala factorial.scala 3
3! = 6
The number given must be a positive integer greater than or equal to zero. If it isn’t, print out a warning and exit, as follows:
$ scala factorial.scala -1
Please supply a number greater than or equal to 0.
Note that while any positive integer is a valid argument to factorial, anything above 20 is likely to return 0 on your machine -- this is a limitation on the size of the numbers that can be represented in 64 bit machines. It’s fine to test your program with numbers less than 12.
Look at the Scala stub program factorial.scala, using the comments as guidance.
For this problem, you will write a program that takes any number of country names as command line arguments and does various things with them, as described below.
Look at the Scala stub program countries.scala. There are two lists already defined in the program, one containing the names of North American countries and the other containing the names of South American countries. Also, there are some print statements like println("Part (a)") – make sure not to delete those.
Be sure to look at the example output given below before starting on the problem – it will help explain the output that is expected from each part.
(a) Write code that creates a sorted list called countries from the arguments on the command line and then prints them.
Hint: use the sorted method on lists.
(b) Write code that checks whether each country in countries is in South America, North America, or is a country we know nothing about (unknown), and prints out their status in reverse alphabetical order.
Hint: use the contains and reverse methods on lists.
(c) Print out the number of unknown countries, making sure to respect English agreement (e.g. was 1 country vs were 2 countries).
Here’s an example of how your program should work on some example input:
$ scala countries.scala Brazil Scotland USA India
Part (a)
Considering: Brazil India Scotland USA
Part (b)
USA: North America
Scotland: ???
India: ???
Brazil: South America
Part (c)
There were 2 unknown countries.
For this problem, you need to process a string from the command line, and do some simple transformations using Lists. Modify the stub file lengthSort.scala.
(a) Split the string from the command line (which you get as args(0)) and sort it. The sorting order is first according to the length of each token and then alphabetically by the token (so "to" is before "and", which is before "the"). Print out the list in this sorted order as a single line, with each token converted to the format word:length, e.g. the:3. (See example output below.)
Tip: The split method on Strings returns an Array. Use toList on the Array to convert it to a List, e.g. "a:b:c:d".split(":").toList.
(b) Working from the sorted list you obtained in (a), now output the list using a similar format, but remix it so that you have:
- the first group of elements are the tokens of length 3 or less, in ascending order
- the next group of elements are the tokens of length greater than 3 and less than 11, in descending order
- the last group of elements are the tokens of length 11 or more, in ascending order
See the example output for guidance on formatting, which is similar to (a).
Example output.
$ scala lengthSort.scala "Introduction to Computational Linguistics introduces the most important data structures and algorithmic techniques underlying computational linguistics"
Sorted: to:2 and:3 the:3 data:4 most:4 important:9 introduces:10 structures:10 techniques:10 underlying:10 Linguistics:11 algorithmic:11 linguistics:11 Introduction:12 Computational:13 computational:13
Remixed: to:2 and:3 the:3 underlying:10 techniques:10 structures:10 introduces:10 important:9 most:4 data:4 Linguistics:11 algorithmic:11 linguistics:11 Introduction:12 Computational:13 computational:13
Hint: You’ll want to use pairs (Tuple2) for a number of reasons. One of the biggest advantages is that they sort first on the first element and then on the second element.
Note: You must make sure to enclose the argument to lengthSort.scala in quotes so that the entire argument is available in the first element of the args array.
Here is a text passage from H.G.Wells, The War of the Worlds (from Project Gutenberg, http://www.gutenberg.org/etext/36):
After the glimpse I had had of the Martians emerging from the cylinder in which they had come to the earth from their planet, a kind of fascination paralysed my actions. I remained standing knee-deep in the heather, staring at the mound that hid them. I was a battleground of fear and curiosity.
I did not dare to go back towards the pit, but I felt a passionate longing to peer into it. I began walking, therefore, in a big curve, seeking some point of vantage and continually looking at the sand heaps that hid these new-comers to our earth. Once a leash of thin black whips, like the arms of an octopus, flashed across the sunset and was immediately withdrawn, and afterwards a thin rod rose up, joint by joint, bearing at its apex a circular disk that spun with a wobbling motion. What could be going on there?
For this problem, you will perform series of manipulations and operations on this passage. Modify the stub program wotw.scala, looking at the comments in it for guidance. See the output at the end of this question for guidance on what your program’s output should be.
Note: even though this problem is a bit lengthy in its description, the Scala solutions themselves can be quite short.
(a) In the passage, how many word tokens are there that end with the letters "ing"? (Note that we are looking for words ending in "ing", not just occurrences of the letter sequence "ing" anywhere in a word. You do not need to deal with punctuation. Just count only the words that actually end in "ing", ignoring those that end in "ing," or "ing.".) Print it out as specified in the output below.
Hint: Use the String method split to split a string of text into individual words (and split on sequences of one or more whitespace characters), the String method endsWith to test whether a string ends with a particular substring, and the List method count to find how many words pass that test.
Note: The result of split is an Array, not a List. To get a List, call the method toList, e.g. "a:b:c:d".split(":").toList.
(b) Make a list called chopped that contains all the words from the above passage, but with the last two letters cut off. Print out a string that concatenates the chopped tokens using a space as the separator, as below. (This is a very crude first stab at making a stemmer, a program that cuts off suffixes and leaves only word stems. We will see improved implementations later.)
Hint: Do this using map on the result of splitting the string from part (a). Also, recall the slice method, which excises a sublist from a List, e.g.:
scala> val foo = List(3,4,0,32,3,9,13)
foo: List[Int] = List(3, 4, 0, 32, 3, 9, 13)
scala> foo.slice(2,5)
res21: List[Int] = List(0, 32, 3)
Strings are also a List-like data structure (they are sequences of characters) and the slice method works for them too.
(c ) A common item of interest when we get to language models is the bigram word probability of one word given that another precedes it, such as the probability of seeing the word "the" next given that you’ve just seen the word "at" in a sentence like "I’ll see you at … " This value is expressed as P(the|at). At its most basic level, this is a simple thing to compute. Given a corpus of texts, the probability is calculated as:
P(the|at) = count(at,the) / count(at)
In words: out of all the times I’ve seen the word "at", how many times was it followed by the word "the"? Write the code to compute this from the WotW passage, and print out the result, as below.
Hint: There is a very useful function on Lists called sliding, which gives you a sliding window of the values in the list. This is easy to see in an example:
scala> "a b c d".split(" ").sliding(2).toList
res0: List[Array[java.lang.String]] = List(Array(a, b), Array(b, c), Array(c, d))
The argument to sliding is an Int that specifies how many items will be in each window. With such a list of two-element Arrays in hand, you can now use the count function with an appropriately defined predication to compute the numerator of the probability.
Goal output: (with ### substituted for actual numerical values)
$ scala wotw.scala
Part (a)
Number of words ending in 'ing': ###
Part (b)
Aft t glimp  h h  t Martia emergi fr t cylind  whi th h co  t ear fr the plane  ki  fascinati paralys  action  remain standi knee-de  t heathe stari  t mou th h the  w  battlegrou  fe a curiosit  d n da   ba towar t pi b  fe  passiona longi  pe in i  beg walkin therefor   b curv seeki so poi  vanta a continual looki  t sa hea th h the new-come  o eart On  lea  th bla whip li t ar   octopu flash acro t suns a w immediate withdraw a afterwar  th r ro u joi  join beari  i ap  circul di th sp wi  wobbli motio Wh cou  goi  ther
Part (c)
P(the|at) = ###
For this problem, you will write a program that takes a list of names and room numbers on the command line, adds them to entries from a Map relating people to their rooms, and then creates a reverse Map that relates rooms to the people who are in them.
Look at the stub program rooms.scala. A Map named defaultRoomNumbers has already been defined with a few entries. As in previous problems, there are some print statements like print "Part (a)" – make sure not to delete those.
(a) The arguments on the command line are to be expected as a list of (person number) pairs, like Bill 315 Angela 412 Susan 325. Get this information from the command line (making sure to check that an even number of arguments has been provided), and use it to add create a new Map that contains both these person to room pairings and those in the defaultRoomNumbers Map. Then print out who is in which room, alphabetically by name.
Hint: use the mod function % to find out whether there are an even number of arguments. (Try 5 % 2 and 4 % 2 in the Scala REPL and you’ll see what it does.)
(b) Some people might share the same room. Create a new map that maps rooms to the people in them, so it will have room numbers as keys, and lists of people as values.
Here’s an example of how your program should work on some example input:
$ scala rooms.scala Bill 315 Angela 412 Susan 325
Part (a)
Angela: Room 412
Bill: Room 315
Jane: Room 312
Sam: Room 312
Susan: Room 325
Ted: Room 325
Part (b)
Room 312: Sam,Jane
Room 315: Bill
Room 325: Susan,Ted
Room 412: Angela
For this problem you’ll create a very simple machine translator that looks at each word in a source language sentence and maps it to a unique word in a target language. This will involve working with the file system and with Maps. Use the stub file translate.scala for this.
You are given a (very) small translation dictionary in the file por2eng.txt. You must read this in from the command line (as the first argument) to create a Map that maps Portuguese words to the English words, as specified in por2eng.txt. Then, using that Map, you must translate any remaining arguments on the command line (which you can expect to be Portuguese sentences) and translate them one by one to English.
If you are given a Portuguese word that is not in the dictionary, it should "translate" it by converting it to uppercase, e.g. 'senhora' becomes "SENHORA".
Here is an example call to translate.scala and the output it should produce for the given input.
$ scala translate.scala por2eng.txt "a menina ouviu o cachorro ontem" "o cachorro viu o gato hoje"
Portuguese: a menina ouviu o cachorro ontem
English:    the girl heard the dog yesterday
Portuguese: o cachorro viu o gato hoje
English:    the dog saw the cat today