
First steps in Scala for beginning programmers, Part 3

This is the third part of a series of tutorials for beginning programmers getting into Scala. The other posts are on this blog, and you can find links to them and to other resources on the links page of the Computational Linguistics course I’m creating them for.

Conditionals

Variables come and variables go, and they take on different values depending on the input. We typically need to enact different behaviors conditioned on those values. For example, let’s simulate a bartender in Austin who must make sure that he doesn’t give alcohol to anyone under 21 years of age.

scala> def serveBeer (customerAge: Int) = if (customerAge >= 21) println("beer") else println("water")
serveBeer: (customerAge: Int)Unit

scala> serveBeer(23)
beer

scala> serveBeer(19)
water

What we’ve done here is a standard use of conditionals to produce one action or another; in this case, it just prints one message or the other. The expression in if (...) is a Boolean value, either true or false. You can see this by evaluating the inequality directly:

scala> 19 >= 21
res7: Boolean = false

These expressions can be combined according to the standard rules for conjunction and disjunction of Booleans. Conjunction is written with && and disjunction with ||.

scala> 19 >= 21 || 5 > 2
res8: Boolean = true

scala> 19 >= 21 && 5 > 2
res9: Boolean = false
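
As a small sketch of how such combinations can be packaged up, suppose the bartender also checks for an ID. The names canServeAlcohol, mustRefuse and hasId are made up for this illustration and are not part of the examples above.

```scala
// Hypothetical helper: alcohol is served only to customers who are
// 21 or older AND carry an ID (conjunction with &&).
def canServeAlcohol(customerAge: Int, hasId: Boolean): Boolean =
  customerAge >= 21 && hasId

// Negation with ! flips a Boolean, so this is true whenever the check fails.
def mustRefuse(customerAge: Int, hasId: Boolean): Boolean =
  !canServeAlcohol(customerAge, hasId)
```

Because both functions return a Boolean, their results can be dropped straight into the if (...) of a conditional.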

To check equality, use ==.

scala> 42 == 42
res10: Boolean = true

scala> "the" == "the"
res11: Boolean = true

scala> 3.14 == 6.28
res12: Boolean = false

scala> 2*3.14 == 6.28
res13: Boolean = true

scala> "there" == "the" + "re"
res14: Boolean = true
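
One caution about == with Doubles: the 2*3.14 comparison above happens to come out exactly, but floating-point arithmetic often leaves tiny rounding differences. The sketch below shows one common workaround; the helper approxEqual and its default tolerance are assumptions chosen for illustration.

```scala
// Hypothetical helper: treat two Doubles as equal when they differ by
// less than a small tolerance (the default of 1e-9 is an arbitrary choice).
def approxEqual(a: Double, b: Double, tolerance: Double = 1e-9): Boolean =
  math.abs(a - b) < tolerance

val exact = 0.1 + 0.2 == 0.3             // false: the sum is not exactly 0.3
val close = approxEqual(0.1 + 0.2, 0.3)  // true: the difference is tiny
```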

The equality operator == is different from the assignment operator =, and you’ll get an error if you try to use = to test for equality.

scala> 5 = 5
<console>:1: error: ';' expected but '=' found.
5 = 5
^

scala> x = 5
<console>:10: error: not found: value x
val synthvar$0 = x
^
<console>:7: error: not found: value x
x = 5
^

The first example is simply invalid, because we cannot assign a value to a constant like 5. In the second example, the error complains that no value x can be found. That’s because x = 5 is a valid construct, provided that a var variable x has been defined beforehand.

scala> var x = 0
x: Int = 0

scala> x = 5
x: Int = 5

Recall that a var variable can be assigned a new value. However, most of the time it is not actually necessary to use vars, and there are many advantages to sticking with vals. I’ll be helping you think in these terms as we go along. For now, try to ignore the fact that vars exist in the language!

Back to conditionals. First, here are more comparison operators:

x == y   (x is equal to y)
x != y    (x does not equal y)
x > y     (x is larger than y)
x < y     (x is less than y)
x >= y   (x is equal to y, or larger than y)
x <= y   (x is equal to y, or less than y)

These operators work on any type that has a natural ordering, including Strings.

scala> "armadillo" < "bear"
res25: Boolean = true

scala> "armadillo" < "Bear"
res26: Boolean = false

scala> "Armadillo" < "Bear"
res27: Boolean = true

Clearly, this isn’t the usual alphabetic ordering you are used to. Instead, it is based on the numeric character codes (ASCII, for these letters), in which every uppercase letter comes before every lowercase letter.
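
If the familiar alphabetic ordering is what you want, one simple option is to lowercase both strings before comparing them. This is only a sketch; the val names are made up for illustration.

```scala
// Characters are compared by their numeric codes, and the codes of
// uppercase letters are smaller than those of lowercase letters.
val codeOfUpperB = 'B'.toInt   // 66
val codeOfLowerA = 'a'.toInt   // 97

// Comparing the raw strings follows those codes ...
val caseSensitive = "armadillo" < "Bear"                            // false
// ... while lowercasing both sides first gives the alphabetic answer.
val caseInsensitive = "armadillo".toLowerCase < "Bear".toLowerCase  // true
```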

A very beautiful and useful thing about conditionals in Scala is that they return a value. So, the following is a valid way to set the values of the variables x and y.

scala> val x = if (true) 1 else 0
x: Int = 1

scala> val y = if (false) 1 else 0
y: Int = 0

Not so impressive here, but let’s return to the bartender, and rather than the serveBeer function printing a String, we can have it return a String representing a beverage, “beer” in the case of a 21+ year old and “water” otherwise.

scala> def serveBeer (customerAge: Int) = if (customerAge >= 21) "beer" else "water"
serveBeer: (customerAge: Int)java.lang.String

scala> serveBeer(42)
res21: java.lang.String = beer

scala> serveBeer(20)
res22: java.lang.String = water

Notice how the first serveBeer function returned Unit but this one returns a String. Unit means that no value is returned — in general this is to be discouraged for reasons we won’t get into here. Regardless of that, the general pattern of conditional assignment shown above is something you’ll be using a lot.
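
To see why conditional assignment pairs so well with vals, here is a sketch that contrasts it with the var-based style it replaces. The function names serveWithVar and serveWithVal are made up for this comparison.

```scala
// Statement style: a var starts with a default value and is
// overwritten inside the if.
def serveWithVar(customerAge: Int): String = {
  var drink = "water"
  if (customerAge >= 21) drink = "beer"
  drink
}

// Expression style: the if-else itself produces the value, so a val
// suffices and the variable is never reassigned.
def serveWithVal(customerAge: Int): String = {
  val drink = if (customerAge >= 21) "beer" else "water"
  drink
}
```

Both functions return the same results; the second simply never needs to change a variable after it has been set.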

Conditionals can also have more than just the single if and else. For example, let’s say that the bartender simply serves age-appropriate drinks to each customer: those 21 and over get beer, teenagers get soda, and little kids get juice.

scala> def serveDrink (customerAge: Int) = {
|     if (customerAge >= 21) "beer"
|     else if (customerAge >= 13) "soda"
|     else "juice"
| }
serveDrink: (customerAge: Int)java.lang.String

scala> serveDrink(42)
res35: java.lang.String = beer

scala> serveDrink(16)
res36: java.lang.String = soda

scala> serveDrink(6)
res37: java.lang.String = juice
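
The branches are tried from top to bottom, and the first condition that is true wins, so their order matters. The deliberately wrong variant below (serveDrinkMisordered is a made-up name) sketches what happens when the broader test comes first.

```scala
// Deliberately misordered: customerAge >= 13 is also true for everyone
// who is 21 or older, so the "beer" branch can never be reached.
def serveDrinkMisordered(customerAge: Int) = {
  if (customerAge >= 13) "soda"
  else if (customerAge >= 21) "beer"
  else "juice"
}

val drinkFor42 = serveDrinkMisordered(42)  // "soda", not "beer"
```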

And of course, the Boolean expressions in any of the ifs or else ifs can be complex conjunctions and disjunctions of smaller expressions. Let’s consider a computational linguistics oriented example now that can take advantage of that, and which we will continue to build on in later tutorials.

Everybody (hopefully) knows what a part-of-speech is. (If not, go check out Grammar Rock on YouTube.) In computational linguistics, we tend to use very detailed tagsets that go far beyond “noun”, “verb”, “adjective” and so on. For example, the tagset from the Penn Treebank uses NN for singular nouns (table), NNS for plural nouns (tables), NNP for singular proper nouns (John), and NNPS for plural proper nouns (Vikings).

Here’s an annotated sentence with postags from the first sentence of the Wall Street Journal portion of the Penn Treebank, in the format word/postag.

The/DT index/NN of/IN the/DT 100/CD largest/JJS Nasdaq/NNP financial/JJ stocks/NNS rose/VBD modestly/RB as/IN well/RB ./.

We’ll see how to process these en masse shortly, but for now, let’s build a function that turns single tags like “NNP” into “NN” and “JJS” into “JJ”, using conditionals. We’ll let all the other postags stay as they are.

We’ll start with a suboptimal solution, and then refine it. The first thing you might try is to create a case for every full form tag and output its corresponding shortened tag.

scala> def shortenPos (tag: String) = {
|     if (tag == "NN") "NN"
|     else if (tag == "NNS") "NN"
|     else if (tag == "NNP") "NN"
|     else if (tag == "NNPS") "NN"
|     else if (tag == "JJ") "JJ"
|     else if (tag == "JJR") "JJ"
|     else if (tag == "JJS") "JJ"
|     else tag
| }
shortenPos: (tag: String)java.lang.String

scala> shortenPos("NNP")
res47: java.lang.String = NN

scala> shortenPos("JJS")
res48: java.lang.String = JJ

So, it’s doing the job, but there is a lot of redundancy — in particular, the return value is the same for many cases. We can use disjunctions to deal with this.

def shortenPos2 (tag: String) = {
  if (tag == "NN" || tag == "NNS" || tag == "NNP" || tag == "NNPS") "NN"
  else if (tag == "JJ" || tag == "JJR" || tag == "JJS") "JJ"
  else tag
}

The functions shortenPos and shortenPos2 are logically equivalent.

There is an easier way of doing this, using properties of Strings. Here, the startsWith method is very useful.

scala> "NNP".startsWith("NN")
res51: Boolean = true

scala> "NNP".startsWith("VB")
res52: Boolean = false

We can use this to simplify the postag shortening function.

def shortenPos3 (tag: String) = {
  if (tag.startsWith("NN")) "NN"
  else if (tag.startsWith("JJ")) "JJ"
  else tag
}

This makes it very easy to add an additional condition that collapses all of the verb tags to “VB”. (Left as an exercise.)

A final note on conditional assignments: they can return anything you like, including a Tuple. For example, here is a (very) simple (and very imperfect) English stemmer that returns the stem and suffix.

scala> def splitWord (word: String) = {
|     if (word.endsWith("ing")) (word.slice(0,word.length-3), "ing")
|     else if (word.endsWith("ed")) (word.slice(0,word.length-2), "ed")
|     else if (word.endsWith("er")) (word.slice(0,word.length-2), "er")
|     else if (word.endsWith("s")) (word.slice(0,word.length-1), "s")
|     else (word,"")
| }
splitWord: (word: String)(String, java.lang.String)

scala> splitWord("walked")
res10: (String, java.lang.String) = (walk,ed)

scala> splitWord("walking")
res11: (String, java.lang.String) = (walk,ing)

scala> splitWord("booking")
res12: (String, java.lang.String) = (book,ing)

scala> splitWord("baking")
res13: (String, java.lang.String) = (bak,ing)

If we want to work with the stem and suffix directly as variables, we can assign them straight away.

scala> val (stem, suffix) = splitWord("walked")
stem: String = walk
suffix: java.lang.String = ed
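
For comparison, the parts of a Tuple can also be read by position with _1 and _2. The sketch below uses a literal pair standing in for the result of splitWord("walked"); all val names are made up for illustration.

```scala
// A pair like the one splitWord("walked") returns above.
val result = ("walk", "ed")

// Positional access: _1 is the first element, _2 the second.
val stemByPosition = result._1
val suffixByPosition = result._2

// Assigning both parts at once gives them meaningful names in one step.
val (stemPart, suffixPart) = result
```

Both approaches give the same values; the one-step assignment tends to read better because the names say what each part means.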

Matching

Scala provides another very powerful way to encode conditional execution, called matching. Match blocks have much in common with if-else blocks, but come with some nice extra features. We’ll go back to the postag shortener, starting with a full listing of the tags and what to do in each case, like our first attempt with if-else.

def shortenPosMatch (tag: String) = tag match {
  case "NN" => "NN"
  case "NNS" => "NN"
  case "NNP" => "NN"
  case "NNPS" => "NN"
  case "JJ" => "JJ"
  case "JJR" => "JJ"
  case "JJS" => "JJ"
  case _ => tag
}

scala> shortenPosMatch("JJR")
res14: java.lang.String = JJ

Note that the last case, with the underscore “_” is the default action to take, similar to the “else” at the end of an if-else block.

Compare this to the if-else function shortenPos from before, whose definition repeated “else if (tag ==” on nearly every line. Match statements allow you to do the same thing much more concisely and, arguably, much more clearly. Of course, we can shorten this further.

def shortenPosMatch2 (tag: String) = tag match {
  case "NN" | "NNS" | "NNP" | "NNPS" => "NN"
  case "JJ" | "JJR" | "JJS" => "JJ"
  case _ => tag
}

This is quite a bit more readable than the if-else shortenPos2 defined earlier.

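In addition to readability, match statements provide some logical protection. For example, if you accidentally write a case that can never be reached because an earlier case already covers it, the compiler flags it. In the Scala version used here that is an error; newer versions may report it as a warning instead.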

scala> def shortenPosMatchOops (tag: String) = tag match {
|   case "NN" | "NNS" | "NNP" | "NNPS" => "NN"
|   case "JJ" | "JJR" | "JJS" => "JJ"
|   case "NN" => "oops"
|   case _ => tag
| }
<console>:10: error: unreachable code
case "NN" => "oops"

This is an obvious example, but with more complex match options, it can save you from bugs!

We cannot use the startsWith method the same way we did with the if-else shortenPos3. However, we can use regular expressions very nicely with match statements, which we’ll get to in a later tutorial.
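
Although a case cannot call startsWith as a pattern by itself, one option that does work is a pattern guard: an if condition attached to a case. The sketch below is one way to write it, with shortenPosGuard as a made-up name; it is not the regular expression approach mentioned above.

```scala
// Each case binds the tag to a new variable t and only applies when
// the guard after "if" is true.
def shortenPosGuard(tag: String) = tag match {
  case t if t.startsWith("NN") => "NN"
  case t if t.startsWith("JJ") => "JJ"
  case _ => tag
}
```

This behaves like shortenPos3, although with guards the match reads much like the if-else chain it replaces.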

Where match statements really shine is that they can match on much more than just the value of simple variables like Strings and Ints. One use of matches is to check the type of the input to a function that accepts a supertype of many types. Recall that Any is the supertype of all types; if we have a function that takes an argument of any type, we can use matching to inspect the type of the argument and behave differently accordingly.

scala> def multitypeMatch (x: Any) = x match {
|    case i: Int => "an Int: " + i*i
|    case d: Double => "a Double: " + d/2
|    case b: Boolean => "a Boolean: " + !b
|    case s: String => "a String: " + s.length
|    case (p1: String, p2: Int) => "a Tuple[String, Int]: " + p2*p2 + p1.length
|    case (p1: Any, p2: Any) => "a Tuple[Any, Any]: (" + p1 + "," + p2 + ")"
|    case _ => "some other type " + x
| }
multitypeMatch: (x: Any)java.lang.String

scala> multitypeMatch(true)
res4: java.lang.String = a Boolean: false

scala> multitypeMatch(3)
res5: java.lang.String = an Int: 9

scala> multitypeMatch((1,3))
res6: java.lang.String = a Tuple[Any, Any]: (1,3)

scala> multitypeMatch(("hi",3))
res7: java.lang.String = a Tuple[String, Int]: 92

So, for example, if it is an Int, we can do things like multiplication, if it is a Boolean we can negate it (with !), and so on. In the case statement, we provide a new variable that will have the type that is matched, and then after the arrow =>, we can use that variable in a type safe manner. Later we’ll see how to create classes (and in particular case classes), where this sort of matching based function is used regularly.

In the meantime, here’s an example of a simple addition function that allows one to enter a String or Int to specify its arguments. For example, the behavior we desire is this:

scala> add(1,3)
res4: Int = 4

scala> add("one",3)
res5: Int = 4

scala> add(1,"three")
res6: Int = 4

scala> add("one","three")
res7: Int = 4

Let’s assume that we only handle the spelled-out versions of 1 through 5, and that any string we cannot handle (e.g. “six” and “aardvark”) is considered to be 0. Then the following two functions using matches handle it.

def convertToInt (x: String) = x match {
  case "one" => 1
  case "two" => 2
  case "three" => 3
  case "four" => 4
  case "five" => 5
  case _ => 0
}

def add (x: Any, y: Any) = (x,y) match {
  case (x: Int, y: Int) => x + y
  case (x: String, y: Int) => convertToInt(x) + y
  case (x: Int, y: String) => x + convertToInt(y)
  case (x: String, y: String) => convertToInt(x) + convertToInt(y)
  case _ => 0
}

Like if-else blocks, matches can return whatever type you like, including Tuples, Lists and more.
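
As a sketch of a match that returns a Tuple, the function below gives back the shortened noun tag together with a flag saying whether the original tag was plural. The name nounInfo and the choice of a Boolean flag are assumptions made for this illustration.

```scala
// Hypothetical helper: returns (shortened tag, was the original plural?).
def nounInfo(tag: String): (String, Boolean) = tag match {
  case "NN" | "NNP" => ("NN", false)
  case "NNS" | "NNPS" => ("NN", true)
  case _ => (tag, false)   // other tags are left as they are
}

// The returned pair can be assigned to two variables straight away.
val (shortTag, isPlural) = nounInfo("NNS")
```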

Match blocks are used in many other useful contexts that we’ll come to later. In the meantime, it is also worth pointing out that matching is actually used in variable assignment. We’ve seen it already with Tuples, but it can be done with Lists and other types.

scala> val (x,y) = (1,2)
x: Int = 1
y: Int = 2

scala> val colors = List("blue","red","yellow")
colors: List[java.lang.String] = List(blue, red, yellow)

scala> val List(color1, color2, color3) = colors
color1: java.lang.String = blue
color2: java.lang.String = red
color3: java.lang.String = yellow
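
The List assignment above only succeeds when the list has exactly three elements; with any other length it fails at runtime with a MatchError. When the length is not known in advance, a match block with a default case is one way to stay safe. The function describeColors and its fallback message are made up for this sketch.

```scala
// The first case applies only to lists of exactly three elements;
// every other list falls through to the default case.
def describeColors(colors: List[String]): String = colors match {
  case List(c1, c2, c3) => c1 + ", " + c2 + " and " + c3
  case _ => "expected exactly three colors"
}
```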

This is especially useful in the case of the args Array that comes from the command line when writing a script in Scala. For example, consider a program that is run as follows.

$ scala nextYear.scala John 35
Next year John will be 36 years old.

Here’s how we can do it. (Save the next two lines as nextYear.scala and try it out.)

val Array(name, age) = args
println("Next year " + name + " will be " + (age.toInt + 1) + " years old.")

Notice that we had to do age.toInt. That is because age itself is a String, not an Int.
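
The two-line script assumes that exactly two arguments are given. As a sketch of a more forgiving version, the same Array pattern can be used inside a match with a default case. The function nextYearMessage and the usage text are assumptions made for this illustration.

```scala
// Builds the message when exactly two arguments are present and falls
// back to a usage hint otherwise. Note that age.toInt still fails if
// the second argument is not a number.
def nextYearMessage(args: Array[String]): String = args match {
  case Array(name, age) =>
    "Next year " + name + " will be " + (age.toInt + 1) + " years old."
  case _ =>
    "Usage: scala nextYear.scala <name> <age>"
}
```

In a script, the last line would then simply be println(nextYearMessage(args)).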

Conditional execution with if-else blocks and match blocks is a powerful part of building complex behaviors into your programs that you’ll see and use frequently!

 

From http://bcomposes.wordpress.com/2011/08/26/first-steps-in-scala-for-beginning-programmers-part-3/