
Real Python: Sets in Python

Perhaps you recall learning about sets and set theory at some point in your mathematical education. Maybe you even remember Venn diagrams:

Venn diagram

If this doesn’t ring a bell, don’t worry! This tutorial should still be easily accessible for you.

In mathematics, a rigorous definition of a set can be abstract and difficult to grasp. Practically though, a set can be thought of simply as a well-defined collection of distinct objects, typically called elements or members.

Grouping objects into a set can be useful in programming as well, and Python provides a built-in set type to do so. Sets are distinguished from other object types by the unique operations that can be performed on them.

Here’s what you’ll learn in this tutorial: You’ll see how to define set objects in Python and discover the operations that they support. As with the earlier tutorials on lists and dictionaries, when you are finished with this tutorial, you should have a good feel for when a set is an appropriate choice. You will also learn about frozen sets, which are similar to sets except for one important detail.

Defining a Set

Python’s built-in set type has the following characteristics:

  • Sets are unordered.
  • Set elements are unique. Duplicate elements are not allowed.
  • A set itself may be modified, but the elements contained in the set must be of an immutable type.

Let’s see what all that means, and how you can work with sets in Python.

A set can be created in two ways. First, you can define a set with the built-in set() function:

x = set(<iter>)

In this case, the argument <iter> is an iterable—again, for the moment, think list or tuple—that generates the list of objects to be included in the set. This is analogous to the <iter> argument given to the .extend() list method:

>>> x = set(['foo', 'bar', 'baz', 'foo', 'qux'])
>>> x
{'qux', 'foo', 'bar', 'baz'}

>>> x = set(('foo', 'bar', 'baz', 'foo', 'qux'))
>>> x
{'qux', 'foo', 'bar', 'baz'}

Strings are also iterable, so a string can be passed to set() as well. You have already seen that list(s) generates a list of the characters in the string s. Similarly, set(s) generates a set of the characters in s:

>>> s = 'quux'

>>> list(s)
['q', 'u', 'u', 'x']
>>> set(s)
{'x', 'u', 'q'}

You can see that the resulting sets are unordered: the original order, as specified in the definition, is not necessarily preserved. Additionally, duplicate values are only represented in the set once, as with the string 'foo' in the first two examples and the letter 'u' in the third.
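A common practical use of this behavior is removing duplicates from a list. The sketch below is only an illustration: the variable names are made up, and the dict.fromkeys() variant is shown as one possible way to keep first-seen order when you need it, since a set alone gives no ordering guarantee.

```python
words = ['foo', 'bar', 'baz', 'foo', 'qux']

# Passing the list to set() drops the duplicate 'foo'.
# The order of the resulting elements is not guaranteed.
unique_words = set(words)

# If the original order matters, dictionary keys are also unique
# and preserve insertion order (Python 3.7+).
ordered_unique = list(dict.fromkeys(words))

print(len(words), len(unique_words))
print(ordered_unique)
```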

Alternately, a set can be defined with curly braces ({}):

x = {<obj>, <obj>, ..., <obj>}

When a set is defined this way, each <obj> becomes a distinct element of the set, even if it is an iterable. This behavior is similar to that of the .append() list method.

Thus, the sets shown above can also be defined like this:

>>> x = {'foo', 'bar', 'baz', 'foo', 'qux'}
>>> x
{'qux', 'foo', 'bar', 'baz'}

>>> x = {'q', 'u', 'u', 'x'}
>>> x
{'x', 'q', 'u'}

To recap:

  • The argument to set() is an iterable. It generates a list of elements to be placed into the set.
  • The objects in curly braces are placed into the set intact, even if they are iterable.

Observe the difference between these two set definitions:

>>> {'foo'}
{'foo'}

>>> set('foo')
{'o', 'f'}

A set can be empty. However, recall that Python interprets empty curly braces ({}) as an empty dictionary, so the only way to define an empty set is with the set() function:

>>> x = set()
>>> type(x)
<class 'set'>
>>> x
set()

>>> x = {}
>>> type(x)
<class 'dict'>

An empty set is falsy in Boolean context:

>>> x = set()
>>> bool(x)
False
>>> x or 1
1
>>> x and 1
set()

You might think the most intuitive sets would contain similar objects—for example, even numbers or surnames:

>>> s1 = {2, 4, 6, 8, 10}
>>> s2 = {'Smith', 'McArthur', 'Wilson', 'Johansson'}

Python does not require this, though. The elements in a set can be objects of different types:

>>> x = {42, 'foo', 3.14159, None}
>>> x
{None, 'foo', 42, 3.14159}

Don’t forget that set elements must be immutable. For example, a tuple may be included in a set:

>>> x = {42, 'foo', (1, 2, 3), 3.14159}
>>> x
{42, 'foo', 3.14159, (1, 2, 3)}

But lists and dictionaries are mutable, so they can’t be set elements:

>>> a = [1, 2, 3]
>>> {a}
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    {a}
TypeError: unhashable type: 'list'

>>> d = {'a': 1, 'b': 2}
>>> {d}
Traceback (most recent call last):
  File "<pyshell#72>", line 1, in <module>
    {d}
TypeError: unhashable type: 'dict'
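Notice that the error messages talk about hashability: a set needs to compute a hash for each element, and mutable built-in containers don't provide one. If you need to store list-like data in a set, one option is to convert each list to a tuple first. The following is a minimal sketch under that assumption; the helper name to_hashable is hypothetical, and the conversion is shallow, so a list nested inside another list would still fail.

```python
def to_hashable(item):
    """Return a tuple copy of a list; pass any other object through unchanged."""
    if isinstance(item, list):
        return tuple(item)
    return item


rows = [[1, 2], [3, 4], [1, 2]]

# Each list becomes a tuple, which is hashable, so it can be a set element.
# The duplicate row collapses into a single element.
unique_rows = {to_hashable(row) for row in rows}

print(len(unique_rows))
```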

Set Size and Membership

The len() function returns the number of elements in a set, and the in and not in operators can be used to test for membership:

>>> x = {'foo', 'bar', 'baz'}

>>> len(x)
3

>>> 'bar' in x
True
>>> 'qux' in x
False
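Membership tests are where sets tend to earn their keep. As a rough illustration, here is one way to report which items occur more than once in a sequence by remembering what has already been seen. The function name find_duplicates is invented for this sketch, and it relies on the .add() method, which adds a single element to a set.

```python
def find_duplicates(items):
    """Return the set of items that appear more than once in items."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:          # membership test against the set
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates


print(find_duplicates(['foo', 'bar', 'foo', 'baz', 'bar']))
```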

Operating on a Set

Many of the operations that can be used for Python’s other composite data types don’t make sense for sets. For example, sets can’t be indexed or sliced. However, Python provides a whole host of operations on set objects that generally mimic the operations that are defined for mathematical sets.

Operators vs. Methods

Most, though not quite all, set operations in Python can be performed in two different ways: by operator or by method. Let’s take a look at how these operators and methods work, using set union as an example.

Given two sets, x1 and x2, the union of x1 and x2 is a set consisting of all elements in either set.

Consider these two sets:

x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}

The union of x1 and x2 is {'foo', 'bar', 'baz', 'qux', 'quux'}.

Note: Notice that the element 'baz', which appears in both x1 and x2, appears only once in the union. Sets never contain duplicate values.

In Python, set union can be performed with the | operator:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}
>>> x1 | x2
{'baz', 'quux', 'qux', 'bar', 'foo'}

Set union can also be obtained with the .union() method. The method is invoked on one of the sets, and the other is passed as an argument:

>>> x1.union(x2)
{'baz', 'quux', 'qux', 'bar', 'foo'}

The way they are used in the examples above, the operator and method behave identically. But there is a subtle difference between them. When you use the | operator, both operands must be sets. The .union() method, on the other hand, will take any iterable as an argument, convert it to a set, and then perform the union.

Observe the difference between these two statements:

>>> x1 | ('baz', 'qux', 'quux')
Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    x1 | ('baz', 'qux', 'quux')
TypeError: unsupported operand type(s) for |: 'set' and 'tuple'

>>> x1.union(('baz', 'qux', 'quux'))
{'baz', 'quux', 'qux', 'bar', 'foo'}

Both attempt to compute the union of x1 and the tuple ('baz', 'qux', 'quux'). This fails with the | operator but succeeds with the .union() method.
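This difference matters when you write a function that should accept whatever collection the caller happens to have. The sketch below assumes a hypothetical helper named merge_tags; it uses .union() so that lists and generators work as arguments, and shows the explicit set() conversion you would need with the | operator.

```python
def merge_tags(existing, new_tags):
    """Return a new set with new_tags added; new_tags may be any iterable."""
    return existing.union(new_tags)


tags = {'foo', 'bar'}

merged_from_list = merge_tags(tags, ['bar', 'baz'])
merged_from_generator = merge_tags(tags, (t.lower() for t in ['QUX']))

# With the operator, the right-hand side must be converted to a set first.
merged_with_operator = tags | set(['bar', 'baz'])

print(merged_from_list == merged_with_operator)
```

Because .union() returns a new set, the original tags set is left unchanged.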

Available Operators and Methods

Below is a list of the set operations available in Python. Some are performed by operator, some by method, and some by both. The principle outlined above generally applies: where a set is expected, methods will typically accept any iterable as an argument, but operators require actual sets as operands.

x1.union(x2[, x3 ...])

x1 | x2 [| x3 ...]

Compute the union of two or more sets.

Set Union

x1.union(x2) and x1 | x2 both return the set of all elements in either x1 or x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}

>>> x1.union(x2)
{'foo', 'qux', 'quux', 'baz', 'bar'}

>>> x1 | x2
{'foo', 'qux', 'quux', 'baz', 'bar'}

More than two sets may be specified with either the operator or the method:

>>> a = {1, 2, 3, 4}
>>> b = {2, 3, 4, 5}
>>> c = {3, 4, 5, 6}
>>> d = {4, 5, 6, 7}

>>> a.union(b, c, d)
{1, 2, 3, 4, 5, 6, 7}

>>> a | b | c | d
{1, 2, 3, 4, 5, 6, 7}

The resulting set contains all elements that are present in any of the specified sets.

x1.intersection(x2[, x3 ...])

x1 & x2 [& x3 ...]

Compute the intersection of two or more sets.

Set Intersection

x1.intersection(x2) and x1 & x2 return the set of elements common to both x1 and x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}

>>> x1.intersection(x2)
{'baz'}

>>> x1 & x2
{'baz'}

You can specify multiple sets with the intersection method and operator, just like you can with set union:

>>> a = {1, 2, 3, 4}
>>> b = {2, 3, 4, 5}
>>> c = {3, 4, 5, 6}
>>> d = {4, 5, 6, 7}

>>> a.intersection(b, c, d)
{4}

>>> a & b & c & d
{4}

The resulting set contains only elements that are present in all of the specified sets.

x1.difference(x2[, x3 ...])

x1 - x2 [- x3 ...]

Compute the difference between two or more sets.

Set Difference

x1.difference(x2) and x1 - x2 return the set of all elements that are in x1 but not in x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}

>>> x1.difference(x2)
{'foo', 'bar'}

>>> x1 - x2
{'foo', 'bar'}

Another way to think of this is that x1.difference(x2) and x1 - x2 return the set that results when any elements in x2 are removed or subtracted from x1.

Once again, you can specify more than two sets:

>>> a = {1, 2, 3, 30, 300}
>>> b = {10, 20, 30, 40}
>>> c = {100, 200, 300, 400}

>>> a.difference(b, c)
{1, 2, 3}

>>> a - b - c
{1, 2, 3}

When multiple sets are specified, the operation is performed from left to right. In the example above, a - b is computed first, resulting in {1, 2, 3, 300}. Then c is subtracted from that set, leaving {1, 2, 3}.
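The left-to-right evaluation can be spelled out with functools.reduce(), which applies the - operator pairwise across a sequence. This is only an illustrative sketch using the same three sets; it also shows that subtracting the sets one at a time gives the same result as subtracting their union in a single step.

```python
from functools import reduce
import operator

a = {1, 2, 3, 30, 300}
b = {10, 20, 30, 40}
c = {100, 200, 300, 400}

# Equivalent to (a - b) - c: each step subtracts the next set
# from the running result.
step_by_step = reduce(operator.sub, [a, b, c])

# Removing everything that is in b or c at once gives the same set.
via_union = a - (b | c)

print(step_by_step == via_union == a.difference(b, c))
```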

x1.symmetric_difference(x2)

x1 ^ x2 [^ x3 ...]

Compute the symmetric difference between sets.

Set Symmetric Difference


x1.symmetric_difference(x2) and x1 ^ x2 return the set of all elements in either x1 or x2, but not both:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}

>>> x1.symmetric_difference(x2)
{'foo', 'qux', 'quux', 'bar'}

>>> x1 ^ x2
{'foo', 'qux', 'quux', 'bar'}

The ^ operator also allows more than two sets:

>>> a = {1, 2, 3, 4, 5}
>>> b = {10, 2, 3, 4, 50}
>>> c = {1, 50, 100}

>>> a ^ b ^ c
{100, 5, 10}

As with the difference operator, when multiple sets are specified, the operation is performed from left to right.

Curiously, although the ^ operator allows multiple sets, the .symmetric_difference() method doesn’t:

>>> a = {1, 2, 3, 4, 5}
>>> b = {10, 2, 3, 4, 50}
>>> c = {1, 50, 100}

>>> a.symmetric_difference(b, c)
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    a.symmetric_difference(b, c)
TypeError: symmetric_difference() takes exactly one argument (2 given)
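If you want the chained behavior of ^ for an arbitrary number of sets, one possible workaround is to fold the operator over them with functools.reduce(). The function name symmetric_difference_all is made up for this sketch. Keep in mind what the chained operation means: an element ends up in the result if it appears in an odd number of the input sets.

```python
from functools import reduce
from operator import xor


def symmetric_difference_all(*sets):
    """Apply ^ from left to right across all of the given sets."""
    # Starting from an empty set means calling this with no arguments
    # returns set() instead of raising an error.
    return reduce(xor, sets, set())


a = {1, 2, 3, 4, 5}
b = {10, 2, 3, 4, 50}
c = {1, 50, 100}

print(symmetric_difference_all(a, b, c) == a ^ b ^ c)
```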

x1.isdisjoint(x2)

Determines whether or not two sets have any elements in common.

x1.isdisjoint(x2) returns True if x1 and x2 have no elements in common:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}

>>> x1.isdisjoint(x2)
False

>>> x2 - {'baz'}
{'quux', 'qux'}
>>> x1.isdisjoint(x2 - {'baz'})
True

If x1.isdisjoint(x2) is True, then x1 & x2 is the empty set:

>>> x1 = {1, 3, 5}
>>> x2 = {2, 4, 6}

>>> x1.isdisjoint(x2)
True
>>> x1 & x2
set()
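Like the other set methods, .isdisjoint() accepts any iterable as its argument. A small sketch with made-up names, checking that none of the requested values overlaps with a set of allowed values:

```python
allowed = {'read', 'write'}
requested = ['delete', 'admin']

# The argument is a list here, not a set.
no_overlap = allowed.isdisjoint(requested)

# The same question, asked by building the intersection explicitly.
same_answer = len(allowed & set(requested)) == 0

print(no_overlap, same_answer)
```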

Note: There is no operator that corresponds to the .isdisjoint() method.

x1.issubset(x2)

x1 <= x2

Determine whether one set is a subset of the other.

In set theory, a set x1 is considered a subset of another set x2 if every element of x1 is in x2.

x1.issubset(x2) and x1 <= x2 return True if x1 is a subset of x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x1.issubset({'foo', 'bar', 'baz', 'qux', 'quux'})
True

>>> x2 = {'baz', 'qux', 'quux'}
>>> x1 <= x2
False

A set is considered to be a subset of itself:

>>> x = {1, 2, 3, 4, 5}
>>> x.issubset(x)
True
>>> x <= x
True

It seems strange, perhaps. But it fits the definition—every element of x is in x.

x1 < x2

Determines whether one set is a proper subset of the other.

A proper subset is the same as a subset, except that the sets can’t be identical. A set x1 is considered a proper subset of another set x2 if every element of x1 is in x2, and x1 and x2 are not equal.

x1 < x2 returns True if x1 is a proper subset of x2:

>>> x1 = {'foo', 'bar'}
>>> x2 = {'foo', 'bar', 'baz'}
>>> x1 < x2
True

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'bar', 'baz'}
>>> x1 < x2
False

While a set is considered a subset of itself, it is not a proper subset of itself:

>>> x = {1, 2, 3, 4, 5}
>>> x <= x
True
>>> x < x
False

Note: The < operator is the only way to test whether a set is a proper subset. There is no corresponding method.
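If you ever want the proper subset test spelled out, it can be written directly from the definition above: a subset test combined with an inequality test. The function name is_proper_subset is hypothetical, and this sketch assumes both arguments are sets.

```python
def is_proper_subset(x1, x2):
    """Return True if every element of x1 is in x2 and the sets differ."""
    return x1 <= x2 and x1 != x2


x1 = {'foo', 'bar'}
x2 = {'foo', 'bar', 'baz'}

print(is_proper_subset(x1, x2) == (x1 < x2))
print(is_proper_subset(x2, x2) == (x2 < x2))
```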

x1.issuperset(x2)

x1 >= x2

Determine whether one set is a superset of the other.

A superset is the reverse of a subset. A set x1 is considered a superset of another set x2 if x1 contains every element of x2.

x1.issuperset(x2) and x1 >= x2 return True if x1 is a superset of x2:

>>> x1 = {'foo', 'bar', 'baz'}

>>> x1.issuperset({'foo', 'bar'})
True

>>> x2 = {'baz', 'qux', 'quux'}
>>> x1 >= x2
False

You have already seen that a set is considered a subset of itself. A set is also considered a superset of itself:

>>> x = {1, 2, 3, 4, 5}
>>> x.issuperset(x)
True
>>> x >= x
True

x1 > x2

Determines whether one set is a proper superset of the other.

A proper superset is the same as a superset, except that the sets can’t be identical. A set x1 is considered a proper superset of another set x2 if x1 contains every element of x2, and x1 and x2 are not equal.

x1 > x2 returns True if x1 is a proper superset of x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'bar'}
>>> x1 > x2
True

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'bar', 'baz'}
>>> x1 > x2
False

A set is not a proper superset of itself:

>>> x = {1, 2, 3, 4, 5}
>>> x > x
False

Note: The > operator is the only way to test whether a set is a proper superset. There is no corresponding method.

Modifying a Set

Although the elements contained in a set must be of immutable type, sets themselves can be modified. Like the operations above, there is a mix of operators and methods that can be used to change the contents of a set.

Augmented Assignment Operators and Methods

Each of the union, intersection, difference, and symmetric difference operators listed above has an augmented assignment form that can be used to modify a set. For each, there is a corresponding method as well.

x1.update(x2[, x3 ...])

x1 |= x2 [| x3 ...]

Modify a set by union.

x1.update(x2) and x1 |= x2 add to x1 any elements in x2 that x1 does not already have:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'baz', 'qux'}

>>> x1 |= x2
>>> x1
{'qux', 'foo', 'bar', 'baz'}

>>> x1.update(['corge', 'garply'])
>>> x1
{'qux', 'corge', 'garply', 'foo', 'bar', 'baz'}

x1.intersection_update(x2[, x3 ...])

x1 &= x2 [& x3 ...]

Modify a set by intersection.

x1.intersection_update(x2) and x1 &= x2 update x1, retaining only elements found in both x1 and x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'baz', 'qux'}

>>> x1 &= x2
>>> x1
{'foo', 'baz'}

>>> x1.intersection_update(['baz', 'qux'])
>>> x1
{'baz'}

x1.difference_update(x2[, x3 ...])

x1 -= x2 [| x3 ...]

Modify a set by difference.

x1.difference_update(x2) and x1 -= x2 update x1, removing elements found in x2:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'baz', 'qux'}

>>> x1 -= x2
>>> x1
{'bar'}

>>> x1.difference_update(['foo', 'bar', 'qux'])
>>> x1
set()

x1.symmetric_difference_update(x2)

x1 ^= x2

Modify a set by symmetric difference.

x1.symmetric_difference_update(x2) and x1 ^= x2 update x1, retaining elements found in either x1 or x2, but not both:

>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'foo', 'baz', 'qux'}
>>> 
>>> x1 ^= x2
>>> x1
{'bar', 'qux'}
>>> 
>>> x1.symmetric_difference_update(['qux', 'corge'])
>>> x1
{'bar', 'corge'}

Other Methods For Modifying Sets

Aside from the augmented operators above, Python supports several additional methods that modify sets.

x.add(<elem>)

Adds an element to a set.

x.add(<elem>) adds <elem>, which must be a single immutable object, to x:

>>> x = {'foo', 'bar', 'baz'}

>>> x.add('qux')
>>> x
{'bar', 'baz', 'foo', 'qux'}

x.remove(<elem>)

Removes an element from a set.

x.remove(<elem>) removes <elem> from x. Python raises an exception if <elem> is not in x:

>>> x = {'foo', 'bar', 'baz'}

>>> x.remove('baz')
>>> x
{'bar', 'foo'}

>>> x.remove('qux')
Traceback (most recent call last):
  File "<pyshell#58>", line 1, in <module>
    x.remove('qux')
KeyError: 'qux'

x.discard(<elem>)

Removes an element from a set.

x.discard(<elem>) also removes <elem> from x. However, if <elem> is not in x, this method quietly does nothing instead of raising an exception:

>>> x = {'foo', 'bar', 'baz'}

>>> x.discard('baz')
>>> x
{'bar', 'foo'}

>>> x.discard('qux')
>>> x
{'bar', 'foo'}
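Which of the two to use depends on whether a missing element is an error in your program. If you want to know whether anything was actually removed, one option is to wrap .remove() and catch the KeyError. The helper name remove_if_present is invented for this sketch.

```python
def remove_if_present(s, elem):
    """Remove elem from the set s; return True if it was there, else False."""
    try:
        s.remove(elem)
    except KeyError:
        return False
    return True


x = {'foo', 'bar', 'baz'}

print(remove_if_present(x, 'baz'))
print(remove_if_present(x, 'qux'))
```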

x.pop()

Removes an arbitrary element from a set.

x.pop() removes and returns an arbitrarily chosen element from x. If x is empty, x.pop() raises an exception:

>>> x = {'foo', 'bar', 'baz'}

>>> x.pop()
'bar'
>>> x
{'baz', 'foo'}

>>> x.pop()
'baz'
>>> x
{'foo'}

>>> x.pop()
'foo'
>>> x
set()

>>> x.pop()
Traceback (most recent call last):
  File "<pyshell#82>", line 1, in <module>
    x.pop()
KeyError: 'pop from an empty set'
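Since an empty set is falsy, you can avoid the KeyError by testing the set before each call. The following sketch, with made-up variable names, drains a set one element at a time. The order in which elements come out is arbitrary, so nothing here should depend on it.

```python
pending = {'foo', 'bar', 'baz'}
processed = []

# The loop stops as soon as the set is empty, so .pop() never raises.
while pending:
    processed.append(pending.pop())

print(len(processed), pending)
```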

x.clear()

Clears a set.

x.clear() removes all elements from x:

>>> x = {'foo', 'bar', 'baz'}
>>> x
{'foo', 'bar', 'baz'}
>>> 
>>> x.clear()
>>> x
set()

Frozen Sets

Python provides another built-in type called a frozenset, which is in all respects exactly like a set, except that a frozenset is immutable. You can perform non-modifying operations on a frozenset:

>>> x = frozenset(['foo', 'bar', 'baz'])
>>> x
frozenset({'foo', 'baz', 'bar'})

>>> len(x)
3

>>> x & {'baz', 'qux', 'quux'}
frozenset({'baz'})

But methods that attempt to modify a frozenset fail:

>>> x = frozenset(['foo', 'bar', 'baz'])

>>> x.add('qux')
Traceback (most recent call last):
  File "<pyshell#127>", line 1, in <module>
    x.add('qux')
AttributeError: 'frozenset' object has no attribute 'add'

>>> x.pop()
Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    x.pop()
AttributeError: 'frozenset' object has no attribute 'pop'

>>> x.clear()
Traceback (most recent call last):
  File "<pyshell#131>", line 1, in <module>
    x.clear()
AttributeError: 'frozenset' object has no attribute 'clear'

>>> x
frozenset({'foo', 'bar', 'baz'})

Deep Dive: Frozensets and Augmented Assignment

Since a frozenset is immutable, you might think it can’t be the target of an augmented assignment operator. But observe:

>>> f = frozenset(['foo', 'bar', 'baz'])
>>> s = {'baz', 'qux', 'quux'}

>>> f &= s
>>> f
frozenset({'baz'})

What gives?

Python does not perform augmented assignments on frozensets in place. The statement f &= s is effectively equivalent to f = f & s. It isn’t modifying the original f. It is reassigning f to a new object, and the object f originally referenced is gone.

You can verify this with the id() function:

>>> f = frozenset(['foo', 'bar', 'baz'])
>>> id(f)
56992872
>>> s = {'baz', 'qux', 'quux'}

>>> f &= s
>>> f
frozenset({'baz'})
>>> id(f)
56992152

f has a different integer identifier following the augmented assignment. It has been reassigned, not modified in place.

Some objects in Python are modified in place when they are the target of an augmented assignment operator. But frozensets aren’t.
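One way to see the difference is to keep a second name bound to the object before the augmented assignment. This is a small sketch with made-up names: for a regular set, the alias sees the change because the object itself was modified, while for a frozenset the alias still refers to the original, untouched object.

```python
s = {'foo', 'bar', 'baz'}
s_alias = s
s &= {'baz', 'qux'}        # modifies the set object in place

f = frozenset(['foo', 'bar', 'baz'])
f_alias = f
f &= {'baz', 'qux'}        # rebinds f to a new frozenset

print(s_alias is s)        # same object, now changed
print(f_alias is f)        # different objects
print(f_alias)
```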

Frozensets are useful in situations where you want to use a set, but you need an immutable object. For example, you can’t define a set whose elements are also sets, because set elements must be immutable:

>>> x1 = set(['foo'])
>>> x2 = set(['bar'])
>>> x3 = set(['baz'])
>>> x = {x1, x2, x3}
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    x = {x1, x2, x3}
TypeError: unhashable type: 'set'

If you really feel compelled to define a set of sets (hey, it could happen), you can do it if the elements are frozensets, because they are immutable:

>>> x1 = frozenset(['foo'])
>>> x2 = frozenset(['bar'])
>>> x3 = frozenset(['baz'])
>>> x = {x1, x2, x3}
>>> x
{frozenset({'bar'}), frozenset({'baz'}), frozenset({'foo'})}

Likewise, recall from the previous tutorial on dictionaries that a dictionary key must be immutable. You can’t use the built-in set type as a dictionary key:

>>> x = {1, 2, 3}
>>> y = {'a', 'b', 'c'}
>>> 
>>> d = {x: 'foo', y: 'bar'}
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    d = {x: 'foo', y: 'bar'}
TypeError: unhashable type: 'set'

If you find yourself needing to use sets as dictionary keys, you can use frozensets:

>>> x = frozenset({1, 2, 3})
>>> y = frozenset({'a', 'b', 'c'})
>>> 
>>> d = {x: 'foo', y: 'bar'}
>>> d
{frozenset({1, 2, 3}): 'foo', frozenset({'c', 'a', 'b'}): 'bar'}
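Frozenset keys are handy when a key is naturally an unordered group of values. As an illustration only, the sketch below stores one value per unordered pair, so looking up ('A', 'B') and ('B', 'A') finds the same entry. The names pair_values, set_pair_value, and get_pair_value, and the data itself, are all made up for this example.

```python
pair_values = {}


def set_pair_value(a, b, value):
    """Store value under the unordered pair {a, b}."""
    pair_values[frozenset((a, b))] = value


def get_pair_value(a, b):
    """Look up the value for the unordered pair {a, b}."""
    return pair_values[frozenset((a, b))]


set_pair_value('A', 'B', 10)

# The order of the two items doesn't matter, because both calls
# build equal frozensets, and equal frozensets have equal hashes.
print(get_pair_value('B', 'A'))
```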

Conclusion

In this tutorial, you learned how to define set objects in Python, and you became familiar with the functions, operators, and methods that can be used to work with sets.

You should now be comfortable with the basic built-in data types that Python provides.

Next, you will begin to explore how the code that operates on those objects is organized and structured in a Python program.



Bing.com runs on .NET Core 2.1!

Bing.com is a cloud service that runs on thousands of servers spanning many datacenters across the globe. Bing servers handle thousands of users’ queries every second from consumers around the world doing searches through their browsers, from our partners using the Microsoft Cognitive Services APIs, and from the personal digital assistant, Cortana. Our users demand both relevancy and speed in those results, thus performance and reliability are key components in running a successful cloud service such as Bing.

Bing’s front-end stack is written predominantly in managed code layered in an MVC pattern. Most of the business logic code is written as data models in C#, and the view logic is written in Razor. This layer is responsible for transforming the search result data (encoded as Microsoft Bond) to HTML that is then compressed and sent to the browser. As gatekeepers of that front-end platform at Bing, we consider developer productivity and feature agility as additional key components in our definition of success. Hundreds of developers rely on this platform to get their features to production, and they expect it to run like clockwork.

Since its beginning, Bing.com has run on the .NET Framework, but it recently transitioned to running on .NET Core. The main reasons driving Bing.com’s adoption of .NET Core are performance (a.k.a serving latency), support for side-by-side and app-local installation independent of the machine-wide installation (or lack thereof) and ReadyToRun images. In anticipation of those improvements, we started an effort to make the code portable across .NET implementations, rather than relying on libraries only available on Windows and only with the .NET Framework. The team started the effort with .NET Standard 1.x, but the reduced API surface caused non-trivial complications for our code migrations. With the 20,000+ APIs that returned with .NET Standard 2.0, all that changed, and we were able to quickly shift gears from code modifications to testing. After squashing a few bugs, we were ready to deploy .NET Core to production.

ReadyToRun Images

Managed applications can often have poor startup performance because methods must first be JIT-compiled to machine code. .NET Framework has a precompilation technology, NGEN. However, NGEN requires the precompilation step to occur on the machine on which the code will execute. For Bing, that would mean NGENing on thousands of machines. This, coupled with an aggressive deployment cycle, would result in a significant reduction in serving capacity while the application is precompiled on the web-serving machines. Furthermore, running NGEN requires administrative privileges, which are often unavailable or heavily scrutinized in a datacenter setting. On .NET Core, the crossgen tool allows the code to be precompiled as a pre-deployment step, such as in the build lab, and the images deployed to production are Ready To Run!

Performance

.NET Core 2.1 has made major performance improvements in virtually all areas of the runtime and libraries; a great treatise is available in a previous post on the blog.

Our production data resonates with the significant performance improvements in .NET Core 2.1 (as compared to both .NET Core 2.0 and .NET Framework 4.7.2). The graph below tracks our internal server latency over the last few months. The Y axis is the latency (actual values omitted), and the final precipitous drop (on June 2) is the deployment of .NET Core 2.1! That is a 34% improvement, all thanks to the hard work of the .NET community!

The following changes in .NET Core 2.1 are the highlights of this phenomenal improvement for our workload. They’re presented in decreasing order of impact.

  1. Vectorization of string.Equals (@jkotas) & string.IndexOf/LastIndexOf (@eerhardt)

Whichever way you slice it, HTML rendering and manipulation are string-heavy workloads. String comparisons and indexing operations are major components of that. Vectorization of these operations is the single biggest contributor to the performance improvement we’ve measured.

  2. Devirtualization Support for EqualityComparer<T>.Default (@AndyAyersMS)

One of our major software components is a heavy user of Dictionary<int/long, V>, which indirectly benefits from the intrinsic recognition work that was done in the JIT to make Dictionary<K, V> amenable to that optimization (@benaadams).

  3. Software Write Watch for Concurrent GC (@Maoni0 and @kouvel)

This led to a reduction in CPU usage in our application. Prior to .NET Core 2.1, the write-watch on Windows x64 (and on the .NET Framework) was implemented using Windows APIs that had a different performance trade-off. This new implementation relies on a JIT Write Barrier, which intuitively increases the cost of a reference store, but that cost is amortized and not noticed in our workload. This improvement is now also available on the .NET Framework via the May 2018 Security and Quality Rollup.

  4. Methods with calli are now inline-able (@AndyAyersMS and @mjsabby)

We use ldftn + calli in lieu of delegates (which incur an object allocation) in performance-critical pieces of our code where there is a need to call a managed method indirectly. This change allowed method bodies with a calli instruction to be eligible for inlining. Our dependency injection framework generates such methods.

  5. Improve performance of string.IndexOfAny for 2 & 3 char searches (@bbowyersmyth)

A common operation in a front-end stack is searching for ‘:’, ‘/’, ‘/’ in a string to delimit portions of a URL. This special-casing improvement was beneficial throughout the codebase.

In addition to the runtime changes, .NET Core 2.1 also brought Brotli support to the .NET Library ecosystem. Bing.com uses this capability to dynamically compress the content and deliver it to supporting browsers.

Runtime Agility

Finally, the ability to have an xcopy version of the runtime inside our application means we’re able to adopt newer versions of the runtime at a much faster pace. In fact, if you peek at the graph above we took the .NET Core 2.1 update worldwide in a regular application deployment on June 2, which is two days after it was released!

This was possible because we were running our continuous integration (CI) pipeline with .NET Core’s daily CI builds testing functionality and performance all the way through the release.

We’re excited about the future and are collaborating closely with the .NET team to help them qualify their future updates! The .NET Core team is excited because we bring a large catalog of functional tests and an additional large codebase on which to measure real-world performance improvements, as well as a commitment to giving Bing.com users fast results and our own developers the latest software and tools.

This blog post was authored by Mukul Sabharwal (@mjsabby) from the Bing.com Engineering team.


Docker Hub’s scheduled downtime on 25 August: potential impacts to App Service customers

Docker has scheduled a maintenance window for Docker Hub on Saturday August 25th, which has potential impacts to App Service customers.

For Web App for Containers (using custom Docker image), customers will not be able to create new web apps using a Docker container image from Docker Hub during the maintenance window.  Customers can still create new apps using Docker images hosted on Azure Container Registry or a private Docker registry.  For App Service on Linux (using non-preview built-in stacks),  customers will not be impacted as we have Docker container images cached on our Linux workers. 

To avoid unnecessary service interruptions, we recommend that Web App for Containers customers either avoid making changes to or restarting their apps during the Docker Hub maintenance window, or use an alternative Docker registry.


[Infographic] The Docker vs. Kubernetes vs. Apache Mesos Myth: Why What You Think You Know is Wrong

These days it seems like container orchestration is a big topic in every technical conversation. There are countless articles, presentations, and lots of social chatter comparing Docker, Kubernetes, and Mesos. However, many of these articles and talk-tracks are misinformed and perpetuate the myth that these three open source projects are in a fight-to-the-death for container supremacy.

While all of these platforms have functionality around containers, they are all very different technologies that do very different things.

In this infographic, we aim to dispel these myths and provide clarity around what these technologies do and how they are different. Read on to learn more about Kubernetes, Docker, Apache Mesos, and how they all actually work together.

Interested in learning more about how Mesosphere can help optimize your container strategy? Learn how to run Kubernetes-as-a-service.

The post [Infographic] The Docker vs. Kubernetes vs. Apache Mesos Myth: Why What You Think You Know is Wrong appeared first on Mesosphere.


Linux kernel TCP vulnerability

We are aware of a denial of service vulnerability, named SegmentSmack (CVE-2018-5390), that affects the kernel of many Linux distributions, including Ubuntu 16.04 LTS machines. The TCP implementation in the Linux kernel is vulnerable to denial of service: if several 1-byte segments are sent to the system over an open TCP connection, the system becomes unresponsive with 100% CPU utilization. You can read more about this vulnerability on the CERT/CC website and in the Ubuntu security notes blog.

As a result, we recommend that you immediately patch the nodes in your Service Fabric Linux clusters. You have the following options:

  1. Automatically patch your cluster’s nodes using the Patch Orchestration Application (POA) – The POA is a Service Fabric application available for automating OS patching on Ubuntu clusters without downtime (through a monitored rolling upgrade). Refer to this article on how to go about downloading and installing this app on your Service Fabric cluster.
  2. Upgrade your VMSS OS Image through an ARM Upgrade
    1. Update your OS image using the latest OS version – if you are using image version "latest" you can use VMSS OSRollingUpgrade command to reapply the latest image. See
      https://docs.microsoft.com/en-us/rest/api/compute/virtualmachinescalesetrollingupgrades/startosupgrade for further details.
    2. Update your OS Image for a VMSS bound to a specific version - You can modify your arm template or update your VMSS definition.
"storageProfile": {
       "imageReference": {
              "publisher": "Canonical",
              "offer": "UbuntuServer",
              "sku": "16.04-LTS",
              "version": "latest"
         },

For the “version” that you choose, you can either set it to latest to receive the most up-to-date patches or you can use the explicit version 16.04.201808060 (which includes the kernel version 4.15.0-1019 that contains the fix).

  3. Manually patch your cluster’s nodes – you can manually follow the steps that the POA takes to upgrade your nodes on a per-node or per-upgrade-domain basis. Here are the steps you should take to patch your nodes manually:
    1. Check the health of the cluster before patching
    2. Disable a node OR multiple nodes belonging to the same upgrade domain – this gracefully transfers workloads from the nodes going down for the patch to other nodes in the cluster.
      You can do this via SFX, PowerShell, or through sfctl (“sfctl node disable --node-name <node_name> --deactivation_intent restart”)
    3. Wait for the node(s) to be in the “Disabled” state – this step may take a while depending on what is running on the node(s)
    4. Run these steps to install patches on the node(s):
      1. pull the latest patches: sudo apt-get update
      2. install the latest patches: sudo apt-get dist-upgrade
      3. restart the node (needed for kernel upgrades to go through)
    5. Once the node is back up, enable it in Service Fabric. You can do this through SFX, PowerShell, or sfctl (“sfctl node enable --node-name <node_name>”)
    6. Check for changes in health status once the node is back online by comparing with the health observed in step 1
    7. If there are no changes, continue rolling out the upgrade to the next node / set of nodes

Please feel free to reach out to Azure Support if you are running into any issues with these options.


Dew Drop – August 20, 2018 (#2785)

XL edition today after a week off.

Top Links

Web & Cloud Development

XAML, UWP & Xamarin

Visual Studio & .NET

Design, Methodology & Testing

Mobile, IoT & Game Development

Podcasts, Screencasts & Videos

Community & Events

Database

PowerShell

Miscellaneous

More Link Collections

The Geek Shelf

The post Dew Drop – August 20, 2018 (#2785) appeared first on Morning Dew.
