
What is a String?

In this lesson we're going to learn about a new data type. In other words, we're going to learn about a new category of thing that we can store to a variable. Let's learn about strings.

Up to this point in the course we've learned about two numeric data types: integers and floats. Now we're going to introduce strings as our first non-numeric data type.

Strings are how Python stores text. If you wanted to store your name to a variable, or a sentence, or a paragraph, or a blog post, or any other text-based data, you would use a string to do that.

Making a String

Let's store our name to a variable called my_name.

Just like before, we write the variable name first, then a single equals sign. This time, though, we have to wrap the text that we want to store inside the variable in quotation marks.

These quotation marks tell Python where the string starts and ends so that it knows exactly what characters make up the string that we want to store.
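Only the output of this step's code cell survives on this page. The following is a minimal sketch of what the cell might contain. It uses the variable name my_name from the text above.

```python
# The quotation marks tell Python where the string starts and ends
my_name = "Ryan"

print(my_name)  # prints: Ryan
```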

Ryan

If we use the type() function on our new variable, we'll see that Python abbreviates the data type as 'str'.
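Here is a sketch of the type() check. It redefines my_name so the snippet runs on its own.

```python
my_name = "Ryan"

# type() reports the data type of the value stored in the variable
print(type(my_name))  # prints: <class 'str'>
```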

<class 'str'>

Double Quotes or Single Quotes?

In Python, you can wrap your string in either double quotes or single quotes. It doesn't matter which as long as you use the same one at both the start and end of the string.

Ryan Allred
Temzee
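The code that produced the output above isn't shown on this page. As a hedged sketch, here is one way to check that both quoting styles create the same kind of string. The variable names single and double are hypothetical.

```python
# Single quotes and double quotes create exactly the same kind of string
single = 'Ryan Allred'
double = "Ryan Allred"

print(single == double)            # prints: True
print(type(single), type(double))  # prints: <class 'str'> <class 'str'>
```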

Some people prefer single quotes because then they don't have to hold down the shift key, and I mean, why push extra buttons if you don't have to, right?

Some other programming languages only allow double quotes for strings, so a lot of people use double quotes out of habit from programming in those languages.

Feel free to use whichever you prefer. There isn't really a right or wrong answer here, but try to be consistent if you can.

Strings that contain quotation marks

There are some rare situations where it might be better to use single quotes or double quotes.

For example, imagine you want to store the following sentence:

I can't go to the store today.

Watch what happens if I try to wrap this string in single quotes:

  File "<exec>", line 1
    sentence = 'I can't go to the store today.'
                                              ^
SyntaxError: unterminated string literal (detected at line 1)

Whoa, notice how our code is turning funny colors? That's the syntax highlighting trying to tell us that something's off. More specifically, we have what's called a "Syntax Error". A Syntax Error is thrown whenever we break one of the rules of the Python language in a way that keeps Python from understanding, and therefore running, our code. There are all kinds of ways to trigger syntax errors, and this won't be the last time we run into one.

Remember how I said that the quotation marks tell Python where the string starts and ends? Well, if the string you're trying to store contains the same kind of quotation mark that you're using to wrap the string (at the beginning and at the end), then the Python interpreter can get confused.

Python reads from left to right, just like we do. It sees a single quote at the beginning and goes, "Alright, we're starting a string here." Then it runs into the second single quote (which in this case is actually the apostrophe in the word can't) and thinks, "OK, now the string's ending, my work here is done. Hey, wait a minute, what's all this extra stuff? This doesn't make any sense. I'd better throw a syntax error."

I also want to point out that even the print statement on line 3 is showing up as orange. If nothing else, this should set off alarm bells in your head. This happens because Python can't figure out where the string is supposed to end. Is the single quote at the end of the sentence actually trying to start a new string? Python's not sure, so it colors everything else in the code cell orange until we find a way to clarify.
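To confirm that the problem is in how the code is written and not in the sentence itself, one option is to hand the broken line to Python's built-in compile() function. It checks the syntax without running anything. A minimal sketch follows. The name bad_source is hypothetical, and the exact wording of the error message varies between Python versions.

```python
# Keep the broken line of code as text so Python can check it
# without crashing the rest of the program
bad_source = "sentence = 'I can't go to the store today.'"

try:
    compile(bad_source, "<example>", "exec")
    compiled = True
except SyntaxError as err:
    compiled = False
    # err.msg holds the description; its wording depends on the Python version
    print("SyntaxError:", err.msg)
```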

So how do we fix it? How do we help Python differentiate between the quotation marks that make up the start and end of the string versus quotation marks that might just occur naturally in the text that we're trying to store?

Escape Characters

As Python reads our string from left to right, we can tell it not to treat a quotation mark as the start or end of the string by putting a backslash \ just before that quotation mark. Python then treats the quotation mark as an ordinary character in the text.

This backslash is called an "escape character". The escape character isn't stored as an extra character in the string. It only tells Python how to read the character that follows it. In fact, if we print out a string that was written with an escape character, the backslash won't appear in the output.
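Here is a minimal sketch of escaping the apostrophe in our sentence. The variable name sentence comes from the error message earlier in this section. The last line checks that no backslash ends up stored in the string.

```python
# The backslash before the apostrophe tells Python it's part of the text,
# not the end of the string
sentence = 'I can\'t go to the store today.'

print(sentence)          # prints: I can't go to the store today.
print('\\' in sentence)  # prints: False (the backslash isn't stored)
```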

I can't go to the store today.

We can even use escape characters on more complex pieces of text that contain both kinds of quotation marks.
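Here is a hedged sketch with both kinds of quotation marks escaped inside a double-quoted string. The variable name quote is hypothetical.

```python
# Escape every quotation mark inside the text
quote = "And then he said, \"That\'s preposterous!\""

print(quote)  # prints: And then he said, "That's preposterous!"
```

Strictly speaking, inside a double-quoted string only the double quotes need escaping. Escaping the apostrophe as well is allowed and harmless.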

And then he said, "That's preposterous!"

Using a different kind of quotation mark to wrap the string

In situations where the text itself only contains one kind of quotation mark or the other, we can avoid escape characters by switching the kind of quote we wrap the string in, from single quotes to double quotes (or vice versa). That way there's no conflict, and it's clear to Python where the string starts and ends.
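Here is a sketch of the same sentence wrapped in double quotes, so the apostrophe needs no escaping.

```python
# The text contains an apostrophe, so wrap it in double quotes instead;
# no escape characters are needed
sentence = "I can't go to the store today."

print(sentence)  # prints: I can't go to the store today.
```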

I can't go to the store today.

A "sequence" of characters

Alright, cool, we know how to make strings and store them to variables. But, just to back up for a minute, why are they called "strings" in the first place? Why not just call this data type "text" or something?

Well, strings get their name from the fact that the text they hold is a "string of characters". To think about what that means, consider something like a "string of beads", for example.

[Image: colorful beads being strung on an orange string]

When you put beads on a string, you slide them on one by one. This means that the beads have a specific order, or sequence, and that sequence can't be changed without taking the string apart and rebuilding it.

The order of the beads is also very important. In a bracelet, a necklace, or some other kind of beaded jewelry, the order is part of what makes the piece distinct and beautiful, and part of what gives it value.

Strings in Python follow many of the same principles. In written language, the order of the characters is extremely important. Each word is defined by the order of its letters, and the same goes for a sentence: when you stop and think about it, the order the words come in is what gives language its meaning.

So when you hear the word "string", I don't want the term to be an abstract name that you have to memorize just because some programmer chose it. I want the term "string" to have real meaning for you. Let's explore the concepts of sequence and altering strings a little more in depth.

Index Position

Each character in a string is assigned a number that represents that character's position within the string. The funny thing about these numbers is that, even though they're ordinary counting numbers, we always start counting at 0 rather than 1. Starting from zero is common across most programming languages, and you'll get used to it.

P   y   t   h   o   n
0   1   2   3   4   5

The quotation marks that start and end a string are not included in this enumeration.

These numbers are called the character's "index position" or "index" for short. Whenever we order items in this way starting with zero we say that the sequence itself is "zero-indexed".

Substrings

We can select any character of a string by using its index position. To do this, we write the name of the variable that the string is stored to, followed by a set of square brackets. Inside those square brackets, we put the index position of the character we're interested in.

This smaller part of a larger string is called a "substring".
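Here is a minimal sketch of selecting a single character by its index position. The variable name word is hypothetical, since the original cell isn't shown.

```python
word = "Python"

# Index positions start at 0, so index 4 is the fifth character
print(word[4])  # prints: o
print(word[0])  # prints: P
```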

o

Negative String Indexing

A handy feature of Python, and one that not every programming language offers, is that strings aren't only indexed from front to back (starting at 0). They're also indexed from back to front.

However, the indices (also called indexes) counting from back to front are negative and start at -1 rather than 0.

 P   y   t   h   o   n
-6  -5  -4  -3  -2  -1

This is useful in a whole bunch of ways, but in particular can be very convenient if the parts of the string that we're interested in are closer to the back of the string rather than the front.
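Here is a sketch of negative indexing on the same hypothetical word variable, redefined so the snippet runs on its own.

```python
word = "Python"

# Negative indices count from the back, starting at -1
print(word[-1])  # prints: n
print(word[-5])  # prints: y

# Every character has both a positive and a negative index
print(word[1] == word[-5])  # prints: True
```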

y

IndexError: string index out of range

If you use an index position that goes beyond the end of the string, Python will throw an error: IndexError: string index out of range.

  File "<exec>", line 1, in <module>
IndexError: string index out of range
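To see this error without stopping the program, one option is to catch it with try/except. Here is a minimal sketch on a hypothetical six-character string, where the valid positive indices run from 0 to 5.

```python
word = "Python"  # valid indices: 0 to 5, or -6 to -1

try:
    word[6]  # one position past the last character
    error_name = None
except IndexError as err:
    error_name = type(err).__name__
    print(err)  # prints: string index out of range
```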

String Slicing

If we want to capture a substring that is longer than one character we will use "string slicing" to accomplish this.

Let's create a new string with a longer portion of text so that we can extract some interesting substrings from it.
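Here is a sketch of storing the longer string. The variable name phrase comes from the explanation later in this section.

```python
# A longer string to slice substrings out of
phrase = "Ryan loves using Python for Data Science!"

print(phrase)  # prints: Ryan loves using Python for Data Science!
```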

Ryan loves using Python for Data Science!

Once we've got a variable with a string stored to it, we can access a portion of the string by writing the variable name with square brackets on the end, just like before.

You'll notice that we tend to use square brackets after a variable name whenever we want to get a portion of something larger. This pattern is common throughout Python and applies to many other data types as well.

Alright, so we're saying with the square brackets that we want to get a part of the string, but how do we indicate what part we want?

Let's pretend that we wanted to extract the substring Ryan from the larger string.

The first thing we put in the square brackets is the index position of the first character of the substring. So in this case that would be 0. Simple enough.

Next we put a colon :. After the colon goes a second number that tells Python where to end our substring. The colon is saying "get me everything in between these two indices."

Here's the tricky part, though. For the second index position, we don't put the index of the last character that we want in the substring. Instead, we put the index position of the character that comes just after it. Compare the results of the following two expressions:
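The two expressions themselves aren't shown on this page. Judging from the outputs, they were most likely phrase[0:3] and phrase[0:4]. Here is a sketch:

```python
phrase = "Ryan loves using Python for Data Science!"

# The stop index is exclusive: slicing stops *before* index 3, so "n" is left out
print(phrase[0:3])  # prints: Rya

# To include "n" (index 3), stop at the index just after it
print(phrase[0:4])  # prints: Ryan
```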

Rya
Ryan

The "R" is at index position 0. But the "n" is at index position 3. So why do we put 4 instead?

Well, in Python, when we give a range like this using a colon, the number on the left side is inclusive, but the number on the right side is exclusive.

What we're really saying here is, "Hey Python, get me the characters of the string phrase, starting at index position 0 and going up to, but not including, index position 4."

This may seem weird. Why not just make both inclusive? Wouldn't that be easier? Well, doing it this way will help simplify some other concepts down the road. It's also a pretty consistent pattern across all of Python: when we give a range of values, the first one is typically inclusive and the second one exclusive.

I'll make a point of demonstrating how this seemingly inconvenient feature of the language simplifies both our code and our thinking when we get to some more advanced concepts.

String Slicing with Negative Indices

Say I wanted to extract the substring "Science". I could count from the beginning of the string all the way to the end to find out which index positions to use, or I could just use negative indices and count from the back. Remember, when we count from the back of the string, the final index is -1, not 0.

Even though we're getting the negative index positions from the back of the string, we still put the left-most index first inside the square brackets, and the right-most index (exclusive) after the colon.
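Here is a sketch of the negative-index slice, assuming the same phrase string. Counting from the back, the "S" sits at index -8 and the "!" at -1.

```python
phrase = "Ryan loves using Python for Data Science!"

# Start at "S" (index -8) and stop before "!" (index -1, exclusive)
print(phrase[-8:-1])  # prints: Science
```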

Science

If I don't put a number on the left side of the colon, the slice will start from the beginning of the string. If I don't put a number on the right side of the colon, the slice will go all the way to the end of the string.

Ryan loves using Python for Data Science!
Ryan loves using Python for Data Science!
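The cell that produced these outputs isn't shown on this page. Here is a hedged sketch of leaving out one side of the colon, or both.

```python
phrase = "Ryan loves using Python for Data Science!"

# No start index: the slice begins at the start of the string
print(phrase[:4])   # prints: Ryan

# No stop index: the slice runs through to the end of the string
print(phrase[-8:])  # prints: Science!

# Neither: the slice covers the whole string
print(phrase[:])    # prints: Ryan loves using Python for Data Science!
```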

String Concatenation

When we were working with numeric values (integers and floats) and we put a plus sign in-between two of them, that told Python to do addition. What would happen if we put a plus sign in-between two strings? Let's try it and see what happens.
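Here is a minimal sketch of the plus sign between two strings. The name joined is hypothetical.

```python
# A plus sign between two strings joins them end to end
joined = "Ryan" + "Allred"

print(joined)  # prints: RyanAllred
```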

RyanAllred

This is called string concatenation. To "concatenate" is just a fancy way of saying "squish two things together into one thing".

You'll notice that when I combined these two words together there was no whitespace between them. Well, let's just concatenate a space between them too.
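Here is a sketch with a single space concatenated between the two words. The name full_name is hypothetical.

```python
# Concatenate a string containing a single space between the two words
full_name = "Ryan" + " " + "Allred"

print(full_name)  # prints: Ryan Allred
```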

Ryan Allred

You usually won't see string values typed out directly and concatenated like this. Usually they'll be stored to variables first, and then we'll use the variables to do the same operation. That might look something like this:
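Here is a sketch of the same operation using variables. The names first_name, last_name, and full_name are hypothetical.

```python
first_name = "Ryan"
last_name = "Allred"

# Same operation as before, but using the variables instead of the string values
full_name = first_name + " " + last_name

print(full_name)  # prints: Ryan Allred
```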

Ryan Allred

Immutability

If something is "immutable", that means it can't be changed once it has been created. Strings in Python are immutable. This means that we can select substrings from them, but we can't change the characters in the original string itself. We can, however, use the original string to make a new one that has the properties we want.

Imagine that we have the string "definately" stored to a variable called misspelled.

This string is one of the common misspellings of the word "definitely". Can we fix it so that it's spelled correctly? What if we just tell the character at index position 5 (where the letter a is) to be an i instead? Let's try it.

  File "<exec>", line 1, in <module>
TypeError: 'str' object does not support item assignment
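Here is a sketch of that failed assignment, wrapped in try/except so the program keeps running. The value "definately" is inferred from the description, which places an a at index position 5.

```python
misspelled = "definately"

try:
    # Attempt to overwrite the "a" at index 5 with an "i"
    misspelled[5] = "i"
except TypeError as err:
    print(err)  # prints: 'str' object does not support item assignment

print(misspelled)  # prints: definately (the string is unchanged)
```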

Well, we can't. At least not using this method. That's because in Python, strings are "immutable" and can't be changed after they've been created. So there's no way to modify the string stored to the variable misspelled.

What we have to do instead is use the original string to build an entirely new string, without modifying the original as we do so. Watch this.
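Here is a sketch of building the corrected string from slices of the original: everything before index 5, then an "i", then everything from index 6 onward. It assumes the misspelling is "definately", as described above. The name corrected is hypothetical.

```python
misspelled = "definately"

# Build a brand-new string from pieces of the original;
# the original string is left untouched
corrected = misspelled[:5] + "i" + misspelled[6:]

print(corrected)   # prints: definitely
print(misspelled)  # prints: definately
```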

definitely

That's a lot of work just to switch out a letter within a string, but this is all driven by the fact that strings are immutable in Python. In reality, it's rare that you'd need to make a small change to a specific character like this.

Having immutable strings means that you're less likely to irreversibly change the contents of a string by accident when you meant to do something else entirely. So in the long run, you'll come to appreciate this design decision.