In the previous lesson we learned that lists are valuable because they let us store a lot of data in one place, but lists are also useful in another way. A list is not just a container for data; it is an ordered container of data. Python has a special name for any ordered container of data: a "sequence".

We've already met one kind of sequence: strings. A string, as you'll recall, is a sequence of characters. The specific order of the letters in words is what gives text meaning, but written language isn't the only thing that is useful to represent as an ordered "sequence". Take a look at the following string as an example:

ATCGGCTAATCG

A DNA sequence is made up of nucleotides, each represented by one of four letters (A, C, G, or T). The order of the nucleotide bases is crucial because it represents the genetic information encoded within the DNA molecule. Altering the order of the bases would result in a different genetic sequence, potentially encoding different proteins or causing genetic mutations. The order of the characters in the string is therefore essential for interpreting the genetic information accurately.

However, there's no reason why we can't hold this same information in a list, since a list also preserves the order of its elements:

['A', 'T', 'C', 'G', 'G', 'C', 'T', 'A', 'A', 'T', 'C', 'G']

By the way, you can use the type() function on a list, just as we would on any other value stored in a variable, to check what datatype that variable holds.
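
As a minimal sketch of that check (the variable name dna_list is an assumption; the lesson doesn't show what the list was called):

```python
# Hypothetical variable name chosen for illustration.
dna_list = ['A', 'T', 'C', 'G', 'G', 'C', 'T', 'A', 'A', 'T', 'C', 'G']

# type() reports the datatype of the value stored in the variable.
print(type(dna_list))
```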

<class 'list'>

In fact, the similarity between these two types of sequences is so strong that Python has a built-in function, list(), that can turn a string into a list.

When we transform one datatype to another this is called "casting" or "type conversion". In the code cell below we are "casting" a string to a list.

['A', 'T', 'C', 'G', 'G', 'C', 'T', 'A', 'A', 'T', 'C', 'G']
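
A cell along these lines would produce the output above. This is only a sketch; the variable names dna and dna_list are assumptions.

```python
# Hypothetical variable names chosen for illustration.
dna = "ATCGGCTAATCG"

# list() walks through the string and makes each character its own list element.
dna_list = list(dna)
print(dna_list)
```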

As you can see, the list() function takes our string and breaks it up into a list of individual string characters just like what we manually wrote out above. Later on I'll show you how we can reverse this process to turn a list back into a string, but for now I just want to help draw some important parallels between these two data types.

Lists and Strings have a lot in common

Strings and lists have a lot in common. I want to show you that if you mastered working with strings in our previous lessons, then you already know quite a lot about how to work with lists!

Zero-Indexing

When we worked with strings we talked about how the positions of characters within strings are each assigned a number called an "index position" or "index" for short.

However, in Python (and in many other programming languages), these index positions don't start with 1 like the regular counting numbers do. They start with 0.

Here's a reminder from when we looked at the string "Python". Below the string, the index of each character is listed.

P y t h o n

0 1 2 3 4 5

Instead of rehashing this old example, let's use our new DNA sequence data to demonstrate this similarity between strings and lists.

The two code cells below are equivalent. I can use the "square bracket syntax" at the end of the string literal (typed out string) or on the variable that holds the string and I'll get the same result.

A
A
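
The two string versions might look like this sketch. The variable name dna is an assumption.

```python
# Square brackets directly on the string literal.
print("ATCGGCTAATCG"[0])

# Square brackets on a variable that holds the same string.
dna = "ATCGGCTAATCG"  # hypothetical variable name
print(dna[0])
```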

Because we're comparing strings and lists that hold basically the same contents, the end results are the same. Even so, I find that demonstrating the indexing syntax on both the variable and the literal version helps remind us that the underlying containers for this data are different, even if in these instances they're doing the same thing.

A
A
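
The list versions follow the same pattern. Again this is only a sketch, and the variable name dna_list is an assumption.

```python
# Square brackets directly on the list literal.
print(['A', 'T', 'C', 'G', 'G', 'C', 'T', 'A', 'A', 'T', 'C', 'G'][0])

# Square brackets on a variable that holds the same list.
dna_list = ['A', 'T', 'C', 'G', 'G', 'C', 'T', 'A', 'A', 'T', 'C', 'G']  # hypothetical name
print(dna_list[0])
```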

Whenever we put square brackets on the end of a sequence (be it a string, a list, or another kind of sequence that we'll talk about in the future), I sometimes think of those square brackets as the claw in one of those claw machines, reaching in and grabbing a specific part of the sequence. Whether that's a helpful visual to you or not, I want you to remember that when we put square brackets at the end of a sequence like this, we're saying "Hey Python, get me this part of what's held in this variable."

To be thorough, let's demonstrate the same thing but grabbing the 3rd letter in each sequence: C. (Because we want to grab the 3rd letter, we'll need to use index position 2.)

C
C
C
C
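
Four lookups like the following would print C each time. This is a sketch; the variable names are assumptions.

```python
dna = "ATCGGCTAATCG"   # hypothetical variable names
dna_list = list(dna)

# Index position 2 is the 3rd item because counting starts at 0.
print("ATCGGCTAATCG"[2])
print(dna[2])
print(['A', 'T', 'C', 'G', 'G', 'C', 'T', 'A', 'A', 'T', 'C', 'G'][2])
print(dna_list[2])
```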

Slicing

The syntax for slicing a list is also identical to that of string slicing. We use square brackets as usual, but instead of a single index position we put a colon inside them. On the left-hand side of the colon we put the starting index of the section of the sequence that we want to access. On the right-hand side of the colon we put the index that comes just after the last item that we want to access.

When we talked about strings we mentioned that it is sometimes useful to think about this as interval notation: the first index is inclusive and the second index is exclusive. The item at the left-hand index is included in the slice; the item at the right-hand index is not, but everything before it is.

GG
['G', 'G']
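
As a sketch of the slices behind this output: the two G's sit at index positions 3 and 4, so the slice runs from 3 up to (but not including) 5. The variable names are assumptions.

```python
dna = "ATCGGCTAATCG"   # hypothetical variable names
dna_list = list(dna)

# Start at index 3 (included) and stop at index 5 (not included).
string_slice = dna[3:5]
list_slice = dna_list[3:5]

print(string_slice)  # slicing a string gives back a string
print(list_slice)    # slicing a list gives back a list
```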

Using Negative Indices

And of course, if we can do indexing and slicing, we can use negative indices as well. Negative indices count backward from the end of the sequence, so -1 is the last item.

T
T
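
Here's a sketch of one lookup that prints T for both sequences. The index -3 is a guess, since T appears at several positions, and the variable names are assumptions.

```python
dna = "ATCGGCTAATCG"   # hypothetical variable names
dna_list = list(dna)

# -1 is the last item, -2 the one before it, and so on.
print(dna[-3])
print(dna_list[-3])
```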

Out of Range Error

We have to be careful when indexing into lists and strings, however, because if we use an index position that is not actually contained within the sequence then Python will raise an "index out of range" error.

File "<exec>", line 1, in <module>
IndexError: string index out of range

File "<exec>", line 1, in <module>
IndexError: list index out of range
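
The sketch below triggers both errors on purpose and catches them so the program keeps running. The index 12 and the variable names are assumptions; any index past the end would behave the same way.

```python
dna = "ATCGGCTAATCG"   # hypothetical variable names
dna_list = list(dna)

# Valid index positions run from 0 to 11, so 12 is one past the end.
try:
    dna[12]
except IndexError as error:
    string_message = str(error)
    print("IndexError:", string_message)

try:
    dna_list[12]
except IndexError as error:
    list_message = str(error)
    print("IndexError:", list_message)
```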

Length: the len() function

The len() function works on both strings and lists. For strings it tells us the number of characters in the string, and for lists it tells us the number of elements (items) in the list.

12
12
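
A sketch of the two calls, assuming the same hypothetical variable names as before:

```python
dna = "ATCGGCTAATCG"   # hypothetical variable names
dna_list = list(dna)

print(len(dna))       # number of characters in the string
print(len(dna_list))  # number of elements in the list
```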

Concatenation

If we're working with numeric datatypes (ints and floats) then the plus sign just does addition, but with sequences it does a very different operation called "concatenation". As we talked about with strings, all that the fancy word "concatenation" means is that it takes two things and sticks them together end to end.

With strings this means that they get combined into a single string:

Hello World!
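
One way to get that output is sketched below. How the original split the text into pieces isn't shown, so the two pieces here are an assumption.

```python
# The space has to live inside one of the strings; + does not add one for us.
greeting = "Hello " + "World!"
print(greeting)
```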

Lists can also be concatenated, and the two lists will be combined into a single list.

[1, 2, 3, 4, 5, 6]
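
A sketch of list concatenation that produces the output above. The split into two three-item lists and the variable names are assumptions.

```python
first = [1, 2, 3]    # hypothetical variable names
second = [4, 5, 6]

# + builds a new list; the two original lists are left unchanged.
combined = first + second
print(combined)
```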

Repetition

We can also cause a string or list to repeat itself using the repetition operation (done using the multiplication symbol). Needing to do this isn't super common, so we didn't cover it when we were going over lists, but if you multiply a string by a certain number you'll get a new string where the original is repeated that number of times.

ABCABCABC
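
As a sketch, multiplying the string "ABC" by 3 gives this output (the variable name is an assumption):

```python
# The * operator repeats the string the given number of times.
repeated_string = "ABC" * 3
print(repeated_string)
```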

The same thing happens with lists: multiplying a list by a number gives a new list in which the contents are repeated that many times.

I don't use this repetition operation very often, but it's convenient on the rare occasion when you want the contents of a string or list to follow a specific pattern.

['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
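
A sketch of the list version, again with an assumed variable name:

```python
# The * operator repeats the contents of the list the given number of times.
repeated_list = ['A', 'B', 'C'] * 3
print(repeated_list)
```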