Commit 1ab860e

Merge pull request mikeckennedy#21 from diazgilberto/gd-app-8
Transcript for app 8
2 parents 2b2b39c + 81a0e2c commit 1ab860e

11 files changed

+51
-63
lines changed

transcripts/txt/08_app/1.txt

+2-3
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,7 @@
1515
0:44 And then, in the program.py on line 24, 
1616
0:46 we found where we were using the name space from the other module 
1717
0:49 to actually allocate or initialize a dragon and this is what a dragon with level 50- 
18-
0:54 let me just say that's it, total number matches was two, we are done. 
18+
0:54 Now we just say that's it, total number matches was two, we are done. 
1919
0:59 You might think that this app is largely about files. 
2020
1:02 But it's not, we've spent a lot of time on working with directories and files and so on, 
2121
1:06 we will have to do that, yes, but that's not the primary focus, 
@@ -32,5 +32,4 @@
3232
1:48 We are going to talk about path operations 
3333
1:50 and very basic string searching 
3434
1:52 but we are going to focus on this concept of a processing pipeline 
35-
1:55 and we just happen to use files directories and text as our input. 
36-
35+
1:55 and we just happen to use files directories and text as our input.

transcripts/txt/08_app/10.txt

+1-2
@@ -28,5 +28,4 @@
2828
1:34 and passing along until finally some sort of pipeline spits out 
2929
1:38 a smaller transformed set of items, 
3030
1:40 the composition of generator methods is amazing there. 
31-
1:43 And that's what we are going to build in our file searcher app, when we get back to it. 
32-
31+
1:43 And that's what we are going to build in our file searcher app, when we get back to it.

transcripts/txt/08_app/11.txt

+5-7
@@ -18,7 +18,7 @@
1818
0:55 but we can actually do better still 
1919
1:00 so this is a generator method, this is a regular one,
2020
1:03 but we can also apply the exactly same idea here and the same idea here, 
21-
1:07 now this gets a little tricky because I have to say for m in matches yield m, 
21+
1:07 now this gets a little tricky because I have to say for m in matches: yield m, 
2222
1:14 now, that's not the most fun thing to write, 
2323
1:16 it would work but I'll show you something better, 
2424
1:17 same thing down here for all the matches there we want to do that, 
@@ -28,8 +28,8 @@
2828
1:30 it's going to go until it hits one of these, 
2929
1:33 which the generator and it's going to hand one back 
3030
1:36 so if we only wanted the first 4 matches 
31-
1:39 we could compute that extremely quickly
32-
1:40 however, this line 65, 66 this is not the coolest thing, 
31+
1:39 we could compute that extremely quickly.
32+
1:40 However, this line 65, 66 this is not the coolest thing, 
3333
1:44 it turns out that python 3.3 added basically a keyword that will do the same thing
3434
1:51 like take a whole set and sort of hand them back one at a time, 
3535
1:54 and so we can simplify this and just say yield from matches, 
@@ -70,8 +70,7 @@
7070
4:10 is a single line in memory, ok, 
7171
4:13 great we do have the file stream open to some huge file at some point, 
7272
4:16 but we are seeking over, we are streaming across it. 
73-
4:20 Let's just let it run and see where it goes. 
74-
4:22 
73+
4:20 Let's just let it run and see where it goes.
7574
4:28 It's done, look at that, 
7675
4:29 look at the memory usage, look at the CPU, 
7776
4:34 look at the performance, it is so much better than it was before, 
@@ -98,5 +97,4 @@
9897
5:49 to create these pipelines basically effortlessly, 
9998
5:52 we'll see that there is even a simpler way to create
10099
5:55 this type of structure something called a generator expression, 
101-
5:58 right, but we'll save that for the next app. 
102-
100+
5:58 right, but we'll save that for the next app.
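
Note on the "for m in matches: yield m" versus "yield from matches" passage in this transcript: a minimal sketch of the two forms side by side. The names walk_explicit, walk and nested are hypothetical and the nested list stands in for a folder tree; this is not the course's file searcher code.

```python
from itertools import islice


def walk_explicit(tree):
    """Pre-3.3 style: re-yield each item from the nested generator by hand."""
    for item in tree:
        if isinstance(item, list):
            for m in walk_explicit(item):
                yield m
        else:
            yield item


def walk(tree):
    """Same traversal, delegating to the nested generator with yield from."""
    for item in tree:
        if isinstance(item, list):
            yield from walk(item)
        else:
            yield item


# Hypothetical stand-in for a directory tree: lists are folders, strings are matches.
nested = ["a", ["b", ["c", "d"]], "e", ["f"]]

# Only the first four items are ever computed; the rest of the tree is not visited.
first_four = list(islice(walk(nested), 4))
```

Because both functions are generators, a consumer that stops early (here via islice) pays only for the items it actually pulls.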

transcripts/txt/08_app/2.txt

+14-15
@@ -1,23 +1,23 @@
11
0:01 So let's just take a moment and sketch out the flow of this application 
22
0:04 just to build a skeleton as we always do. 
3-
0:06 So, we have a main method, and we'll do something here 
4-
0:09 and of course we'll use the live template from main to call it down here, like so, 
3+
0:06 So, we have a main() method, and we'll do something here 
4+
0:09 and of course we'll use the live template from main() to call it down here, like so, 
55
0:13 from PyCharm, and then, the next thing we want to do 
66
0:16 of course we are going to say define print header, 
77
0:18 what we need to do is ask the user 
8-
0:20 hey what directories subtreat you want to search 
8+
0:20 hey what directories subtree you want to search 
99
0:23 and what do you want to search for, 
1010
0:25 so those are the next two things we are going to need, 
11-
0:26 we'll say get folder from user
12-
0:29 and similarly we'll say get search text from user
13-
0:34 then let's just define a search file method here, 
11+
0:26 we'll say get_folder_from_user()
12+
0:29 and similarly we'll say get_search_text_from_user()
13+
0:34 then let's just define a search_file() method here, 
1414
0:37 and we'll figure out the parameters in a minute, ok, 
1515
0:40 we'll reformat so we are all good, via pep 8 and let's start writing, 
16-
0:44 so print header, you guys know this, this is old hat by now, 
16+
0:44 so print_header(), you guys know this, this is old hat by now, 
1717
0:48 so we'll just fly through it, 
1818
0:57 next we are going to get a folder from the user, 
19-
0:59 we'll say get folder form user and let's just do
20-
1:01 a quick little test like if they enter nothing we would rather not have our app crush 
19+
0:59 we'll say get_folder_form_user() and let's just do
20+
1:01 a quick little test like if they enter nothing we would rather not have our app crash 
2121
1:06 we'll just say hey search that, moreover, 
2222
1:10 if they enter folder that doesn't exist, 
2323
1:14 we'd also want to build the deal with this so just do a test, 
@@ -40,17 +40,17 @@
4040
2:08 something like that so we can hit control t and rename this 
4141
2:11 and go I want to search folders, like that, and do the refactoring, 
4242
2:16 and down here and if there was 100 files all potentially leveraging this 
43-
2:20 and doc strings leveraging and so on, 
43+
2:20 and docstrings leveraging and so on, 
4444
2:22 all of that would have been fixed, 
4545
2:25 now we need our folders and text of course, excellent, 
4646
2:28 so these should be pretty easy to write,
4747
2:29 let's just come here and pull this out, 
4848
2:32 now we are going to do a little more than just get the text that is the folder, 
4949
2:36 we are actually going to verify it. 
5050
2:38 So ask the user what folder do you want to search, 
51-
2:40 then we'll say if not folder so for some reason it came back empty 
51+
2:40 then we'll say if not folder: so for some reason it came back empty 
5252
2:43 I don't think it can ever come back as none 
53-
2:45 but let's just verify it, and we'll say or if possibly the folder is just white space, right, 
53+
2:45 but let's just verify it, let's little safe and we'll say or if possibly the folder is just white space, right, 
5454
2:53 so in either case we'll just return none so we have nothing, 
5555
2:56 there is no folder and that will trigger remember up here 
5656
3:01 that will trigger this and say no, it doesn't work. 
@@ -62,7 +62,7 @@
6262
3:23 if you give it a file versus a folder, but we are not doing that right now, 
6363
3:25 so we are just going to say if it's not a directory 
6464
3:30 we'll just return none and then finally, 
65-
3:32 let's clean this up a little bit and say we'll return OS.path.absolutepath 
65+
3:32 let's clean this up a little bit and say we'll return OS.path.absolutepath() 
6666
3:37 so we have a nice absolute path for a folder instead of something relative, 
6767
3:41 as you will see having an absolute path is helpful for later on. 
6868
3:45 Next, let's just get some search text here, 
@@ -85,5 +85,4 @@
8585
4:40 so we would search/users/screencaster/desktop_08_file seracher for cats. 
8686
4:46 Perfect, so we have kind of all this parts of like user input 
8787
4:49 and everything completely done, and now
88-
4:51 it's just a matter of implementing the search method. 
89-
88+
4:51 it's just a matter of implementing the search method.
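
Note on the folder validation described in this transcript: a rough sketch of the checks, assuming the raw input string is passed in rather than read with input(). The function name clean_folder is hypothetical. The transcript text says "OS.path.absolutepath()"; the standard-library call is os.path.abspath, which is what the sketch uses.

```python
import os


def clean_folder(folder):
    """Return an absolute folder path, or None if the input is unusable."""
    # Empty input or input that is only white space: nothing to search.
    if not folder or not folder.strip():
        return None

    # A path that does not exist, or that names a file, is rejected too.
    if not os.path.isdir(folder):
        return None

    # Normalize to an absolute path so later joins do not depend on the cwd.
    return os.path.abspath(folder)
```

The caller can then treat None as the single "cannot search that" signal, as the transcript describes for main().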

transcripts/txt/08_app/3.txt

+7-9
@@ -1,13 +1,13 @@
11
0:01 Ok, let's implement the actual search. 
22
0:03 The first thing that we need to do is actually go to this folder and find all the files. 
33
0:08 So again, our friend OS will help. 
4-
0:10 We can say OS.list.dir and give it a folder, 
4+
0:10 We can say OS.listdir() and give it a folder, 
55
0:14 and here this is going to return all of the items in here so let's say items, 
66
0:18 notice I am not calling them files because 
77
0:21 sometimes they are folders, sometimes they are files. 
88
0:24 So, we are going to do our loop, 
9-
0:27 we are going to say for item in items and we want to check, 
10-
0:29 well if this is a folder we don't want to do this so we'll say if os.path.isdir item 
9+
0:27 we are going to say for item in items: and we want to check, 
10+
0:29 well if this is a folder we don't want to do this so we'll say if os.path.isdir(item ):
1111
0:36 we are going to not process this item but we want to keep going to the loop 
1212
0:40 and the perfect keyword for that is continue, 
1313
0:43 so basically go back to the top of the loop, 
@@ -21,7 +21,7 @@
2121
1:06 is when we say listdir this only gives us the file name,
2222
1:09 not the full path name so either way whether it's a directory or a file, 
2323
1:13 we need to do something like this, 
24-
1:15 we'll go full item = os.path.join and remember, 
24+
1:15 we'll go full item = os.path.join() and remember, 
2525
1:21 we made sure that this is an absolute path 
2626
1:24 and then we want to join up the subitem, perfect, 
2727
1:26 so here we got the check full item 
@@ -48,7 +48,7 @@
4848
2:31 then we want to just go through each line in the file 
4949
2:33 and check it to see if the search text appears in it, 
5050
2:38 so it turns out that these file streams are iterable 
51-
2:41 so I could say for line in fin and in a really nice way 
51+
2:41 so I could say for line in fin: and in a really nice way 
5252
2:45 just sort of smoothly stream over them without loading the whole file of them 
5353
2:50 in the memory as like an array of strings or something like this, 
5454
2:52 this is also going to be really key way to use these generator methods later on 
@@ -66,7 +66,7 @@
6666
3:31 so they actually found find found the substring. 
6767
3:35 The other thing we got to do is make sure that this is lower case, 
6868
3:38 we could do that once above here instead of every time we had a file, 
69-
3:44 so let's do something like return text.lower. 
69+
3:44 so let's do something like return text.lower()
7070
3:47 Cool, so if this is the case let's for now just somehow collect up this line of text, 
7171
3:53 we are going to see that this is not the ideal way to do this
7272
3:55 and we'll fix it just in a moment, 
@@ -100,7 +100,6 @@
100100
5:36 and the phrase we want to search for is let's say "friends", 
101101
5:42 we'll that's pretty fantastic, 
102102
5:43 except for we forgot to print out the results, let's do that really quick. 
103-
5:47 
104103
5:53 Ok, so we'll capture the matches and for each one int here we'll just print this out. 
105104
5:56 Try again. 
106105
5:57 Search for friends and you can see we found some results, look at that. 
@@ -116,5 +115,4 @@
116115
6:26 but it does look like it's working, 
117116
6:28 now what line in what book did that appear in? 
118117
6:31 I have no idea, so our next job is to fix that so we actually return 
119-
6:35 more information about our search. 
120-
118+
6:35 more information about our search.
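
Note on the search loop described in this transcript: a minimal sketch of the non-recursive version, under the assumption that files are UTF-8 text. The name search_folder and the use of the in operator for the substring test are choices made for this sketch, not necessarily what the course code does.

```python
import os


def search_folder(folder, text):
    """Collect every line containing text from the files directly in folder."""
    text = text.lower()  # lower-case once, not once per file
    matches = []

    for item in os.listdir(folder):
        # listdir returns bare names, so build the full path before testing it.
        full_item = os.path.join(folder, item)
        if os.path.isdir(full_item):
            continue  # skip subfolders in this version

        # File objects are iterable, so lines stream one at a time.
        with open(full_item, "r", encoding="utf-8") as fin:
            for line in fin:
                if text in line.lower():
                    matches.append(line)

    return matches
```

As the transcript notes, collecting everything in a list is not the ideal design; it is the starting point that the later generator version improves on.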

transcripts/txt/08_app/4.txt

+11-12
@@ -1,23 +1,23 @@
11
0:00 So it's time to improve the results that come out of here, 
22
0:03 remember, we are just passing the actual lines of a bunch of files,
3-
0:06  we don't really even know which ones those are, 
3+
0:06 we don't really even know which ones those are, 
44
0:09 we just start with the folder that have some kind of match, 
55
0:11 so we want to bundle some information together about these matches, 
66
0:14 now we could use classes and those are very powerful, 
77
0:17 but really we just need things like what's the line number, 
88
0:20 what's the text, what's the file name and so on. 
99
0:23 So it turns out named tuples are perfect, so again, 
10-
0:27 we'll just use our collections of a search result equals 
11-
0:31 we want to use our collections.namedtuple 
10+
0:27 we'll just use our collections of a SearchResult =
11+
0:31 we want to use our collections.namedtuple()
1212
0:34 and then the first thing is the type name 
1313
0:38 the second argument are basically the fields 
14-
0:42so it will be file, file line and text, 
14+
0:42 so it will be file, file line and text, 
1515
0:45 let's say those are the three things that we are going to return there. 
1616
0:50 Now let's change this, so not just append a line but actually make results, 
17-
0:54 so we'll say m for match it's going to be a search results 
17+
0:54 so we'll say m for match it's going to be a SearchResults() 
1818
0:59 and in here what are we going to add, 
19-
1:01 we want to say line = line I think I called it and file = file name 
20-
1:07 and the text = actually, text = line, that's the line of text 
19+
1:01 we want to say line = line I think I called it and file = filename 
20+
1:07 and the text = ... actually, text = line, that's the line of text 
2121
1:14 and I need the line number here, 
2222
1:18 so we have to compute this ourselves so way we are doing it but that's ok, 
2323
1:24 ok, so now we are going to append that and let's just run it again, 
@@ -30,7 +30,7 @@
3030
1:46 they all talk about friends. 
3131
1:49 Ok, that works, but let's do a little bit nicer output so we can read it here, 
3232
1:52 let's go at the top where we are doing this print and for a little bit let's go ok, stop this,
33-
1:58 what we are going to do is we are going to say print 
33+
1:58 what we are going to do is we are going to say print() 
3434
2:00 and we'll do something like this, there is a match 
3535
2:04 ok so we'll say match then we'll print out the file, the line and the actual match text, 
3636
2:09 let's run it again. 
@@ -44,8 +44,8 @@
4444
2:37 remember, when we read an individual line from the files 
4545
2:40 they actually have the new line on there so that new line is appearing here 
4646
2:44 and I'd rather take more control over the text rather than assume that 
47-
2:47 that's always going to be there so let's just say strip, 
48-
2:50 now it could do an r strip just to get the stuff off the end or maybe
47+
2:47 that's always going to be there so let's just say strip()
48+
2:50 now it could do an rstrip() just to get the stuff off the end or maybe
4949
2:54  if there is white space you want to pull it to the font, anyway so let's do it this way. 
5050
3:00 Great, it looks like our searching is working perfectly. 
5151
3:07 Now remember, we are giving it this folder 
@@ -60,5 +60,4 @@
6060
3:38 but it turns out the most natural way to solve these hierarchical problems are 
6161
3:43 to use something called recursion,
6262
3:45 so let's take a moment and go look at this concept 
63-
3:47 and then we'll come back and apply ti to our application. 
64-
63+
3:47 and then we'll come back and apply ti to our application.
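
Note on the named tuple described in this transcript: a small sketch, assuming the fields are file, line and text as listed at 0:42. The helper search_lines is hypothetical and takes any iterable of lines so it can be tried without a file. It uses enumerate for the line number, whereas the transcript says the number is computed by hand.

```python
import collections

# Type name first, then the field names.
SearchResult = collections.namedtuple("SearchResult", "file, line, text")


def search_lines(filename, lines, search_text):
    """Return a SearchResult for each line that contains search_text."""
    search_text = search_text.lower()
    results = []
    for line_num, line in enumerate(lines, start=1):
        if search_text in line.lower():
            # strip() drops the trailing newline that file lines carry.
            m = SearchResult(file=filename, line=line_num, text=line.strip())
            results.append(m)
    return results
```

Each result can then be printed field by field, for example m.file, m.line and m.text.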

transcripts/txt/08_app/5.txt

+3-4
@@ -3,8 +3,8 @@
33
0:06 where we can just play around with some ideas. 
44
0:08 We'll do this for recursion but we also do this for our generator methods later. 
55
0:12 Let's take a really simple case something like creating the factorial of a number. 
6-
0:16 Now remember, factorials are sort of iterative processes 
7-
0:21 the factorial written with like a number and an exclamation point right 
6+
0:16 Now remember, factorials are sort of iterative processes.
7+
0:21 The factorial written with like a number and an exclamation point right 
88
0:25 so like let's say 5! = 120
99
0:29 and the way that you get that is you take 5*4*3*2*1 
1010
0:37 and each step if I start here the thing I need to multiply is 
@@ -48,5 +48,4 @@
4848
2:56 will make a difference on the small ones but on the larger one 
4949
2:59 you can see we now have separated those with digit grouping. 
5050
3:03 So this is a recursive algorithm, let's take a moment 
51-
3:06 and look at this as it's one of our core concepts.
52-
51+
3:06 and look at this as it's one of our core concepts.
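
Note on the factorial example in this transcript: a minimal recursive sketch, with digit grouping done through the "," format specifier. The function name and the base case n <= 1 are assumptions for this sketch.

```python
def factorial(n):
    """Return n! by recursion: n! = n * (n - 1)!, with 1! = 1."""
    if n <= 1:
        return 1  # base case stops the recursion
    return n * factorial(n - 1)


# Digit grouping makes the larger values readable.
formatted = "{:,}".format(factorial(10))
```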

transcripts/txt/08_app/6.txt

+1-2
@@ -25,5 +25,4 @@
2525
1:26 with the types of problems you are solving, 
2626
1:28 if you are dealing with hierarchical data structures 
2727
1:30  or you are dealing with these iteratively defined algorithms, 
28-
1:33 recursion may be what you need. 
29-
28+
1:33 recursion may be what you need. 

transcripts/txt/08_app/7.txt

+1-2
@@ -50,5 +50,4 @@
5050
2:36 but if the number of files and the quantity of files gets to be gigabytes of text 
5151
2:41 you'll see there is a severe performance problem especially around memory 
5252
2:45 and we can do much better using some very cool features 
53-
2:49 called generator methods that's what we are going to do next. 
54-
53+
2:49 called generator methods that's what we are going to do next.

transcripts/txt/08_app/8.txt

+2-3
@@ -17,7 +17,7 @@
1717
0:48 well the app actually does surprisingly well, 
1818
0:52 it really does go through and it finally has results and so on, 
1919
0:54 but if we leverage this concept of generator methods and related things 
20-
0:58 that will build on other applications, we can actually do amazingly better, ok, 
20+
0:58 that will build on other applications further down the line, we can actually do amazingly better, ok, 
2121
1:04 so just to make sure everything is working on Windows, 
2222
1:08 let me just search the same stuff here, 
2323
1:12 ok so we want to search c/users/mkennedy/desktop/books 
@@ -67,5 +67,4 @@
6767
4:08 this is not the most amazing outcome that we could have had. 
6868
4:11 It turns out it took almost 400 MB the way we implemented our algorithm, 
6969
4:17 and depending on the how we hold the data or the size of the data,
70-
4:20 it could be even worse. 
71-
70+
4:20 it could be even worse.

transcripts/txt/08_app/9.txt

+4-4
@@ -30,7 +30,7 @@
3030
1:49 so let's actually print this out in a slightly different way 
3131
1:51 where we have a little more control to go through, 
3232
1:53 so let's say for n in Fibonacci like so, 
33-
1:57 and we'll just print in and we'll do that end = just a little comma thing 
33+
1:57 and we'll just print(n) and we'll do that end = just a little comma thing 
3434
2:01 so we can have on the same line, let's run that, perfect, 
3535
2:04 it looks basically the same but here is the key thing, 
3636
2:07 let's put a break point here and actually step through, 
@@ -102,13 +102,13 @@
102102
6:06 let's suppose we want it all the Fibonacci numbers, 
103103
6:09 the infinite sequence of numbers. 
104104
6:12 We could remove this limit and we could just say I want to do this forever, 
105-
6:16 now obviously doing this forever is going to sort of be a problem, it won't crush,
105+
6:16 now obviously doing this forever is going to sort of be a problem, it won't crash,
106106
6:22 it will just keep going and getting slower and slower, and so on, 
107107
6:24 but it does let us down here as a consumer decide when we've had enough 
108-
6:29 so we could say if n > 100 then break, 
108+
6:29 so we could say if n > 1000 then break, 
109109
6:34 but we have access to the entire infinite series. 
110110
6:38 Now, if I did this with the list, this would just run for a long time, 
111-
6:41 run out of memory and then crush, but that's not what happens here, 
111+
6:41 run out of memory and then crash, but that's not what happens here, 
112112
6:45 we just get one compute the next, compute the next 
113113
6:47 and any step along that path is super cheap, 
114114
6:50 basically a couple of additions and a return value, so let's run this again. 
