new file mode 100644
@@ -0,0 +1,180 @@
+(
+
+# Summary
+
+Reads a file in chunks - perfect for when you have a small buffer or when you
+don't know the file size. Copes with files up to 4,294,967,295 bytes long.
+
+# Code
+
+)
+@file-read-chunks ( func* udata* buf* size* filename* -- func* udata'* buf* size* filename* )
+
+ #0000 DUP2 ( F* U* B* SZ* FN* OL* OH* / )
+ &resume
+ ROT2 STH2 ( F* U* B* SZ* OL* OH* / FN* )
+ ROT2 ( F* U* B* OL* OH* SZ* / FN* )
+
+ &loop
+ STH2kr .File/name DEO2 ( F* U* B* OL* OH* SZ* / FN* )
+ STH2k .File/length DEO2 ( F* U* B* OL* OH* / FN* SZ* )
+ STH2k .File/offset-hs DEO2 ( F* U* B* OL* / FN* SZ* OH* )
+ STH2k .File/offset-ls DEO2 ( F* U* B* / FN* SZ* OH* OL* )
+ SWP2 ( F* B* U* / FN* SZ* OH* OL* )
+ ROT2k NIP2 ( F* B* U* B* F* / FN* SZ* OH* OL* )
+ OVR2 .File/load DEO2 ( F* B* U* B* F* / FN* SZ* OH* OL* )
+ .File/success DEI2 SWP2 ( F* B* U* B* length* F* / FN* SZ* OH* OL* )
+ JSR2 ( F* B* U'* done-up-to* / FN* SZ* OH* OL* )
+ ROT2 SWP2 ( F* U'* B* done-up-to* / FN* SZ* OH* OL* )
+ SUB2k NIP2 ( F* U'* B* -done-length* / FN* SZ* OH* OL* )
+ ORAk ,&not-end JCN ( F* U'* B* -done-length* / FN* SZ* OH* OL* )
+
+ POP2 POP2r POP2r ( F* U'* B* / FN* SZ* )
+ STH2r STH2r ( F* U'* B* SZ* FN* / )
+ JMP2r
+
+ &not-end
+ STH2r SWP2 ( F* U'* B* OL* -done-length* / FN* SZ* OH* )
+ LTH2k JMP INC2r ( F* U'* B* OL* -done-length* / FN* SZ* OH'* )
+ SUB2 ( F* U'* B* OL'* / FN* SZ* OH'* )
+ STH2r STH2r ( F* U'* B* OL'* OH'* SZ* / FN* )
+ ,&loop JMP
+
+(
+
+# Arguments
+
+* func* - address of callback routine
+* udata* - userdata to pass to callback routine
+* buf* - address of first byte of the buffer that receives the file's contents
+* size* - size in bytes of buffer
+* filename* - address of filename string (zero-terminated)
+
+All of the arguments are shorts (suffixed by asterisks *).
+
+# Callback routine
+
+If you make use of userdata, the signature of the callback routine is:
+)
+ ( udata* buf* length* -- udata'* done-up-to* )
+(
+
+* udata* and buf* are as above.
+* length* is the length of the chunk being worked on, which could be less than
+ size* when near the end of the file. func* is called with zero length* to
+ signify end of file.
+* udata'* is the (potentially) modified userdata, to be passed on to the next
+ callback routine call and returned by file-read-chunks after the last chunk.
+* done-up-to* is the pointer to the first unprocessed byte in the buffer, or
+ buf* + length* if the whole chunk was processed.
+
+If you don't make use of any userdata, feel free to pretend the signature is:
+)
+ ( buf* length* -- done-up-to* )
+(
+
+# Userdata
+
+file-read-chunks does not interpret udata*; it only carries the value returned
+by one callback call over to the next. The meaning of its contents is up
+to you - it could simply be a short integer or a pointer to a region of memory.
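+
+As a sketch of userdata in action, this callback uses udata* as a running byte
+count, assuming the caller passes #0000 to begin with. The label count-bytes is
+made up for illustration, and the 16-bit count wraps around for files longer
+than 65,535 bytes. It consumes every chunk whole, so the empty end-of-file
+chunk returns buf* and ends the loop.
+
+ @count-bytes
+ STH2k ADD2 SWP2
+ STH2r ADD2 SWP2
+ JMP2r
+
+STH2k keeps a copy of length* on the return stack, the first ADD2 turns buf*
+and length* into done-up-to*, and the second ADD2 adds length* to udata*.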
+
+# Operation
+
+file-read-chunks reads a file into the buffer you provide, one chunk at a time,
+and calls func* via JSR2 for each chunk of data, finishing with an empty chunk
+at end of file.
+
+file-read-chunks loops until done-up-to* equals buf*, which means that no
+data was processed by func*. This could be because processing cannot continue
+without a larger buffer, an error is detected in the data and further
+processing is pointless, or because the end-of-file empty chunk leaves the
+callback routine with no other choice.
+
+# Return values
+
+Since file-read-chunks's input parameters remain available throughout its
+operation, they are left on the working stack rather than discarded, in case
+they are useful to the caller. Only udata* may differ from what was passed in.
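+
+As a sketch of a call site, assuming hypothetical labels my-buffer and
+my-filename and a buffer of #0100 bytes, the five leftover shorts can be
+dropped once the file has been read. quick-but-useless is one of the example
+callback routines further down.
+
+ ;quick-but-useless #0000 ;my-buffer #0100 ;my-filename
+ ;file-read-chunks JSR2
+ POP2 POP2 POP2 POP2 POP2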
+
+# Discussion about done-up-to*
+
+file-read-chunks is extra flexible because it doesn't just give you one chance
+to process each part of the file. Consider a func* routine that splits the
+chunk's contents into words separated by whitespace. If the buffer ends with a
+letter, you can't assume that letter is the end of that word - it's more likely
+to be in the middle of a word that continues on. If func* returns the
+address of the first letter of the word so far, it will be called again with
+that first letter as the first character of the next chunk's buffer. There's no
+need to remember the earlier part of the word because you get presented with
+the whole lot again to give parsing another try.
+
+That said, func* must make at least _some_ progress through the chunk: if it
+returns the address at the beginning of the buffer, buf*, file-read-chunks will
+terminate and return to its caller. With our word example, a buffer of ten
+bytes will be unable to make progress with words that are ten or more letters
+long. Depending on your application, either make the buffer big enough so that
+progress should always be possible, or find a way to discern this error
+condition from everything working fine.
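+
+As a small sketch of partial consumption, this callback treats the file as
+two-byte records and only ever claims an even number of bytes. The label
+even-bytes-only is made up for illustration, and it uses the signature without
+userdata. A leftover odd byte is presented again at the start of the next
+chunk, and a chunk of zero or one bytes returns buf* so that file-read-chunks
+terminates.
+
+ @even-bytes-only
+ #fffe AND2 ADD2 JMP2r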
+
+# Discussion about recursion
+
+Since all of file-read-chunks's data is on the working and return stacks, it
+can be called recursively by code running in the callback routine. For example,
+an assembler can process the phrase "include library.tal" by calling
+file-read-chunks again with library.tal as the filename. There are a couple of
+caveats:
+
+* the filename string must not reside inside file-read-chunks's working buffer,
+ otherwise it gets overwritten by the file's contents and subsequent chunks
+ will fail to be read properly; and
+
+* if the buffer is shared with the parent file-read-chunks, the callback routine
+ should stop further processing and return with done-up-to* straight away,
+ since the buffer contents have already been replaced by the child
+ file-read-chunks.
+
+# Resuming / starting operation from an arbitrary offset
+
+You can call file-read-chunks/resume instead of the main routine if you'd like
+to provide your own offset shorts rather than beginning at the start of the
+file. The effective signature for file-read-chunks/resume is:
+)
+ ( func* udata* buf* size* filename* offset-ls* offset-hs* -- func* udata'* buf* size* filename* )
+(
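+
+As a sketch of calling file-read-chunks/resume, this starts reading at byte
+offset 00012000 in hexadecimal, assuming hypothetical labels my-buffer and
+my-filename and a buffer of #0100 bytes. The low short of the offset is pushed
+before the high short, matching the signature.
+
+ ;quick-but-useless #0000 ;my-buffer #0100 ;my-filename
+ #2000 #0001 ;file-read-chunks/resume JSR2
+ POP2 POP2 POP2 POP2 POP2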
+
+# Example callback routines
+
+This minimal routine is a no-op that "processes" the entire buffer each time
+and returns a valid done-up-to*:
+
+ @quick-but-useless
+ ADD2 JMP2r
+
+This extremely inefficient callback routine simply prints a single character
+from the buffer and asks for the next one. It operates with a buffer that is
+just one byte long, but for extra inefficiency you can assign a much larger
+buffer and it will ignore everything after the first byte each time. If the
+chunk is zero length it returns done-up-to* == buf* so that file-read-chunks
+returns properly.
+
+ @one-at-a-time
+ #0000 NEQ2 JMP JMP2r
+ LDAk .Console/write DEO
+ INC2 JMP2r
+
+This more efficient example writes the entire chunk to the console before
+requesting the next one by returning. How short can you make a routine that
+does the same?
+
+ @chunk-at-a-time
+ &loop
+ ORAk ,&not-eof JCN
+ POP2 JMP2r
+
+ &not-eof
+ STH2
+ LDAk .Console/write DEO
+ INC2 STH2r #0001 SUB2
+ ,&loop JMP
+
+)