new-downloader performs badly when downloading a lot of data from a file #1170

Closed
opened 2010-08-12 06:15:42 +00:00 by zooko · 209 comments
zooko commented 2010-08-12 06:15:42 +00:00
Owner

Some measurements:

| run | version | downloaded | download KBps | flags |
|---|---|---|---|---|
| 101 | 1.8.0c2 | 100 MB | 91 | |
| 102 | 1.7.1 | 100 MB | 182 | |
| 103 | 1.7.1 | 100 MB | 207 | |
| 104 | 1.8.0c2 | 100 MB | 82 | |
| 105 | 1.7.1 | 100 MB | 211 | |
| 109 | 1.7.1 | 100 MB | 228 | cProfile |
| 110 | 1.8.0c2 | 100 MB | 113 | cProfile |
| 111 | 1.8.0c2 | 413 MB | 19 | spanstrace |
| 112 | 1.8.0c2 | 456 MB | 113 | spanstrace, spans.py.diff |
| 113 | 1.7.1 | 543 MB | 241 | on office network |
| 114 | 1.8.0c2 | 456 MB | 147 | spans.py.diff + comment:121237 |
| 115 | 1.7.1 | 456 MB | 224 | |
| 116 | 1.8.0c2 | 314 MB | 154 | spans.py.diff + comment:121237, used 3*nszi |
| 117 | 1.8.0c2 | 36 MB | 160 | cProfile, spans.py.diff + comment:121237 |
| 118 | 1.8.0c2 | 1490 MB | 158 | cProfile, spans.py.diff + comment:121237 |
| 119 | 1.7.1 | 543 MB | 99 | on office network, noflog |
| 120 | 1.8.0c2 | 543 MB | 319 | on office network, noflog, spans.py.diff + comment:121237 |
| 121 | 1.7.1 | 543 MB | 252 | on office network, noflog |
| 122 | 1.8.0c2 | 1490 MB | 157 | noflog, spans.py.diff + comment:121237, nologging in immutable/download/share.py |
| 123 | 1.8.0c2 | 100 MB | 242 | noflog, spans.py.diff + comment:121237, nologging in immutable/download/*.py |
| 124 | 1.7.1 | 100 MB | 219 | noflog |
| 125 | 1.8.0c2 | 100 MB | 161 | noflog, spans.py.diff + comment:121237, nologging in immutable/download/*.py |
| 126 | 1.7.1 | 100 MB | 223 | noflog |
| 127 | 1.8.0c2 | 100 MB | 155 | noflog, spans.py.diff + comment:121237, nologging in immutable/download/*.py |
| 129 | 1.8.0c2 | 100 MB | 291 | on office network, noflog, 1170-combo.diff |
| 130 | 1.8.0c2 | 100 MB | 179 | on office network, noflog, 1170-combo.diff |
| 131 | 1.8.0c2 | 100 MB | 276 | on office network, noflog, 1170-combo.diff |
| 132 | 1.8.0c2 | 100 MB | 179 | on office network, noflog, 1170-combo.diff |
| 133 | 1.8.0c2 | 100 MB | 279 | on office network, noflog, 1170-combo.diff |
| 134 | 1.8.0c2 | 100 MB | 262 | on office network, noflog, 1170-combo.diff |
| 135 | 1.8.0c2 | 100 MB | 180 | on office network, noflog, 1170-combo.diff |
| 136 | 1.8.0c2 | 100 MB | 284 | on office network, noflog, 1170-combo.diff |
| 137 | 1.8.0c2 | 100 MB | 286 | on office network, noflog, 1170-combo.diff |
| 138 | 1.7.1 | 867 MB | 265 | on office network, noflog |
| 139 | 1.8.0c2 | 866 MB | 169 | on office network, noflog, 1170-combo.diff used 1*sp26,1*nszi,1*fp3x |
| 140 | 1.7.1 | 100 MB | 169 | terrell, home cable modem |
| 141 | 1.8.0c2 | 100 MB | 223 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*4rk5 |
| 142 | 1.8.0c2 | 100 MB | 215 | terrell, home cable modem, 1170-combo.diff |
| 143 | 1.8.0c2 | 100 MB | 235 | terrell, home cable modem, 1170-combo.diff |
| 144 | 1.7.1 | 100 MB | 159 | terrell, home cable modem |
| 145 | 1.8.0c2 | 100 MB | 131 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*fp3x |
| 146 | 1.7.1 | 100 MB | 184 | terrell, home cable modem |
| 147 | 1.8.0c2 | 100 MB | 233 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*4rk5 |
| 148 | 1.7.1 | 100 MB | 172 | terrell, home cable modem |
| 149 | 1.8.0c2 | 100 MB | 259 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*4rk5 |
| 150 | 1.7.1 | 100 MB | 141 | terrell, home cable modem |
| 151 | 1.8.0c2 | 100 MB | 139 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*sroo |
| 152 | 1.7.1 | 100 MB | 159 | terrell, home cable modem |
| 153 | 1.8.0c2 | 100 MB | 241 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*4rk5 |
| 154 | 1.7.1 | 100 MB | 172 | terrell, home cable modem |
| 155 | 1.8.0c2 | 100 MB | 229 | terrell, home cable modem, 1170-combo.diff used 1*sp26,1*nszi,1*4rk5 |
| 156 | 1.7.1 | 100 MB | 159 | terrell, home cable modem |
| 157 | 1.8.0c2 | 100 MB | 174 | terrell, home cable modem, 1170-combo.diff used 1*nszi,1*4rk5,1*fp3x |
| 158 | 1.7.1 | 100 MB | 173 | terrell, home cable modem |
| 159 | 1.8.0c2 | 100 MB | 262 | terrell, home cable modem, 1170-combo.diff |
| 160 | 1.7.1 | 100 MB | 152 | terrell, home cable modem |
| 161 | 1.8.0c2 | 100 MB | 248 | terrell, home cable modem, 1170-combo.diff |
| 162 | 1.7.1 | 100 MB | 135 | terrell, home cable modem |
| 1001 | 1.7.1 | 100 MB | 201 | used 1*sp26,1*sroo,1*4rk5 |
| 1002 | 1.8.0c2 | 100 MB | 186 | 1170-combo.diff used 3*nsziz,1*sp26,1*fp3x |
| 2000 | 1.7.1 | 100 MB | 187 | used 1*sp26,1*sroo,1*4rk5 |
| 2001 | 1.8.0c2 | 100 MB | 172 | 1170-combo.diff used 2*sp26,2*fp3x,1*4rk5 |
| 2002 | 1.7.1 | 100 MB | 220 | used 1*sp26,1*sroo,1*4rk5 |
| 2003 | 1.8.0c2 | 100 MB | 214 | 1170-combo.diff used 2*sp26,2*nszi,1*4rk5 |
| 2004 | 1.7.1 | 100 MB | 188 | used 1*sp26,1*sroo,1*4rk5 |
| 2005 | 1.8.0c2 | 100 MB | 164 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2006 | 1.7.1 | 100 MB | 193 | used 1*sp26,1*sroo,1*4rk5 |
| 2007 | 1.8.0c2 | 100 MB | 163 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2008 | 1.7.1 | 100 MB | 188 | used 1*sp26,1*sroo,1*4rk5 |
| 2009 | 1.8.0c2 | 100 MB | 167 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2010 | 1.7.1 | 100 MB | 222 | used 1*sp26,1*sroo,1*4rk5 |
| 2011 | 1.8.0c2 | 100 MB | 171 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2012 | 1.7.1 | 100 MB | 208 | used 1*sp26,1*sroo,1*4rk5 |
| 2013 | 1.8.0c2 | 100 MB | 171 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2014 | 1.7.1 | 100 MB | 216 | used 1*sp26,1*sroo,1*4rk5 |
| 2015 | 1.8.0c2 | 100 MB | 174 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2016 | 1.7.1 | 100 MB | 212 | used 1*sp26,1*sroo,1*4rk5 |
| 2017 | 1.8.0c2 | 100 MB | 172 | 1170-combo.diff used 2*sp26,2*nszi,1*fp3x |
| 2018 | 1.7.1 | 100 MB | 204 | used 1*sp26,1*sroo,1*4rk5 |
| 2019 | 1.8.0c2 | 100 MB | 222 | 1170-combo.diff used 2*sp26,2*nszi,1*4rk5 |
| 3000 | 1.7.1 | 333 MB | 164 | negativland.ogg used 2*fp3x,1*tavr |
| 3001 | 1.8.0c2 | 333 MB | 91 | negativland.ogg used 2*fp3x,2(1)*tavr |
| 3002 | 1.7.1 | 224 MB | 28 | negativland.ogg used 2*fp3x,1*tavr |
| 3100 | 1.7.1 | 333 MB | 90 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 3101 | 1.8.0c2 | 0 MB | n/a | terrell, home cable modem, negativland.ogg, download attempt failed immediately |
| 3102 | 1.7.1 | 0 MB | n/a | terrell, home cable modem, negativland.ogg, download attempt failed immediately |
| 3103 | 1.8.0c2 | 0 MB | n/a | terrell, home cable modem, negativland.ogg, download attempt failed immediately |
| 3104 | 1.7.1 | 333 MB | 92 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 3105 | 1.8.0c2 | 333 MB | 95 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 3106 | 1.7.1 | 333 MB | 93 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 3107 | 1.8.0c2 | 333 MB | 93 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 3108 | 1.7.1 | 333 MB | 92 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 3109 | 1.8.0c2 | 333 MB | 91 | terrell, home cable modem, negativland.ogg used 2*fp3x,1*tavr |
| 4000 | 1.7.1 | 333 MB | 93 | on office network, negativland.ogg used 2*fp3x,1*tavr |
| 4001 | 1.8.0c2 | 333 MB | 94 | on office network, negativland.ogg used 2*fp3x,1*tavr |
| 4002 | 1.7.1 | 333 MB | 193 | on office network, negativland.ogg used 1*fp3x,2*tavr |
| 4003 | 1.8.0c2 | 333 MB | 94 | on office network, negativland.ogg used 2*fp3x,1*tavr |
| 4004 | 1.7.1 | 333 MB | 189 | on office network, negativland.ogg used 1*fp3x,2*tavr |
| 4005 | 1.8.0c2 | 333 MB | 93 | on office network, negativland.ogg used 2*fp3x,1*tavr |
| 4006 | 1.7.1 | 333 MB | 189 | on office network, negativland.ogg used 1*fp3x,2*tavr |
tahoe-lafs added the code-network, major, defect, 1.8β labels 2010-08-12 06:15:42 +00:00
tahoe-lafs added this to the 1.8.0 milestone 2010-08-12 06:15:42 +00:00
zooko commented 2010-08-12 16:28:41 +00:00
Author
Owner

Oh, all those 32-byte reads must have been all the hashes in the Merkle Trees. I assume that those are indeed coalesced using the clever spans structure (source:src/allmydata/util/spans.py@4666). Nevertheless we should investigate the very poor performance shown in this download status file.

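As an illustration of the coalescing idea only (not the actual allmydata.util.spans code), here is a minimal sketch assuming reads are represented as (start, length) pairs; it shows why two neighboring 32-byte hash reads end up as one 64-byte request:

```python
# Minimal illustration of span coalescing; NOT the real allmydata.util.spans.
def coalesce(ranges):
    """Merge overlapping or touching (start, length) read ranges."""
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:        # touches or overlaps the previous span
            merged[-1][1] = max(merged[-1][1], end)  # extend that span
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]

# Two adjacent 32-byte hash-tree nodes become one 64-byte read:
# coalesce([(1000, 32), (1032, 32), (4096, 128)]) -> [(1000, 64), (4096, 128)]
```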
warner commented 2010-08-12 18:15:54 +00:00
Author
Owner

yeah, the 32/64-byte reads are hashtree nodes. The spans structure only coalesces adjacent/overlapping reads (the 64-byte reads are the result of two neighboring 32-byte hashtree nodes being fetched), but all requests are pipelined (note the "txtime" column in the "Requests" table, which tracks remote-bucket-read requests), and the overhead of each message is fairly small (also note the close proximity of the "rxtime" for those batches of requests). So I'm not particularly worried about merging these requests further.

My longer-term goal is to extend the Spans data structure with some sort of "close enough" merging feature: given a Spans bitmap, return a new bitmap with all the small holes filled in, so e.g. a 32-byte gap between two hashtree nodes (which might not be strictly needed until a later segment is read) would be retrieved early. The max-hole-size would need to be tuned to match the overhead of each remote-read message (probably on the order of 30-40 bytes): there's a breakeven point somewhere in there.
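A minimal sketch of that hole-filling idea, again assuming spans are just (start, length) pairs rather than the real Spans class; fill_small_holes() is a made-up helper name:

```python
# Hypothetical "close enough" merging: fill any gap of at most max_hole bytes
# so the reads on either side become one request. Illustrative only.
def fill_small_holes(ranges, max_hole=40):
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start - merged[-1][1] <= max_hole:  # gap small enough to absorb
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]

# A 32-byte gap between two hash-tree nodes disappears when max_hole >= 32:
# fill_small_holes([(0, 32), (64, 32)], max_hole=40) -> [(0, 96)]
```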

Another longer-term goal is to add a readv()-type API to the remote share-read protocol, so we could fetch multiple ranges in a single call. This doesn't shave much overhead off of just doing multiple pipelined read() requests, so again it's low-priority.
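To make the readv() idea concrete, here is a hypothetical sketch in plain Python; ShareReader and its methods are invented for illustration and are not part of the Tahoe-LAFS remote interface:

```python
# Hypothetical readv()-style share reader: several ranges per round trip.
class ShareReader:
    def __init__(self, share_bytes):
        self._data = share_bytes

    def read(self, offset, length):
        """One range per request (the existing style of call)."""
        return self._data[offset:offset + length]

    def readv(self, vector):
        """Many (offset, length) ranges served by a single request."""
        return [self._data[o:o + l] for (o, l) in vector]

# reader.readv([(0, 32), (1000, 64), (5000, 4096)]) returns three byte strings
# while paying the per-message overhead only once.
```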

And yes, a cleverer which-share-should-I-use-now algorithm might reduce stalls like that. I'm working on visualization tools to show the raw download-status events in a Gantt-chart -like form, which should make it easier to develop such an algorithm. For now, you want to look at the Request table for correlations between reads that occur at the same time. For example, at the +1.65s point, I see several requests that take 1.81s/2.16s/2.37s . One clear improvement would be to fetch shares 0 and 5 from different servers: whatever slowed down the reads of sh0 also slowed down sh5. But note that sh8 (from the other server) took even longer: this suggests that the congestion was on your end of the line, not theirs, especially since the next segment arrived in less than half a second.
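As a rough illustration of that kind of correlation check, assuming each request were available as a (server, share, txtime, rxtime) tuple (a made-up format, not the status page's actual schema):

```python
# Flag moments where slow requests to several shares were outstanding at once --
# a hint that the bottleneck is shared (e.g. the client's own link) rather than
# one slow server. Illustrative sketch only.
def correlated_stalls(requests, slow=1.0):
    slow_reqs = [r for r in requests if r[3] - r[2] >= slow]
    stalls = []
    for i, (srv_a, sh_a, tx_a, rx_a) in enumerate(slow_reqs):
        for srv_b, sh_b, tx_b, rx_b in slow_reqs[i + 1:]:
            if tx_b < rx_a and tx_a < rx_b:  # the two requests overlapped in time
                stalls.append(((srv_a, sh_a), (srv_b, sh_b)))
    return stalls

reqs = [("s1", 0, 1.65, 3.46), ("s1", 5, 1.65, 3.81), ("s2", 8, 1.66, 4.03)]
# correlated_stalls(reqs) pairs up all three: sh0, sh5 and sh8 stalled together.
```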

zooko commented 2010-08-12 18:38:22 +00:00
Author
Owner

Replying to warner:

> yeah, the 32/64-byte reads are hashtree nodes. The spans structure only coalesces adjacent/overlapping reads (the 64-byte reads are the result of two neighboring 32-byte hashtree nodes being fetched), but all requests are pipelined (note the "txtime" column in the "Requests" table, which tracks remote-bucket-read requests), and the overhead of each message is fairly small (also note the close proximity of the "rxtime" for those batches of requests).

I don't understand what those columns mean (see #1169 (documentation for the new download status page)).

zooko commented 2010-08-12 19:12:36 +00:00
Author
Owner

Replying to warner:

> For now, you want to look at the Request table for correlations between reads that occur at the same time.

I'm having trouble interpreting it (re: #1169).

> For example, at the +1.65s point, I see several requests that take 1.81s/2.16s/2.37s . One clear improvement would be to fetch shares 0 and 5 from different servers: whatever slowed down the reads of sh0 also slowed down sh5. But note that sh8 (from the other server) took even longer: this suggests that the congestion was on your end of the line, not theirs, especially since the next segment arrived in less than half a second.

I tried to watch the same movie from my office network and got similarly unwatchable results, download status page attached. Could it be a problem with the way my client, VLC.app, is reading?

zooko commented 2010-08-12 19:13:18 +00:00
Author
Owner

Attachment down-1.html (110075 bytes) added

zooko commented 2010-08-12 21:04:16 +00:00
Author
Owner

Attachment down-2.html (3058756 bytes) added

zooko commented 2010-08-12 21:04:27 +00:00
Author
Owner

Well, it wasn't the VLC.app client. I did another download of the same file using wget. The performance was bad--38 KB/s:

```
p$ wget http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
--2010-08-12 12:54:47--  http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
Resolving localhost... ::1, fe80::1, 127.0.0.1
Connecting to localhost|::1|:3456... failed: Connection refused.
Connecting to localhost|fe80::1|:3456... failed: Connection refused.
Connecting to localhost|127.0.0.1|:3456... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1490710513 (1.4G) [application/ogg]
Saving to: `bbb-360p24.i420.lossless.drc.ogg.fixed.ogg+bbb-24fps.flac.via-ffmpeg.ogg'

16% [=========================>                                  ] 247,203,678 38.0K/s  eta 8h 0m   ^C
```

Here is the download status page for this download (attached). Note that one server had a DYHB RTT of 3 minutes and another had a DYHB RTT of 8 minutes! There were no incident report files or `twistd.log` entries.
zooko commented 2010-08-12 21:10:31 +00:00
Author
Owner

The two servers with dramatically higher DYHB RTTs introduced themselves as:

```
Service Name
Nickname
PeerID
Connected?    Since    First Announced
Version
```

```
storage
linuxpal
rpiw4n3ffzygwyzlkjb55upikk6mewtv
Yes: to 18.62.1.14:55058    13:57:08 12-Aug-2010    21:26:57 11-Aug-2010
allmydata-tahoe/1.7.1
```

and

```
storage
sunpal7
62nlabgfiuzbfseufd2yoymbjzsenbuz
Yes: to 18.62.6.169:64882    14:25:33 12-Aug-2010    21:26:57 11-Aug-2010
allmydata-tahoe/1.7.1
```

I pinged their IP addresses:

```
--- 18.62.6.169 ping statistics ---
21 packets transmitted, 21 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 74.049/78.417/91.016/3.839 ms
--- 18.62.1.14 ping statistics ---
21 packets transmitted, 21 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 73.153/78.478/92.159/4.260 ms
```
zooko commented 2010-08-12 21:33:50 +00:00
Author
Owner

Attachment down-0.html (5761 bytes) added

zooko commented 2010-08-12 21:34:16 +00:00
Author
Owner

Okay, I've finally realized that this is a regression of the feature that we added in v1.6.0 (source:trunk/NEWS?rev=4698#L267) to start fetching blocks as soon as you've learned about enough shares and to use the lowest-latency servers. Attached is the download status page from v1.7.1 of trying to download this same file from the same test grid. It performs much better:

```
$ wget http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
--2010-08-12 15:06:55--  http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
Resolving localhost... ::1, fe80::1, 127.0.0.1
Connecting to localhost|::1|:3456... failed: Connection refused.
Connecting to localhost|fe80::1|:3456... failed: Connection refused.
Connecting to localhost|127.0.0.1|:3456... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1490710513 (1.4G) [application/ogg]
Saving to: `bbb-360p24.i420.lossless.drc.ogg.fixed.ogg+bbb-24fps.flac.via-ffmpeg.ogg.1'

 1% [=>                                                           ] 25,182,400   260K/s  eta 1h 43m  ^C
```

We can't release Tahoe-LAFS v1.8.0 with this behavior because it is a significant regression: people who use grids with slow or occasionally slow servers, such as the public Test Grid, would be ill-advised to upgrade from v1.7.1 to v1.8.0, and we don't like to release new versions that some users are ill-advised to upgrade to.
tahoe-lafs added critical and removed major labels 2010-08-12 21:34:16 +00:00
zooko commented 2010-08-12 21:46:52 +00:00
Author
Owner

I've noticed that when tickets get more than one attachment it becomes confusing for the reader to understand what is what, so here's a quick recap:

* <http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1169/down-0.html> ; attached to #1169, download using 1.8.0c2 with VLC.app from my home (initial ticket)
* [down-1.html](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-1c90-b882-d4ac-bed9faf09a9e) ; download using 1.8.0c2 with VLC.app from my office (comment:4)
* [down-2.html](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-1c90-b882-d4ac-166c6d34ef90) ; download using 1.8.0c2 with wget from my office ([comment:121179](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121179))
* [down-0.html](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-1c90-b882-d4ac-220258e201bb) ; download using 1.7.1 with wget from my office ([comment:121181](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121181))
zooko commented 2010-08-12 21:53:39 +00:00
Author
Owner

The feature that we released in v1.6.0 was ticket #928, and we did add some sort of unit tests for it, by making some servers not respond to DYHB at all: http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_hung_server.py?rev=37a242e01af6cf76

(In the pre-1.6.0 version, that situation would cause download to stall indefinitely, so that was our primary goal at that time and that is what the tests ensure no longer happens.)

zooko commented 2010-08-12 21:54:35 +00:00
Author
Owner

Note: the wget speed indicator reports the "current" speed, so it varies a lot during a download. To get reliable speed measurements I guess I should let the wget finish, which means, I suppose, I should download a smaller file! I would use the download status page's timings as an indicator of performance instead of the wget speed indicator.

davidsarah commented 2010-08-12 22:10:16 +00:00
Author
Owner

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121176):

> One clear improvement would be to fetch shares 0 and 5 from different servers: whatever slowed down the reads of sh0 also slowed down sh5.

Yes, I was going to point that out. Given that the DYHB responses were:

serverid   sent         received     shnums   RTT
lv3fqmev   +0.001393s   +0.105560s            104ms
tavrk54e   +0.003111s   +0.211572s            208ms
xflgj7cg   +0.004095s   +0.111008s            107ms
sp26qyqc   +0.006173s   +0.117722s   0,5      112ms
sroojqcx   +0.007326s   +0.297761s   1,6      290ms
4rk5oqd7   +0.008324s   +0.212271s   2        204ms
nszizgf5   +0.009295s   +0.204480s   3,7,8    195ms
62nlabgf   +0.010490s   +0.203262s            193ms
fp3xjndg   +0.011982s   +0.242262s   4,9      230ms
rpiw4n3f   +0.013246s   +0.113830s            101ms
  • Should the downloader have chosen to get shares from sp26qyqc, nszizgf5 and 4rk5oqd7, rather than getting two shares from sp26qyqc and one from nszizgf5?
  • The file has a happiness of 5. Shouldn't the uploader have distributed the shares more evenly?
warner commented 2010-08-12 22:32:06 +00:00
Author
Owner

Yeah, I'd like to see some more quantifiable data. It's a pity that the
old-downloader doesn't provide as much information as the new one (a flog
might help), but obviously I learned from experience with old-downloader
while building the instrumentation on the new-downloader :).

The status data you show from both downloaders show a server in common, and
the other server responded to the DYHB very quickly, so for at least the
beginning of the download, I don't think the downloader has enough
information to do any better.

Many of the new-downloader block-requests (I'm looking at the +179s to +181s
mark) show correlated stalls of both the "fast" server (sp26) and the other
"slow" server (nszi). If the problem were a single slow server, I'd expect to
see big differences between the response times.

Interesting. So, the main known-problem with the new-downloader (at least the
one on the top of my personal list) is its willingness to pull multiple
shares from the same server (a "diversity failure"), which obviously has the
potential to be slower than getting each share from a different server.

This is plausibly acceptable for the first segment, because the moment we
receive the DYHB response that takes us above "k" shares, we're faced with a
choice: start downloading now, or wait a while (how long??) in the hopes that
new responses will increase diversity and result in a faster download.

But after the first segment, specifically after we've received the other DYHB
responses, the downloader really ought to get as much diversity as it can, so
pulling multiple shares from the same server (when there's an alternative)
isn't excusable after that point.

The fix for this is to implement the next stage of the new-downloader
project, which is to rank servers (and which-share-from-which-server
mappings) according to some criteria (possibly speed, possibly cost,
fairness, etc), and reevaluate that list after each segment is fetched. This
is closely tied into the "OVERDUE" work, which is tied into the notion of
cross-file longer-term server quality/reputation tracking, which is loosely
tied into the notion of alternative backend server classes.

And I can't get that stage finished and tested in the next week, nor would a
change that big be a stable thing to land this close to a release. So I'm
hoping that further investigation will reveal something convenient, like
maybe that 1.7.1 is actually just as variable as new-downloader on this grid,
or that the two-shares-from-one-server problem isn't as bad as it first
appears.

I do have a quick-and-dirty patch that might improve matters, which is
worth experimenting with. I'll have to dig it out of a dark corner of my
laptop, but IIRC it added an artificial half-second delay after receiving >=k
shares from fewer than k servers. If new shares were found before that timer
expired, the download would proceed with good diversity. If not, the download
would begin with bad diversity after a small delay.

It fixed the basic problem, but I don't like arbitrary delays, and didn't
address the deeper issue (you could still wind up pulling shares from slow
servers even after you have evidence that there are faster ones available),
so I didn't include it in #798.

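A minimal sketch of the delay-timer idea described in the comment above, assuming Twisted's `reactor.callLater`; the class and method names here are hypothetical, invented for illustration, and need not match the actual patch:

```
from twisted.internet import reactor

DIVERSITY_WAIT = 0.5  # the arbitrary half-second delay described above

class DiversityWait:
    """Wait briefly for more DYHB responses before starting with poor diversity."""
    def __init__(self, start_first_segment):
        self._start_first_segment = start_first_segment  # callback: begin the download
        self._timer = None
        self._started = False

    def enough_shares_poor_diversity(self):
        # called when >= k shares are known, but from fewer than k servers
        if self._timer is None and not self._started:
            self._timer = reactor.callLater(DIVERSITY_WAIT, self._start)

    def enough_shares_good_diversity(self):
        # called when k shares on k distinct servers are known: start right away
        if self._timer is not None and self._timer.active():
            self._timer.cancel()
        self._start()

    def _start(self):
        if not self._started:
            self._started = True
            self._start_first_segment()
```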
warner commented 2010-08-12 22:50:46 +00:00
Author
Owner

RE davidsarah's comment:

Yeah, that's the sort of heuristic that I didn't want to guess at. It'll be
easier to see this stuff when I land the visualization code. The arrival
order of positive responses is:

  • sp26 +117ms (sh0+sh5)
  • nszi +204ms (sh3+sh7+sh8)
  • 4rk5 +212ms (sh2)
  • fp3x +242ms (sh4+sh9)
  • sroo +298ms (sh1+sh6)

At +117ms, we don't have enough shares to download. At +204ms, we have enough
shares but we'd like more diversity: we can't know that we could achieve our
ideal diversity by waiting another 8 milliseconds, so we start downloading
the first segment immediately.

By the time the second segment is started (at +977ms), we have a clearer
picture of the available shares. We also have about 40kB of experience with
each server (or 80kB for sp26, since we happened to fetch two shares from
it), which we might use to make some guesses about speeds. When the second
segment is started, at the very least we should prefer an arrangement that
gives us one share from each server. We might also like to prefer shares that
we've already been using (since we'll have to fetch fewer hash-tree nodes to
validate them); note that these two goals are already in conflict. We should
prefer servers which gave us faster responses, if we believe that they're
more likely to give fast responses in the future. But if we only hit up the
really fast servers, they'll be using more bandwidth, which might cost them
money, so they might prefer that we spread some of the load onto the slower
servers, whatever we mutually think is fair.

And we need serendipity too: we should occasionally download a share from a
random server, because it might be faster than any of the ones we're
currently using, although maybe it won't be, so a random server may slow us
down. All five of these goals conflict with each other, so there are weights
and heuristics involved, which will change over time.

And we should remember some of this information beyond the end of a single
download, rather than starting with an open mind each time, to improve
overall efficiency.

So yeah, it's a small thread that, when tugged, pulls a giant elephant into
the room. "No no, don't tug on that, you never know what it might be attached
to".

So I'm hoping to find a quicker smaller solution for the short term.

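None of this weighting exists in the code; purely as an illustration of how the conflicting goals listed above (speed, hash-tree reuse, load fairness, serendipity) might be combined into a per-share score, here is a hedged sketch in which every name and weight is invented:

```
import random

# invented weights; tuning these is exactly the hard part described above
W_SPEED, W_REUSE, W_LOAD, W_LUCK = 1.0, 0.3, 0.5, 0.1

def share_score(share, shnums_in_use, shares_assigned, observed_kbps):
    """Higher is better; `share` is assumed to have .server and .shnum attributes."""
    score = W_SPEED * observed_kbps.get(share.server, 0.0)       # prefer servers that have been fast
    if share.shnum in shnums_in_use:
        score += W_REUSE                                         # fewer hash-tree nodes to re-fetch
    score -= W_LOAD * shares_assigned.get(share.server, 0)       # spread load across servers
    score += W_LUCK * random.random()                            # occasionally give another server a chance
    return score
```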
zooko commented 2010-08-13 05:45:37 +00:00
Author
Owner

Brian asked for better measurements, and I ran quite a few (appended below). I think these results are of little use, as they are very noisy, and as far as I can tell I was just wrong when I thought, earlier today, that 1.8.0c2 was downloading this file slower than 1.7.1 did.

On the other hand I think these numbers are trying to tell us that something is wrong. Why does it occasionally take 40s to download 100K?

After I post this comment I will attach some status reports and flogs.

With v1.7.1 and no flogtool tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   8398      0  0:00:11  0:00:11 --:--:-- 25621

real 0m11.913s
user 0m0.004s
sys  0m0.006s

With 1.7.1 and flogtool tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100Kb.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  11044      0  0:00:09  0:00:09 --:--:-- 24679

real 0m9.062s
user 0m0.003s
sys  0m0.006s

v1.7.1 without tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100Kc.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   8134      0  0:00:12  0:00:12 --:--:-- 23310

real 0m12.301s
user 0m0.004s
sys  0m0.006s

v1.7.1 and flogtool tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100Kd.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   9050      0  0:00:11  0:00:11 --:--:-- 24716

real 0m11.057s
user 0m0.004s
sys  0m0.006s

Now switched from office to home.
v1.7.1 and no flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0  33899      0  0:00:29  0:00:29 --:--:--  287k

real 0m29.509s
user 0m0.006s
sys  0m0.013s

v1.7.1 and flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0  51857      0  0:00:19  0:00:19 --:--:--  228k

real 0m19.294s
user 0m0.005s
sys  0m0.012s

v1.7.1 and no flogtool tail, 100K:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   2499      0  0:00:40  0:00:40 --:--:-- 25031

real 0m40.018s
user 0m0.005s
sys  0m0.011s

v1.7.1 and flogtool tail, 100K:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   5249      0  0:00:19  0:00:19 --:--:-- 24746

real 0m19.059s
user 0m0.005s
sys  0m0.009s

v1.8.0c2 and no flogtool tail, 100K:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  80318      0  0:00:01  0:00:01 --:--:-- 80516

real 0m1.253s
user 0m0.004s
sys  0m0.005s

v1.8.0c2 and no flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0    98k      0  0:00:09  0:00:09 --:--:--  118k

real 0m9.961s
user 0m0.005s
sys  0m0.015s

v1.8.0c2 and flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0   210k      0  0:00:04  0:00:04 --:--:--  268k

real 0m4.645s
user 0m0.004s
sys  0m0.009s
zooko commented 2010-08-13 05:47:09 +00:00
Author
Owner

Attachment flog-1.7.1.bz2 (20172 bytes) added

zooko commented 2010-08-13 05:47:25 +00:00
Author
Owner

Attachment flog-1.8.0c2.bz2 (36494 bytes) added

zooko commented 2010-08-13 05:49:41 +00:00
Author
Owner

Attachment 1.8.0c2-dl100M-didntusethepals-down-2.html (24501 bytes) added

zooko commented 2010-08-13 05:57:27 +00:00
Author
Owner

I just did one more download with 1.7.1 of 1M in order to get both the status page and the flog. I named this download "run 99" so that I could keep its status page, flog, and stdout separate from all the others on this ticket.
Here is run 99, at my home, with Tahoe-LAFS v1.7.1, the first 1M of (@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@) :

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0  38477      0  0:00:25  0:00:25 --:--:--  245k'

real 0m25.999s
user 0m0.005s
sys  0m0.012s

I will now attach the status output and flog of run 99.

zooko commented 2010-08-13 05:57:55 +00:00
Author
Owner

Attachment 1.7.1-run-number-99-down-0.html (2867 bytes) added

zooko commented 2010-08-13 05:59:08 +00:00
Author
Owner

Attachment flog-1.7.1-from-run-number-99.bz2 (31625 bytes) added

zooko commented 2010-08-13 06:14:54 +00:00
Author
Owner

I just did one more download with 1.8.0c2 of 1M in order to get both the status page and the flog. I named this download "run 100" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 100, at my home, with Tahoe-LAFS v1.8.0c2, the first 1M of (@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@) :

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0   139k      0  0:00:06  0:00:06 --:--:--  169k

real 0m7.377s
user 0m0.004s
sys  0m0.010s
zooko commented 2010-08-13 06:15:25 +00:00
Author
Owner

Attachment flog-1.8.0c2-r4698-from-run-100.bz2 (49267 bytes) added

zooko commented 2010-08-13 06:17:47 +00:00
Author
Owner

Attachment 1.8.0c2-r4698-run-100-down-0.html (24532 bytes) added

zooko commented 2010-08-13 06:21:12 +00:00
Author
Owner

Okay there's no solid evidence that there is a regression from 1.7.1. I think Brian should use this ticket to analyze my flogs and status pages if he wants and then change it to be a ticket about download server selection. :-) Removing "regression".

tahoe-lafs added
major
and removed
critical
labels 2010-08-13 06:21:12 +00:00
tahoe-lafs modified the milestone from 1.8.0 to eventually 2010-08-13 06:21:12 +00:00
zooko commented 2010-08-13 06:52:56 +00:00
Author
Owner

I just did one more download with 1.8.0c2 of 100M in order to get both the status page and the flog. I named this download "run 101" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 101, at my home, with Tahoe-LAFS v1.8.0c2, the first 100M of (@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@) :

$ echo this is run 101 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 101
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0  90900      0  0:18:20  0:18:20 --:--:-- 96978

real 18m20.118s
user 0m0.097s
sys  0m0.536s
zooko commented 2010-08-13 06:54:30 +00:00
Author
Owner

Attachment flog-run-101-100M-1.8.0c2-r4698.bz2 (1872831 bytes) added

zooko commented 2010-08-13 06:59:15 +00:00
Author
Owner

Attachment 1.8.0c2-r4698-run-101-down-1.html (1231326 bytes) added

zooko commented 2010-08-13 07:18:54 +00:00
Author
Owner

I just did one more download with 1.7.1 of 100M in order to get both the status page and the flog. I named this download "run 102" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 102, at my home, with Tahoe-LAFS v1.7.1, the first 100M of (@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@) :

$ echo this is run 102 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 102
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   182k      0  0:08:54  0:08:54 --:--:-- 26.3M

real 8m54.910s
user 0m0.066s
sys  0m0.478s
zooko commented 2010-08-13 07:19:10 +00:00
Author
Owner

Interesting that 1.7.1 was twice as fast as 1.8.0c2 this time.

zooko commented 2010-08-13 07:19:48 +00:00
Author
Owner

Attachment flog-run-102-100M-1.7.1.bz2 (173265 bytes) added

zooko commented 2010-08-13 07:20:37 +00:00
Author
Owner

Annoyingly, 1.7.1 has a bug where it doesn't show downloads in the status page sometimes, and that happened this time, so I can't show you the status page for run 102.

zooko commented 2010-08-13 15:51:30 +00:00
Author
Owner

run 103
1.7.1
the first 100M
(@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@)

$ echo this is run 103 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 103
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   207k      0  0:07:50  0:07:50 --:--:-- 27.1M

real 7m50.696s
user 0m0.063s
sys  0m0.469s
zooko commented 2010-08-13 15:52:22 +00:00
Author
Owner

Attachment flog-run-103-100M-1.7.1.bz2 (2358125 bytes) added

zooko commented 2010-08-13 17:51:36 +00:00
Author
Owner

run 104

1.8.0rc2-4698

the first 100M

(@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@)

$ echo this is run 104 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 104
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0  81763      0  0:20:23  0:20:23 --:--:-- 57095

real 20m23.119s
user 0m0.102s
sys  0m0.554s
zooko commented 2010-08-13 17:54:57 +00:00
Author
Owner

Attachment flog-run-104-100M-1.8.0c2-r4698.bz2 (1877994 bytes) added

zooko commented 2010-08-13 17:56:14 +00:00
Author
Owner

Hm, okay it really looks like there is a substantial (2X) slowdown for using Tahoe-LAFS v1.8.0c2 instead of v1.7.1 on today's (and yesterday's) Test Grid. I'm re-adding the regression tag which means I think this issue should block 1.8.0 release until we at least understand it better.

tahoe-lafs added
critical
and removed
major
labels 2010-08-13 17:56:14 +00:00
tahoe-lafs modified the milestone from eventually to 1.8.0 2010-08-13 17:56:14 +00:00
zooko commented 2010-08-13 17:57:41 +00:00
Author
Owner

Attachment 1.8.0c2-run-104-down-0.html (1231853 bytes) added

zooko commented 2010-08-13 18:45:51 +00:00
Author
Owner

run 105

1.7.1

the first 100M

(@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@)

$ echo this is run 105 tahoe-lafs v1.7.1 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 105 tahoe-lafs v1.7.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   211k      0  0:07:40  0:07:40 --:--:-- 19.8M

real 7m41.179s
user 0m0.061s
sys  0m0.481s
zooko commented 2010-08-13 18:46:45 +00:00
Author
Owner

Attachment flog-run-105-100M-1.7.1.bz2 (336776 bytes) added

zooko commented 2010-08-13 19:10:39 +00:00
Author
Owner

run 106

1.8.0c2-r4698

the first 100M

(@@http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg@@)

$ echo this is run 106 tahoe-lafs v1.8.0c2-r4698 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 106 tahoe-lafs v1.8.0c2-r4698
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   112k      0  0:14:29  0:14:29 --:--:-- 94083

real 14m29.309s
user 0m0.082s
sys  0m0.528s
zooko commented 2010-08-13 19:11:57 +00:00
Author
Owner

Attachment flog-run-106-100M-1.8.0c2-r4698.bz2 (1765981 bytes) added

zooko commented 2010-08-13 19:12:57 +00:00
Author
Owner

Attachment 1.8.0c2-r4698-run-106-down-0.html (1230114 bytes) added

warner commented 2010-08-14 01:52:35 +00:00
Author
Owner

I had an idea for a not-too-complex share-selection algorithm this morning:

  • first, have the ShareFinder report all shares as soon as it learns about them, instead of its current behavior of withholding them until someone says that they're hungry

  • also record the DYHB RTT in each Share object, so it can be found later. Keep the list of Shares sorted by this RTT (with share-number as the secondary sort key).

  • then, each time the SegmentFetcher needs to start using a new share, use the following algorithm:

# assumes `itertools` is imported and that `shares` and `k` are in scope
sharemap = {}               # maps shnum to Share instance
num_shares_per_server = {}  # maps Server to a count of shares
for max_shares_per_server in itertools.count(1):
    progress = False
    for sh in shares:
        if sh.shnum in sharemap:
            continue
        # .get() avoids a KeyError for servers we have not counted yet
        if num_shares_per_server.get(sh.server, 0) >= max_shares_per_server:
            continue
        sharemap[sh.shnum] = sh
        num_shares_per_server[sh.server] = num_shares_per_server.get(sh.server, 0) + 1
        progress = True
        if len(sharemap) >= k:
            return SUCCESS   # placeholder: start the segment with this sharemap
    if not progress:
        return FAIL          # placeholder: not enough distinct shares known yet

The general idea is to cycle through all the shares we know about, but first
try to build a sharemap that only uses one share per server (i.e. perfect
diversity). That might fail because the shares are not diverse enough, so we
can walk through the loop a second time and be willing to accept two
shares per server. If that fails, we raise our willingness to three shares
per server, etc. If we ever finish a loop without adding at least one share
to our sharemap, we declare failure: this indicates that there are not enough
distinct shares (that we know about so far) to succeed.

If this returns FAIL, that really means we should declare "hunger" and ask
the ShareFinder to look for more shares. If we return SUCCESS but
max_shares_per_server > 1, then we should ask for more shares too (but
start the segment anyways: new shares may help the next segment do better).

This is still vulnerable to certain pathological situations, like if
everybody has a copy of sh0 but only the first server has a copy of sh1: this
will use sh0 from the first server then circle around and have to use sh1
from that server as well. A smarter algorithm would peek ahead, realize the
scarcity of sh1, and add sh1 from the first server so it could get sh0 from
one of the other servers instead.

But I think this might improve the diversity of downloads without going down
the full itertools.combinations-enumerating route that represents the
"complete" way to approach this problem.

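For concreteness, here is the same loop wrapped into a standalone function and run against the pathological case described above, where every server holds sh0 but only the first server holds sh1. The `Share` namedtuple is a stand-in for the real Share object, not the actual class:

```
import itertools
from collections import namedtuple

Share = namedtuple("Share", ["shnum", "server"])  # stand-in for the real Share object

def select_shares(shares, k):
    """Return a shnum->Share map of size k, preferring one share per server."""
    sharemap = {}
    num_shares_per_server = {}
    for max_shares_per_server in itertools.count(1):
        progress = False
        for sh in shares:
            if sh.shnum in sharemap:
                continue
            if num_shares_per_server.get(sh.server, 0) >= max_shares_per_server:
                continue
            sharemap[sh.shnum] = sh
            num_shares_per_server[sh.server] = num_shares_per_server.get(sh.server, 0) + 1
            progress = True
            if len(sharemap) >= k:
                return sharemap   # SUCCESS
        if not progress:
            return None           # FAIL: not enough distinct shares known

# every server has sh0, but only server A has sh1: both picks end up on A
shares = [Share(0, "A"), Share(1, "A"), Share(0, "B"), Share(0, "C")]
print(select_shares(shares, k=2))  # {0: Share(shnum=0, server='A'), 1: Share(shnum=1, server='A')}
```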
zooko commented 2010-08-14 03:47:11 +00:00
Author
Owner

This seems promising. It sounds like you might think that the slowdown of 1.8.0c2 vs. 1.7.1 on the current Test Grid might be due to one server being used to serve two shares in 1.8.0c2 when two different servers would be used—one for each share—in 1.7.1. Is that what you think? Have you had a chance to look at my flogs attached to this ticket to confirm that this is what is happening?

zooko commented 2010-08-14 05:18:37 +00:00
Author
Owner

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121209):

> But I think this might improve the diversity of downloads without going down the full itertools.combinations-enumerating route that represents the "complete" way to approach this problem.

(Parenthetical historical observation which is pleasurable to me: Your heuristic algorithm for server selection (for download) in comment:121209, and your observation that it is susceptible to failure in certain cases, is similar to my proposed heuristic algorithm for server selection for upload in #778 (comment:114304, for the benefit of future cyborg archaeologist historians). David-Sarah then observed that finding the optimal solution was a standard graph theory problem named "maximum matching of a bipartite graph". Kevan then implemented it and thus we were able to finish #778.)

My copy of Cormen, Leiserson, Rivest 1st Ed. says (chapter 27.3) that the Ford-Fulkerson solution requires computation O(V * E) where V is the number of vertices (num servers plus num shares) and E is the number of edges (number of (server, share) tuples).

Now what Kevan actually implemented in [happinessutil.py]source:src/allmydata/util/happinessutil.py@4593#L80 just returns the size of the maximum matching, and what we want here is an actual matching. I'm not 100% sure, but I think if you save all the paths that are returned from augmenting_path_for() in servers_of_happiness() and return the resulting set of paths, then you'll have your set of server->share mappings.

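As an illustration of that suggestion (this is not the actual happinessutil.py code), a generic augmenting-path matching over a server-to-shares map can return the explicit assignment rather than just its size. The input format below is an assumption, and the example data is taken from the DYHB table earlier in this ticket:

```
def max_matching(shares_on_server):
    """shares_on_server: dict mapping server -> iterable of shnums it holds.
    Returns a dict mapping server -> the single shnum assigned to it."""
    matched_server_for = {}  # shnum -> server in the current matching

    def augment(server, seen):
        # one Ford-Fulkerson augmenting-path step on the bipartite graph
        for shnum in shares_on_server[server]:
            if shnum in seen:
                continue
            seen.add(shnum)
            if shnum not in matched_server_for or augment(matched_server_for[shnum], seen):
                matched_server_for[shnum] = server
                return True
        return False

    for server in shares_on_server:
        augment(server, set())
    return dict((server, shnum) for shnum, server in matched_server_for.items())

# share placement from the DYHB responses shown above
print(max_matching({"sp26": [0, 5], "sroo": [1, 6], "4rk5": [2],
                    "nszi": [3, 7, 8], "fp3x": [4, 9]}))
# -> one share per server, e.g. {'sp26': 0, 'sroo': 1, '4rk5': 2, 'nszi': 3, 'fp3x': 4}
```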
zooko commented 2010-08-14 06:08:34 +00:00
Author
Owner

Replying to zooko:

This seems promising. It sounds like you might think that the slowdown of 1.8.0c2 vs. 1.7.1 on the current Test Grid might be due to one server being used to serve two shares in 1.8.0c2 when two different servers would be used—one for each share—in 1.7.1.

Okay this does appear to be happening in at least one of the slow v1.8.0c2 downloads attached to this ticket. I looked at 1.8.0c2-r4698-run-106-down-0.html and every request-block in it (for three different shares) went to the same server -- nszizgf5 -- which was the first server to respond to the DYHB (barely) and which happened to be the only server that had three shares. So at least for that run, Brian's idea that fetching blocks of different shares from the same server is a significant slowdown seems to be true.

tahoe-lafs changed title from does new-downloader perform badly for certain situations (such as today's Test Grid)? to new-downloader performs badly when the first server to reply to DYHB has K shares 2010-08-14 07:02:04 +00:00
zooko commented 2010-08-15 04:08:45 +00:00
Author
Owner

In http://tahoe-lafs.org/pipermail/tahoe-dev/2010-August/004998.html I wrote:
Hey waitasecond. As far as I understand, Tahoe-LAFS v1.7.1 should also—just like v1.8.0c2—start downloading all three shares from Greg's server as soon as that server is the first responder to the DYHB:

http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/immutable/download.py?rev=9f3995feb9b2769c#L923

Am I misunderstanding? So the question of why 1.7.1 seems to download 2 or 3 times as fast as 1.8.0c2 on this grid remains open.

zooko commented 2010-08-16 05:25:21 +00:00
Author
Owner

run 107

1.7.1

the first 10MB

with cProfile profiling running but no flog running

```
$ echo this is run 107 tahoe-lafs v1.7.1 cProfile ; time curl --range 0-10000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-10M.ogg
this is run 107 tahoe-lafs v1.7.1 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9765k  100 9765k    0     0   171k      0  0:00:57  0:00:57 --:--:-- 2424k

real    0m57.139s
user    0m0.010s
sys     0m0.059s
```

zooko commented 2010-08-16 05:28:11 +00:00
Author
Owner

Attachment prof-run-107.dump.txt (113473 bytes) added

zooko commented 2010-08-16 05:36:27 +00:00
Author
Owner

run 108

1.8.0c2

the first 10MB

with cProfile profiling running but no flog running

```
$ echo this is run 108 tahoe-lafs v1.8.0c2 cProfile ; time curl --range 0-10000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-10M.ogg
this is run 108 tahoe-lafs v1.8.0c2 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9765k  100 9765k    0     0   247k      0  0:00:39  0:00:39 --:--:--  264k

real    0m39.496s
user    0m0.009s
sys     0m0.052s
```

zooko commented 2010-08-16 05:37:29 +00:00
Author
Owner

Attachment prof-run-108-dump.txt (111816 bytes) added

zooko commented 2010-08-16 05:53:28 +00:00
Author
Owner

run 109

1.7.1

the first 100MB

with cProfile profiling running but no flog running

```
$ echo this is run 109 tahoe-lafs v1.7.1 cProfile ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 109 tahoe-lafs v1.7.1 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   228k      0  0:07:06  0:07:06 --:--:-- 21.5M

real    7m6.626s
user    0m0.059s
sys     0m0.460s
```

zooko commented 2010-08-16 05:53:54 +00:00
Author
Owner

Attachment prof-run-109-dump.txt (94530 bytes) added

zooko commented 2010-08-16 06:13:37 +00:00
Author
Owner

run 110

1.8.0c2

the first 100MB

with cProfile profiling running but no flog running

```
$ echo this is run 110 tahoe-lafs v1.8.0c2 cProfile ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 110 tahoe-lafs v1.8.0c2 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   113k      0  0:14:19  0:14:19 --:--:--   98k

real    14m19.256s
user    0m0.079s
sys     0m0.504s
```

zooko commented 2010-08-16 06:14:33 +00:00
Author
Owner

Attachment prof-run-110-dump.txt (114356 bytes) added

zooko commented 2010-08-16 06:17:45 +00:00
Author
Owner

Attachment run-110-download-status.html (1228847 bytes) added

zooko commented 2010-08-16 06:22:55 +00:00
Author
Owner

Okay, the problem with the current downloader in 1.8.0c2 is that it goes slower and slower as it downloads more and more data from a file.
It consistently beats (or at least ties) 1.7.1 on downloads of <= 10 MB but consistently loses badly at 100 MB. Also the profiling result in prof-run-110-dump.txt shows major CPU usage in spans:

```
         661324561 function calls (661269650 primitive calls) in 919.130 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    91950    0.464    0.000    0.646    0.000 spans.py:142(dump)
  1374341    0.310    0.000    0.310    0.000 spans.py:152(__iter__)
    41352    0.062    0.000    0.149    0.000 spans.py:156(__nonzero__)
   142497    0.194    0.000    0.253    0.000 spans.py:159(len)
    18390    0.059    0.000    0.438    0.000 spans.py:164(__add__)
    22962    1.083    0.000    8.079    0.000 spans.py:170(__sub__)
     9195    0.034    0.000    0.346    0.000 spans.py:186(__and__)
153765314  121.874    0.000  197.580    0.000 spans.py:203(overlap)
153189988   54.374    0.000   54.374    0.000 spans.py:215(adjacent)
    18390    1.032    0.000    1.343    0.000 spans.py:238(len)
    18390    5.727    0.000    7.413    0.000 spans.py:248(dump)
    87347    1.430    0.000  460.111    0.005 spans.py:25(__init__)
     9195    0.633    0.000  459.852    0.050 spans.py:256(get_spans)
     6906    0.114    0.000    0.115    0.000 spans.py:260(assert_invariants)
     9193    0.305    0.000    0.372    0.000 spans.py:271(get)
     6906    2.174    0.000    3.233    0.000 spans.py:295(add)
  2678832   32.014    0.000   46.032    0.000 spans.py:34(_check)
     6904    0.742    0.000    1.630    0.000 spans.py:389(remove)
     3436    0.007    0.000    0.042    0.000 spans.py:434(pop)
  1315865  140.193    0.000  458.353    0.000 spans.py:46(add)
  1275620    2.471    0.000    6.287    0.000 spans.py:82(remove)
```

tahoe-lafs changed title from new-downloader performs badly when the first server to reply to DYHB has K shares to new-downloader performs badly when downloading a lot of data from a file 2010-08-16 06:22:55 +00:00
zooko commented 2010-08-16 07:01:29 +00:00
Author
Owner

Looking at [immutable/downloader/share.py]source:src/allmydata/immutable/downloader/share.py@4688, I have the following review comments:

  • This comment confuses me:
```
59           self._pending = Spans() # request sent but no response received yet
60           self._received = DataSpans() # ACK response received, with data
61           self._unavailable = Spans() # NAK response received, no data
62
63           # any given byte of the share can be in one of four states:
64           #  in: _wanted, _requested, _received
65           #      FALSE    FALSE       FALSE : don't care about it at all
66           #      TRUE     FALSE       FALSE : want it, haven't yet asked for it
67           #      TRUE     TRUE        FALSE : request is in-flight
68           #                                   or didn't get it
69           #      FALSE    TRUE        TRUE  : got it, haven't used it yet
70           #      FALSE    TRUE        FALSE : got it and used it
71           #      FALSE    FALSE       FALSE : block consumed, ready to ask again
```

Are _wanted, _requested, and _received old names for _pending, _received, and _unavailable? Or perhaps from a different design entirely? And that's six states, not four.

  • A span is add'ed to _pending in [_send_requests()]source:trunk/src/allmydata/immutable/downloader/share.py@4688#L698 and removed from _pending in [_got_data()]source:trunk/src/allmydata/immutable/downloader/share.py@4688#L833, but it is not removed if the request errbacks instead of callbacks. It would be a bug for the range to still be marked as "pending" after the request errbacked, wouldn't it? (A sketch of that kind of fix follows this list.)

  • We shouldn't give the author of a file the ability to raise AssertionError from [immutable/downloader/share.py line 416 _satisfy_share_hash_tree()]source:trunk/src/allmydata/immutable/downloader/share.py@4688#L416 but instead give him the ability to cause [_satisfy_offsets()]source:trunk/src/allmydata/immutable/downloader/share.py@4688#L331 to raise a LayoutInvalid exception (see related ticket #1085 (we shouldn't use "assert" to validate incoming data in introducer client))
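
To make the second bullet concrete, here is a small runnable sketch of the pattern with a stand-in for the pending-Spans object and a plain Twisted Deferred; this is not the share.py code, and all of the names are hypothetical:

```
from twisted.internet import defer

class FakePending:
    """Stand-in for the Spans object that tracks in-flight byte ranges."""
    def __init__(self):
        self.ranges = set()
    def add(self, start, length):
        self.ranges.add((start, length))
    def remove(self, start, length):
        self.ranges.discard((start, length))

pending = FakePending()

def send_request(remote_read, start, length):
    pending.add(start, length)
    d = defer.maybeDeferred(remote_read, start, length)
    d.addCallback(got_data, start, length)
    d.addErrback(got_error, start, length)  # the errback must also clear the range
    return d

def got_data(data, start, length):
    pending.remove(start, length)
    return data

def got_error(failure, start, length):
    pending.remove(start, length)  # otherwise the range stays "pending" forever
    return failure

def failing_read(start, length):
    raise IOError("server went away")

send_request(failing_read, 0, 32).addErrback(lambda f: None)
print(pending.ranges)  # empty -- the range was cleaned up despite the error
```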

zooko commented 2010-08-16 07:18:19 +00:00
Author
Owner

This looks like a bug: [share.py _got_data()]source:trunk/src/allmydata/immutable/downloader/share.py@4688#L741:

        self._received.add(start, data)

That could explain the slowdown -- the items added to _received here are never removed, because the removal code in [_satisfy_block_data()]source:trunk/src/allmydata/immutable/downloader/share.py@4688#L517 is:

        block = self._received.pop(blockstart, blocklen)

I added the following assertions to source:trunk/src/allmydata/util/spans.py@4666:

```
hunk ./src/allmydata/util/spans.py 47
             raise
 
     def add(self, start, length):
+        assert isinstance(start, (int, long))
+        assert isinstance(length, (int, long))
         assert start >= 0
         assert length > 0
         #print " ADD [%d+%d -%d) to %s" % (start, length, start+length, self.dump())
```

And indeed these assertions fail because data is not an integer.
However, then when I add this patch:

```
hunk ./src/allmydata/immutable/downloader/share.py 741
                 share=repr(self), start=start, length=length, datalen=len(data),
                 level=log.NOISY, parent=lp, umid="5Qn6VQ")
         self._pending.remove(start, length)
-        self._received.add(start, data)
+        self._received.add(start, length)
 
         # if we ask for [a:c], and we get back [a:b] (b<c), that means we're
         # never going to get [b:c]. If we really need that data, this block
```

This causes a bunch of tests to fail in ways that I don't understand.

zooko commented 2010-08-16 07:35:04 +00:00
Author
Owner

more review notes:

  • In [spans add()]source:trunk/src/allmydata/util/spans.py@4666#L64 it would be more efficient to use bisect.insort() (see the sketch after this list)
  • In [spans remove()]source:trunk/src/allmydata/util/spans.py@4666#L132 it would be more efficient to insert the new span, e.g. replace
```
                    self._spans[i] = (left_start, left_length)
                    self._spans.append( (right_start, right_length) )
                    self._spans.sort()
```

with

```
                    self._spans[i] = (left_start, left_length)
                    self._spans.insert(i+1, (right_start, right_length))
```

  • [DataSpans.add()]source:trunk/src/allmydata/util/spans.py@4666#L295 calls assert_invariants() which iterates over all the spans. Also DataSpans.add() itself searches for where to make modifications by iterating from the beginning, which seems unnecessary. Couldn't it do a binary search to find the place it needs to modify and then modify only a local neighborhood there?
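
A sketch of the bisect-based approach suggested in the first and last bullets above (not the actual spans.py code), assuming the list of (start, length) tuples is kept sorted by start with no overlaps:

```
import bisect

# Keep the list sorted by inserting in place instead of append() + sort().
spans = [(0, 10), (40, 8)]
bisect.insort(spans, (20, 5))  # O(log n) search plus an O(n) shift, no full re-sort
print(spans)                   # [(0, 10), (20, 5), (40, 8)]

# The same idea lets add()/remove() jump straight to the local neighborhood
# that a new span could touch, instead of scanning from the beginning:
i = bisect.bisect_left(spans, (25,))
print(i)                       # 2: only spans[i-1:] can overlap a span starting at 25
```
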
zooko commented 2010-08-16 07:37:16 +00:00
Author
Owner

In [immutable/downloader/share.py]source:trunk/src/allmydata/immutable/downloader/share.py@4688 _unavailable can have things added to it (in case of intentional over-read or in case of failing/corrupted server) but never has things removed from it. Does that matter? I suspect that it is intentional and doesn't hurt but I'm not sure.

zooko commented 2010-08-16 14:17:54 +00:00
Author
Owner

Also Brian discovered another bug in new-downloader last night. Here are some excerpts from IRC.


```
<warner> hm, ok, so my new share-selection algorithm is, I think, revealing a
	 bug in the rest of the downloader code			        [23:54]
<warner> there's a funny place where the number of segments in the file
	 (needed to build the block-hash-tree) is not known right away,
	 because we have to get the UEB to know for sure	        [23:55]
<warner> so the CommonShare object speculatively creates an
	 IncompleteHashTree, with a guessed value for numsegs
<warner> and then replaces it later
<warner> (in hindsight, it's probably not a good idea to do that.. I should
	 just leave it as None until we know for sure)		        [23:56]
<warner> the test failure is that the first segment uses sh0,sh1,sh5 , and all
	 have their numsegs updated, but the second segment then switches to
	 some different shares, and those new ones are still using the guessed
	 value for numsegs					        [23:57]
<warner> seg1 used sh0,1,6
<zooko> Hm.							        [23:58]
<zooko> Good find!
<warner> and sh6 threw an exception because of the wrong-sized hashtree, so it
	 fell back to sh5
<warner> each time it tried to use a share that wasn't part of the original
	 set, it got that error and returned back to sh0,1,5	        [23:59]
```

warner commented 2010-08-16 18:14:04 +00:00
Author
Owner

Responses to earlier comments:

  • spans.py definitely looks O()-suspicious. Does
    prof-run-110-dump.txt suggest that in a 100MB file download, we
    spent half of the total time in Spans.__init__?
  • yes, the comment about _wanted, _requested, _received is stale.
  • the failure to remove data from _pending upon errback is a bug
  • LayoutInvalid is better than assert, yeah
  • the self._received.add(start,data) is correct: _received is a
    DataSpans instance, not Spans, and it holds strings, not
    booleans. _received holds the data that comes back from the server
    until the "satisfy" code consumes it. It has methods like get and
    pop, whereas the simpler Spans class merely has methods for
    is-range-in-span. (A toy contrast of the two is sketched after this list.)
  • that said, I don't understand how assertions added to spans.py#L47 would
    ever fire. If those same assertions were added to spans.py#L295, I'd get
    it. What types were start/length in your observed assertion failures? And
    what was the stack trace?
  • The patch to call self._received.add(start, length) is wrong; it
    must be called with (int,str).
  • all the comments about efficiency improvements in Spans are probably
    correct
  • adding data to _unavailable should be benign: the amount of
    unavailable data is small and constant (if the share is intact, we should
    only add to _unavailable during the first few reads if we've guessed
    the segsize wrong).
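
To make that Spans/DataSpans distinction concrete, here is a toy contrast; these are not the real spans.py classes, just an illustration of the two roles:

```
class ToySpans:
    """Records only which byte ranges are covered."""
    def __init__(self):
        self.ranges = []               # list of (start, length)
    def add(self, start, length):
        self.ranges.append((start, length))

class ToyDataSpans:
    """Also holds the received bytes so the satisfy code can pop them out."""
    def __init__(self):
        self.chunks = {}               # start offset -> bytes
    def add(self, start, data):        # note: takes the data, not a length
        self.chunks[start] = data
    def pop(self, start, length):
        return self.chunks.pop(start)[:length]

received = ToyDataSpans()
received.add(0, b"block0-bytes")       # like _received.add(start, data)
print(received.pop(0, 6))              # b'block0'
```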

Now some new ideas. I've found a couple of likely issues.

  • looking at timestamps from flog-run-106, the inter-segment timing
    definitely is growing over the course of the download. It's noisy,
    but it goes from about 0.8s at the start (seg0), to about 1.5s-2.0s at the
    end (seg762). I haven't looked at smaller deltas (i.e. only inside the
    "desire" code) to rule out network variations, but it certainly points to
    a leak or complexity increase of some sort that gets worse as the download
    progresses.
  • comparing downloads of the first 100MB, the middle 100MB, and the last
    100MB would rule out anything that's influenced by the absolute segment
    number.
  • looking at the Spans.dump() strings in the flog, I see that two of
    the three shares (sh7+sh8) have an ever-growing .received
    DataSpans structure. A cursory check suggests they are growing by 64
    bytes and one range per segment. By the end of the download (seg762), sh7
    is holding 37170 bytes in 378 ranges (whereas sh3 only has 1636 bytes in
    22 ranges, and remains mostly constant)
  • This means that we're asking for data which we then don't end up using.
    We keep it around in _received because it might be useful later:
    maybe we ask for the wrong data because our guess of the segsize (and
    thus numsegs, and thus the size/placement of the hashtrees) was wrong.
    But later we might take advantage of whatever we fetched by mistake.
  • I have two theories:
  • the IncompleteHashTree.needed_hashes() call, when asked what
    hashes we need to validate leaf 0, might tell us we need the hash for
    leaf 0 too. However, that hash can be computed from the block of data
    that makes up the leaf, so we don't really need to fetch it. (whereas we
    do need the hash for leaf 1, since it sits on the "uncle chain" for
    leaf0). If the desire-side code is conservatively/incorrectly asking for
    the leaf0 hash, but the satisfy-side code doesn't use it, then we'll add
    a single 32-byte hash node per segment.
  • ciphertext hash tree nodes: when we start working on a segment, the
    desire-side code will ask for ciphertext hash tree nodes from each
    segment we're using. However, the satisfy-side code will only use the
    hashes from the first response: by the time the second response arrives,
    the ciphertext hash tree is satisfied, so that clause isn't reached.
    This means that we'll leave that data in ._received forever. This
    seems most likely: it would explain why the first share (sh3) doesn't
    grow, whereas the later two shares do, and why I saw a 64-byte increment
    (the actual growth would depend upon the segment number, and how many
    new uncle-chain nodes are needed, but 2-nodes is a fairly common value).
  • The .received leftover-data issue shouldn't be such a big deal,
    N=378 is not a huge number, but the measured increase in inter-segment
    time suggests that whatever the O() complexity is, N=378 is enough to
    cause problems.

So I think the next directions to pursue are:

  • build some kind of test framework to exercise a large download without
    using real remote_read calls, ideally 100MB or 1GB in a few seconds. This
    would use a Share subclass that returns data immediately (well,
    after an eventual-send) rather than ever touching a server. It might also
    need to stub out some of the hashtree checks, but does need real
    needed_hashes computations. Then we fix the code until this test
    finishes in a reasonable amount of time. While I wouldn't have the test
    case assert anything about runtime, I would have it assert things like
    ._received doesn't grow over the course of the test.
  • measure the CPU seconds needed to download a 100MB file from both
    old-downloader and new-downloader: if we're looking at an O(n^3^) problem,
    it will manifest as a much heavier CPU load. (if we were merely looking at
    a pipelining failure, the CPU time would be the same, but wallclock time
    would be higher).
  • stare at Spans (and specifically DataSpans) for
    computational-complexity problems. Build some tests of these with N=400ish
    and see how efficient they are. They're supposed to be linear wrt
    number-of-ranges, but either they aren't, or they're being called in a way
    which makes it worse. (A standalone timing sketch along these lines appears
    at the end of this comment.)
  • consider commenting out the dump() calls for a test, or some
    assert_invariants calls, to see if we're hitting that old problem
    where the data structure is efficient unless we leave in the self-checks
    or debugging messages
  • (hard) figure out how to drop the unused ciphertext-hash-tree nodes from
    the second and later shares
  • look at IncompleteHashTree.needed_hashes and see if we're actually
    requesting the leaf node that we don't really need.
  • consider an entirely different DataSpans structure. The
    perhaps-too-clever overlap/merge behavior is mostly just exercised during
    the fetch of the first segment, before we're sure about the correct number
    of segments (we fetch some data speculatively to reduce roundtrips; if we
    guess wrong, we'll get the wrong data, but DataSpans lets us easily
    use that data later if it turns out to be what we needed for some other
    purpose). Perhaps a data structure which was less tuned for merging
    adjacent ranges would be better, maybe one which has an explicit
    merge() method that's only called just before the requests are sent
    out. Or maybe the value of holding on to that data isn't enough to justify
    the complexity.
  • a related thought (mostly to remind me of the idea later): for pipelining
    purposes, I'd like to be able to label the bits in a Spans with
    their purpose: if we send parallel requests for both seg2 and seg3, I'd
    like the seg2 data to arrive first, so e.g. the hashes needed to validate
    seg2 should arrive before the bulk block data for seg3. A label on the
    bits like "this is for seg2" would let us order the requests in such a
    way to reduce our memory footprint. A label like this might also be
    useful for handling the unused-ciphertext-hash-tree-nodes problem, if we
    could remove data from a DataSpans that's labelled with an
    already-complete segnum.

Finally, the bug zooko mentioned in comment:121225 is real. I'm still working on
it, but basically it prevents us from using shares that arrive after the
initial batch of requests: they are not initialized properly and don't get a
correct block hash tree. I'm working on a fix. The symptom is that we fall back to the initial shares, but if those have died, the download will fail, which is wrong.

And I'm still working on the new share-selection algorithm. The code works,
and my basic unit tests work, but certain ones require the comment:121225 bug to
be fixed before it is safe to use (the bug will hurt current downloads, but
occurs less frequently).
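
On the "N=400ish" Spans test idea above, here is a minimal standalone timing sketch. It assumes a Tahoe-LAFS source tree on the Python path so that allmydata.util.spans.Spans is importable, and it uses only the constructor and add() forms quoted elsewhere in this ticket:

```
import time
from allmydata.util.spans import Spans  # assumes a Tahoe-LAFS checkout on sys.path

def time_adds(num_ranges):
    s = Spans([(0, 32)])                 # seed via the (start, length) iterable form
    t0 = time.time()
    for i in range(1, num_ranges):
        s.add(i * 100, 32)               # disjoint, non-adjacent ranges, so the list keeps growing
    return time.time() - t0

for n in (100, 200, 400, 800):
    print("%4d ranges: %.4f s" % (n, time_adds(n)))
# If add() scans the whole list, doubling n should roughly quadruple the time.
```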

davidsarah commented 2010-08-16 21:54:35 +00:00
Author
Owner

Attachment spans.py.diff (677 bytes) added

Short-term hack to test for asymptotic inefficiency of DataSpans.get_spans

davidsarah commented 2010-08-16 22:13:42 +00:00
Author
Owner

Replying to zooko:

Okay, the problem with the current downloader in 1.8.0c2 is that it goes slower and slower as it downloads more and more data from a file.
It consistently beats (or at least ties) 1.7.1 on downloads of <= 10 MB but consistently loses badly at 100 MB. Also the profiling result in prof-run-110-dump.txt shows major CPU usage in spans:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
    87347    1.430    0.000  460.111    0.005 spans.py:25(__init__)
     9195    0.633    0.000  459.852    0.050 spans.py:256(get_spans)
...
  1315865  140.193    0.000  458.353    0.000 spans.py:46(add)
```

This is the smoking gun. The code of DataSpans.get_spans is:

```
    def get_spans(self):
        """Return a Spans object with a bit set for each byte I hold"""
        return Spans([(start, len(data)) for (start,data) in self.spans])
```

and the Spans constructor has the loop:

```
    for (start,length) in _span_or_start:
        self.add(start, length)
```

Spans.add does a linear search (plus a sort, if there is no overlap, but Timsort takes linear time for an already-sorted array), so the overall complexity of DataSpans.get_spans is Θ(n^2^) where n is the number of spans.

Since Spans uses essentially the same invariant as DataSpans for its array of spans (they are sorted with no overlaps or adjacency), it is possible to implement get_spans in Θ(1) time. However I suspect that the important difference here is between Θ(n^2^) and Θ(n).

The diff's implementation of get_spans includes a call to s._check. It may also be worth doing another profile run without that call.

(Some of my comments in ticket:798#comment:18 would reduce the number of calls to overlap and eliminate calls to adjacent, but I don't think that's the critical issue by itself.)
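
For comparison, here is a sketch of what a linear-time get_spans can look like; it mirrors the intent of attachment:spans.py.diff but is not that patch. It relies on the invariant noted above (DataSpans.spans is already sorted and non-overlapping) and on the assumption that a Spans instance can be created empty and have its internal _spans list filled in directly:

```
    def get_spans(self):
        """Return a Spans object with a bit set for each byte I hold"""
        s = Spans()
        # Copy the (start, length) pairs directly; the sorted/non-overlapping
        # invariant is inherited from self.spans, so no per-element add() --
        # and hence no repeated linear scan -- is needed.
        s._spans = [(start, len(data)) for (start, data) in self.spans]
        s._check()  # the spans.py.diff version also calls _check()
        return s
```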

davidsarah commented 2010-08-16 23:06:37 +00:00
Author
Owner

Replying to davidsarah (comment:44):

Spans.add does a linear search (plus a sort, if there is no overlap, but Timsort takes linear time for an already-sorted array), so the overall complexity of DataSpans.get_spans is Θ(n^2^) where n is the number of spans.

Since Spans uses essentially the same invariant as DataSpans for its array of spans (they are sorted with no overlaps or adjacency), it is possible to implement get_spans in Θ(1) time. However I suspect that the important difference here is between Θ(n^2^) and Θ(n).

Note that, given this problem and Brian's observations in comment:121226, the overall time for a download will be Θ(n^3^). So maybe we do need a better data structure (some sort of balanced tree or heap, maybe) if we want to get to Θ(n log n) rather than Θ(n^2^) for the whole download. But maybe that can wait until after releasing 1.8.

(Actually, just logging the output of Spans.dump calls will by itself cause Θ(n^2^) behaviour for the whole download, although with a fairly small constant.)

zooko commented 2010-08-17 05:19:10 +00:00
Author
Owner

Replying to warner:

  • that said, I don't understand how assertions added to spans.py#L47 would
    ever fire. If those same assertions were added to spans.py#L295, I'd get
    it. What types were start/length in your observed assertion failures? And
    what was the stack trace?

This was my mistake. I must have confused it with a different test run. Those assertions never fire.

zooko commented 2010-08-18 12:57:33 +00:00
Author
Owner

run 111

1.8.0c2

requesting all of the file

with flog running

```
$ echo this is run 111 tahoe-lafs v1.8.0c2-r4699+ ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 111 tahoe-lafs v1.8.0c2-r4699+
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 27 1421M   27  394M    0     0  18995      0 21:47:59  6:02:29 15:45:30     0^C

real    362m29.965s
user    0m1.123s
sys     0m4.187s
```

zooko commented 2010-08-18 12:58:38 +00:00
Author
Owner

Oh, and in run 111 (comment:121230) I had added log messages for all events which touched the Share._received Spans object so the resulting flogfile is a trace of everything that affects that object.

zooko commented 2010-08-18 14:07:45 +00:00
Author
Owner

The following run has patch attachment:spans.py.diff.

run 112

1.8.0c2

requesting all of the file

with flog running

```
$ echo this is run 112 tahoe-lafs v1.8.0c2-r4699+ ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 112 tahoe-lafs v1.8.0c2-r4699+
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 30 1421M   30  435M    0     0   113k      0  3:33:42  1:05:29  2:28:13  139k^C

real    65m29.907s
user    0m0.350s
sys     0m2.302s
```

zooko commented 2010-08-18 15:57:23 +00:00
Author
Owner

The patch helped a lot—compare run 112 to 111—but not enough to make trunk as fast as 1.7.1 on large downloads—compare run 112 to runs 102, 103, 105, and 109.

I intend to write a tool which reads the traces of what was done to the Share._received Spans object and replays those operations against a standalone Spans object, so that we can benchmark and profile it in isolation.

| run | version | downloaded | download KBps | flags |
|---|---|---|---|---|
| 101 | 1.8.0c2 | 100 MB | 91 | |
| 102 | 1.7.1 | 100 MB | 182 | |
| 103 | 1.7.1 | 100 MB | 207 | |
| 104 | 1.8.0c2 | 100 MB | 82 | |
| 105 | 1.7.1 | 100 MB | 211 | |
| 109 | 1.7.1 | 100 MB | 228 | cProfile |
| 110 | 1.8.0c2 | 100 MB | 113 | cProfile |
| 111 | 1.8.0c2 | 413 MB | 19 | spanstrace |
| 112 | 1.8.0c2 | 456 MB | 113 | spanstrace, patched |
zooko commented 2010-08-18 15:58:51 +00:00
Author
Owner

Attachment run-111-above28-flog.pickle.bz2 (2352820 bytes) added

zooko commented 2010-08-18 16:01:33 +00:00
Author
Owner

Attachment run-112-above28-flog.pickle.bz2 (2553365 bytes) added

zooko commented 2010-08-18 16:03:04 +00:00
Author
Owner

Attachment debuggery-trace-spans.dpatch.txt (10870 bytes) added

zooko commented 2010-08-18 16:04:20 +00:00
Author
Owner

debuggery-trace-spans.dpatch.txt adds logging of all events that touched Share._received at loglevel CURIOUS. run-111-above28-flog.pickle.bz2 and run-112-above28-flog.pickle.bz2 are the flogs from run 111 and run 112 with only events logged at level CURIOUS or above.

zooko commented 2010-08-18 17:02:52 +00:00
Author
Owner

| run | version | downloaded | download KBps | flags |
|---|---|---|---|---|
| 113 | 1.7.1 | 543 MB | 241 | on office network |

```
$ echo this is run 113 tahoe-lafs v1.7.1 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 113 tahoe-lafs v1.7.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 36 1421M   36  518M    0     0   241k      0  1:40:26  0:36:37  1:03:49  257k^C

real    36m39.238s
user    0m0.329s
sys     0m2.733s
```

warner commented 2010-08-18 18:56:04 +00:00
Author
Owner

BTW, be sure to pay attention to the DataSpans too, specifically
Share._received . That's the one that I observed growing linearly with
number-of-segments-read.

I'm close to finishing my rework of the way Shares are handled. If we can
make new-downloader fast enough by fixing complexity issues in spans.py, we
should stick with that for 1.8.0, because those are probably smaller and less
intrusive changes. If not, here are the properties of my Share-handling
changes:

  • use a new diversity-seeking Share selection algorithm, as described in
    comment:27 . This should distribute the download load evenly among all
    known servers when they have equal number of shares, and as evenly as
    possible (while still getting k shares) when not. If more shares are
    discovered later, the algorithm will recalculate the sharemap and take
    advantage of the new shares, and we'll keep looking for new shares as long
    as we don't have the diversity that we want (one share per server).

  • fix the problem in which late shares (not used for the first segment, but
    located and added later) were not given the right sized hashtree and threw
    errors, causing them to be dropped. I think this completely broke the
    "tolerate loss of servers" feature, but the problem might have been caused
    by the diversity-seeking algorithm change, rather than something that was
    in new-downloader originally.

  • deliver all shares to the SegmentFetcher as soon as we learn about
    them, instead of waiting for the fetcher to tell us it's hungry. This
    gives the fetcher more information to work with.

I might be able to attach a patch tomorrow... there are still some bugs in it,
and I haven't finished implementing the last point (push shares on discovery,
not pull on hunger).
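
To make the diversity-seeking idea in the first bullet concrete, here is a
minimal, hypothetical sketch (not the actual patch; the dict-of-sets input and
the helper name are illustrative): repeatedly pick the least-loaded server
that can still contribute a share number we don't already have, so the load
spreads as evenly as possible while still collecting k distinct shares.

```
def choose_shares(servers_to_shares, k):
    """Pick up to k distinct share numbers, spreading load across servers.

    servers_to_shares: dict mapping server id -> set of share numbers it
    holds (e.g. the DYHB results). Returns a list of (server, shnum) pairs.
    """
    chosen = []           # (server, shnum) pairs picked so far
    used_shnums = set()   # share numbers already covered
    load = dict((s, 0) for s in servers_to_shares)  # shares assigned per server

    while len(chosen) < k:
        # Only servers that still hold a share number we haven't covered.
        candidates = [(load[s], s) for s, shnums in servers_to_shares.items()
                      if shnums - used_shnums]
        if not candidates:
            break  # fewer than k distinct shares are reachable
        _, server = min(candidates)  # least-loaded server wins
        shnum = min(servers_to_shares[server] - used_shnums)
        chosen.append((server, shnum))
        used_shnums.add(shnum)
        load[server] += 1
    return chosen
```

With DYHB results like the ones in run 115 below (nszizgf5 holding shares 3,
7, and 8, and several other servers holding one or two shares each), a pass
like this would take one share from each of three different servers instead
of all three from nszizgf5.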

warner commented 2010-08-18 23:38:51 +00:00
Author
Owner

Oh, hey, here's a simple patch to try out:

diff --git a/src/allmydata/immutable/downloader/share.py b/src/allmydata/immutable/downloader/share.py
index f7ed4e8..413f907 100644
--- a/src/allmydata/immutable/downloader/share.py
+++ b/src/allmydata/immutable/downloader/share.py
@@ -531,6 +531,9 @@ class Share:
             for o in observers:
                 # goes to SegmentFetcher._block_request_activity
                 o.notify(state=COMPLETE, block=block)
+            # now clear our received data, to dodge the #1170 spans.py
+            # complexity bug
+            self._received = DataSpans()
         except (BadHashError, NotEnoughHashesError), e:
             # rats, we have a corrupt block. Notify our clients that they
             # need to look elsewhere, and advise the server. Unlike

Since self._received is supposed to be empty after each segment is complete (unless we guess the segsize wrong), this patch simply manually empties it at that point. No data is retained from one segment to the next: any mistakes will just cause us to ask for more data next time.

If the problem in this bug is a computational-complexity problem in DataSpans, this should bypass it, by making sure we never add more than 3 or 4 ranges to one, since even O(n^3) is small when n is only 3 or 4. (We should still fix the problem, but maybe the fix can wait for 1.8.1.) If the problem is in Spans, or elsewhere, then this won't help.
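
For intuition about where the time goes, here is a toy, list-based spans
structure (this is not the real allmydata.util.spans code, just an
illustration of the shape of the cost): every add() walks the list of retained
ranges to merge overlaps, so a single add is linear in the number of ranges
already held, and a structure that is never emptied gets slower and slower as
the download proceeds. Clearing it after every segment, as the patch above
does, keeps the range count tiny no matter how large the file is.

```
class ToySpans:
    """Toy interval set: sorted, non-overlapping (start, length) pairs."""

    def __init__(self):
        self._ranges = []  # sorted list of (start, length)

    def add(self, start, length):
        end = start + length
        kept = []
        for s, l in self._ranges:       # O(n) scan on every add
            e = s + l
            if e < start or end < s:    # strictly disjoint: keep as-is
                kept.append((s, l))
            else:                       # overlapping or adjacent: absorb
                start, end = min(start, s), max(end, e)
        kept.append((start, end - start))
        kept.sort()
        self._ranges = kept

    def clear(self):
        # analogue of the comment:121237 trick: forget everything per segment
        self._ranges = []
```

Adding m ranges that never get merged or cleared costs O(m^2) overall with a
structure shaped like this, which is consistent with downloads getting slower
the further they progress.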

zooko commented 2010-08-19 02:26:36 +00:00
Author
Owner

Attachment run-112-above28-flog-dump-sh8-on-nsziz.txt (7002679 bytes) added

zooko commented 2010-08-19 02:31:18 +00:00
Author
Owner

run-112-above28-flog-dump-sh8-on-nsziz.txt is a flogtool dump of the attachment run-112-above28-flog.pickle.bz2, grepped for just one particular share (sh8 on nsziz). It is suitable as the input file for misc/simulators/bench_spans.py.

zooko commented 2010-08-19 02:31:47 +00:00
Author
Owner

run-112-above28-flog-dump-sh8-on-nsziz.txt is a flogtool dump of run-112-above28-flog.pickle.bz2 grepped for just one particular share (sh8 on nsziz). It is suitable as the input file for misc/simulators/bench_spans.py (source: trunk/misc/simulators/bench_spans.py@4700).

The output that I get on my Macbook Pro is:

MAIN Zooko-Ofsimplegeos-MacBook-Pro:~/playground/tahoe-lafs/trunk$ PYTHONPATH=~/playground/tahoe-lafs/trunk/support/lib/python2.6/site-packages/ python  ~/playground/tahoe-lafs/trunk/misc/simulators/bench_spans.py ~/Public/Drop\ Box/run-112-above28-flog-dump-sh8-on-nsziz.txt 
all results are in time units per N
time units per second: 1000000; seconds per time unit: 0.000001
(microseconds)
    600 best: 2.265e+01,   3th-best: 2.402e+01, mean: 2.462e+01,   3th-worst: 2.502e+01, worst: 2.585e+01 (of     10)
   6000 best: 1.069e+02,   3th-best: 1.119e+02, mean: 1.137e+02,   3th-worst: 1.149e+02, worst: 1.201e+02 (of     10)
  60000 best: 2.916e+01,   1th-best: 2.916e+01, mean: 5.080e+02,   1th-worst: 9.868e+02, worst: 9.868e+02 (of      2)

This is even though I have spans.py.diff applied.

zooko commented 2010-08-19 04:27:20 +00:00
Author
Owner

Okay, the patch from comment:121237 seems to have improved performance significantly. I just performed run 114:

MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 114 tahoe-lafs v1.8.0c2-r4699+comment:54 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 114 tahoe-lafs v1.8.0c2-r4699+comment:54
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 10 1421M   10  146M    0     0   149k      0  2:42:42  0:16:42  2:26:00  159k^R
 30 1421M   30  435M    0     0   147k      0  2:44:53  0:50:30  1:54:23  144k^C

real    50m30.207s
user    0m0.290s
sys     0m2.112s

Here is the full table:

run version downloaded download KBps flags
101 1.8.0c2 100 MB 91
102 1.7.1 100 MB 182
103 1.7.1 100 MB 207
104 1.8.0c2 100 MB 82
105 1.7.1 100 MB 211
109 1.7.1 100 MB 228 cProfile
110 1.8.0c2 100 MB 113 cProfile
111 1.8.0c2 413 MB 19 spanstrace
112 1.8.0c2 456 MB 113 spanstrace, spans.py.diff
113 1.7.1 543 MB 241 on office network
114 1.8.0c2 456 MB 147 spans.py.diff + comment:121237

I'm not sure if v1.8.0c2 is now good enough to be considered "not a significant regression" vs. v1.7.1 for downloading large files. I'll go download a large file with v1.7.1 now on my home network for comparison...

zooko commented 2010-08-19 05:07:19 +00:00
Author
Owner

Hm, it seems like v1.7.1 is still substantially faster than v1.8.0c2+comment:121237:

run version downloaded download KBps flags
115 1.7.1 456 MB 224
MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 115 tahoe-lafs v1.7.1 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 115 tahoe-lafs v1.7.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 30 1421M   30  435M    0     0   224k      0  1:48:09  0:33:06  1:15:03  240k^C

real    33m6.746s
user    0m0.287s
sys     0m2.307s
zooko commented 2010-08-19 05:44:58 +00:00
Author
Owner

Well the good news is that comment:121237 fixes the problem that downloads go slower the bigger they are (as expected). The bad news is that even with comment:121237 Tahoe-LAFS v1.8.0c2 is substantially slower than v1.7.1 for large files:

run version downloaded download KBps flags
116 1.8.0c2 314 MB 154 spans.py.diff + comment:121237
MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 115 tahoe-lafs v1.8.0c2+comment:54 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 115 tahoe-lafs v1.8.0c2+comment:54
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 21 1421M   21  300M    0     0   154k      0  2:37:11  0:33:16  2:03:55  160k^C

real    33m16.529s
user    0m0.203s
sys     0m1.507s

I'm going to start another run with v1.8.0c2, this time with the cProfile tool running, and go to sleep.

zooko commented 2010-08-19 06:01:17 +00:00
Author
Owner

Attachment run-117-prof-cumtime.dump.txt (110161 bytes) added

zooko commented 2010-08-19 06:01:41 +00:00
Author
Owner

I ran 1.8.0c2 under the profiler for a few minutes and then stopped it in order to get the profiling stats (attached). Unfortunately, they do not show any smoking gun of CPU usage, so the remaining slowdown from v1.7.1 to v1.8.0c2 is likely to be one of the network-scheduling issues that Brian has been thinking about (server selection, pipelining), or else some other sort of subtle timing issue...

Here are the profiling stats for a brief (~4 minute) run of 1.8.0c2:

MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 117 tahoe-lafs v1.8.0c2+comment:54 cProfile ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 117 tahoe-lafs v1.8.0c2+comment:54 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  2 1421M    2 35.1M    0     0   160k      0  2:30:45  0:03:43  2:27:02  169k^C

real    3m43.778s
user    0m0.027s
sys     0m0.210s

The functions with the most "cumtime" (time spent in the function or in any of the functions that it called) are:

        1    0.000    0.000  275.676  275.676 base.py:1156(run)
        1    0.175    0.175  275.675  275.675 base.py:1161(mainLoop)
    30070    0.576    0.000  267.477    0.009 selectreactor.py:93(doSelect)
    30070  259.463    0.009  260.149    0.009 {select.select}
    30070    0.191    0.000    7.795    0.000 base.py:751(runUntilCurrent)
     3724    0.031    0.000    7.205    0.002 eventual.py:18(_turn)
    27682    0.300    0.000    6.634    0.000 log.py:71(callWithLogger)
    27682    0.242    0.000    6.297    0.000 log.py:66(callWithContext)
     3453    0.075    0.000    5.742    0.002 share.py:187(loop)
    27682    0.116    0.000    5.719    0.000 context.py:58(callWithContext)
    27682    0.199    0.000    5.587    0.000 context.py:32(callWithContext)
    27681    0.228    0.000    5.310    0.000 selectreactor.py:144(_doReadOrWrite)
     3453    0.089    0.000    5.299    0.002 share.py:238(_do_loop)
    26522    0.347    0.000    4.397    0.000 tcp.py:443(doRead)
    26506    0.306    0.000    4.376    0.000 tcp.py:114(doRead)
28407/17703    0.257    0.000    3.770    0.000 defer.py:453(_runCallbacks)
13880/11193    0.031    0.000    2.824    0.000 defer.py:338(callback)
13983/11295    0.031    0.000    2.795    0.000 defer.py:433(_startRunCallbacks)
3064/3052    0.014    0.000    2.658    0.001 defer.py:108(maybeDeferred)
     3453    0.092    0.000    2.654    0.001 share.py:701(_send_requests)
     2639    0.007    0.000    2.032    0.001 referenceable.py:406(callRemote)
     2608    0.034    0.000    2.025    0.001 banana.py:633(dataReceived)
     2639    0.052    0.000    2.012    0.001 referenceable.py:418(_callRemote)
     2604    0.005    0.000    2.009    0.001 share.py:733(_send_request)
     2608    0.344    0.000    1.988    0.001 banana.py:701(handleData)
     2640    0.008    0.000    1.791    0.001 banana.py:183(send)
     2640    0.021    0.000    1.783    0.001 root.py:92(send)
     2651    0.199    0.000    1.683    0.001 banana.py:191(produce)
    26506    1.680    0.000    1.680    0.000 {built-in method recv}
28693/25918    0.072    0.000    1.651    0.000 defer.py:266(addCallbacks)
     4299    0.044    0.000    1.494    0.000 share.py:267(_get_satisfaction)
    12418    0.125    0.000    1.052    0.000 hashtree.py:298(needed_hashes)

I'll go ahead and leave a download running under the profiler overnight just in case something turns up.
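
As an aside, a cumtime table like the one above can be regenerated from a
saved cProfile dump with the standard pstats module; a minimal sketch, with a
hypothetical dump filename:

```
import pstats

# Load a profile saved by cProfile (the filename is illustrative).
stats = pstats.Stats("tahoe-download.prof")

# Sort by cumulative time and show the 30 most expensive entries,
# matching the "cumtime" ordering in the table above.
stats.strip_dirs().sort_stats("cumulative").print_stats(30)
```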

zooko commented 2010-08-19 06:04:37 +00:00
Author
Owner

Attachment run-115-flog.pickle.bz2 (496367 bytes) added

zooko commented 2010-08-19 06:06:35 +00:00
Author
Owner

Attachment run-116-flog.pickle.bz2 (3149925 bytes) added

zooko commented 2010-08-19 06:11:10 +00:00
Author
Owner

If you wanted to investigate why 1.8.0c2 is so much slower than 1.7.1 at downloading a large file even after applying the comment:121237 patch, then you could use run-115-flog.pickle.bz2 and run-116-flog.pickle.bz2 as evidence. Hm, hey waitasecond, in my earlier testing (recorded in this ticket), 1.8.0c2 was faster than 1.7.1 for small files (<= 10 MB). This was also the case for Nathan Eisenberg's benchmarks (posted to tahoe-dev). But currently it looks to me like the average download speed (as reported by curl during its operation) is the same at the beginning of the download as at the end, i.e. even during the first 10 MB or so 1.8.0c2 is only getting about 150 KBps where 1.7.1 is getting more than 200 KBps. Did something change?

I guess I (or someone) should run 1.7.1 vs. 1.8.0c2+comment:121237 on 10 MB files. But I'm way too tired to start that again right now.

Man, I'm really worn out from staying up night after night poking at this and then having to get up early the next morning to help my children get ready for school and myself ready for work. I could use more help!

davidsarah commented 2010-08-19 06:44:51 +00:00
Author
Owner

Perhaps the remaining issue is server selection. Let's try Brian's comment:121209 diversity-seeking algorithm, combined with the comment:121237 fix.

zooko commented 2010-08-19 14:07:15 +00:00
Author
Owner

Replying to davidsarah:

> Perhaps the remaining issue is server selection. Let's try Brian's comment:121209 diversity-seeking algorithm, combined with the comment:121237 fix.

I'm willing to try the comment:121209 diversity-seeking algorithm, but I also would like to verify whether or not server-selection is one of the factors by inspecting the flogs...

zooko commented 2010-08-19 14:10:46 +00:00
Author
Owner

Yes, the overnight run yielded no smoking gun (smoking CPU?) that I can see. I'll attach the full profiling results as an attachment.

MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 118 tahoe-lafs v1.8.0c2+comment:54 cProfile ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 118 tahoe-lafs v1.8.0c2+comment:54 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1421M  100 1421M    0     0   158k      0  2:32:39  2:32:39 --:--:--  168k

real    152m39.145s
user    0m1.024s
sys     0m7.661s
         275630510 function calls (274835891 primitive calls) in 28165.965 CPU seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 28165.965 28165.965 base.py:1156(run)
        1    8.156    8.156 28165.964 28165.964 base.py:1161(mainLoop)
  1226198   24.746    0.000 27834.397    0.023 selectreactor.py:93(doSelect)
  1226198 27531.442    0.022 27547.532    0.022 {select.select}
  1226198    8.846    0.000  313.488    0.000 base.py:751(runUntilCurrent)
   148051    1.222    0.000  285.941    0.002 eventual.py:18(_turn)
  1107126   12.628    0.000  257.892    0.000 log.py:71(callWithLogger)
  1107126   10.433    0.000  243.790    0.000 log.py:66(callWithContext)
   136521    3.020    0.000  229.335    0.002 share.py:187(loop)
  1107126    5.102    0.000  218.355    0.000 context.py:58(callWithContext)
  1107126    8.483    0.000  212.571    0.000 context.py:32(callWithContext)
   136521    1.831    0.000  211.328    0.002 share.py:238(_do_loop)
  1107126    9.274    0.000  200.718    0.000 selectreactor.py:144(_doReadOrWrite)
  1064200   13.055    0.000  178.141    0.000 tcp.py:114(doRead)
  1064262   15.145    0.000  165.679    0.000 tcp.py:443(doRead)
1100168/689294   10.315    0.000  135.808    0.000 defer.py:453(_runCallbacks)
536780/433736    1.023    0.000  110.261    0.000 defer.py:338(callback)
540512/437468    1.259    0.000  109.033    0.000 defer.py:433(_startRunCallbacks)
   136521    3.623    0.000  107.058    0.001 share.py:701(_send_requests)
118185/118168    0.513    0.000   94.733    0.001 defer.py:108(maybeDeferred)
   102399    0.197    0.000   81.560    0.001 share.py:733(_send_request)
   102514    0.290    0.000   81.467    0.001 referenceable.py:406(callRemote)
   102514    1.867    0.000   80.688    0.001 referenceable.py:418(_callRemote)
   104846    1.446    0.000   79.737    0.001 banana.py:633(dataReceived)
   104846   14.309    0.000   78.171    0.001 banana.py:701(handleData)
   102515    0.320    0.000   72.324    0.001 banana.py:183(send)
   102515    0.785    0.000   72.004    0.001 root.py:92(send)
  1064200   70.123    0.000   70.123    0.000 {built-in method recv}
   102562    7.417    0.000   68.190    0.001 banana.py:191(produce)
   170643    1.742    0.000   60.005    0.000 share.py:267(_get_satisfaction)
1114150/1011232    2.898    0.000   50.939    0.000 defer.py:266(addCallbacks)
   494839    5.175    0.000   42.363    0.000 hashtree.py:298(needed_hashes)
  4010897    6.757    0.000   41.319    0.000 tcp.py:413(write)
   494839    6.477    0.000   37.139    0.000 hashtree.py:128(needed_for)
  4010897   18.303    0.000   34.562    0.000 abstract.py:177(write)
   102809    1.151    0.000   33.768    0.000 banana.py:1049(handleClose)
   136521    2.571    0.000   33.606    0.000 share.py:556(_desire)
    45498    0.163    0.000   32.926    0.001 hashutil.py:51(tagged_hash)
    45499    0.083    0.000   32.782    0.001 fetcher.py:83(loop)
  1230099    3.737    0.000   32.762    0.000 banana.py:22(int2b128)
    45499    0.451    0.000   32.699    0.001 fetcher.py:91(_do_loop)
   227571    0.386    0.000   31.989    0.000 hashutil.py:31(update)
   102514    1.152    0.000   31.951    0.000 call.py:652(receiveClose)
   227571   31.557    0.000   31.557    0.000 {method 'update' of '_sha256.SHA256' objects}
zooko commented 2010-08-19 14:11:27 +00:00
Author
Owner

Attachment run-118-prof-cumtime.dump.txt (116153 bytes) added

warner commented 2010-08-19 17:27:57 +00:00
Author
Owner

Attachment 1170-combo.diff (56434 bytes) added

patch to prefer share diversity, forget leftover data after each segment, and fix handling of numsegs

warner commented 2010-08-19 17:29:47 +00:00
Author
Owner

the "1170-combo.diff" patch combines the approaches as suggested in comment:121245 . Please give it a try and see if it helps. I'll try to look at the flogs to see what servers were used, to see if that run has a diversity issue or not.

the "1170-combo.diff" patch combines the approaches as suggested in [comment:121245](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121245) . Please give it a try and see if it helps. I'll try to look at the flogs to see what servers were used, to see if that run has a diversity issue or not.
zooko commented 2010-08-19 18:27:51 +00:00
Author
Owner

Okay, I investigated server selection on the bus to work this morning. run-115-flog.pickle.bz2 shows:

22:10:56.877 [649]: <CiphertextDownloader #1>(u6h6p4mlr3j7): starting download
22:10:56.881 [650]: sending DYHB to [tavrk54e]
22:10:56.882 [651]: sending DYHB to [xflgj7cg]
22:10:56.883 [652]: sending DYHB to [sp26qyqc]
22:10:56.884 [653]: sending DYHB to [zkj6swl2]
22:10:56.885 [654]: sending DYHB to [sroojqcx]
22:10:56.886 [655]: sending DYHB to [4rk5oqd7]
22:10:56.887 [656]: sending DYHB to [nszizgf5]
22:10:56.887 [657]: sending DYHB to [62nlabgf]
22:10:56.888 [658]: sending DYHB to [fp3xjndg]
22:10:56.888 [659]: sending DYHB to [rpiw4n3f]
22:10:57.006 [660]: got results from [xflgj7cg]: shnums []
22:10:57.010 [661]: got results from [rpiw4n3f]: shnums []
22:10:57.017 [662]: got results from [62nlabgf]: shnums []
22:10:57.058 [663]: got results from [nszizgf5]: shnums [3, 7, 8]
22:10:57.066 [664]: got results from [4rk5oqd7]: shnums [2]
22:10:57.110 [665]: got results from [fp3xjndg]: shnums [4, 9]
22:10:57.126 [666]: got results from [tavrk54e]: shnums []
22:10:57.181 [667]: got results from [zkj6swl2]: shnums []
22:10:57.205 [668]: got results from [sroojqcx]: shnums [1, 6]
22:10:57.474 [669]: got results from [sp26qyqc]: shnums [0, 5]

The 1.7.1 flog doesn't show which servers are actually being used for Request Blocks, but we know that 1.7.1 will always choose to get all three shares from nszizgf5 in a case like this.

Therefore I don't think that 1.8's share-selection can be part of the explanation for why 1.8 is slower than 1.7.

(This doesn't mean that improved share selection wouldn't make 1.9 faster than 1.8 is now.)

warner commented 2010-08-19 19:13:37 +00:00
Author
Owner

Replying to zooko:

> Hm, hey waitasecond, in my earlier testing (recorded in this ticket),
> 1.8.0c2 was faster than 1.7.1 for small files (<= 10 MB). This was
> also the case for Nathan Eisenberg's benchmarks (posted to tahoe-dev).
> But currently it looks to me like the average download speed (as
> reported by curl during its operation) is the same at the beginning of
> the download as at the end, i.e. even during the first 10 MB or so
> 1.8.0c2 is only getting about 150 KBps where 1.7.1 is getting more
> than 200 KBps. Did something change?

There's a sizeable startup time in 1.7.1 (lots of roundtrips), which
went away in 1.8.0c2. I think we're all in agreement about the
small-file speedups that provides (i.e. we've not seen any evidence to
the contrary). The change is on the order of a few seconds, though, so I
think a 10MB file (or portion of a file) that takes 10MB/150kBps ≈ 67s to
complete won't be affected very much. I don't think you'll be able to
see its effects in the curl output.

Nathan's tests were on hundreds or thousands of small files.

From my tests, the new-downloader sees about 500ms more taken to
complete the first segment than the second and following ones. I believe
that's the time spent doing server selection, UEB fetches, and the large
hash chain fetches.

zooko commented 2010-08-20 00:07:51 +00:00
Author
Owner

Attachment runs-119,120,121-curl-stdout.txt (2152 bytes) added

I ran three more measurements today at the office -- runs 119, 120, and 121 . These are the curl stdout from those. I will update a table with these results and put it into the original opening comment of this ticket.

zooko commented 2010-08-20 16:31:25 +00:00
Author
Owner

Attachment runs-122,123,124,125,126,127-curl-stdout.4.txt (5676 bytes) added

I ran several more measurements from home, intended to test whether the logging in new-downloader is partially responsible for new-downloader's slowness. These are the curl stdout from those runs. I will update the table in the opening comment of this ticket to include these runs.

zooko commented 2010-08-20 16:34:35 +00:00
Author
Owner

Attachment run123-down-status.html.bz2 (1537800 bytes) added

status page results for run 123

zooko commented 2010-08-20 16:35:42 +00:00
Author
Owner

Attachment run127-down-status.html.bz2 (105278 bytes) added

zooko commented 2010-08-20 16:45:39 +00:00
Author
Owner

Brian: I updated the table in the initial comment. Please let me know what other sorts of measurements you would like from me. It looks to me like there is still a significant regression in 1.8.0c2+comment:121237+spans.py.diff even if I comment out almost all calls to log.msg() in immutable/download/*.py. I will attach the patch that I used to comment out all those logging calls. I'll probably go ahead and apply your 1170-combo.diff and run 100 MB downloads from the office during work today.

zooko commented 2010-08-20 16:46:42 +00:00
Author
Owner

Attachment comment-out-logging-in-immutable-download.dpatch.txt (36282 bytes) added

warner commented 2010-08-20 21:59:18 +00:00
Author
Owner

I did a quick test at home with a def msg(*args,**kwargs):pass in src/allmydata/util/log.py, and didn't see a noticeable change (the noise level was pretty high, so even if there were a 10% difference, I probably wouldn't have been able to spot it).

In some other testing at work, I was unable to see a consistent performance difference between 1.7.1 and my comment:121248 combo-patch, but the speed was warbling all over the place, so I don't feel that it was a very conclusive run. I'd patched both to only use a single server (nszi?), to reduce the variables.

What I'd like to do is to run a series of tests from my home network (no other traffic) using my personal backupgrid server (no other traffic), to see how consistent the results are. Maybe tomorrow I'll get a chance to try that.

zooko commented 2010-08-21 00:44:05 +00:00
Author
Owner

Attachment perf-measure-office.txt (7307 bytes) added

I ran several more measurements from the office, intended to test whether Brian's 1170-combo.diff made 1.8.0c2 competitive with 1.7.1. Sadly it appears not. :-( I'll update the table in the initial comment with these results.

zooko commented 2010-08-21 00:47:15 +00:00
Author
Owner

Attachment runs-129,130,131,132,133,134,135,136,137,138,139-curl-stdout.txt (7307 bytes) added

I ran several more measurements from the office, intended to test whether Brian's 1170-combo.diff made 1.8.0c2 competitive with 1.7.1. Sadly it appears not. :-( I'll update the table in the initial comment with these results. This attachment is a better-named copy of perf-measure-office.txt .

zooko commented 2010-08-21 00:48:33 +00:00
Author
Owner

Attachment run139-down-status.html.bz2 (910388 bytes) added

zooko commented 2010-08-21 01:04:51 +00:00
Author
Owner

Brian: please inspect the table in the ticket's initial comment. It seems like there is a bimodal distribution with 1170-combo.diff: half of the time it runs at about 179 or 180 KBps (it ran at 169 KBps for the large download) and the other half of the time it runs at 262–291 KBps. The latter range is slightly faster than 1.7.1! I attached the down-status.html for the long download that ran at 169 KBps: run139-down-status.html.bz2.

terrell commented 2010-08-21 17:05:52 +00:00
Author
Owner

Attachment runs-140,141-curl-stdout.txt (1283 bytes) added

terrell commented 2010-08-21 17:22:30 +00:00
Author
Owner

and run 142

[11:37:28:trel:~/Desktop/tahoestuff/trunkpatched] time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   215k      0  0:07:32  0:07:32 --:--:--  182k

real	7m32.860s 
user	0m0.107s
sys	0m0.765s
zooko commented 2010-08-21 18:17:35 +00:00
Author
Owner

Attachment run-zooko1000-status.html (1235376 bytes) added

zooko commented 2010-08-21 18:40:29 +00:00
Author
Owner

Attachment run-zooko1000-curl-stdout.txt (313 bytes) added

zooko commented 2010-08-21 19:32:33 +00:00
Author
Owner

run-zooko1000 was from my local coffeeshop—Caffe Sole in South Boulder—and the status.html shows an interesting pattern: the downloader immediately issued 10 DYHB queries (as expected), and then it took 9.6 seconds for the first DYHB response to arrive. Then, the really weird part: it took 8.4s more for the next seven DYHB responses to arrive (totalling 18s from request to response)! Then, still weird, it took a total request-to-response time of 6 minutes for the ninth response and a total of 8 minutes for the tenth. Also, as soon as the first response arrived the downloader issued a new DYHB request, and that one, the eleventh one, took 8.92s for the response to arrive.

So, I suppose there is something very messed up about the network at my local coffeeshop. Perhaps it blocks a flow that starts on an idle TCP connection while it is trying to figure out how to insert ads into any HTTP responses, or something. Note that these TCP connections were all already established long before the download began.

Take-aways?

I guess it is that we should not make assumptions about what is "reasonable" for IP traffic. That is, we can't assume much about the network if we want to support people who use Tahoe-LAFS from coffeeshops, over tethered cell phones, at Burning Man, on satellite uplinks, on the International Space Station, etc. (which I do).

Another take-away is that 1.8.0c2+combo.diff did pretty well in this situation! (I think 1.7.1 probably would have done well too but I didn't get a chance to try it.)

terrell commented 2010-08-21 20:09:49 +00:00
Author
Owner

Attachment 141-status.html.bz2 (109438 bytes) added

status page for run 141

terrell commented 2010-08-21 20:10:06 +00:00
Author
Owner

Attachment 142-status.html.bz2 (110361 bytes) added

status page for run 142

terrell commented 2010-08-22 04:41:52 +00:00
Author
Owner

Attachment runs-143to162-alternating-stdout.txt (25821 bytes) added

capture from stdout - alternating between 1.7.1 and trunk+combo - run from cable modem at home on pubgrid

terrell commented 2010-08-22 04:42:42 +00:00
Author
Owner

Attachment status-143.html.bz2 (109804 bytes) added

terrell commented 2010-08-22 04:42:56 +00:00
Author
Owner

Attachment status-145.html.bz2 (110838 bytes) added

terrell commented 2010-08-22 04:43:12 +00:00
Author
Owner

Attachment status-147.html.bz2 (109473 bytes) added

terrell commented 2010-08-22 04:43:24 +00:00
Author
Owner

Attachment status-149.html.bz2 (109156 bytes) added

terrell commented 2010-08-22 04:43:39 +00:00
Author
Owner

Attachment status-151.html.bz2 (111256 bytes) added

terrell commented 2010-08-22 04:43:57 +00:00
Author
Owner

Attachment status-153.html.bz2 (109625 bytes) added

terrell commented 2010-08-22 04:44:11 +00:00
Author
Owner

Attachment status-155.html.bz2 (110178 bytes) added

terrell commented 2010-08-22 04:44:29 +00:00
Author
Owner

Attachment status-157.html.bz2 (111354 bytes) added

terrell commented 2010-08-22 04:44:43 +00:00
Author
Owner

Attachment status-159.html.bz2 (109261 bytes) added

terrell commented 2010-08-22 04:44:57 +00:00
Author
Owner

Attachment status-161.html.bz2 (109509 bytes) added

terrell commented 2010-08-22 04:57:54 +00:00
Author
Owner

runs 143-162 generated with the following bash script to alternate the clients and grab the status file:

feel free to adapt and reuse.

```
#!/bin/bash
##################################
# EDIT THESE VARIABLES
##################################
FIRSTRUN=143
RUNS=20
FILESIZE=100000000
BASE171="/Users/trel/Desktop/tahoestuff/allmydata-tahoe-1.7.1"
BASEPATCH="/Users/trel/Desktop/tahoestuff/trunkpatched"
CACHEDIR="/Users/trel/.tahoe/private/cache/download"
FILENAME="http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg"
SAVEFILE="deleteme.ogg"
##################################
RUNTOTAL=$FIRSTRUN
RUNCOUNT=0
while [ $RUNCOUNT -lt $RUNS ]; do
  let RUNTOTAL=FIRSTRUN+RUNCOUNT
  echo "-----------------------------------------------"
  echo "RUN $RUNTOTAL"
  # even-numbered runs use 1.7.1, odd-numbered runs use trunk+combo
  if [ $[$RUNTOTAL % 2] -eq 0 ]
  then
    cd $BASE171
  else
    cd $BASEPATCH
  fi
  pwd
  bin/tahoe --version
  bin/tahoe stop
  bin/tahoe start
  echo "waiting 10s for node to spin up and connect..."
  sleep 10
  echo "curl --range 0-$FILESIZE $FILENAME > $SAVEFILE"
  time curl --range 0-$FILESIZE $FILENAME > $SAVEFILE
  # only grab the download-status page for the patched (new-downloader) runs
  if [ `pwd` = $BASEPATCH ]
  then
    echo "saving status.html..."
    curl http://localhost:3456/status/down-0 > status-$RUNTOTAL.html
    bzip2 status-$RUNTOTAL.html
  fi
  # clear the download cache and the saved file before the next run
  rm -f $CACHEDIR/*
  rm $SAVEFILE
  let RUNCOUNT=RUNCOUNT+1
  sleep 1
done
```


warner commented 2010-08-23 07:33:54 +00:00
Author
Owner

I looked at the status.html files for some of the new-downloader runs. It looks like there's a reasonable correlation between download speed and server selection. The 240kBps-ish downloads tend to use sp26/nszi/4rk5, while the 130-140ish downloads tend to use fp3x or sroo instead of 4rk5.

Without more info from the 1.7.1 downloads (data which would be in the download-status, but for the old-downloader it isn't displayed until after the whole download is complete), we can't guess what servers were used for those runs. Zooko, how consistent do you think the speed-difference results would be if you used a 100MB file, instead of using the first 100MB of a multi-GB file? That might let us use Terrell's script and also collect download-status from the 1.7.1 runs.

It'd be awfully convenient if the speed difference that Zooko observed could be attributable to server selection, and if the combo patch made that selection work well enough to ship 1.8.0. A 1.8.1-era improvement could be to try out new servers over the course of the download, so that we'd land in the three-good-servers (sp26/nszi/4rk5) mode more often than the two-good-one-slow-servers (sp26/nszi/fp3x) mode.
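
A rough sketch of what that 1.8.1-era idea could look like (illustrative only; none of these names come from the Tahoe-LAFS code): every few segments, probe a server we are not currently using and swap it in if it has been answering faster than the slowest member of the active set.

```
# Hypothetical sketch, not the Tahoe-LAFS downloader API.
def maybe_swap_in_probed_server(active, idle, recent_latency, probe):
    """Probe one idle server; if it looks faster than the slowest active
    server, swap it into the active set for subsequent segments."""
    if not idle:
        return active
    candidate = idle[0]
    probe(candidate)  # e.g. request a few small hash-tree nodes
    slowest = max(active, key=lambda s: recent_latency.get(s, 0.0))
    if recent_latency.get(candidate, float("inf")) < recent_latency.get(slowest, 0.0):
        active = [s for s in active if s is not slowest] + [candidate]
        idle.remove(candidate)
        idle.append(slowest)
    return active
```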

warner commented 2010-08-23 07:50:25 +00:00
Author
Owner

Attachment 171-log.diff (3785 bytes) added

patch to add server-selection data to logs/twistd.log for 1.7.1

zooko commented 2010-08-24 07:20:56 +00:00
Author
Owner

Attachment run-zooko1001-curl-stdout.txt (677 bytes) added

zooko commented 2010-08-24 07:24:40 +00:00
Author
Owner

Attachment run-zooko1001-flog.pickle.bz2 (181113 bytes) added

zooko commented 2010-08-24 08:47:37 +00:00
Author
Owner

Attachment run-zooko1002-curl-stdout.txt (704 bytes) added

zooko commented 2010-08-24 08:48:10 +00:00
Author
Owner

Attachment run-zooko1002-flog.pickle.bz2 (1023504 bytes) added

zooko commented 2010-08-24 08:48:54 +00:00
Author
Owner

Attachment run-zooko1002-status.html (1234385 bytes) added

zooko commented 2010-08-24 08:53:07 +00:00
Author
Owner

Attachment Screen shot 2010-08-23 at 01.07.41-0600.png (363648 bytes) added

zooko commented 2010-08-24 09:19:58 +00:00
Author
Owner

I added run1001 and run1002 to the big table. These two runs are notable for having complete packet traces and a screenshot of their wireshark summaries, as well as flogs and (for the 1.8.0c2 one) status.html. It looks to me as if 1.8.0c2+1170-combo.diff was slower than 1.7.1 for those runs because it chose slower servers.

zooko commented 2010-08-24 09:23:42 +00:00
Author
Owner

Attachment runs-zooko2000-2020-curl-stdout.txt (5640 bytes) added

zooko commented 2010-08-24 09:25:59 +00:00
Author
Owner

Attachment runs-zooko2000-2020-twistd.logs.tar.bz2 (124340 bytes) added

zooko commented 2010-08-24 09:28:34 +00:00
Author
Owner

Attachment status-2001.html.bz2 (107465 bytes) added

zooko commented 2010-08-24 09:33:24 +00:00
Author
Owner

Attachment status-2003.html.bz2 (107421 bytes) added

zooko commented 2010-08-24 09:35:43 +00:00
Author
Owner

Attachment status-2005.html.bz2 (108182 bytes) added

zooko commented 2010-08-24 09:37:44 +00:00
Author
Owner

Attachment status-2007.html.bz2 (108194 bytes) added

zooko commented 2010-08-24 09:40:42 +00:00
Author
Owner

Attachment status-2009.html.bz2 (107704 bytes) added

zooko commented 2010-08-24 09:41:59 +00:00
Author
Owner

Attachment status-2011.html.bz2 (108080 bytes) added

zooko commented 2010-08-24 09:42:39 +00:00
Author
Owner

Attachment status-2013.html.bz2 (107948 bytes) added

zooko commented 2010-08-24 09:45:49 +00:00
Author
Owner

Attachment status-2015.html.bz2 (107741 bytes) added

zooko commented 2010-08-24 09:46:30 +00:00
Author
Owner

Attachment status-2017.html.bz2 (107820 bytes) added

zooko commented 2010-08-24 09:46:40 +00:00
Author
Owner

Attachment status-2019.html.bz2 (107209 bytes) added

zooko commented 2010-08-24 10:00:09 +00:00
Author
Owner

Added runs zooko2000 through zooko2019. Thanks a lot to Terrell for the script in comment:121256 which I used to do these runs!

Comments: it looks like there really is a substantial slowdown for switching from v1.7.1 to v1.8.0+1170-combo.diff for this file on this grid. I started examining the status.html files in order to annotate which servers were used by 1.8.0c2+combo.diff, but I got tired and stopped doing it after run 2007. I think Brian's current hypothesis is that server selection is the most important factor, and that seems quite plausible to me. v1.7.1 used the same set of servers in every one of its runs, and its performance was more consistent than 1.8.0c2+combo.diff's was.

It has taken a lot of effort to generate this data and to attach it and format it, so I hope it helps! Thanks again to Terrell.

Now I'm starting a new experiment, downloading (http://localhost:3456/file/URI%3ACHK%3Avpk5d6pl5qelhnwfwtjj2v7tmq%3Adkt453pu5le7qmtix55hiibrzqqq3euchcjguio6vbetobxw5ola%3A3%3A10%3A334400401/@@named=/Negativeland_on_Radio1190.org.ogg) in its 300 MB entirety. This file currently has the following share layout:

```
Share ID  Nickname           Node ID
2         FreeStorm-Neptune  fp3xjndgjt2npubdl2jqqb26clanyag7
3         strato             tavrk54ewt2bl2faybb55wrs3ghissvx
7         FreeStorm-Neptune  fp3xjndgjt2npubdl2jqqb26clanyag7
8         strato             tavrk54ewt2bl2faybb55wrs3ghissvx
```
zooko commented 2010-08-24 16:00:37 +00:00
Author
Owner

Finished annotating the big table with what servers were used for each download.

zooko commented 2010-08-24 16:05:06 +00:00
Author
Owner

Attachment run-zooko3000-curl-stdout.txt (913 bytes) added

zooko commented 2010-08-24 16:05:18 +00:00
Author
Owner

Attachment run-zooko3001-curl-stdout.txt (917 bytes) added

zooko commented 2010-08-24 16:05:27 +00:00
Author
Owner

Attachment run-zooko3002-curl-stdout.txt (859 bytes) added

zooko commented 2010-08-24 16:05:56 +00:00
Author
Owner

Attachment status-3001.html.bz2 (362267 bytes) added

zooko commented 2010-08-24 16:08:09 +00:00
Author
Owner

Attachment runs-zooko3000,3002.twistd.log (1000025 bytes) added

zooko commented 2010-08-24 16:17:02 +00:00
Author
Owner

Added runs zooko3000, 3001, and 3002 to the table. These are, as mentioned, downloads of a 333 MB negativland.ogg file of which there are only 4 surviving shares, 2 shares each on fp3x and tavr. Run 3001 with v1.8.0c2+combo.diff went about half as fast as run 3000 with v1.7.1 even though they chose the same servers (for the long haul -- v1.8.0c2 uses different servers for the first segment or two I think). Then run 3002 started, with v1.7.1, and it went less than half as fast as run 3001 had! I had to stop it before it completed so I could go to work. I suspect that my DSL service was misbehaving at that time, but I haven't tried to confirm that, e.g. by examining the attached logs to see if there is some other explanation for why run 3002 went so slowly.

warner commented 2010-08-24 17:45:36 +00:00
Author
Owner

Attachment 180c2-viz-dyhb.png (75560 bytes) added

timeline of 1.8.0c2 (no patches) download, local testgrid (one computer), shows share-selection misbehavior

warner commented 2010-08-24 17:46:05 +00:00
Author
Owner

Attachment 180c2-viz-delays.png (66549 bytes) added

timeline of 1.8.0c2 (no patches) download, local testgrid (one computer), shows post-receive stall
warner commented 2010-08-24 18:09:39 +00:00
Author
Owner

I made a lot of progress with my javascript-based download-status
visualization tools last night, after switching to the
Protovis library (which rocks!). Here are
two diagrams of a 12MB download performed on my laptop (using a local
testgrid entirely contained on one computer: lots of bandwidth, but only one
CPU to share among them all, and only one disk). The downloader code is from
current trunk, which means 1.8.0c2 (it was not using any of the patches
from this ticket, so it exhibits all the misbehaviors of 1.8.0c2).

I'm still working on the graphics. Time proceeds from left to right. The live
display is pan+zoomable. Currently DYHB and block-reads are filled with a
color that indicates which server they used, and block-reads get an outline
color that indicates which share number was being accessed. Overlapping
block-reads are shown stacked up. Most block reads are tiny (32 or 64 bytes)
but of course each segment requires 'k' large reads (each of about 41kB,
segsize/k).

  • 180c2-viz-dyhb.png : this shows the startup phase. Note how all
    block reads are coming from a single server (w5gi, in purple), even though
    we heard from other servers by the time the second segment started. Also
    note that, for some reason, the requests made for the first segment were
    all serialized by shnum: we waited for all requests from the first share
    to return before sending any requests for the second share.

  • 180c2-viz-delays.png : this shows the midpoint of the download
    (specifically the segments that cross the middle of the Merkle tree,
    requiring the most hash nodes to retrieve). By this point, I'd added a
    thicker outline around block reads that fetched more than 1kB of data, so
    the actual data blocks can be distinguished from the hash tree nodes. The
    reads are properly pipelined. But note the large gap (about 7.5ms) between
    the receipt of the last block and the delivery of the segment. Also note
    how the segments that require fewer hash nodes are delivered much faster.

I haven't yet ported these tools to the combo-patch-fixed downloader, nor
have I applied them to a download from the testgrid (which would behave very
differently: longer latencies, but less contention for disk or CPU). I'm
partially inclined to disregard the idiosyncrasies displayed by these charts
until I do that, but they still represent interesting problems to understand
further.

The large delay on the lots-of-hash-nodes segments raises suspicions of bad
performance in IncompleteHashTree when you add nodes, or about the
behavior of DataSpans when you add/remove data in it. The
DataSpans.add time occurs immediately after the response comes back, so
is clearly minimal (it lives in the space between one response and the next,
along the steep downwards slope), but the DataSpans.pop occurs during
the mysterious gap. The Foolscap receive-processing time occurs inside the
request block rectangle. The Foolscap transmit-serialization time occurs
during the previous mysterious gap, so it must be fairly small (after the
previous segment was delivered, we sent a bazillion hash requests, and the
gap was small, whereas after the big segment was delivered, we didn't send
any hash requests, and the gap was big).

The next set of information that will be useful to add here will be a
generalized event list: in particular I want to see the start/finish times of
all hashtree-manipulation calls, zfec-decode calls, and AES decrypt calls.
That should take about 15 minutes to add, and should illuminate some of that
gap.
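
That generalized event list amounts to something like the following (a sketch with made-up names, not the actual download-status API): wrap each hashtree/zfec/AES call so that its start and finish times land in the same timeline as the network events.

```
import time
from contextlib import contextmanager

class EventLog(object):
    """Hypothetical stand-in for a download-status event list."""
    def __init__(self):
        self.events = []    # list of (name, start, finish) tuples

    @contextmanager
    def event(self, name):
        start = time.time()
        try:
            yield
        finally:
            self.events.append((name, start, time.time()))

# usage around the calls mentioned above (names illustrative):
#   with status.event("zfec-decode"):
#       segment = decoder.decode(blocks)
#   with status.event("AES-decrypt"):
#       plaintext = decryptor.process(segment)
```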

warner commented 2010-08-24 19:10:41 +00:00
Author
Owner

in case anyone wants to play with it, viz-with-combo.diff.bz2 contains both the "combo patch" and my current Protovis-based visualization tool. From the download-status page, follow the "Timeline" link. Still kinda rough, but hopefully useful.

(wow, for reference, don't upload a 900kB diff file and then let Trac try to colorize it. Compress the diff first so that Trac doesn't get clever and time out.)

warner commented 2010-08-24 19:31:12 +00:00
Author
Owner

Attachment viz-with-combo.diff.bz2 (182990 bytes) added

patch with visualization tools and share-selection fix and Spans performance mitigation fix

warner commented 2010-08-25 07:17:39 +00:00
Author
Owner

I did some more testing with those visualization tools (adding some misc
events like entry/exit of internal functions). I've found one place where the
downloader makes excessive eventual-send calls which appears to cost 250us
per remote_read call. I've also measured hash-tree operations as
consuming a surprising amount of overhead.

  • each Share._got_response call queues an eventual-send to
    Share.loop, which checks the satisfy/desire processes. Since a single
    TCP buffer is parsed into lots of Foolscap response messages, these are
    all queued during the same turn, which means the first loop() call will
    see all of the data, and the remaining ones will see nothing. Each of
    these empty loop() calls takes about 250us. There is one for each
    remote_read call, which means k*(3/2)*numsegs for the block hash trees
    and an additional k*(3/2)*numsegs for the ciphertext hash tree (because
    we ask each share for the CTHT nodes, rather than asking only one and
    hoping they return it so we can avoid an extra roundtrip). For k=3
    that's 2.25ms per segment; see the arithmetic check after this list.
    The cost is variable: on some segments (in particular the first and
    middle ones) the overhead is maximal, whereas on every odd segnum there
    is no overhead. On a 12MB download, this is about 225ms, and on my
    local one-CPU testnet, the download took 2.9s, so this represents
    about 8%.

  • It takes my laptop 1.34ms to process a set of blocks into a segment
    (seg2 of a 96-segment file). 1.19ms of that was checking the ciphertext
    hash tree (probably two extra hash nodes), and a mere 73us was spent in
    FEC. AES decryption of the segment took 1.1ms, and accounted for 65% of
    the 1.7ms inter-segment gap (the delay between delivering seg2 and
    requesting seg3).

I'd like to change the _got_response code to set a flag and queue a
single call to loop instead of queueing multiple calls. That would save
a little time (and probably remove the severe jitter that I've seen on local
downloads), but I don't think it can explain the 50% slowdown that Zooko's
observed.
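
The change described above might look roughly like this (a sketch, not the real Share class; it assumes foolscap's eventually() for the eventual-send):

```
from foolscap.eventual import eventually

class Share(object):
    """Sketch of the proposed coalescing, not the actual implementation."""
    def __init__(self):
        self._pending_responses = []
        self._loop_scheduled = False

    def _got_response(self, data):
        self._pending_responses.append(data)   # stash data for loop() to consume
        if not self._loop_scheduled:           # queue at most one loop() per turn
            self._loop_scheduled = True
            eventually(self._deferred_loop)

    def _deferred_loop(self):
        self._loop_scheduled = False
        self.loop()                            # sees every response from this turn

    def loop(self):
        pass                                   # satisfy/desire processing elided
```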

These visualization tools are a lot of fun. One direction to explore is to
record some packet timings (with tcpdump) and add it as an extra row: that
would show us how much latency/load Foolscap is spending before it delivers a
message response to the application.

I'll attach two samples of the viz output as viz-3.png and
viz-4.png . The two captures are of different parts of the
download, but in both cases the horizontal ticks are 500us apart. The
candlestick-diagram-like shapes are the satisfy/desire sections of
Share.loop, and the lines (actually very narrow boxes) between them are
the "disappointment" calculation at the end of Share.loop, so the gap
before it must be the send_requests routine.

warner commented 2010-08-25 07:18:48 +00:00
Author
Owner

Attachment viz-3.png (21918 bytes) added

timeline sample showing satisfy/desire calls and process_block/FEC/hashtree operations

warner commented 2010-08-25 07:19:19 +00:00
Author
Owner

Attachment viz-4.png (14966 bytes) added

another timeline, showing AES in the inter-segment gap

warner commented 2010-08-25 15:02:26 +00:00
Author
Owner

created #1186 to track the redundant Share.loop calls

zooko commented 2010-08-25 16:31:16 +00:00
Author
Owner

Attachment status-4001.html.bz2 (363128 bytes) added

zooko commented 2010-08-25 16:31:33 +00:00
Author
Owner

Attachment status-4003.html.bz2 (362878 bytes) added

zooko commented 2010-08-25 16:31:45 +00:00
Author
Owner

Attachment status-4005.html.bz2 (363470 bytes) added

zooko commented 2010-08-25 16:31:54 +00:00
Author
Owner

Attachment runs-zooko4000-4007-curl-stdout.txt (2296 bytes) added

zooko commented 2010-08-25 16:32:05 +00:00
Author
Owner

Attachment runs-zooko4000-4007-twistd.log.tar.bz2 (289417 bytes) added

zooko commented 2010-08-25 16:32:15 +00:00
Author
Owner

Attachment runs-zooko4000-4007-serverselection-twistd.log (1825 bytes) added

zooko commented 2010-08-25 16:37:25 +00:00
Author
Owner

Added a new batch of runs -- runs zooko 4000 through 4006. There is a very clear pattern here! There are only two server-selections represented: 1*fp3x,2*tavr and 2*fp3x,1*tavr. 1.8.0c2+combo.diff always chose the latter. v1.7.1 always chose the former except for one time when it chose the latter. Whenever you choose the latter you go at ~90 KBps; whenever you choose the former you go at ~190 KBps.

terrell commented 2010-08-25 17:55:36 +00:00
Author
Owner

fixed up the alternating line colorization - adding 3100-3109 in a minute...

terrell commented 2010-08-25 18:03:39 +00:00
Author
Owner

Attachment status-3100.html.bz2 (10847 bytes) added

terrell commented 2010-08-25 18:03:52 +00:00
Author
Owner

Attachment status-3104.html.bz2 (11066 bytes) added

terrell commented 2010-08-25 18:04:06 +00:00
Author
Owner

Attachment status-3105.html.bz2 (375968 bytes) added

terrell commented 2010-08-25 18:04:25 +00:00
Author
Owner

Attachment status-3106.html.bz2 (11065 bytes) added

terrell commented 2010-08-25 18:04:38 +00:00
Author
Owner

Attachment status-3107.html.bz2 (376983 bytes) added

terrell commented 2010-08-25 18:04:49 +00:00
Author
Owner

Attachment status-3108.html.bz2 (10982 bytes) added

warner commented 2010-08-25 18:29:39 +00:00
Author
Owner

run 116 used 3*nszi

terrell commented 2010-08-25 19:46:49 +00:00
Author
Owner

Added rows for 3100-3109... will attach the curl output when I get back to that terminal window. All runs were ~90 KBps, and they all selected the same shares as Zooko's runs 4000-4006.

These were run with Brian's patch for 1.8.0c2+combo+viz vs. 1.7.1.

terrell commented 2010-08-25 19:47:41 +00:00
Author
Owner

Attachment status-3109.html.bz2 (376323 bytes) added

terrell commented 2010-08-25 21:01:01 +00:00
Author
Owner

Attachment runs-terrell3100-3109-curl-stdout.txt (13623 bytes) added

zooko commented 2010-08-26 04:58:57 +00:00
Author
Owner

I tested a well-distributed 25 MB file (http://pubgrid.tahoe-lafs.org/uri/URI:CHK:knvcmfkmzejsg2pfueygjpkygq:3qjcqnjzsccmwk5f4rtsbusln66mgel6esiclahz7hbcsqgqf3ga:3:10:24879985?t=info) and 1.8.0c2+1170-combo.diff was much better than v1.7.1 every time. I don't have the energy to edit all of this into the table, upload all the data files, hyperlink to them, etc., so here is a big ugly dump of the information. Sorry. Goodnight!

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71475      0  0:05:48  0:05:48 --:--:--  105k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   274k      0  0:01:28  0:01:28 --:--:--  324k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72096      0  0:05:45  0:05:45 --:--:-- 85335
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   247k      0  0:01:38  0:01:38 --:--:--  160k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  70986      0  0:05:50  0:05:50 --:--:-- 75037
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   282k      0  0:01:26  0:01:26 --:--:--  311k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71613      0  0:05:47  0:05:47 --:--:--  106k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   283k      0  0:01:25  0:01:25 --:--:--  325k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72017      0  0:05:45  0:05:45 --:--:-- 96431
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   275k      0  0:01:28  0:01:28 --:--:--  320k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72835      0  0:05:41  0:05:41 --:--:-- 90257
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   279k      0  0:01:26  0:01:26 --:--:--  291k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72809      0  0:05:41  0:05:41 --:--:--  105k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   288k      0  0:01:24  0:01:24 --:--:--  309k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71796      0  0:05:46  0:05:46 --:--:-- 94458
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   156k      0  0:02:34  0:02:34 --:--:--  290k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71478      0  0:05:48  0:05:48 --:--:-- 90567
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   282k      0  0:01:26  0:01:26 --:--:--  322k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  70879      0  0:05:51  0:05:51 --:--:-- 95211
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   278k      0  0:01:27  0:01:27 --:--:--  326k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72039      0  0:05:45  0:05:45 --:--:--  101k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   278k      0  0:01:27  0:01:27 --:--:--  309k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71756      0  0:05:46  0:05:46 --:--:--  102k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   269k      0  0:01:30  0:01:30 --:--:--  310k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  64652      0  0:06:24  0:06:24 --:--:-- 87644
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   279k      0  0:01:26  0:01:26 --:--:--  310k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71130      0  0:05:49  0:05:49 --:--:--  103k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   257k      0  0:01:34  0:01:34 --:--:--  313k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71088      0  0:05:49  0:05:49 --:--:-- 62766
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   280k      0  0:01:26  0:01:26 --:--:--  332k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71712      0  0:05:46  0:05:46 --:--:-- 81047
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   277k      0  0:01:27  0:01:27 --:--:--  330k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72124      0  0:05:44  0:05:44 --:--:-- 90323
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   286k      0  0:01:24  0:01:24 --:--:--  312k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  71579      0  0:05:47  0:05:47 --:--:--  106k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   287k      0  0:01:24  0:01:24 --:--:--  325k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  72847      0  0:05:41  0:05:41 --:--:--  104k
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   266k      0  0:01:31  0:01:31 --:--:--  277k
--
/Users/zooko/playground/tahoe-lafs/1.7.1
100 23.7M  100 23.7M    0     0  48285      0  0:08:35  0:08:35 --:--:-- 85752
/Users/zooko/playground/tahoe-lafs/trunk
100 23.7M  100 23.7M    0     0   266k      0  0:01:31  0:01:31 --:--:--  283k
```
davidsarah commented 2010-08-27 00:31:42 +00:00
Author
Owner

#1187 describes an approach that could mitigate the effect of choosing some slow servers. (I think that is complementary to trying to make better server choices.)

zooko commented 2010-08-29 22:56:49 +00:00
Author
Owner

Brian and David-Sarah and I have discussed this off and on over the last few days, mostly on IRC, and we agree that the way forward to 1.8.0 final is to review and commit to trunk attachment:viz-with-combo.diff.bz2, then make a 1.8.0c3 release and invite everyone to test out the new, even better downloader and the new visualizations.

warner commented 2010-08-31 19:55:52 +00:00
Author
Owner

Attachment 1170-p1.diff (2172 bytes) added

for review: drop received data after each block finishes, to avoid spans.py complexity bug

warner commented 2010-08-31 19:56:34 +00:00
Author
Owner

Attachment 1170-p2.diff (67595 bytes) added

for review: use diversity-seeking share-selection algorithm, improve logging

warner commented 2010-08-31 19:57:07 +00:00
Author
Owner

Attachment 1170-p3.diff (303958 bytes) added

for review: add Protovis-based download-status timeline visualization page

warner commented 2010-08-31 19:59:23 +00:00
Author
Owner

ok, those three patches are ready for review, and are meant to be applied to current trunk in that order.

If you want me to land these, please let me know by, say, Thursday, since I'm travelling this weekend and will have only limited network access next week.

zooko commented 2010-08-31 23:46:23 +00:00
Author
Owner

I would rather apply them to trunk myself after reviewing them.

zooko commented 2010-08-31 23:54:19 +00:00
Author
Owner

By the way, if you are already on your travels by the time I review these patches then if there is any very small, obvious fix that is needed I might do it myself rather than wait for you to get back on-line. Hopefully none will be needed and I can apply these as-is for 1.8.0c3. One thing that I'm suspecting I'm going to want changed is the visualization -- last time I looked it lacked labels indicating the meanings of the axes, the units, and the meanings of the objects, and I really hate graphs without complete labels. I think maybe it is something that my high school chemistry teacher crammed into my head? Never never never report data without units and labels.

zooko commented 2010-09-01 00:14:13 +00:00
Author
Owner

Doesn't 1170-p3.diff mean we need to update docs/frontends/download-status.txt?

zooko commented 2010-09-01 00:27:07 +00:00
Author
Owner

Ugh, 1170-p3.diff adds in jquery.js, 120 KB and more than 4000 lines of code, and protovis-r3.2.js, 116 KB and a minified (therefore obscured) version of more than 15,000 lines of code. This means we are storing 3rd party source code in our revision control history, and in the case of protovis-r3.2.js it isn't even the real source code, but a minified (compressed) version of it. If only there were a principled, manageable way to declare our dependencies on those JavaScript codebases! :-(

But I'm not aware of one that we can use. At least let us not store computer-produced stuff -- the minified version of protovis -- but instead store the original "preferred form for making modifications" and minify it as a part of the build process or as part of the start up of the web gateway.

Does anyone know of a better way to manage our dependencies on JavaScript code?
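
One possible shape for the minify-at-build-time idea (purely illustrative; it assumes a Python-side JavaScript minifier such as the jsmin package is available, and the file names are only examples):

```
# Illustrative sketch: keep only the unminified "preferred form" in revision
# control and produce the minified file during the build (or at gateway start).
from jsmin import jsmin   # assumption: a jsmin-style minifier is installed

def minify_js(src_path, dst_path):
    with open(src_path) as src:
        minified = jsmin(src.read())
    with open(dst_path, "w") as dst:
        dst.write(minified)

# e.g.:
# minify_js("src/allmydata/web/protovis-d3.2.js",
#           "src/allmydata/web/protovis-r3.2.js")
```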

warner commented 2010-09-01 00:34:27 +00:00
Author
Owner

yeah, that's tricky. I'm not sure what to suggest. For reference, the non-minified protovis code (protovis-d3.2.js) is 510KB (versus the minified protovis-r3.2.js is 117KB). The minified jquery.js would be 57KB, versus the non-minified at 121KB.

warner commented 2010-09-01 04:16:25 +00:00
Author
Owner

Attachment 1170-p123.darcspatch (376523 bytes) added

patchbundle with all three patches, ready to land upon passing review

zooko commented 2010-09-02 05:11:09 +00:00
Author
Owner

I reviewed the first one and applied it as changeset:c89a464510394089. Thanks! Now trunk no longer exhibits superlinear CPU usage when downloading large files!

zooko commented 2010-09-02 05:47:40 +00:00
Author
Owner

I reviewed the second one and applied it as changeset:00e9e4e6760021a1. Whoo-hoo! Now trunk has all of Brian's New Downloader patches, which Terrell and I benchmarked as being way better than the old 1.7.1 downloader!

Everyone should test and benchmark the heck out of trunk! This might become 1.8.0c3, or else we might figure out how to package up the third of Brian's three patches, the one that gives the beautiful JavaScript download visualization. However, I'm not sure I want to put that into 1.8.0c3, mostly because of JavaScript packaging issues. Either way, you should test the heck out of the current trunk. :-)

zooko commented 2010-09-07 03:21:13 +00:00
Author
Owner

See Kyle's latest benchmarking reports:

* <http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005148.html>
* <http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005149.html>
* <http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005155.html>

Short version: unless Kyle's measurements are wrong, there is still a huge performance regression, which blocks the 1.8.0 release. Waaah! :-(

zooko commented 2010-09-07 03:35:20 +00:00
Author
Owner

I ran a large download under cProfile and the results clearly show that there is no CPU hotspot. Filtering out all the rows that had less than 10 seconds of total CPU time during the 6 hours I left it running (about 2.5 of which it spent doing the download), I get:

         276522586 function calls (275942089 primitive calls) in 26729.374 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 16637385   12.999    0.000   12.999    0.000 <string>:1(fileno)
  2896395   13.714    0.000   26.006    0.000 abstract.py:177(write)
    78933   10.012    0.000   56.906    0.001 banana.py:701(handleData)
5897783/5897092   12.339    0.000   84.484    0.000 flatstan.py:30(TagSerializer)
   871861   11.661    0.000  311.571    0.000 log.py:71(callWithLogger)
   959692   22.170    0.000 26474.218    0.028 selectreactor.py:93(doSelect)
   849079   12.753    0.000   19.519    0.000 stan.py:40(__getitem__)
   824202   11.533    0.000  141.241    0.000 tcp.py:114(doRead)
   824378   13.608    0.000  229.558    0.000 tcp.py:443(doRead)
   148/44   10.610    0.072   99.162    2.254 ten.py:73(iterflatten)
   824202   57.850    0.000   57.851    0.000 {built-in method recv}
 17685095   11.745    0.000   11.745    0.000 {isinstance}
     8554   10.254    0.001   10.254    0.001 {method 'process' of '_aes.AES' objects}
   172850   22.907    0.000   22.907    0.000 {method 'update' of '_sha256.SHA256' objects}
  2399813   10.269    0.000   10.269    0.000 {min}
   959692 26125.485    0.027 26137.268    0.027 {select.select}
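For anyone who wants to reproduce that filtering, here is a sketch that assumes the profile was saved to a file (e.g. with `python -m cProfile -o down.prof ...`); the file name is illustrative, and it pokes at the `Stats` object's internal `stats` dict:

```
# Print only the rows with at least 10 seconds of own (tottime) CPU time,
# ordered by file/function name. Assumes a saved profile file "down.prof".
import pstats

stats = pstats.Stats("down.prof")
rows = []
for (path, lineno, funcname), (cc, nc, tottime, cumtime, callers) in stats.stats.items():
    if tottime >= 10.0:
        rows.append((path, lineno, funcname, nc, tottime, cumtime))

for path, lineno, funcname, nc, tottime, cumtime in sorted(rows):
    print("%10d %9.3f %9.3f  %s:%d(%s)" % (nc, tottime, cumtime, path, lineno, funcname))
```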
zooko commented 2010-09-07 04:55:12 +00:00
Author
Owner

I am now going to attach the status.html files from the download that was run under the profiler and whose profiling results were posted in [comment:121282](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121282). I don't know why this single download (with Firefox 4 beta) of a single file spawned 8 download status pages.

zooko commented 2010-09-07 05:05:06 +00:00
Author
Owner

Attachment down.html.tar.bz2 (1306532 bytes) added

status.html of downloads described in comment:-1

zooko commented 2010-09-10 19:24:23 +00:00
Author
Owner

Recap of this ticket: there were two major performance regressions in 1.8.0c1 or c2 vs. 1.7.1. One was the superlinear computation in spans and the other was a server-selection algorithm that would in some cases choose to get multiple shares from one server unnecessarily. Both of those are fixed in 1.8.0c4 (upcoming) and benchmarks by various people indicate that 1.8.0c4 immutable download is only a little slower (~10% slower) than 1.7.1 in the worst case and much faster (e.g. ~400% faster) in other common cases.
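To picture the server-selection half of that recap, here is a toy greedy pass (purely illustrative, not the downloader's real algorithm): prefer a share from a server we have not used yet, and only double up on a server when there is no other way to reach k shares.

```
# Toy illustration: pick k distinct shares, preferring distinct servers,
# and only reuse a server when no unused server can supply another share.
# `available` maps server -> set of share numbers that server holds.
def pick_shares(available, k):
    chosen = []            # list of (server, sharenum)
    used_shares = set()
    # first pass: take at most one share per server
    for server, shares in sorted(available.items()):
        fresh = shares - used_shares
        if fresh and len(chosen) < k:
            sharenum = min(fresh)
            chosen.append((server, sharenum))
            used_shares.add(sharenum)
    # second pass: reuse servers only if we still need more shares
    for server, shares in sorted(available.items()):
        for sharenum in sorted(shares - used_shares):
            if len(chosen) >= k:
                break
            chosen.append((server, sharenum))
            used_shares.add(sharenum)
    return chosen
```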

So, this ticket is done well enough for v1.8.0 final. I strongly suspect that the remaining ~10% slowdown has to do with more computation after receiving a block and before sending the next get_block request, which would probably be best addressed by implementing #1110 or #1187.
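To illustrate the kind of fix I mean (just a sketch, not the downloader's actual structure; `fetch_block()` and `process_block()` are made-up stand-ins): keep a small window of block requests in flight so that the per-block computation overlaps with the network round trips instead of adding to them.

```
# Sketch only: keep `window` block requests outstanding so that hashing and
# decoding of one block overlap with the round trips of the following ones.
# fetch_block(i) is assumed to return a Deferred; process_block() stands in
# for the per-block work.
from collections import deque
from twisted.internet import defer

@defer.inlineCallbacks
def download_pipelined(num_blocks, fetch_block, process_block, window=4):
    outstanding = deque()
    for blocknum in range(num_blocks):
        outstanding.append((blocknum, fetch_block(blocknum)))  # issue immediately
        if len(outstanding) >= window:
            oldest, d = outstanding.popleft()
            data = yield d                     # wait only for the oldest request
            process_block(oldest, data)        # overlaps with the in-flight ones
    while outstanding:                         # drain whatever is still in flight
        oldest, d = outstanding.popleft()
        data = yield d
        process_block(oldest, data)
```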

Thanks very much to Brian, Terrell, David-Sarah, Kyle, and anyone else who helped slay this damned tenacious issue. :-) Hooray! It is dead! Good-bye ticket #1170!

P.S. The gentle reader, before looking away from this ticket forever and ever, might want to look at the following comments and perhaps transcribe some of their important bits out to a fresh new ticket: [comment:121226](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121226), [comment:121262](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121262), [comment:121264](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121264). And by "the gentle reader", I guess I mean Brian.

tahoe-lafs added the fixed label 2010-09-10 19:24:23 +00:00
zooko closed this issue 2010-09-10 19:24:23 +00:00
zooko commented 2010-09-18 18:08:09 +00:00
Author
Owner

The part of this ticket that was about integrating, deploying, and supporting the new visualizer has been moved to #1200 (package up Brian's New Visualization of immutable download).

davidsarah commented 2010-11-19 02:17:35 +00:00
Author
Owner

See also #1182 (clean up and improve asymptotic complexity of Spans and DataSpans).
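For anyone following that pointer later, here is a toy illustration of the kind of structure #1182 is about (not Tahoe's actual Spans/DataSpans implementation): keep the ranges in a sorted list and use bisect, so that adding a range costs O(log n) plus the number of ranges it merges, instead of rescanning everything on every add.

```
# Toy Spans-like structure: a sorted list of disjoint [start, end) ranges
# with bisect-based insertion and merging. Illustrative only.
import bisect

class SimpleSpans:
    def __init__(self):
        self._starts = []   # sorted range starts
        self._ends = []     # parallel list of range ends (exclusive)

    def add(self, start, length):
        end = start + length
        # locate every existing range that overlaps or touches [start, end)
        i = bisect.bisect_left(self._ends, start)
        j = bisect.bisect_right(self._starts, end)
        if i < j:   # merge with the overlapping ranges
            start = min(start, self._starts[i])
            end = max(end, self._ends[j - 1])
        self._starts[i:j] = [start]
        self._ends[i:j] = [end]

    def __contains__(self, offset):
        i = bisect.bisect_right(self._starts, offset) - 1
        return i >= 0 and offset < self._ends[i]
```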

warner commented 2010-11-23 23:38:50 +00:00
Author
Owner

#1268 has been opened to cover the "coalesce `Share.loop()` calls" fix mentioned in [comment:121264](/tahoe-lafs/trac-2024-07-25/issues/1170#issuecomment-121264). I think that's all the action items left over from this ticket.
