digraph G {
subgraph cluster0 {
isCluster="true";
label="WholeStageCodegen (5)\n \nduration: 4 ms";
1 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build: 4 ms<br>number of output rows: 1"];
}
2 [labelType="html" label="<b>Exchange</b><br><br>shuffle records written: 200<br>shuffle write time total (min, med, max (stageId: taskId))<br>54 ms (0 ms, 0 ms, 2 ms (stage 15.0: task 360))<br>records read: 200<br>local bytes read: 5.6 KiB<br>fetch wait time: 0 ms<br>remote bytes read: 5.6 KiB<br>local blocks read: 100<br>remote blocks read: 100<br>data size total (min, med, max (stageId: taskId))<br>3.1 KiB (16.0 B, 16.0 B, 16.0 B (stage 15.0: task 180))<br>shuffle bytes written total (min, med, max (stageId: taskId))<br>11.1 KiB (56.0 B, 56.0 B, 59.0 B (stage 15.0: task 180))"];
subgraph cluster3 {
isCluster="true";
label="WholeStageCodegen (4)\n \nduration: total (min, med, max (stageId: taskId))\n434 ms (1 ms, 1 ms, 10 ms (stage 15.0: task 194))";
4 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>380 ms (0 ms, 1 ms, 10 ms (stage 15.0: task 194))<br>number of output rows: 200"];
5 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>169 ms (0 ms, 0 ms, 9 ms (stage 15.0: task 194))<br>peak memory total (min, med, max (stageId: taskId))<br>4.0 GiB (256.0 KiB, 256.0 KiB, 64.3 MiB (stage 15.0: task 180))<br>number of output rows: 78<br>avg hash probe bucket list iters (min, med, max (stageId: taskId)):<br>(1, 1, 1 (stage 15.0: task 180))"];
}
6 [labelType="html" label="<b>Exchange</b><br><br>shuffle records written: 3,432<br>shuffle write time total (min, med, max (stageId: taskId))<br>271 ms (4 ms, 6 ms, 11 ms (stage 14.0: task 162))<br>records read: 3,432<br>local bytes read total (min, med, max (stageId: taskId))<br>95.5 KiB (0.0 B, 0.0 B, 1896.0 B (stage 15.0: task 221))<br>fetch wait time total (min, med, max (stageId: taskId))<br>17 ms (0 ms, 0 ms, 4 ms (stage 15.0: task 194))<br>remote bytes read total (min, med, max (stageId: taskId))<br>96.1 KiB (0.0 B, 0.0 B, 2.0 KiB (stage 15.0: task 193))<br>local blocks read: 1,404<br>remote blocks read: 1,412<br>data size total (min, med, max (stageId: taskId))<br>80.1 KiB (1864.0 B, 1864.0 B, 1864.0 B (stage 14.0: task 136))<br>shuffle bytes written total (min, med, max (stageId: taskId))<br>191.6 KiB (4.4 KiB, 4.4 KiB, 4.4 KiB (stage 14.0: task 156))"];
subgraph cluster7 {
isCluster="true";
label="WholeStageCodegen (3)\n \nduration: total (min, med, max (stageId: taskId))\n1.5 m (643 ms, 2.1 s, 2.5 s (stage 14.0: task 159))";
8 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>1.5 m (635 ms, 2.1 s, 2.5 s (stage 14.0: task 159))<br>peak memory total (min, med, max (stageId: taskId))<br>2.8 GiB (64.3 MiB, 64.3 MiB, 64.3 MiB (stage 14.0: task 136))<br>number of output rows: 3,432<br>avg hash probe bucket list iters (min, med, max (stageId: taskId)):<br>(1, 1, 1 (stage 14.0: task 136))"];
}
9 [labelType="html" label="<br><b>Union</b><br><br>"];
subgraph cluster10 {
isCluster="true";
label="WholeStageCodegen (1)\n \nduration: total (min, med, max (stageId: taskId))\n44.3 s (0 ms, 662 ms, 2.4 s (stage 14.0: task 137))";
11 [labelType="html" label="<br><b>Project</b><br><br>"];
}
12 [labelType="html" label="<b>Scan csv </b><br><br>number of files read: 1<br>metadata time: 0 ms<br>size of files read: 2.7 GiB<br>number of output rows: 6,905,288"];
subgraph cluster13 {
isCluster="true";
label="WholeStageCodegen (2)\n \nduration: total (min, med, max (stageId: taskId))\n43.8 s (0 ms, 638 ms, 2.5 s (stage 14.0: task 159))";
14 [labelType="html" label="<br><b>Project</b><br><br>"];
}
15 [labelType="html" label="<b>Scan csv </b><br><br>number of files read: 1<br>metadata time: 0 ms<br>size of files read: 2.7 GiB<br>number of output rows: 6,905,288"];
2->1;
4->2;
5->4;
6->5;
8->6;
9->8;
11->9;
12->11;
14->9;
15->14;
}
HashAggregate(keys=[], functions=[count(1)])
WholeStageCodegen (5)
Exchange SinglePartition, true, [id=#168]
HashAggregate(keys=[], functions=[partial_count(1)])
HashAggregate(keys=[id#90], functions=[])
WholeStageCodegen (4)
Exchange hashpartitioning(id#90, 200), true, [id=#163]
HashAggregate(keys=[id#90], functions=[])
WholeStageCodegen (3)
Union
Project [Pickup Community Area#24 AS id#90]
WholeStageCodegen (1)
FileScan csv [Pickup Community Area#24] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[s3a://data-repository-bkt/ECS765/Chicago_Taxitrips/chicago_taxi_trips.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Pickup Community Area:string>
Project [Dropoff Community Area#25 AS id#98]
WholeStageCodegen (2)
FileScan csv [Dropoff Community Area#25] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[s3a://data-repository-bkt/ECS765/Chicago_Taxitrips/chicago_taxi_trips.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Dropoff Community Area:string>
== Parsed Logical Plan ==
Aggregate [count(1) AS count#176L]
+- Deduplicate [id#90]
   +- Union
      :- Project [Pickup Community Area#24 AS id#90, Pickup Centroid Longitude AS Longitude#91, Pickup Centroid Latitude AS Latitude#92, Pickup Census Tract AS Census Tract#93]
      :  +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
      +- Project [Dropoff Community Area#25 AS id#98, Dropoff Centroid Longitude AS Longitude#99, Dropoff Centroid Latitude AS Latitude#100, Dropoff Census Tract AS Census Tract#101]
         +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
== Analyzed Logical Plan ==
count: bigint
Aggregate [count(1) AS count#176L]
+- Deduplicate [id#90]
   +- Union
      :- Project [Pickup Community Area#24 AS id#90, Pickup Centroid Longitude AS Longitude#91, Pickup Centroid Latitude AS Latitude#92, Pickup Census Tract AS Census Tract#93]
      :  +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
      +- Project [Dropoff Community Area#25 AS id#98, Dropoff Centroid Longitude AS Longitude#99, Dropoff Centroid Latitude AS Latitude#100, Dropoff Census Tract AS Census Tract#101]
         +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
== Optimized Logical Plan ==
Aggregate [count(1) AS count#176L]
+- Aggregate [id#90]
   +- Union
      :- Project [Pickup Community Area#24 AS id#90]
      :  +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
      +- Project [Dropoff Community Area#25 AS id#98]
         +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
== Physical Plan ==
*(5) HashAggregate(keys=[], functions=[count(1)], output=[count#176L])
+- Exchange SinglePartition, true, [id=#168]
   +- *(4) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#182L])
      +- *(4) HashAggregate(keys=[id#90], functions=[], output=[])
         +- Exchange hashpartitioning(id#90, 200), true, [id=#163]
            +- *(3) HashAggregate(keys=[id#90], functions=[], output=[id#90])
               +- Union
                  :- *(1) Project [Pickup Community Area#24 AS id#90]
                  :  +- FileScan csv [Pickup Community Area#24] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[s3a://data-repository-bkt/ECS765/Chicago_Taxitrips/chicago_taxi_trips.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Pickup Community Area:string>
                  +- *(2) Project [Dropoff Community Area#25 AS id#98]
                     +- FileScan csv [Dropoff Community Area#25] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[s3a://data-repository-bkt/ECS765/Chicago_Taxitrips/chicago_taxi_trips.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Dropoff Community Area:string>