digraph G {
subgraph cluster0 {
isCluster="true";
label="WholeStageCodegen (5)\n \nduration: 4 ms";
1 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build: 4 ms<br>number of output rows: 1"];
}
2 [labelType="html" label="<b>Exchange</b><br><br>shuffle records written: 200<br>shuffle write time total (min, med, max (stageId: taskId))<br>52 ms (0 ms, 0 ms, 0 ms (stage 13.0: task 176))<br>records read: 200<br>local bytes read: 5.7 KiB<br>fetch wait time: 0 ms<br>remote bytes read: 5.4 KiB<br>local blocks read: 103<br>remote blocks read: 97<br>data size total (min, med, max (stageId: taskId))<br>3.1 KiB (16.0 B, 16.0 B, 16.0 B (stage 13.0: task 157))<br>shuffle bytes written total (min, med, max (stageId: taskId))<br>11.1 KiB (56.0 B, 56.0 B, 59.0 B (stage 13.0: task 157))"];
subgraph cluster3 {
isCluster="true";
label="WholeStageCodegen (4)\n \nduration: total (min, med, max (stageId: taskId))\n458 ms (1 ms, 1 ms, 10 ms (stage 13.0: task 159))";
4 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>426 ms (0 ms, 1 ms, 9 ms (stage 13.0: task 159))<br>number of output rows: 200"];
5 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>198 ms (0 ms, 0 ms, 8 ms (stage 13.0: task 159))<br>peak memory total (min, med, max (stageId: taskId))<br>4.0 GiB (256.0 KiB, 256.0 KiB, 64.3 MiB (stage 13.0: task 157))<br>number of output rows: 78<br>avg hash probe bucket list iters (min, med, max (stageId: taskId)):<br>(1, 1, 1 (stage 13.0: task 157))"];
}
6 [labelType="html" label="<b>Exchange</b><br><br>shuffle records written: 3,432<br>shuffle write time total (min, med, max (stageId: taskId))<br>277 ms (4 ms, 6 ms, 9 ms (stage 12.0: task 122))<br>records read: 3,432<br>local bytes read total (min, med, max (stageId: taskId))<br>96.1 KiB (0.0 B, 0.0 B, 2.0 KiB (stage 13.0: task 170))<br>fetch wait time total (min, med, max (stageId: taskId))<br>24 ms (0 ms, 0 ms, 3 ms (stage 13.0: task 159))<br>remote bytes read total (min, med, max (stageId: taskId))<br>95.5 KiB (0.0 B, 0.0 B, 1896.0 B (stage 13.0: task 198))<br>local blocks read: 1,412<br>remote blocks read: 1,404<br>data size total (min, med, max (stageId: taskId))<br>80.1 KiB (1864.0 B, 1864.0 B, 1864.0 B (stage 12.0: task 113))<br>shuffle bytes written total (min, med, max (stageId: taskId))<br>191.6 KiB (4.4 KiB, 4.4 KiB, 4.4 KiB (stage 12.0: task 133))"];
subgraph cluster7 {
isCluster="true";
label="WholeStageCodegen (3)\n \nduration: total (min, med, max (stageId: taskId))\n1.4 m (628 ms, 2.0 s, 2.5 s (stage 12.0: task 118))";
8 [labelType="html" label="<b>HashAggregate</b><br><br>time in aggregation build total (min, med, max (stageId: taskId))<br>1.4 m (612 ms, 1.9 s, 2.5 s (stage 12.0: task 118))<br>peak memory total (min, med, max (stageId: taskId))<br>2.8 GiB (64.3 MiB, 64.3 MiB, 64.3 MiB (stage 12.0: task 113))<br>number of output rows: 3,432<br>avg hash probe bucket list iters (min, med, max (stageId: taskId)):<br>(1, 1, 1 (stage 12.0: task 113))"];
}
9 [labelType="html" label="<br><b>Union</b><br><br>"];
subgraph cluster10 {
isCluster="true";
label="WholeStageCodegen (1)\n \nduration: total (min, med, max (stageId: taskId))\n43.3 s (0 ms, 725 ms, 2.5 s (stage 12.0: task 118))";
11 [labelType="html" label="<br><b>Project</b><br><br>"];
}
12 [labelType="html" label="<b>Scan csv </b><br><br>number of files read: 1<br>metadata time: 0 ms<br>size of files read: 2.7 GiB<br>number of output rows: 6,905,288"];
subgraph cluster13 {
isCluster="true";
label="WholeStageCodegen (2)\n \nduration: total (min, med, max (stageId: taskId))\n42.6 s (0 ms, 614 ms, 2.4 s (stage 12.0: task 137))";
14 [labelType="html" label="<br><b>Project</b><br><br>"];
}
15 [labelType="html" label="<b>Scan csv </b><br><br>number of files read: 1<br>metadata time: 0 ms<br>size of files read: 2.7 GiB<br>number of output rows: 6,905,288"];
2->1;
4->2;
5->4;
6->5;
8->6;
9->8;
11->9;
12->11;
14->9;
15->14;
}
== Parsed Logical Plan ==
Aggregate [count(1) AS count#167L]
+- Deduplicate [id#90]
   +- Union
      :- Project [Pickup Community Area#24 AS id#90, Pickup Centroid Longitude AS Longitude#91, Pickup Centroid Latitude AS Latitude#92, Pickup Census Tract AS Census Tract#93]
      :  +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
      +- Project [Dropoff Community Area#25 AS id#98, Dropoff Centroid Longitude AS Longitude#99, Dropoff Centroid Latitude AS Latitude#100, Dropoff Census Tract AS Census Tract#101]
         +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
== Analyzed Logical Plan ==
count: bigint
Aggregate [count(1) AS count#167L]
+- Deduplicate [id#90]
   +- Union
      :- Project [Pickup Community Area#24 AS id#90, Pickup Centroid Longitude AS Longitude#91, Pickup Centroid Latitude AS Latitude#92, Pickup Census Tract AS Census Tract#93]
      :  +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
      +- Project [Dropoff Community Area#25 AS id#98, Dropoff Centroid Longitude AS Longitude#99, Dropoff Centroid Latitude AS Latitude#100, Dropoff Census Tract AS Census Tract#101]
         +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
== Optimized Logical Plan ==
Aggregate [count(1) AS count#167L]
+- Aggregate [id#90]
   +- Union
      :- Project [Pickup Community Area#24 AS id#90]
      :  +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
      +- Project [Dropoff Community Area#25 AS id#98]
         +- Relation[Trip ID#16,Taxi ID#17,Trip Start Timestamp#18,Trip End Timestamp#19,Trip Seconds#20,Trip Miles#21,Pickup Census Tract#22,Dropoff Census Tract#23,Pickup Community Area#24,Dropoff Community Area#25,Fare#26,Tips#27,Tolls#28,Extras#29,Trip Total#30,Payment Type#31,Company#32,Pickup Centroid Latitude#33,Pickup Centroid Longitude#34,Pickup Centroid Location#35,Dropoff Centroid Latitude#36,Dropoff Centroid Longitude#37,Dropoff Centroid Location#38] csv
== Physical Plan ==
*(5) HashAggregate(keys=[], functions=[count(1)], output=[count#167L])
+- Exchange SinglePartition, true, [id=#149]
   +- *(4) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#173L])
      +- *(4) HashAggregate(keys=[id#90], functions=[], output=[])
         +- Exchange hashpartitioning(id#90, 200), true, [id=#144]
            +- *(3) HashAggregate(keys=[id#90], functions=[], output=[id#90])
               +- Union
                  :- *(1) Project [Pickup Community Area#24 AS id#90]
                  :  +- FileScan csv [Pickup Community Area#24] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[s3a://data-repository-bkt/ECS765/Chicago_Taxitrips/chicago_taxi_trips.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Pickup Community Area:string>
                  +- *(2) Project [Dropoff Community Area#25 AS id#98]
                     +- FileScan csv [Dropoff Community Area#25] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[s3a://data-repository-bkt/ECS765/Chicago_Taxitrips/chicago_taxi_trips.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Dropoff Community Area:string>
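
The physical plan above counts distinct ids in stages: a partial HashAggregate on id within each input partition, a hash exchange on id, a final HashAggregate per shuffle partition, a partial count, and a single-partition exchange that sums the partial counts. The following is a minimal pure-Python sketch of that staging, not Spark code. The function name `distinct_count` and its parameters are hypothetical, Python's built-in `hash` stands in for Spark's partitioner, and the default of 200 mirrors `hashpartitioning(id#90, 200)`.

```python
from collections import defaultdict


def distinct_count(partitions, num_shuffle_partitions=200):
    """Count distinct ids using the same staging as the physical plan (illustrative only)."""
    # WholeStageCodegen (3): partial HashAggregate(keys=[id]) removes
    # duplicates inside each input partition before anything is shuffled.
    partial = [set(part) for part in partitions]

    # Exchange hashpartitioning(id, 200): equal ids always land in the same bucket.
    # WholeStageCodegen (4), first aggregate: the set removes cross-partition duplicates.
    buckets = defaultdict(set)
    for ids in partial:
        for value in ids:
            buckets[hash(value) % num_shuffle_partitions].add(value)

    # WholeStageCodegen (4), second aggregate: partial_count(1) per shuffle partition.
    partial_counts = [len(ids) for ids in buckets.values()]

    # Exchange SinglePartition + WholeStageCodegen (5): sum the partial counts.
    return sum(partial_counts)
```

Treating the pickup column and the dropoff column as two input partitions, `distinct_count([["8", "32", "8"], ["32", "76"]])` counts each id once regardless of which side of the Union it came from.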