Commit 9a29036

bamurtaugh authored and imback82 committed
Samples Revamp: Readme Changes (dotnet#322)
1 parent c49875a commit 9a29036

9 files changed: +1257 -2 lines changed
Lines changed: 53 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,53 @@
1+
# .NET for Apache Spark C# Samples: Machine Learning
2+
3+
[.NET for Apache Spark](https://dot.net/spark) is a free, open-source, and cross-platform big data analytics framework.
4+
5+
In the **Machine Learning** folder, we provide C# samples which will help you incorporate machine learning into your big data apps.
6+
We typically incorporate machine learning with big data to scale the training and/or prediction of machine learning algorithms.
7+
8+
We incorporate machine learning into our .NET for Apache Spark apps by using [ML.NET](https://dot.net/ml),
9+
an open source and cross-platform machine learning framework for .NET developers.
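
As a rough sketch of the idea above, an ML.NET model can be wrapped in a Spark UDF so that each row is scored by the model. The `Review` and `ReviewPrediction` classes, the `MLModel.zip` path, the `ReviewText` column, and the `MLudf` name are assumptions chosen for illustration; see the individual sample folders for the actual code.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.Spark.Sql;

// Hypothetical input shape for a binary sentiment model.
public class Review
{
    public string ReviewText { get; set; }
}

// Hypothetical output shape; "PredictedLabel" is the usual ML.NET
// output column for binary classification.
public class ReviewPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
}

public static class SentimentSketch
{
    // Assumed path to a model trained and saved beforehand with ML.NET.
    private const string ModelPath = "MLModel.zip";

    // Loads the model and scores a single piece of text.
    // Reloading on every call keeps the sketch short; a real app
    // would likely cache the prediction engine.
    private static bool Predict(string text)
    {
        var mlContext = new MLContext();
        ITransformer model = mlContext.Model.Load(ModelPath, out DataViewSchema _);
        var engine = mlContext.Model
            .CreatePredictionEngine<Review, ReviewPrediction>(model);
        return engine.Predict(new Review { ReviewText = text }).Prediction;
    }

    public static void Main(string[] args)
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("SentimentSketch")
            .GetOrCreate();

        // args[0]: path to a CSV file with a ReviewText header column (assumed).
        DataFrame reviews = spark.Read().Option("header", true).Csv(args[0]);
        reviews.CreateOrReplaceTempView("Reviews");

        // Register the ML.NET call as a UDF so Spark can apply it per row.
        spark.Udf().Register<string, bool>("MLudf", text => Predict(text));

        spark.Sql("SELECT ReviewText, MLudf(ReviewText) AS IsPositive FROM Reviews")
            .Show();

        spark.Stop();
    }
}
```

Spark distributes the rows across workers, and the UDF runs the ML.NET prediction on each one, which is how prediction scales with the data.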
10+
11+
For each sample, we have a folder that contains a C# app and a README.md explaining the sample.
12+
13+
<table>
14+
<tr>
15+
<td width="25%">
16+
<h4><b>Sample Name</b></h4>
17+
</td>
18+
<td>
19+
<h4 width="35%"><b>Description</b></h4>
20+
</td>
21+
<td>
22+
<h4><b>Link</b></h4>
23+
</td>
24+
</tr>
25+
<tr>
26+
<td width="25%">
27+
<h4>Batch Sentiment Analysis</h4>
28+
</td>
29+
<td width="35%">
30+
Determine if a batch of online reviews are positive or negative, using ML.NET.
31+
</td>
32+
<td>
33+
<h4><a href="Sentiment">Sentiment</a> &nbsp; &nbsp;</h4>
34+
</td>
35+
</tr>
36+
<tr>
37+
<td width="25%">
38+
<h4>Streaming Sentiment Analysis</h4>
39+
</td>
40+
<td width="35%">
41+
Determine if statements being produced live are positive or negative, using ML.NET.
42+
</td>
43+
<td>
44+
<h4><a href="SentimentStream">SentimentStream</a> &nbsp; &nbsp;</h4>
45+
</td>
46+
</tr>
47+
</table>
48+
49+
## Additional Resources
50+
51+
To learn more about combining .NET for Apache Spark with machine learning, check out [this video](https://channel9.msdn.com/Series/NET-for-Apache-Spark-101/Sentiment-Analysis-with-NET-for-Apache-Spark-and-MLNET-Part-1) from the .NET for Apache Spark 101 video series to see a demo coded and run live.
52+
53+
You can also [check out the Spark + ML demos and explanation](https://youtu.be/ZWsYMQ0Sw1o?t=906) from the .NET for Apache Spark session at .NET Conf 2019!

examples/Microsoft.Spark.CSharp.Examples/MachineLearning/Sentiment/README.md

Lines changed: 6 additions & 0 deletions
@@ -202,3 +202,9 @@ Check out the [full coding example](./Program.cs). You can also view a live vide
202202
Rather than performing batch processing (analyzing data that's already been stored), we can adapt our Spark + ML.NET app to instead perform real-time processing with structured streaming.
203203

204204
Check out [SentimentStream](../SentimentStream) to see the adapted version of the sentiment analysis program that will determine the sentiment of text live as it's typed into a terminal.
205+
206+
## Citations
207+
208+
**UCI Machine Learning Repository citation:** Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
209+
210+
**Sentiment Labelled Sentences Data Set citation:** 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015

examples/Microsoft.Spark.CSharp.Examples/MachineLearning/SentimentStream/README.md

Lines changed: 7 additions & 1 deletion
@@ -219,8 +219,14 @@ spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local
219219
220220
## Next Steps
221221

222-
Checkout the [full coding example](./Program.cs). You can also view a live video explanation of ML.NET + .NET for Spark in the [Bringing Big Data Analytics through Apache Spark to .NET](https://youtu.be/ZWsYMQ0Sw1o?t=1358) session from **.NET Conf 2019.**
222+
Check out the [full coding example](./Program.cs). You can also view a live video explanation of ML.NET + .NET for Spark in the [Bringing Big Data Analytics through Apache Spark to .NET](https://youtu.be/ZWsYMQ0Sw1o?t=1358) session from **.NET Conf 2019.**
223223

224224
Rather than performing real-time processing, we can adapt our Spark + ML.NET app to instead perform batch processing (analyzing data that's already been stored).
225225

226226
Check out [Sentiment](../Sentiment) to see the adapted version of the sentiment analysis program that will determine the sentiment of text from a batch of online reviews.
227+
228+
## Citations
229+
230+
**UCI Machine Learning Repository citation:** Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
231+
232+
**Sentiment Labelled Sentences Data Set citation:** 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
1+
# .NET for Apache Spark C# Samples
2+
3+
[.NET for Apache Spark](https://dot.net/spark) is a free, open-source, and cross-platform big data analytics framework.
4+
5+
In the **Microsoft.Spark.CSharp.Examples** folder, we provide C# samples which will help you get started with .NET for Apache Spark
6+
and demonstrate how to infuse big data analytics into existing and new .NET apps.
7+
8+
There are three main types of samples/apps in the repo:
9+
10+
* **[SQL/Batch](Sql/Batch):** .NET for Apache Spark apps that analyze batch data, or data that has already been produced/stored.
11+
12+
* **[SQL/Streaming](Sql/Streaming):** .NET for Apache Spark apps that analyze structured streaming data, or data that is currently being produced live.
13+
14+
* **[Machine Learning](MachineLearning):** .NET for Apache Spark apps infused with Machine Learning models based on [ML.NET](http://dot.net/ml),
15+
an open source and cross-platform machine learning framework.
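
To make the streaming category above more concrete, here is a minimal sketch of a structured streaming word count, loosely in the spirit of the network word count sample. The host, port, and app name are assumptions; a text source such as netcat would need to be listening on that port.

```csharp
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;
using static Microsoft.Spark.Sql.Functions;

public static class StreamingSketch
{
    public static void Main(string[] args)
    {
        // Assumed socket source, e.g. started with `nc -lk 9999`.
        string host = "localhost";
        int port = 9999;

        SparkSession spark = SparkSession
            .Builder()
            .AppName("StreamingSketch")
            .GetOrCreate();

        // Each line typed into the socket arrives as a row in the "value" column.
        DataFrame lines = spark
            .ReadStream()
            .Format("socket")
            .Option("host", host)
            .Option("port", port)
            .Load();

        // Split every line into words and keep a running count per word.
        DataFrame words = lines
            .Select(Explode(Split(lines["value"], " ")).Alias("word"));
        DataFrame wordCounts = words.GroupBy("word").Count();

        // Print the full updated counts to the console after each micro-batch.
        StreamingQuery query = wordCounts
            .WriteStream()
            .OutputMode("complete")
            .Format("console")
            .Start();

        query.AwaitTermination();
    }
}
```

The main difference from a batch app is `ReadStream()`/`WriteStream()` in place of `Read()`/`Write()`: the query keeps running and updates its results as new data arrives.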
16+
17+
<table >
18+
<tr>
19+
<td align="middle" colspan="2"><b>Batch Processing</b></td>
20+
</tr>
21+
<tr>
22+
<td align="middle"><a href="Sql/Batch/Basic.cs"><b>Basic.cs</b></a><br>A simple example demonstrating basic Spark SQL features.<br></td>
23+
<td align="middle"><a href="Sql/Batch/Datasource.cs"><b>Datasource.cs</b></a><br>Example demonstrating reading from various data sources.<br></td>
24+
</tr>
25+
<tr>
26+
<td align="middle"><a href="Sql/Batch/GitHubProjects.cs"><b>GitHubProjects.cs</b></a><br>Example analyzing GitHub projects data.<br></td>
27+
<td align="middle"><a href="Sql/Batch/Logging.cs"><b>Logging.cs</b></a><br>Example demonstrating log processing.<br></td>
28+
</tr>
29+
<tr>
30+
<td align="middle"><a href="Sql/Batch/VectorUdfs.cs"><b>VectorUdfs.cs</b></a><br>Example using vectorized UDFs to improve query performance.<br></td>
31+
</tr>
32+
</table>
33+
34+
<br>
35+
36+
<table >
37+
<tr>
38+
<td align="middle" colspan="2"><b>Structured Streaming</b></td>
39+
</tr>
40+
<tr>
41+
<td align="middle"><a href="Sql/Streaming/StructuredNetworkWordCount.cs"><b>StructuredNetworkWordCount.cs</b></a><br>Simple word count app that connects to and analyzes a live data stream (like netcat).<br></td>
42+
<td align="middle"><a href="Sql/Streaming/StructuredNetworkWordCountWindowed.cs"><b>StructuredNetworkWordCountWindowed.cs</b></a><br>Windowed word count app.<br></td>
43+
</tr>
44+
<tr>
45+
<td align="middle"><a href="Sql/Streaming/StructuredKafkaWordCount.cs"><b>StructuredKafkaWordCount.cs</b></a><br>Word count on data from Kafka.<br></td>
46+
<td align="middle"><a href="Sql/Streaming/StructuredNetworkCharacterCount.cs"><b>StructuredNetworkCharacterCount.cs</b></a><br>Counts the number of characters in each string read from a stream, demonstrating the power of UDFs + stream processing.<br></td>
47+
</tr>
48+
</table>
49+
50+
<br>
51+
52+
<table >
53+
<tr>
54+
<td align="middle" colspan="2"><b>Machine Learning</b></td>
55+
</tr>
56+
<tr>
57+
<td align="middle"><a href="MachineLearning/Sentiment/Program.cs"><b>Batch Sentiment Analysis</b></a><br>Determine if a batch of online reviews are positive or negative, using ML.NET.<br></td>
58+
<td align="middle"><a href="MachineLearning/SentimentStream/Program.cs"><b>Streaming Sentiment Analysis</b></a><br>Determine if statements being produced live are positive or negative, using ML.NET.<br></td>
59+
</tr>
60+
</table>
61+
62+
### Other Files in the Folder
63+
64+
Beyond the sample apps, there are a few other files in the **Microsoft.Spark.CSharp.Examples** folder:
65+
66+
* **IExample.cs:** A common interface each sample implements to help provide consistency when creating/running sample apps.
67+
> Note: When you create and run sample apps beyond this repository's project, you do not need to use IExample.cs - it just provides consistency for all the apps included in this repo.
68+
69+
* **Microsoft.Spark.CSharp.Examples.csproj:** The C# project file necessary for building/running all sample apps. It includes target
70+
frameworks, assembly information, and references to other C# project files referenced in the sample apps.
71+
72+
* **Program.cs:** A common entry point when running our sample apps (it contains the Main method). It helps us print error messages in cases such as a project lacking the necessary arguments.
73+
74+
* **README.md:** The doc you are currently reading.
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
# .NET for Apache Spark C# Samples: Batch
2+
3+
[.NET for Apache Spark](https://dot.net/spark) is a free, open-source, and cross-platform big data analytics framework.
4+
5+
In the **Batch** folder, we provide C# samples which will help you get started with one of the fundamental big data analytics scenarios:
6+
**batch processing.** Batch processing means we're analyzing data that has already been stored (such as in a database, CSV, or text file).
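
As a minimal sketch of what batch processing looks like in code, the app below reads a stored CSV file into a DataFrame and aggregates it in one pass. The file path argument and the `language` column are hypothetical and only serve the illustration; the samples in this folder use their own datasets.

```csharp
using Microsoft.Spark.Sql;

public static class BatchSketch
{
    public static void Main(string[] args)
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("BatchSketch")
            .GetOrCreate();

        // args[0]: path to a CSV file that already exists on disk (assumed).
        DataFrame df = spark
            .Read()
            .Option("header", true)
            .Option("inferSchema", true)
            .Csv(args[0]);

        // Inspect the columns Spark inferred from the file.
        df.PrintSchema();

        // Count rows per value of a hypothetical "language" column.
        df.GroupBy("language").Count().Show();

        spark.Stop();
    }
}
```

Because the data is already complete when the app starts, the query runs once over the whole file and then exits.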
7+
8+
For each sample, we have a C# app and, for some of the more complex apps, a README.md explaining the sample.
9+
10+
<table>
11+
<tr>
12+
<td width="25%">
13+
<h4><b>Sample Name</b></h4>
14+
</td>
15+
<td>
16+
<h4 width="35%"><b>Description</b></h4>
17+
</td>
18+
<td>
19+
<h4><b>Links</b></h4>
20+
</td>
21+
</tr>
22+
<tr>
23+
<td width="25%">
24+
<h4>Basic.cs</h4>
25+
</td>
26+
<td width="35%">
27+
A simple example demonstrating basic Spark SQL features.
28+
</td>
29+
<td>
30+
<h4><a href="Basic.cs">Basic.cs</a> &nbsp; &nbsp;</h4>
31+
</td>
32+
</tr>
33+
<tr>
34+
<td width="25%">
35+
<h4>Datasource.cs</h4>
36+
</td>
37+
<td width="35%">
38+
Example demonstrating reading from various data sources.
39+
</td>
40+
<td>
41+
<h4><a href="Datasource.cs">Datasource.cs</a> &nbsp; &nbsp;</h4>
42+
</td>
43+
</tr>
44+
<tr>
45+
<td width="25%">
46+
<h4>VectorUdfs.cs</h4>
47+
</td>
48+
<td width="35%">
49+
Example using vectorized UDFs to improve query performance.
50+
</td>
51+
<td>
52+
<h4><a href="VectorUdfs.cs">VectorUdfs.cs</a> &nbsp; &nbsp;</h4>
53+
</td>
54+
</tr>
55+
<tr>
56+
<td width="25%">
57+
<h4>GitHubProjects.cs</h4>
58+
</td>
59+
<td width="35%">
60+
Example analyzing GitHub projects data.
61+
</td>
62+
<td>
63+
<h4><a href="readmes/GitHubProjectsReadme.md">ReadMe</a> &nbsp;&nbsp;&nbsp;
64+
<a href="GitHubProjects.cs">GitHubProjects.cs</a> &nbsp; &nbsp;</h4>
65+
</td>
66+
</tr>
67+
<tr>
68+
<td width="25%">
69+
<h4>Logging.cs</h4>
70+
</td>
71+
<td width="35%">
72+
Example demonstrating log processing.
73+
</td>
74+
<td>
75+
<h4><a href="readmes/LoggingReadme.md">ReadMe</a> &nbsp;&nbsp;&nbsp;
76+
<a href="Logging.cs">Logging.cs</a> &nbsp; &nbsp;</h4>
77+
</td>
78+
</tr>
79+
</table>
80+
81+
## Additional Resources
82+
83+
To learn more about batch processing with .NET for Apache Spark, check out [this video](https://channel9.msdn.com/Series/NET-for-Apache-Spark-101/Batch-Processing-with-NET-for-Apache-Spark) from the .NET for Apache Spark 101 video series to see the GitHub projects batch demo coded and run live.
84+
85+
You can also [check out the demos and explanation](https://youtu.be/ZWsYMQ0Sw1o?t=304) from the .NET for Apache Spark session at .NET Conf 2019!
