9. Given the following documents, determine the weights of a Naïve Bayesian category for a document to be about “Trees”. Given the new document listed, determine whether it should be given the category or not. (14 points)

Doc1 Oak, Plum, rose, Oak, Oak, Plum, ash, ash, ash
is member of trees
Doc2 Plum, strawberry, OAK, Ash, Ash, Ash, Oak, Ash
is member of trees
Doc3 Ash, Plum, Apple, Apple, Apple, Oak, Ash, Plum
is member of trees
Doc4 Ash, Ash, rose, rose, plum, plum, plum, Oak
is member of trees
Doc5 rose, rose, rose, plum, tulip, tulip
is not member of trees
Doc6 rose, tulip, tulip, rose, plum, tulip, strawberry
is not member of trees

To determine the weights of the Naïve Bayesian category "Trees", we first count how often each word occurs in each document, ignoring case.

Doc1: Oak (3), Plum (2), rose (1), ash (3)
Doc2: Plum (1), strawberry (1), Oak (2), Ash (4)
Doc3: Ash (2), Plum (2), Apple (3), Oak (1)
Doc4: Ash (2), rose (2), plum (3), Oak (1)
Doc5: rose (3), plum (1), tulip (2)
Doc6: rose (2), tulip (3), plum (1), strawberry (1)

Calculating the total frequency of each word across all six documents:
Oak (7), Plum (10), rose (8), ash (11), strawberry (2), Apple (3), tulip (5)

To calculate the weights, we divide the frequency of each word by the total number of word occurrences in the corpus (46). Note that these are corpus-wide relative frequencies; they do not yet separate the "Trees" documents from the others.

Oak: 7/46 ≈ 0.1522
Plum: 10/46 ≈ 0.2174
rose: 8/46 ≈ 0.1739
ash: 11/46 ≈ 0.2391
strawberry: 2/46 ≈ 0.0435
Apple: 3/46 ≈ 0.0652
tulip: 5/46 ≈ 0.1087
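
The counting above can be checked mechanically. The following is a minimal sketch, assuming the words are lowercased so that "OAK", "Oak" and "oak" count as the same word; the variable names (`docs`, `per_doc`, `corpus`, `rel_freq`) are illustrative and not part of the question.

```python
from collections import Counter

# The six training documents from the question, lowercased.
docs = {
    "Doc1": "oak plum rose oak oak plum ash ash ash".split(),
    "Doc2": "plum strawberry oak ash ash ash oak ash".split(),
    "Doc3": "ash plum apple apple apple oak ash plum".split(),
    "Doc4": "ash ash rose rose plum plum plum oak".split(),
    "Doc5": "rose rose rose plum tulip tulip".split(),
    "Doc6": "rose tulip tulip rose plum tulip strawberry".split(),
}

# Word counts per document, then summed over the whole corpus.
per_doc = {name: Counter(words) for name, words in docs.items()}
corpus = sum(per_doc.values(), Counter())
total = sum(corpus.values())

# Corpus-wide relative frequency of each word.
rel_freq = {word: count / total for word, count in corpus.items()}

for word, count in corpus.most_common():
    print(f"{word}: {count}/{total} = {rel_freq[word]:.4f}")
```

These relative frequencies sum to 1 over the vocabulary, which is a quick sanity check on the counts.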

Now, let's determine if the new document should be given the category or not:

New Document: rose, tulip, tulip, rose, plum, tulip, strawberry

Calculating the weights for the words in the new document (occurrences in the new document divided by the 46 word occurrences in the training corpus):
rose: 2/46 ≈ 0.0435
tulip: 3/46 ≈ 0.0652
plum: 1/46 ≈ 0.0217
strawberry: 1/46 ≈ 0.0217

To determine if the document should be given the category "Trees", we can sum up the weights of the words in the document and compare the sum to a threshold. The threshold used here is an arbitrary choice, not a value derived from the data.

Threshold: 0.3

Sum of weights: 0.0435 + 0.0652 + 0.0217 + 0.0217 ≈ 0.152

Since the sum of weights (0.152) is less than the threshold (0.3), the new document should not be given the category "Trees".

To determine the weights of the Naïve Bayesian category "Trees," we first need to count the occurrences of each word in the documents of each category.

1. Count the occurrences of each word in the "Tree" category documents (Doc1 to Doc4):
- Oak: 7
- Plum: 8
- Rose: 3
- Ash: 11
- Strawberry: 1
- Apple: 3

2. Count the occurrences of each word in the "Not Tree" category documents (Doc5 and Doc6):
- Rose: 5
- Plum: 2
- Tulip: 5
- Strawberry: 1

3. Calculate the total number of words in each category:
- Total words in "Tree" category: 33 (7 Oak + 8 Plum + 3 Rose + 11 Ash + 1 Strawberry + 3 Apple)
- Total words in "Not Tree" category: 13 (5 Rose + 2 Plum + 5 Tulip + 1 Strawberry)

4. Calculate the probabilities for each word in the "Tree" category:
- P(Oak | Tree) = 7/33 ≈ 0.212
- P(Plum | Tree) = 8/33 ≈ 0.242
- P(Rose | Tree) = 3/33 ≈ 0.091
- P(Ash | Tree) = 11/33 ≈ 0.333
- P(Strawberry | Tree) = 1/33 ≈ 0.030
- P(Apple | Tree) = 3/33 ≈ 0.091

5. Calculate the probabilities for each word in the "Not Tree" category:
- P(Rose | Not Tree) = 5/13 ≈ 0.385
- P(Plum | Not Tree) = 2/13 ≈ 0.154
- P(Tulip | Not Tree) = 5/13 ≈ 0.385
- P(Strawberry | Not Tree) = 1/13 ≈ 0.077

6. Determine the weights for each word. The weight of a word in a category is its conditional probability from steps 4 and 5; a word that never occurs in a category has weight 0 there:
- Weight(Oak | Tree) = 0.212
- Weight(Plum | Tree) = 0.242
- Weight(Rose | Tree) = 0.091
- Weight(Ash | Tree) = 0.333
- Weight(Strawberry | Tree) = 0.030
- Weight(Apple | Tree) = 0.091
- Weight(Rose | Not Tree) = 0.385
- Weight(Plum | Not Tree) = 0.154
- Weight(Tulip | Not Tree) = 0.385
- Weight(Strawberry | Not Tree) = 0.077
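
Steps 1 to 6 can be sketched in a few lines. This is a minimal illustration that recomputes the per-category counts directly from the six documents as listed in the question, assuming Doc1 to Doc4 are the "Tree" documents and words are compared case-insensitively; the helper name `class_conditional` is hypothetical.

```python
from collections import Counter

# Training documents grouped by category (lowercased).
tree_docs = [
    "oak plum rose oak oak plum ash ash ash",
    "plum strawberry oak ash ash ash oak ash",
    "ash plum apple apple apple oak ash plum",
    "ash ash rose rose plum plum plum oak",
]
not_tree_docs = [
    "rose rose rose plum tulip tulip",
    "rose tulip tulip rose plum tulip strawberry",
]

def class_conditional(docs):
    """Return ({word: P(word | category)}, total word count) without smoothing."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}, total

p_tree, n_tree = class_conditional(tree_docs)
p_not_tree, n_not_tree = class_conditional(not_tree_docs)

print("Tree total:", n_tree, {w: round(p, 3) for w, p in p_tree.items()})
print("Not Tree total:", n_not_tree, {w: round(p, 3) for w, p in p_not_tree.items()})
```

Words missing from a dictionary (for example "tulip" in `p_tree`) have an unsmoothed probability of 0 in that category.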

Now, to determine if a new document should be given the category of "Tree" or not, we need to calculate the Naïve Bayesian score for the document. Let's analyze the new document:

New Document: Oak, Plum, Plum, Ash, Tulip

1. Calculate the prior probability of the "Tree" category:
- P(Tree) = (number of "Tree" category documents) / (total number of documents) = 4 / 6 ≈ 0.67

2. Calculate the prior probability of the "Not Tree" category:
- P(Not Tree) = (number of "Not Tree" category documents) / (total number of documents) = 2 / 6 ≈ 0.33

3. Calculate the Naïve Bayesian score for the document in each category:
- Score(Tree) = P(Tree) * P(Oak | Tree) * P(Plum | Tree) * P(Plum | Tree) * P(Ash | Tree) * P(Tulip | Tree)
- Score(Not Tree) = P(Not Tree) * P(Oak | Not Tree) * P(Plum | Not Tree) * P(Plum | Not Tree) * P(Ash | Not Tree) * P(Tulip | Not Tree)

4. Compare the Naïve Bayesian score for the document in each category:
- If Score(Tree) > Score(Not Tree), classify the document as a member of "Trees."
- If Score(Tree) <= Score(Not Tree), classify the document as not a member of "Trees."

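Substituting the counts from the training documents shows a problem: "Tulip" never occurs in a "Tree" document and "Oak" never occurs in a "Not Tree" document, so P(Tulip | Tree) = 0 and P(Oak | Not Tree) = 0, and both scores are exactly 0. To get a usable comparison, the zero counts have to be smoothed. Assuming add-one (Laplace) smoothing over the 7-word vocabulary, Score(Tree) ≈ 5.06 × 10^-5 and Score(Not Tree) ≈ 5.63 × 10^-6, so the new document is classified as a member of "Trees."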
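
The scoring and comparison can be sketched as follows. This is one possible implementation, not the only one: it assumes add-one (Laplace) smoothing, which the question does not specify, so that words unseen in a category do not force the score to 0, and it works in log space to avoid multiplying many small numbers. The names `training`, `log_score` and `alpha` are illustrative.

```python
import math
from collections import Counter

# Training documents grouped by category (lowercased).
training = {
    "Tree": [
        "oak plum rose oak oak plum ash ash ash",
        "plum strawberry oak ash ash ash oak ash",
        "ash plum apple apple apple oak ash plum",
        "ash ash rose rose plum plum plum oak",
    ],
    "Not Tree": [
        "rose rose rose plum tulip tulip",
        "rose tulip tulip rose plum tulip strawberry",
    ],
}
new_doc = "oak plum plum ash tulip".split()

# Word counts per category and the shared vocabulary.
counts = {
    label: Counter(word for doc in docs for word in doc.split())
    for label, docs in training.items()
}
vocab = set().union(*counts.values())
n_docs = sum(len(docs) for docs in training.values())

def log_score(label, words, alpha=1.0):
    """log P(label) + sum of log P(word | label), with add-alpha smoothing (assumed)."""
    total = sum(counts[label].values())
    score = math.log(len(training[label]) / n_docs)  # log prior
    for word in words:
        smoothed = (counts[label][word] + alpha) / (total + alpha * len(vocab))
        score += math.log(smoothed)
    return score

scores = {label: log_score(label, new_doc) for label in training}
predicted = max(scores, key=scores.get)

for label, value in scores.items():
    print(f"{label}: log score {value:.3f}, score {math.exp(value):.3e}")
print("Predicted category:", predicted)
```

Under these assumptions the "Tree" score is roughly nine times the "Not Tree" score, so the new document is assigned to "Trees". A different smoothing constant would change the numbers and could change the decision for borderline documents.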

To determine the weights for the category "Trees", you can calculate the frequency of each word in the "Trees" documents and in the overall corpus. The ratio of the two frequencies estimates how strongly a word indicates the category "Trees".

Here's how you can calculate the weights for the given documents:

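1. Count the frequency of each word in the category "Trees" (Doc1 to Doc4):
- Oak: 7
- Plum: 8
- Rose: 3
- Ash: 11
- Strawberry: 1
- Apple: 3

2. Count the frequency of each word in the overall corpus:
- Oak: 7
- Plum: 10
- Rose: 8
- Ash: 11
- Strawberry: 2
- Apple: 3
- Tulip: 5

3. Calculate, for each word, the share of its occurrences that fall in "Trees" documents. This ratio estimates P(Trees|word), not P(word|Trees):
- P(Trees|Oak) = frequency(Oak in Trees) / frequency(Oak) = 7/7 = 1
- P(Trees|Plum) = frequency(Plum in Trees) / frequency(Plum) = 8/10 = 0.8
- P(Trees|Rose) = frequency(Rose in Trees) / frequency(Rose) = 3/8 = 0.375
- P(Trees|Ash) = frequency(Ash in Trees) / frequency(Ash) = 11/11 = 1
- P(Trees|Strawberry) = frequency(Strawberry in Trees) / frequency(Strawberry) = 1/2 = 0.5
- P(Trees|Apple) = frequency(Apple in Trees) / frequency(Apple) = 3/3 = 1
- P(Trees|Tulip) = frequency(Tulip in Trees) / frequency(Tulip) = 0/5 = 0

4. Calculate the weights for the category "Trees" by taking the natural logarithm of these probabilities:
- Weight(Oak|Trees) = log(P(Trees|Oak)) = log(1) = 0
- Weight(Plum|Trees) = log(P(Trees|Plum)) = log(0.8) ≈ -0.223
- Weight(Rose|Trees) = log(P(Trees|Rose)) = log(0.375) ≈ -0.981
- Weight(Ash|Trees) = log(P(Trees|Ash)) = log(1) = 0
- Weight(Strawberry|Trees) = log(P(Trees|Strawberry)) = log(0.5) ≈ -0.693
- Weight(Apple|Trees) = log(P(Trees|Apple)) = log(1) = 0
- Weight(Tulip|Trees) = log(0), which is undefined (negative infinity) unless the counts are smoothed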
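
The ratio-and-logarithm steps can be sketched as below. This is a minimal illustration that recomputes the counts from the six documents in the question, assuming case-insensitive matching and the natural logarithm; a word that never occurs in a "Trees" document is given a weight of negative infinity here, which is one possible convention rather than a requirement.

```python
import math
from collections import Counter

tree_docs = [
    "oak plum rose oak oak plum ash ash ash",
    "plum strawberry oak ash ash ash oak ash",
    "ash plum apple apple apple oak ash plum",
    "ash ash rose rose plum plum plum oak",
]
not_tree_docs = [
    "rose rose rose plum tulip tulip",
    "rose tulip tulip rose plum tulip strawberry",
]

# Word counts inside "Trees" and over the whole corpus.
tree_counts = Counter(w for doc in tree_docs for w in doc.split())
corpus_counts = tree_counts + Counter(w for doc in not_tree_docs for w in doc.split())

weights = {}
for word, n_corpus in corpus_counts.items():
    share = tree_counts[word] / n_corpus  # share of occurrences found in "Trees"
    # log(0) is undefined, so words absent from "Trees" get -inf (assumed convention).
    weights[word] = math.log(share) if share > 0 else float("-inf")

for word, weight in sorted(weights.items()):
    print(f"{word}: {tree_counts[word]}/{corpus_counts[word]} -> weight {weight:.3f}")
```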

Now that you have the weights for each word in the category "Trees", you can use them to determine if a new document should be assigned to the category or not.

For example, if the new document is:
- "Ash, Plum, Plum, Rose"

Calculate the total weight for the document by summing the weights of the words present:
- Total weight = Weight(Ash|Trees) + 2 * Weight(Plum|Trees) + Weight(Rose|Trees)

If the total weight is greater than a certain threshold (which you can determine based on your specific needs), you can classify the document as belonging to the category "Trees". Otherwise, it would not be classified as belonging to the category "Trees".

Note: Each word's weight is multiplied by the number of times the word occurs in the document (as with Plum in the example above), so words that appear multiple times contribute more to the total.
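
The total-weight calculation for the example document can be sketched as follows. The weights are the logarithms of the share of each word's occurrences found in "Trees" documents, computed from the counts in the question (Oak 7 of 7, Plum 8 of 10, Rose 3 of 8, Ash 11 of 11). The threshold value is purely hypothetical, chosen only so the example runs; the source does not give one.

```python
import math

# Log-weights: log(occurrences in "Trees" / occurrences in the corpus).
weights = {
    "oak": math.log(7 / 7),
    "plum": math.log(8 / 10),
    "rose": math.log(3 / 8),
    "ash": math.log(11 / 11),
}

THRESHOLD = -2.0  # hypothetical cut-off, not taken from the question

def total_weight(words):
    """Sum the weight of every word occurrence, so repeated words count repeatedly.

    Raises KeyError for a word that has no weight in the table above.
    """
    return sum(weights[word.lower()] for word in words)

new_doc = ["Ash", "Plum", "Plum", "Rose"]
score = total_weight(new_doc)
is_tree = score > THRESHOLD

print(f"Total weight: {score:.3f}")
print("Classified as Trees:", is_tree)
```

With these weights the total is about -1.427. Whether that counts as "Trees" depends entirely on the threshold chosen.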