Alfreda can be easily broken #3

Closed
opened 2024-06-10 21:40:23 +10:00 by max · 2 comments
Owner

Thanks a lot Ethan

Author
Owner

Update: I'm attempting to use a second request to Anthropic to verify that Alfreda is still being Alfreda, as shown in this code snippet:

```python
aiverification = aiclient.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=50,
    temperature=0.0,
    system=(
        "Do you think Alfreda sent the following message? Respond with only True or False, don't say anything else. "
        "Alfreda is a warm, friendly woman of 58 years old. She has a sunny disposition and a kind smile that puts others at ease. "
        "Though her blonde hair is streaked with gray, her bright blue eyes still sparkle with a sense of wonder and appreciation for the simple joys in life. "
        "One of Alfreda's greatest pleasures is crocheting. She can often be found sitting in her cozy living room, crocheting hook in hand, working diligently on her latest project - usually a blanket or shawl in cheerful colors. "
        "The repetitive motions and soft yarn calm her mind after a long day of teaching. "
        "Ah yes, teaching high school English is Alfreda's chosen profession and one she takes great pride in. "
        "Though she has a gentle, patient demeanor, she also maintains a lively classroom where she strives to nurture her students' creativity and passion for literature. "
        "Her unwavering optimism and ability to find something positive in every situation endear her to her students. "
        "Alfreda lives alone in a modest house filled with warm lighting, overstuffed furniture, and the lingering smell of freshly baked goods. "
        "She has never married, instead pouring her nurturing spirit into her roles as educator and cherished friend to many. "
        "Her life is delightfully uncomplicated - she avoids technology beyond the basics and instead delights in simple rituals like tending her flower garden and watching the sun rise with a hot cup of tea. "
        "With her friendly, sunny, down-to-earth personality, Alfreda brings light to all those around her. "
        "She is a throwback to a simpler time, content to take life at an unhurried pace and appreciate the small joys like a sunny day or a beautifully crocheted blanket."
    ),
    messages=[
        {
            "role": "user",
            "content": aimessage.content
        }
    ]
)
print(aiverification.content)
# Note: aiverification.content is a list of content blocks, not a string,
# so this comparison is always False and the reply is never sent.
if aiverification.content == "[TextBlock(text='True', type='text')]":
    await message.reply(aimessage.content)
else:
    return
```

However, this never sends anything to Discord, and I'm working on making it send something. Claude does return an accurate verdict whenever it checks, though. I could try to prompt engineer it better, but that may cause other issues.
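A likely reason nothing reaches Discord: the printed value `[TextBlock(text='True', type='text')]` is a list of content blocks, and a list never compares equal to a string. Below is a minimal sketch of reading the verdict from the first block's `.text` instead. The helper name `verdict_is_true` and the `FakeTextBlock` stand-in are hypothetical and exist only so the sketch runs without the SDK.

```python
from dataclasses import dataclass


@dataclass
class FakeTextBlock:
    """Hypothetical stand-in for the SDK's text block, so this sketch is runnable."""
    text: str
    type: str = "text"


def verdict_is_true(content_blocks) -> bool:
    """Return True only if the first block's text is 'True' (case and whitespace ignored)."""
    if not content_blocks:
        return False
    text = getattr(content_blocks[0], "text", "")
    return text.strip().lower() == "true"


blocks = [FakeTextBlock(text="True")]

# The original check compares a list to a string, which is always False.
original_check = blocks == "[TextBlock(text='True', type='text')]"

# Reading the text out of the first block gives the intended result.
fixed_check = verdict_is_true(blocks)
```

In the handler this would become something like `if verdict_is_true(aiverification.content): await message.reply(aimessage.content[0].text)`, assuming `aimessage` is also an Anthropic response whose first block holds the reply text.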

max self-assigned this 2024-06-11 09:19:26 +10:00
max referenced this issue from a commit 2024-06-12 09:06:53 +10:00
Author
Owner

Alfreda now won't do the following:

  • Talk in leetspeak or other weird English modifications
  • Become not Alfreda

Still waiting for Ethan to figure out how to break this again. I'll reopen if that happens.

max closed this issue 2024-06-12 09:23:10 +10:00
Reference: max/alfreda#3